Multimedia architecture


When creating multimedia applications with the maemo SDK, you need a basic understanding of how multimedia works on the actual device (Nokia 770) compared to the SDK (on an x86 GNU/Linux workstation). This document explains these differences and suggests ways for applications to handle audio and video streams.

You can find an example of multimedia application development, and more details about adding multimedia support to the SDK, in Getting started with multimedia development.

Nokia 770 architecture

Figure 1 illustrates the multimedia related architecture in a slightly simplified way.

Figure 1. Nokia 770 multimedia architecture

The architecture consists of the following components:

  • Application stands for the software which uses the multimedia capabilities of the device.
  • Media Server is a daemon running on the device that the platform uses to handle audio and video. Media Server is not included in the SDK; it exists only on the actual device, where it provides media support for pre-installed software.
  • GStreamer is a media processing framework. Developers can write new plugins to add support for new formats to existing applications. GStreamer gives precise control over the stream processing pipeline (for example, for adding effects, seeking and media type detection) on the application level.
  • Plugins are GStreamer components, loadable libraries which provide elements to process audio and video streams.
  • Audio and Video sinks are the GStreamer elements that send media data to the hardware (through the kernel driver) for playback.
  • libesd is a library for applications that use the Esound daemon directly. Through libesd, you can do simple raw PCM audio playback and also play cached samples (such as one-shot game sounds).
  • ESD is the Esound daemon, a software mixer that enables several simultaneous PCM audio streams to use the audio hardware.
  • libasound is the user-space library that the application uses to access ALSA functionality. Through libasound, you can do simple raw PCM audio playback.
  • ALSA is the modern Linux sound driver system.
  • DSP gateway driver is a kernel driver that sits between user-space code and the hardware DSP.
  • Hildon UI is a graphical user interface designed for use on small mobile devices. It is built on top of GTK+.
  • GTK+ is a multi-platform toolkit for creating graphical user interfaces. Applications can use GTK+ directly in addition to using it through Hildon UI.
  • Xlib is a low-level library for displaying GUI elements.
  • X server is the part of the platform which handles drawing graphics on the screen. X server in maemo is optimised for embedded usage and the specific hardware platform.
  • Framebuffer is a memory area where displayable data is written.


Nokia 770 does not contain a separate audio card. Audio streams are forwarded to the DSP, which handles mixing and prioritising between streams. In addition to a basic PCM audio stream, the DSP can also receive encoded audio streams, such as MP3, AMR and AAC. Using the DSP saves valuable computing resources on the main processor and increases battery lifetime.
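To make the idea of mixing simultaneous PCM streams concrete, here is a minimal sketch in Python. It is an illustration only: the algorithm used by the DSP is not described in this document, and the function name mix_pcm16 and the choice of signed 16-bit samples are assumptions made for the example.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(*streams):
    """Mix signed 16-bit PCM streams sample by sample (hypothetical helper).

    Samples at the same position are summed; the sum is clipped to the
    16-bit range so that loud passages saturate instead of wrapping around.
    Shorter streams are treated as silent once they run out of samples.
    """
    length = max((len(s) for s in streams), default=0)
    mixed = array("h", [0] * length)
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        mixed[i] = max(INT16_MIN, min(INT16_MAX, total))
    return mixed
```

Clipping is the simplest way to deal with overflow; a real mixer might instead scale the inputs before summing them.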

Table 1. Supported formats

Format        Decoded with  Description
PCM           N/A           Raw PCM audio
MP2           DSP           MPEG audio layer-2
MP3           DSP           MPEG audio layer-3
AAC           DSP           Advanced Audio Coding, only LC and LTP profiles supported
AMR-NB        DSP           Adaptive Multi-Rate narrowband
AMR-WB        DSP           Adaptive Multi-Rate wideband
IMA ADPCM     CPU           Adaptive Differential Pulse Code Modulation
G.711 a-law   DSP           ITU-T standard for audio companding
G.711 mu-law  DSP           ITU-T standard for audio companding
WAV - MP3     DSP           MP3 audio in WAV container
WAV - PCM     DSP           PCM audio in WAV container
RM - RA10     DSP           RealAudio in RealMedia container. Uses closed source software, no support in GStreamer.
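
An application that wants to estimate its load on the main processor could keep the contents of Table 1 in a small lookup structure. The sketch below restates the table as a Python dictionary; the names DECODED_WITH, uses_main_cpu and cpu_decoded_formats are hypothetical and are not part of any maemo API.

```python
# Where each audio format in Table 1 is decoded on the Nokia 770.
# The dictionary only restates the table; all names here are hypothetical.
DECODED_WITH = {
    "PCM": None,          # raw audio, listed as N/A in Table 1
    "MP2": "DSP",
    "MP3": "DSP",
    "AAC": "DSP",
    "AMR-NB": "DSP",
    "AMR-WB": "DSP",
    "IMA ADPCM": "CPU",
    "G.711 a-law": "DSP",
    "G.711 mu-law": "DSP",
    "WAV - MP3": "DSP",
    "WAV - PCM": "DSP",
    "RM - RA10": "DSP",
}


def uses_main_cpu(fmt):
    """Return True if Table 1 lists the format as decoded on the main CPU."""
    if fmt not in DECODED_WITH:
        raise KeyError("format not listed in Table 1: %r" % fmt)
    return DECODED_WITH[fmt] == "CPU"


def cpu_decoded_formats():
    """Return the formats from Table 1 that are decoded on the main CPU."""
    return sorted(f for f, where in DECODED_WITH.items() if where == "CPU")
```

On the x86 SDK this distinction does not apply, because every format is decoded on the main CPU there.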

If you are writing a multimedia application and want it to produce some audible output, you have three options: your application can use the GStreamer framework, the ALSA API or the Esound daemon. Whenever you need to handle different formats or do mixing and seeking, use GStreamer.

The Esound daemon will be replaced by ALSA in the future.


Besides audio, the DSP can also handle some encoded video formats, while others are decoded with the main CPU. If you are writing a multimedia application and want it to handle video streams, use the GStreamer framework. Demultiplexing container formats is always done on the main CPU.

Table 2. Supported formats and combinations

Container  Video format  Audio format  Video decoded with  Description
RM         RV10          RA10          CPU                 RealMedia. Uses closed source software, no support in GStreamer.

x86 SDK architecture

This section explains how the x86 SDK differs from the actual platform.

Figure 2. x86 SDK multimedia architecture

Since the x86 SDK has neither a DSP nor any way to simulate one, all formats and streams are decoded using the main CPU.

If GStreamer is configured to use OSS or ALSA OSS emulation, osssink is the last audio element before the decoded audio stream is sent to the kernel driver.
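
As a sketch of where osssink sits in a pipeline, the following Python helper builds a textual pipeline description in the syntax used by GStreamer's gst-launch tool. The helper name playback_pipeline is hypothetical, and it is an assumption that the wavparse and audioconvert plugins are installed in your SDK environment.

```python
def playback_pipeline(path, sink="osssink"):
    """Build a gst-launch style description for playing a PCM WAV file.

    Hypothetical helper: it only assembles a string and does not call
    GStreamer. The audio sink is the last element of the pipeline.
    """
    # Quote the location so that paths containing spaces survive parsing.
    escaped = path.replace("\\", "\\\\").replace('"', '\\"')
    elements = [
        'filesrc location="%s"' % escaped,  # read the file from disk
        "wavparse",                         # strip the WAV container
        "audioconvert",                     # adapt the sample format for the sink
        sink,                               # hand the audio to the kernel driver
    ]
    return " ! ".join(elements)
```

The resulting string can be tried out with the gst-launch command-line tool, or passed to gst_parse_launch() from application code.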

X server is the graphical windowing system running on the workstation, usually XFree86 or X.Org.


On the actual device, the DSP mixes the simultaneous streams and outputs the result to the hardware. This is where the SDK differs most from the platform. GNU/Linux has two low-level interfaces for sound output, OSS and the ALSA API. ALSA is not supported in the current SDK version.

OSS does not support multiple simultaneous sound sources; therefore, it is not possible to use the Esound daemon and generate audio with GStreamer at the same time.

Make sure that no process outside the SDK (outside Scratchbox) has exclusively reserved the audio hardware.
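
One way to check this is to try opening the OSS device node without blocking. The sketch below assumes that the device node is /dev/dsp, the conventional OSS path; your setup may differ. The function name audio_device_status is hypothetical.

```python
import errno
import os


def audio_device_status(path="/dev/dsp"):
    """Try to open an audio device node for writing without blocking.

    Returns "OK" if the open succeeds, otherwise the symbolic errno name.
    With OSS, "EBUSY" typically means another process holds the device.
    The default path is an assumption, not something the SDK guarantees.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, "UNKNOWN")
    os.close(fd)  # release the device again immediately
    return "OK"
```

Note that a successful check briefly claims the device itself, so run it before starting playback rather than during it.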

Table 3. Supported formats

Format     Description
WAV - PCM  PCM audio in WAV container
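
If you need a test file in this format, you can generate one with Python's standard wave module. This is a minimal sketch; the function name write_test_tone and its default parameters (a 440 Hz mono tone, 16-bit samples, 44100 Hz) are choices made for this example only.

```python
import math
import struct
import wave


def write_test_tone(path, freq_hz=440.0, seconds=1.0, rate=44100):
    """Write a mono, 16-bit PCM sine tone into a WAV container.

    Hypothetical helper for producing test material; returns the number
    of audio frames written.
    """
    n_frames = int(rate * seconds)
    frames = bytearray()
    for n in range(n_frames):
        # Half of full scale leaves headroom and avoids clipping.
        sample = int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * n / rate))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)   # bytes per sample
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return n_frames
```

For example, write_test_tone("tone.wav", seconds=2.0) produces a two-second file.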


GStreamer

GStreamer is a framework that allows constructing graphs of media-handling components, ranging from simple audio playback to complex audio (mixing) and video (non-linear editing) processing. Applications can benefit transparently from advances in codec and filter technology. Developers can add new codecs and filters by writing plugins.

Nokia 770 uses GStreamer version 0.10.X plus some device-specific elements (for accessing the DSP).

For more information about GStreamer, see the GStreamer project documentation.

Esound daemon

The Esound daemon may be removed in a future release.

For more information about Esound, download the source tarball. The currently used version is 0.2.35.

From the application's point of view, the Esound interface is identical in both the target platform and the SDK.


ALSA

For more information about ALSA, see the ALSA project documentation.

From the application's point of view, the libasound interface is identical in both the target platform and the SDK.
