Maemo has no audio UI (AUI)

Posted on 2010-02-22 18:29 UTC by David Falkayn. Status: Under consideration, Categories: User Experience.

Executive summary: Maemo has no text-to-speech or built-in voice command ("handsfree") functionality.

Most advanced phones and all consumer computer operating systems have some form of voice recognition capability for visually impaired users and for cases where the graphical user interface (GUI) can't be used (e.g. when driving). Many of them also offer text-to-speech (computer speech).

Maemo has few or none of these features.

A complete solution to this problem would be a core AUI that provides an API all apps can use to support some form of spoken user interface, and that uses context to keep the language model at a manageable size. This is complicated by the lack of acoustic or language models for all the languages Maemo supports, the huge amount of work it would take to add support to all the core apps, and the complexities of spoken language itself. Due to these factors, this is unlikely ever to be implemented by Nokia as a core feature.

Let's find a less extreme but still useful solution that can reasonably be implemented.

Talk thread

GSoC Proposal

Solutions for this brainstorm


Solution #1: Handsfree phone mode and text-to-speech app

Posted on 2010-02-22 18:29 UTC by David Falkayn.

A minimal solution would be handsfree mode for the phone and a text-to-speech app for English and other common languages. This wouldn't need much language understanding and could possibly be language-independent in the recognition component (needing only to try to understand and speak names).
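To illustrate the "understand names" part, here is a minimal sketch in Python that matches whatever text a recognizer returns against a contact list. It uses only the standard library's difflib; the function name, the contact names, and the 0.6 cutoff are assumptions made for illustration, and a real recognizer would more likely match on sounds than on spelling.

```python
import difflib

def match_contact(heard, contacts, cutoff=0.6):
    """Return the contact whose name is closest to the recognized text.

    `heard` is the (possibly misspelled) text from a recognizer.
    Returns None when nothing is similar enough, so the caller can
    ask the user to repeat instead of dialing the wrong person.
    """
    # Compare case-insensitively but return the name as stored.
    lowered = {name.lower(): name for name in contacts}
    hits = difflib.get_close_matches(
        heard.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None

# Hypothetical contact list, for illustration only.
CONTACTS = ["Alice Smith", "Bob Jones", "Carol White"]
```

Calling match_contact("alis smith", CONTACTS) picks "Alice Smith", while an unrelated string falls below the cutoff and gives None.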

Surely we can do better than this...


Solution #2: Pocket Jeeves: interactive voice commands and message reader

Posted on 2010-02-22 18:51 UTC by David Falkayn.

A reasonable solution would be an app, consisting of a daemon and a settings UI, that responds to limited voice commands after a trigger is pressed (headset button, camera button, etc.) to control common core apps (phone and media player), with similarly limited text-to-speech support integrated with other core apps: media player (playlist, etc.), SMS, IM and email. English support is probably the best bet for a first implementation, as it is the easiest due to available models and has a wide user base.

This can be done with components presently available. I have compiled PocketSphinx for Maemo 5 and it works on the N900. Flite is in the repos. D-Bus can handle the necessary control of the core apps (I believe) without any modifications.
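As a rough sketch of the "limited voice commands" idea, the daemon could keep a small command table per context, so the recognizer only has to tell a handful of phrases apart at any one time. The Python below is illustrative only: the contexts, phrases, and action names are hypothetical, and it makes no PocketSphinx, Flite, or D-Bus calls. The tuple it returns stands in for whatever message the daemon would actually send to the target app.

```python
# Hypothetical command table: context -> {spoken phrase: (app, action)}.
COMMANDS = {
    "phone": {
        "answer": ("phone", "answer_call"),
        "hang up": ("phone", "end_call"),
    },
    "media": {
        "play": ("media_player", "play"),
        "pause": ("media_player", "pause"),
        "next track": ("media_player", "next"),
    },
}

def active_vocabulary(context):
    """Phrases the recognizer needs to listen for in this context."""
    return sorted(COMMANDS.get(context, {}))

def dispatch(context, recognized_text):
    """Map recognized text to an (app, action) pair, or None.

    Case and extra whitespace are ignored. A phrase that belongs to a
    different context is rejected, which keeps the vocabulary small.
    """
    phrase = " ".join(recognized_text.lower().split())
    return COMMANDS.get(context, {}).get(phrase)
```

With this table, dispatch("media", "Next track") yields ("media_player", "next"), while dispatch("phone", "play") yields None because "play" is not part of the phone vocabulary.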

This has been submitted as a proposal for the 2010 Google Summer of Code (GSoC).

An API to allow new apps to register for support by the AUI would be nice.


Solution #3: Implement the Google voice-to-text protocol

Posted on 2010-02-23 13:45 UTC by Zaheer Merali.

From network dumps, capture the protocol that Google uses on Android to send audio to its servers and get it transcribed to text. Implement this on Maemo 5.
