Speech recognition

From OLPC
Revision as of 03:07, 17 August 2007 by Ricardo (talk | contribs) (Removed 'stub'. Added 'Overcoming the limitations of speech recognition software'. Tidied formatting.)

This article is for collecting ideas and resources for using speech recognition on the XO.

Existing speech recognition software

Although limited, there are FOSS (Free and Open Source Software) speech recognition packages that run on devices comparable to the XO. They may be sub-realtime, have a limited vocabulary, handle only non-continuous speech, and/or require training.

Using the embedded mic, rather than a higher-quality plug-in one, may be a challenge. We are also much more concerned with 'Localization' than is typical. Still, there are possibilities we should explore. Talking to your XO could be very neat!

Overcoming the limitations of speech recognition software

A good training video

To allow interactive sessions, where children dictate text word by word and issue voice commands for editing, it would be useful to have a good training video. It would explain how to speak in a way that fits the limitations of the software (slowly, with gaps between words), how to train the software, how to speak consistently, etc.

Pre-processing the speech

For large passages of text, each sentence could be recorded as continuous speech, with no pauses between words. A sound-editing program would then be used to mark the boundaries between words and insert gaps. For example, "Passmeanorange" becomes "Pass me an orange". A general-purpose sound-editing program could be used, but it would be quicker to create a specialized program that needs just one click to introduce a gap. It could also filter out noise and normalize the volume.
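The gap-insertion and volume-normalization steps could be sketched roughly as follows. This is only an illustration in plain Python, assuming the recording is already available as a list of signed 16-bit sample values and that the word boundaries have been marked by hand as sample indices. The function names insert_gaps and normalize, and the default gap length, are made up for this example and do not come from any existing package.

```python
def insert_gaps(samples, boundaries, sample_rate, gap_seconds=0.25):
    """Return a copy of samples with silence inserted at each boundary.

    boundaries are sample indices marked by the user (hypothetical input);
    gap_seconds is an assumed default, not a tested value.
    """
    gap = [0] * int(sample_rate * gap_seconds)
    out = []
    previous = 0
    for boundary in sorted(boundaries):
        out.extend(samples[previous:boundary])  # speech up to the boundary
        out.extend(gap)                         # inserted silence
        previous = boundary
    out.extend(samples[previous:])              # remaining speech
    return out


def normalize(samples, peak=32767):
    """Scale samples so the loudest one reaches the given peak value."""
    loudest = max((abs(s) for s in samples), default=0)
    if loudest == 0:
        return list(samples)  # silence or empty input: nothing to scale
    scale = peak / loudest
    return [int(round(s * scale)) for s in samples]
```

A "one click" in the specialized editor would then amount to adding one index to boundaries and calling insert_gaps again. Noise filtering is left out of this sketch.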

Integrating the speech recognition software into the sound-editor

To minimize the time spent on sound pre-processing, the sound editor could have built-in speech recognition that checks, after each sound-processing action (inserting a gap, etc.), whether the sentence is recognizable yet, so that people don't do more pre-processing work than is necessary.
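The check-after-each-action idea could look something like the loop below. It is a sketch under several assumptions: recognize stands in for whatever speech recognition engine is used (no particular package's API is implied), accept is a hypothetical test for "good enough" (for instance, the user confirming the text), and each action is a function that takes the samples and returns a modified copy.

```python
def preprocess_until_recognized(samples, actions, recognize, accept):
    """Apply pre-processing actions one at a time, stopping early.

    recognize(samples) returns the recognized text or None (assumed
    interface); accept(text) returns True when the result is good enough.
    Returns (samples, number_of_actions_applied, recognized_flag).
    """
    applied = 0
    # The raw recording may already be recognizable.
    if accept(recognize(samples)):
        return samples, applied, True
    for action in actions:
        samples = action(samples)
        applied += 1
        # Re-check after every action so no unnecessary work is done.
        if accept(recognize(samples)):
            return samples, applied, True
    return samples, applied, False
```

In an interactive editor the list of actions would come from the user's clicks rather than being known in advance, but the stopping rule would be the same.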

Re-recording each sentence before recognition

If someone else's speech has to be processed, one option is to re-record each sentence in a clear voice. Someone who has already trained the software would listen to each sentence and re-record it in a clear voice, slowly, with gaps between words. The software should then make a better job of recognizing it.

For example, if a child records a member of the community telling a story and wants to turn it into text, the speech recognition software may have problems. The software hasn't been trained on that person's voice, and the recording may contain fast, continuous speech with no gaps between words, a heavy accent, an old and croaky voice, background chatter, etc. Re-recording each sentence may solve the problem.

So that every child doesn't have to spend ages training the software for their voice or learn about the limitations of speech recognition software, just one or two children in a class or some volunteers on the internet could act as a 'speech recognition bureau'.

--Ricardo 03:07, 17 August 2007 (EDT)

Resources