Speech synthesis: Difference between revisions
Bradpaulsen (talk | contribs) |
Revision as of 06:23, 3 February 2008
Scope
This article is for collecting ideas and resources for using text-to-speech (TTS) speech synthesis on the XO.
eSpeak
eSpeak is currently included on the XO, but it cannot write directly to the sound card: the XO uses ALSA rather than OSS as its main sound system, and OSS emulation in ALSA is not yet enabled by default. Manually configuring your XO to emulate OSS in ALSA provides the system devices that espeak requires and allows full espeak functionality. - Dking
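A minimal sketch for checking whether OSS emulation is already active. It assumes the conventional OSS device node /dev/dsp and the standard ALSA emulation module snd-pcm-oss; whether the XO kernel ships that module is not confirmed here. The helper name oss_device_present is hypothetical.

```shell
# Sketch: report whether an OSS-style audio device node is present.
# oss_device_present is a hypothetical helper name.
# /dev/dsp is the conventional OSS PCM device (assumed, not XO-specific).
oss_device_present() {
    # -c is true only if the path exists and is a character device
    [ -c "${1:-/dev/dsp}" ]
}

if oss_device_present; then
    echo "OSS device found"
else
    # snd-pcm-oss is ALSA's OSS PCM emulation module; loading it needs root
    echo "no /dev/dsp; try 'modprobe snd-pcm-oss' as root, or pipe espeak output to aplay"
fi
```

The function takes an optional path so that a different device node can be checked if the XO exposes one under another name.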
If your XO's ALSA sound system setup lacks OSS emulation, text can still be played by piping espeak's standard output to another program:
$ espeak --stdout "Ello world." | gst-launch fdsrc fd=0 ! wavparse ! alsasink
$ espeak --stdout -vpt "Bem-vindo ao wiki da OLPC" | gst-launch fdsrc fd=0 ! wavparse ! alsasink
$ espeak --stdout "Using aplay." | aplay -
However, for some initial sounds, espeak fails to output valid audio to standard output. This includes the letters c, h, k, p, q, t, v, z, and possibly others. For example, the following did not work when the problem was reported (a later note on this page says that it seems to work these days):
$ espeak --stdout "hello world." | aplay
A workaround is to first write the output to a file, then play back the file:
$ espeak -w temp.wav "hello world."; aplay temp.wav
See the bug ticket for more information: http://dev.laptop.org/ticket/4002
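The file-based workaround above can be wrapped in a small shell function so that the temporary file is cleaned up afterwards. This is only a sketch: speak_via_file is a hypothetical helper name, and it assumes mktemp is available on the system.

```shell
# Sketch: speak text through a temporary WAV file instead of a pipe.
# speak_via_file is a hypothetical helper name, not part of espeak.
speak_via_file() {
    # mktemp is assumed to be available; bail out if it fails
    wav=$(mktemp /tmp/speak.XXXXXX) || return 1
    # -w writes the synthesized audio to a WAV file, as in the workaround above
    espeak -w "$wav" "$1" && aplay "$wav"
    status=$?
    # remove the temporary file whether or not playback succeeded
    rm -f "$wav"
    return $status
}
```

It would be called as speak_via_file "hello world." and returns the exit status of whichever of espeak or aplay ran last.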
Screen Reader is a D-Bus interface that allows the XO to use eSpeak via Python.
- http://espeak.sourceforge.net/languages.html
- http://sourceforge.net/forum/forum.php?thread_id=1679272&forum_id=538920 Improving the Brazilian Portuguese voice.
Festival
- http://festvox.org/festival/ multi-lingual speech synthesis
- http://www.speech.cs.cmu.edu/flite/ Festival-lite is a small, fast run-time synthesis engine.
- http://festlang.berlios.de/ wiki
- http://festvox.org/ building of new synthetic voices
- http://tcts.fpms.ac.be/synthesis/mbrola.html The MBROLA Project - Towards a Freely Available Multilingual Speech Synthesizer
Flite is not currently included on the XO. Unless that changes, it would have to come out of your activity's space budget.
First, run /sbin/init 3 so that yum doesn't run out of memory. After yum finishes, reboot.
$ yum install flite
$ flite -t 'Hello, world!'
- Does it always sound this bad, or is just the default voice that works poorly? MitchellNCharity 16:42, 22 October 2007 (EDT)
Festival is not currently included on the XO. Unless that changes, it would have to come out of your activity's space budget.
First, run /sbin/init 3 so that yum doesn't run out of memory. After yum finishes, reboot.
$ yum install festival
$ echo 'Hello, world!' | festival --tts
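Since Flite and Festival may or may not be installed, a script can fall back from one engine to the next. The sketch below tries espeak first (it is the one already on the XO), then flite, then festival, using only the invocations shown in this article. The function name say_text is hypothetical, and the order of preference is an assumption.

```shell
# Sketch: speak with the first synthesizer found on PATH.
# say_text is a hypothetical helper name; the engine order is an assumption.
say_text() {
    if command -v espeak >/dev/null 2>&1; then
        # pipe through aplay, since espeak may lack direct sound-card access on the XO
        espeak --stdout "$1" | aplay
    elif command -v flite >/dev/null 2>&1; then
        flite -t "$1"
    elif command -v festival >/dev/null 2>&1; then
        printf '%s\n' "$1" | festival --tts
    else
        echo "no speech synthesizer installed" >&2
        return 1
    fi
}
```

Calling say_text 'Hello, world!' speaks the text with whichever engine is found, or returns 1 if none is installed.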
Existing software
There are free and open-source software (FOSS) speech synthesis packages that run on devices comparable to the XO. We are much more concerned with localization than is typical, and dialects can be a political issue. Still, TTS would help with accessibility, and it could be very cool.
Speech synthesis involves a complex set of tradeoffs among synthesizer size, fidelity, and the effort needed to localize to a new language. The Wikipedia speech synthesis article discusses the software that is available, which includes Festival, Flite, and eSpeak.
eSpeak is small enough that we can often bundle it, and it covers quite a few languages: about 10 languages are currently supported and tuned by native speakers. Localization to ten more languages is underway.
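To see whether a given language is covered by the installed eSpeak, its voice list can be queried. The sketch below assumes that espeak --voices prints one voice per line with the language code in the second whitespace-separated column; that layout may differ between versions. The helper name has_voice is hypothetical.

```shell
# Sketch: succeed if espeak lists a voice for the given language code.
# has_voice is a hypothetical helper name.
# Assumes the language code is the second column of `espeak --voices` output.
has_voice() {
    espeak --voices | awk -v lang="$1" '$2 == lang { found = 1 } END { exit !found }'
}
```

For example, has_voice pt && espeak --stdout -vpt "Bem-vindo ao wiki da OLPC" | aplay would speak the Portuguese greeting only if a pt voice is listed.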
Synthesis is essential for making content accessible to people with vision problems, and it will need to be integrated with the ATK library. It is also useful for literacy training and for other uses as part of a GUI. Full localization therefore involves selecting a suitable synthesis system, integrating it into the ATK framework, and localizing that system for the particular language involved.
Speech synthesis is usually not a good guide for pronunciation – but it may be better than a poor teacher who has never had the opportunity to learn from a native speaker of that language.
The state of the art
Commercial Text-To-Speech programs are getting very good now. The examples at the Digital Future Software Company site are very clear. They use AT&T technology and provide examples of Male and Female speech in English, French and Spanish. The XO needs open-source software that can approach this quality in a wide range of languages.--Ricardo 04:07, 17 August 2007 (EDT)
Resources
See also
- Screen Reader
- Speech recognition
- Shtooka Project
- Speak A simple but cute activity which animates a face as it reads the words typed by the child
- Talkntype Initial draft of an activity based on the Speak&Spell toy, using eSpeak speech synthesis.
- Free Speech Speech recognition and synthesis implemented as D-Bus services using the Accessibility Toolkit Service Provider Interface (AT-SPI)