Your Voice on XO


Proposal

This is a proposal for the creation of a new activity for the XO that would advance localization efforts in text-to-speech (TTS) development, as well as promote the involvement of the local community overall. "Your voice on XO" would consist of a long-term, community-based project to build, or further develop, a synthetic voice for the language used locally (for more on synthetic-voice building, see the eSpeak and Festival entries under Resources).

Implementation details

This activity would entail integrating the voice-building capabilities of eSpeak, or perhaps Festival, into Sugar on the XO, and working to facilitate synthetic-voice building in a classroom or community setting (for an overall view of how the voice-building process might proceed, see the voice-building entries under Resources). This effort would focus on a GUI that is easy to use, as well as on the integration of the activity with all TTS and pertinent language-related activities, e.g., TalknType, Orca, and E-Book Reader. As such, the use of a high-level, device-independent platform such as Speech Dispatcher would be ideal. In fact, Speech Dispatcher (speechd) supports Festival, eSpeak, and several other TTS engines, and is in current use at OLPC.
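As a rough illustration of what routing speech through Speech Dispatcher could look like from a Sugar activity, the sketch below builds a call to the `spd-say` command-line client. The function names are hypothetical, and the module name `espeak` and language code `fr` are assumptions; the output modules actually available depend on the local Speech Dispatcher installation.

```python
import shutil
import subprocess


def build_spd_say_command(text, module="espeak", language="fr"):
    """Return the argv list for speaking `text` through Speech Dispatcher.

    The default module and language are illustrative assumptions.
    """
    return ["spd-say", "-o", module, "-l", language, "--wait", text]


def speak(text, module="espeak", language="fr"):
    """Speak `text` if spd-say is installed; return True when the call ran."""
    if shutil.which("spd-say") is None:
        return False  # Speech Dispatcher client not found on this machine
    subprocess.run(build_spd_say_command(text, module, language), check=False)
    return True
```

Because the engine is chosen by a single argument, the same activity code could switch between eSpeak and Festival without other changes, which is the main appeal of a device-independent layer here.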

The activity would consist of components to facilitate voice recordings, phonetic data manipulation and calibration, and dictionary file management. The phonetic and dictionary components would follow the overall scheme laid out by eSpeak for adding a language. In particular, the phonetic data manipulation component might be based around an interface that would mimic the eSpeak editor interface. Finally, the voice recording component might exploit existing resources, such as those offered by the Record activity.
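To make the dictionary file management component more concrete, here is a minimal sketch of how community-contributed pronunciations might be merged into an existing word list. It assumes a simplified format of one "word phonemes" pair per line with `//` comments, loosely modeled on eSpeak's word-list files; the function names are hypothetical and the phoneme strings in the usage example are placeholders, not verified eSpeak mnemonics.

```python
def parse_word_list(lines):
    """Parse 'word phonemes' lines into a dict, skipping blanks and // comments."""
    entries = {}
    for raw in lines:
        line = raw.split("//", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # ignore malformed lines in this sketch
        word, phonemes = parts
        entries[word] = phonemes.strip()
    return entries


def merge_word_lists(base, community):
    """Return a new dict in which community entries override base entries."""
    merged = dict(base)
    merged.update(community)
    return merged


def format_word_list(entries):
    """Render entries as tab-separated lines, sorted by word."""
    return [f"{word}\t{phonemes}" for word, phonemes in sorted(entries.items())]
```

For example, `merge_word_lists(parse_word_list(existing_lines), {"oui": "wi"})` would add or replace a single entry while leaving the rest of the list untouched.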


Activity

The overall activity would proceed as follows. A teacher in a community supported by OLPC that uses a language or dialect that is not available via TTS (or, for that matter, any regional or dialectal form of a language that is already supported, e.g., French in West Africa or Haiti) would work with a child, teacher, or community volunteers to build a synthetic voice using Festival. Such an undertaking would require a considerable amount of speech recordings and textual corpora for any given language, but given access to the Internet (and particularly to reasonably large multilingual corpora, such as those at Wikipedia), and with some effort, this activity could become a reasonably accessible endeavor for communities supported by OLPC.
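One simple way a textual corpus could guide the work is by showing which words occur most often, so that those are recorded or added to the dictionary first. The sketch below is only an illustration of that idea; the function name is hypothetical, and a real corpus would need language-specific tokenization.

```python
import re
from collections import Counter


def most_common_words(text, limit=10):
    """Return the `limit` most frequent words in `text` as (word, count) pairs."""
    # [^\W\d_]+ matches runs of letters, including accented and non-Latin ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common(limit)
```

Called on a few downloaded articles in the local language, this would yield a short priority list that a teacher and students could work through in order.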


Long-term impact

Ultimately, this project seeks not only to improve but also to localize the voice quality of existing languages via the efforts of the local community, to increase the phoneme data that helps improve speech synthesis quality, and to increase involvement in the community via OLPC. Localization efforts include the addition of previously unsupported languages and dialects to the XO's linguistic repertoire, the fine-tuning of languages already present, and the “naturalization” of existing TTS languages. This last item refers particularly to languages with wide geographic spread and/or considerably diverse forms, such as Spanish, English, French, Arabic and Chinese. Despite their diverse “spoken” forms, such languages would benefit from the existence of a relatively uniform orthographic form, as is the case with the examples provided. On the other hand, TTS efforts should aim to bring in new languages and improve existing ones via the addition of new phonetic data provided by the linguistic community. Finally, the involvement of the community would be fostered not only directly, but also indirectly via this new activity. One example that comes to mind concerns traditionally oral languages, such as Amerindian languages, but also dialectal forms of a given language. In these instances, textual corpora would come primarily from research efforts by trained linguists. Such scenarios would obviously necessitate involvement on the part of the OLPC program, or perhaps of local NGOs involved in projects with the affected community. Indeed, the benefits of an activity such as “Your voice on XO” would be numerous and far-reaching, not only given the direct impact that it would have on all of the activities that make use of TTS, but especially given the potential of such localization efforts with regard to the increased involvement and education of the community.


Constraints

One of the main concerns in this endeavor is the procurement of sufficient storage space for audio recordings. Such a constraint may be surmounted through one of several creative means, and it may not be so difficult to get around the limitations imposed by the storage capacity of the XO. One alternative that comes to mind is some form of external storage, ideally a solid-state drive (SSD). Of course, more affordable and integrated solutions may be preferable, especially given the high cost per unit of storage of SSD technology. One solution might involve the recent efforts to introduce School servers with increased, community storage space via the OLPC program. At the same time, the efforts necessary to develop a synthetic voice (speech recordings, corpora building, and overall project management) would require a considerable degree of planning based around community-driven resources. These include securing a suitable recording environment, the time and commitment required of a teacher, mature student, or group of individuals coordinating the activity, and the involvement of any other interested members of the community. With this in mind, it is conceivable that such efforts to localize the XO's TTS resources, even on a national scale, would be limited to, and based largely on, the interest of a given community. As such, the added cost of a simple, portable storage solution beyond what is offered by the XO would not be considerable when seen on a regional or national level. Finally, the introduction of external storage space can be seen as an added value not just for language-related localization efforts, but also in other fruitful realms, particularly as concerns educational media. Indeed, video, sound and photographic media would benefit considerably from an expansion in storage space, and such media would aid immeasurably in efforts to foster a higher level of interaction from, and education for, end-users and, most importantly, the communities of users impacted by OLPC's mission.
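To give a sense of scale for the storage concern, the size of uncompressed PCM audio is the product of duration, sample rate, bytes per sample, and channel count. The sketch below works through that arithmetic; the 16 kHz, 16-bit, mono defaults and the 1 GB of free space in the usage note are assumptions chosen for illustration, not requirements stated by eSpeak, Festival, or OLPC.

```python
def pcm_bytes(seconds, sample_rate=16000, bits_per_sample=16, channels=1):
    """Return the size in bytes of uncompressed PCM audio of the given length."""
    return int(seconds * sample_rate * (bits_per_sample // 8) * channels)


def hours_that_fit(free_bytes, **audio_format):
    """Return how many hours of recordings fit in `free_bytes` of storage."""
    return free_bytes / pcm_bytes(3600, **audio_format)
```

Under these assumptions, one hour of recordings takes 115,200,000 bytes (about 115 MB), so a hypothetical 1 GB of free space would hold roughly 8.7 hours before any compression is applied.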


Feedback

Please feel free to leave feedback regarding “Your voice on XO”.


Resources

  1. eSpeak
  2. eSpeak – supported languages
  3. eSpeak – adding languages
  4. Festival
  5. Festival – building synthetic voices
  6. Speech Dispatcher
  7. Speech Synthesis
  8. Screen Reader