Listen and Spell

Assim Deodia
This activity is now hosted at the Sugar Activity Library.

The information here is likely to be out of date. Consult the new pages for "Listen and Spell" first.


The idea is to develop an application that would help children learn new words and improve their vocabulary and pronunciation. The activity would speak a randomly selected word from a predefined set of words, and the user is expected to spell the word correctly. For voice synthesis the activity would use Speech-Dispatcher, and for the list of words it will have a custom dictionary. This activity is an extension of TalknType.


The basic thing needed to learn a language is to learn its building blocks, i.e. words: their pronunciation and how they are spelled. Grammar, of course, has its own place. This project aims to provide an activity that would help children learn new words, their pronunciation, the way they are spelled and, to some extent, their meaning.

Use Case Scenario

A simple use case scenario of the Test Mode is as follows:

  • The user opens the activity and selects the difficulty level at which he/she would like to hear words.
  • A random word is selected from the word list for that level and spoken aloud, e.g. "Spell Ocean".
  • The user is required to spell the word correctly. (A time limit is optional.)
  • The activity would speak each letter as the user types and the whole word when the user submits it. (This will help the user "feel" the difference between his or her spelling and the correct one.) This option can be disabled for a group test (explained further below).
  • There would be options to repeat the word and to get a hint.
  • The hint option will give the user either the meaning of the word, its usage in a sentence, or an image if possible. E.g. for Ocean it can either speak its usage, "The ocean is full of water", or print its definition on screen, i.e. "One of the five large bodies of water separating the continents".
  • The user can quit or change the level at any time during the game.

To make the user experience more lively, sounds would be used for different events (like activity start, correct answer, wrong answer).
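The Test Mode flow above can be sketched as a single round in Python. This is a minimal sketch, not the activity's code: the word lists, the `speak()` stand-in (which only prints, in place of a Speech-Dispatcher call), the `read_attempt` callback and the retry limit `max_tries` are all hypothetical, and letters are echoed after submission rather than as each key is typed.

```python
import random

# Hypothetical word lists, one per level (the activity would load these
# from its own dictionary instead).
WORDS_BY_LEVEL = {
    1: ["cat", "dog", "tree", "cup", "bear"],
    2: ["monkey", "mouse", "earth", "plane", "toffee"],
    3: ["computer", "Mississippi", "dictionary"],
}


def speak(text):
    """Stand-in for speech synthesis: prints the text instead of voicing it."""
    print("SPEAK:", text)


def check_spelling(target, attempt):
    """Compare the attempt with the target, ignoring case and outer spaces."""
    return attempt.strip().lower() == target.lower()


def play_round(level, read_attempt, echo_letters=True, max_tries=3, rng=random):
    """Run one Test Mode round; return True if the word was spelled correctly."""
    word = rng.choice(WORDS_BY_LEVEL[level])
    speak("Spell " + word)
    for _ in range(max_tries):
        attempt = read_attempt()
        if echo_letters:
            # Voice each letter, then the whole attempt, so the user can
            # hear how their spelling differs from the target word.
            for letter in attempt.strip():
                speak(letter)
            speak(attempt.strip())
        if check_spelling(word, attempt):
            speak("Correct")
            return True
        speak("Wrong answer")
    return False
```

Passing `echo_letters=False` corresponds to disabling letter-by-letter speech for a group test.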

Level Description

The level of a word is decided primarily by the number of letters in it. Most of the words would be nouns that children can easily understand.

  • The initial level would include three- to four-letter words
    • e.g. cat, dog, tree, cup, bear etc.
  • The medium level would contain five- to six-letter words
    • e.g. monkey, mouse, earth, plane, toffee etc.
  • The hard level would contain words of seven or more letters
    • e.g. computer, Mississippi, dictionary etc.
  • A professional level (if included) would have complete sentences.
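The letter-count rule above could be expressed as a small helper. A sketch, assuming levels are numbered 1 to 3 and that words shorter than three letters are simply left unassigned (the description does not cover them); the sentence-based professional level is not modeled, and `level_for_word` is a hypothetical name.

```python
def level_for_word(word):
    """Map a word to a level by its letter count.

    Returns 1 (initial) for 3-4 letters, 2 (medium) for 5-6 letters,
    3 (hard) for 7 or more, and None for shorter words.
    """
    letters = sum(1 for ch in word if ch.isalpha())
    if letters >= 7:
        return 3
    if letters >= 5:
        return 2
    if letters >= 3:
        return 1
    return None
```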

Proposed features

The following features are proposed for the activity:

  1. Word source: The word source would be an expandable custom dictionary using the word list from the Words activity. Words currently supports French, German, Italian, Portuguese, Spanish and, of course, English. An option would be provided to update the dictionary by searching the network for a dictionary server.
  2. Implementation of "Hint": The approach is explained further in the following sections.
  3. Speech-Dispatcher: Voicing would be done using Speech-Dispatcher, which in turn uses espeak for synthesis. Espeak supports more than 30 languages.
  4. User-defined word list: This would let users add their own word lists, which can help in conducting a small group test. An option to add words over the mesh network would help in large group or class tests.
  5. Multiplayer game over the mesh network: Users can challenge each other over the network. One XO would act as a server and generate the word list for all the clients. All users would receive the same word list, with a limited number of retries for each word, after which the next word is given. The player who spells the most words correctly within the time limit wins. The option to speak each letter aloud would be disabled in this case.
  6. Memory tool (a possible extension): A tutor mode in which the activity repeats the word again and again until the child has memorized its spelling.
  7. Input methods: The input-method interface would be exposed externally so that other input methods (like handwriting and speech recognition) could be incorporated.
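The multiplayer rules in item 5 (one XO generates the list, everyone gets the same words, limited retries, most correct words wins) can be sketched as follows. The function names, the seed-based list generation and the `max_attempts` parameter are assumptions; the mesh transport (PresenceService over D-Bus) and the time limit are left out.

```python
import random


def make_shared_word_list(words, count, seed):
    """Server side: draw `count` distinct words deterministically from a seed.

    Every client receives this same list, so all players see the same words.
    """
    rng = random.Random(seed)
    return rng.sample(words, count)


def score_player(word_list, attempts_per_word, max_attempts):
    """Count words spelled correctly within the allowed number of attempts.

    attempts_per_word[i] holds the player's attempts for word_list[i];
    attempts beyond max_attempts are ignored.
    """
    score = 0
    for word, attempts in zip(word_list, attempts_per_word):
        if any(a.lower() == word.lower() for a in attempts[:max_attempts]):
            score += 1
    return score


def winners(scores):
    """Return the names with the highest score (several if tied)."""
    best = max(scores.values())
    return sorted(name for name, s in scores.items() if s == best)
```

Sending only the seed, rather than the list itself, would also work as long as every XO runs the same Python `random` implementation; sending the list is the safer choice.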

Other supportive features

  1. Voice configuration: An option to edit voice settings such as volume, pitch, rate, the language of the words and the voice, and the gender of the voice.
  2. Preferences to choose the kind of “Hint”: i.e. to select word usage, word definition, or images where possible.
  3. An option to save the score as well as the game (in the Journal) and retrieve them later.
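For the voice configuration option, the Speech-Dispatcher Python bindings (`speechd`) expose setters such as `set_rate`, `set_pitch`, `set_volume`, `set_language` and `set_voice`. The sketch keeps the user's preferences in a hypothetical `VoiceConfig` class and pushes them to any client object with those methods; the default values and the -100 to 100 clamping (the usual SSIP range for rate, pitch and volume) are assumptions of this sketch.

```python
from dataclasses import dataclass


def _clamp(value, low=-100, high=100):
    """Keep a setting inside the SSIP value range."""
    return max(low, min(high, value))


@dataclass
class VoiceConfig:
    """User voice preferences (hypothetical defaults)."""
    rate: int = 0
    pitch: int = 0
    volume: int = 0
    language: str = "en"
    voice: str = "MALE1"  # SSIP voice type; e.g. "FEMALE1" for a female voice

    def apply(self, client):
        """Push the settings to a Speech-Dispatcher client."""
        client.set_rate(_clamp(self.rate))
        client.set_pitch(_clamp(self.pitch))
        client.set_volume(_clamp(self.volume))
        client.set_language(self.language)
        client.set_voice(self.voice)
```

With the real bindings this might be used as `client = speechd.SSIPClient("listen-and-spell")`, then `VoiceConfig(rate=-20, voice="FEMALE1").apply(client)` and `client.speak("Spell ocean")`.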

Implementation Details

  • For the word source, an expandable dictionary would be implemented, initially using the word list from the Words activity (due to Rainbow security, one activity cannot access the data of another activity). It will also contain the definitions and usage of the words in an XML format so that they are accessible to the activity. I am also looking at other open-source dictionaries as a possible option. Newer words would be added manually or through a server on the mesh network, which would require one of the laptops to act as a server.
  • There would be two ways to implement the 'Hint' option:
    1. The hint would be fetched from the activity's own dictionary.
    2. Using the Wiktionary API. This API provides direct, high-level access to the data contained in the MediaWiki databases, so the definition or usage of a word can be extracted easily. The API returns data in many formats, such as JSON, XML and plain text; e.g. a query for "ocean" can return its data in XML format. The raw output may look like meaningless text, but with proper parsing, information such as meanings and translations into other languages can be extracted.
  • Speech-Dispatcher: Speech-Dispatcher is a socket-based speech server that provides speech APIs for many languages, including Python and C. In a discussion with OLPC developers, they agreed, given the need for a speech server on the XO, to ship it on the XO once its RPM is approved by the Fedora package maintainers. Its RPM is under review and should be approved soon. I have already received approval for the RPM of its dependency, Dotconf.
  • Language of implementation: Python
  • GUI: The GUI would be built with PyGTK and Glade.
  • Parser for configuration files and dictionary data: The OLPC software includes many Python modules, among them the expat XML parser, which can be used to parse the data and extract the required information.
  • Mesh network access: The PresenceService D-Bus API would be used.
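The Wiktionary option could start from a MediaWiki API query for the raw wikitext of an entry, followed by some parsing. The sketch only builds the query URL and applies a rough heuristic (in Wiktionary wikitext, definition lines start with `# `); the helper names are hypothetical, and a real parser would also need to restrict itself to the right language section and handle many more templates.

```python
import re
from urllib.parse import urlencode

API_URL = "https://en.wiktionary.org/w/api.php"


def definition_query_url(word):
    """Build a MediaWiki API query for the raw wikitext of a Wiktionary entry."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": word,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_definitions(wikitext, limit=3):
    """Heuristically pull definition lines (those starting with '# ').

    Templates like {{lb|en|...}} are dropped and wiki links like
    [[body of water|bodies of water]] are reduced to their display text.
    """
    defs = []
    for line in wikitext.splitlines():
        if not line.startswith("# "):
            continue  # skips headings, examples ("#:") and quotations ("#*")
        text = re.sub(r"\{\{[^{}]*\}\}", "", line[2:])
        text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
        text = " ".join(text.split())
        if text:
            defs.append(text)
        if len(defs) == limit:
            break
    return defs
```

Fetching would be `urllib.request.urlopen(definition_query_url("ocean"))` followed by `json.load`; where the wikitext sits in the JSON response depends on the MediaWiki version, so that step is left out here.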



The proposed dictionary would be XML-based, with 26 XML files, one for each starting letter. The following fields would be associated with each word:

  • Description (string)
    • Definition
    • Usage
  • Level (integer)
  • Image URI (string)
  • IS_INCORRECT (boolean)
  • Phoneme data (string)
  • correct/incorrect (int/int)
  1. Description: Contains the definition of the word, its usage, or both. Both would be fetched from the server.
  2. Level: The level of the word, based on the number of letters. See Level Description.
  3. Image URI: The location of the image associated with the word.
  4. IS_INCORRECT: This flag is set by the user when the word is pronounced incorrectly by the TTS engine. If it is true, the activity will use the phoneme data associated with the word for synthesis. If phoneme data is not present, it will try to reach the server and download the correct data. If that is also not possible, it will set the level to -1, marking the word as unusable.
  5. Phoneme data: Phoneme data of the word as fetched from the server.
  6. correct/incorrect: The number of times this word has been spelled correctly, followed by a delimiter, then the number of times it has been spelled incorrectly.
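The IS_INCORRECT fallback chain and the correct/incorrect counter described above could look like this. A sketch only: entries are plain dicts, `fetch_phonemes` is a hypothetical callback standing in for the server request, and `/` is assumed as the delimiter (matching the `int/int` notation in the field list).

```python
UNUSABLE_LEVEL = -1


def pronunciation_source(entry, fetch_phonemes):
    """Decide how a dictionary entry should be voiced.

    entry has 'word', 'level', 'is_incorrect' and 'phonemes' keys.
    fetch_phonemes(word) returns phoneme data from the server, or None if
    the server is unreachable or has no data. Returns "text", "phonemes"
    or "unusable", updating the entry in place where needed.
    """
    if not entry["is_incorrect"]:
        return "text"  # TTS handles the word: send the plain text
    if entry.get("phonemes"):
        return "phonemes"  # use the stored phoneme data
    fetched = fetch_phonemes(entry["word"])
    if fetched:
        entry["phonemes"] = fetched  # keep it for next time
        return "phonemes"
    entry["level"] = UNUSABLE_LEVEL  # mark the word as not usable
    return "unusable"


def parse_score(field, delimiter="/"):
    """Split the correct/incorrect field, e.g. "3/1" into (3, 1)."""
    correct, incorrect = field.split(delimiter)
    return int(correct), int(incorrect)
```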

A sample entry would look like this:

	<definition>Group of people sharing a common understanding</definition>
	<image>./resource/group of people.gif</image>
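Since the activity would use the expat parser shipped with Python, a dictionary file could be read as follows. Only `<definition>` and `<image>` appear in the sample; the `<dictionary>` root, the `<word name="...">` wrapper and the remaining element names are assumptions chosen to match the field list, and lowercase `true`/`false` is assumed for the flag.

```python
import xml.parsers.expat

# Assumed element names: only <definition> and <image> appear in the sample.
FIELDS = {"definition", "usage", "image", "level", "is_incorrect",
          "phonemes", "score"}


def parse_dictionary(xml_text):
    """Parse <word name="..."> entries into a dict keyed by the word."""
    entries = {}
    state = {"entry": None, "field": None}
    buf = []

    def start(name, attrs):
        if name == "word":
            state["entry"] = {"word": attrs.get("name", "")}
        elif name in FIELDS and state["entry"] is not None:
            state["field"] = name
            buf.clear()

    def end(name):
        entry = state["entry"]
        if name == state["field"]:
            entry[name] = "".join(buf).strip()
            state["field"] = None
        elif name == "word" and entry is not None:
            entries[entry["word"]] = entry
            state["entry"] = None

    def chars(data):
        # Expat may deliver text in several chunks, so collect them all.
        if state["field"] is not None:
            buf.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)

    # Convert the typed fields from their text form.
    for entry in entries.values():
        if "level" in entry:
            entry["level"] = int(entry["level"])
        if "is_incorrect" in entry:
            entry["is_incorrect"] = entry["is_incorrect"].lower() == "true"
    return entries
```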


Possible ways to update the dictionary

  1. The dictionary will update itself with the words given by the teacher for the group test and the words added while playing over the mesh network. Definitions would be fetched from Wiktionary whenever the XO gets network access. Different variants of the words (e.g. for "change": changed, changing) will be included automatically.
  2. Another way is to update the dictionary manually. This will require the teacher or the user to upload words. The list of words can be generated from many available free dictionaries (such as aspell, among others). Format conversion will be required, as these dictionaries come in different formats.
  3. Dictionary sharing is another possible way to update dictionaries. This would be complete dictionary sharing (words, descriptions, images (image + image URI), level, phoneme data and flags of each word).
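The proposal does not say how variants such as "changed" and "changing" would be generated. One naive possibility is suffix rules for regular English verbs, sketched below; `naive_variants` is a hypothetical helper that ignores irregular verbs and consonant doubling (e.g. "stop" becomes "stoped"), so a real implementation would more likely take variants from an existing dictionary.

```python
def naive_variants(verb):
    """Generate -s, -ed and -ing forms of a regular English verb."""
    if verb.endswith("ee"):
        return [verb + "s", verb + "d", verb + "ing"]      # agree: agreed, agreeing
    if verb.endswith("e"):
        return [verb + "s", verb + "d", verb[:-1] + "ing"]  # change: changed, changing
    if verb.endswith(("s", "x", "z", "ch", "sh")):
        return [verb + "es", verb + "ed", verb + "ing"]     # watch: watches
    return [verb + "s", verb + "ed", verb + "ing"]          # jump: jumped
```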

A workaround for words that are not pronounced correctly by TTS

  • The TTS engine used in XO is espeak. It has pre-defined pronunciation rules for each language. However, it does have an option to include words that are exceptions to the rules, along with their phoneme data. Adding words to those files requires some linguistic knowledge, but with some practice and a little guidance a normal user can also add correct pronunciations to them. Another possible way to achieve this is to store the phoneme data or sound of the words that are not pronounced correctly in the dictionary itself, marking those entries with a flag. This avoids editing espeak files and further complications. Speech-Dispatcher provides an API to send phoneme data directly for synthesis. Wiktionary (like many other online dictionaries) also provides phoneme data for many words that are difficult to pronounce, but in a different format from the one used by espeak. A conversion method could be requested from the espeak developers. Users can flag such words as incorrect, and their correct data would then be fetched from the server.
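espeak accepts phoneme mnemonics directly in its input text when they are enclosed in double square brackets, `[[...]]`. Assuming the stored phoneme data is already in espeak's mnemonic format (the conversion from Wiktionary's notation is the open question above), a flagged word could be voiced as sketched here; `espeak_input` is a hypothetical helper, and whether Speech-Dispatcher passes the brackets through unchanged is not verified.

```python
def espeak_input(entry):
    """Return the text to hand to espeak for a dictionary entry.

    Flagged entries with stored phoneme data are wrapped in [[...]], which
    espeak reads as phoneme mnemonics instead of ordinary text; everything
    else is sent as the plain word.
    """
    phonemes = entry.get("phonemes")
    if entry.get("is_incorrect") and phonemes:
        return "[[" + phonemes + "]]"
    return entry["word"]
```

For example, `subprocess.run(["espeak", espeak_input(entry)])` would speak the entry through the espeak command line directly.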

Possible extension

One possible extension is a tutorial for learning languages using this activity, for example:

  1. The activity would teach basic vowel sounds, like a as in cat, e as in bed, air as in hair, etc.
  2. Sounds of consonants, like b as in bed, ch as in change, d as in day, etc.
  3. Teaching the sound of the whole word.

It would be great if children could enter words and learn how to pronounce them.


I would really appreciate any suggestions or feedback on my proposal.

Resources and Project status

  1. TalknType
  2. Speech Dispatcher
  3. Words Activity
  4. Speech Synthesis
  5. Screen Reader
  6. Speak & Spell
  7. Project status
  8. Google project page

This project has begun. As the header suggests, see the latest code and project pages. For the earlier (now out-of-date) project status, see Assim's weekly updates; he maintained all the project-related updates there for months. A related XO project, temporarily named "speak and spell", was worked on in 2009 with support from Seeta.