FreeIconToSpeech

From OLPC
Revision as of 15:23, 8 September 2008 by 76.224.118.215 (talk) (moved from Speech synthesis section)

FreeIconToSpeech, under development

Overview

The goal of FreeIconToSpeech is to provide a low-cost assistive / augmentative communication tool for people with speech, motor, and/or developmental challenges. The immediate opportunity is to create open source software that lets a user select concepts through a menu of icons and then synthesizes speech from the selected concepts.

Existing tools for this purpose cost thousands of dollars per device and are proprietary.

The OLPC XO platform, while it lacks a touch screen, is priced in the hundreds of dollars per device and already contains many of the base components needed (as is evident in the text-to-speech synthesis activity Speak). The need for icon-to-speech on the OLPC has been expressed independently by many people, including in discussions at Talk:Speak#Accessibility and Talk:Accessibility#Augmentative_and_Alternative_Communication.

It appears that a proof of concept could be developed with a small time investment, and potential users are ready to test as soon as this is complete.

A prototype has been posted at Ispeak_(activity): File:Ispeak-1.xo. This version runs at full speed on the XO.

User Interface Design

Initial discussions suggest a user interface that lets users navigate a hierarchy of basic concepts, with an adjustable level of detail / zoom to accommodate the varying motor skills users bring to selecting concepts.

Three levels of hierarchy with 7 +/- 2 groups/concepts per level would allow selection among hundreds of concepts, which appears to be a useful balance between richness of expression and speed of selection.
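
As a rough check on that estimate, the number of leaf concepts in a menu tree is the number of choices per level raised to the number of levels. The sketch below assumes every level offers the same number of choices; the function name reachable_concepts is illustrative only and is not part of any prototype.

```python
def reachable_concepts(choices_per_level: int, levels: int = 3) -> int:
    """Leaf concepts reachable when every level offers the same number of choices."""
    return choices_per_level ** levels


# 7 +/- 2 choices per level, three levels deep.
for choices in (5, 7, 9):
    print(choices, "choices per level:", reachable_concepts(choices), "concepts")
```

Under that assumption, three levels give between 125 and 729 selectable concepts, with 343 at exactly seven choices per level.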

Display and navigation of the hierarchy can combine existing concentric and zoomable menu approaches.

We envision three such navigation areas, displayed from left to right across the screen, for the selection of a subject, a verb, and an object of a basic sentence, with no attempt at grammatical accuracy.

Conceptual Content

The concept hierarchy can be synthesized from a careful blend of existing taxonomies. For an initial proof of concept, two useful taxonomies come from sign language and the food pyramid. Sign language is used even with toddlers, as an increasingly popular supplemental means of communication before they develop speech (for example, the "Sign With Your Baby" materials). 100 basic signs provide some of the most useful concepts for basic living: http://www.lifeprint.com/asl101/pages-layout/concepts.htm . Sign language may be doubly useful in some cases, when the user's motor skills allow communication with the manual signs. Icon libraries are already established for American Sign Language, and are readily available for many of the USDA food pyramid categories: http://openclipart.org/media/tags/vegetable .


Developing appropriate free and open source icons for this project is a challenge that the community/wiki could take on. Many users of Augmentative and Alternative Communication devices face visual, perceptual, and cognitive challenges. Therefore, icons should be as uncomplicated and transparent as possible. Examples: Mayer-Johnson symbols (www.mayer-johnson.com) are widely used in American schools because the stick drawings are easily scalable and widely considered the most transparent for more abstract ideas. They are less concrete than pictures, however, which might pose a problem for early learners. They are also heavily copyright protected, which conflicts with OLPC's software freedom standard.

Prentke Romich's symbols, also proprietary, support everything from early learning up to sophisticated semantic encoding to increase the rate of messages (e.g. swimming pool icon + color icon = blue, or swimming pool icon + activity icon = swim).

The Tango! by Blink Twice also has a unique encoding system for early learners.

My points are: 1) a large-scale Free and Open Source icon library probably needs to be developed. 2) the function of the device should also be considered. For young children and many people with autism and other related conditions, requesting is the first skill worked on: asking for food/drink and controlling the other person's actions to get needs met. For them, pages consisting of a simple "I want" that then branches to many different food items would be an ideal setup.
Other functions of communication include building social closeness with a close circle of people, transferring information to others, and participating in social interactions with the community ("how are you" / "excuse me" etc.). Each of these functions varies in the importance of the specific content of each message, the importance of the semantics of the message, and whether the communicator will be familiar or unfamiliar (a mom will be able to "read" a nonverbal child's gestures but a police officer might not). The device and page setups should be designed with these situations in mind.
(source?)
It's always been my dream to make the XO into a sophisticated communication device. I've seen families spend thousands on devices that do not meet their children's needs and I would love to be involved with the project any way that I can.
Lesley, 01:32, 20 April 2008 (EDT)
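
The icon-combination idea mentioned in the comment above (swimming pool + color = blue, swimming pool + activity = swim) could be represented as a lookup from icon pairs to words. This is only a minimal sketch built from those two examples; the table layout and the decode function are assumptions for illustration and do not reflect how any proprietary system actually stores its encodings.

```python
# Hypothetical pair table, filled with only the two examples from the text.
ENCODINGS = {
    ("swimming pool", "color"): "blue",
    ("swimming pool", "activity"): "swim",
}


def decode(icon, category):
    """Return the word for an (icon, category) pair, or None if no word is defined."""
    return ENCODINGS.get((icon, category))


print(decode("swimming pool", "color"))
print(decode("swimming pool", "activity"))
```

A free icon library for this project could reuse the same structure, with one picture standing for several words depending on the second selection.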

Additional Enhancements and Uses

  • Input devices:
  • Additional languages & culturally-relevant icons
    • scalability needed for this, in terms of ontology & GUI
    • vectorize artwork - consider method used by www.CopyArtwork.com
  • Add to & change the vocabulary & icons with photos, utilizing the built-in OLPC XO camera.
  • Run on smaller devices, such as mobile phones, music players, and PDAs with adequate speaker output.
  • Ability to operate with more grammatical correctness for more formal situations such as public and educational settings.
  • Teaching of reading & writing in native language.
  • Teaching of second or foreign languages.
  • Selectable foreign language or culture for speech output, enabling basic communication across languages or cultures.
  • Recording the selections as near-ontological content warrants further discussion.
    • could record these in the Journal

User Interface mock-up, as a slide presentation

Open the slide presentation file: http://wiki.laptop.org/images/e/ec/FreeIconToSpeech_UI_text_demo_02.ppt .

[Work in progress: Icons are not drawn into this diagram yet. So for the moment, imagine that each word in black is replaced by an icon representing that concept.]

Click "people", "mom", "create", "cook", "food", and "beans", imagining the interface zooming in to where your pointer travels, for easier selectability.

Then the computer would consider your selections complete, and speak them.
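
The click sequence above can be sketched in Python, the language used for XO activities. This is a minimal sketch under stated assumptions: each of the three areas has a two-level menu (group, then concept), only the final concept from each area is spoken, and the menu entries other than the six words from the mock-up are made up for illustration. Handing the resulting text to a speech synthesizer is left out.

```python
# Hypothetical two-level menus for the three areas in the mock-up.
# Only "people/mom", "create/cook", and "food/beans" come from the text.
MENUS = {
    "subject": {"people": ["mom", "dad", "me"]},
    "verb": {"create": ["cook", "draw"]},
    "object": {"food": ["beans", "rice"]},
}
AREAS = ("subject", "verb", "object")  # left to right across the screen


def utterance_from_clicks(clicks):
    """Read clicks as (group, concept) pairs, one pair per area, and join the concepts."""
    if len(clicks) != 2 * len(AREAS):
        raise ValueError("expected one group and one concept per area")
    words = []
    for i, area in enumerate(AREAS):
        group, concept = clicks[2 * i], clicks[2 * i + 1]
        if concept not in MENUS[area].get(group, []):
            raise ValueError(f"{group}/{concept} is not in the {area} menu")
        words.append(concept)
    # No attempt at grammatical accuracy: the concepts are spoken as selected.
    return " ".join(words)


print(utterance_from_clicks(["people", "mom", "create", "cook", "food", "beans"]))
```

With these assumptions the six clicks produce the text "mom cook beans", which would then be passed to the speech synthesis component.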

A presentation on an alternate interface: http://wiki.laptop.org/go/Image:FreeIconToSpeech_Alternative_User_Interface.ppt

Thanks for ideas contributed & discussed at PyCon 2008 by Tony Anderson, Lisa Beal, Annie Barkau, Ed Cherlin, & Mel Chua.

- RMattB 2008 03 17

Please add your thoughts. :)