Speech Server

Revision as of 12:20, 18 December 2007

Screen Reader TTS Service

Objective

Develop a simple, scalable screen reader TTS (text-to-speech) service plugin for the XO, written in Python and using eSpeak speech synthesis.

Description

The screen reader will provide users with the following capabilities:

  • A TTS control panel for controlling various eSpeak parameters.
  • The ability to highlight text anywhere and synthesize speech from it, using a keyboard shortcut or a button in the Sugar UI.
  • A panel for modifying speech parameters such as words per minute, language, gender, pitch, and volume.
  • The ability for users to save their preferences and retrieve them after a system reboot.
  • The ability to load the default settings for the speech service.
  • A play button that plays the highlighted text and changes to a stop button, which stops playback asynchronously.
  • A parallel activity built on the underlying speech server. The activity will let kids type in a text box and use the speech server's capabilities. Experimenting with the speech parameters can be an extremely interesting activity for them. For example, increasing the speaking rate along with the pitch yields a voice that kids may find funny.
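
The parameter, preference, and default-settings items above could be sketched as follows. This is only an illustration, not the project's code: the `SpeechSettings` class, its field names, the default values, and the JSON file format are all assumptions. The `-v`, `-s`, `-p`, and `-a` flags and the `+f3`/`+m3` voice variants are those of the `espeak` command-line tool.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class SpeechSettings:
    """Hypothetical container for the speech parameters listed above."""
    words_per_minute: int = 170   # passed to espeak as -s
    language: str = "en"          # passed to espeak as part of -v
    gender: str = "male"          # mapped to an eSpeak voice variant
    pitch: int = 50               # passed to espeak as -p (0 to 99)
    volume: int = 100             # passed to espeak as -a (0 to 200)

    def to_espeak_args(self, text):
        """Build an espeak command line for the given text."""
        variant = "+f3" if self.gender == "female" else "+m3"
        return ["espeak", "-v", self.language + variant,
                "-s", str(self.words_per_minute),
                "-p", str(self.pitch),
                "-a", str(self.volume), text]


def save_settings(settings, path):
    """Write the settings to disk so they survive a reboot."""
    Path(path).write_text(json.dumps(asdict(settings)))


def load_settings(path):
    """Return the saved settings, or the defaults if none can be read."""
    try:
        return SpeechSettings(**json.loads(Path(path).read_text()))
    except (OSError, ValueError, TypeError):
        return SpeechSettings()
```

Loading the defaults is then just `SpeechSettings()`, and the "funny voice" experiment amounts to something like `SpeechSettings(words_per_minute=300, pitch=90)`, whose `to_espeak_args(...)` result can be handed to `subprocess.run`.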

Target Audience

Students (taken from the Book reader feature set):

  1. A text-to-speech option can help kids learn to read.
  2. A text-to-speech option might help kids who do not like to read a lesson but would not mind listening to it at a speed they can follow.

Existing Tools Present

Elements of Screen Reader Service

  • A Python ctypes module that links to eSpeak's libespeak library.
  • A D-Bus service that exposes the eSpeak object globally to all XO activities.
  • A Python script that accepts highlighted text from the Sugar environment through the X11 primary selection and passes it to the D-Bus service for synthesis.
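
The first element, the ctypes link to libespeak, might look roughly like the sketch below. It is not taken from the project repository. The helper names `load_espeak` and `speak` are hypothetical, and the constants and argument types are assumed to match the `speak_lib.h` header of a reasonably recent eSpeak release (older releases may declare `espeak_Initialize` differently).

```python
import ctypes
import ctypes.util

# Constants assumed to match eSpeak's speak_lib.h.
AUDIO_OUTPUT_PLAYBACK = 0
espeakRATE, espeakVOLUME, espeakPITCH = 1, 2, 3
espeakCHARS_UTF8 = 1
POS_CHARACTER = 1


def load_espeak(name="espeak"):
    """Return a ctypes handle to libespeak, or None if it is unavailable."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # espeak_Synth(text, size, position, position_type, end_position,
    #              flags, unique_identifier, user_data)
    lib.espeak_Synth.argtypes = [
        ctypes.c_char_p, ctypes.c_size_t, ctypes.c_uint, ctypes.c_int,
        ctypes.c_uint, ctypes.c_uint,
        ctypes.POINTER(ctypes.c_uint), ctypes.c_void_p,
    ]
    return lib


def speak(lib, text, rate=170, pitch=50):
    """Speak the text through the sound card and wait until it finishes."""
    data = text.encode("utf-8")
    if lib.espeak_Initialize(AUDIO_OUTPUT_PLAYBACK, 0, None, 0) < 0:
        raise RuntimeError("eSpeak failed to initialise")
    lib.espeak_SetParameter(espeakRATE, rate, 0)
    lib.espeak_SetParameter(espeakPITCH, pitch, 0)
    # The size includes the terminating NUL byte.
    lib.espeak_Synth(data, len(data) + 1, 0, POS_CHARACTER, 0,
                     espeakCHARS_UTF8, None, None)
    lib.espeak_Synchronize()
```

A D-Bus service would wrap calls like `speak` so that every activity shares one synthesizer. For the stop button, libespeak provides `espeak_Cancel()`, which a service could call instead of waiting in `espeak_Synchronize()`.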

Codebase

The code for the project can be accessed in the Git repository at Screen Reader GIT.

Team

Core Team:

Mentor: Arjun Sarwal