Speech Server: Difference between revisions

(23 intermediate revisions by 11 users not shown)
== Screen Reader TTS Service ==

=== Objective ===

Develop a simple and scalable Screen Reader TTS Service for (Text to Speech) Plugin using eSpeak [[Speech_synthesis]] for XO using python.

=== Description ===

The Screen reader will provide the users with the following capabilities :

*A TTS Control Panel to control various parameters of ESpeak.
*Ability to highlight text anywhere, and synthesize speech from it using a keyboard shortcut or through a button in Sugar UI.
*Voice recording and playback, to easily record your own voice reading the page in your own language, and create personalized spoken translations.

=== Target Audience ===

'''Students''' [taken from [[Book_reader_feature_set]]]–

# A text to speech option can help kids learn to read.
# A text to speech option might help kids that do not like to read a lesson but would not mind listening to it at a speed they could understand it.

=== Existing Tools Present ===

*eSpeak [[Speech_synthesis]]- TTS engine on XO

=== Elements of Screen Reader Service ===

*A python ctypes file to link to libespeak library of espeak.
*A dbus service to expose the espeak object globally to all xo activities.
*a python script to accept highlighted data from sugar environment using X11 Primary selection and pass it to the dbus service for synthesis.

== Speech Synthesis Server ==

=== Description ===

An easy to use api for speech synthesis which would be useful for self-voicing activities.

=== Existing Tools Present ===

*eSpeak [[Speech synthesis]]- TTS engine on XO
*speech-dispatcher [http://www.freebsoft.org/doc/speechd Speech Dispatcher]

'''''Please note that this is work in progress, and this API is currently NOT STABLE.'''''

==== Modifiable Speech Parameters ====
Refer [http://espeak.sourceforge.net/speak_lib.h espeak Library API] to understand usage.
<pre>
espeakRATE = 1
espeakVOLUME = 2
espeakPITCH = 3
espeakRANGE = 4
espeakPUNCTUATION = 5
espeakCAPITALS = 6
</pre>

==== Modifiable Voice Parameters ====
Follow the [http://espeak.sourceforge.net/voices.html espeak Voice Description]
<pre>
language -> language(In standard notation)
name -> a given name for this voice. By default it should be set NULL
identifier -> the filename for this voice within espeak-data/voices.
Should be set NULL by default.
gender -> voice gender( 1 = male, 2 = female, 0 = unknown)
age -> By default age = 0. espeak automatically sets it.
variant -> Used to modify the voice by different variant. Preferably 0.
</pre>

==== A speech synthesis dbus daemon service ====
*A dbus-glib based speech server written in C.
*Speech server reuses the libespeak library.
*Methods exposed via D-BUS for performing certain tasks globally to all xo activities.

==== Speech Server Dbus Methods ====

===== SayText() =====
This Dbus Method accepts an incoming UTF8 string, and plays it back.
<pre>SayText(string text)</pre>

*Speaks the text pointed by string.
*Must be a valid UTF 8 String

'''Example Python Code:'''
<pre>
import dbus
bus = dbus.SessionBus()
espeak_object = bus.get_object('org.laptop.Speech','/org/laptop/Speech')
espeak_object.SayText("Hello world! Hey I can talk to you")
</pre>
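
Since SayText() expects a valid UTF-8 string, an activity may want to check its input before sending it over D-Bus. The following is a minimal sketch of that idea, written for Python 3. The helper names <tt>to_speakable</tt>, <tt>say</tt> and <tt>connect</tt> are illustrative and not part of the speech server, and <tt>connect</tt> assumes the dbus-python bindings and a running service under the bus name used in the example above.

```python
def to_speakable(text):
    """Return text as a str that is guaranteed to encode as valid UTF-8."""
    if isinstance(text, bytes):
        # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
        text = text.decode("utf-8")
    # Raises UnicodeEncodeError for lone surrogates, which UTF-8 cannot carry.
    text.encode("utf-8")
    return text


def say(speech, text):
    """Validate text, then pass it to the SayText method of a proxy object."""
    speech.SayText(to_speakable(text))


def connect():
    """Look up the speech object on the session bus (needs dbus-python)."""
    import dbus  # imported here so the helpers above work without it
    bus = dbus.SessionBus()
    return bus.get_object('org.laptop.Speech', '/org/laptop/Speech')
```

With the service running, <tt>say(connect(), "Hello world!")</tt> would behave like the example above, except that invalid input is rejected before it reaches the bus.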

===== SetVoice() =====
This Dbus Method sets the voice properties of espeak. Refer to the Modifiable Voice Parameters section for the meaning of each field.
<pre>
SetVoice(String name,
         String languages,
         String identifier,
         int gender,
         int age,
         int variant)
</pre>
'''Python Example:'''
<pre>
import dbus
bus = dbus.SessionBus()
espeak_object = bus.get_object('org.laptop.Speech','/org/laptop/Speech')
espeak_object.SetVoice("","fr", "", 2,0,0)
#Choose a female voice to speak french text
espeak_object.SayText("Je suis une etudiante!")
</pre>
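
The six positional arguments of SetVoice() are easy to mix up. As a sketch, a small wrapper can give them names and defaults. The names <tt>voice_args</tt>, <tt>set_voice</tt> and the <tt>GENDER_*</tt> constants are hypothetical, and the empty strings stand in for the NULL defaults in the same way as in the example above.

```python
# Gender codes as listed under "Modifiable Voice Parameters".
GENDER_UNKNOWN = 0
GENDER_MALE = 1
GENDER_FEMALE = 2


def voice_args(language, gender=GENDER_UNKNOWN, name="", identifier="",
               age=0, variant=0):
    """Return the arguments in the order SetVoice() expects them."""
    if gender not in (GENDER_UNKNOWN, GENDER_MALE, GENDER_FEMALE):
        raise ValueError("gender must be 0, 1 or 2")
    return (name, language, identifier, gender, age, variant)


def set_voice(speech, language, **options):
    """Call SetVoice() on a speech proxy object using named options."""
    speech.SetVoice(*voice_args(language, **options))
```

For instance, <tt>set_voice(espeak_object, "fr", gender=GENDER_FEMALE)</tt> sends the same six values as the call in the example above.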

===== SetParameter() =====
This Dbus Method accepts parameters to set the speech parameters for espeak. Refer to the Modifiable Speech Parameters for more details.
<pre>
SetParameter(int PARAMETER_NAME, int PARAMETER_VALUE)
</pre>
'''Python Example:'''
<pre>
import dbus
bus = dbus.SessionBus()
espeak_object = bus.get_object('org.laptop.Speech','/org/laptop/Speech')
espeak_object.SetParameter(1, 60)
#Modifies the espeakRATE parameter to speak 60 words per minute
espeak_object.SayText("I am a very lazy speaker!")
</pre>
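
Passing bare numbers such as 1 makes calls hard to read. One option, sketched here, is to mirror the values from the Modifiable Speech Parameters list in a Python enumeration. <tt>Param</tt> and <tt>set_parameter</tt> are hypothetical names, and the sketch does no range checking on the value because this page does not state the valid ranges.

```python
from enum import IntEnum


class Param(IntEnum):
    """Parameter numbers copied from the Modifiable Speech Parameters list."""
    RATE = 1
    VOLUME = 2
    PITCH = 3
    RANGE = 4
    PUNCTUATION = 5
    CAPITALS = 6


def set_parameter(speech, param, value):
    """Call SetParameter() with plain ints; unknown numbers raise ValueError."""
    speech.SetParameter(int(Param(param)), int(value))
```

<tt>set_parameter(espeak_object, Param.RATE, 60)</tt> is then equivalent to <tt>espeak_object.SetParameter(1, 60)</tt>.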

===== GetConfiguration() =====
This Dbus Method returns a dbus.Structure holding the current settings of the espeak service. It is needed to display the present settings in the control panel that will be made available for tuning the espeak parameters.
<pre>
GetConfiguration()
</pre>

'''Python Example:'''
<pre>
import dbus
bus = dbus.SessionBus()
espeak_object = bus.get_object('org.laptop.Speech','/org/laptop/Speech')
espeak_object.SetParameter(1, 60)
espeak_object.SetVoice("","en-uk", "", 2,0,0)
val = espeak_object.GetConfiguration()
print(val)
</pre>
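
This page does not document the layout of the returned structure, so the sketch below only shows one way to turn it into built-in Python values before displaying it. It relies on the dbus-python wrapper types being subclasses of the built-in tuple, list, dict, str, int and float types. <tt>to_plain</tt> is a hypothetical helper name.

```python
def to_plain(value):
    """Recursively convert D-Bus wrapper values to built-in Python types."""
    if isinstance(value, bool):
        return value
    if isinstance(value, dict):
        return {to_plain(k): to_plain(v) for k, v in value.items()}
    if isinstance(value, tuple):
        return tuple(to_plain(item) for item in value)
    if isinstance(value, list):
        return [to_plain(item) for item in value]
    if isinstance(value, str):
        return str(value)
    if isinstance(value, float):
        return float(value)
    if isinstance(value, int):
        return int(value)
    # Anything else is returned unchanged.
    return value
```

Calling <tt>to_plain(espeak_object.GetConfiguration())</tt> would give a plain tuple whose fields can be shown in a control panel. Which field is which still has to be checked against the server source.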

===== SaveConfiguration() =====
This Dbus Method allows the user to save the current espeak parameters.
<pre>
SaveConfiguration()
</pre>

'''Python Example:'''
<pre>
import dbus
bus = dbus.SessionBus()
espeak_object = bus.get_object('org.laptop.Speech','/org/laptop/Speech')
espeak_object.SetParameter(1, 60)
espeak_object.SetVoice("","en-uk", "", 2,0,0)
espeak_object.SaveConfiguration()
#Will overwrite the existing user settings
</pre>


===== LoadConfiguration() and LoadDefaultConfiguration() =====
LoadConfiguration() sets all espeak parameters to the user's own preferences, which must have been saved beforehand. LoadDefaultConfiguration() loads the default configuration instead.
<pre>
LoadConfiguration()
or
LoadDefaultConfiguration()
</pre>

'''Python Example:'''
<pre>
import dbus
bus = dbus.SessionBus()
espeak_object = bus.get_object('org.laptop.Speech','/org/laptop/Speech')
espeak_object.LoadConfiguration()
</pre>
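
One possible use of these methods is to change a setting for a single utterance and then return to the saved preferences. The sketch below assumes that LoadConfiguration() restores whatever was last stored with SaveConfiguration(). <tt>temporary_settings</tt> is a hypothetical helper, not part of the server.

```python
from contextlib import contextmanager


@contextmanager
def temporary_settings(speech, parameters):
    """Apply {parameter: value} pairs, then reload the saved preferences."""
    try:
        for parameter, value in parameters.items():
            speech.SetParameter(parameter, value)
        yield speech
    finally:
        # Assumption: this discards the temporary changes by reloading
        # the settings that were last saved.
        speech.LoadConfiguration()
```

Used as <tt>with temporary_settings(espeak_object, {1: 60}) as speech: speech.SayText("slowly")</tt>, the rate change would apply only inside the <tt>with</tt> block.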


=== Installing speech-dispatcher on the xo ===

==== RPM Installation ====

You'll need to download the following RPM packages on a pen drive:

*[http://mirror.mossc.com/mirror/cnsi/testing/RPMS/i386/dotconf-1.0.13-1.fc5.i386.rpm dotconf libraries]
*[http://mirror.mossc.com/mirror/cnsi/testing/RPMS/i386/speech-dispatcher-0.6.1-1.fc5.i386.rpm speech-dispatcher rpm packages]

Assuming that the packages are in the root of your pen-drive, perform the following commands as root.

<pre>
cd /media/[name of pen drive]/
rpm --install dotconf-1.0.13-1.fc5.i386.rpm
rpm --install --nodeps speech-dispatcher-0.6.1-1.fc5.i386.rpm
</pre>

''At this stage you may get some warnings/errors about festival/flite and python-abi not being present. You can safely ignore these warnings.''

==== Yum Installation ====

Make yourself root and run the following command:

yum install speech-dispatcher speech-dispatcher-python

Yum needs more disk space than the RPM install, because it downloads several megabytes for the Yum database, plus it downloads the pulse libraries and Festival Lite, neither of which is needed on the XO. You may be able to use RPM to remove these later, but they only use about 7 megabytes.

==== Configuration ====

We now modify the configuration files to make speech-dispatcher use eSpeak (which is part of the XO distribution, so you don't need to install it):

<pre>
vi /etc/speech-dispatcher/speechd.conf
</pre>

If you did a yum install, the lines in speechd.conf should already look like this, but double check to be sure:

<pre>
# AddModule loads an output module.
# Syntax: AddModule "name" "binary" "configuration" "logfile"
# - name is the name under which you can acces this module
# - binary is the path to the binary executable of this module,
# either relative (to lib/speech-dispatcher-modules/) or absolute
# - configuration is the path to the config file of this module,
# either relative (to etc/speechd/modules/) or absolute

AddModule "espeak" "sd_espeak" "espeak.conf"
#AddModule "festival" "sd_festival" "festival.conf"
#AddModule "flite" "sd_flite" "flite.conf"
#AddModule "espeak-generic" "sd_generic" "espeak-generic.conf"
#AddModule "epos-generic" "sd_generic" "epos-generic.conf"
#AddModule "dtk-generic" "sd_generic" "dtk-generic.conf"
#AddModule "ibmtts" "sd_ibmtts" "ibmtts.conf"
#AddModule "cicero" "sd_cicero" "cicero.conf"

# The output module testing doesn't actually connect to anything. It
# outputs the requested commands to standard output and reads
# responses from stdandard input. This way, Speech Dispatcher's
# communication with output modules can be tested easily.

# AddModule "testing"

# DefaultModule selects which output module is the default. You must
# use one of the names of the modules loaded with AddModule.

DefaultModule espeak
</pre>
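
To double check the file without reading it by eye, a short script can list the modules that are actually enabled. This is only a sketch: <tt>active_modules</tt> is a hypothetical helper that looks at the AddModule and DefaultModule lines shown above and ignores every other directive.

```python
import shlex


def active_modules(conf_text):
    """Return (modules, default) found in the text of a speechd.conf file."""
    modules = {}
    default = None
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out directive
        try:
            fields = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes; not a line this sketch understands
        if fields[0] == "AddModule" and len(fields) >= 2:
            modules[fields[1]] = fields[2:]
        elif fields[0] == "DefaultModule" and len(fields) >= 2:
            default = fields[1]
    return modules, default
```

For the configuration above, reading <tt>/etc/speech-dispatcher/speechd.conf</tt> and passing its text to <tt>active_modules</tt> should report a single module named <tt>espeak</tt>, which is also the default.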

If you are running speech-dispatcher on an XO your espeak.conf file should already specify that you are using ALSA output. If you are setting up a test environment running Fedora 10 or another distribution that uses pulse-audio you will still want to use ALSA, because that works better with speech-dispatcher than pulse-audio does, and of course that also makes your test environment as much like the XO as possible. To do this you'll need to remove the RPM (or equivalent) for the alsa-to-pulse-audio-bridge. You should be able to remove this without removing pulse-audio entirely. After that speech-dispatcher should work using ALSA with no problems.

<pre>
rpm -e alsa-plugins-pulseaudio
</pre>

To edit the espeak configuration:

<pre>
vi /etc/speech-dispatcher/modules/espeak.conf
</pre>

The file should have lines looking like this:

<pre>
# Chooses between three possible sound output systems:
# "oss" - Open Sound System
# "alsa" - Advanced Linux Sound System
# "nas" - Network Audio System
# "pulse" - PulseAudio
# ALSA is default and recommended. The recent implementations
# support mixing of multiple streams. OSS is only provided
# for compatibility with architectures that do not include ALSA.
# NAS is an audio server with higher level of control over
# your audio stream, with the possibility to stream your audio
# over the network to a different computer and other advanced
# features. (The NAS backend is not very well tested however.)
# PulseAudio is a sound server for POSIX and WIN32 systems.
#

EspeakAudioOutputMethod "alsa"
</pre>

Now start the speech-dispatcher service and test that it works correctly:

<pre>
speech-dispatcher -d
spd-say "Yes this should work"
</pre>
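
The same test can be run from Python by calling <tt>spd-say</tt> as a subprocess. The sketch below assumes that <tt>spd-say</tt> is on the PATH and that speech-dispatcher has already been started as shown above. <tt>spd_say_command</tt> and <tt>speak</tt> are hypothetical helper names, and the <tt>-o</tt> (output module) and <tt>-l</tt> (language) options are those of the <tt>spd-say</tt> client.

```python
import subprocess


def spd_say_command(text, module=None, language=None):
    """Build the argument list for an spd-say call."""
    command = ["spd-say"]
    if module:
        command += ["-o", module]    # select the output module, e.g. "espeak"
    if language:
        command += ["-l", language]  # select the language, e.g. "en"
    command.append(text)
    return command


def speak(text, **options):
    """Run spd-say and return its exit status."""
    return subprocess.call(spd_say_command(text, **options))
```

For example, <tt>speak("Yes this should work", module="espeak")</tt> runs the same check as above while naming the output module explicitly.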

You will need to do this before running any Activity that uses Text to Speech. Read Etexts is an Activity you can use to try out this feature.

=== Voice Files ===

These are some voice samples which give better voice quality than the default ones.
To use these files:
* Create a new file in the eSpeak\espeak-data\voices folder, for example with the name <tt>testvoice</tt>
* Copy any one of the definitions below into that file and save it
* In a terminal, run <tt>espeak -vtestvoice "testing new voice"</tt>

<pre>
name english
language en-uk 2
gender male

pitch 82 117
replace 03 I i
replace 03 I2 i
echo 30 30
formant 0 100 100 150
voicing 200
</pre>

<pre>
name english
language en-uk 2
gender female

pitch 82 100
echo 10 25
formant 0 100 100 150
voicing 100
roughness 1
flutter 1
</pre>
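
The three steps above can also be scripted. The following is a sketch only: <tt>install_voice</tt> is a hypothetical helper, and the location of the voices folder differs between installations, so it is passed in by the caller rather than guessed.

```python
import os


def install_voice(voices_dir, name, definition):
    """Write a voice definition file and return (path, test command)."""
    path = os.path.join(voices_dir, name)
    with open(path, "w") as handle:
        handle.write(definition.strip() + "\n")
    # The command from the last step above, as an argument list.
    return path, ["espeak", "-v" + name, "testing new voice"]
```

The returned command can be handed to <tt>subprocess.call</tt> to hear the new voice once the file is in place.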


=== Codebase ===

=== Team ===

Core Team :

*Assim Deodia
*Cody Lodrige
*[[Hemant_Goyal|Hemant Goyal]]

Mentor : Arjun Sarwal

The team would also like to express their gratitude to Cody Lodrige for his assistance during coding. Specifically, he wrote the ctypes binding to libespeak and created a dbus service for the same.

[[Category:Accessibility]]
[[Category:Speech Synthesis]]

Latest revision as of 15:45, 18 February 2009

Codebase

The code for the project can be accessed in the git repository at Screen Reader GIT.
