<center>''For a list of sounds, see also [[:Category:Audio]].''</center>

{{stub}}
* See also the OLPC XO [[startup sound]] by the Edge.
* We also have a popular collection of free [[sound samples]].

Vorbis, Theora, Speex (and FLAC) are all under the xiph.org umbrella, and all can be carried in <tt>.ogg</tt> files.
== Software overview ==

Starting from user activities and working down...

GStreamer has a Speex decoder, so anything that uses GStreamer, like the Totem plugin in [[Browse]], should play it.
=== Sound playback ===

You play back sound files in [[Browse]]. Vorbis and [[Speex]] are our preferred codecs for general audio and human speech. These compressed representations of audio are stored in files using the [[Ogg]] container file format. Ogg files generally have the extension <tt>.ogg</tt>.

The [[Totem plugin|Totem browser plugin]] handles .ogg files. It uses the [[GStreamer]] multimedia framework to unpack Ogg containers and decode audio and video streams.
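For quick testing outside of an activity, the same kind of pipeline can be run by hand with GStreamer's command-line tool. This is only a sketch, assuming the GStreamer 0.10 tools and the Ogg/Vorbis plugins are installed; <tt>sound.ogg</tt> is a placeholder file name:

 $ gst-launch-0.10 filesrc location=sound.ogg ! oggdemux ! vorbisdec ! audioconvert ! audioresample ! alsasink

Substituting <tt>speexdec</tt> for <tt>vorbisdec</tt> plays a Speex stream the same way.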
=== Sound generation and music activities ===

You create sounds in the [[TamTam]] suite of music and sound activities, or you can try building your own sound/music activities using [[Csndsugui]]. OLPC activities use the [[Csound]] music and audio signal-processing language as their audio "engine". Csound can play Ogg Vorbis files through specialised opcodes; however, it is envisaged that the next version of libsndfile (which Csound uses) will allow more transparent and flexible I/O for this file format.
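For experimenting outside of the activities, a Csound piece can also be run directly from a terminal. This is only a sketch, assuming the olpcsound package with its ALSA real-time module is installed; <tt>mymusic.csd</tt> is a placeholder file name:

 $ csound -odac -+rtaudio=alsa -m0 mymusic.csd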
=== Sound recording ===

The [[Record]] activity also uses the [[GStreamer]] multimedia framework to record video and audio from the XO's camera and microphone.
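A comparable audio-only capture can be sketched from the command line with GStreamer. The element names assume the ALSA and Vorbis plugins are present, and <tt>capture.ogg</tt> is a placeholder:

 $ gst-launch-0.10 alsasrc ! audioconvert ! vorbisenc ! oggmux ! filesink location=capture.ogg

Stop it with Ctrl-C; the resulting file should play back in Browse or any other GStreamer-based player.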
=== Low-level ===

The XO's low-level sound API and kernel drivers are ALSA ([[wikipedia:Advanced Linux Sound Architecture]]). Some useful commands for audio and MIDI status follow. The command

 $ cat /proc/asound/cards

lists your installed sound cards. This is useful if you want to use USB sound cards: if a card was recognised, it will be on the list. To set the volume, mute, or otherwise control the sound device, you can use the command

 $ alsamixer
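The volume can also be changed non-interactively, which is handy from scripts. This is a sketch using ALSA's <tt>amixer</tt>; the control name <tt>Master</tt> is an assumption and may be named differently on the XO, so check what the first command reports:

 $ amixer scontrols
 $ amixer set Master 75% unmute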
=== MIDI ===

For MIDI, you can use

 $ amidi -l

to list existing MIDI devices. A number of USB MIDI controllers are known to work with the XO: Behringer UMX25; Evolution e-keys mini keyboard; M-Audio Trigger Finger, Oxygen 02 and later versions of Oxygen8; in general, any USB MIDI device that is standards-compliant. The amidi command can also be used for sending and receiving MIDI data to and from a device. Similarly, the aconnect command can be used to list and connect MIDI ports.
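For example, if the controller shows up as port <tt>hw:1,0</tt> in the <tt>amidi -l</tt> listing (the port name here is only an illustration), you can send a note-on message and dump incoming data like this:

 $ amidi -p hw:1,0 --send-hex="90 3C 64"
 $ amidi -p hw:1,0 --dump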
By loading the virmidi module, it is possible to connect software MIDI ports together (e.g. from a MIDI data source to Csound for playback):

 $ modprobe snd-virmidi

This will give you four virtual MIDI patchcords, which can be found by listing them as above.
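The resulting sequencer ports can then be wired together with <tt>aconnect</tt>. The client:port numbers below are only illustrative; use the numbers that the listing actually reports:

 $ aconnect -l
 $ aconnect 20:0 24:0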
== Other sound software ==

Some Linux software packages require other sound software, such as [[Jack]] (another audio server) and portaudio (an audio library used to access the audio device).

[[Csound]] can use Jack as of August 2008 (see the next section below), since Jack is currently included in [[joyride]] builds. Although Csound can also use portaudio, this is unlikely to be used, as its direct ALSA support has better performance.

Programs can use the snd-pcm-oss module, which emulates <tt>/dev/dsp</tt>. This is currently not loaded by default, but can easily be loaded with the command

 $ modprobe snd-pcm-oss

If this is working, then it is possible to test it using

 $ cat /dev/dsp > /dev/dsp
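A slightly more useful check, assuming the module loaded, is to capture a short recording from the microphone through the OSS emulation and play it back (<tt>/tmp/test.raw</tt> is a placeholder file name):

 $ dd if=/dev/dsp of=/tmp/test.raw bs=8k count=8
 $ cat /tmp/test.raw > /dev/dsp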
=== Future possibilities ===

Given that ALSA supports only single access to the audio device, it is important that activities and programs share it accordingly. This means, for example, releasing the device when out of focus. With the adoption of an audio server, this limitation would disappear.

We are looking into using the [[PulseAudio]] sound server in the future (release [[9.1.0]] or beyond), which has transparency across the network. [[Csound]] has an IO module to connect to it, which would allow a simpler migration path to this system.
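As a sketch of what that would look like: with a PulseAudio daemon running, Csound would be pointed at it instead of ALSA. This assumes a Csound build that includes the PulseAudio real-time module; <tt>mymusic.csd</tt> is a placeholder:

 $ csound -odac -+rtaudio=pulse -m0 mymusic.csd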
Some tests have been carried out with Jack as a server and Csound as a client. On one terminal, start jack:

 $ jackd -P 99 -d alsa -r 22050

then in a different terminal (or from an activity) run Csound (you will need the latest olpcsound package [http://koji.fedoraproject.org/koji/packageinfo?packageID=6247], > 5.08.92.11), using -+rtaudio=jack and -odac:system:playback_

 $ csound -odac:system:playback_ -m0 -+rtaudio=jack mymusic.csd

It is important that the sample rate used in Csound agrees with the one in Jack (22050 in this case). If your Csound code is not too demanding, you can open another terminal and start another Csound process playing something else; Jack mixes the incoming signals. You can also pump audio from one process to another.
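Routing between processes is done with the Jack connection utilities (part of the Jack example clients). The client and port names below are only assumptions; list the real names first and substitute them:

 $ jack_lsp
 $ jack_connect csound1:output1 csound2:input1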
== Hardware ==

See [[Audio hardware]].
=== Speaker Capabilities ===

(Most data is from the leader of the [[TamTam]] project, [[User:Ethrop|Ethrop]], who might have more data on the subject.)

The speakers in the XO are cellphone-class speakers. They are optimized for voice and have poorer frequency response at the low end of the spectrum.

The XO speakers have a severely biased frequency response. We have recently performed a thorough analysis of the audio response curve of the machine, and there is a spectacular 12 dB peak between 3000 and 4500 Hz, on all models. I suspect these are mobile-phone speakers designed for voice clarity. What this means is that kids will likely crank up the volume so that they can hear some of the lower frequencies. Since the physical size of the speakers prohibits any frequencies below 350 Hz, as they try to get a decent bandwidth they will get "membrane-against-the-casing" distortion (which has the merit of making the kids lower the volume, but risks killing the speakers if done routinely). Someone on the hardware side really should look at the long-term prospects for audio hardware failure and see what correction we can bring, by limiting signal output and/or equalising the output of the AD1888 (we don't know what can be done on-chip...).
The speakers start rolling off at about 600 Hz and are virtually worthless below 400 Hz.

The hardware has a one-pole highpass filter at about 400 Hz (I forget the exact frequency, but it doesn't matter much) in order to reduce the amount of useless low-frequency energy that is presented to the speakers. The rolloff is only in the speaker path; the headphone path has flat response across the audio band.
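One way to hear the rolloff for yourself is ALSA's <tt>speaker-test</tt> tone generator; the two frequencies are just illustrative picks on either side of it:

 $ speaker-test -t sine -f 300 -l 1
 $ speaker-test -t sine -f 1000 -l 1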
In my experience, equalization doesn't improve the sound from the speakers very much. They sound tinny and weak no matter what you do. Taming the big peak in the 4 kHz range is of some value, but most program material has little information in that region, so the perceived improvement is small. Boosting the bass makes things worse: the speakers don't have enough air-moving capacity (cone diameter times linear motion range) to render low frequencies, and sending them more signal just slams the mechanical structure against its physical limits, causing distortion and possible damage.

For listening to podcasts, headphones certainly sound better.
== Random bits ==

=== DTMF ===

For your collective interest, the speakers can reproduce DTMF tones reliably, provided the levels are set down from maximum. At lunch today on a B2 with build-debian, the dtmfdial package was used to transmit tones over a ham radio for making an IRLP request. The DTMF tones include 697 Hz for the top row.
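If the sox package happens to be installed (it is not part of the base build), a single DTMF digit can be approximated by mixing a row tone and a column tone; 697 Hz and 1209 Hz together correspond to the digit "1":

 $ play -n synth 0.5 sine 697 sine 1209 remix -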
=== Overcoming lack of Tonic ===

Music activities should thus default to a bassoon. The odd thing about a bassoon is that the fundamental frequency is nearly absent. The ear-brain system fills in this frequency, making the bassoon sound very low-pitched without actually containing much of the very low frequencies.

At the other extreme, a sine wave is the worst case. Recorders produce this, and flutes nearly do.
== See also ==

*[[Video]]
*[[Game development#Sound]]
*[[GStreamer]]
*[[Programming_the_camera#GStreamer_101]]

[[Category:Software]]
[[Category:Subsystem]]
[[Category:Cleanup]]
[[Category:Audio]]