Localization: Difference between revisions

 
(105 intermediate revisions by 43 users not shown)
Line 1: Line 1:
{{OLPC}}
{{Translations}}{{TOCright}}
{{Translations}}
{{L10n-sl-migration}}
Internationalization technology is the technology for representing and composing the languages spoken, taught or used in your countries. Localization is the process of taking software or content and adapting it for local use.
{{l10n-nav}}


Localization involves fonts, script layout, input methods, speech synthesis, musical instrumentation, collating order, number & date formats, dictionaries, and spelling checkers, among other issues.
'''Localization''' (l10n)* is the process of taking software or content and adapting it for local use. It involves fonts, script layout, input methods, speech synthesis, musical instrumentation, collating order, number & date formats, dictionaries, and spellcheckers.


Localization is the process not just of translation to a local language, but of adapting content to other local requirements, whether of law, culture, or custom. In order to localize software, it must first be internationalized. That is, any assumptions derived from the language, culture, and customs of the developers must be removed. Much content can be written in a neutral international manner, but there are specific items that must be programmed so that the local equivalents can be easily substituted.
Linux is already more widely localized than Microsoft Windows, since localizing it requires no cooperation from a vendor. Even so, cooperation with the free software and content community is vital to reduce the overall work required.


We need translators in many languages, including local scripts and dialects. At the moment, the laptop is 100% English, 98% Spanish, 96% German, 95% French, 65% Japanese, and 56% Portuguese. Many other languages are 5% done at best. Translating is fun, quite easy and the rewards are great: here's how you can get started.
The size of the problem is huge. [http://www.ethnologue.org/ethno_docs/distribution.asp?by=size Ethnologue] has extensive information on the languages of the world.


You don't have to wait until XOs are announced for your country to start localizing GNU/Linux, Sugar, and Activities into your languages. Remember you can create a [[LiveCd|Live CD]] in your language to run on any x86 computer, including Macs. If you support a language well, you also support people learning your language.
: See also [http://en.wikipedia.org/wiki/Internationalization_and_localization Wikipedia's definition]


Programmers in the US and Europe for decades were able to assume 1 character = 1 byte, but this is no longer the case. The most common character encoding today is the variable-length UTF-8 form of Unicode. We cannot assume that money is in US dollars. We cannot assume that people have family names, or area codes, or ZIP codes. Even when people have family names, the family name is not always the last name. And so on.
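As a rough illustration of the point that one character is no longer one byte, the following Python 3 sketch compares the number of characters in a string with the number of bytes in its UTF-8 encoding. The function name and the sample strings are only illustrative.

```python
def utf8_sizes(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    # ASCII letters take one byte each; accented and CJK characters take more.
    for sample in ["Maria", "María", "日本語"]:
        chars, size = utf8_sizes(sample)
        print(sample, "characters:", chars, "bytes:", size)
```

Code that allocates buffers, truncates strings, or limits field lengths should therefore be clear about whether it is counting characters or bytes.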
This page outlines some of the core topics, tools, and issues of localization.


OLPC got caught out on this with the first batch of prototypes delivered to Spanish-speaking students. It turns out that the initial login text field was programmed to accept only 7-bit ASCII, so the children with accents in their names were not able to enter them correctly. This in spite of the presence of 8-bit Latin-1 letters on the keytops.
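The kind of bug described above can be sketched in a few lines of Python 3. This is not the actual login code, which is not shown on this page; both function names are hypothetical, and the second check is only one possible, deliberately permissive, policy.

```python
import unicodedata


def accepts_ascii_only(name):
    """Mimic a text field that only accepts 7-bit ASCII."""
    try:
        name.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def accepts_unicode(name):
    """Accept any non-empty name that contains no control characters."""
    if not name.strip():
        return False
    return not any(unicodedata.category(ch) == "Cc" for ch in name)


if __name__ == "__main__":
    for name in ["Ana", "José", "Ramón"]:
        print(name, accepts_ascii_only(name), accepts_unicode(name))
```

The ASCII-only check rejects names with accents, while the permissive check leaves the decision about what counts as a name to the child typing it.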
''(If you need to localize the keyboard symbols for a laptop issued during the developer-phase of our program, please refer to the instructions found on the [[Customizing NAND images#Keyboard]] page of the wiki.)''

: * l10n and i18n are abbreviations for the terms ''localization'' and ''internationalization'', where 10 or 18 stands for the number of letters between the first and last letters of the term, respectively. i18n was coined at Digital Equipment Corporation in the 1970s or 80s[http://www.w3.org/2001/12/Glossary#I18N].
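The abbreviation rule in the footnote above can be written as a small Python function. The name <tt>numeronym</tt> is just an illustrative choice.

```python
def numeronym(word):
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word  # nothing between the first and last letters to count
    return "{}{}{}".format(word[0], len(word) - 2, word[-1]).lower()


if __name__ == "__main__":
    for term in ["localization", "internationalization"]:
        print(term, "->", numeronym(term))
```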

{{TOCright}}

== Why it matters ==

OLPC's target population of schoolchildren lives in nearly 200 countries, where more than 6,000 languages are spoken. The 100 most common languages would suffice for reaching 99%+ of children in first or second languages, but hundreds more would be needed for education in traditional cultures and languages of large populations. OLPC can play a large part in recording and preserving thousands more languages before they would otherwise disappear forever.

== Getting started ==
You don't need to be a geek or hacker to help translate. You need to know two languages, and be familiar with computers. It helps if you try out Sugar, and it helps if you can think like a child.

To start with, there are user interfaces: [[Localization#Sugar i18n|Sugar]] and other [[XO l10n|XO Activities]].
If you're a bit more technical: before a program can be translated it needs to be prepared by doing [[Python i18n|Python internationalization]] (for sugar or activities).

Read [http://en.wikipedia.org/wiki/Internationalization_and_localization Wikipedia's definition] and [http://www.ethnologue.org/ethno_docs/distribution.asp?by=size Ethnologue's language list]

In addition to reading this page, you can join the [http://lists.laptop.org/listinfo/localization localization mailing list] and find us on the <tt>#olpc-content</tt> [[IRC]] channel and the Library [[Mailing lists|mailing list]].

To start a new language project on Pootle, the person volunteering to be the project administrator should first be registered on

* This Wiki, preferably with a User: page
* The [http://wiki.sugarlabs.org/ Sugar Labs Wiki], also with a User: page
* The Sugar Labs [http://translate.sugarlabs.org/ Pootle localization server]
* The Sugar Labs [http://bugs.sugarlabs.org/ Trac] bug tracker
* The OLPC Localization [http://lists.laptop.org/ mailing list]
* The Sugar Labs It's An Education Project (IAEP) [http://lists.sugarlabs.org mailing list]
* The OLPC Pootle Wiki page [http://wiki.laptop.org/go/Pootle#Sign-up sign-up section].

Localizers should also do the same once a project for their language has been started.

Anyone working on Localization for Etoys should subscribe to the [http://lists.laptop.org/listinfo/etoys Etoys mailing list].

Then the administrator can open a [http://dev.laptop.org/newticket ticket] on Trac and provide the following information:

* Language and country in the ticket title
* Component: Localization
* Who else is volunteering
* Data on the language
* Why this project is starting, which may be that shipments to that community are being scheduled, or just that the community wants it for its own use.

Finally, for script experts, there are [[:Category:Keyboards|keyboards]] and guides for customizing your own [[Customizing NAND images#Keyboard|keyboard language]] if it is not already there. (NB: on recent developer (Joyride) builds you can use the [[Sugar_Control_Panel|sugar-control-panel]] to set language. Otherwise, use setxkbmap at the command line in the Terminal.)

===Recruiting localizers===

For any given target language in wide use, there are likely to be a variety of organized groups interested in helping to get software into their languages. The same goes for [[Content]]. Here are some of the places to start.

* [http://translate.fedoraproject.org/languages/ Fedora Languages]
* [http://mdk.jack.kiev.ua/stats/gui/trunk/team/ Mandriva Localization].
* Ministries of Education
* Local OLPC groups, which you can find through the Grassroots [http://lists.laptop.org/ mailing list] and on the Wiki.
* [[GNU/Linux User Groups]]
* [[NGOs]]
* Churches, mosques, synagogues, temples...
* Teachers organizations in the country and [http://www.teacherswithoutborders.org/ Teachers Without Borders]
* Colleges and universities
* Diaspora groups around the world
* Language students and teachers elsewhere

You should also talk to [[User:Mokurai]], who is recruiting language project administrators and localizers.

In addition to localizing material ourselves, we want to find dictionaries, repositories of literature and other content in the language, sources for textbooks, and so on.

Let us pick a language, say [[Khmer]] for [[Cambodia]], which had more than 10,000 XOs committed through [[G1G1]], but no [[Pootle]] project as of 2008-2-24, and see what we can find. Ethnologue, Google and Wikipedia are your friends, as are the social networks, but first things first. So [[User:Mokurai]] created a [http://dev.laptop.org/ticket/6565 ticket for Khmer], following the instructions above, and [[User:Sayamindu|Sayamindu]] expeditiously created the [http://dev.laptop.org/translate/km Khmer project] on [[Pootle]].

* [http://www.ethnologue.org/show_language.asp?code=khm The Ethnologue entry for Khmer] says that there are about 13 million speakers of Central Khmer, and lists several countries with large Cambodian immigrant populations. "Also spoken in Canada, China, France, Laos, USA, Viet Nam."
* Ethnologue also gives a link to an [http://www.ethnologue.org/show_work.asp?id=21417 English-Khmer medical dictionary].
* Google finds about 277,000 hits on a search for '''khmer dictionary'''. There is even a [http://www.khmeros.info/drupal/?q=en/node/2164 Khmer computer dictionary].
* The Cambodian [http://www.moeys.gov.kh/ Ministry of Education, Youth and Sport] has a plan in development for computers in schools, centered on [http://www.khmeros.info/ KhmerOS]. The Web site is in Khmer, but unfortunately not in [[Unicode]].
* Although there is an [[OLPC Cambodia]] page, no Cambodians are active on it.
* A search for GNU/Linux User Groups in Cambodia turns up [http://www.forum.org.kh/ Open Forum of Cambodia], "Building Cambodia through Information Technology", and the [http://www.khmeros.info/ KhmerOS] project to create a version of SUSE Linux localized into Cambodian. We can mine their localization for ours, and invite their people to work with us. And of course, whatever we contribute upstream in Khmer will be available to them, or we can contribute to KhmerOS directly.
* The place to look for NGOs is [http://www.wiserearth.org/ Wiser Earth], which lists well over 100,000 NGOs worldwide for every purpose. This is left as an exercise for the reader.
* A search on [http://www.linkedin.com/ LinkedIn] turns up more than 500 people with links to Cambodia, including a number of Cambodians. LinkedIn lets you post questions to your network, so we can ask for help with our Khmer project.
* Most Cambodians are Theravada Buddhists, although evangelical Christians active in refugee and reconstruction work are making converts. The Buddhist scriptures in Pali language, Khmer script, and Unicode encoding are available on [http://www.vri.dhamma.org/publications/tpupdate.html CD-ROM] and [http://www.tipitaka.org/khmr/ online]. There are Cambodian Buddhist organizations in the US, such as [http://www.wattkhmer.org/ WattKhmer] — San Jose <nowiki>[CA]</nowiki> Cambodian Buddist Society, Inc.
* A cursory search did not turn up any Web sites for teachers based in Cambodia, but the [http://www.teachersacrossborders.org/Cambodia2008.htm Teachers Across Borders Cambodia Project] has the information.
* [http://www.culturalprofiles.net/Cambodia/Directories/Cambodia_Cultural_Profile/-36.html CulturalProfiles.net] has a summary article on education in Cambodia.

Royal University of Fine Arts (reopened 1980), the Institute
of Technology of Cambodia (1981, formerly the Higher Technical
Institute of Khmer-Soviet Friendship), the Royal University
of Agriculture (1984, formerly the Institute of Agricultural
Engineering), the Royal University of Phnom Penh (1988-1996,
now incorporating Faculties of Pedagogy, Law and Economic
Sciences, Medicine, Pharmacy and Dentistry and Business) and
the Vedic Maharashi Royal University in Prey Veng Province
(1993). In 1995 the Royal School of Administration was
re-established under the control of the Council of Ministers.
Cambodia still has a low participation rate in higher education,
with just 1.2 per cent of the population enrolled, compared
with an average of 20.7 per cent in all the ASEAN countries.

* There are a million or so Cambodians outside Cambodia, mostly refugees from the Khmer Rouge regime. [http://www.searac.org/cambref.html Southeast Asian Refugee Action Council] says, "The largest communities of Cambodian refugees are located in Long Beach, California, and in Lowell, Massachusetts. Sizeable communities also exist in Washington and several other states." SEARAC also provides links to [http://www.searac.org/stats.html statistical data] from other organizations and to a large number of [http://www.searac.org/resource.html NGOs].
* The Khmer language is taught in a few universities and in government training institutions for military and diplomatic purposes. [http://www.khmerstudies.org/ Center for Khmer Studies] has several directories. (Use the [http://www.khmerstudies.org/site%20map.htm site map] for navigation. It is impossible to find many of their resources through the menus and links otherwise.)

Well. That's just a start, but you can see that a moderate amount of work can give you plenty to begin with. More than you can handle, in fact. The next step, therefore, is to contact organizations that have prior contacts with others in the community, so that they can put the word out and invite people to join us. Tell them about, and invite them to participate in,

* The software localization program, described on this page, and the [[activities]] to be localized.
* The laptop project's goals http://laptop.org/
* The program for the target countries where the language is spoken. For example, [[OLPC Cambodia]], [[Khmer]], and the localization project on [http://dev.laptop.org/translate/km Pootle] (You did get one created, didn't you?)
* The content [[Translators|translation program]], for which there is currently no organization.
* [[Translating]] this Wiki
* The [[Communication channels]] page, with links to
** [http://lists.laptop.org/ mailing lists], including [http://lists.laptop.org/listinfo/devel devel] for localization and [http://lists.laptop.org/listinfo/library library] for content.
** The [http://wiki.laptop.org/go/IRC#IRC IRC channels] on Freenode, including olpc-devel for localization and olpc-library for content.
** [http://wiki.laptop.org/go/IRC#Forums Forums], or possibly Fora.
** [http://wiki.laptop.org/go/IRC#Blogs Blogs]

Make a page for the country, if necessary, and the language, if necessary, and get someone to keep them updated. Make pages for the groups you recruit to the cause, and get them to fill in more information about themselves. Let us know how you are doing via the mailing lists.

Then have at it, and remember to prod people to invite more people from time to time, for this and other education projects.

Now we just need to make a [[Content templates |template]] for all of this, gather the information, and send out invitations for all of the languages in Pootle:

[[Afrikaans]], [[Amharic]], [[Arabic]], [[Aymara]], [[Basque]], [[Bengali]], [[Bengali (India)]], [[Bulgarian]], [[Catalan]], [[Chinese (China)]], [[Chinese (Hong Kong)]], [[Chinese (Taiwan)]], [[Czech]], [[Danish]], [[Dari]], [[Dutch]], [[Dzongkha]], [[English]], [[English (South African)]], [[English (US)]], [[Finnish]], [[French]], [[Friulian]], [[Fula]], [[Galician]], [[Georgian]], [[German]], [[Greek]], [[Gujarati]], [[Hausa]], [[Hindi]], [[Icelandic]], [[Igbo]], [[Italian]], [[Japanese]], [[Khmer]], [[Kinyarwanda]], [[Korean]], [[Kreyol]], [[Macedonian]], [[Malayalam]], [[Maltese]], [[Marathi]], [[Mongolian]], [[Nepali]], [[Pashto]], [[Persian]], [[Polish]], [[Portuguese]], [[Portuguese (Brazil)]], [[Punjabi]], [[Quechua]], [[Romanian]], [[Russian]], [[Serbian]], [[Sinhala]], [[Slovenian]], [[Sotho]], [[Spanish]], [[Swedish]], [[Tamil]], [[Telugu]], [[Thai]], [[Turkish]], [[Ukrainian]], [[Urdu]], [[Vietnamese]], [[Wolof]], [[Yoruba]]

and a few others that we know we will need for current target countries, such as Mongolian (Traditional), Hazaragi and Aimaq for Afghanistan, Tigrinya for Ethiopia, and so on, and then the principal languages of any further countries that buy in or receive large donations.

== Internationalization (i18n) ==

Preparing software so that it can be localized

To help others localize bundles and code efficiently, the bundles and code need to be prepared so that anything which might need localization (strings, images, sounds) is separated out and organized for translators and localizers. This is '''internationalization''' (or <tt>i18n</tt>).

There are specific scripts and tools that help represent and compose the languages spoken, taught or used in various countries: these are internationalization tools.

Issues:

* Cultural and national neutrality
* Unicode
* Writing directions
* Stretching and shrinking of text
* Locales: Formats for numbers, times, dates, currency, names, addresses, phone numbers
* File names
* Grammar issues: gender, number, phrasing
* Punctuation
* Style and usage
* Switching languages in activities
* Mixing languages in documents
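One of the grammar issues listed above, number, has direct support in Python's standard <tt>gettext</tt> module through <tt>ngettext</tt>. The sketch below is only an illustration: it uses a <tt>NullTranslations</tt> object, which has no catalog and so falls back to the English singular/plural rule, and the message text and function name are made up for the example. With a real catalog, the plural rule would come from the language's .po header instead.

```python
import gettext

# No catalog is loaded here, so ngettext() applies the English rule:
# the singular form when n == 1, the plural form otherwise.
translations = gettext.NullTranslations()


def describe_count(n):
    """Return a count phrase whose plural form is chosen by ngettext."""
    template = translations.ngettext("%(n)d activity", "%(n)d activities", n)
    return template % {"n": n}


if __name__ == "__main__":
    for n in (0, 1, 2):
        print(describe_count(n))
```

Building the phrase by hand, for example by appending "s", would hard-code English grammar into the program.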


== Translation and Pootle ==
=== Sugar and core activities ===

The basic procedure to translate activities is to [[Pootle#Sign-up|'''sign up''']],
enter the https://translate.sugarlabs.org [[Pootle]] server and work in the available projects. Back in 2007 these included:
* [https://dev.laptop.org/translate/projects/xo_core/ XO-Core] &mdash; activities or components that are central to XO
* [https://dev.laptop.org/translate/projects/xo_bundled/ XO-Bundled] &mdash; activities that are currently being bundled or included in the builds
* [https://dev.laptop.org/translate/projects/packaging/ Packaging] &mdash; other material that needs to be localized
* [https://dev.laptop.org/translate/projects/terminology/ Terminology] &mdash; support translation glossary

[[Translators]] basically have two ways to participate:
* [[Pootle#Opportunistic translator|suggest translations]] &mdash; intended for the casual translator (e.g., a typo fixer or reporter), or
* [[Pootle#Registered translator|make translations]] &mdash; for those translators that are willing to [[Pootle#Register as a translator|register]] as such.
Other, more [[Pootle#Advanced User Scenarios|committed roles]] are possible, including the ability to make off-line translations with whatever tools you are used to, but that needs to be coordinated with the people in charge.

If you are not already subscribed to <tt>localization@lists.laptop.org</tt> we encourage you to do so.

To add a new language, please file a request in the [http://dev.laptop.org trac system] under the component localization.

See also:
* For more detailed information on the functionality of the translation server and its usage, '''see [[Pootle]]'''.
* For a list of the language teams / administrators '''see [[Pootle#Sign-up]]'''.

==[[Languages]] of [[G1G1]] Target [[Countries]]==

* [[Haiti]]: [[Kreyol Ayisyen]], [[French]]
* [[Rwanda]]: [[Kinyarwanda]], [[French]]
* [[Ethiopia]]: [[Amharic]], [[Tigrinya]], Oromo, Sidamo, Somali
* [[Cambodia]]: [[Khmer]]
* [[Afghanistan]]: [[Dari]] (Eastern [[Farsi]]), [[Pashto]], possibly [[Hazaragi]] and [[Aimaq]]
* [[Uruguay]]: [[Spanish]]

==Other Languages==

Translation projects have begun in Afrikaans, Amharic, Arabic, Aymara, Basque, Bengali, Catalan, Chinese (China), Chinese (Hong Kong), Chinese (Taiwan), Czech, Danish, Dutch, English, English (South African), English (US), Finnish, Friulian, Galician, Georgian, German, Greek, Hausa, Hindi, Icelandic, Igbo, Italian, Japanese, Korean, Macedonian, Maori, Maltese, Nepali, Persian, Polish, Portuguese, Portuguese (Brazil), Quechua, Romanian, Russian, Samoan, Serbian, Slovenian, Sotho, Swedish, Thai, Tongan, Turkish, Ukrainian, Urdu, Vietnamese, Wolof, Yoruba, and pseudo L10n.

==Support for Language Learning==

Having an alternate GUI language on an XO is an excellent way to get used to a language, particularly if you can refer to Pootle or to a printout as a bilingual reference. Or even two XOs side by side in the two languages. Making a language part of your daily routine imprints it on your brain in a way that no amount of class time or formal practice can do.

Having friends to talk to in the language is of course the best way of all to learn it, and look! the XO lets you do that, too, all over the world.

==[[Localization]] process==

*po files
*[[POT]] file template
*builds
*Pootle repository for OLPC localization. This was at dev.laptop.org/translate, but moved to http://translate.sugarlabs.org in early 2009.
**[http://translate.sugarlabs.org Localization Web site]
**[http://translate.sugarlabs.org/doc/en/howto.html HowTo]
*community
** [http://lists.laptop.org/listinfo/localization Localization mailing list]
** #olpc-pootle IRC channel
*Localizing your own software
**[[Python i18n]]
**[[Etoys]] projects
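To get a feel for what a .po file contains, here is a minimal Python 3 sketch that counts how many entries in a .po text have a translation. It is not how Pootle computes its statistics; it is a simplification that only understands single-line <tt>msgid</tt>/<tt>msgstr</tt> pairs and ignores plural forms, contexts and fuzzy flags. The function name and sample text are illustrative.

```python
import re

# Matches a one-line msgid immediately followed by a one-line msgstr.
PAIR = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def translation_stats(po_text):
    """Return (translated, total) for single-line msgid/msgstr pairs."""
    pairs = PAIR.findall(po_text)
    # The header entry has an empty msgid and is not a translatable string.
    entries = [(msgid, msgstr) for msgid, msgstr in pairs if msgid]
    translated = sum(1 for _, msgstr in entries if msgstr)
    return translated, len(entries)


SAMPLE = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=utf-8\\n"

msgid "Begin!"
msgstr "Kuanza!"

msgid "Stop"
msgstr ""
'''

if __name__ == "__main__":
    done, total = translation_stats(SAMPLE)
    print("%d of %d strings translated" % (done, total))
```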

=== Keyboarding in your language ===

What good is seeing the interface in a particular language if your keyboard is in another?

* Use My Settings to set keyboard preferences.

* Terminal commands for keyboard used with specific languages are at [[Keyboard layouts]].

* For more technical details, see [[Customizing NAND images#Keyboard]] on how to configure the keyboard.

=== Testing your localization ===
Translating is a pleasure when you can check the results right after you finish. See [[Localization/Testing]] for how to test your localization on a virtual XO.


== Basic Localization Topics ==
Line 27: Line 239:
=== Script Layout ===


OLPC uses the [http://www.pango.org/ Pango library], which can lay out most of the “hard” languages, including Arabic, the Indic languages, Hebrew, Persian, and Thai. It has a modular, pluggable layout engine and supports both vertical text and bi-directional layout. Some issues remain, but Pango can already handle most scripts; where it cannot, modules can be built to handle new scripts as documented in [http://developer.gnome.org/doc/API/2.0/pango/ Pango's reference manual].
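To show what bi-directional layout has to decide, the Python 3 sketch below guesses the base direction of a string from its first strongly directional character, using the standard <tt>unicodedata</tt> module. This is a simplification of one rule from the Unicode Bidirectional Algorithm and is not Pango's implementation; the function name is hypothetical.

```python
import unicodedata


def base_direction(text):
    """Guess 'rtl' or 'ltr' from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew-style and Arabic-style letters
            return "rtl"
        if bidi == "L":           # Latin, Cyrillic, Thai, and so on
            return "ltr"
    return "ltr"  # no strong character found; default to left-to-right


if __name__ == "__main__":
    for sample in ["hello", "שלום", "مرحبا", "123 abc"]:
        print(repr(sample), base_direction(sample))
```

A real layout engine also has to handle mixed-direction runs, numbers, and punctuation inside a single line, which is why a library such as Pango is used rather than ad hoc code.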


: See also: [[:Category:Languages (international)]]
Line 37: Line 249:
The formats of fonts supported on Linux include [http://en.wikipedia.org/wiki/OpenType OpenType], [http://en.wikipedia.org/wiki/TrueType TrueType] and many others: see [http://www.freetype.org/ Freetype] for details. Most of the font formats supported by Freetype are obsolete, and by far the best results on the screen will be had from OpenType and TrueType format fonts, particularly if they are hinted well. [http://en.wikipedia.org/wiki/Type_1_font Type 1 fonts] are useful primarily for printing; the Type 1 renderer we have in Freetype today is not very good, and Type 1 does not support programmatic hinting for low resolution screens.


The OLPC XO-1 has a high resolution screen. High resolution helps OLPC considerably, particularly in grayscale mode at 200 DPI. [http://en.wikipedia.org/wiki/Free_software_Unicode_fonts Wikipedia], as usual, is a starting point for free fonts. "Font foundries" are companies that will contract to produce fonts.


: See also: [[:Category:Fonts]], [[Fonts]], [[OLPC Human Interface Guidelines/The Sugar Interface/Text and Fonts|HIG-The Sugar Interface/Text and Fonts]]
Line 75: Line 287:
=== Speech Synthesis ===


Speech synthesis involves complex tradeoffs among synthesizer size, fidelity, and the effort needed to localize a new language. The [http://en.wikipedia.org/wiki/Speech_synthesis Wikipedia speech synthesis] article discusses software that is available, which includes [http://www.cstr.ed.ac.uk/projects/festival/ festival], [http://www.speech.cs.cmu.edu/flite/ flite], and [http://espeak.sourceforge.net/ espeak].
See [[Speech synthesis]].

[http://sourceforge.net/projects/espeak/ Espeak] is small enough that we can often bundle it, and it covers quite a few languages: about 10 are currently supported, tuned by native speakers. Localization to ten more languages is underway.

Synthesis is essential for making content accessible to people with vision problems, and is also useful for literacy training and other uses as part of a GUI. It will need to be integrated with the [http://developer.gnome.org/projects/gap/ ATK library] that we use. Full localization therefore involves selecting a suitable synthesis system, integrating it into the ATK framework, and localizing that system for the particular language involved.

Speech synthesis is usually not a good guide for pronunciation – but it may be better than a poor teacher who has never had the opportunity to learn from a native speaker of that language.


: See also [[:Category:Accessibility]]
Line 95: Line 302:
: See also [[TamTam: Sounds]]


=== Dictionaries and Spellcheckers ===


There is existing support for most major languages.
Line 106: Line 313:
* [http://wiki.services.openoffice.org/wiki/Dictionaries Open Office]


Of these, the first three are most immediately interesting to OLPC: we use versions of these codebases as part of the Sugar environment.


=== Character Recognition ===
Line 148: Line 355:
== Next Steps ==


Localization is by nature local, but languages often cross borders. Please contact [[User:Jg|Jim Gettys]] and [[User:Mokurai|Mokurai]] to identify issues.


We need to identify people/organizations responsible for language, translation, keyboards, and speech synthesis, as well as effective free software community leaders to help with local deployment and "on the ground" knowledge.


=== Sugar Localization ===


Sugar and Sugar applications use standard .po files, and can be localized using the usual [[#Tools|tools]]. [[Sugar_18n]] goes into the details of the localization process.


=== General GNU/Linux Localization ===


By looking at the [http://www.gnome.org/i18n/ GNOME], [http://www.mozilla.org/projects/l10n/mlp.html Mozilla], [http://contributing.openoffice.org/native-lang.html OpenOffice], and [http://l10n.kde.org/ KDE] projects, you can get plugged into translating other GNU/Linux software of general interest.


=== Localization of [[Python]] ===


See [[Python i18n]] for details and a step-by-step example.
Following the wxpython tutorial below, I added the following code at the top of my application:

<pre>
import os
import gettext

# get_bundle_path() is assumed to come from the Sugar activity API
from sugar.activity.activity import get_bundle_path

# make _() available everywhere (Python 2's unicode=False was the default)
gettext.install('kuku', './locale')

# one line for each language
presLan_en = gettext.translation("kuku", os.path.join(get_bundle_path(), 'locale'), languages=['en'])
presLan_sw = gettext.translation("kuku", os.path.join(get_bundle_path(), 'locale'), languages=['sw'])

# only install one language - add program logic later
presLan_en.install()
# presLan_sw.install()
</pre>

Here my application is called kuku.py, and I am using 'kuku' as the domain of my i18n. Next I chose which strings needed to be localized within kuku.py and surrounded each of them with _(). For example:

<pre>
message = _('Begin!')
</pre>
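When a translatable string contains variable parts, a common practice is to keep the whole sentence inside _() with named placeholders, so that translators can reorder the parts to suit their language. The sketch below is an illustration only: the message and function name are made up, and _ is bound to a <tt>NullTranslations</tt> object so that the example runs without any catalog.

```python
import gettext

# No catalog is loaded: _() simply returns the original string.
_ = gettext.NullTranslations().gettext


def level_message(name, level):
    """Format a translatable sentence using named placeholders."""
    # The whole sentence is one msgid; translators may move the placeholders.
    return _("Welcome, %(name)s! You are on level %(level)d.") % {
        "name": name,
        "level": level,
    }


if __name__ == "__main__":
    print(level_message("Ana", 3))
```

Concatenating separately translated fragments such as _("Welcome, ") + name tends to break in languages with a different word order.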

Next I need to create the i18n files. First I create a directory called 'locale' within my activity directory (this is the directory referred to in the presLan_en and presLan_sw lines above). Then I make a pot file by running pygettext.py on kuku.py:

<pre>
python <path to your python distribution>/Tools/i18n/pygettext.py -o kuku.pot kuku.py
</pre>

which creates kuku.pot. When first created it looks like

<pre>
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-06-19 17:45+EDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: ENCODING\n"
"Generated-By: pygettext.py 1.5\n"


#: kuku.py:501
msgid "Begin!"
msgstr ""
</pre>

The last little bit is the stuff we have to translate. I had to modify the header at the top to change the ENCODING and CHARSET. I changed both of these to utf-8 (GNU gettext tools conventionally use 8bit for Content-Transfer-Encoding), so my file now reads:

<pre>
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-06-19 17:15+EDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: utf-8\n"
"Generated-By: pygettext.py 1.5\n"


#: kuku.py:500
msgid "Begin!"
msgstr ""
</pre>

Now I moved kuku.pot to ./locale. Then, for each language I want to localize to, I create a subdirectory within ./locale named after its [http://www.w3.org/WAI/ER/IG/ert/iso639.htm language code]. Within each of these subdirectories, I create a subdirectory called LC_MESSAGES. For now I am using English and Swahili, so my directory structure looks like

<pre>
locale/
  kuku.pot
  en/
    LC_MESSAGES/
  sw/
    LC_MESSAGES/
</pre>

Now we do translations. I copied kuku.pot into ./locale/en/LC_MESSAGES/kuku.po and ./locale/sw/LC_MESSAGES/kuku.po, and performed the translations:

<pre>
#./locale/en/LC_MESSAGES/kuku.po
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-06-19 17:15+EDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: utf-8\n"
"Generated-By: pygettext.py 1.5\n"


#: kuku.py:500
msgid "Begin!"
msgstr "Begin!"
</pre>

<pre>
#./locale/sw/LC_MESSAGES/kuku.po
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-06-19 17:15+EDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: utf-8\n"
"Generated-By: pygettext.py 1.5\n"


#: kuku.py:500
msgid "Begin!"
msgstr "Kuanza!"
</pre>

Now my directory structure looks like

<pre>
locale/
  kuku.pot
  en/
    LC_MESSAGES/
      kuku.po
  sw/
    LC_MESSAGES/
      kuku.po
</pre>
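The directory creation and the copying of kuku.pot can also be scripted. The following is a minimal sketch, assuming the layout shown above; the helper name <code>seed_catalogs</code> is hypothetical, and the demonstration runs in a throwaway temporary directory with a stand-in template rather than the real kuku.pot.

```python
import shutil
import tempfile
from pathlib import Path


def seed_catalogs(locale_dir, domain, languages):
    """Create <lang>/LC_MESSAGES/ and copy <domain>.pot to <domain>.po in each."""
    locale_dir = Path(locale_dir)
    template = locale_dir / f"{domain}.pot"
    seeded = []
    for lang in languages:
        target_dir = locale_dir / lang / "LC_MESSAGES"
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f"{domain}.po"
        if not target.exists():  # never overwrite existing translations
            shutil.copyfile(template, target)
        seeded.append(target)
    return seeded


# Demonstration in a temporary directory with a one-entry template.
demo_root = Path(tempfile.mkdtemp()) / "locale"
demo_root.mkdir()
(demo_root / "kuku.pot").write_text('msgid "Begin!"\nmsgstr ""\n', encoding="utf-8")
po_files = seed_catalogs(demo_root, "kuku", ["en", "sw"])
print([p.relative_to(demo_root).as_posix() for p in po_files])
```

The translations themselves still have to be typed into each kuku.po by hand, as shown above.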

One last step before we are ready to go. We need to make the binary files used by gettext. We do that with msgfmt.py:

<pre>
cd <project path>/locale/en/LC_MESSAGES/
python <path to your python distribution>/Tools/i18n/msgfmt.py kuku.po
cd <project path>/locale/sw/LC_MESSAGES/
python <path to your python distribution>/Tools/i18n/msgfmt.py kuku.po
</pre>

This creates binary .mo files, and now my directory structure looks like:

<pre>
locale/
  kuku.pot
  en/
    LC_MESSAGES/
      kuku.po
      kuku.mo
  sw/
    LC_MESSAGES/
      kuku.po
      kuku.mo
</pre>
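To give a feel for what a .mo file contains, here is a rough sketch that packs a small dictionary into the binary layout and reads it back with Python's own <code>gettext.GNUTranslations</code>. It is an illustration only, not a replacement for msgfmt.py: the function name <code>build_mo</code> is made up, and it handles only simple singular messages (no plurals, no contexts, no hash table).

```python
import gettext
import io
import struct


def build_mo(messages):
    """Serialize {msgid: msgstr} into the little-endian GNU .mo layout."""
    # The empty msgid carries the header; its charset tells readers how to decode.
    catalog = {"": "Content-Type: text/plain; charset=utf-8\n"}
    catalog.update(messages)
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in catalog.items())

    count = len(items)
    key_table_at = 7 * 4                      # right after the 7-word header
    value_table_at = key_table_at + 8 * count
    data_at = value_table_at + 8 * count      # string data follows both tables

    key_table = value_table = data = b""
    for msgid, _msgstr in items:
        key_table += struct.pack("<II", len(msgid), data_at + len(data))
        data += msgid + b"\0"
    for _msgid, msgstr in items:
        value_table += struct.pack("<II", len(msgstr), data_at + len(data))
        data += msgstr + b"\0"

    # magic, version, entry count, table offsets, then an empty hash table
    header = struct.pack("<7I", 0x950412DE, 0, count, key_table_at, value_table_at, 0, 0)
    return header + key_table + value_table + data


mo_bytes = build_mo({"Begin!": "Kuanza!"})
sw = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(sw.gettext("Begin!"))
```

In practice you would keep using msgfmt.py as described above; the sketch only shows that a .mo file is a compact lookup table from each msgid to its msgstr.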

To add new languages, we need to add a subdirectory for each language, perform the translations, create the .mo files, and add the relevant code in the application to select the language.

==== Resources ====
These are the two docs that I used to learn about i18n (with no prior knowledge). Read the WxPython reference first, and instead of using the mki18n.py file mentioned on the WxPython page, use the tools in the standard Python distribution: pygettext.py and msgfmt.py.

[http://docs.python.org/lib/node738.html Python Reference]

[http://wiki.wxpython.org/Internationalization WxPython i18n]


== Current l10n projects ==


* [[Localization/Library|Library strings]] -- header and descriptive strings for an [http://dev.laptop.org/pub/content/Library/ OLPC sample library]. Includes some ''PO-like'' strings for the following sections:
** [[Localization/Library/sidebar po|sidebar]] &mdash; en | es | ko | pt
** [[Localization/Library/sidebar po|sidebar]] &mdash; en | es | ko | pt | ar
** [[Localization/Library/biology po|biology]] &mdash; en | es | ko &mdash; '''to review:''' pt
** [[Localization/Library/books po|books]] &mdash; en | es | ko &mdash; '''''wanted:''''' pt
=== activities ===


* http://translate.sugarlabs.org shows percent completed for each project for each language.
Add / include links to upstream localization where appropriate.
* See (obsolete?) [[:Category:POT wanted]]
* [[Localization/Library/camera po|camera]] &mdash; en | es | ko | pt | zh-CN
* web?
* read?
* write?
* blockparty?


=== games ===


* [[Kuku]]


=== DansGuardian ===
; See also : [[Translators]] & [[Translating]] for the localization of this wiki.


The [[School_server|school server]] uses [http://dansguardian.org DansGuardian] for web filtering. DansGuardian accomplishes filtering using keyword lists. These lists need to be translated.
== i18n & l10n ==


==See Also==
The following table focuses on the languages present in the countries that currently have 'green status' ({{Status green countries}}). Countries with another status may benefit from the work on the 'green languages' and will add their own sets of languages. Each language must be fully supported for the [[Localization]] effort.


* [[Translators]] & [[Translating]] for the [[localization]] of this wiki.
{| border="1" cellspacing="0"
* [[Languages]] for information about them and how they relate to each country and the [[localization]] effort.
* [[Olpc-utils]] and [[XO_l10n]], which describe various utilities for localization and customization on the XO.
* [[Customizing NAND images]], which describes additional customization features.
* [[Reverse Localization]], which has links to Google translation Gadget in many languages to suggest improving information flow from non-OLPC web-pages about OLPC efforts between wider language communities.


|-
! Language !! Green Countries !! Red Countries !! Orange


[[Category:Countries| ]]
|- valign="top"
[[Category:Language support]]
| [[Arabic]]
[[Category:Languages (international)]]
| [[OLPC Libya|Libya]]
[[category:localization]]
|
[[Category:Subsystems]]
| <font size="-1">Bahrain, [[OLPC Egypt|Egypt]], Iraq ([http://en.wikipedia.org/wiki/Irak +]), [[OLPC Israel|Israel]] ([http://en.wikipedia.org/wiki/Israel +]), Jordan, Kuwait, Lebanon ([http://en.wikipedia.org/wiki/Lebanon#Languages +]), Morocco, Oman, Palestine, Saudi Arabia, Sudan ([http://en.wikipedia.org/wiki/Sudan#Official_languages +]), Syria ([http://en.wikipedia.org/wiki/Syria#Languages +]), Tunisia, Yemen</font>


Why is localization important?
|- valign="top"
| [[English]]
| [[OLPC Nigeria|Nigeria]],<br>[[OLPC Rwanda|Rwanda]],<br>[[OLPC USA|USA]] ([http://en.wikipedia.org/wiki/USA#Languages +])
| <font size="-1">Belize ([http://en.wikipedia.org/wiki/Belize +]), [[OLPC Pakistan|Pakistan]] ([http://en.wikipedia.org/wiki/Pakistan +]), [[OLPC Philippines|Philippines]] ([http://en.wikipedia.org/wiki/Philippines#Languages +])</font>
| <font size="-1">Canada ([http://en.wikipedia.org/wiki/Canada +]), Gambia, Guyana, [[OLPC India|India]] ([http://en.wikipedia.org/wiki/India +]), [[OLPC Kenya|Kenya]] ([http://en.wikipedia.org/wiki/Kenya +]), Mauritius ([http://en.wikipedia.org/wiki/Mauritius +]), Namibia ([http://en.wikipedia.org/wiki/Namibia +]), Saint Kitts and Nevis, Sierra Leone, Singapore ([http://en.wikipedia.org/wiki/Singapore#Languages +]), [[OLPC South Africa|South Africa]] ([http://en.wikipedia.org/wiki/South_Africa#Languages +]), St. Lucia, Trinidad and Tobago, Uganda ([http://en.wikipedia.org/wiki/Uganda +]), Zimbabwe ([http://en.wikipedia.org/wiki/Zimbabwe#Language +])</font>


Because.
|- valign="top"
| [[French]]
| [[OLPC Rwanda|Rwanda]]
| <font size="-1">Haiti ([http://en.wikipedia.org/wiki/Haitian_Creole_language +])</font>
| <font size="-1">[[OLPC Benin|Benin]], Cameroon ([http://en.wikipedia.org/wiki/Cameroon +]), Democratic Republic of the Congo ([http://en.wikipedia.org/wiki/Democratic_Republic_of_the_Congo#Languages +]), Gabon, Mali, Niger, Senegal, St. Martin ([http://en.wikipedia.org/wiki/St._Martin +]), Togo</font>

|- valign="top"
| [[Hausa]]
| [[OLPC Nigeria|Nigeria]]

|- valign="top"
| [[Igbo]]
| [[OLPC Nigeria|Nigeria]]

|- valign="top"
| [[Kinyarwanda]]
| [[OLPC Rwanda|Rwanda]]

|- valign="top"
| [[Portuguese]]
| [[OLPC Brazil|Brazil]]
| <font size="-1">Angola</font>
| <font size="-1">Mozambique, Portugal, São Tomé and Príncipe</font>

|- valign="top"
| [[Spanish]]
| [[OLPC Argentina|Argentina]],<br>[[OLPC Peru|Peru]] ([http://en.wikipedia.org/wiki/Peru +]),<br>[[OLPC Uruguay|Uruguay]],<br>[[OLPC USA|USA]] ([http://en.wikipedia.org/wiki/USA#Languages +])
| <font size="-1">Belize, Costa Rica, Dominican Republic, El Salvador, Guatemala ([http://en.wikipedia.org/wiki/Guatemala#Language +]), Honduras, [[OLPC Mexico|México]] ([http://en.wikipedia.org/wiki/Mexico#Languages +]), Nicaragua, Panamá</font>
| <font size="-1">Bolivia ([http://en.wikipedia.org/wiki/Bolivia +]), [[OLPC Chile|Chile]], [[OLPC Colombia|Colombia]], Cuba, [[OLPC Ecuador|Ecuador]], Paraguay ([http://en.wikipedia.org/wiki/Paraguay +]), Puerto Rico ([http://en.wikipedia.org/wiki/Puerto_Rico#Languages +]), Spain ([http://en.wikipedia.org/wiki/Spain#Languages +]), Venezuela ([http://en.wikipedia.org/wiki/Venezuela +])</font>

|- valign="top"
| [[Thai]]
| [[OLPC Thailand|Thailand]]

|- valign="top"
| [[Yoruba]]
| [[OLPC Nigeria|Nigeria]]

|- valign="top"
| colspan="2" | Other non-green languages
| <font size="-1">[[OLPC Ethiopia|Ethiopia]], Indonesia, [[OLPC Philippines|Philippines]] ([http://en.wikipedia.org/wiki/Philippines#Languages +]), [[OLPC Pakistan|Pakistan]] ([http://en.wikipedia.org/wiki/Pakistan +]), Vietnam</font>
| <font size="-1">Afghanistan, [[OLPC Albania|Albania]], Armenia, Azerbaijan, Bangladesh, [[OLPC Bhutan|Bhutan]] ([http://en.wikipedia.org/wiki/Bhutan +]), Bosnia and Herzegovina, [[OLPC Cambodia|Cambodia]], [[OLPC China|China]] ([http://en.wikipedia.org/wiki/China#Languages +]), Croatia, [[OLPC Cyprus|Cyprus]], Eritrea, Estonia, Georgia, [[OLPC Greece|Greece]], Hungary, Iceland, [[OLPC India|India]] ([http://en.wikipedia.org/wiki/India +]), Iran, Italy, [[OLPC Japan|Japan]], Kyrgyzstan, Latvia, Lithuania, Macedonia, Malaysia, Moldova, [[OLPC Mongolia|Mongolia]], Romania, [[OLPC Russia|Russia]], Slovenia, [[OLPC Korea|South Korea]], [[OLPC Sri Lanka|Sri Lanka]], Tajikistan, Tanzania, Turkey, Ukraine, Uzbekistan, Vatican City</font>

|}


The following table presents, on a per-country basis, the target languages that must be considered for the [[Localization]] effort in the countries with 'green status' ({{Status green countries}}).

{| style="text-align:top; "

|- style="background:grey; "
! Country !! Target Languages !! Major/important languages !! Minor/relevant languages

|- valign="top"
| [[OLPC Argentina|Argentina]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=AR EthnologueAR]</font>
| [spa] [[Spanish]]
| <font size="-1">[quh] Quechua (0.85M - 2.1%)</font>
| See [[OLPC Argentina/Languages]]

|- valign="top" style="background:lightgrey; "
| [[OLPC Brazil|Brazil]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=BR EthnologueBR]</font>
| [por] [[Portuguese]]
| colspan="2" | ''none reported by [http://www.ethnologue.org/show_country.asp?name=BR Ethnologue BR] above 50,000 speakers.''

|- valign="top"
| [[OLPC Libya|Libya]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=LY EthnologueLY]</font>
| [arb] [[Arabic|Arabic, Standard]]
| <font size="-1">[ayl] Arabic, Libyan Spoken (4.2M - 75%),<br>[jbn] Nafusi (0.14M - 2.5%)</font>
| <font size="-1">[rmt] Domari (0.03M - 0.6%)</font>

|- valign="top" style="background:lightgrey; "
| [[OLPC Nigeria|Nigeria]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=NG EthnologueNG]</font>
| [eng] [[English]],<br>[hau] [[Hausa]]<font size="-1"><br>&mdash;(18.5M - 13.5%)</font>,<br>[yor] [[Yoruba]]<font size="-1"><br>&mdash;(18.9M - 13.8%)</font>
| <font size="-1">[bin] [[Edo]] (1.0M - 0.7%) official,<br>[efi] [[Efik]] (0.4M - 0.3%) official,<br>[fub] [[Adamawa Fulfulde|Fulfulde, Adamawa]] (7.6M - 5.6%) official,<br>[fuv] Fulfulde, Nigerian (1.7M - 1.2%),<br>[ibb] Ibibio (1.5M to 2.0M - 1.0-1.5%),<br>[idu] [[Idoma]] (0.6M - 0.4%) official,<br>[ibo] [[Igbo]] (18.0M - 13.1%) official,<br>[knc] [[Central Kanuri|Kanuri, Central]] (3.0M - 2.2%) official,<br>[tiv] Tiv (2.2M - 1.6%)</font>
| See [[OLPC Nigeria/Languages]]

|- valign="top"
| [[OLPC Peru|Peru]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=PE EthnologuePE]</font>
| [spa] [[Spanish]]
| <font size="-1">''pending''</font>
| See [[OLPC Peru/Languages]]

|- valign="top" style="background:lightgrey; "
| [[OLPC Rwanda|Rwanda]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=RW EthnologueRW]</font>
| [kin] [[Kinyarwanda]],<br>[fra] [[French]],<br>[eng] [[English]]
| <font size="-1">[swh] Swahili (0.01M - 1.3%)</font>
|

|- valign="top"
| [[OLPC Thailand|Thailand]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=TH EthnologueTH]</font>
| [[Thai]] (dialects?)
| <font size="-1">[nan] Chinese, Min Nan (1.1M - 1.7%),<br>[kxm] Khmer, Northern (1.1M - 1.8%),<br>[mfa] Malay, Pattani (3.1M - 4.8%),<br>[tha] Thai (20.2M - 32%),<br>[tts] Thai, Northeastern (15.0M - 23%),<br>[nod] Thai, Northern (6.0M - 9.2%),<br>[sou] Thai, Southern (5.0M - 7.7%)</font>
| <font size="-1">[ksw] Karen, S'gaw (0.3M - 0.5%),<br>[kdt] Kuy (0.3M - 0.5%)</font>

|- valign="top" style="background:lightgrey; "
| [[OLPC Uruguay|Uruguay]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=UY EthnologueUY]</font>
| [spa] [[Spanish]]
| colspan="2" | ''none other reported by [http://www.ethnologue.org/show_country.asp?name=UY Ethnologue UY]''

|- valign="top"
| [[OLPC USA|USA]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=US EthnologueUS]</font>
| [eng] [[English]]
| <font size="-1">[spa] Spanish (22.4M - 7.5%),<br>[___] Polish (3.4M - 1.1%),<br>[deu] German, Standard (6.1M - 2.0%),<br>[___] Arabic (3.0M - 1.0%)</font>
| <font size="-1">[___] Armenian (1.1M - 0.4%),<br>[___] Chinese (1.6M - 0.5%),<br>[___] Czech (1.5M - 0.5%),<br>[___] Eastern Yiddish (1.3M - 0.4%),<br>[___] French (1.1M - 0.4%),<br>[frc] French, Cajun (1.0M - 0.3%),<br>[hwc] Hawai'i Creole English (0.6M - 0.2%),<br>[___] Italian (0.9M - 0.3%),<br>[___] Japanese (0.8M - 0.3%),<br>[___] Korean (1.8M - 0.6%),<br>[___] Philippines (1.4M - 0.5%),<br>[___] Portuguese (1.3M - 0.4%),<br>[___] Swedish (0.6M - 0.2%),<br>[___] Ukrainian (0.8M - 0.3%),<br>[___] Vietnamese (0.9M - 0.3%),<br>[___] Vlax Romani (0.7M - 0.2%),<br>[___] Western Farsi (0.9M - 0.3%)</font>
|}

== Country groups and descriptions ==

* [[OLPC Albania | Albania]]
* [[OLPC Argentina/l10n | Argentina]]
* [[OLPC Austria | Austria]]
* [[OLPC Brazil | Brazil]]
* [[OLPC China | China]]
* [[OLPC Colombia | Colombia]]
* [[OLPC Egypt | Egypt]]
* [[OLPC Ethiopia | Ethiopia]]
* [[OLPC Spain | Spain ]]
* [[OLPC France | France]]
* [[OLPC Germany | Germany]]
* [[OLPC Greece | Greece]]
* [[OLPC India | India]]
* [[OLPC Kenya | Kenya]]
* [[OLPC Korea | 한국 (S.Korea)]]
* [[OLPC Korea | 조선 (N.Korea)]]
* [[OLPC Laos | Laos]]
* [[OLPC Libya | Libya]]
* [[OLPC Nepal | Nepal]]
* [[OLPC Nigeria | Nigeria]]
* [[OLPC Poland | Poland]]
* [http://www.olpc.ro/index.php/Main_Page Romania]
* [[OLPC Russia | Russia]]
* [[OLPC South Africa | South Africa]]
* [[OLPC Sri Lanka | Sri Lanka]]
* [[OLPC Thailand | Thailand]]
* [[OLPC Uruguay | Uruguay]]

== Korean-based Communities ==

[[Image:korea_map.gif|right]]
People who use Korean as their native language live in South Korea (한국인) and North Korea (조선인). For historical reasons, some Chinese citizens and people of other nationalities living in the northern part of Korea also use Korean as a second language. They are called 고려인 (Korea-in) and 조선족 (Chosun-zok, or Korean Chinese) respectively.

Currently [[OLPC Korea]] (or [[OLPC Korea|XO Korea]]) covers all of those nations and regions. In the near future, we hope there will be regional XO groups for each of them.


[[Category:Countries]]
[[Category:Language support]]
[[Category:Languages (international)]]

Latest revision as of 17:57, 17 June 2011

  This page is monitored by the OLPC team.
Please note that the old Pootle address of http://dev.laptop.org/translate does not work anymore. Please use http://translate.sugarlabs.org instead.

All data and user accounts should work in the new Pootle installation.

Localization (l10n)* is the process of taking software or content and adapting it for local use. It involves fonts, script layout, input methods, speech synthesis, musical instrumentation, collating order, number & date formats, dictionaries, and spellcheckers.

Localization is the process not just of translation to a local language, but of adapting content to other local requirements, whether of law, culture, or custom. In order to localize software, it must first be internationalized. That is, any assumptions derived from the language, culture, and customs of the developers must be removed. Much content can be written in a neutral international manner, but there are specific items that must be programmed so that the local equivalents can be easily substituted.

We need translators in many languages, including local scripts and dialects. At the moment, the laptop is 100% English, 98% Spanish, 96% German, 95% French, 65% Japanese, and 56% Portuguese. Many other languages are 5% done at best. Translating is fun, quite easy and the rewards are great: here's how you can get started.

You don't have to wait until XOs are announced for your country to start localizing GNU/Linux, Sugar, and Activities into your languages. Remember you can create a Live CD in your language to run on any x86 computer, including Macs. If you support a language well, you also support people learning your language.

Programmers in the US and Europe for decades were able to assume 1 character = 1 byte, but this is no longer the case. The most common character encoding today is the variable-length UTF-8 form of Unicode. We cannot assume that money is in US dollars. We cannot assume that people have family names, or area codes, or ZIP codes. Even when people have family names, the family name is not always the last name. And so on.
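A short illustration of why "1 character = 1 byte" no longer holds. The sample strings are arbitrary and are written with escapes so that the example does not depend on how this page itself is encoded.

```python
# "José": four characters, but the accented letter needs two bytes in UTF-8.
name = "Jos\u00e9"
utf8 = name.encode("utf-8")
print(len(name), len(utf8))

# UTF-8 is variable-length: different characters take different numbers of bytes.
for ch in ("a", "\u00e9", "\ud55c"):  # Latin a, e with acute accent, Hangul "han"
    print(repr(ch), len(ch.encode("utf-8")))
```

Code that measures, truncates, or indexes text by bytes will therefore cut characters in half as soon as the text leaves plain ASCII.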

OLPC got caught out on this with the first batch of prototypes delivered to Spanish-speaking students. It turns out that the initial login text field was programmed to accept only 7-bit ASCII, so the children with accents in their names were not able to enter them correctly. This in spite of the presence of 8-bit Latin-1 letters on the keytops.
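The failure mode is easy to reproduce. The sketch below is not the actual login code; both function names are invented, and the "Unicode" check is deliberately trivial. It only contrasts a field that insists on 7-bit ASCII with one that accepts any non-empty text.

```python
def accepts_ascii_only(name):
    """Mimic an input field that is limited to 7-bit ASCII."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def accepts_unicode(name):
    """Accept any non-empty name, whatever script it is written in."""
    return bool(name.strip())


for candidate in ("Maria", "Mar\u00eda", "Jos\u00e9"):  # Maria, María, José
    print(candidate, accepts_ascii_only(candidate), accepts_unicode(candidate))
```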

* l10n and i18n are abbreviations for the terms localization and internationalization, where 10 or 18 stands for the number of letters between the first and last letters of the term, respectively. i18n was coined at Digital Equipment Corporation in the 1970s or 80s[1].
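The abbreviation rule is mechanical enough to write down. A small sketch (the function name <code>numeronym</code> is just a label chosen for this example):

```python
def numeronym(term):
    """Keep the first and last letters and replace the rest with their count."""
    if len(term) < 3:
        return term  # nothing between the first and last letters to count
    return f"{term[0]}{len(term) - 2}{term[-1]}"


print(numeronym("localization"))
print(numeronym("internationalization"))
```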

Why it matters

OLPC's target population of schoolchildren live in nearly 200 countries, where more than 6,000 languages are spoken. The 100 most common languages would suffice for reaching 99%+ of children in first or second languages, but hundreds more would be needed for education in traditional cultures and languages of large populations. OLPC can play a large part in recording and preserving thousands more languages before they would otherwise disappear forever.

Getting started

You don't need to be a geek or hacker to help translate. You need to know two languages, and be familiar with computers. It helps if you try out Sugar, and it helps if you can think like a child.

To start with, there are user interfaces: Sugar and other XO Activities.

If you're a bit more technical: before a program can be translated it needs to be prepared by doing Python internationalization (for sugar or activities).

Read Wikipedia's definition and Ethnologue's language list

In addition to reading this page, you can join the localization mailing list and find us on the #olpc-content IRC channel and the Library mailing list.

To start a new language project on Pootle, the person volunteering to be the project administrator should first be registered on

Localizers should also do the same once a project for their language has been started.

Anyone working on Localization for Etoys should subscribe to the Etoys mailing list.

Then the administrator can open a ticket on Trac and provide the following information:

  • Language and country in the ticket title
  • Component: Localization
  • Who else is volunteering
  • Data on the language
  • Why this project is starting, which may be that shipments to that community are being scheduled, or just that the community wants it for its own use.

Finally, for script experts, there are keyboards and guides for customizing your own keyboard language if it is not already there. (NB: on recent developer (Joyride) builds you can use the sugar-control-panel to set language. Otherwise, use setxkbmap at the command line in the Terminal.)

Recruiting localizers

For any given target language in wide use, there are likely to be a variety of organized groups interested in helping to get software into their languages. The same goes for Content. Here are some of the places to start.

You should also talk to User:Mokurai, who is recruiting language project administrators and localizers.

In addition to localizing material ourselves, we want to find dictionaries, repositories of literature and other content in the language, sources for textbooks, and so on.

Let us pick a language, say Khmer for Cambodia, which had more than 10,000 XOs committed through G1G1, but no Pootle project as of 2008-2-24, and see what we can find. Ethnologue, Google and Wikipedia are your friends, as are the social networks, but first things first. So User:Mokurai created a ticket for Khmer, following the instructions above, and Sayamindu expeditiously created the Khmer project on Pootle.

  • The Ethnologue entry for Khmer says that there are about 13 million speakers of Central Khmer, and lists several countries with large Cambodian immigrant populations. "Also spoken in Canada, China, France, Laos, USA, Viet Nam."
  • Ethnologue also gives a link to an English-Khmer medical dictionary.
  • Google finds about 277,000 hits on a search for khmer dictionary. There is even a Khmer computer dictionary.
  • The Cambodian Ministry of Education, Youth and Sport has a plan in development for computers in schools, centered on KhmerOS. The Web site is in Khmer, but unfortunately not in Unicode.
  • Although there is an OLPC Cambodia page, no Cambodians are active on it.
  • A search for GNU/Linux User Groups in Cambodia turns up Open Forum of Cambodia, "Building Cambodia through Information Technology", and the KhmerOS project to create a version of SUSE Linux localized into Cambodian. We can mine their localization for ours, and invite their people to work with us. And of course, whatever we contribute upstream in Khmer will be available to them, or we can contribute to KhmerOS directly.
  • The place to look for NGOs is Wiser Earth, which lists well over 100,000 NGOs worldwide for every purpose. This is left as an exercise for the reader.
  • A search on LinkedIn turns up more than 500 people with links to Cambodia, including a number of Cambodians. LinkedIn lets you post questions to your network, so we can ask for help with our Khmer project.
  • Most Cambodians are Theravada Buddhists, although evangelical Christians active in refugee and reconstruction work are making converts. The Buddhist scriptures in Pali language, Khmer script, and Unicode encoding are available on CD-ROM and online. There are Cambodian Buddhist organizations in the US, such as WattKhmer — San Jose [CA] Cambodian Buddist Society, Inc.
  • A cursory search did not turn up any Web sites for teachers based in Cambodia, but the Teachers Across Borders Cambodia Project has the information.
  • CulturalProfiles.net has a summary article on education in Cambodia.
Royal University of Fine Arts (reopened 1980), the Institute 
of Technology of Cambodia (1981, formerly the Higher Technical 
Institute of Khmer-Soviet Friendship), the Royal University 
of Agriculture (1984, formerly the Institute of Agricultural 
Engineering), the Royal University of Phnom Penh (1988-1996, 
now incorporating Faculties of Pedagogy, Law and Economic 
Sciences, Medicine, Pharmacy and Dentistry and Business) and 
the Vedic Maharashi Royal University in Prey Veng Province 
(1993). In 1995 the Royal School of Administration was 
re-established under the control of the Council of Ministers.

Cambodia still has a low participation rate in higher education, 
with just 1.2 per cent of the population enrolled, compared 
with an average of 20.7 per cent in all the ASEAN countries.
  • There are a million or so Cambodians outside Cambodia, mostly refugees from the Khmer Rouge regime. Southeast Asian Refugee Action Council says, "The largest communities of Cambodian refugees are located in Long Beach, California, and in Lowell, Massachusetts. Sizeable communities also exist in Washington and several other states." SEARAC also provides links to statistical data from other organizations and to a large number of NGOs.
  • The Khmer language is taught in a few universities and in government training institutions for military and diplomatic purposes. Center for Khmer Studies has several directories. (Use the site map for navigation. It is impossible to find many of their resources through the menus and links otherwise.)

Well. That's just a start, but you can see that a moderate amount of work can give you plenty to begin with. More than you can handle, in fact. The next step, therefore, is to contact organizations that have prior contacts with others in the community, so that they can put the word out and invite people to join us. Tell them about, and invite them to participate in,

Make a page for the country, if necessary, and the language, if necessary, and get someone to keep them updated. Make pages for the groups you recruit to the cause, and get them to fill in more information about themselves. Let us know how you are doing via the mailing lists.

Then have at it, and remember to prod people to invite more people from time to time, for this and other education projects.

Now we just need to make a template for all of this, gather the information, and send out invitations for all of the languages in Pootle:

Afrikaans, Amharic, Arabic, Aymara, Basque, Bengali, Bengali (India), Bulgarian, Catalan, Chinese (China), Chinese (Hong Kong), Chinese (Taiwan), Czech, Danish, Dari, Dutch, Dzongkha, English, English (South African), English (US), Finnish, French, Friulian, Fula, Galician, Georgian, German, Greek, Gujarati, Hausa, Hindi, Icelandic, Igbo, Italian, Japanese, Khmer, Kinyarwanda, Korean, Kreyol, Macedonian, Malayalam, Maltese, Marathi, Mongolian, Nepali, Pashto, Persian, Polish, Portuguese, Portuguese (Brazil), Punjabi, Quechua, Romanian, Russian, Serbian, Sinhala, Slovenian, Sotho, Spanish, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Wolof, Yoruba

and a few others that we know we will need for current target countries, such as Mongolian (Traditional), Hazaragi and Aimaq for Afghanistan, Tigrinya for Ethiopia, and so on, and then the principal languages of any further countries that buy in or receive large donations.

Internationalization (i18n)

Preparing software so that it can be localized

To help others localize bundles and code efficiently, they need to be prepared so that anything which might need localization (strings, images, sounds) is separated out and organized for translators and localizers. This is internationalization (or i18n).

There are specific scripts and tools that help represent and compose the languages spoken, taught or used in various countries: these are internationalization tools.

Issues:

    • Cultural and national neutrality
    • Unicode
    • Writing directions
    • Stretching and shrinking of text
    • Locales: Formats for numbers, times, dates, currency, names, addresses, phone numbers
    • File names
    • Grammar issues: gender, number, phrasing
    • Punctuation
    • Style and usage
    • Switching languages in activities
    • Mixing languages in documents
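As one concrete case from the list above (grammar issues: number), Python's gettext lets the catalog choose the plural form instead of the program gluing an "s" onto a word. This is a minimal sketch; the function name <code>describe</code> and the message text are made up. Without a catalog it falls back to the English rule, while a real catalog's Plural-Forms header decides for other languages.

```python
import gettext


def describe(count, translation=None):
    """Return a count phrase, letting the catalog choose the plural form."""
    if translation is None:
        # No catalog loaded: NullTranslations applies the English singular/plural rule.
        translation = gettext.NullTranslations()
    template = translation.ngettext("%d laptop", "%d laptops", count)
    return template % count


for n in (0, 1, 3):
    print(describe(n))
```

Keeping the whole phrase in one translatable string also lets translators reorder the words, which string concatenation does not allow.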


Translation and Pootle

Sugar and core activities

The basic procedure to translate activities is to sign up, enter the https://translate.sugarlabs.org Pootle server and work in the available projects. Back in 2007 these included:

  • XO-Core — activities or components that are central to XO
  • XO-Bundled — activities that are currently being bundled or included in the builds
  • Packaging — other material that needs to be localized
  • Terminology — support translation glossary

Translators basically have two ways to participate:

Other, more committed roles are possible, including the ability to make off-line translations with whatever tools you are used to, but that needs to be coordinated with the people in charge.

If you are not already subscribed to localization@lists.laptop.org we encourage you to do so.

To add a new language, please file a request in the trac system under the component localization.

See also:

  • For more detailed information on the functionality of the translation server and its usage, see Pootle.
  • For a list of the language teams / administrators see Pootle#Sign-up.

Languages of G1G1 Target Countries

Other Languages

Translation projects have begun in Afrikaans, Amharic, Arabic, Aymara, Basque, Bengali, Catalan, Chinese (China), Chinese (Hong Kong), Chinese (Taiwan), Czech, Danish, Dutch, English, English (South African), English (US), Finnish, Friulian, Galician, Georgian, German, Greek, Hausa, Hindi, Icelandic, Igbo, Italian, Japanese, Korean, Macedonian, Maori, Maltese, Nepali, Persian, Polish, Portuguese, Portuguese (Brazil), Quechua, Romanian, Russian, Samoan, Serbian, Slovenian, Sotho, Swedish, Thai, Tongan, Turkish, Ukrainian, Urdu, Vietnamese, Wolof, Yoruba, pseudo L10n

Support for Language Learning

Having an alternate GUI language on an XO is an excellent way to get used to a language, particularly if you can refer to Pootle or to a printout as a bilingual reference. Or even two XOs side by side in the two languages. Making a language part of your daily routine imprints it on your brain in a way that no amount of class time or formal practice can do.

Having friends to talk to in the language is of course the best way of all to learn it, and look! the XO lets you do that, too, all over the world.

Localization process

Keyboarding in your language

What good is seeing the interface in a particular language if your keyboard is in another?

  • Use My Settings to set keyboard preferences.
  • Terminal commands for keyboard used with specific languages are at Keyboard layouts.

Testing your localization

Translating is a pleasure when you can check the results right after you finish. See Localization/Testing on how to test the localization on virtual XO.

Basic Localization Topics

Character Sets

Unicode is fully supported in “modern” applications and toolkits used in free software. Legacy character set support is also present, but modern applications use Unicode.

Collation order (the text sorting order) is generally well supported in the C library.
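To see why collation needs library support at all, compare a plain code-point sort with even a crude accent-insensitive one. The sketch below is only an approximation and not real locale collation (for that, Python exposes the C library's rules through <code>locale.strxfrm</code>, which depends on the locales installed on the machine); the helper name <code>fold_key</code> and the word list are made up.

```python
import unicodedata


def fold_key(word):
    """Crude sort key: drop combining accents and ignore case."""
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


words = ["zebra", "\u00c4pfel", "apple"]  # the second word is "Äpfel"
print(sorted(words))                # code-point order puts the accented word last
print(sorted(words, key=fold_key))  # folded order files it under "a"
```

Real collation rules differ between languages, so a key like this is only a stopgap when no proper locale data is available.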

See also: Category:Fonts, Unicode.

Script Layout

OLPC uses the Pango library, which is able to lay out most of the “hard” languages, including Arabic, the Indic languages, Hebrew, Persian, and Thai. It has a modular, pluggable layout engine and supports vertical text as well as bi-directional layout. Some issues remain, but Pango can already handle most scripts; where it cannot, modules can be built to handle new scripts as documented in Pango's reference manual.
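Pango does the real work of bi-directional layout, but the underlying idea can be glimpsed with Python's Unicode database. The sketch below guesses a paragraph's base direction from its first strongly directional character. It is a simplification for illustration, not Pango's algorithm, and the function name <code>base_direction</code> is invented.

```python
import unicodedata


def base_direction(text):
    """Guess 'rtl' or 'ltr' from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            return "rtl"
        if bidi_class == "L":          # Latin, Thai, and other left-to-right letters
            return "ltr"
    return "ltr"  # digits, spaces, and punctuation alone give no strong hint


print(base_direction("Begin!"))
print(base_direction("\u0645\u0631\u062d\u0628\u0627"))  # an Arabic greeting
print(base_direction("123 \u05e9\u05dc\u05d5\u05dd"))    # digits, then a Hebrew word
```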

See also: Category:Languages (international)

Fonts

To share content and preserve cultural heritage, OLPC's goal must be, and is, full coverage of all the world's languages. By using the Fontconfig system, Linux has a better concept of the language coverage of fonts than other systems. Fontconfig is used to configure the font system and determine what set of fonts is needed to cover a set of languages.

The font formats supported on Linux include OpenType, TrueType and many others: see Freetype for details. Most of the formats supported by Freetype are obsolete, and by far the best on-screen results come from OpenType and TrueType fonts, particularly if they are well hinted. Type 1 fonts are useful primarily for printing; the Type 1 renderer currently in Freetype is not very good, and Type 1 does not support programmatic hinting for low resolution screens.

The OLPC XO-1 has a high resolution screen, which helps considerably, particularly in grayscale mode at 200 DPI. Wikipedia, as usual, is a starting point for finding free fonts. "Font foundries" are companies that will contract to produce fonts.

See also: Category:Fonts, Fonts, HIG-The Sugar Interface/Text and Fonts

Free Fonts

Free fonts are available for most scripts in the world, though some are distributed under licenses that do not actually permit completely free redistribution.

Need for Screen Fonts

Applications and content should be usable on screens everywhere, not just on OLPC's high resolution screen. The OLPC community therefore needs to work together on extending the coverage of high quality screen fonts. The "DejaVu" font family (derived from Bitstream Vera) covers most Latin alphabets and some other scripts, and is generally well "hinted" for screen use. The Red Hat "Liberation" family recently became available as a substitute for the Microsoft family of fonts, but does not yet have very wide coverage.

SIL International also builds fonts for a number of additional languages of local interest.

Help with these or other efforts to build fonts or to increase the coverage of existing fonts is greatly appreciated. Pooling efforts on hinting glyphs, which is boring but important work, is being investigated, as are donations and buyouts.

Keyboards

OLPC Keyboard layouts documents OLPC's currently available keyboard layouts. Further layouts are a modest amount of work if designs already exist for those languages. People with local expertise will need to work with OLPC staff to generate new layouts.

See also: Category:Keyboard, HIG-Input Systems-Keyboard

Input Methods

An input method is software that allows typing in scripts that have many more characters than a keyboard has keys. Languages that need one include Chinese, Japanese, and Korean.

Free software systems now use SCIM (Smart Common Input Method platform), which is replacing older input method systems.
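The core idea of an input method can be sketched as composing sequences of keystrokes into characters. The toy Python below is not SCIM and does not use its API; the table covers only a handful of Japanese syllables and every name in it is invented for the illustration. Real input methods add candidate selection, dictionaries, and much larger tables.

```python
# Tiny illustrative table: Latin key sequences -> hiragana characters.
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "na": "な", "ni": "に", "ha": "は", "chi": "ち", "n": "ん",
}


def compose(keys, table=ROMAJI_TO_KANA):
    """Convert typed keys to characters using greedy longest-match lookup."""
    longest = max(len(sequence) for sequence in table)
    output = []
    i = 0
    while i < len(keys):
        # Try the longest possible key sequence first, then shorter ones.
        for size in range(min(longest, len(keys) - i), 0, -1):
            chunk = keys[i:i + size]
            if chunk in table:
                output.append(table[chunk])
                i += size
                break
        else:
            output.append(keys[i])  # no match: pass the key through unchanged
            i += 1
    return "".join(output)


print(compose("konnichiha"))
```

Fifteen table entries already produce characters that no Latin keyboard has keys for, which is the problem an input method exists to solve.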

To design the keyboards that are most useful in each country, we need to know which languages are native to an area and which are taught there as “foreign” languages. For example, the Nigerian keyboard is designed to allow easy entry of English, Hausa, and Yoruba, which are common languages in much of Nigeria. The "US/International" layout covers most of the western European languages.

Some issues remain in our base technology. For example, Arabic ligatures could present problems; by not putting them on the keyboard, we avoided the need for an input method. However, such workarounds may not be feasible for your language.

See also: Input methods, HIG-Input Systems

Accessibility and Usability

Speech Synthesis

Speech synthesis involves a complex set of tradeoffs between synthesizer size, fidelity, and the effort needed to localize for a new language. See Speech synthesis.

See also Category:Accessibility

Music and Sound Samples

We want much more than dead white male western instruments for dead white male composers!

Clean samples of your musical instruments and music are needed!

Samples need appropriate licensing terms.

See also TamTam: Sounds

Dictionaries and Spellcheckers

There is existing support for most major languages.

Spelling, hyphenation, and thesaurus dictionaries may be needed for different parts of Linux, which may or may not apply to OLPC directly; for example, you can check:

Of these, the first three are most immediately interesting to OLPC: we use versions of these codebases as part of the Sugar environment.

Character Recognition

Localization of stroke/character recognizers is of some interest with the pen/tablet; in the future (Gen 2), when we have a touch screen, it will become essential. xstroke is one such individual character/stroke recognizer, sufficient for alphabets of up to about 100 characters.

Considerations

Current Shortcomings

There are some real shortcomings where help is needed. These include:

  • Non-Gregorian calendars
  • Non-Latin digits (Roozbeh Pournader has patches, but these are not yet integrated and may need help).
  • The sheer scale of the localization problem, which will eventually require changes in free software projects.
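To make the non-Latin digits item concrete, the Python sketch below converts between ASCII digits and another script's decimal digits. It is only an illustration of the problem, not the patches mentioned above; the helper name `to_local_digits` is hypothetical. It relies on two facts about Python: each script's decimal digits occupy ten consecutive code points, and `int()` accepts any Unicode decimal digits.

```python
ARABIC_INDIC_ZERO = "\u0660"  # ٠ (digits used with Arabic)
PERSIAN_ZERO = "\u06f0"       # ۰ (digits used with Persian)


def to_local_digits(number, zero):
    """Render a non-negative integer using the digit set that starts at zero."""
    # Map each ASCII digit to the digit at the same offset in the target script.
    table = {ord("0") + i: chr(ord(zero) + i) for i in range(10)}
    return str(number).translate(table)


print(to_local_digits(123, ARABIC_INDIC_ZERO))
print(to_local_digits(2007, PERSIAN_ZERO))
print(int(to_local_digits(123, ARABIC_INDIC_ZERO)))  # parses back to an int
```

Displaying digits is only part of the problem: number formatting in applications also has to choose the right digit set for the user's locale, which is where integration work is needed.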

Localization Techniques

It only takes a small team to localize Linux for a language. For example, Welsh and Icelandic, which are relatively small languages, have been almost fully localized by small teams.

You can do the work yourself, hire the work out, or find volunteers among universities (worldwide), on the internet, and in the free software community. Add to existing projects whenever possible. By checking with some of the major free software projects (e.g. Gnome, OpenOffice, Mozilla, KDE), you can often locate people already at work in your language.

Work directly in the software and content projects whenever possible. This makes your work available worldwide and lessens the ongoing work. If you keep your localization work local, others cannot benefit from your effort, and your software and content will be that much harder to localize.

Tools

Example tools include Pootle, KBabel and Rosetta. Most software, including Sugar, uses the GNU “gettext” libraries and standard .po files; Firefox and OpenOffice have their own systems for historical reasons. Wordforge is a good place to get plugged into the tools and the community efforts.

The CLDR project is worth watching, though OpenOffice is the first major project to use it.

Remember to contribute your translations to the “upstream” projects to minimize long term effort: share your work with the world. Do not presume that you are finished once one Linux distribution has your work; some Linux distributions are not good about working with the community that builds and distributes the original software.

Licensing

Translated strings will often be useful to many projects, not just the project you are translating. Since the MIT and BSD (3-clause) licenses are usable by all projects, they are the safest licenses to use for translations to enable the widest sharing.

The SIL OFL license is recommended for fonts. An often overlooked issue with fonts is that they are incorporated into documents themselves (for example, into PDF documents), so their licensing needs to be considered carefully.

See also Software licensing

Next Steps

Localization is by nature local, but languages often cross borders. Please contact Jim Gettys and Mokurai to identify issues.

We need to identify people/organizations responsible for language, translation, keyboards, and speech synthesis, as well as effective free software community leaders to help with local deployment and "on the ground" knowledge.

Sugar Localization

Sugar and Sugar applications use standard .po files, and can be localized using the usual tools. Sugar_18n goes into the details of the localization process.
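Because .po files are plain text, translation progress can be estimated by counting the entries whose msgstr is non-empty. The Python below is a minimal sketch under stated assumptions: the sample strings are made up for illustration, and the parser handles only simple single-line entries (real .po files also allow multi-line strings, plural forms, comments, and "fuzzy" flags, which tools such as Pootle handle properly).

```python
import re

# Made-up sample in .po format; the first entry (empty msgid) is the header.
SAMPLE_PO = '''\
msgid ""
msgstr "Content-Type: text/plain; charset=UTF-8\\n"

msgid "Journal"
msgstr "Diario"

msgid "Neighborhood"
msgstr ""

msgid "Shutdown"
msgstr "Apagar"
'''

# Matches a single-line msgid immediately followed by a single-line msgstr.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def translation_progress(po_text):
    """Return (translated, total) counts, ignoring the header entry."""
    entries = [(msgid, msgstr) for msgid, msgstr in ENTRY.findall(po_text) if msgid]
    translated = sum(1 for _, msgstr in entries if msgstr)
    return translated, len(entries)


done, total = translation_progress(SAMPLE_PO)
print(f"{done} of {total} strings translated")
```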

General GNU/Linux Localization

By looking at the GNOME, Mozilla, OpenOffice, and KDE projects, you can get plugged into translating other GNU/Linux software of general interest.

Localization of Python

See Python i18n for details and a step-by-step example.

Current l10n projects

library exchange

activities


DansGuardian

The school server uses DansGuardian for web filtering. DansGuardian accomplishes filtering using keyword lists. These lists need to be translated.

See Also

Why is localization important?

Because.