Localization: Difference between revisions
(Replacing page with '{{OLPC}} {{Translations}}{{TOCright}} Internationalization technology is the technology for representing and composing the languages spoken, taught or used in your countries. L...') |
RafaelOrtiz (talk | contribs) m (Undo revision 42627 by 122.16.74.92 (Talk)) |
Internationalization technology is the technology for representing and composing the languages spoken, taught or used in your countries. Localization is the process of taking software or content and adapting it for local use. |
||
Localization involves fonts, script layout, input methods, speech synthesis, musical instrumentation, collating order, number & date formats, dictionaries, and spelling checkers, among other issues. |
||
Linux is already more widely localized than Microsoft Windows, since localizing it requires no cooperation from a vendor. That said, cooperation with the free software and content community is vital to reduce the overall work required.
|||
The size of the problem is huge. [http://www.ethnologue.org/ethno_docs/distribution.asp?by=size Ethnologue] has extensive information on the languages of the world. |
|||
: See also [http://en.wikipedia.org/wiki/Internationalization_and_localization Wikipedia's definition] |
|||
This is an outline of (some of) the core topics and tools, and issues of localization. |
|||
== Basic Localization Topics == |
|||
=== Character Sets === |
|||
[http://www.unicode.org/ Unicode] is fully supported in “modern” applications and toolkits used in free software. Legacy character set support is also present, but modern applications use Unicode.
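As a rough illustration of moving content from a legacy character set to Unicode, the sketch below decodes bytes from a legacy encoding and re-encodes them as UTF-8. The sample bytes and the choice of ISO 8859-1 are assumptions made for the example; the function name is hypothetical.

```python
# Sketch: convert text stored in a legacy character set to UTF-8.
# The sample bytes and the ISO 8859-1 source encoding are illustrative assumptions.

def legacy_to_utf8(raw: bytes, legacy_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    text = raw.decode(legacy_encoding)   # legacy bytes -> Unicode string
    return text.encode("utf-8")          # Unicode string -> UTF-8 bytes


latin1_bytes = b"caf\xe9"                # the word "cafe" with e-acute, in ISO 8859-1
utf8_bytes = legacy_to_utf8(latin1_bytes, "iso-8859-1")
print(utf8_bytes)
```

The source encoding has to be known in advance; decoding with the wrong legacy encoding usually succeeds silently and produces the wrong characters.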
|||
Collation order (the sorting order when text is sorted by Linux) is generally well supported in the C library. |
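A minimal sketch of locale-aware sorting through the C library, using Python's standard <code>locale</code> module as a thin wrapper. The helper name is hypothetical, and which locales are available depends on what is installed on the system.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words using the collation rules of the named locale.

    Raises locale.Error if that locale is not installed on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)        # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares in collation order
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)     # restore previous setting


# The "C" locale is always available and sorts by character code.
print(sort_with_locale(["b", "a", "C"], "C"))
```

Passing a language-specific locale name (for example a Spanish UTF-8 locale, if one is installed) should sort accented letters according to that language's rules instead of by character code.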
|||
: See also: [[:Category:Fonts]], [[Unicode]]. |
|||
=== Script Layout === |
|||
OLPC primarily concentrates on using the [http://www.pango.org/ Pango library], which is able to lay out most of the “hard” languages, including Arabic, the Indic languages, Hebrew, Persian, and Thai. It has a modular, pluggable layout engine and supports both vertical text and bi-directional layout. Some issues remain, but overall Pango is in pretty good shape and can already handle most scripts.
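To give a feel for one small piece of bi-directional layout, the sketch below guesses the base direction of a paragraph from its first strongly directional character, using the Unicode character database in Python's standard library. This is only a simplified illustration, not Pango's implementation; a real layout engine applies the full Unicode bidirectional algorithm. The function name is hypothetical.

```python
import unicodedata


def base_direction(text):
    """Return "rtl" or "ltr" based on the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):    # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi_class == "L":            # left-to-right letter
            return "ltr"
    return "ltr"                         # no strong character; default assumed here


print(base_direction("hello"))
print(base_direction("\u05e9\u05dc\u05d5\u05dd"))   # a Hebrew word
```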
|||
: See also: [[:Category:Languages (international)]] |
|||
=== Fonts === |
|||
To share content and preserve cultural heritage, OLPC's goal must be, and is, full coverage of all the world's languages. By using the [http://www.fontconfig.org/wiki/ Fontconfig] system, Linux has a better concept of the language coverage of fonts than other systems. Fontconfig is used to configure the font system and to determine which set of fonts is needed to cover a set of languages.
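The idea of choosing a set of fonts that together cover a set of languages can be sketched as a greedy selection. This is not how Fontconfig itself works; the font names and coverage data below are made up for illustration, and the function name is hypothetical.

```python
def pick_fonts(font_coverage, wanted_languages):
    """Greedily choose fonts until every wanted language is covered.

    font_coverage maps a font name to the set of language codes it covers.
    """
    remaining = set(wanted_languages)
    chosen = []
    while remaining:
        # Pick the font that covers the most still-uncovered languages
        # (ties are broken alphabetically by font name).
        best = max(sorted(font_coverage),
                   key=lambda name: len(font_coverage[name] & remaining))
        gained = font_coverage[best] & remaining
        if not gained:
            raise ValueError(f"no font covers: {sorted(remaining)}")
        chosen.append(best)
        remaining -= gained
    return chosen


# Hypothetical coverage data, for illustration only.
coverage = {
    "FontA": {"en", "es", "pt"},
    "FontB": {"ar"},
    "FontC": {"th", "en"},
}
print(pick_fonts(coverage, ["en", "es", "ar", "th"]))
```

A greedy choice does not always give the smallest possible set, but it is simple and shows why per-font language coverage information is useful.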
|||
The formats of fonts supported on Linux include [http://en.wikipedia.org/wiki/OpenType OpenType], [http://en.wikipedia.org/wiki/TrueType TrueType] and many others: see [http://www.freetype.org/ Freetype] for details. Most of the current font formats supported by Freetype are obsolete, and by far the best results on the screen will be had from OpenType and TrueType format fonts. [http://en.wikipedia.org/wiki/Type_1_font Type 1 fonts] are useful primarily for printing; the Type 1 renderer we have today in Freetype is not very good, and Type 1 does not support programmatic hinting for low resolution screens.
|||
OLPC itself has a relatively high resolution screen; this helps us considerably, particularly in grayscale mode at 200DPI. |
|||
: See also: [[:Category:Fonts]], [[Fonts]], [[OLPC Human Interface Guidelines/The Sugar Interface/Text and Fonts|HIG-The Sugar Interface/Text and Fonts]] |
|||
==== Free Fonts ==== |
|||
Free fonts are available for most scripts in the world, though some fonts are [[#Licensing|licensed]] incorrectly for completely free redistribution. |
|||
==== Need for Screen Fonts ==== |
|||
Regardless of the XO's resolution, we also need our applications and content to be usable on other screens everywhere, so we need to work together on extending the coverage we have today of high quality screen fonts. The [http://dejavu.sourceforge.net/wiki/index.php/Main_Page "DejaVu"] font family (derived from Bitstream Vera) covers most [http://en.wikipedia.org/wiki/Latin_alphabet Latin alphabets] and some other languages. This family generally has good "hinting" for screen use.
|||
[http://www.sil.org/computing/catalog/show_software_catalog.asp?by=cat&name=Font SIL International] also builds fonts for a number of additional languages of local interest. |
|||
Helping with these or other efforts to build fonts or to increase coverage of existing fonts is greatly appreciated. Pooling efforts on hinting glyphs, which is boring but important work, and/or donations and buyouts are also being investigated. |
|||
=== Keyboards=== |
|||
[[OLPC Keyboard layouts]] documents OLPC's currently available keyboard layouts. Further layouts are a modest amount of work, requiring people with local expertise to work with OLPC staff to generate them.
|||
: See also: [[:Category:Keyboard]], [[OLPC Human Interface Guidelines/The Sugar Interface/Input Systems#Keyboard|HIG-Input Systems-Keyboard]] |
|||
=== Input Methods === |
|||
An input method is software that allows typing of complex characters, for example in languages such as Chinese, Japanese, and Korean. Some issues remain. For Arabic, for example, we have avoided the need for an input method by not putting ligatures on the keyboard. However, such workarounds may not be feasible for your language.
|||
Free software systems are now using [http://www.scim-im.org/projects/imengines SCIM - Smart Common Input Method Platform]. SCIM is replacing older input method systems.
|||
We need to know which languages are taught as “foreign” languages, as well as which are native, to design the keyboards that are most useful in each country. For example, the Nigerian keyboard is designed to allow easy entry of English, Hausa, and Yoruba, which are common languages in much of Nigeria. The "US/International" layout covers most of the western European languages.
|||
: See also: [[Input methods]], [[OLPC Human Interface Guidelines/The Sugar Interface/Input Systems|HIG-Input Systems]] |
|||
== [http://en.wikipedia.org/wiki/Accessibility#Telecommunications_and_information_technology_access Accessibility] and [http://en.wikipedia.org/wiki/Usability Usability] == |
|||
=== Speech Synthesis === |
|||
The available [http://en.wikipedia.org/wiki/Speech_synthesis speech synthesis] software, which includes [http://www.cstr.ed.ac.uk/projects/festival/ festival], [http://www.speech.cs.cmu.edu/flite/ flite], and [http://espeak.sourceforge.net/ espeak], involves tradeoffs between size, fidelity, and the effort needed to synthesize a new language.
|||
[http://sourceforge.net/projects/espeak/ Espeak] is small enough for us to often bundle and covers quite a few languages: about 10 languages are currently supported, tuned by native speakers, with 10 more underway.
|||
Synthesis is essential for accessibility of content to people with vision problems, and will need to be integrated with the [http://developer.gnome.org/projects/gap/ ATK library]; it also has other uses as part of a GUI, such as literacy training. Full localization therefore involves selection of a suitable synthesis system and integration into the ATK framework, along with localization of that system for the particular language involved.
|||
Speech synthesis is usually not a good guide to pronunciation when learning languages, but it may be better than a poor teacher who has never had the opportunity to learn from a native speaker of that language.
|||
: See also [[:Category:Accessibility]] |
|||
=== Music and Sound Samples === |
|||
We want much more than dead white male western instruments for dead white male composers! |
|||
Clean samples of your musical instruments and music are needed!
|||
Samples need appropriate [[#Licensing|licensing]] terms. |
|||
: See also [[TamTam: Sounds]] |
|||
=== Dictionaries, Spelling Checkers, Thesaurus === |
|||
Support exists for most major languages. |
|||
Spelling, hyphenation, and thesaurus dictionaries may be needed for different parts of Linux, which may or may not apply to OLPC directly; for example you can check:
|||
* [http://aspell.net/man-html/Supported.html '''aspell'''] |
|||
* [http://dictionaries.mozdev.org/installation.html '''mozilla'''] |
|||
* [http://www.abiword.org/languages.phtml '''abiword'''] |
|||
* [http://wiki.services.openoffice.org/wiki/Dictionaries Open Office] |
|||
Of these, the first three are most immediately interesting to OLPC. |
|||
=== Character Recognition === |
|||
Stroke/character recognizer localization is of some interest with the pen/tablet: in the future (Gen 2) when we have a touch screen they will become essential. [ftp://ftp.handhelds.org/projects/xstroke/release-0.5/ xstroke] is one such individual character/stroke recognizer, sufficient for alphabets of up to about 100 characters. |
|||
== Considerations == |
|||
=== Current Shortcomings === |
|||
There are some real shortcomings where help is needed. These include: |
|||
* Non-Gregorian [http://en.wikipedia.org/wiki/List_of_calendar_systems calendars] |
|||
* Non-Latin digits (Roozbeh Pournader has patches, but these are not yet integrated and may need help). |
|||
* and the sheer scale of the localization problem will eventually require changes in free software projects. |
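To make the non-Latin digits item concrete, the sketch below shows one way to render a number with Arabic-Indic digits and to read such digits back. It is an illustration only, unrelated to the patches mentioned above; the helper names are hypothetical.

```python
# Arabic-Indic digits occupy U+0660 through U+0669.
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
TO_ARABIC_INDIC = str.maketrans("0123456789", ARABIC_INDIC)


def to_arabic_indic(number):
    """Format an integer using Arabic-Indic digits instead of Latin digits."""
    return str(number).translate(TO_ARABIC_INDIC)


def parse_digits(text):
    """Parse an integer written with any Unicode decimal digits."""
    return int(text)   # Python's int() accepts Unicode decimal digit characters


localized = to_arabic_indic(2007)
print(localized, parse_digits(localized))
```

Real software also has to handle digit grouping, decimal separators, and the direction of surrounding text, which is part of why this remains a shortcoming.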
|||
=== Localization Techniques === |
|||
It only takes a small team to localize Linux for a language: Welsh and Icelandic, for example, which are relatively small languages, have been pretty fully localized by small teams.
|||
You can do the work yourself, hire the work out, or find volunteers among universities (worldwide), the world wide internet and free software community. Add to existing projects whenever possible. By checking with some of the major free software projects (e.g. [http://live.gnome.org/TranslationProject Gnome], [http://l10n.openoffice.org/ OpenOffice], [http://www.mozilla.org/projects/l10n/ Mozilla], [http://l10n.kde.org/ KDE]), you can often locate people already at work in your language. |
|||
Work directly in the software and content projects whenever possible. This makes your work available worldwide while lessening the ongoing work. If you keep your localization work local, others cannot benefit from your work and effort, and your software and content will be that much harder to localize.
|||
=== Tools === |
|||
Some example tools include [http://pootle.wordforge.org/ pootle], [http://kbabel.kde.org/ kbabel] and rosetta. |
|||
Most software uses the GNU “gettext” libraries and standard .po files, including Sugar; Firefox and OpenOffice have their own systems for historical reasons. [http://www.wordforge.org/drupal/ Wordforge] is a good place to get plugged into tools and the community efforts. |
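For readers who have not seen a .po file, the sketch below reads a tiny catalog of <code>msgid</code>/<code>msgstr</code> pairs and looks up a translation, falling back to the original string when none exists. It is a deliberately simplified reader that only handles single-line entries without escapes; real projects should use the gettext tools. The sample strings and function names are assumptions for illustration.

```python
import re

# A tiny, made-up .po fragment: one translated entry, one untranslated entry.
PO_SAMPLE = '''
msgid "Hello"
msgstr "Hola"

msgid "Goodbye"
msgstr ""
'''

# Matches a single-line msgid immediately followed by a single-line msgstr.
ENTRY = re.compile(r'msgid "(.*)"\nmsgstr "(.*)"')


def load_catalog(po_text):
    """Return {msgid: msgstr} for entries that actually have a translation."""
    return {msgid: msgstr
            for msgid, msgstr in ENTRY.findall(po_text)
            if msgid and msgstr}


def translate(catalog, message):
    """Look up a message, falling back to the untranslated text."""
    return catalog.get(message, message)


catalog = load_catalog(PO_SAMPLE)
print(translate(catalog, "Hello"), translate(catalog, "Goodbye"))
```

The fallback mirrors how gettext-based software behaves when a string has not been translated yet: the user sees the original text.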
|||
The [http://www.unicode.org/cldr cldr project] is worth watching, though OpenOffice is the first major project using this. |
|||
Remember, contribute your translations to the “upstream” projects to minimize long term effort: share your work with the world. Do not presume that if one Linux distribution has your effort that you are finished; some Linux distributions are not good about working with the community that builds and distributes the original software. |
|||
=== Licensing === |
|||
Translated strings will often be useful to many projects, not just the project you are translating. Since the MIT/BSD (3 clause) licenses are usable by all projects, they are the safest licenses to use for translations to enable the widest sharing.
|||
The [http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL SIL OFL license] is recommended for fonts. An often overlooked issue with fonts is that they are embedded in documents themselves (for example, in PDF documents), so licensing needs to be considered carefully.
|||
: See also [[Software licensing]] |
|||
== Next Steps == |
|||
Localization is by nature local, but languages often cross borders. Please contact [[User:Jg|Jim Gettys]] to identify issues.
|||
We need identified people and organizations responsible for language, translation, keyboards, and speech synthesis, and effective free software community leaders to help with local deployment and "on the ground" knowledge.
|||
=== Sugar Localization === |
|||
Sugar and Sugar applications use standard .po files, and can be localized using the usual [[#Tools|tools]].
|||
=== General Linux Localization === |
|||
By looking at the [http://www.gnome.org/i18n/ gnome], [http://www.mozilla.org/projects/l10n/mlp.html mozilla], [http://contributing.openoffice.org/native-lang.html OpenOffice], [http://l10n.kde.org/ KDE] projects, you can get plugged into translating other Linux software of general interest. |
|||
== Current l10n projects == |
|||
=== library exchange === |
|||
* [[Localization/Library|Library strings]] -- header and descriptive strings for an [http://dev.laptop.org/pub/content/Library/ OLPC sample library]. Includes some ''PO-like'' strings for the following sections: |
|||
** [[Localization/Library/sidebar po|sidebar]] — en | es | ko | pt |
|||
** [[Localization/Library/biology po|biology]] — en | es | ko — '''to review:''' pt |
|||
** [[Localization/Library/books po|books]] — en | es | ko — '''''wanted:''''' pt |
|||
** [[Localization/Library/games po|games]] — en | es | ko | pt |
|||
** [[Localization/Library/nature po|nature]] — en | es | ko | pt |
|||
** [[Localization/Library/atlas po|atlas]] — en | es | ko | pt |
|||
* [[Localization/www.laptop.org]] -- The l10n effort for the [http://www.laptop.org new www.laptop.org website] |
|||
* We can't translate everything, but we sure want to hear what you would like to see translated into your language. If you have a [[Translating#suggested translations|translation to suggest]], please let us know!
|||
=== activities === |
|||
Add / include links to upstream localization where appropriate. |
|||
* [[Localization/Library/camera po|camera]] — en | es | ko | pt | zh-CN |
|||
* web? |
|||
* read? |
|||
* write? |
|||
* blockparty? |
|||
; See also : [[Translators]] & [[Translating]] for the localization of this wiki. |
|||
== i18n & l10n == |
|||
The following table is focused on the list of languages present in the currently 'green status' countries ({{Status green countries}}). Countries with other 'status' may benefit from efforts for the 'green languages', plus add their own set of languages. Each language must be fully supported for the [[Localization]] effort. |
|||
{| border="1" cellspacing="0" |
|||
|- |
|||
! Language !! Green Countries !! Red Countries !! Orange |
|||
|- valign="top" |
|||
| [[Arabic]] |
|||
| [[OLPC Libya|Libya]] |
|||
| |
|||
| <font size="-1">Bahrain, [[OLPC Egypt|Egypt]], Iraq ([http://en.wikipedia.org/wiki/Irak +]), [[OLPC Israel|Israel]] ([http://en.wikipedia.org/wiki/Israel +]), Jordan, Kuwait, Lebanon ([http://en.wikipedia.org/wiki/Lebanon#Languages +]), Morocco, Oman, Palestine, Saudi Arabia, Sudan ([http://en.wikipedia.org/wiki/Sudan#Official_languages +]), Syria ([http://en.wikipedia.org/wiki/Syria#Languages +]), Tunisia, Yemen</font> |
|||
|- valign="top" |
|||
| [[English]] |
|||
| [[OLPC Nigeria|Nigeria]],<br>[[OLPC Rwanda|Rwanda]],<br>[[OLPC USA|USA]] ([http://en.wikipedia.org/wiki/USA#Languages +]) |
|||
| <font size="-1">Belize ([http://en.wikipedia.org/wiki/Belize +]), [[OLPC Pakistan|Pakistan]] ([http://en.wikipedia.org/wiki/Pakistan +]), [[OLPC Philippines|Philippines]] ([http://en.wikipedia.org/wiki/Philippines#Languages +])</font> |
|||
| <font size="-1">Canada ([http://en.wikipedia.org/wiki/Canada +]), Gambia, Guyana, [[OLPC India|India]] ([http://en.wikipedia.org/wiki/India +]), [[OLPC Kenya|Kenya]] ([http://en.wikipedia.org/wiki/Kenya +]), Mauritius ([http://en.wikipedia.org/wiki/Mauritius +]), Namibia ([http://en.wikipedia.org/wiki/Namibia +]), Saint Kitts and Nevis, Sierra Leone, Singapore ([http://en.wikipedia.org/wiki/Singapore#Languages +]), [[OLPC South Africa|South Africa]] ([http://en.wikipedia.org/wiki/South_Africa#Languages +]), St. Lucia, Trinidad and Tobago, Uganda ([http://en.wikipedia.org/wiki/Uganda +]), Zimbabwe ([http://en.wikipedia.org/wiki/Zimbabwe#Language +])</font> |
|||
|- valign="top" |
|||
| [[French]] |
|||
| [[OLPC Rwanda|Rwanda]] |
|||
| <font size="-1">Haiti ([http://en.wikipedia.org/wiki/Haitian_Creole_language +])</font> |
|||
| <font size="-1">[[OLPC Benin|Benin]], Cameroon ([http://en.wikipedia.org/wiki/Cameroon +]), Democratic Republic of the Congo ([http://en.wikipedia.org/wiki/Democratic_Republic_of_the_Congo#Languages +]), Gabon, Mali, Niger, Senegal, St. Martin ([http://en.wikipedia.org/wiki/St._Martin +]), Togo</font> |
|||
|- valign="top" |
|||
| [[Hausa]] |
|||
| [[OLPC Nigeria|Nigeria]] |
|||
|- valign="top" |
|||
| [[Igbo]] |
|||
| [[OLPC Nigeria|Nigeria]] |
|||
|- valign="top" |
|||
| [[Kinyarwanda]] |
|||
| [[OLPC Rwanda|Rwanda]] |
|||
|- valign="top" |
|||
| [[Portuguese]] |
|||
| [[OLPC Brazil|Brazil]] |
|||
| <font size="-1">Angola</font> |
|||
| <font size="-1">Mozambique, Portugal, São Tomé and Príncipe</font> |
|||
|- valign="top" |
|||
| [[Spanish]] |
|||
| [[OLPC Argentina|Argentina]],<br>[[OLPC Peru|Peru]] ([http://en.wikipedia.org/wiki/Peru +]),<br>[[OLPC Uruguay|Uruguay]],<br>[[OLPC USA|USA]] ([http://en.wikipedia.org/wiki/USA#Languages +]) |
|||
| <font size="-1">Belize, Costa Rica, Dominican Republic, El Salvador, Guatemala ([http://en.wikipedia.org/wiki/Guatemala#Language +]), Honduras, [[OLPC Mexico|México]] ([http://en.wikipedia.org/wiki/Mexico#Languages +]), Nicaragua, Panamá</font> |
|||
| <font size="-1">Bolivia ([http://en.wikipedia.org/wiki/Bolivia +]), [[OLPC Chile|Chile]], [[OLPC Colombia|Colombia]], Cuba, [[OLPC Ecuador|Ecuador]], Paraguay ([http://en.wikipedia.org/wiki/Paraguay +]), Puerto Rico ([http://en.wikipedia.org/wiki/Puerto_Rico#Languages +]), Spain ([http://en.wikipedia.org/wiki/Spain#Languages +]), Venezuela ([http://en.wikipedia.org/wiki/Venezuela +])</font> |
|||
|- valign="top" |
|||
| [[Thai]] |
|||
| [[OLPC Thailand|Thailand]] |
|||
|- valign="top" |
|||
| [[Yoruba]] |
|||
| [[OLPC Nigeria|Nigeria]] |
|||
|- valign="top" |
|||
| colspan="2" | Other non-green languages |
|||
| <font size="-1">[[OLPC Ethiopia|Ethiopia]], Indonesia, [[OLPC Philippines|Philippines]] ([http://en.wikipedia.org/wiki/Philippines#Languages +]), [[OLPC Pakistan|Pakistan]] ([http://en.wikipedia.org/wiki/Pakistan +]), Vietnam</font> |
|||
| <font size="-1">Afghanistan, [[OLPC Albania|Albania]], Armenia, Azerbaijan, Bangladesh, [[OLPC Bhutan|Bhutan]] ([http://en.wikipedia.org/wiki/Bhutan +]), Bosnia and Herzegovina, [[OLPC Cambodia|Cambodia]], [[OLPC China|China]] ([http://en.wikipedia.org/wiki/China#Languages +]), Croatia, [[OLPC Cyprus|Cyprus]], Eritrea, Estonia, Georgia, [[OLPC Greece|Greece]], Hungary, Iceland, [[OLPC India|India]] ([http://en.wikipedia.org/wiki/India +]), Iran, Italy, [[OLPC Japan|Japan]], Kyrgyzstan, Latvia, Lithuania, Macedonia, Malaysia, Moldova, [[OLPC Mongolia|Mongolia]], Romania, [[OLPC Russia|Russia]], Slovenia, [[OLPC Korea|South Korea]], [[OLPC Sri Lanka|Sri Lanka]], Tajikistan, Tanzania, Turkey, Ukraine, Uzbekistan, Vatican City</font> |
|||
|} |
|||
The following table presents, on a per-country basis, the target languages that must be considered for the [[Localization]] effort of the countries with 'green status' ({{Status green countries}}).
|||
{| style="text-align:top; " |
|||
|- style="background:grey; " |
|||
! Country !! Target Languages !! Major/important languages !! Minor/relevant languages
|||
|- valign="top" |
|||
| [[OLPC Argentina|Argentina]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=AR EthnologueAR]</font> |
|||
| [spa] [[Spanish]] |
|||
| <font size="-1">[quh] Quechua (0.85M - 2.1%)</font> |
|||
| See [[OLPC Argentina/Languages]] |
|||
|- valign="top" style="background:lightgrey; " |
|||
| [[OLPC Brazil|Brazil]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=BR EthnologueBR]</font> |
|||
| [por] [[Portuguese]] |
|||
| colspan="2" | ''none reported by [http://www.ethnologue.org/show_country.asp?name=BR Ethnologue BR] above 50,000 speakers.'' |
|||
|- valign="top" |
|||
| [[OLPC Libya|Libya]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=LY EthnologueLY]</font> |
|||
| [arb] [[Arabic|Arabic, Standard]] |
|||
| <font size="-1">[ayl] Arabic, Libyan Spoken (4.2M - 75%),<br>[jbn] Nafusi (0.14M - 2.5%)</font> |
|||
| <font size="-1">[rmt] Domari (0.03M - 0.6%)</font> |
|||
|- valign="top" style="background:lightgrey; " |
|||
| [[OLPC Nigeria|Nigeria]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=NG EthnologueNG]</font> |
|||
| [eng] [[English]],<br>[hau] [[Hausa]]<font size="-1"><br>—(18.5M - 13.5%)</font>,<br>[yor] [[Yoruba]]<font size="-1"><br>—(18.9M - 13.8%)</font> |
|||
| <font size="-1">[bin] [[Edo]] (1.0M - 0.7%) official,<br>[efi] [[Efik]] (0.4M - 0.3%) official,<br>[fub] [[Adamawa Fulfulde|Fulfulde, Adamawa]] (7.6M - 5.6%) official,<br>[fuv] Fulfulde, Nigerian (1.7M - 1.2%),<br>[ibb] Ibibio (1.5M to 2.0M - 1.0-1.5%),<br>[idu] [[Idoma]] (0.6M - 0.4%) official,<br>[ibo] [[Igbo]] (18.0M - 13.1%) official,<br>[knc] [[Central Kanuri|Kanuri, Central]] (3.0M - 2.2%) official,<br>[tiv] Tiv (2.2M - 1.6%)</font> |
|||
| See [[OLPC Nigeria/Languages]] |
|||
|- valign="top" |
|||
| [[OLPC Peru|Peru]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=PE EthnologuePE]</font>
|||
| [spa] [[Spanish]] |
|||
| <font size="-1">''pending''</font> |
|||
| See [[OLPC Peru/Languages]] |
|||
|- valign="top" style="background:lightgrey; " |
|||
| [[OLPC Rwanda|Rwanda]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=RW EthnologueRW]</font> |
|||
| [kin] [[Kinyarwanda]],<br>[fra] [[French]],<br>[eng] [[English]] |
|||
| <font size="-1">[swh] Swahili (0.01M - 1.3%)</font>
|||
| |
|||
|- valign="top" |
|||
| [[OLPC Thailand|Thailand]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=TH EthnologueTH]</font> |
|||
| [[Thai]] (dialects?) |
|||
| <font size="-1">[nan] Chinese, Min Nan (1.1M - 1.7%),<br>[kxm] Khmer, Northern (1.1M - 1.8%),<br>[mfa] Malay, Pattani (3.1M - 4.8%),<br>[tha] Thai (20.2M - 32%),<br>[tts] Thai, Northeastern (15.0M - 23%),<br>[nod] Thai, Northern (6.0M - 9.2%),<br>[sou] Thai, Southern (5.0M - 7.7%)</font> |
|||
| <font size="-1">[ksw] Karen, S'gaw (0.3M - 0.5%),<br>[kdt] Kuy (0.3M - 0.5%)</font> |
|||
|- valign="top" style="background:lightgrey; " |
|||
| [[OLPC Uruguay|Uruguay]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=UY EthnologueUY]</font> |
|||
| [spa] [[Spanish]] |
|||
| colspan="2" | ''none other reported by [http://www.ethnologue.org/show_country.asp?name=UY Ethnologue UY]'' |
|||
|- valign="top" |
|||
| [[OLPC USA|USA]]<font size="-1"><br>[http://www.ethnologue.org/show_country.asp?name=US EthnologueUS]</font> |
|||
| [eng] [[English]] |
|||
| <font size="-1">[spa] Spanish (22.4M - 7.5%),<br>[___] Polish (3.4M - 1.1%),<br>[deu] German, Standard (6.1M - 2.0%),<br>[___] Arabic (3.0M - 1.0%)</font> |
|||
| <font size="-1">[___] Armenian (1.1M - 0.4%),<br>[___] Chinese (1.6M - 0.5%),<br>[___] Czech (1.5M - 0.5%),<br>[___] Eastern Yiddish (1.3M - 0.4%),<br>[___] French (1.1M - 0.4%),<br>[frc] French, Cajun (1.0M - 0.3%),<br>[hwc] Hawai'i Creole English (0.6M - 0.2%),<br>[___] Italian (0.9M - 0.3%),<br>[___] Japanese (0.8M - 0.3%),<br>[___] Korean (1.8M - 0.6%),<br>[___] Philippines (1.4M - 0.5%),<br>[___] Portuguese (1.3M - 0.4%),<br>[___] Swedish (0.6M - 0.2%),<br>[___] Ukrainian (0.8M - 0.3%),<br>[___] Vietnamese (0.9M - 0.3%),<br>[___] Vlax Romani (0.7M - 0.2%),<br>[___] Western Farsi (0.9M - 0.3%)</font> |
|||
|} |
|||
== Country groups and descriptions == |
|||
* [[OLPC Albania | Albania]] |
|||
* [[OLPC Argentina/l10n | Argentina]] |
|||
* [[OLPC Austria | Austria]] |
|||
* [[OLPC Brazil | Brazil]] |
|||
* [[OLPC China | China]] |
|||
* [[OLPC Colombia | Colombia]] |
|||
* [[OLPC Egypt | Egypt]] |
|||
* [[OLPC Ethiopia | Ethiopia]] |
|||
* [[OLPC Spain | Spain ]] |
|||
* [[OLPC France | France]] |
|||
* [[OLPC Germany | Germany]] |
|||
* [[OLPC Greece | Greece]] |
|||
* [[OLPC India | India]] |
|||
* [[OLPC Kenya | Kenya]] |
|||
* [[OLPC Korea | 한국 (S.Korea)]] |
|||
* [[OLPC Korea | 조선 (N.Korea)]] |
|||
* [[OLPC Laos | Laos]] |
|||
* [[OLPC Libya | Libya]] |
|||
* [[OLPC Nepal | Nepal]] |
|||
* [[OLPC Nigeria | Nigeria]] |
|||
* [[OLPC Poland | Poland]] |
|||
* [http://www.olpc.ro/index.php/Main_Page Romania] |
|||
* [[OLPC Russia | Russia]] |
|||
* [[OLPC South Africa | South Africa]] |
|||
* [[OLPC Sri Lanka | Sri Lanka]] |
|||
* [[OLPC Thailand | Thailand]] |
|||
* [[OLPC Uruguay | Uruguay]] |
|||
== Korean-based Communities == |
|||
[[Image:korea_map.gif|right]] |
|||
People who use Korean as their native language live in South Korea (한국인) and North Korea (조선인). Some Chinese and people of other nationalities living in the northern part of Korea also use Korean as their second language, for historical reasons. They are called 고려인 (Korea-in) and 조선족 (Chosun-zok, or Korean Chinese) respectively.
|||
Currently [[OLPC Korea]] (or [[OLPC Korea|XO Korea]]) is covering all those nations and regions. In the near future, we hope there will be regional XO groups for those.
|||
[[Category:Countries]] |
|||
[[Category:Language support]] |
|||
[[Category:Languages (international)]] |
Revision as of 21:35, 10 June 2007
Internationalization technology is the technology for representing and composing the languages spoken, taught or used in your countries. Localization is the process of taking software or content and adapting it for local use.
Localization involves fonts, script layout, input methods, speech synthesis, musical instrumentation, collating order, number & date formats, dictionaries, and spelling checkers, among other issues.
Linux is already more widely localized than Microsoft Windows since no cooperation from a vendor is required to do so: having said this, cooperation with the free software and content community is vital to reduce overall work required.
The size of the problem is huge. Ethnologue has extensive information on the languages of the world.
- See also Wikipedia's definition
This is an outline of (some of) the core topics and tools, and issues of localization.
Basic Localization Topics
Character Sets
Unicode is fully supported in “modern” applications and toolkits used in free software. Legacy character set support also present, but modern applications use Unicode.
Collation order (the sorting order when text is sorted by Linux) is generally well supported in the C library.
- See also: Category:Fonts, Unicode.
Script Layout
OLPC primarily concentrates on using the Pango library, which is able to layout most of the “hard” languages, including: Arabic, the Indic languages, Hebrew, Persian, Thai, etc. It has a modular puggable layout engine and supports vertical text, as well as supporting bi-directional layout. Overall, some issues remain – but overall Pango is in pretty good shape and can handle most scripts already.
- See also: Category:Languages (international)
Fonts
To share content and preserve cultural heritage OLPC's goal must be and is for full coverage of all the world's languages. By using the Fontconfig system Linux has a better concept of language coverage of fonts than other systems. This system is used to configure the font system and determine what set of fonts are needed to cover a set of languages.
The formats of fonts supported on Linux include OpenType, TrueType and many others: see Freetype for details. Most of the current font formats supported by Freetype are obsolete, and by far the best results on the screen will be had from OpenType and TrueType format fonts. Type 1 fonts are useful primarily for printing; the renderer for Type1 fonts in Freetype we have today is not very good, and Type 1 does not support programmatic hinting for low resolution screens.
OLPC itself has a relatively high resolution screen; this helps us considerably, particularly in grayscale mode at 200DPI.
Free Fonts
Free fonts are available for most scripts in the world, though some fonts are licensed incorrectly for completely free redistribution.
Need for Screen Fonts
Regardless of the XOs resolution, we also need our applications and content to be usable on other screens everywhere, so we need to work together on extending the coverage we have today on high quality screen fonts. The "DejaVu" font family (derived from Bitstream Vera) covers most Latin alphabets and some other languages. This family has in general good "hinting" for screen use.
SIL International also builds fonts for a number of additional languages of local interest.
Helping with these or other efforts to build fonts or to increase coverage of existing fonts is greatly appreciated. Pooling efforts on hinting glyphs, which is boring but important work, and/or donations and buyouts are also being investigated.
Keyboards
OLPC Keyboard layouts document OLPC's currently available keyboard layouts: further layouts are a modest amount of work, requiring people with local expertise to work with OLPC staff to generate new layouts.
- See also: Category:Keyboard, HIG-Input Systems-Keyboard
Input Methods
An input method is software that allows typing of complex characters, for example for languages such as Chinese, Japanese, Korean. Some issues remain, for example: Arabic ligatures, by avoiding putting them on the keyboard we've avoided the need for an input method. However, such workarounds may not be feasible for your language.
Free software systems now are using SCIM - Smart Common Input Method Platform. SCIM is replacing older input method systems.
We need to know what languages are taught as “foreign” languages, as well as are native, to design keyboards that are most useful in each country. For example, the Nigerian keyboard is designed to allow easy entry of English, Hausa, and Yoruba, which are common languages in much of Nigeria. The "US/International" covers most of the western European languages.
- See also: Input methods, HIG-Input Systems
Accessibility and Usability
Speech Synthesis
There are tradeoffs of size vs. fidelity vs. effort to synthesize a new language between the speech synthesis software that is available, which includes festival, flite, espeak are available.
Espeak is small enough for us to often bundle and covers quite a few languages: ~10 languages currently supported tuned by native speakers with 10 more languages underway.
Synthesis is essential or accessibility to content by people with vision problems, and will need to be integrated with the ATK library used, as well as literacy training, other uses as part of a GUI. Full localization therefore involves selection of a suitable synthesis system and integration into the ATK framework, along with localization of that system for the particular language involved.
Speech synthesis is usually not a good guide for pronunciation learning languages – but it may be better than a poor teacher who has never had the opportunity to learn from a native speaker of that language.
- See also Category:Accessibility
Music and Sound Samples
We want much more than dead white male western instruments for dead white male composers!
Clean samples of your musical instruments and music needed!
Samples need appropriate licensing terms.
- See also TamTam: Sounds
Dictionaries, Spelling Checkers, Thesaurus
Support exists for most major languages.
Spelling, Hyphenation, Thesaurus dictionaries may be needed for different parts of Linux, which may or may not apply to OLPC directly; for example you can check:
Of these, the first three are most immediately interesting to OLPC.
Character Recognition
Stroke/character recognizer localization is of some interest with the pen/tablet; in the future (Gen 2), when we have a touch screen, it will become essential. xstroke is one such individual character/stroke recognizer, sufficient for alphabets of up to about 100 characters.
Considerations
Current Shortcomings
There are some real shortcomings where help is needed. These include:
- Non-Gregorian calendars
- Non-Latin digits (Roozbeh Pournader has patches, but these are not yet integrated and may need help).
- and the sheer scale of the localization problem will eventually require changes in free software projects.
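To make the non-Latin digits item concrete, here is a minimal Python sketch of what digit localization means at the string level. It is not the approach taken in the patches mentioned above; the function and constant names are hypothetical, and real support belongs in the locale and toolkit layers rather than in each application.

```python
# Digit sets built from their Unicode code points.
LATIN_DIGITS = "0123456789"
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))  # U+0660..U+0669
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))       # U+06F0..U+06F9


def localize_digits(text, target_digits):
    """Replace each Latin digit in text with the digit from target_digits."""
    table = str.maketrans(LATIN_DIGITS, target_digits)
    return text.translate(table)


if __name__ == "__main__":
    year = localize_digits("2007", PERSIAN_DIGITS)
    print(year)
    # Python's int() accepts any Unicode decimal digits, so the
    # localized string still parses back to the same number.
    print(int(year))
```

Display is only half of the problem: input, sorting, and number formatting (grouping and decimal separators) need the same care.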
Localization Techniques
It takes only a small team to localize Linux for a language: Welsh and Icelandic, for example, which are relatively small languages, have been almost fully localized by small teams.
You can do the work yourself, hire the work out, or find volunteers among universities (worldwide), on the internet, and in the free software community. Add to existing projects whenever possible. By checking with some of the major free software projects (e.g. GNOME, OpenOffice, Mozilla, KDE), you can often locate people already at work on your language.
Work directly in the software and content projects whenever possible. This makes your work available worldwide and lessens the ongoing work. If you keep your localization work local, others cannot benefit from your work and effort, and your software and content will be that much harder to localize.
Tools
Some example tools include pootle, kbabel and rosetta. Most software, including Sugar, uses the GNU “gettext” libraries and standard .po files; Firefox and OpenOffice have their own systems for historical reasons. Wordforge is a good place to get plugged into tools and community efforts.
The CLDR project is worth watching, though OpenOffice is the first major project to use it.
Remember, contribute your translations to the “upstream” projects to minimize long term effort: share your work with the world. Do not presume that if one Linux distribution has your effort that you are finished; some Linux distributions are not good about working with the community that builds and distributes the original software.
Licensing
Translated strings will often be useful to many projects, not just the project you are working on translating. Since the MIT/BSD (3 clause) licenses are usable by all projects, they are the safest licenses to use for translations to enable the widest sharing.
The SIL OFL license is recommended for fonts. An often overlooked issue with fonts is that they are incorporated into documents themselves (for example, into PDF documents), so their licensing needs to be considered carefully.
- See also Software licensing
Next Steps
Localization is by nature local, but languages often cross borders. Please contact Jim Gettys to identify issues.
We need identified people and organizations responsible for language, translation, keyboards, and speech synthesis, as well as effective free software community leaders to help with local deployment and "on the ground" knowledge.
Sugar Localization
Sugar and Sugar applications use standard .po files and can be localized using the usual tools.
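The sketch below shows, under stated assumptions, how a Python program typically looks up strings through gettext. The domain name "example-activity", the directory layout, and the sample strings are hypothetical and are not taken from Sugar's actual source. With `fallback=True`, a missing catalog simply yields the original English string, so untranslated languages degrade gracefully.

```python
import gettext

# A translator works on a .po file containing entries such as:
#
#   msgid "Take a photo"
#   msgstr "Tomar una foto"
#
# The .po file is compiled to a .mo file (for example with msgfmt) and
# placed at <localedir>/<language>/LC_MESSAGES/<domain>.mo.


def load_translator(domain, localedir, language):
    """Return a lookup function for one language.

    If no compiled catalog exists for the language, fallback=True makes
    gettext return the original string unchanged instead of raising.
    """
    catalog = gettext.translation(
        domain,
        localedir=localedir,
        languages=[language],
        fallback=True,
    )
    return catalog.gettext


if __name__ == "__main__":
    # "locale" is a hypothetical directory next to the program.
    _ = load_translator("example-activity", "locale", "es")
    print(_("Take a photo"))
```

Because the source code only ever contains the msgid strings, translators can add a language by contributing a .po file without touching the program itself.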
General Linux Localization
By looking at the GNOME, Mozilla, OpenOffice, and KDE projects, you can get plugged into translating other Linux software of general interest.
Current l10n projects
library exchange
- Library strings -- header and descriptive strings for an OLPC sample library. Includes some PO-like strings for the following sections:
- Localization/www.laptop.org -- The l10n effort for the new www.laptop.org website
- We can't translate everything, but we do want to hear what you would like to see translated into your language. If you have a translation to suggest, please let us know!
activities
Add / include links to upstream localization where appropriate.
- camera — en | es | ko | pt | zh-CN
- web?
- read?
- write?
- blockparty?
- See also
- Translators & Translating for the localization of this wiki.
i18n & l10n
The following table focuses on the languages present in the countries that currently have 'green status' (Argentina, Brazil, Ethiopia, India, Libya, Nepal, Nigeria, Pakistan, Peru, Romania, Russia, Rwanda, Thailand, United States, Uruguay). Countries with another status may benefit from efforts for the 'green languages' and add their own set of languages. Each language must be fully supported for the localization effort.
Language | Green Countries | Red Countries | Orange |
---|---|---|---|
Arabic | Libya | Bahrain, Egypt, Iraq (+), Israel (+), Jordan, Kuwait, Lebanon (+), Morocco, Oman, Palestine, Saudi Arabia, Sudan (+), Syria (+), Tunisia, Yemen | |
English | Nigeria, Rwanda, USA (+) | Belize (+), Pakistan (+), Philippines (+) | Canada (+), Gambia, Guyana, India (+), Kenya (+), Mauritius (+), Namibia (+), Saint Kitts and Nevis, Sierra Leone, Singapore (+), South Africa (+), St. Lucia, Trinidad and Tobago, Uganda (+), Zimbabwe (+) |
French | Rwanda | Haiti (+) | Benin, Cameroon (+), Democratic Republic of the Congo (+), Gabon, Mali, Niger, Senegal, St. Martin (+), Togo |
Hausa | Nigeria | ||
Igbo | Nigeria | ||
Kinyarwanda | Rwanda | ||
Portuguese | Brazil | Angola | Mozambique, Portugal, São Tomé and Príncipe |
Spanish | Argentina, Peru (+), Uruguay, USA (+) | Belize, Costa Rica, Dominican Republic, El Salvador, Guatemala (+), Honduras, México (+), Nicaragua, Panamá | Bolivia (+), Chile, Colombia, Cuba, Ecuador, Paraguay (+), Puerto Rico (+), Spain (+), Venezuela (+) |
Thai | Thailand | ||
Yoruba | Nigeria | ||
Other non-green languages | Ethiopia, Indonesia, Philippines (+), Pakistan (+), Vietnam | Afghanistan, Albania, Armenia, Azerbaijan, Bangladesh, Bhutan (+), Bosnia and Herzegovina, Cambodia, China (+), Croatia, Cyprus, Eritrea, Estonia, Georgia, Greece, Hungary, Iceland, India (+), Iran, Italy, Japan, Kyrgyzstan, Latvia, Lithuania, Macedonia, Malaysia, Moldova, Mongolia, Romania, Russia, Slovenia, South Korea, Sri Lanka, Tajikistan, Tanzania, Turkey, Ukraine, Uzbekistan, Vatican City |
The following table presents, on a per-country basis, the target languages that must be considered for the localization effort in the countries with 'green status' (Argentina, Brazil, Ethiopia, India, Libya, Nepal, Nigeria, Pakistan, Peru, Romania, Russia, Rwanda, Thailand, United States, Uruguay).
Country | Target Languages | Major/important languages | Minor/relevant languages |
---|---|---|---|
Argentina EthnologueAR | [spa] Spanish | [quh] Quechua (0.85M - 2.1%) | See OLPC Argentina/Languages |
Brazil EthnologueBR | [por] Portuguese | none reported by Ethnologue BR above 50,000 speakers. | |
Libya EthnologueLY | [arb] Arabic, Standard | [ayl] Arabic, Libyan Spoken (4.2M - 75%), [jbn] Nafusi (0.14M - 2.5%) | [rmt] Domari (0.03M - 0.6%) |
Nigeria EthnologueNG | [eng] English, [hau] Hausa (18.5M - 13.5%), [yor] Yoruba (18.9M - 13.8%) | [bin] Edo (1.0M - 0.7%) official, [efi] Efik (0.4M - 0.3%) official, [fub] Fulfulde, Adamawa (7.6M - 5.6%) official, [fuv] Fulfulde, Nigerian (1.7M - 1.2%), [ibb] Ibibio (1.5M to 2.0M - 1.0-1.5%), [idu] Idoma (0.6M - 0.4%) official, [ibo] Igbo (18.0M - 13.1%) official, [knc] Kanuri, Central (3.0M - 2.2%) official, [tiv] Tiv (2.2M - 1.6%) | See OLPC Nigeria/Languages |
Peru EthnologueNG | [spa] Spanish | pending | See OLPC Peru/Languages |
Rwanda EthnologueRW | [kin] Kinyarwanda, [fra] French, [eng] English | [swh] Swahili (0.01M - 1.3%) | |
Thailand EthnologueTH | Thai (dialects?) | [nan] Chinese, Min Nan (1.1M - 1.7%), [kxm] Khmer, Northern (1.1M - 1.8%), [mfa] Malay, Pattani (3.1M - 4.8%), [tha] Thai (20.2M - 32%), [tts] Thai, Northeastern (15.0M - 23%), [nod] Thai, Northern (6.0M - 9.2%), [sou] Thai, Southern (5.0M - 7.7%) | [ksw] Karen, S'gaw (0.3M - 0.5%), [kdt] Kuy (0.3M - 0.5%) |
Uruguay EthnologueUY | [spa] Spanish | none other reported by Ethnologue UY | |
USA EthnologueUS | [eng] English | [spa] Spanish (22.4M - 7.5%), [___] Polish (3.4M - 1.1%), [deu] German, Standard (6.1M - 2.0%), [___] Arabic (3.0M - 1.0%) | [___] Armenian (1.1M - 0.4%), [___] Chinese (1.6M - 0.5%), [___] Czech (1.5M - 0.5%), [___] Eastern Yiddish (1.3M - 0.4%), [___] French (1.1M - 0.4%), [frc] French, Cajun (1.0M - 0.3%), [hwc] Hawai'i Creole English (0.6M - 0.2%), [___] Italian (0.9M - 0.3%), [___] Japanese (0.8M - 0.3%), [___] Korean (1.8M - 0.6%), [___] Philippines (1.4M - 0.5%), [___] Portuguese (1.3M - 0.4%), [___] Swedish (0.6M - 0.2%), [___] Ukrainian (0.8M - 0.3%), [___] Vietnamese (0.9M - 0.3%), [___] Vlax Romani (0.7M - 0.2%), [___] Western Farsi (0.9M - 0.3%) |
Country groups and descriptions
- Albania
- Argentina
- Austria
- Brazil
- China
- Colombia
- Egypt
- Ethiopia
- Spain
- France
- Germany
- Greece
- India
- Kenya
- 한국 (S.Korea)
- 조선 (N.Korea)
- Laos
- Libya
- Nepal
- Nigeria
- Poland
- Romania
- Russia
- South Africa
- Sri Lanka
- Thailand
- Uruguay
Korean-based Communities
People who use Korean as their native language live in South Korea (한국인) and North Korea (조선인). For historical reasons, some Chinese and people of other nationalities living in the northern part of Korea also use Korean as their second language. They are called 고려인 (Korea-in) and 조선족 (Chosun-zok, or Korean Chinese) respectively.
Currently OLPC Korea (or XO Korea) covers all of those nations and regions. In the near future, we hope there will be regional XO groups for them.