Guidelines for writing for eventual translation
Writing with multiple language translations in mind
- Write and edit primary documentation according to an explicit set of writing conventions designed to minimize ambiguity and complexity, in order to facilitate translation.
- Treat this English documentation as source code which is meant to be translated/compiled into user languages.
- Use or create collaboration tools to make translation, distribution, and maintenance of docs more efficient.
Some of those doing translation will not be professional translators fully bilingual in English and the target language. They might be any of the following:
- a village teacher who speaks the target language as her first language (L1) and English as a weak second language (L2);
- a missionary who speaks English as L1 or L2 (in the case of a French missionary in Africa, for example) and the target language as a weak L3;
- a professional translator who speaks a non-English L1, reads and writes the target language as L2, and knows English as just a subject that he or she studied in school and uses for travel;
- a native L1 speaker of the target language who has immigrated to a foreign country in which English is spoken as a primary or secondary language.
Many of the translators are not going to be career translators, so rather than having the translator accommodate the source text, the source text should accommodate the translator.
Documentation translation is particularly difficult because of inconsistencies introduced while the document is being created.
Ambiguity is the biggest obstacle to translation. In the case of OLPC documentation, ambiguity should be avoided at all costs. Anything that interferes with teachers and students using the laptops should be avoided, and bad docs would certainly be frustrating and demotivating for the educators and pupils. In order to have translations that are as clear as possible, we must have source docs that are as clear as possible.
Consider documentation/translation as parallel to computer programming
The OLPC team uses English as a common working language, but the users will be using translations, so the English documentation can be seen as not a product in and of itself but as the source for all translations. The English-language "source docs" should be written to a set of conventions meant to reduce ambiguity and ensure consistency, even when doing so necessitates violating conventional English writing style. The set of documentation standards I am proposing is similar to the set of coding conventions a programmer follows. The "source docs" (though written in English) should be seen as source code which is then compiled (or translated) into the many languages needed to support the users. Likewise, the source-docs should include explicit comments and extra-textual blocks to clarify ambiguity introduced by the writing style or inherent in the language itself, much in the same way that a good programmer includes comments in source code to compensate for the lack of explanatory devices in the code itself. Looping through a multi-array doesn't tell you WHY you need to do so or how it plays into the next code block, just as being told that the subject of a sentence is "Suzuki-san" does not tell you if Suzuki is a "she" or a "he". Most techs have had the experience of having to maintain a code base which did not include sufficient comments: while "read the friendly code" or "use the source" might be good ways to learn to program, this kind of detective work is not an efficient use of time and effort.
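The idea of translator-facing comments in the source docs can be sketched in a few lines of Python. This is only an illustration under assumptions: the `{{note: ...}}` marker, the function names `reader_text` and `translator_notes`, and the detail that Suzuki is a woman are all invented for this example and are not an existing OLPC convention or tool. The sketch shows one source sentence that carries a comment for the translator, which is stripped out before English readers see the text.

```python
import re

# Hypothetical inline marker for translator notes: {{note: ...}}
NOTE = re.compile(r"\s*\{\{note:\s*(.*?)\}\}")


def reader_text(source):
    """Return the source sentence with all translator notes removed."""
    return NOTE.sub("", source)


def translator_notes(source):
    """Return the list of translator notes embedded in the source sentence."""
    return NOTE.findall(source)


# Hypothetical source sentence; the gender of Suzuki is an assumed detail.
source = 'Suzuki-san{{note: Suzuki is a woman; use "she"}} opened the laptop.'

print(reader_text(source))       # what an English reader sees
print(translator_notes(source))  # what the translator sees in addition
```

Like comments in source code, the notes never reach the end user, but they save each translator from having to guess at information that English leaves out.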
Documentation writing conventions
Some linguistic research has been done on "simplified English" as a subset of English to use for low-level learners, and it might be a good place to look for ways to simplify the source docs. But just thinking intuitively, I have cooked up the following suggestions in order to generate discussion:
- Use the first-person singular pronoun "I" to represent the author of the docs,
- the second-person singular pronoun "you" to represent the reader of the docs, and
- the first-person plural pronoun "we" to represent the OLPC project.
- Examples. "We have designed a screen that switches to black-and-white to conserve energy. I will explain how to switch your screen to black-and-white. First, you press the X button on your keyboard...." Because we want the docs to be easily translated and easily understood, the tone should be personal, using "I" for the voice of the writer. This will be easier for amateur translators to translate and easier for younger readers to understand. This will also help the writer avoid the passive construction, which is very difficult for some non-native English speakers to understand.
- Use tables to explain parallel relationships, comparisons, the composition of an entity, and categorical relationships.
- Use numbered lists to explain the stages of a process, the steps in a sequence, or anything that has an inherent spatial or temporal order or expresses precedence. Do not use numbered lists if the numbers do not relate to some inherent property of the items. A grocery list should not be numbered, unless the order in which the items are purchased is important.
- Use bulleted lists for lists that do not have inherent order or precedence. The grocery list would be bulleted.
- All comma sequences should have a comma before the last conjunction, for example, "I like to read books, eat shrimp, and run marathons," rather than, "I like to read books, eat shrimp and run marathons." It is fashionable right now to leave out the last comma, but doing so puts the onus of comprehension on the reader. While this is a nit-picky detail, OLPC source-docs should do as much of the work as possible so that translation and comprehension are as easy as possible.
- Use parentheses to include supplemental information like the gender of human agents, steps in a sequence, the target of a pronoun, etc. when there is any ambiguity.
- Many languages, including Japanese, represent non-native names in a native writing system. In Japanese, foreign names are written in a phonetic script called katakana, and my name is pronounced Kuupaa Maikeru. The result is a loss of data: the orthography of my name (the spelling in English) is lost to any Japanese-to-English translator, as is the proper pronunciation. I suggest that all source-docs have personal names written in the alphabet, followed by the pronunciation written in IPA (International Phonetic Alphabet) in parentheses. Translators should then be told to always put the original orthography in parentheses after the name that they are using, so that my name would be "<katakana>Kuupaa Maikeru</katakana> (<alpha>Micheal Cooper</alpha>)" in a Japanese translation.
- Insert a table that acts as a glossary of terms and their definitions at the beginning of each text. These would be the key nouns and verbs used in the text, terms that need to have clear meanings and consistent translations. The translators would be required to keep cumulative lists of these key terms in OO Calc or a similar spreadsheet so that, if the translator changes or a group of translators is doing the job, the key terms can be kept consistent. This area is where Pootle can be helpful - translators should refer to the strings used in the Pootle translation management system.
- Idioms and culture-specific metaphors and references should be avoided or used sparingly. Of course, terminology that originated in cultural metaphor, like "kill a process" and "reboot the server" would be treated as key terms and added to the glossary to be translated consistently, but more creative and expressive language ("you can type like a banshee", "students will be on it like white on rice", "resulting in a Mickey Mouse, vanilla solution to the problem") should be curtailed.
- Use words, mathematical symbols, and visuals to reinforce and enhance purely verbal explanations with conceptual representations of information (I am thinking Edward Tufte here). For example (a poor example, but here goes): "I will show you how to teach your students to create multimedia presentations. <in box> Sound + Pictures = Multimedia </in box>." I think you get the idea, though.
- The source-docs should be organized so that each section and each paragraph is identified by a number, and translators should be required to maintain this organization, so that paragraph 61 in the Yoruba translation is paragraph 61 in the source-docs. By doing so, it will be easier to modify the translations when changes are made to the source-docs, and an added bonus is that bad images or broken links can be replaced by people who do not read the target language. This would imply some kind of web-based app to store and manage the docs. I am looking at the way we translate in my organization (Miyazaki International College, Japan) and thinking about what would be a good online tool to coordinate translations. There are many proprietary tools with vast hordes of features and complications which cost 1-2 thousand dollars per user, but they are not suitable for OLPC. I think OLPC docs-trans would do well with a lighter, simpler application.
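The paragraph-numbering suggestion can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not an existing OLPC tool: the function names `fingerprint` and `find_stale_paragraphs`, the dictionary layout, the sample sentences, and the use of a hash to detect edits are all hypothetical. Each paragraph is stored under its number, and each translated paragraph records a fingerprint of the English text it was translated from, so that a changed source paragraph can be found without reading the target language.

```python
import hashlib


def fingerprint(text):
    """Return a short, stable fingerprint of a source paragraph."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def find_stale_paragraphs(source, translation):
    """Compare a translation against the current source docs.

    source:      {paragraph_number: english_text}
    translation: {paragraph_number: (source_fingerprint, translated_text)}

    Returns (missing, outdated, orphaned) as sorted lists of paragraph numbers.
    """
    # Paragraphs that exist in the source but were never translated.
    missing = sorted(n for n in source if n not in translation)
    # Paragraphs whose source text changed after they were translated.
    outdated = sorted(
        n for n in source
        if n in translation and translation[n][0] != fingerprint(source[n])
    )
    # Translated paragraphs whose source paragraph no longer exists.
    orphaned = sorted(n for n in translation if n not in source)
    return missing, outdated, orphaned


# Hypothetical sample data: the translation was made from old_source.
old_source = {60: "Press the power button.", 61: "Open the Draw Activity."}
yoruba = {
    n: (fingerprint(text), "<Yoruba text for paragraph %d>" % n)
    for n, text in old_source.items()
}

new_source = {
    60: "Press the power button.",
    61: "Open the Paint Activity.",  # edited after translation
    62: "Save your picture.",        # added after translation
}

missing, outdated, orphaned = find_stale_paragraphs(new_source, yoruba)
print("missing:", missing)
print("outdated:", outdated)
print("orphaned:", orphaned)
```

Because the check relies only on paragraph numbers and fingerprints, a maintainer who cannot read Yoruba can still tell the translators exactly which paragraphs need attention.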
Examples of questions from translators
Q: Various times the document refers to storing pictures on the XO. Does this mean pictures like drawings that you make on the computer, photographs from a camera, or both?
A: An excellent question. Kids can take a photo-type picture with the camera on the front of the XO... but they can also store drawn pictures when using the Draw Activity. There are also two references to "picture" on the XO screen that refer to "symbols" or "icons." Sorry about that; I'd prefer we were more consistent, especially to make your translation work easier.
Q: What do these mean? "Neighborhood View Key" "Friends View Key" Is it like a legend that explains what things mean when you are looking at something in the "neighborhood view" or in the "family view?" Or does it turn on the "neighborhood view" or the "friends view?"
A: Those terms are invented to help the user orient themselves when looking at the XO... there are basically four views, with four "symbols" in the top "frame" or "bar." The symbols are called "View Key"s. When viewing it on an emulator, I see "Neighborhood" and "Group" when I hover over the symbols that will switch me to another view. In Windows, it's similar to pressing Alt + Tab to get to another "window." Or it's as if there are four different "desktops" if you were using Windows, except they're called "views" instead of "desktops."
So... the user will see "Neighborhood" and "Group" in Spanish (Vecindario and Grupo?). I'm not sure how to emulate in Spanish so I could see what the interface itself shows. We'll need to refer to the Pootle collection of translated strings on the OLPC Wiki for the answers for each language.
Q: What does "Mesh" mean?
A: I had to ask my sysadmin husband for some help with this one. :) Mesh is what they're calling this special type of network where all the XOs become their own network "mesh." Our ordinary laptops don't work this way; they don't create a "grid" or "mesh" or "matrix" of connections to each other. The XO is special in this way, so I think you should probably just keep it as "Mesh" and leave it in English. Do the rest of y'all agree?
Q: Are some of these areas like "frame" or "Neighborhood View" actually labeled on the screen? What terms should we use for interface areas that are not labeled?
A: There's no label when you're in the "Neighborhood view" and it's actually a little odd, but you figure it out quickly - you have to move your mouse all the way to the upper corner in order to get your "buttons" or "View Keys" back in the upper "frame." So you have plenty of wiggle room here in your translation. The main thing is to let users know that there's a grey box (frame) around the screen. Frame and View are two items that are not labeled that I can see when emulating.
Originally written by Micheal Cooper, Japan, and posted to the devel-list. Edited by Anne Gentle from the writer's perspective and added to the OLPC Wiki after answering translators' questions.