Bityi/GSoC: Difference between revisions

Revision as of 17:29, 25 March 2008

Note:

This page is currently under heavy work. I plan to have a presentable version by 23:59 UTC, March 25th, and probably 6-8 hours before then.

The basic idea

Why can you only program in English-based programming languages? Why can't people program in something much closer to their own natural language, and still have the resulting program be fully portable?

A use-case answer

Pepito wants to program his XO. He opens Develop and creates a new activity based on the new-activity template. The file is in English on disk, but he sees it and edits it in Spanish. When he copies and pastes in some English example code, it switches to Spanish automatically.

He adds "importa xml" to a file, and, since the xml module already has a translation, he can use the functions in that module by their Spanish names, which he can see in a module browser. However, he had already defined a variable called "analiza" (Spanish for "parse"), so, to avoid a naming conflict, his variable is shown as "mymodule__analiza" on-screen (and stays "es__analiza" on disk, as it was before the import), while the xml function is shown as "xml__analiza" on-screen (and is "parse" on disk, of course). By right-clicking on mymodule__analiza, he gets a context menu for setting the translation, and he selects the option to harmonize the translation; now his variable is called "parse" on disk.

Now he wants to use some example code he found which uses the cgi module. He adds "importa cgi", but the cgi module has no translations yet. He copies and pastes in the example code, and it stays in English. Then he right-clicks on a word to add a translation. By scanning the imported modules, his computer can guess that the translation should be associated with the cgi module, so it puts that module at the top of the list of places where the translation could be added. He chooses that module and gets a dialog for adding his translation (seeded, hopefully, with guesses from local and remote dictionaries). He chooses a reasonable translation, and, with another click, his choice is uploaded to a central server so that others can use it.

Now he wants to look up documentation for a function. Suspecting that his question is too specialized to have an answer in Spanish, he hovers the mouse over the Spanish function name and gets a tooltip with the English name, so that he can search for it on Google. Or, if he prefers, he can use a simple menu option to switch his whole view to English and get the Spanish only in the tooltips.

Later, he sends the module he created to his friend Janinha in Brazil. She imports it and adds Portuguese translations to the functions she uses, but leaves his internal variables untranslated because she doesn't care about them.

Why does OLPC need this?

If you intend to have a "view source" key that lets any user modify their applications (and more); if you are shipping to the non-English-speaking world; and if the target audience is ALL children; then you need this. Arguably, any two of these factors would not require localized coding; but the combination of all three does.

But...

  • But many non-native English speakers are programmers, and they will generally tell you that English was not a barrier to learning programming for them. After all, "raise string(variable)", while it is composed of English words, makes as much English sense as "lift twine(capricious)".
    • That's true, but this is a fairly self-selected group. If you intend to expose all the children in a country to programming, or to teach it to them, forcing them to do it in a foreign language is going to be a significant hurdle. Also, even if the language hurdle is, in retrospect, a minor one, it comes at the very outset of learning to program. Experience in widely varying areas shows consistently that removing initial barriers can have a disproportionate effect on participation.
  • But anyone who aspires to be a good programmer will eventually want to learn at least some English anyway.
    • Exactly. By letting more people get a taste of programming, you will let more people aspire to be good programmers. The end result will be more people learning more English, not fewer; along the way, this will also encourage viable communities based in non-English languages.
  • But this has been attempted before, and it has failed.
    • Apple tried to do something similar with AppleScript in the 90s. There are several important differences this time. That was before Wikipedia - before the principle of "many hands make light work" was really operational on the web. AppleScript was never an important language for initially learning programming, as Logo, Basic, Pascal, and now Python all are or have been. The translations only existed for the language built-ins; it was impossible to have an actual program exist in two forms. Unicode had less penetration. Etc.
      I believe that if a tool makes creating and sharing translations in real time easy, the translations will happen.

Design

For all of the below, I will use "Spanish" to refer to an arbitrary non-English user language.

A few definitions:

  • A "dictionary" is a one-to-one list of words in English and Spanish. Each Python file can have up to two dictionaries: one for the public interface (including whatever comes from imports), and one for the locally-defined private internals (not including imports).
  • A "mapping" is an (almost) one-to-one mapping from on-disk (presumably English) words to on-screen (presumably Spanish) words. (It is permissible for several on-screen words to map to one English word as long as it is unlikely that users would type any but one of them). A mapping is simply the disambiguated union of one or more "dictionaries" - the public and private dictionaries of the current file, and the public dictionaries of any direct imports.
  • "Disambiguated" means that if two dictionaries disagree on the translation for an English word (dictone:is<->es;dicttwo:is<->esta), then the Spanish shows both (dictone__es__dicttwo__esta). If they disagree on the translation for a Spanish word (dictone:do<->hacer;dicttwo:make<->hacer), then the two English words are shown with disambiguating prefixes (do<->dictone__hacer, make<->dicttwo__hacer).
  • In version 1, the public and private dictionaries are created manually but with computer help (for instance, moving something from any dictionary to the public one is trivial). All dictionary entries also point to their "master version" (either themselves or some direct or indirect import) to enable propagation of dictionary changes.
    • A later improvement would be to use pylint-like static analysis to automatically (re)build the public dictionary by adding public global variables (including classes) and the imported classes they use (superclasses, declared types on functions, and, where static analysis reveals it, instance types). Note that this analysis would also enable many intelligent features, such as argument tooltips and intelligent (Eclipse-like) auto-completion.
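
The definitions above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the project's implementation: the function name build_mapping, the representation of each dictionary as a plain Python dict from English to Spanish, and the choice to key dictionaries by their name are all hypothetical. It builds a "mapping" as the disambiguated union of several dictionaries, following the two rules given for "disambiguated".

```python
from collections import defaultdict


def build_mapping(dictionaries):
    """Return {english: on_screen} as the disambiguated union of dictionaries.

    `dictionaries` is assumed (hypothetically) to look like
    {dict_name: {english_word: spanish_word}}.
    """
    # Collect every proposed translation per English word, with its source.
    proposals = defaultdict(list)  # english -> [(dict_name, spanish), ...]
    for name, entries in dictionaries.items():
        for english, spanish in entries.items():
            proposals[english].append((name, spanish))

    # Rule 1: dictionaries disagree on one English word -> show both,
    # each prefixed by the dictionary it came from.
    candidates = {}  # english -> (on_screen, source dictionary or None)
    for english, options in proposals.items():
        distinct = {spanish for _, spanish in options}
        if len(distinct) == 1:
            source, spanish = options[0]
            candidates[english] = (spanish, source)
        else:
            joined = "__".join(f"{name}__{spanish}" for name, spanish in options)
            candidates[english] = (joined, None)

    # Rule 2: several English words share one Spanish word -> prefix each
    # on-screen word with the name of the dictionary that supplied it.
    users = defaultdict(list)  # on_screen -> [english, ...]
    for english, (on_screen, _) in candidates.items():
        users[on_screen].append(english)

    mapping = {}
    for english, (on_screen, source) in candidates.items():
        if len(users[on_screen]) > 1 and source is not None:
            mapping[english] = f"{source}__{on_screen}"
        else:
            mapping[english] = on_screen
    return mapping


# The examples from the definitions, plus one unambiguous word.
example = {
    "dictone": {"is": "es", "do": "hacer", "print": "imprime"},
    "dicttwo": {"is": "esta", "make": "hacer"},
}
mapping = build_mapping(example)
```

With this input, "is" is shown as dictone__es__dicttwo__esta, "do" and "make" are shown as dictone__hacer and dicttwo__hacer, and "print" is simply imprime. Inverting the resulting mapping would give the on-screen to on-disk direction needed when saving a file.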