Revision as of 19:38, 19 July 2008 by (Talk)

An interesting idea, and not one that had occurred to me before: on the Squeak page, someone raised the issue of translating the Smalltalk language itself for non-English speakers. Any thoughts on whether it would be desirable to do the same for Python and its attendant libraries?

I'll try to give a technical view of the feasibility of translating Python:

I don't think this is feasible for Python, though it's not totally impossible. You could create an alternate importer that compiles the translated language (mostly a matter of new keywords), and that might not even be too hard. You'd need a different file extension to trigger the custom importer, and you would indicate the language either in the extension (e.g., .py_pt) or in a marker comment (like -*- lang: pt -*-). As a result, source-level introspection tools (e.g., PyChecker) may not work, though object-level introspection tools (e.g., pydoc, aka help()) should be fine. With the new AST support in Python, if the translation system generates a proper AST, the keywords are abstracted away, so even source-level tools that operate on the AST should work.
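Here is a minimal sketch of the translation step such an importer would perform. It assumes a hypothetical Portuguese keyword table, uses Python 3's `tokenize` module so that only NAME tokens are rewritten, and leaves out the import-hook plumbing (the custom extension and the language marker). It also shows the AST point: the translated file and the equivalent English file parse to the same tree.

```python
import ast
import io
import tokenize

# Hypothetical Portuguese -> English keyword table, for illustration only.
PT_KEYWORDS = {
    "defina": "def",
    "retorne": "return",
    "se": "if",
    "senao": "else",
    "enquanto": "while",
    "para": "for",
    "em": "in",
}


def translate_source(source, table=PT_KEYWORDS):
    """Rewrite translated keywords into English Python keywords.

    Only NAME tokens are touched, so string literals and comments that happen
    to contain a translated word are left alone. One side effect: the translated
    words become reserved, so a variable called `em` would be rewritten too.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in table:
            tok = tok._replace(string=table[tok.string])
        tokens.append(tok)
    # untokenize() uses the original token positions to restore the layout.
    return tokenize.untokenize(tokens)


PT_EXAMPLE = """\
defina dobro(x):
    se x > 0:
        retorne x * 2
    senao:
        retorne 0
"""

EN_EXAMPLE = """\
def dobro(x):
    if x > 0:
        return x * 2
    else:
        return 0
"""

english = translate_source(PT_EXAMPLE)
# Both versions parse to the same AST, so AST-based tools cannot tell them apart.
same_ast = ast.dump(ast.parse(english)) == ast.dump(ast.parse(EN_EXAMPLE))
namespace = {}
exec(compile(english, "<dobro.py_pt>", "exec"), namespace)
```

A real importer would run `translate_source` inside a loader registered for the custom extension and then compile the result as usual. Tracebacks would still show the English keywords, which is part of why the source-level story stays imperfect.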

The libraries are less feasible. You could potentially extract the documentation you see when you run help(some_library_or_object), translate those strings, and provide translated source. But because the strings are frequently updated (as they must be to stay accurate), keeping the translations current is a substantial and ongoing burden.
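A rough sketch of the extraction side follows, assuming a simple catalog format of our own invention (not gettext or any existing tool). It collects the docstrings that help() would show for a module's public functions and classes. Each translation stores a hash of the English text it was made from, so a docstring that changes upstream automatically falls back to English rather than showing a stale translation.

```python
import hashlib
import inspect
import types


def collect_docstrings(module):
    """Map qualified names in `module` to their cleaned-up docstrings."""
    docs = {}
    if module.__doc__:
        docs[module.__name__] = inspect.cleandoc(module.__doc__)
    for name, obj in vars(module).items():
        # Only public functions and classes defined in this module itself.
        if name.startswith("_"):
            continue
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        if obj.__module__ != module.__name__:
            continue
        doc = inspect.getdoc(obj)
        if doc:
            docs[f"{module.__name__}.{name}"] = doc
    return docs


def fingerprint(text):
    """Hash of the English source string a translation was made from."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def translated_doc(key, english, catalog):
    """Return the translation only if it matches the current English text."""
    entry = catalog.get(key)
    if entry is not None and entry["source_hash"] == fingerprint(english):
        return entry["text"]
    # Missing or stale translation: fall back to the (accurate) English text.
    return english


# A throwaway module standing in for a real library.
demo = types.ModuleType("demo", "Demo module.")
exec(
    'def area(w, h):\n'
    '    """Return the area of a w x h rectangle."""\n'
    '    return w * h\n',
    demo.__dict__,
)

english_docs = collect_docstrings(demo)
# Hypothetical Portuguese catalog entry, keyed to the English it was made from.
catalog = {
    "demo.area": {
        "source_hash": fingerprint(english_docs["demo.area"]),
        "text": "Devolve a área de um retângulo w x h.",
    }
}
```

The hash check does not reduce the translation work; it only makes staleness visible, which is the practical problem with strings that change often.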

Actually translating the original source of the libraries is infeasible IMHO.

Also, Python identifiers, including variable names, must be ASCII strings; you cannot give a variable a Unicode name. In theory the importer could translate Unicode variable names into an encoding that is valid inside an ASCII identifier (unlike UTF-7 or UTF-8, but more like Punycode). That causes all sorts of problems. Tracebacks will show weird variable names, since they expose the encoded name, and decoding names after the fact is difficult because a variable name may be embedded in a string in a non-obvious way. ASCIIfying variable names is another option, using heuristics and relying on users not to create names that become ambiguous once reduced to ASCII. This only works well for languages written in Roman characters, though many languages have a Romanized form. Of course, children often don't know that Romanized form.
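Both options can be sketched with the standard library. This is written in Python 3, where PEP 3131 already allows non-ASCII identifiers, so it only illustrates the encoding idea. The `u_` prefix is a hypothetical marker for mangled names, and it assumes no ordinary ASCII name starts with it. `mangle`/`demangle` use the stdlib `punycode` codec; `asciify` is the lossy accent-stripping heuristic.

```python
import unicodedata

# Hypothetical marker for mangled names; assumes no ordinary ASCII name uses it.
PREFIX = "u_"


def mangle(name):
    """Encode a non-ASCII identifier as an ASCII one via the stdlib Punycode codec."""
    if name.isascii():
        return name
    encoded = name.encode("punycode").decode("ascii")
    # Punycode copies the ASCII characters of `name` first, then (if there were
    # any) a single '-' delimiter, then lowercase letters and digits. '-' is not
    # legal in an identifier, so replace it with '_'.
    return PREFIX + encoded.replace("-", "_")


def demangle(name):
    """Invert mangle(), e.g. to show readable names in a traceback.

    An ordinary ASCII name that happens to start with PREFIX would be
    misinterpreted here; that is the ambiguity the prefix assumption hides.
    """
    if not name.startswith(PREFIX):
        return name
    body = name[len(PREFIX):]
    # The encoded tail never contains '_', so the last '_' is the delimiter.
    base, sep, tail = body.rpartition("_")
    punycode = f"{base}-{tail}" if sep else tail
    return punycode.encode("ascii").decode("punycode")


def asciify(name):
    """Heuristic: decompose accented letters and drop everything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

With this scheme, `café` becomes `u_caf_dma`, which is exactly the kind of name that would surface in a traceback. `asciify` turns `café` into `cafe`, which then collides with a real `cafe`, and it reduces a Cyrillic name such as `число` to an empty string.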

Alternatively, something like PyLogo could be made fully translatable, since it is built on Python rather than being Python, and it provides a layer of insulation from the underlying system (so the user is not necessarily exposed to the underlying English-based code).
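The insulation idea can be sketched as a lookup table between user-facing words and the English-named implementation. Everything here is hypothetical and is not PyLogo's actual API: the `Turtle` class, the command tables, and the Portuguese words (modeled loosely on Portuguese Logo vocabulary) are illustrations. Error messages go through the same per-language table, so English does not leak through when something goes wrong.

```python
# Hypothetical sketch of a translatable Logo-like layer; not PyLogo's actual API.


class Turtle:
    """The underlying English-named implementation the user never sees."""

    def __init__(self):
        self.distance = 0
        self.heading = 0

    def forward(self, steps):
        self.distance += steps

    def right(self, degrees):
        self.heading = (self.heading + degrees) % 360


# Per-language vocabulary: user-facing word -> underlying method name.
COMMANDS = {
    "en": {"forward": "forward", "right": "right"},
    "pt": {"parafrente": "forward", "paradireita": "right"},
}

# Error messages are translated too, so English never leaks through.
MESSAGES = {
    "en": "I don't know the command {word}",
    "pt": "Não conheço o comando {word}",
}


class LogoError(Exception):
    pass


def run(program, lang, turtle):
    """Run a program in which every command takes exactly one number."""
    vocabulary = COMMANDS[lang]
    words = program.split()
    for i in range(0, len(words), 2):
        word = words[i]
        if word not in vocabulary:
            raise LogoError(MESSAGES[lang].format(word=word))
        getattr(turtle, vocabulary[word])(int(words[i + 1]))
    return turtle
```

Because the user only ever sees the table's words and the translated messages, adding a language means adding tables, not touching the implementation. That is much cheaper than translating Python itself.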

-- Ian Bicking

