Talk:Bityi (translating code editor)

From OLPC

Scintilla is the best starting codebase I know of, as it has the widest use. However, this is a new concept, so developing something from some other codebase or from scratch is not out of the question. I have had trouble in the past few days even getting onto the Scintilla mailing list or getting any live response, so this may well have to be a fork anyway. --Homunq 17:07, 25 July 2007 (EDT)

Name collisions

It looks likely that name collisions will become common in a system like this. Your suggestion of prefixing names will reduce the problem, but will then introduce other problems if the raw source needs to be viewed in English again.

My suggestion would be firstly to try implementing a simpler system where the code is edited in English, but if the mouse hovers over any keyword, a translation is shown in a tooltip. A short description of the syntax of each "command" could be shown in the local language in the tooltip, together with an example usage, with more details a hyperlink away.

Since most programming languages only have tens of keywords, I can't see that this is much of a barrier to entry, especially if tooltips are provided. The tooltips also ought to be provided for common objects and functions. The tooltips should be provided in English as well for the English OLPC versions - it would probably encourage more developers if things like the Sugar APIs could be briefly explained in a tooltip rather than having them wade through pages of in-depth documentation.

I don't know what the status of the current "Develop" activity is, but the tooltips could go well with auto-completion. See Visual Studio .NET 2005 for a very nice implementation of that - they call it IntelliSense - have a look on Wikipedia (http://en.wikipedia.org/wiki/IntelliSense) if you don't have Windows.

Using tooltips dramatically simplifies the programming and the corner cases, and avoids confusing the user when translation doesn't quite work right because certain local names conflict with keywords, while only slightly increasing difficulty for the user, in my opinion.

One difficulty, which may rear its head with either method, is that some keyboards won't be able to type standard Latin characters, or may be missing some of the characters required for a certain programming language (e.g. not all keyboards have the three types of braces "([{" ). The only solution I can think of would be to have a symbol palette somewhere.

As far as support goes, I don't have much experience with Python, so I don't think I can help, since OLPC seems to be written in Python (if they could re-write the kernel in Python I guess they would...). On a more serious note, if you decide to implement the tooltip idea first, I think that could be done single-handedly. If full translation is required later, then a lot of the tooltip translation code could be re-used.

87.127.98.185 15:34, 28 July 2007 (EDT)

My suggestion would rigorously eliminate name collisions. Yes, it would mean that code written in Spanish and viewed in English would still have Spanish variable/function/object names, including a few which, due to collisions with English keywords (or even collisions with English v/f/o names that happened to be dynamically added to the translation mapping by the Spanish programmer), have the prefix es_. The Spanish names are pretty much unavoidable and there is a possible solution for code that gains widespread use in English (a retranslation tool which also adds things to the dynamic translation mapping). The prefixing is exactly the correct behaviour in this case. And besides, this stuff will be MORE of a problem if you do things with tooltips (think about it).
As a teacher, I can say categorically that even 10s of keywords would be a barrier to many kids. Not all, by any means; but we should go for maximum accessibility.
ps. Who are you? You should register, it would be good to have at least a screen name to respond to.
pps. Thanks for the comments, and no need to make excuses for not volunteering to help. But honestly, if Python has any advantage, it's that it is easy to pick up. Whether you like it is another issue, but I'm sure you could hack it. :) Homunq 17:23, 28 July 2007 (EDT)
OK - you persuaded me to sign up - just getting an account... 87.127.98.185 05:28, 29 July 2007 (EDT)
Back again with a shiny new account. I see your point about "even 10s of keywords would be a barrier to many kids", and since I only speak one language, I'll take your word on that.
For the full translation, there are quite a few more things that need to be considered. For example, runtime errors that include a reference to the code (e.g. Error on line 17, column 21) would need to be dynamically adjusted. Also, if the interpreter gives a code snippet around an error, it would need adjustment.
I'd thought of that, it's not particularly difficult if you're in some kind of "debug mode" which brings up the code on error. If you're in a "user mode", errors shouldn't show code anyway, they should fail as gracefully and as softly-silently as possible. Homunq 11:14, 29 July 2007 (EDT)
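To make the "Error on line 17, column 21" adjustment concrete, here is a minimal Python sketch. It assumes translation is strictly word-for-word, so line numbers never change and only columns shift. The keyword table and the function name disk_to_screen_column are hypothetical and not part of any existing editor; the regex also treats words inside strings and comments like any other word.

```python
import re

# Hypothetical keyword table: on-disk English -> on-screen Spanish.
EN_TO_ES = {"if": "si", "while": "mientras", "return": "devolver"}

WORD = re.compile(r"[^\W\d]\w*")  # identifier-like words


def disk_to_screen_column(disk_line, disk_col):
    """Map a 0-based column in the on-disk line to the translated line.

    A column that falls inside a word is clamped to the start of the
    displayed word, since the two spellings may differ in length.
    """
    shift = 0  # accumulated length difference of the words seen so far
    for match in WORD.finditer(disk_line):
        if match.start() >= disk_col:
            break
        if disk_col < match.end():
            return match.start() + shift
        word = match.group(0)
        shift += len(EN_TO_ES.get(word, word)) - len(word)
    return disk_col + shift


# "while x: return y" is displayed as "mientras x: devolver y"
print(disk_to_screen_column("while x: return y", 16))
```

The same walk in the opposite direction (with the inverse table) would map a cursor position on screen back to the on-disk column.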
Also, if it's a programming language that allows code in strings to be evaluated, that could cause big problems, because that would require translation at program runtime rather than just on file open or save. For example, JavaScript's "eval" function (http://www.w3schools.com/jsref/jsref_eval.asp). In that case I can't see any clean way of allowing localized keywords to work without serious modification to the interpreter, and in many cases it would mean that the program actually runs differently.
Hadn't thought of that. Python does have such abilities, though they're not particularly widely used AFAIK. Brainstorming solutions: special quotes for potentially-executable string literals, so that they end up in English on-disk as if they were code; the ability to run the translation functions in either direction for whenever there's user interaction with these strings as such.
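One way to picture the "run the translation functions in either direction" idea for executable strings is the Python sketch below. It is only an illustration under assumptions: the keyword table and the helper names to_english and eval_localized are hypothetical, and the regex rewrites every matching word, including any that sit inside nested string literals.

```python
import re

# Hypothetical table: on-screen Spanish keyword -> Python keyword.
ES_TO_EN = {"si": "if", "sino": "else", "no": "not"}

WORD = re.compile(r"[^\W\d]\w*")  # identifier-like words


def to_english(code):
    """Replace localized keywords with the ones the interpreter expects."""
    return WORD.sub(lambda m: ES_TO_EN.get(m.group(0), m.group(0)), code)


def eval_localized(expression, variables=None):
    """Translate a user-supplied expression at run time, then evaluate it."""
    return eval(to_english(expression), {}, dict(variables or {}))


# A string typed by a Spanish-speaking user while the program is running.
print(eval_localized("a si a > b sino b", {"a": 3, "b": 5}))
```

A program would have to call such a wrapper itself instead of the built-in eval, which is exactly the kind of change to running code that the comment above worries about.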
Even with full translation, I think tooltips with descriptions and "IntelliSense-like" features are useful, especially if you're new to programming in a particular area and want to know the class etc. names.
I would consider it important for it to be very easy to toggle the translation on and off - particularly because if someone always works with translation on they won't be able to program on another platform that doesn't have translation. Hello1024 05:43, 29 July 2007 (EDT)
Agreed and agreed. The dev environment already has some tooltips if it's based on IDLE as it appears from screenshots. English versions should be visible in the tooltips and by toggling off translation, for tech support/collaboration reasons.
But actually the "they won't really learn to program" objection is not true - once this translation functionality is coded well here, it will be relatively easy to port to ALL LANGUAGES AND PLATFORMS and MANY MODERN IDES, meaning that someone could spend an entire life as a professional programmer without ever typing "if". Obviously, they'd still learn English eventually to be able to communicate and read comments and such, and so they'd know what "if" means, but they'd never have to retrain their fingers if they didn't want to. Homunq 11:14, 29 July 2007 (EDT)

Design thoughts...

are at Source-code editor with transparent native-language display/design.

Older discussion from main page:

To make my proposal a little more specific:

  • Based on scintilla (BOB open-source editing component, already does coloring and folding).
  • If the user unknowingly used an English keyword on-screen, it would be "escaped" with a prefix like "ES_" on disk.
  • Similarly, if a program used a whateverlanguage keyword, it could be escaped on-screen.
  • A right-click on any word shows the English version, obviously includes easy option to turn translation off globally.
  • By default, only translates keywords for given programming language, but includes option to have cascading translation files for files and the libraries they use. These could be created on-the-fly using right-clicks with dictionary support.
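The escaping rules in the list above can be sketched in Python as follows. This is a rough illustration, not a design: the keyword table is a hypothetical Spanish subset, the lowercase "es_" disk prefix follows the talk-page discussion, and the "en_" prefix for the on-screen direction is an assumption, since the proposal does not name one. The regex works on identifier-like words only and does not skip string literals or comments.

```python
import re

# Hypothetical keyword table: on-disk English -> on-screen Spanish.
EN_TO_ES = {"if": "si", "else": "sino", "for": "para", "in": "en",
            "while": "mientras"}
ES_TO_EN = {es: en for en, es in EN_TO_ES.items()}

DISK_PREFIX = "es_"    # on disk: a name that reads as an English keyword on screen
SCREEN_PREFIX = "en_"  # assumed: on screen, a name that is a Spanish keyword on disk

WORD = re.compile(r"[^\W\d]\w*")  # identifier-like words, Unicode-aware


def _strip(word, prefix):
    """Drop every leading copy of prefix so already-escaped names are recognised."""
    while word.startswith(prefix):
        word = word[len(prefix):]
    return word


def _word_to_screen(word):
    if word in EN_TO_ES:                      # real keyword: show the translation
        return EN_TO_ES[word]
    if word.startswith(DISK_PREFIX) and _strip(word, DISK_PREFIX) in EN_TO_ES:
        return word[len(DISK_PREFIX):]        # escaped name: remove one prefix
    if _strip(word, SCREEN_PREFIX) in ES_TO_EN:
        return SCREEN_PREFIX + word           # name collides with a Spanish keyword
    return word


def _word_to_disk(word):
    if word in ES_TO_EN:                      # localized keyword: store the English one
        return ES_TO_EN[word]
    if _strip(word, DISK_PREFIX) in EN_TO_ES:
        return DISK_PREFIX + word             # English keyword used as a name: escape it
    if word.startswith(SCREEN_PREFIX) and _strip(word, SCREEN_PREFIX) in ES_TO_EN:
        return word[len(SCREEN_PREFIX):]      # undo the on-screen escape
    return word


def to_screen(disk_source):
    return WORD.sub(lambda m: _word_to_screen(m.group(0)), disk_source)


def to_disk(screen_source):
    return WORD.sub(lambda m: _word_to_disk(m.group(0)), screen_source)


disk = "for x in items:\n    if x: es_if = si\n"
screen = to_screen(disk)
print(screen)
assert to_disk(screen) == disk  # the round trip preserves the file
```

Adding one more prefix to names that already look escaped (rather than leaving them alone) is what keeps the two directions exact inverses, so a file opened and saved without edits comes back unchanged.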

a few implementation brainstorms...

  • Only what's actually on screen need be duplicated in memory
  • The cursor counts as a wordbreak for speed reasons.

Interested? Contact me.... I will be continuing to explore this idea, but I'm not going to jump in with both feet unless I have some backup.--Homunq 16:20, 25 July 2007 (EDT)

Some details of SciTE/Scintilla: GTK-based; as is, it takes under 2 megs decompressed, with lexers for many, many languages.

But...

I just read the OLPC interface guidelines, particularly the "view source" key. From one angle: that just emphasizes that this idea should be absolutely central to this project; I'm actually even more surprised than I already was that (as far as I can tell, sorry if I've missed you) no-one has thought of it. From another angle: if you want to be able to view the source OF the view source activity, a laudable goal, that would tend to indicate that for OLPC purposes this would be better based on IDLE. But my interest in this idea is not solely OLPC-based. So... again, contact me if you want to collaborate with me on this one. If my first collaborator wants to go with IDLE or another Python/Tk-based editor, great, I like to program Python; if they prefer Scintilla in C++, great, I think it will have a wider impact.

Either way, I repeat: this idea is absolutely vital to OLPC fulfilling its goal (perhaps not the main goal, but clearly a goal) of creating as many programmers as possible. Homunq 06:05, 28 July 2007 (EDT)

And then...

Further browsing and exploration finally leads me to Develop#Human Language and Culture Concerns where this issue has been discussed. As you can see from what I say here, I think that there are simple solutions to the issues raised there. I'll go continue this discussion over there. Homunq 15:30, 28 July 2007 (EDT)

Isolating

This will serve to isolate the programmers. It locks them into a programming ghetto. They won't be able to write normal code with a normal editor. They won't be able to follow examples on the web.

It's not as if anybody needs this. As proof, I offer English. The "for" loop common to numerous programming languages has nothing to do with the English word as far as I can see. Neither does the "static" keyword. We get by OK. We just memorize what the keywords do. We even deal with unpronounceable things like "^" and completely weird usage of basic punctuation.

AlbertCahalan 00:06, 10 August 2007 (EDT)

I understand the point, and it is a possibility.
However, I'd argue that, in making the first step into programming easier, it is a good thing; further steps, of learning "keyword English", can and will come easily later. After all, which was harder to learn, the first computer language you knew, or the second?
Consider the following scenarios: A child who is literate in Arabic presses the "view source" key and sees an impenetrable mass of foreign symbols; or, they see some confusing words which seem to have their own strange logic. Which do you think is more likely to lead to that child, 10 years later, being fluent enough in English to search for code snippets on the web? It's not clear-cut, but I'd say the second.
For an Arabic child, I suspect that left-to-right is the biggest problem. Mixing in Arabic, with the result being bi-directional, seems nightmarish. Have you ever tried selecting text across a direction boundary? It's mind-bending. Consider an English word followed by an Arabic word. Put the mouse at the left, hold down the button, and move right. At first, you select more and more of the English word. As you cross the boundary, suddenly the whole Arabic word gets selected. As you keep going, the selection appears to split in two as you begin to deselect the Arabic word! Now think about indentation with tabs. Even better, think about mixing tabs with spaces. Perhaps being able to turn off bi-directional text, forcing things to go in either of the two other directions at the click of a button, would make things easier to deal with. This would require usability testing with an actual kid though. Also note that it is normal to use a fixed-spacing font for programming; Arabic is often said to look nearly unreadable in such a font, especially if you leave out the glyph form substitution. AlbertCahalan 21:42, 25 August 2007 (EDT)
You raise some good points. Also, I recently went through and tried to translate the Python built-in functions and classes to Spanish - and yes, translation is never as clear-cut as you imagine it might be.
So, I understand your skepticism. See my response to your next comment for some big-picture responses. As for the specific problems you mention, I can come up with what seem to me to be "reasonable" answers. For instance: yes to bidirectional text, but have a "dominant" direction that governs everything above the scale of a single word/string literal. Then any selection which starts in text with the dominant direction would be unable to have an endpoint inside the non-dominant direction. (.this like ,reversed was which text get d'You) Non-dominant words would become essentially single glyphs - which is exactly what they are, from the point of view of the lexer. As for monospace, there are very few examples of ASCII art in programs; the only real reason you need it is for indentation. A special font with a nice wide space would handle 90% of the cases, and if you wanted 99% you could specially format the leading spaces on a given line to have the same width as the same number of characters at the start of the preceding line. (It's also nice for making "table" output in an interactive session - but if you really want to rely on the interactive interpreter for learning, you should put Gecko in there anyway...?)
Often people line up comments to the right of the code. (eh, to the left if you flipped the whole thing around for Arabic) Without fixed-width fonts, you can't do that neatly and the result will change with the font. I have in fact seen ASCII art in code; on rare occasions it is really important. Most importantly though, it is good practice to line up similar code for easy reading. If some adjacent lines are very similar, with the second line missing something from the middle, then putting spaces in place of the missing part can help the reader to quickly see that the lines are the same in all other ways. AlbertCahalan 02:32, 29 August 2007 (EDT)
Just checked out what Google does - I translated a sentence with a made-up word and it changed the made-up word into (presumably the corresponding) Arabic letters. If that mapping is one-to-one I think it would be fine to use a similar trick. Homunq 01:21, 28 August 2007 (EDT)
Or for you: say that SQL was in Russian, with Cyrillic letters and all. Sure, you could learn "выберите ... из ... где ...", and read it in all the examples on the web and in your programming books - but if an editor plugin came along that let you write "select ... from ... where ...", wouldn't you use it in your own coding? I sure would, and "ghetto" be damned.
I know I would be tempted, but hopefully I would resist that temptation. Lots of tempting things are bad for me. Providing harmful temptations is not good. Much can be done with tooltips, autocompletion, and translated language documentation. Being able to quickly go from a keyword to the documentation would be really helpful. BTW, a flaw in that Russian example is that plain ASCII is much more universally typeable than Cyrillic is. AlbertCahalan 21:42, 25 August 2007 (EDT)
"Bad for you/harmful" in this case sounds like "actually solves actual problems in the short run, but might create some new ones in the long run". I can understand if you've configured your intuition with that identity as one heuristic, but it's not an axiom, and for kids who may or may not decide to get involved in programming, I'd suggest it's not even a good heuristic.
Note also that we're actually not so far apart. I very explicitly mean my tool as a stepping stone to English, not as a destination in itself. As a stepping stone, I agree that the design should focus on minimizing the hops to either bank, not just the first hop - if this tool doesn't make it easier to learn in English later, I have failed. For instance, "is not" is used in Python as a special case; if you just gloss that to "es no" in Spanish it is very jarring (the "not" refers to what follows and not to the "is"). For a while I was considering having a special case to flip it around; now I think not. I totally take your point about tooltips etc. - in a way this is really just one very big, involved tooltip. For instance - one person has suggested using Twext to show both versions at once; I see that as a worthwhile goal for "eventually".
Anyway, my work on this is pretty far along. May I ask if you are bi/multilingual? So that you could try it out when it's ready? I'd value your input. Homunq 13:36, 27 August 2007 (EDT)
Final point: it is entirely possible that, 10 years from now, this feature is as common as syntax coloring in any code editor program (and exists as a browser plugin, and is automagically done by "did you mean" in all major search engines, etc.) If the ghetto is the whole world, where's the ghetto? Homunq 13:30, 10 August 2007 (EDT)