System software

From OLPC
Revision as of 08:28, 25 July 2006 by Memracom


Software Ideas

Usability

Include in the OS an onscreen translucent/watermark representation of the keyboard that indicates which key is pressed. This would help users learn to touch type, since they would not have to look down at the keyboard. It could be turned on or off (better yet, varied in opacity from 0 to 100%). Otherwise, how many users will even know to try typing while looking at the screen? It would also help in low light [as was suggested under hardware by another contributor], since many users won't have power for lighting yet will only be able to use the laptop after school and chores are completed. That leaves only late in the day, and many will live in latitudes with SHORT winter days.

System Software

Include a version of touch-typing software to teach these kids to touch type. Whatever your age, the faster you can work with a keyboard, the faster you can get on with solving the world's problems and letting the world know about your solutions, e.g. unjustified government spending on military budgets that will eventually lead to only one thing: more war to justify more spending, etc.

Also, shouldn't the marketing for this project make clearer that the person who turns the crank does not have to be the same person who types at the keyboard? Developing nations have a shortage of electricity, not of hands to turn cranks.


Peer To Peer Distribution, for Electronic Text, Software, Email

Extending the original idea from below: this is more general than just electronic text. For lack of a better term, let me call it "built-in support for non-real-time Internet connectivity", provided as a shared service usable by applications.

For example, while traveling and disconnected from any network, I often read web pages that I downloaded at home; of course, clicking on a link just produces a stupid technical error message. Why can't the browser remember that I want to read the linked page later and "queue" the request somewhere? This idea is probably much more relevant in some OLPC scenarios than it is for me: what if you are connected to the Internet "by motorbike" only once every two weeks, as in the Motoman project in Cambodia?

This applies to many forms of data: requesting electronic content, be it a complete ebook, an HTML page, email or software, as well as publishing content such as homepage or blog updates. (I think OneWorld has an XML-based publishing system along those lines, but I could be confusing it with something else.) The goal is to make it possible (and easy!) to request and publish data from one device, which forwards the request to another device, and ultimately to the Internet when a connection is available. Doesn't it make you feel like good ol' FidoNet is back?
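A minimal sketch of such a shared store-and-forward service, in Python (the language OLPC uses for many of its applications). The `Request` and `DeferredQueue` names, the JSON file format and the `send` callback are hypothetical, not an existing OLPC API; `send` could upload to the Internet or hand the request to a neighbouring laptop, and returns False to keep the request for later.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class Request:
    kind: str     # hypothetical kinds: "fetch_url", "send_mail", "publish"
    payload: str  # URL, message text, blog update, ...


class DeferredQueue:
    """Store-and-forward queue: requests made while offline are persisted
    to a JSON file and handed to a transport once a link becomes available."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.pending: List[Request] = self._load()

    def _load(self) -> List[Request]:
        try:
            with open(self.path, encoding="utf-8") as f:
                return [Request(**item) for item in json.load(f)]
        except FileNotFoundError:
            return []

    def _save(self) -> None:
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump([asdict(r) for r in self.pending], f)

    def enqueue(self, req: Request) -> None:
        """Remember a request (e.g. a clicked link) for later."""
        self.pending.append(req)
        self._save()

    def flush(self, send: Callable[[Request], bool]) -> int:
        """Offer every pending request to `send`; keep those it rejects.
        Returns the number of requests successfully handed on."""
        kept = [r for r in self.pending if not send(r)]
        sent = len(self.pending) - len(kept)
        self.pending = kept
        self._save()
        return sent
```

When the motorbike's access point comes into range, calling `flush` with an upload function would drain the queue; with a `send` that forwards to a neighbour's laptop instead, the same queue gives FidoNet-style hop-by-hop delivery.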

Vorburger 20:06, 9 February 2006 (EST)


Distributed Filesystem?

Will the offline Wikipedia fit into 512 MB (or even 1 GB)? Even if it does, what about loading some software and other textbooks at the same time? Clearly, the storage on one device is very limited. But what if data could be spread over several laptops, in a sort of built-in distributed filesystem like Coda or MogileFS? Do these make any sense on a device like this, with the goal of increasing storage capacity through distribution? In a school, each of, say, 100 children would hold 1/100th of Wikipedia, instead of clogging every device with a complete copy.

Vorburger 20:06, 9 February 2006 (EST)
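One way to decide which laptop holds which article, without any central server, is rendezvous (highest-random-weight) hashing. This is a swapped-in technique for illustration, not how Coda or MogileFS place data. Laptop names like `xo-042` and the replica count of 2 are assumptions; keeping two copies means a child who stays home does not make part of the encyclopedia disappear.

```python
import hashlib
from typing import List


def _weight(laptop: str, article: str) -> int:
    """Pseudo-random but deterministic score for a (laptop, article) pair."""
    digest = hashlib.sha256(f"{laptop}\x00{article}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def holders(article: str, laptops: List[str], replicas: int = 2) -> List[str]:
    """Rendezvous (highest-random-weight) hashing: every laptop can compute,
    without any coordination, which `replicas` laptops store `article`."""
    ranked = sorted(laptops, key=lambda laptop: _weight(laptop, article), reverse=True)
    return ranked[:replicas]
```

With 100 laptops and two replicas, each laptop stores roughly 2/100 of the articles. When a laptop leaves the mesh, only the articles it held need a new home; every other assignment stays put.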

There is a version of Wikipedia in Simple English that is smaller than the main one. This is more appropriate for a children's educational project, and because it uses fewer words than normal English, it will probably compress better. In any case, dumping an Internet encyclopedia into a kid's laptop makes as much sense as hiring a bus driver to fly a 747. An OLPC encyclopedia needs to be edited severely to reduce its size, make sure the language is understandable by kids, and make sure that it has appropriate content. Biographies of all the kings of England are only relevant to English-speaking kids, not to Hindi speakers or speakers of Brazilian Portuguese.
Since I assume each school will also have an uplink gateway, maybe the Offline version of Wikipedia could be put on that, and cached on the individual units as they access it?
The idea is a good one, but your implementation is flawed. Yes, each school will have some sort of system by which content can be downloaded. It could be as simple as a stack of CD-ROMs and a USB CD-ROM drive. Or the teacher could hook the CD-ROM drive to her own laptop and push relevant content to the kids. For this to work, the encyclopedia editors have to chunk the text into thematic clusters. This means that a kid can still do research in the encyclopedia because he has the whole Brazilian national history chunk on his laptop, but he doesn't suffer from having to carry the Flora and Fauna of Brazil chunk, the World History chunk, and the Birth of Civilization chunk.

Grid computing

It would be interesting if software were included to allow meshed machines to form an ad-hoc grid/cluster computer. It would be useful for things like compiling software, rendering and other CPU-intensive tasks (things that I imagine some of the more advanced, high-school-age users might want to do). A distributed file system would be a central part of that.

  • A practical alternative, one that can be done now, is to distribute content on DVD (as suggested in the previous section). Some "hotspots" served by these DVD-augmented laptops could be set up in a community, providing distributed servers for giving out content as well as hosting discussions. As the OLPC machine has a USB port, adding a DVD drive to it is not difficult. - Raffy, April 27, 2006.

Better-performing Flash Filesystem

The proposed JFFS2 filesystem was designed for NOR-type Flash memory, which has very different timing characteristics from the cheaper NAND-type Flash memory used in USB thumb drives and, presumably, the laptop. YAFFS is a GPL'ed, open-source journalling filesystem designed specifically for NAND Flash memory; it is claimed to use less RAM for its tables and generally to outperform JFFS2. Its developers are working on YAFFS2, which is tweaked to be faster and to work with the newer NAND devices that have larger 2KB pages.

YAFFS has the following technical advantages over JFFS2:

  • It uses NAND Flash memory better, making it faster (about 2X) and more space-efficient, and it wears out the memory chips less quickly
  • It is faster at mounting a filesystem: a hand-waving example of startup time for a 128MB device is 3 seconds instead of 25
  • It uses far less RAM for its internal tables
  • It scales better: JFFS2 is said to fall apart above 256MB because its internal data structures get too big while YAFFS is known to work well up to 2GB (the laptop currently aims at 512MB)
  • It stores error-correcting codes for all data, which is essential because NAND Flash is not 100% perfect when supplied and degrades over time
  • YAFFS provides some features lacking from JFFS2 (hard links, memory mapped file writing)

JFFS2 has the following advantages over YAFFS:

  • It has built-in write-time data compression
  • It is included in the standard Linux kernel

YAFFS has a home page, and there is a technical article that goes into depth on the differences between NOR and NAND flash memory and the drawbacks of using JFFS2 with the NAND type.

It would be worth running comparative performance tests on the two filesystems, because there are big potential performance wins on several fronts. In-filesystem compression isn't everything: it slows down all file operations and, when used on an unreliable medium without error-correcting codes, risks major data loss.

Martin Guy 4 March 2006
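A minimal write-throughput test that could be run against a directory on each candidate filesystem. The function name and parameters are hypothetical. Random data defeats JFFS2's compression while repetitive text favours it, so it seems fair to measure both cases.

```python
import os
import time


def write_benchmark(directory: str, n_files: int = 50, size: int = 64 * 1024,
                    compressible: bool = False) -> float:
    """Write `n_files` files of `size` bytes into `directory`, fsync-ing each
    so the data really reaches the flash, and return bytes per second."""
    if compressible:
        text = b"The quick brown fox jumps over the lazy dog. "
        data = (text * (size // len(text) + 1))[:size]
    else:
        data = os.urandom(size)  # incompressible: worst case for JFFS2
    paths = [os.path.join(directory, f"bench_{i}.dat") for i in range(n_files)]
    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)
    for path in paths:
        os.remove(path)  # leave the filesystem as we found it
    return n_files * size / elapsed
```

The same call would be run on, say, a JFFS2 mount and a YAFFS mount of the same NAND part (the mount points are up to the tester). Mount time and RAM use would need separate measurements.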

Jörn Engel is currently working on a new flash file system called logfs. It is not yet clear whether it will hit the mainline kernel in time for consideration for the first-generation laptop, but it is progressing fast. It should combine all the advantages listed for either of the two file systems above with a new, clean design. In particular, its mount time and memory footprint are independent of the device size, unlike those of the existing file systems.

I don't think that YAFFS can be considered an option for OLPC at this point, because it lacks compression and because of the quality of its code.

arnd 12 March 2006

Some corrections to the YAFFS marketing blurb:

  • Error correction is done by the NAND subsystem, not by the filesystem. It is a necessity for NAND FLASH, and the NAND subsystem has provided that protection from the very beginning. JFFS2 just uses what's there. No need to reinvent the wheel.
  • JFFS2 worked on 2k page size chips before YAFFS2 showed up
  • JFFS2 has raised the bar in the boot-time and scaling p*ssing contest. David improved the mount time of a 512MiB FLASH to less than 8 seconds, and RAM consumption has been reduced significantly too.
  • JFFS2 works out of the box with the MTD subsystem, while YAFFS needs tweaks and patches and is hard to adapt to hardware ECC controllers

3D software rendering

As the system does not include hardware-accelerated 3D rendering, a software rendering library could be included that wraps the OpenGL (perhaps OpenGL ES) API and generates rendering code on the fly. Even on a machine with a limited clock speed, this can provide rendering performance comparable to that of some integrated 3D chipsets, especially if the resolution is kept low. This would allow educational software, such as physics and mathematics programs, to use 3D rendering. There are existing tools that can be leveraged for this: for example, Vincent, an OpenGL ES implementation that provides software rendering for constrained devices like cell phones; SwShader, the precursor of TransGaming's SwiftShader; and many others. Having (limited) OpenGL capability adds some capabilities to the device without requiring additional hardware.
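The core of any software renderer is projecting 3D points onto the screen and filling pixels. A minimal sketch of the two steps needed for a wireframe: an OpenGL-style perspective projection and Bresenham line drawing. This is not the API of Vincent or SwiftShader, the function names are hypothetical, and the on-the-fly code generation mentioned above is not shown.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def project(v: Vec3, width: int, height: int, fov_scale: float = 1.0) -> Tuple[int, int]:
    """Perspective-project a camera-space point (camera at the origin looking
    down -z, as in OpenGL) to integer pixel coordinates."""
    x, y, z = v
    if z >= 0:
        raise ValueError("point is not in front of the camera")
    # perspective divide: farther points move toward the centre of the screen
    ndc_x = fov_scale * x / -z
    ndc_y = fov_scale * y / -z
    # map normalised device coordinates [-1, 1] to pixels (y grows downward)
    px = int((ndc_x + 1) * 0.5 * (width - 1) + 0.5)
    py = int((1 - ndc_y) * 0.5 * (height - 1) + 0.5)
    return px, py


def line(x0: int, y0: int, x1: int, y1: int) -> List[Tuple[int, int]]:
    """Bresenham's algorithm: integer-only pixels of a line, all octants."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return points
```

Projecting the eight corners of a cube and drawing its twelve edges with `line` gives a wireframe using only integer arithmetic in the inner loop, which suits a low-clock-speed CPU at low resolution.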

Software Installation, Package Manager, Central Repository

How relevant is a polished, end-user-friendly package manager? With limited memory, are users more likely to uninstall an application, try another one, and later reinstall the first? In the beginning, how important is it to be able to get patched new versions of the software very easily? The underlying question: is a central repository of applications desirable? Should it be completely open, so that anybody can submit their (pre-compiled) package?

Vorburger

Should there be an easy way to install and remove applications from the device without corrupting the system image? I am thinking of something like klik (http://klik.atekon.de/). -- DPalmerJr

-> An initial proposal and a proof-of-concept demo is here. -- Probono

I am on a team developing a deeply embedded, loosely connected ARM-based Linux system (64 MiB RAM, 512 MiB disc). We have discovered the hard way that it's best to support in-field upgrades right from day 1. Even with an effective release management and testing/validation team, specs will change, improvements will be made, and bugs will slip through. Our devices are connected via slow satellite links and connect to our infrastructure as infrequently as once per month. We cannot feed a lot of data through the link without blowing our power budget. Even if/when we are willing to risk an over-the-air in-field upgrade, we may not have the bandwidth/power budget. We have found conventional package managers (dpkg, rpm) to be too coarse-grained when dealing with skinny pipes and power budgets. A package manager supporting deltas would be preferable. We have even considered downloading source patches and recompiling on the embedded device. Your network will be faster than ours, so YMMV.

System development + testing will benefit from a slick patch/upgrade mechanism too.

I don't think it's unreasonable to expect to upgrade the devices via the mesh cluster: upgrade one device and the rest can upgrade from it. Use public-key cryptography to sign 'blessed' packages.

I consider a well-thought-out, secure, trustable, user-controllable package management system to be critical to system stability, extensibility, maintainability, and ultimately to the success of this project. -- BCL

-> A system using bundled, self-contained applications like this could facilitate mesh-sharing of applications. -- Probono
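The delta-based upgrades suggested above could be sketched as a fixed-size block diff: the device reports a SHA-256 hash per block of its current file, and only the blocks that differ travel over the skinny pipe. Block size and function names are assumptions. Unlike rsync's rolling checksum, this simple scheme copes badly with inserted bytes that shift the rest of the file, and it does not cover the public-key signing of packages, which a real system would add on top.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK = 4096  # bytes per block; an assumption, tune for the link


def block_hashes(data: bytes, block: int = BLOCK) -> List[str]:
    """Device side: fingerprint the currently installed file, block by block."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def make_delta(old_hashes: List[str], new: bytes,
               block: int = BLOCK) -> Tuple[int, Dict[int, bytes]]:
    """Server side: return the new length and only the blocks that differ
    from the device's current copy (or lie beyond its end)."""
    changed: Dict[int, bytes] = {}
    for idx, start in enumerate(range(0, len(new), block)):
        chunk = new[start:start + block]
        if idx >= len(old_hashes) or hashlib.sha256(chunk).hexdigest() != old_hashes[idx]:
            changed[idx] = chunk
    return len(new), changed


def apply_delta(old: bytes, new_len: int, changed: Dict[int, bytes],
                block: int = BLOCK) -> bytes:
    """Device side: rebuild the new file from the old one plus changed blocks."""
    n_blocks = (new_len + block - 1) // block
    out = bytearray()
    for idx in range(n_blocks):
        if idx in changed:
            out += changed[idx]
        else:
            out += old[idx * block:(idx + 1) * block]
    return bytes(out[:new_len])
```

A one-byte bug fix in a 16 KiB file then costs one 4 KiB block plus the hash list, instead of the whole package.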

Laptop as USB-Drive

It would probably be useful if the laptop could be accessed as a USB drive, like a digital camera. In a software development context, hackers could probably also configure file sharing over WiFi, but simple "USB cross-cabling" could be interesting to end users because it is: a) the simplest option; b) secure, so it is probably OK to give access to the entire filesystem if the laptop is locally attached; c) independent of WiFi: the nearest Internet cafe in a bigger town will let children or teachers USB-connect their laptop to one of its stations to copy over a newly downloaded application, but will not have a WiFi base station, at least not where I have travelled in India.

Why take the laptop to the big town when you can take a thumb drive instead? Better yet, why not just wait for the content to come to you on a CD-ROM? Send an email by motorcycle-net to order the content you want, and next week the Motoman motorcycle brings it on CD during the regular delivery. Works in Vietnam.

Vorburger

Maybe software could be developed for this. Since the system is going to be Linux-based, simply accessing the filesystem should allow configuring almost everything. Software that gives access to the filesystem (emulating a camera or a USB thumb drive) could be included. Alternatively, a special cable provided with the laptop (using one particular USB port out of the three) could allow direct access to the filesystem; or a switch somewhere on the laptop could make it work as a USB drive even without power, possibly even charging the batteries while connected.

Gandolfi

Hard-Reset built-in

Curious kids will certainly manage to screw up the software side of the device, and they should! A built-in hard reset that can re-initialize the OS etc. from ROM could be useful, sort of like the hidden partition some modern laptops have on the HDD, which can reinstall the system without the usual recovery CD.

You always have the problem of personal data, files, and configuration settings. Some solution for that would have to be provided; e.g. easily copy to your friend's device over the wireless network?

Vorburger

This is a very good point. If we put most of the filesystem (especially the part under /usr) in a compressed read-only file or partition, not only can we stuff a lot more software in there, but resetting also becomes a much simpler operation. Basically, all it would have to do is untar a "factory default" tar file (or something like that) into the writable part of the flash storage.

We could have a boot option, where the user would type "reset" or something like that, to boot a "rescue" kernel and initrd that just did this operation. -- Paulo Marques
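A sketch of what the rescue initrd's reset step might do, assuming a hypothetical layout where the writable part of the flash is one directory and the factory-default tarball sits on the read-only part. The `keep` parameter, defaulting to a hypothetical `home` directory, is one possible answer to the personal-data question raised above.

```python
import os
import shutil
import tarfile
from typing import Iterable


def factory_reset(writable_root: str, factory_tarball: str,
                  keep: Iterable[str] = ("home",)) -> None:
    """Wipe the writable part of the flash (except entries named in `keep`)
    and restore it from the factory-default tarball."""
    preserved = set(keep)
    for name in os.listdir(writable_root):
        if name in preserved:
            continue
        path = os.path.join(writable_root, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    with tarfile.open(factory_tarball, "r:*") as tar:
        if hasattr(tarfile, "tar_filter"):
            # newer Pythons: reject absolute paths and ".." members
            tar.extractall(writable_root, filter="tar")
        else:
            tar.extractall(writable_root)
```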


There's a problem in the Microsoft Windows world with newly-installed systems. You have to go on-line to get the latest security patches from Microsoft. But as soon as you go on-line with an unpatched system you're at risk of infection from viruses.

The reset operation could be integrated with the patch/upgrade mechanism whereby the system will only install secure signed OS-level packages until either the system or the user decides it's OK to open the doors for business. -- BCL

Font technology

Which font technology is to be used?

The OLPC uses Linux with GTK, which uses Pango for text layout. It also uses FreeType, which means that the OLPC can use cross-platform TrueType fonts.

Yes, OpenType fonts will render complex scripts much better. The Pango and SIL Graphite projects are cooperating on the design of their rendering engines and the fonts they will need.

Anyway, this isn't a problem that OLPC needs to solve. Experts are working on it. OLPC will leverage their work. For an example of why OLPC is not directly working on the font problem, read this article on Tibetan writing.

It is important that a thorough analysis of the character rendering technology and font technology needed is carried out.

--SIL is the world's foremost research institution on such matters. They work in more than a thousand languages, and maintain the Ethnologue catalog of more than 6,000 documented human languages.

--Such experts are at the heart of the font and rendering-engine initiatives described throughout this article. I have observed experts from universities and from SIL, Red Hat, Apple, Microsoft, Sun, IBM, Hewlett-Packard, Evertype, commercial font vendors, vendors of font creation software and others, since that is who makes up the Unicode Consortium. The portion of their efforts that goes into Linux will inevitably end up on the OLPC products.

Here is a transcript of what I wrote before.

--I use Pango rendering and properly implemented TrueType fonts on my Linux system to render conjuncts without difficulty. Some TrueType fonts have the glyphs but not the substitution tables; they render with great ugliness. The Akruti fonts, developed in India for all of the major alphabets of India, were placed under the GPL (GNU General Public License) as Free Software some time ago (on Gandhi's birthday). There are distributions of Linux in several languages of India, and more are on the way.


The best, of course :-). Fontconfig does font substitution on a linguistic level, beyond what Windows and the Mac do. Pango is probably the most advanced layout library around, though further work is needed for some scripts. The Graphite description says that SIL is working on integrating it with Pango. - jg

For European languages such as French and Spanish, an ordinary font technology such as TrueType is fine. For languages using Latin script with accented characters that do not each have a precomposed Unicode character, including many in Africa, an advanced font format is necessary. This is so that glyph substitution can take place to convert a sequence of a base character followed by a combining accent into a "looks right" display. Any rendering engine with any font containing the appropriate glyphs can put an accent mark over a character, but only an advanced format such as OpenType can specify exactly where the mark should go for best appearance.
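Python's standard `unicodedata` module can show whether a base-plus-accent sequence has a precomposed character: after NFC normalization, any combining mark that survives has no precomposed form, so its placement is up to the font and rendering engine. The helper name is hypothetical, and because it only looks at the canonical combining class, it targets Latin-script accents rather than, say, Indic vowel signs.

```python
import unicodedata


def needs_font_positioning(text: str) -> bool:
    """True if, after canonical composition (NFC), `text` still contains a
    combining mark, i.e. some base+accent pair has no precomposed character
    and the font/rendering engine must place the mark itself."""
    composed = unicodedata.normalize("NFC", text)
    return any(unicodedata.combining(ch) != 0 for ch in composed)
```

For example, "e" followed by U+0301 COMBINING ACUTE ACCENT composes to the single character "é", but Yoruba's "ẹ́" (e, dot below, acute) and "ɛ́" (open e, acute) keep a separate combining accent.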

FreeType, used by almost everything on open source systems these days, handles a plethora of font types, from Type 1 to TrueType to OpenType; anyone wanting to introduce yet another font format had best examine how to do it as a FreeType plugin. - jg

Arabic script systems (Arabic, Farsi, Urdu, etc.) need an advanced font technology and an advanced rendering engine. Chinese does not need an advanced font technology system. For languages of the Indian subcontinent, typewriter-like displays can be achieved without an advanced font technology. For full support of conjunct ligatures, an advanced font technology is needed, and similarly for other Asian alphabets (Sinhalese, Lao, Khmer, Myanmar, Tibetan, Mongolian, etc.).

We know of some open issues with Thai and Pango, but believe that they can be solved and that Pango already handles most languages (e.g. Arabic and the Indic languages). Please help determine where further work may be needed. - jg

Please note the use of Fontconfig on open source systems for font naming and substitution - jg


Email Client requirements

Email is the only well-known internet application that doesn't depend on a continuously working TCP/IP connection to the internet. Its model is the paper postal service, where there are only one or two connections per day, when the postie visits the letterbox.

It is very likely that these laptops will be in the situation where the link to the outside world will be a fragile connection running at very low speeds. If it's a modem line it's likely that the quality is so poor that echo cancellation will fail; this will limit the speed to 2400bps duplex (higher if half duplex). This is not enough for a shared web connection for thirty kids.

This is okay for email with some rules:

  • The email client must be self contained.
  • The MTA must be light and capable of very versatile store and forward without help from DNS.
  • The MTA on the client must be capable of ad-hoc forwarding, i.e. the child can tell it to give their mail to another client, belonging to someone who is going to school today.
  • The client must have good facilities for splitting files into multiple emails (and joining them again), so that a maximum message size of, say, 16 KB would not be a problem.
  • The ability to put the mail on a USB key. The bandwidth of a real postie with a pocket full of USB keys could be rather high.
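A minimal sketch of the split/join facility from the list above. Each body starts with a small part-header line (a hypothetical format, not MIME's `message/partial`), carries base64 text wrapped at 76 characters so it survives 7-bit mail paths, and includes a SHA-256 prefix of the whole file so the receiver can check the result. Parts may arrive in any order, e.g. via different children's laptops or a USB key.

```python
import base64
import hashlib
from typing import Dict, List

MAX_MESSAGE = 16 * 1024  # the 16 KB per-message limit suggested above


def split_file(data: bytes, max_message: int = MAX_MESSAGE) -> List[str]:
    """Split `data` into 7-bit-safe message bodies of at most `max_message` bytes."""
    file_id = hashlib.sha256(data).hexdigest()[:16]
    # base64 grows data by 4/3 plus line breaks; keep 512 bytes for the header
    raw = (max_message - 512) * 3 // 4
    chunks = [data[i:i + raw] for i in range(0, len(data), raw)] or [b""]
    total = len(chunks)
    return [f"X-Part: {file_id} {n}/{total}\n\n" + base64.encodebytes(chunk).decode("ascii")
            for n, chunk in enumerate(chunks, start=1)]


def join_parts(bodies: List[str]) -> bytes:
    """Reassemble parts received in any order and verify the checksum."""
    parts: Dict[int, bytes] = {}
    file_id, total = "", 0
    for body in bodies:
        header, payload = body.split("\n\n", 1)
        _, file_id, position = header.split()
        n, total = map(int, position.split("/"))
        parts[n] = base64.decodebytes(payload.encode("ascii"))
    if total == 0 or sorted(parts) != list(range(1, total + 1)):
        raise ValueError("some parts are still missing")
    data = b"".join(parts[n] for n in range(1, total + 1))
    if hashlib.sha256(data).hexdigest()[:16] != file_id:
        raise ValueError("reassembled file is corrupt")
    return data
```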

A good model for this might be the old FidoNet networks, though a cleaner addressing scheme would be nice.

Having just email is not as limiting as you might imagine: you can access most of the internet by email.

-- Robert de Bath -- March 2006

PS: I just did the math, I've got a 1Gbyte flash key so my bandwidth on the daily commute to work is 99kbps!
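The PS figure checks out if "1 Gbyte" is read as $2^{30}$ bytes and one full key is delivered per 24-hour day:

$$\frac{2^{30} \times 8\ \text{bit}}{86\,400\ \text{s}} = \frac{8\,589\,934\,592\ \text{bit}}{86\,400\ \text{s}} \approx 99\,400\ \text{bit/s} \approx 99\ \text{kbit/s}$$

With a decimal gigabyte ($10^9$ bytes) the same calculation gives about 92.6 kbit/s.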

Motorcycle E-mail Network

This is an excellent idea and should be part of the core OLPC project. Here is how it is currently being done in rural Cambodia. http://www.parish-without-borders.net/cditt/cambodia/dailylife/2004/rural-internet.htm

Remember, the OLPC is NOT A LAPTOP. It is a system comprising laptops, children, teachers, applications, content, USB-devices, etc.

WLAN MAC Address

There might be privacy issues related to the WLAN MAC address. The MAC address is somewhat similar to the unique processor serial number reported via CPUID, except that it is also broadcast to everyone nearby. "Quick, she/he is leaving, let's start eating the apples." A WLAN mesh might allow relatively fine-grained position tracking.

Broadcast GPS/Galileo Position, send tiles of local map

If an OLPC has access to its GPS coordinates, these should optionally be broadcast via WLAN. Their distribution could be combined with reliability information (perhaps something like a superset of the stratum in the NTP protocol).

OLPCs without direct access should store the coordinates of nearby OLPCs that have position information. This would allow an OLPC connecting to a WLAN to download and display a local map (e.g. download an area of 80x80 km and display 20x20 km, a first guess at a compromise between WLAN range, walking area, map detail, bandwidth and OLPC distribution).

This can help answer "Where am I? Where are you from?" (local map, country, continent, Earth, and the solar system if the need should arise :). Tiles of the detailed map could be made available to other OLPCs. - Frieder Ferlemann 2006-06-06
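A sketch of both ideas, assuming a hypothetical `PositionReport` record: a laptop without GPS adopts the report with the lowest stratum it has heard and re-broadcasts it with the stratum increased by one (as NTP does), and the map area is a square bounding box in degrees around that position. The value of 111.32 km per degree is an approximation for a spherical Earth.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class PositionReport:
    lat: float    # degrees north
    lon: float    # degrees east
    stratum: int  # 0 = own GPS fix, 1 = copied from a stratum-0 laptop, ...


def adopt_position(heard: Iterable[PositionReport]) -> Optional[PositionReport]:
    """For a laptop without GPS: take the most direct report heard on the mesh
    and re-broadcast it one stratum further from the source."""
    best = min(heard, key=lambda r: r.stratum, default=None)
    if best is None:
        return None
    return PositionReport(best.lat, best.lon, best.stratum + 1)


KM_PER_DEGREE = 111.32  # spherical-Earth approximation


def bounding_box(lat: float, lon: float, size_km: float) -> Tuple[float, float, float, float]:
    """(south, west, north, east) of a size_km x size_km square around a point.
    Degrees of longitude shrink with cos(latitude); unusable near the poles."""
    half = size_km / 2
    dlat = half / KM_PER_DEGREE
    dlon = half / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return lat - dlat, lon - dlon, lat + dlat, lon + dlon
```

The 80x80 km download area would then be `bounding_box(lat, lon, 80)` and the 20x20 km display area `bounding_box(lat, lon, 20)`.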

Python & kernel memory usage cooperation

OLPC will push the Linux environment to run in much tighter memory constraints -- small RAM and no swap or paging space. And it's using Python rather than C for many commonly running apps. Currently, running out of memory is handled very primitively -- in the kernel, by killing the biggest application running; in Python applications, by exiting with an error message. This clearly won't suffice, but I haven't seen any plans to improve it.

(Correction: that is not strictly right. Under memory pressure, the kernel will start freeing memory by dropping cache buffers, discarding pages of executable files where possible (it cannot do that for programs that execute in place), or shrinking network buffers. The OOM killer is really used only as a last resort. Incidentally, the fact that the kernel can and does shrink its buffers automatically actually discourages applications from keeping their own caches: the kernel has the potential for better control.)

Python will have to learn to "give back" memory to the kernel when it doesn't need it. It could do this page by page with mmap/munmap calls, after garbage collection. Python should also be able to signal to the application (and/or libraries) that memory is tight and that upper-level code should free unnecessary resources (such as caches). This signal should occur whenever an application is suspended or goes unused for some period, and whenever the kernel runs low on memory.
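A minimal sketch of the application-level half of this: components register callbacks that drop optional memory, and a single `shrink()` call runs them and then the garbage collector. All names are hypothetical. Since no SIGSHRINK exists, `install_signal_handler()` uses SIGUSR1 as a stand-in (Unix only, main thread only). Note that CPython's small-object allocator can only return an arena to the kernel once every object in it has been freed, so this helps but does not guarantee the process shrinks.

```python
import gc
import signal
from typing import Callable, List

_shrinkers: List[Callable[[], None]] = []


def on_memory_pressure(callback: Callable[[], None]) -> Callable[[], None]:
    """Decorator: register a function that frees optional memory (caches...)."""
    _shrinkers.append(callback)
    return callback


def shrink() -> int:
    """Ask every registered component to drop what it can, then run a full
    garbage collection; returns the number of unreachable objects found."""
    for callback in list(_shrinkers):
        callback()
    return gc.collect()


def install_signal_handler() -> None:
    """There is no SIGSHRINK; SIGUSR1 stands in for it here."""
    signal.signal(signal.SIGUSR1, lambda signum, frame: shrink())
```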

A similar strategy should probably exist for other resources, such as filesystem space, and CPU time. The kernel should have a way to tell applications that demand is high, and that they should scale back their demand if they can.

The kernel will have to learn how to signal applications to reduce their memory usage. This is most important when there is NO memory left -- when the kernel currently picks and kills a process -- but it should be done before that point, when there's more flexibility. E.g. if an application wants to allocate another page temporarily while emptying its cache or doing a garbage collection, it can't do that when zero memory is left.

A new signal (SIGSHRINK?) is one way to communicate this to processes. Having them open something from /dev or /proc and listen on it would be another. Or just let applications (or a specialized process, like init) monitor /proc/meminfo and /proc/stat and take actions accordingly. These capabilities would be useful in the upstream kernel and applications.
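A sketch of the /proc/meminfo approach, which needs no kernel changes. The function names and the 10% threshold are arbitrary assumptions. `MemAvailable` only exists in kernels 3.14 and later, so older kernels fall back to MemFree plus Cached as a rough estimate. A monitor process could call `check_once()` periodically and signal applications when it returns True.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into {field: value}; most values are in kB."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_is_tight(info: Dict[str, int], min_free_fraction: float = 0.10) -> bool:
    """True when available memory drops below a fraction of total RAM."""
    available = info.get("MemAvailable",
                         info.get("MemFree", 0) + info.get("Cached", 0))
    return available < min_free_fraction * info["MemTotal"]


def check_once(path: str = "/proc/meminfo") -> bool:
    with open(path) as f:
        return memory_is_tight(parse_meminfo(f.read()))
```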

A daemon like inetd could allow applications to terminate completely when idle and/or when signalled to shrink. An application coded to resume on demand could pass any file descriptors that need to stay open (e.g. open network connections or ttys) to the daemon, then terminate. The daemon would wait for I/O activity on those connections and fork a new copy of the process when needed. -- John Gilmore
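The descriptor hand-off in this scheme relies on the standard Unix mechanism of passing file descriptors over a Unix-domain socket (SCM_RIGHTS). A sketch of just that step, using `socket.send_fds` and `socket.recv_fds` (Python 3.9+); the function names are hypothetical, and the daemon's polling and re-spawning logic is omitted.

```python
import socket
from typing import List, Tuple


def hand_over(daemon_sock: socket.socket, fds: List[int], note: bytes = b"resume") -> None:
    """Application side: pass open descriptors to the supervising daemon
    over a Unix-domain socket before exiting."""
    socket.send_fds(daemon_sock, [note], fds)


def take_over(app_sock: socket.socket, max_fds: int = 16) -> Tuple[bytes, List[int]]:
    """Daemon side: receive the descriptors; they stay valid after the
    application exits, so the daemon can wait on them and re-spawn it later."""
    msg, fds, _flags, _addr = socket.recv_fds(app_sock, 1024, max_fds)
    return msg, fds
```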

Normally xinetd is part of Fedora. It currently seems to have been removed from the OLPC distro.

Applications could check for free memory before starting and refuse to run if too much memory is in use. The only support needed is a reliable way for a Python app to get meaningful numbers for total memory and memory used. (This works assuming the app knows in advance how much memory it is going to need, independent of what the user does with it. The kernel will actually do this for you: if there's no swap space and you allocate all your memory early in the process's life, you'll get ENOMEM and can die cleanly then. -John Gilmore)
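On Linux, Python can already get these numbers from `os.sysconf`. A sketch with hypothetical function names and an arbitrary 5% reserve; note that glibc's `SC_AVPHYS_PAGES` counts only free pages, not reclaimable page cache, so it underestimates what the kernel could actually provide.

```python
import os


def total_bytes() -> int:
    """Physical RAM installed (Linux/glibc via sysconf)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def free_bytes() -> int:
    """Currently free RAM; excludes reclaimable page cache, so it is pessimistic."""
    return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def can_start(required_bytes: int, reserve_fraction: float = 0.05) -> bool:
    """Refuse to start unless `required_bytes` fits and a reserve remains."""
    return free_bytes() - required_bytes >= reserve_fraction * total_bytes()
```

An application expecting to need, say, 30 MiB would call `can_start(30 * 2**20)` at startup and exit with a friendly message if it returns False.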

More effort could be put into keeping applications slimmed down. Perhaps tools could analyze redundancy, e.g. an application statically linking a library that another application also links statically, or multiple versions of the same library installed side by side. Busybox has done a lot of this kind of refactoring for basic UNIX utilities. Valgrind is also a tool to keep in mind: it does a very good job of bookkeeping all memory allocations, and one of the tools shipped with it, cachegrind, helps to minimize CPU cache impacts.


Network Protocol

I think the most important single choice is the mesh protocol, because it is likely to have a longer deployment than any implementation of the hardware, OS, or application software.

I figured that the best mesh protocol would minimize total routing waste, in order to reduce power use. Computation will use less power as technology advances, but transmission power is going to be limited by physics at some point.

I researched mesh protocols on Wikipedia.

The hazy-sighted link state protocol just stood out among the choices. It is mathematically optimized to minimize network waste. This means that it minimizes power and won't be easily improved-upon. It also has a fairly old, well-debugged, publicly deployed open source implementation that runs on diverse hardware, and is about the right size and shape (small).

The least surprising choice is probably OLSR (which periodically floods the network with limited routing data). The simplest protocol is probably AODV (a distance-vector protocol that floods route requests through the network when a route is needed). The others seem to be research projects or proprietary, and I would avoid them, even though some are specifically geared to power saving.

Ray Van De Walker 10:34, 26 May 2006 (EDT)
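A simplified sketch of the hazy-sighted idea, as we understand it rather than taken from the protocol specification: link-state updates go out every tick, but their TTL (how many hops they travel) doubles each time the interval between them doubles, so nearby nodes hear about link changes often and distant nodes rarely. The tick numbering, function name and TTL cap are assumptions.

```python
def hazy_ttl(tick: int, max_ttl: int = 64) -> int:
    """TTL of the link-state update sent at time step `tick` (tick >= 1).

    Every tick: TTL 2; every 2nd tick: TTL 4; every 4th tick: TTL 8; ...
    so a node about 2**k hops away hears news roughly every 2**(k-1) ticks."""
    if tick < 1:
        raise ValueError("tick must be >= 1")
    largest_power_of_two = tick & -tick  # largest 2**k that divides tick
    return min(2 * largest_power_of_two, max_ttl)
```

Over ticks 1 to 8 this yields TTLs 2, 4, 2, 8, 2, 4, 2, 16, so most transmissions stay local, which is where the power saving comes from.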