System software

From OLPC
Revision as of 23:57, 3 January 2007 by 89.176.28.156 (talk) (→‎System Software: solving the worlds problems one apostrophe at a time)


Software Ideas

Usability

An onscreen translucent (watermark-style) representation of the keyboard, built into the OS, that indicates which key is being pressed would help users learn to touch type, since they would not have to look down at the keyboard. It could be turned on or off (better yet, varied in opacity from 0 to 100%). Otherwise, how many will even know to try typing while looking at the screen? It would also help in low light [as was suggested under hardware by another]: many users won't have power for lighting, yet will only be able to use the laptop after school and chores are completed, which leaves only late in the day, and many will be in latitudes with SHORT winter days.

System Software

A version of touch typing software to teach these kids to touch type: the faster you can work with a keyboard, whatever age you are, the faster you can get on with solving the world's problems and letting the world know about your solutions, e.g. unjustified government spending on military budgets that will eventually only lead to one thing: more war to justify more spending, etc.

And does someone not need to make it clearer in your marketing for support of this project that the person who turns the crank does not have to be the same person who types at the keyboard, and that developing nations have a shortage of electricity, not of hands to turn cranks?

Backup Software

I think an automated online backup solution would be important, since laptops may be swapped or stolen, or data may otherwise be lost on a child's system. Combined with the suggestion below about a built-in "hard reset" capability, it would allow a child to perform essentially a complete restore of the OS and then of their lost files. My company, FileBanc, provides a solution that could be a good start; it runs equally well in a shell environment, in a web-based console and as a GUI under X. Scheduling happens with cron, and even if a system is lost, one can use another, log into the web console and restore the files through the web browser without having to install anything. We could probably do something like this for $1 per year per laptop. Thoughts? - Sean Stoner

Each school will have a server. It would be easy to set up a Subversion repository on the server, with a graphical client integrated with the file browser on each laptop. Then a student could periodically check in everything changed since the last checkpoint, with a single mouse operation.--Mokurai 06:24, 5 December 2006 (EST)

Peer To Peer Distribution, for Electronic Text, Software, Email

Extending the original idea from below... this is more general than just electronic text, though. For lack of a better term, let me call it "built-in support for non-real-time Internet connectivity", provided as a shared service and usable by apps.

For example, while traveling I often read web pages that I downloaded earlier while on the network at home. Being disconnected, when I click on a link I of course get some stupid technical error message. Why can't the thing remember that I want to read the linked page later and "queue" it somewhere? (The wwwoffle proxy can do just that.) This idea is probably much more relevant in some OLPC scenarios than it is for me; what if you are connected to the "Internet by Motorbike", say, only once every two weeks, as in the Motoman project in Cambodia?

This applies to many forms of data: electronic content, be it a complete ebook, an HTML page, email or some software to download, as well as publishing of content such as homepage or blog updates. (I think OneWorld has an XML-based publishing system along those lines, but I could be confusing it with something else.) The goal is to make it possible (and easy!) to request and publish data from one device, which then forwards the request to another device, which ultimately forwards it to the Internet when connected. Doesn't it make you feel like good ol' FIDO Net is back?

Vorburger 20:06, 9 February 2006 (EST)

Distributed Filesystem?

Will an offline Wikipedia fit into 512 MB (or even 1 GB)? Even if it does, how about some software and other textbooks loaded at the same time? Clearly, the storage on one device is very limited... but what if data could be spread over several laptops, in a sort of built-in distributed filesystem like Coda or MogileFS? Do these make any sense on a device like this, with the goal of enhancing storage capacity through distribution? In a school, each of, say, 100 children would have 1/100th of Wikipedia, instead of clogging each device with a complete copy.

Vorburger 20:06, 9 February 2006 (EST)
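
To make the "1/100th each" idea concrete, here is a minimal sketch of how articles could be assigned to laptops by hashing the title. This is only an illustration, not how Coda or MogileFS work; the function name `owner_of` and the laptop identifiers are made up for the example.

```python
import hashlib

def owner_of(title, laptop_ids):
    """Return the laptop that should store the article with this title.

    Hypothetical scheme: hash the title and map it onto the sorted list
    of laptop ids, so every laptop computes the same answer on its own.
    """
    ordered = sorted(laptop_ids)
    digest = hashlib.sha1(title.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]

# 100 laptops in one school, 10,000 placeholder article titles.
laptops = ["laptop-%03d" % n for n in range(100)]
titles = ["Article %d" % n for n in range(10000)]

# Count how many articles land on each laptop.
counts = {}
for title in titles:
    owner = owner_of(title, laptops)
    counts[owner] = counts.get(owner, 0) + 1
```

A scheme this simple has obvious gaps: there is no redundancy, so an article is unreachable whenever its owner is switched off, and adding or removing a laptop reshuffles most assignments. A real design would need replication at the very least.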

As of November 2006 the bz2-compressed dump of the English Wikipedia is 1.8 GB. This is the dump with only the latest versions of the articles, templates, and image descriptions (i.e., no talk pages, no user pages, no previous versions of pages, etc.). So it's not completely ridiculous to imagine putting it all onto a laptop, though of course it's hard to work with a compressed database, and even so, it's still about 4 times the size of the total storage of the laptop. However, all the other language editions of Wikipedia are much smaller. For example, the same dump for the Japanese Wikipedia (3rd largest edition, after English and German) fits into 350 MB, and others are smaller still. Of course, without the actual images it might not be that useful, and the images are much bigger (76 GB for the English, 7.5 GB for the Japanese)...
There is a version of Wikipedia in Simple English that is smaller than the main one. This is more appropriate for a children's educational project, and because it uses fewer words than normal English, it will probably compress better. (The same type of dump as above comes to 7.4 MB, which can very easily fit onto the laptop.) In any case, dumping an Internet encyclopedia into a kid's laptop makes as much sense as hiring a bus driver to fly a 747. An OLPC encyclopedia needs to be edited severely to reduce its size, make sure the language is understandable by kids and make sure that it has appropriate content. Biographies of all the kings of England are only relevant to English-speaking kids, not Hindi speakers (Hindi Wikipedia: 1.8 MB) or speakers of Brazilian Portuguese (Portuguese Wikipedia: 120 MB).
Since I assume each school will also have an uplink gateway, maybe the Offline version of Wikipedia could be put on that, and cached on the individual units as they access it?
The idea is a good one but your implementation is flawed. Yes, each school will have some sort of system by which content can be downloaded. It could be as simple as a stack of CD-ROMs and a USB CD-ROM drive. Or the teacher could hook the CD-ROM drive to her own laptop and push relevant content to the kids. In order for this to work, the encyclopedia editors have to chunk the text into thematic clusters. This means that a kid can still do research in the encyclopedia, because he has the whole Brazilian national history chunk in his laptop. But he doesn't suffer by having to carry the Flora and Fauna of Brazil chunk, the World history chunk, and the Birth of Civilization chunk.

Grid computing

It would be interesting if software were included to allow meshed machines to create an ad-hoc grid/cluster computer. It would be useful for things like compiling software, rendering and other CPU intensive tasks. (Stuff that I imagine some of the more advanced users, High School age, might want to do). A distributed file system would be a central part of that.

  • Distributed computing may require special load-balancing algorithms that take into account the cost of electric power for each device, and that avoid discharging hand-powered nodes if machines connected to a power line are near enough.
  • A practical alternative, one that can be done now, is to use content on DVD (as suggested in the previous section). Some "hotspots" covered by these DVD-augmented laptops can be set up in a community, providing distributed servers for giving out content as well as hosting discussions. As the OLPC machine has a USB port, adding a DVD drive to it is not difficult. - Raffy, April 27, 2006.
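
The power-aware load balancing mentioned in the first bullet could look roughly like the sketch below. It is a toy policy, not a proposal for the actual scheduler: the `Node` fields, the `pick_node` name and the 50% battery threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    on_mains: bool     # True if the machine is plugged into a power line
    battery_pct: int   # remaining charge, 0-100
    load: int = 0      # number of tasks currently assigned

def pick_node(nodes, min_battery=50):
    """Choose a node for the next task, sparing hand-powered machines.

    Mains-powered nodes are always preferred. Battery-only nodes are used
    only when no mains-powered node is reachable, and only if they still
    have at least `min_battery` percent charge.
    """
    mains = [n for n in nodes if n.on_mains]
    if mains:
        pool = mains
    else:
        pool = [n for n in nodes if n.battery_pct >= min_battery]
    if not pool:
        return None  # nobody can afford the work right now
    best = min(pool, key=lambda n: n.load)
    best.load += 1
    return best
```

For example, with one cranked laptop at 90% charge and one plugged-in laptop at 10%, every task goes to the plugged-in one, regardless of its battery level.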

Better-performing Flash Filesystem

Currently the laptop is using the JFFS2 filesystem.

Compared with previous versions of JFFS2, the version used by OLPC has these characteristics:

  • Time to mount a 512MiB file system on the OLPC board with the AMD flash controller (which is 10x slower than we should be going) was 5.9s. We ought to be able to get it into the 1-2s range once we have CAFÉ working properly, although actually we're still seeing about 6s.
  • RAM usage has been significantly decreased. Current RAM usage can be calculated with 'grep jffs2 /proc/slabinfo', and we have plans to reduce it further by increasing the maximum node size.
  • JFFS2 supports hard links (although this is apparently not widely known). We don't support shared writable mmap, and there are no plans to, since it's a bad idea on flash and it's hard.
  • JFFS2 supports XATTRs, which is important if we want to use selinux.
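
As a rough illustration of the 'grep jffs2 /proc/slabinfo' calculation mentioned in the list above, the following sketch sums objects times object size for every slab cache whose name contains "jffs2". It assumes the slabinfo 2.x column order (name, active objects, total objects, object size, ...). The sample text and its numbers are invented for the example, and the result ignores per-slab overhead, so treat it as an estimate.

```python
def jffs2_slab_bytes(slabinfo_text):
    """Estimate bytes held in jffs2 slab caches from /proc/slabinfo text."""
    total = 0
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Skip the header lines and every cache that is not a jffs2 one.
        if len(fields) < 4 or "jffs2" not in fields[0]:
            continue
        num_objs = int(fields[2])   # <num_objs> column
        objsize = int(fields[3])    # <objsize> column, in bytes
        total += num_objs * objsize
    return total

# Invented sample in slabinfo 2.x layout; real numbers will differ.
SAMPLE = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
jffs2_inode_cache    1200   1300     24  145    1 : tunables  120   60    0
jffs2_node_frag      5000   5100     28  127    1 : tunables  120   60    0
dentry_cache          800    900    136   29    1 : tunables  120   60    0
"""

print(jffs2_slab_bytes(SAMPLE))  # 174000
```

On a real system the text would come from reading /proc/slabinfo, which may require root privileges.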

There are two other flash file systems available for Linux: YAFFS and logfs.

OLPC has decided to use JFFS2 rather than YAFFS for the following reasons:

  • It has built-in write-time data compression
  • It is included in the standard Linux kernel
  • JFFS2 is much better tested than YAFFS.
  • JFFS2 works out of the box with the MTD subsystem, while YAFFS needs tweaks and patches and is hard to adapt to hardware ECC controllers


YAFFS has a home page and there is a technical article, although members of the JFFS2 team claim that the comparison is out-of-date.

Jörn Engel is currently working on a new flash file system called logfs. It is not yet clear if it will hit the mainline kernel in time for consideration for the first generation laptop, but it is progressing fast.

It would be worth running comparative performance tests on the two filesystems, because there are big potential performance wins on several fronts. In-filesystem compression isn't everything: it slows all file operations down and, when used without error correcting codes on an unreliable medium, risks major data loss.

3d software rendering

As the system does not include hardware-accelerated 3d rendering, a software rendering library may be included to wrap the OpenGL (OGL/ES maybe) API and create rendering code on the fly. Even on a machine with limited clock speed, this can provide rendering performance comparable to that of some integrated 3d chipsets, especially if the resolution is kept low. This could allow educational software to use 3d rendering (physics and mathematics software could take advantage of this). There are some existing tools that can be leveraged for this: for example, Vincent, an OpenGL/ES implementation that provides software rendering for constrained devices like cell phones; SwShader, the precursor of TransGaming's SwiftShader; and many others. Having (limited) OpenGL capability does add some capabilities to the device without requiring additional hardware.

Software Installation, Package Manager, Central Repository

How relevant is a polished end-user friendly Package Manager? With limited memory, are you more likely to uninstall and try another application and install back one? In the beginning, how important is it to be able to very easily get patched new versions of the software? Underlying question: Is a central repository of applications desirable? Completely open, anybody can submit their (pre-compiled) package?

Vorburger

Should there be an easy way to install and remove applications from the device without corrupting the system image? I am thinking of something like klik (http://klik.atekon.de/). -- DPalmerJr

-> An initial proposal and a proof-of-concept demo is here. -- Probono

I am on a team developing a deeply embedded, loosely connected ARM-based Linux system (64 MiB RAM, 512 MiB disc). We have discovered the hard way that it's best to support in-field upgrades -- right from day 1. Even with an effective release management + testing/validation team, specs will change, improvements will be made, bugs will slip through. Our devices are connected via slow satellite links and connect to our infrastructure as infrequently as once per month. We cannot feed a lot of data through the link without blowing our power budget. Even if/when we are willing to risk an over-the-air in-field upgrade, we may not have the bandwidth/power budget. We have found conventional package managers (dpkg, rpm) are too coarse-grained when dealing with skinny pipes and power budgets. A package manager supporting deltas would be preferable. We have even considered downloading source patches and re-compiling on the embedded device. Your network will be faster than ours, so YMMV.

System development + testing will benefit from a slick patch/upgrade mechanism too.

I don't think it's unreasonable to expect to upgrade the devices via the mesh cluster - upgrade one device and the rest can upgrade from it. Use public-key-encryption to sign 'blessed' packages.

I consider a well-thought-out, secure, trustable, user-controllable package management system to be critical to system stability, extensibility, maintainability, and ultimately to the success of this project. -- BCL
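
One simple reading of the "package manager supporting deltas" idea above is a file-level delta: compare a manifest of the installed files with a manifest of the new release, and transfer only what differs. The sketch below shows just that comparison. The names `manifest` and `plan_upgrade` are hypothetical, and a real system might go further and send binary diffs within files.

```python
import hashlib

def manifest(files):
    """Map each path to the SHA-256 of its contents.

    `files` is a dict of {path: bytes}, standing in for a package tree.
    """
    return {path: hashlib.sha256(data).hexdigest()
            for path, data in files.items()}

def plan_upgrade(installed, target):
    """Compare two manifests and return (to_fetch, to_delete).

    Only files that are new or whose hash changed need to cross the
    slow link; files that vanished from the target are removed locally.
    """
    to_fetch = sorted(p for p, h in target.items() if installed.get(p) != h)
    to_delete = sorted(p for p in installed if p not in target)
    return to_fetch, to_delete

old = manifest({"bin/app": b"v1", "lib/core": b"same", "doc/old": b"x"})
new = manifest({"bin/app": b"v2", "lib/core": b"same", "doc/new": b"y"})
print(plan_upgrade(old, new))
```

Here only bin/app and doc/new would be downloaded, while lib/core stays untouched. The manifest itself is small enough to send over a skinny pipe before deciding whether the upgrade fits the bandwidth and power budget.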

-> A system using bundled, self-contained applications like this could facilitate mesh-sharing of applications. -- Probono

Laptop as USB-Drive

It would probably be useful if the laptop could be accessed as a USB drive, like a digital camera. In the software development context, hackers could probably also configure file sharing via the WiFi... but simple "USB cross cabling" could be interesting to end-users because it is: a) the simplest option, b) secure, so it is probably OK to give access to the entire filesystem if locally attached, c) independent of WiFi; the nearest Internet cafe in a bigger town will let children/teachers USB-connect their laptop to one of its stations to copy over a newly downloaded application, but will not have a WiFi base station; at least not where I have travelled in India.

Why take the laptop to the big town when you can take a thumb drive instead? Better yet, why not just wait for the content to come to you on a CD-ROM? Send an email by motorcycle-net to order the content you want, and next week the Motoman motorcycle brings it on CD during the regular delivery. Works in Vietnam.

Vorburger

Maybe software can be developed for this. Since the system is going to be "Linux Based", just accessing the filesystem should make it possible to configure almost everything. Software that gives access to the filesystem (and emulates a camera or a USB thumb drive) could be included. Or maybe a special cable provided with the laptop (using one dedicated port of the 3 USB ports) could allow direct access to the filesystem, or a switch somewhere on the laptop could make it work as a USB drive even without power, possibly even charging the batteries while connected.

Gandolfi

Hard-Reset built-in

Curious kids will certainly easily manage to screw up the software side of the device - and they should! A built-in hard-reset that can re-initialize the OS etc. from ROM; sort of like some modern laptops have a hidden partition on the HDD that can re-install without the usual Recovery CD, could be useful.

You always have the problem of personal data, files, and configuration settings. Some solution for that would have to be provided; e.g. easily copy to your friend's device over the wireless network?

Vorburger

This is a very good point. If we use a compressed read-only file (or partition) holding most of the filesystem (especially the part under /usr), we can not only stuff a lot more software in there, but resetting would also be a much simpler operation. Basically all it would have to do is untar a "factory default" tar file (or something like that) into the writable part of the flash storage.

We could have a boot option, where the user would type "reset" or something like that, to boot a "rescue" kernel and initrd that just did this operation. -- Paulo Marques


There's a problem in the Microsoft Windows world with newly-installed systems. You have to go on-line to get the latest security patches from Microsoft. But as soon as you go on-line with an unpatched system you're at risk of infection from viruses.

The reset operation could be integrated with the patch/upgrade mechanism whereby the system will only install secure signed OS-level packages until either the system or the user decides it's OK to open the doors for business. -- BCL

If the system is in a read-only area, why not just leave it there and use that directly without copying anything? Changes to that area could still be made using an overlay filesystem that just shadows the original. (You could also have the system be a tree of links into the ROM, and when you want to change something, you remove the link and replace it with a real file. That would however not be as easy to use/understand, because it would be different from the way a normal system works.) -- eMBee
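
The shadowing behaviour of an overlay can be sketched with plain dictionaries standing in for the two layers. This is a toy model, not a filesystem: the class name `Overlay` and its methods are invented for the illustration, and the `reset` method that keeps paths under /home/ is an assumption about how personal files might be spared during a hard reset.

```python
class Overlay:
    """Toy model of a writable layer shadowing a read-only system image."""

    def __init__(self, base):
        self.base = dict(base)  # read-only image (never modified)
        self.upper = {}         # writable layer holding changed files
        self.deleted = set()    # "whiteouts": base files hidden by removal

    def read(self, path):
        if path in self.upper:
            return self.upper[path]
        if path in self.deleted or path not in self.base:
            raise FileNotFoundError(path)
        return self.base[path]

    def write(self, path, data):
        self.deleted.discard(path)
        self.upper[path] = data

    def remove(self, path):
        visible = path in self.upper or (
            path in self.base and path not in self.deleted)
        if not visible:
            raise FileNotFoundError(path)
        self.upper.pop(path, None)
        if path in self.base:
            self.deleted.add(path)

    def reset(self, keep_prefix="/home/"):
        """Drop all changes except those under keep_prefix."""
        self.upper = {p: d for p, d in self.upper.items()
                      if p.startswith(keep_prefix)}
        self.deleted = {p for p in self.deleted
                        if p.startswith(keep_prefix)}
```

After a child overwrites or deletes system files, calling `reset()` makes the original versions visible again without copying anything out of the read-only image.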


Per eMBee's idea, you could use a method similar to a live CD, using software such as unionfs. One of the problems with using true ROM is that as updates are applied, there would be wasted space where the old files are stored. Also, a change in the base system, such as changing from the current plan of Linux to the proposed plan of a special Windows OS, would render this ROM useless. This problem could be solved by having a "system files" partition that can only be modified by an update program that checks the md5 hash of the update .diff file against a list of md5s in a PGP-signed file that is redistributed at the same time as an update is released. Also, the system files area should be a reasonable amount bigger than the current distribution (50-100 MB) to make room for updates. A restore could be accomplished by removing all files that affect system operation from the fully read-write side of the unionfs filesystem, while preserving documents (e.g. not removing /home/*). Having the system files area be semi-writable solves the problem mentioned above of the system not being secure at the time a restore is done, because the updates would already be in the system files area, which would not be affected by a restore. Just my ideas on the subject. -- Anonymous
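
The hash check described above can be sketched as follows. The sketch assumes the signed list uses md5sum-style lines ("hash, whitespace, file name"), and it assumes the PGP signature on that list has already been verified by some other tool before the list is parsed; neither detail is specified above. The function names are made up for the example.

```python
import hashlib

def parse_md5_list(text):
    """Parse md5sum-style lines ('<hex digest>  <file name>') into a dict."""
    table = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            table[parts[1].strip()] = parts[0].lower()
    return table

def update_is_trusted(name, data, md5_table):
    """Accept an update only if its MD5 matches the entry in the list."""
    expected = md5_table.get(name)
    return expected is not None and hashlib.md5(data).hexdigest() == expected

# Build a small list for one hypothetical update file.
patch = b"--- a/etc/motd\n+++ b/etc/motd\n"
listing = hashlib.md5(patch).hexdigest() + "  update-001.diff\n"
table = parse_md5_list(listing)

print(update_is_trusted("update-001.diff", patch, table))         # True
print(update_is_trusted("update-001.diff", patch + b"x", table))  # False
```

MD5 is used here only because it is what the comment proposes. Practical collision attacks on MD5 are known, so a design made today would more likely list a stronger hash such as SHA-256.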

Font technology

Which font technology is to be used?

The OLPC uses Linux with GTK which includes Pango as a component. It also uses FreeType which means that the OLPC uses cross-platform TrueType fonts.

Yes, OpenType fonts will much better render complex scripts. The Pango and SIL Graphite projects are cooperating on the design of their rendering engines and the fonts they will need.

Anyway, this isn't a problem that OLPC needs to solve. Experts are working on it. OLPC will leverage their work. For an example of why OLPC is not directly working on the font problem, read this article on Tibetan writing.

It is important that a thorough analysis of the character rendering technology and font technology needed is carried out.

--SIL is the world's foremost research institution on such matters. They work in more than a thousand languages, and maintain the Ethnologue catalog of more than 6,000 documented human languages.

--There are such experts at the heart of the font and rendering engine initiatives described throughout this article. I have observed experts from universities and from SIL, Red Hat, Apple, Microsoft, Sun, IBM, Hewlett-Packard, Evertype, commercial font vendors, vendors of font creation software,...since that is who makes up the Unicode Consortium. The portion of their efforts that goes into Linux will inevitably end up on the OLPC products.

Here is a transcript of what I wrote before.

--I use Pango rendering and properly implemented TrueType fonts on my Linux system to render conjuncts without difficulty. Some TrueType fonts have the glyphs but not the substitution tables; they render with great ugliness. The Akruti fonts, developed in India for all of the major alphabets of India, were placed under the GPL (GNU Public License) as Free Software some time ago (on Gandhi's birthday). There are distributions of Linux in several languages of India, and more on the way.


The best, of course :-). Fontconfig does font substitution on a linguistic level, beyond what Windows and the Mac do. Pango is probably the most advanced layout library around, though further work for some scripts is needed. The Graphite description says that SIL is working on integrating it with Pango. - jg

For European languages such as French and Spanish an ordinary font technology such as TrueType is fine. For languages using Latin script yet using accented characters which do not each have a precomposed Unicode character, including many in Africa, an advanced font format is necessary. This is so that glyph substitution can take place to convert a sequence of a base character followed by a combining accent into a "looks right" display. Any rendering engine with any font containing the appropriate glyphs can put an accent mark over a character, but only OpenType can specify exactly where the mark should go for best appearance.
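
The difference between accented letters that have a precomposed Unicode character and those that do not can be seen with Python's standard `unicodedata` module. The example letters are my own choice: "e" with acute (common in French and Spanish) and open "ɛ" with acute, which is used in the orthographies of some African languages.

```python
import unicodedata

def compose(text):
    """Apply Unicode canonical composition (NFC) to the text."""
    return unicodedata.normalize("NFC", text)

# 'e' + COMBINING ACUTE ACCENT composes into the single character U+00E9.
e_acute = compose("e\u0301")

# LATIN SMALL LETTER OPEN E (U+025B) + COMBINING ACUTE ACCENT has no
# precomposed form, so it remains two code points after normalization.
open_e_acute = compose("\u025b\u0301")

print(len(e_acute), len(open_e_acute))  # 1 2
```

In the second case the text stays a base character plus a combining mark, so placing the accent well is left entirely to the font and the rendering engine.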

Freetype, used by almost everything these days on open source formats, handles a plethora of font types, from Type 1, to TrueType, to OpenType; note that anyone wanting to introduce yet another font format had best be examining how to do it as a Freetype plugin - jg

Arabic script systems (Arabic, Farsi, Urdu, etc.) need an advanced font technology and an advanced rendering engine. Chinese does not need an advanced font technology system. For languages of the Indian subcontinent typewriter-like displays can be achieved without an advanced font technology. For full support of conjunct ligatures an advanced font technology is needed, and similarly for other Asian alphabets (Sinhalese, Lao, Khmer, Myanmar, Tibetan, Mongolian, etc.).

We know of some open issues with Thai & Pango, but believe that they can be solved and that Pango handles most languages already (e.g. Arabic, the Indic languages). Please help determine where further work may be needed. - jg

Please note the use of Fontconfig on open source systems for font naming and substitution - jg


Email Client requirements

Email is the only well-known internet application that doesn't depend on a working TCP/IP connection to the internet. Its model is the paper postal service, where there are only one or two connections per day, when the postie visits the letterbox.

It is very likely that these laptops will be in the situation where the link to the outside world will be a fragile connection running at very low speeds. If it's a modem line it's likely that the quality is so poor that echo cancellation will fail; this will limit the speed to 2400bps duplex (higher if half duplex). This is not enough for a shared web connection for thirty kids.

This is okay for email with some rules:

  • The email client must be self contained.
  • The MTA must be light and capable of very versatile store and forward without help from DNS.
  • The MTA on the client must be capable of ad-hoc forwarding, i.e. the child can tell it to give their mail to another client, one who's going to school today.
  • The client must have good facilities for splitting files into multiple emails (and joining them again), so a maximum message size of, say, 16 kB would not be a problem.
  • The ability to put the mail on a USB key. The bandwidth of a real postie with a pocket full of USB keys could be rather high.
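
The split-and-join requirement in the list above could be met along these lines. This is a minimal sketch: each "message" is a plain dictionary rather than a real email, the part-numbering scheme is invented for the example, and the 16 kB limit is applied to the base64 body only, without counting headers.

```python
import base64

MAX_BODY = 16 * 1024  # maximum characters of base64 text per message

def split_file(name, data, max_body=MAX_BODY):
    """Encode a file as base64 and cut it into numbered message parts."""
    encoded = base64.b64encode(data).decode("ascii")
    chunks = [encoded[i:i + max_body]
              for i in range(0, len(encoded), max_body)] or [""]
    total = len(chunks)
    return [{"subject": "%s part %d/%d" % (name, i + 1, total),
             "index": i, "total": total, "body": chunk}
            for i, chunk in enumerate(chunks)]

def join_parts(parts):
    """Rebuild the original bytes from parts received in any order."""
    if not parts:
        raise ValueError("no parts received")
    total = parts[0]["total"]
    by_index = {p["index"]: p["body"] for p in parts}
    if set(by_index) != set(range(total)):
        raise ValueError("some parts are still missing")
    return base64.b64decode("".join(by_index[i] for i in range(total)))

payload = bytes(range(256)) * 200          # 51,200 bytes of test data
messages = split_file("lesson.zip", payload)
restored = join_parts(list(reversed(messages)))
```

Because each part carries its own index and the total count, parts can arrive by different routes (modem, another child's laptop, a USB key) and the receiver can tell when the set is complete.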

A good model for this might be the old FidoNet networks, though a cleaner addressing scheme would be nice.

Having just email is not as limiting as you might imagine: you can access most of the internet by email.

-- Robert de Bath -- March 2006

PS: I just did the math, I've got a 1Gbyte flash key so my bandwidth on the daily commute to work is 99kbps!
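
The 99kbps figure can be reproduced if one assumes that "1Gbyte" means 2^30 bytes and that one full key is carried per 24 hours. Both are my assumptions; the note above does not say how the number was derived.

```python
def sneakernet_kbps(payload_bytes, trip_seconds):
    """Average throughput, in kilobits per second, of carrying data by hand."""
    return payload_bytes * 8 / trip_seconds / 1000

one_gib = 2 ** 30            # assumed meaning of "1Gbyte"
one_day = 24 * 60 * 60       # assumed: one delivery per day

rate = sneakernet_kbps(one_gib, one_day)
print(round(rate, 1))  # 99.4
```

Under the same assumptions this is roughly 40 times the 2400bps modem line discussed earlier in this section, although the latency is of course a full day.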

Motorcycle E-mail Network

This is an excellent idea and should be part of the core OLPC project. Here is how it is currently being done in rural Cambodia. http://www.parish-without-borders.net/cditt/cambodia/dailylife/2004/rural-internet.htm

Remember, the OLPC is NOT A LAPTOP. It is a system comprising laptops, children, teachers, applications, content, USB-devices, etc.

WLAN MAC Address

There might be privacy issues related to the WLAN MAC address. The MAC is somewhat similar to the unique serial number in the CPU-ID, except that it is additionally broadcast around. "Quick, she/he is leaving, let's start eating the apples." A WLAN mesh might allow for relatively fine-grained position tracking.

Broadcast GPS/Galileo Position, send tiles of local map

If an OLPC has access to its GPS coordinates these should optionally be sent via WLAN. Distribution could be combined with a kind of reliability information (maybe like a kind of superset of stratum in the ntp protocol).

OLPCs without direct access should store the coordinates of a nearby OLPC that has position information. This would allow an OLPC connecting to a WLAN to download and display a local map (e.g. download an area of 80x80 km and display 20x20 km, as a first guess at a compromise between WLAN range, walking area, map detail, bandwidth, and OLPC distribution).

This can help answer "Where am I? Where are you from?" (local map, country, continent, earth, and solar system if need should arise:). Tiles of the detailed map could be made available to other OLPCs. - Frieder Ferlemann 2006-06-06

Python & kernel memory usage cooperation

OLPC will push the Linux environment to run in much tighter memory constraints -- small RAM and no swap or paging space. And it's using Python rather than C for many commonly running apps. Currently, running out of memory is handled very primitively -- in the kernel, by killing the biggest application running; in Python applications, by exiting with an error message. This clearly won't suffice, but I haven't seen any plans to improve it.

(Correction: that is not strictly right. Under memory pressure the kernel will start freeing memory by removing temporary cache buffers, dropping memory pages of executable files if possible (it will not do that if the programs are executed in place), or shrinking network buffers. The OOM killer is really used only as a last resort. By the way, the fact that the kernel can and does manage buffer shrinking automatically actually discourages applications from having their own caches: the kernel has the potential for better control.)

Python will have to learn to "give back memory" to the kernel when it doesn't need it. It could do this on a page-by-page basis with mmap calls, subsequent to garbage collection. Also, Python should be able to signal to the application (and/or libraries) that memory is tight and that the upper-level code should free unnecessary resources (such as caches). This signal should occur whenever an application is suspended or goes unused for some period, and whenever the kernel runs low on memory.

A similar strategy should probably exist for other resources, such as filesystem space, and CPU time. The kernel should have a way to tell applications that demand is high, and that they should scale back their demand if they can.

The kernel will have to learn how to signal applications to reduce their memory usage. This is most important when there is NO memory left -- when the kernel currently picks and kills a process -- but it should be done before that point, when there's more flexibility. E.g. if an application wants to allocate another page temporarily while emptying its cache or doing a garbage collection, it can't do that when zero memory is left.

A new signal (SIGSHRINK?) is one way to communicate this to processes. Having them open something from /dev or /proc and listen on it would be another. Or just let applications (or a specialized process, like init) monitor /proc/meminfo and /proc/stat and take actions accordingly. These capabilities would be useful in the upstream kernel and applications.

A daemon like inetd could allow applications to totally terminate when idle and/or when signalled to shrink. An application, that was coded to know how to resume on demand, could pass any file descriptors that need to stay open up to the daemon (e.g. open network connections or ttys), then terminate. The daemon would wait for I/O activity on those connections, and fork a new copy of the process when needed. -- John Gilmore

Normally xinetd is part of Fedora. It currently seems to have been removed from the OLPC distro.

Applications could check for free memory before starting and refuse to run if too much memory is in use. The only support needed is a reliable way for a Python app to get meaningful numbers for total memory and memory used. (This works assuming the app knows in advance how much memory it is going to need, independent of what the user does with it. The kernel will actually do this for you: if there's no swap space, and you allocate all your memory early in the process's life, you'll get ENOMEM and you can die cleanly then. -John Gilmore)

More effort could be put into keeping applications slimmed down. Perhaps some tools could analyze redundancy, e.g. an app linking statically to a library that some other app also links statically, or multiple versions of the same library being installed. Busybox has done a lot of this kind of refactoring for basic UNIX utilities. Valgrind is also a tool to keep in mind: it does a very good job of bookkeeping of all memory allocations, and one of the programs shipped with it, cachegrind, helps to minimize CPU cache impacts.
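
A start-up check of this kind might read /proc/meminfo, as in the sketch below. Counting Buffers and Cached as available follows the correction above (the kernel can drop caches under pressure), but that is an approximation: not all cached pages can always be reclaimed. The sample text and its numbers are invented.

```python
def parse_meminfo(text):
    """Turn /proc/meminfo text into a dict of {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values

def enough_memory(meminfo, needed_kb):
    """Rough check: free memory plus caches the kernel could drop."""
    available = (meminfo.get("MemFree", 0)
                 + meminfo.get("Buffers", 0)
                 + meminfo.get("Cached", 0))
    return available >= needed_kb

# Invented sample; on a real machine, read the text from /proc/meminfo.
SAMPLE = """\
MemTotal:       121832 kB
MemFree:          9144 kB
Buffers:          2100 kB
Cached:          30500 kB
"""

info = parse_meminfo(SAMPLE)
print(enough_memory(info, 40000), enough_memory(info, 50000))  # True False
```

An application would run this before creating its main window and exit with a friendly message if the check fails.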

Network Protocol

I think the most important single choice is the mesh protocol, because it is likely to have a longer deployment than any implementation of the hardware, OS, or application software.

I figured that the best mesh protocol would minimize total routing waste, in order to reduce power use. Computation will use less power as technology advances, but transmission power is going to be limited by physics at some point.

I researched mesh protocols on Wikipedia.

The hazy-sighted link state protocol just stood out among the choices. It is mathematically optimized to minimize network waste. This means that it minimizes power and won't be easily improved-upon. It also has a fairly old, well-debugged, publicly deployed open source implementation that runs on diverse hardware, and is about the right size and shape (small).

The least surprising choice is probably OLSR (which periodically floods the network with limited routing data). The simplest protocol is probably AODV (a distance vector protocol that floods the network with routing information). The others seem to be research projects, or proprietary, and I would avoid them, even though some are specifically geared to power saving.

Ray Van De Walker 10:34, 26 May 2006 (EDT)
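
For readers unfamiliar with it, the way hazy-sighted link state limits routing traffic can be sketched as a schedule of hop limits (TTLs). As I understand the published description, a node sends an update with TTL 2 every base period, TTL 4 every second period, TTL 8 every fourth period, and so on, so that distant nodes hear about changes less often. The sketch below shows only that schedule; it leaves out the rule that an update is sent only if a link actually changed, and the function name is my own.

```python
def hsls_ttl(tick, max_ttl=None):
    """Hop limit of the link-state update sent at a given period.

    `tick` counts base periods starting from 1. The largest power of two
    dividing the tick decides how far the update is allowed to travel.
    """
    if tick < 1:
        raise ValueError("tick must be 1 or greater")
    ttl = 2 * (tick & -tick)   # tick & -tick isolates the lowest set bit
    if max_ttl is not None:
        ttl = min(ttl, max_ttl)
    return ttl

schedule = [hsls_ttl(t) for t in range(1, 9)]
print(schedule)  # [2, 4, 2, 8, 2, 4, 2, 16]
```

Half of all updates travel only two hops, which is where the saving in transmissions (and therefore power) comes from.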