Talk:Development issues

From OLPC

=== Flaws in Article wording ===

I consider the following lines, at the end of the article, to conflict with each other. [[User:Nitpicker|Nitpicker]] 17:57, 16 October 2006 (EDT)

This page is a stub. Please expand on it.

Categories: Pages maintained by OLPC | Developers

=== Library use plans ===

Any plans to use any of the "low memory footprint" C libraries (uClibc, etc.)? --pdinoto

::Don't think so: I18N and compatibility say otherwise. I think there are better places to go hunting. - jg

----
::Odd. Zaurus uses some of the "low memory footprint" libraries, and it supports multiple scripts including Japanese, English and Cyrillic. It is based on the Qt library, which has very good I18N and Unicode support. Recently, Trolltech has been selling a Qtopia Mobile Phone Edition that is being used on mobile phones in countries such as China.
::If there is, in fact, an I18N issue with a "low memory footprint" library, then it should be investigated with a view to fixing it. Every byte we can trim from the base system is one more byte for educational content.

----
=== Java ===

Will Java (SE, ME) be supported? I know Java is supposed to be a huge memory hog, but a carefully written app can consume as little as 16 MB together with the Java virtual machine. - wolf

----
=== Microkernel? ===

Is there any discussion of using a microkernel, such as L4, GNU/Hurd, or Linux on L4? These aren't complete, but a high-profile project such as this could speed the development of any one of them. A small microkernel that is highly tuned to the individual CPU being used seems like it would increase performance, lower battery usage, and be very good for a project like this.
::Are you thinking about MINIX 3?

----
=== Python ===

I understand that Python is the primary development language. However, will C/C++ be available as well? I have a character-based app that could be ported to OLPC.

== GX CPU Issues ==
'''This analysis is based on the old GX CPU. B3 and later systems use a whizzier CPU.'''

I've taken a look at the CPU and run some cachegrind simulations (on Rhythmbox playing Frank's 2000 Inch TV), making note of a few things.

''For those of you wishing to play with cachegrind, you can do so on any old x86. Use I1 and D1 values of '''16384,4,32''' and set L2 to '''64,4,32''' (because cachegrind demands there be L2), then ignore stats about L2 cache.''

* There is NO L2 CACHE. An L1 miss costs 25 cycles.
** Remember that sequential reads are very fast because the cache deals with them a lot better.
** There is an efficient prefetch mechanism on the Geode GX; the compiler should use it by default. I don't know if we can help in any way (e.g. by calculating addresses and array indexes earlier? It probably reorders for this...)
* L1 cache is 32KiB, with 16KiB I1 and 16KiB D1.
** The D1 miss rate is 1.9%
** The total L1 miss rate (I1 and D1 combined) is 1.2%
** Multiplying the total miss rate of 0.012 by the 25-cycle miss penalty gives 0.3 extra cycles per memory reference, i.e. roughly a 30% slowdown due to lack of cache if a reference that hits would take about one cycle. There are very few ways to handle this, but one interesting idea is to rework the memory allocator (i.e. malloc()) to focus on cache locality.
*** Hoard is generally a good allocator but it may not be optimal here.
*** I hear FreeBSD's memory allocator focuses on cache locality but I haven't looked.
*** I'm writing my own allocator; I have a special scheme for small allocations that will hopefully improve cache locality. It can be implemented separately and dropped in straight over the existing malloc() to manage qualifying allocations, throwing others back to the existing allocator. I'm working on getting an interposer working to see whether this helps.
* There are only 8 ITLB, 8 DTLB, and 64 L2 TLB entries.
** The ITLB can be helped by the linker placing functions that call each other closer together.
*** I'm not sure how the linker arranges functions; I'm guessing it doesn't take them out of separate objects and rearrange the code on such a scale. This kind of behavior would, however, allow it to pull functions that call each other into the same page and thus be friendly to the TLB.
** -Os may be better or worse than -O2; -Os could result in more functions packed into a smaller area, saving cache and TLB entries.
** The DTLB can be helped by using a better allocator. I'm looking into this; as with the cache, I'm going to look at TLB behavior too.
* Python may be a slight stumbling block, but we can work on this a little.
** Python scripts are compiled into bytecode.
*** We are aggravating the D1 problem because our 'code' is now data and affects that cache.
*** The Python interpreter should profile how it executes bytecode and rearrange it to be friendlier to cache.
*** Python modules and scripts written in Python (instead of C) could be pre-compiled to native.
*** The Psyco or PyPy JIT could be used, although start-up time and memory usage would increase.
**** The JIT could possibly be modified to cache compiled code as position-independent code (PIC), allowing it to effectively generate pre-compiled Python on the fly. The problem with a JIT is that it can't share code memory; by writing the compiled native code back to a file and later mmap()ing it in, future runs could avoid JITing the code, and parallel runs using the same chunks of code could use the copies in the same file, sharing the memory.
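A minimal sketch of the "drop in over malloc()" scheme from the allocator bullets above, under stated assumptions: small requests (here up to 32 bytes, a hypothetical cutoff matching the 32-byte line size used in the cachegrind settings) are carved from one contiguous, line-aligned slab so that objects allocated together land on neighbouring cache lines, and everything else, or anything once the slab is full, is passed back to the system malloc(). The names small_alloc(), small_free(), SLOT_SIZE and SLOT_COUNT are made up for illustration. This is an explicit API rather than a real interposer; an LD_PRELOAD version would also have to cover realloc(), calloc(), thread safety and alignment requests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tuning: one slot per 32-byte cache line (the line size used
   in the cachegrind settings above) and 2048 slots, i.e. a 64 KiB slab. */
#define SLOT_SIZE  32
#define SLOT_COUNT 2048

/* A slot is either handed out to the caller or, while free, holds a link
   to the next free slot. max_align_t keeps slots aligned like malloc(). */
typedef union slot {
    union slot *next_free;
    max_align_t align;
    unsigned char bytes[SLOT_SIZE];
} slot;

_Static_assert(sizeof(slot) == SLOT_SIZE, "each slot should fill one cache line");

static _Alignas(SLOT_SIZE) slot slab[SLOT_COUNT];
static slot *free_list = NULL;   /* LIFO list of freed slots */
static size_t next_unused = 0;   /* slots never handed out, in address order */

/* Does p point into the slab? Decides which allocator owns a pointer. */
static int owned_by_slab(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&slab[0] && a < (uintptr_t)&slab[SLOT_COUNT];
}

/* Small requests come from the slab; everything else goes to malloc(). */
void *small_alloc(size_t size)
{
    if (size > 0 && size <= SLOT_SIZE) {
        if (free_list != NULL) {          /* reuse the most recently freed slot */
            slot *s = free_list;
            free_list = s->next_free;
            return s;
        }
        if (next_unused < SLOT_COUNT)     /* else take the next fresh slot */
            return &slab[next_unused++];
    }
    return malloc(size);                  /* too big, zero-sized, or slab full */
}

void small_free(void *p)
{
    if (p == NULL)
        return;
    if (owned_by_slab(p)) {
        /* Catch frees of pointers into the middle of a slot. */
        assert(((uintptr_t)p - (uintptr_t)slab) % sizeof(slot) == 0);
        slot *s = p;
        s->next_free = free_list;
        free_list = s;
    } else {
        free(p);                          /* belongs to the system allocator */
    }
}
```

Freed slots go onto a LIFO free list, so the most recently freed (and probably still cached) slot is the first one reused. Whether this actually lowers the D1 miss rate would have to be measured, for example with the cachegrind settings above.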

It may be fruitful to write a library for small allocations such as linked lists and start rewriting code to use it. This is more work but has much better potential: programmers know better how they're going to allocate things, and can do a better job than a generic allocator. It would also be possible to use such a library to allocate linked lists and then exchange the nodes in place so that the list is kept sequential in memory, intentionally keeping it together in cache; this may be a win in some places and a loss in others.
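The idea of rearranging a linked list so that it sits sequentially in memory can be sketched as below. This is a copying variant rather than the in-place exchange the paragraph describes: relayout() (a hypothetical name) walks a list in order, copies each node into a freshly allocated contiguous array and relinks the copies, so a later traversal reads memory front to back and gets the benefit of the fast sequential reads noted in the GX section. The node layout is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node layout for a singly linked list. */
typedef struct node {
    int value;
    struct node *next;
} node;

/* Copy the list starting at head into one new contiguous array, in
   traversal order, and relink the copies so element i points at element
   i + 1. Stores the node count in *count_out and returns the array (which
   is also the new head), or NULL if the list is empty or malloc() fails. */
node *relayout(const node *head, size_t *count_out)
{
    assert(count_out != NULL);

    /* First pass: count the nodes so one allocation can hold them all. */
    size_t count = 0;
    for (const node *n = head; n != NULL; n = n->next)
        count++;
    *count_out = count;
    if (count == 0)
        return NULL;

    node *packed = malloc(count * sizeof *packed);
    if (packed == NULL)
        return NULL;

    /* Second pass: copy payloads and point each copy at its successor. */
    size_t i = 0;
    for (const node *n = head; n != NULL; n = n->next, i++) {
        packed[i].value = n->value;
        packed[i].next = (i + 1 < count) ? &packed[i + 1] : NULL;
    }
    return packed;
}
```

The caller then releases the old nodes and keeps using the packed array. The copy costs one extra pass and an allocation, so it only pays off for lists that are traversed far more often than they are modified, which is the trade-off the paragraph above points out.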

== Memory Issues ==

The OLPC laptop has 128 MB of memory, so we have to be quite careful about what runs on it. Here are a few thoughts:

* Memory Compression
** Nitin Gupta is working on this at http://linuxcompressed.sourceforge.net now.
** There is an [http://wiki.ubuntu.com/CompressedMemory Ubuntu Specification] for Compressed Memory using Nitin's patches.
* Efficient memory allocator
** Look into other memory allocators like the one in FreeBSD or Hoard.
** I'm working on my own; I'm not sure how it will perform, but the original intent was to improve space efficiency.
** Pre-compile Python scripts so that they can be mmap()ed into memory.
*** Bytecode-compiled Python takes memory separately in each process, and also takes more time to run.
*** Just-in-Time compiled Python (Psyco) allocates separate code memory in each process that JITs the code.
*** Unmodified file-backed mmap() areas are shared between processes; pre-compiled Python code would use this method and save a few pages.
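The page-sharing point in the last bullet relies on how Linux handles file-backed mmap(): pages of a file mapped read-only come from the kernel's page cache, so every process that maps the same unmodified file uses the same physical pages. A minimal sketch of mapping such a file, assuming a hypothetical file of pre-compiled code; map_readonly() and unmap_readonly() are made-up helper names, and the sketch only maps and reads the bytes. Actually executing native code from the mapping would additionally need PROT_EXEC and position-independent code, as the JIT bullet in the GX section notes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only (e.g. a hypothetical file of pre-compiled
   code). Returns the mapping and stores its length in *len_out, or returns
   NULL on error or for an empty file. */
const void *map_readonly(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* Read-only, file-backed: the pages come from the kernel's page cache,
       so other processes mapping the same file share the physical memory. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

void unmap_readonly(const void *p, size_t len)
{
    munmap((void *)p, len);
}
```

On Linux, once two processes have mapped the same file and touched its pages, /proc/<pid>/smaps should report those pages under Shared_Clean rather than as private memory.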

== Pango Issues ==

The IBM developerWorks site has an article, '''Port your code around the world with m17n''' (http://www-128.ibm.com/developerworks/linux/library/l-m17n/?ca=dgr-lnxw09PortWithm17n), which raises the following issues with Pango:

:The problems with Pango
:Pango can place (lay out) and render complex scripts but cannot perform sorts or searches on multi-byte text. Pango assumes that an underlying library -- typically written in the C language and able to manipulate all the languages specified in the Unicode standard -- is able to perform fundamental text processing.

Perhaps the OLPC project could focus resources to help resolve these Pango issues. Or, if these are not issues on the 2B1 due to use of Python or some support libraries, then perhaps this fact could be documented directly so that people can make better choices when setting up their development environments.

== Stripping activity source and the GPL ==

I'm not sure how important this is, but the article mentions stripping Python source for space concerns, instead distributing only the Python bytecode. It seems to me that if an activity incorporates a GPLed Python module whose source has been stripped in this fashion, then the children will not be able to legally redistribute the activity among themselves, since they will be distributing a binary GPLed product without making the source available. The same seems to be generally true of any activity using a compiled GPLed component.

This occurred to me only because I was thinking about writing a Python activity for playing Go, optionally using GnuGo (a GPLed C program) as an AI backend. It would use a lot of space to store the source along with the binary on the laptops, since the source is at least as large. But the GPL seems to require this if the activity is to be shared from laptop to laptop. It's less of a problem, of course, if the activity comes from the school server or a website, where there is more space for the source.

Is this anything to be concerned about? Should it influence license choice? —[[User:Leejc|Joe]] 18:11, 16 March 2007 (EDT)

: To answer my own question: I just read about the [http://fsfeurope.org/projects/gplv3/brussels-rms-transcript#bittorrent GPL v3] changes regarding binary distribution, and it sounds like the changes ameliorate the problem of making source available. Judging from the transcript, it sounds like the GPL v3 will allow distribution of binaries without source if the source is available via the internet, even if it is not directly available from the binary distributor... which seems to mean that it's OK to distribute an unmodified gnugo binary as long as gnu.org is making the source available.

: The new license likewise seems to allow distributing GPLed .pyc files, as long as the original .py files are available somewhere. So children wanting to distribute GPLed .pyc files of their own would need to put the source on an internet server somehow. But now, at least, they have that option.

: It's funny that Richard Stallman indirectly cites the modern abundance of bandwidth as the reason for the change, as in this case it's going to help a lot of people in bandwidth-poor areas. —[[User:Leejc|Joe]] 17:18, 4 April 2007 (EDT)

Latest revision as of 04:57, 29 August 2007
