Talk:Development issues
Revision as of 14:56, 23 October 2006
Flaws in Article wording
I consider the following lines at the end of the article to conflict with each other. Nitpicker 17:57, 16 October 2006 (EDT)
This page is a stub. Please expand on it.
Categories: Pages maintained by OLPC | Developers
Library use plans
Any plans to use any of the "low memory footprint" C libraries (uClibc, etc.)? --pdinoto
- Don't think so: I18N and compatibility say otherwise. I think there are better places to go hunting. - jg
- Odd. Zaurus uses some of the "low memory footprint" libraries and supports multiple scripts, including Japanese, English and Cyrillic. It is based on the Qt library, which has very good I18N and Unicode support. Recently, Trolltech has been selling a Qtopia Mobile Phone Edition that is being used on mobile phones in countries such as China.
- If there is, in fact, an I18N issue with a "low memory footprint" library, then it should be investigated with a view to fixing it. Every byte we can trim the base system is one more byte for educational content.
Java
Will Java (SE, ME) be supported? I know Java is supposed to be a huge memory hog, but a carefully written app can consume as little as 16 MB together with the Java virtual machine. - wolf
Microkernel?
Is there any discussion of using a microkernel, such as L4, GNU/Hurd, or Linux on L4? These aren't complete, but a high-profile project such as this could speed up the development of any one of them. A small microkernel that is highly tuned to the specific CPU being used seems likely to increase performance and lower battery usage, and could be very good for a project like this.
- Are you thinking about MINIX 3?
Python
I understand that Python is the primary development language. However, will C/C++ be available as well? I have a character-based app that could be ported to OLPC.
CPU Issues
I've taken a look at the CPU and run some cachegrind simulations (on Rhythmbox playing Frank's 2000 Inch TV), and made note of a few things.
For those of you wishing to play with cachegrind, you can do so on any old x86. Use I1 and D1 values of 16384,4,32 and set L2 to 64,4,32 (because cachegrind demands that there be an L2), then ignore the L2 statistics.
- There is NO L2 CACHE. An L1 miss costs 25 cycles.
- Remember that sequential reads are very fast: a miss loads a whole cache line, so the reads that follow within that line hit the cache.
- There is an efficient prefetch mechanism on the Geode GX, and the compiler should use it by default. I don't know whether we can help in any way (e.g. by calculating addresses and array indexes earlier? The compiler probably already reorders for this...)
- L1 cache is 32KiB, with 16KiB I1 and 16KiB D1.
- The D1 miss rate is 1.9%
- The total L1 miss rate is 1.2%
- Multiplying the 0.012 miss rate by the 25-cycle miss penalty gives 0.3 extra cycles per access, roughly a 30% slowdown due to the lack of an L2 cache. There are very few ways to handle this, but one interesting idea is to rework the memory allocator (i.e. malloc()) to focus on cache locality.
- Hoard is generally a good allocator but it may not be optimal here.
- I hear FreeBSD's memory allocator focuses on cache locality but I haven't looked.
- I'm writing my own allocator; I have a special scheme for small allocations that will hopefully improve cache locality. It can be implemented separately and dropped straight in over the existing malloc() to manage qualifying allocations, passing the others back to the existing allocator. I'm working on getting an interposer running to see whether this helps.
- There are only 8 ITLB, 8 DTLB, and 64 L2 TLB entries.
- The ITLB can be helped by the linker locating functions calling other functions and pulling them closer together.
- I'm not sure how the linker rearranges functions; I'm guessing it doesn't take them out of separate objects and rearrange the code on such a scale. This kind of behavior would, however, allow it to pull functions that call each other into the same page, making it friendly to the TLB.
- -Os may be better or worse than -O2; -Os could result in more functions packed into a smaller area, saving cache and TLB entries.
- The DTLB can be helped by using a better allocator. I'm looking into this; as with the cache, I'm going to take the TLB into account.
- Python may be a slight stumbling block, but we can work on this a little.
- Python script is compiled into bytecode
- We are aggravating the D1 problem because our 'code' is now data and affects that cache.
- The Python interpreter should profile how it executes bytecode and rearrange it to be friendlier to cache.
- Python modules and scripts written in Python (instead of C) could be pre-compiled to native code.
- The Psyco or PyPy JIT could be used, although start-up time and memory usage would increase.
- The JIT could possibly be modified to cache compiled code as position-independent code (PIC), allowing it to effectively generate pre-compiled Python on the fly. The problem with a JIT is that it can't share code memory; by writing the compiled native code back to a file and later mmap()ing it in, future runs could avoid JITing the code, and parallel runs using the same chunks of code could use the copies in the same file, sharing the memory.
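The miss-penalty estimate in the list above can be reproduced in a few lines of Python. This is a back-of-envelope sketch, not a measurement: it assumes that an access which hits L1 costs about one cycle (an assumption, not a cachegrind figure), and it ignores the prefetcher and any overlap between misses. `stall_overhead` is a hypothetical helper name.

```python
def stall_overhead(miss_rate, miss_penalty_cycles, hit_cycles_per_access=1.0):
    """Extra cycles per memory access caused by cache misses, expressed
    as a fraction of the assumed cost of an access that hits."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    # Each access misses with probability miss_rate, and each miss stalls
    # for miss_penalty_cycles because there is no L2 to catch it.
    return miss_rate * miss_penalty_cycles / hit_cycles_per_access


if __name__ == "__main__":
    # 1.2% total L1 miss rate and 25-cycle penalty, from the run above.
    print(f"approx. slowdown: {stall_overhead(0.012, 25):.0%}")  # 30%
```

Plugging in the 1.9% D1 miss rate alone gives a noticeably larger figure, which is one reason the allocator and data-layout ideas focus on the data side.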
It may be fruitful to write a library for small allocations such as linked lists and start rewriting code to use it. This is more work but has much better potential. The advantage here is that programmers know better how they're going to allocate things and can do a better job than a generic allocator. It would also be possible to use such a library to allocate linked lists and then swap nodes in place so that the list stays sequential in memory, intentionally keeping it together in cache; this may be a win in some places and a loss in others.
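The idea of keeping a linked list sequential in memory can be modelled in Python. This is a toy sketch under stated assumptions: `PooledList` and `compact` are hypothetical names, the nodes are slots in two parallel Python lists standing in for one contiguous C array, and `compact()` is a copying variant (it rebuilds the pool in traversal order) rather than the in-place swapping described above. Python objects are themselves scattered across the heap, so this shows only the index bookkeeping such a library would do, not real cache behavior.

```python
class PooledList:
    """Singly linked list whose nodes are slots in one pool.

    values[i] and next_[i] together form node i; next_ == -1 ends the
    list. In C this would be one contiguous array of structs; here two
    Python lists stand in for it to show the index bookkeeping.
    """

    def __init__(self):
        self.values = []
        self.next_ = []
        self.head = -1
        self.free = []  # released slots, reused before growing the pool

    def _alloc(self, value):
        if self.free:
            i = self.free.pop()
            self.values[i] = value
        else:
            i = len(self.values)
            self.values.append(value)
            self.next_.append(-1)
        return i

    def push_front(self, value):
        i = self._alloc(value)
        self.next_[i] = self.head
        self.head = i

    def pop_front(self):
        i = self.head
        if i == -1:
            raise IndexError("pop from empty list")
        self.head = self.next_[i]
        self.free.append(i)
        return self.values[i]

    def __iter__(self):
        i = self.head
        while i != -1:
            yield self.values[i]
            i = self.next_[i]

    def compact(self):
        """Rebuild the pool in traversal order (copying, not in place).

        Afterwards node k lives in slot k and links to slot k + 1, so a
        walk over the list visits the pool strictly sequentially.
        """
        ordered = list(self)
        self.values = ordered
        self.next_ = (list(range(1, len(ordered))) + [-1]) if ordered else []
        self.head = 0 if ordered else -1
        self.free = []


if __name__ == "__main__":
    lst = PooledList()
    for v in (3, 2, 1):
        lst.push_front(v)      # list order 1, 2, 3 sits in slots 2, 1, 0
    print(list(lst), lst.head)  # [1, 2, 3] 2
    lst.compact()
    print(lst.next_, lst.head)  # [1, 2, -1] 0
```

In C, the value and the link would share one struct, so walking a compacted list reads memory strictly in order. Whether compacting pays off depends on how often the list is traversed compared with how often it is modified, which is why it may win in some places and lose in others.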
Memory Issues
The OLPC machine has 128 MiB of memory, so we have to be quite careful about what goes on it. Here are a few thoughts:
- Memory Compression
- Nitin Gupta is working on this at http://linuxcompressed.sourceforge.net now.
- There is an Ubuntu Specification for Compressed Memory using Nitin's patches.
- Efficient memory allocator
- Look into other memory allocators like the one in FreeBSD or Hoard.
- I'm working on my own, not sure how it will perform but the original intent was to improve space efficiency.
- Pre-compile Python scripts so that they can be mmap()ed into memory.
- Bytecode-compiled Python takes memory in each process individually, and also takes more time to run
- Just-in-time compiled Python (Psyco) allocates separate code memory in each process that JITs the code
- Unmodified file-backed mmap() areas are shared between processes; pre-compiled Python code will use this method and save a few pages
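Python's standard mmap module shows the mechanism the last item relies on. This is a minimal sketch, not a pre-compiler: `map_readonly` is a hypothetical helper, and producing native code worth mapping (for example with a modified JIT, as suggested under CPU Issues) is outside its scope. By contrast, CPython loads a .pyc file by reading it and unmarshalling the code objects onto each process's own heap, which is why bytecode pages are not shared.

```python
import mmap
import os
import tempfile


def map_readonly(path):
    """Map a whole file read-only and return the mmap object.

    Pages of a read-only, file-backed mapping come from the kernel's
    page cache, so processes mapping the same unmodified file can share
    the same physical pages instead of each keeping a private copy.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The mapping stays valid after
        # the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"precompiled code would live here")
    m = map_readonly(path)
    print(m[:11])  # b'precompiled'
    m.close()
    os.remove(path)
```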
Pango Issues
The IBM developerWorks site has the following article: http://www-128.ibm.com/developerworks/linux/library/l-m17n/?ca=dgr-lnxw09PortWithm17n "Port your code around the world with m17n", which raises the following issues with Pango:
- The problems with Pango
- Pango can place (lay out) and render complex scripts but cannot perform sorts or searches on multi-byte text. Pango assumes that an underlying library -- typically written in the C language and able to manipulate all the languages specified in the Unicode standard -- is able to perform fundamental text processing.
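Some of the "fundamental text processing" the article mentions is available in Python's standard library. A minimal sketch of Unicode-aware searching, assuming Python 3.3 or later (str.casefold does not exist in the Python 2 releases of this era); `search_key` and `contains` are hypothetical helper names. It builds a canonical caseless comparison key with unicodedata.normalize and str.casefold, so differently encoded spellings of the same text match. It does not handle locale-specific sorting or every edge case (such as matches that split a grapheme cluster); a library such as ICU covers those.

```python
import unicodedata


def search_key(text):
    """Canonical caseless key: NFD, case-fold, then recompose to NFC.

    Two strings that are canonically equivalent apart from case get the
    same key, e.g. 'CAFE' + U+0301 COMBINING ACUTE ACCENT and 'café'
    written with the precomposed U+00E9.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # casefold() is stronger than lower(): 'ß'.casefold() == 'ss'.
    return unicodedata.normalize("NFC", decomposed.casefold())


def contains(haystack, needle):
    """Substring search on keys; composing to NFC stops 'cafe' from
    matching inside 'café'."""
    return search_key(needle) in search_key(haystack)


if __name__ == "__main__":
    print(contains("Un CAFE\u0301 noir", "caf\u00e9"))  # True
    print(contains("Stra\u00dfe", "STRASSE"))           # True
    print(contains("caf\u00e9", "cafe"))                # False
```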
Perhaps the OLPC project could focus resources to help resolve these Pango issues. Or, if these are not issues on the 2B1 due to use of Python or some support libraries, then perhaps this fact could be documented directly so that people can make better choices when setting up their development environments.