Wiki as a book reader: Difference between revisions

From OLPC
m (Reverted edits by Livekoor (Talk) to last revision by Skierpage)
 
(45 intermediate revisions by 24 users not shown)
Line 1: Line 1:
== The Use Cases ==
== The Use Cases ==
There has been alot of healthy discussions around ebooks and book libraries lately. This is not a simple tool selection discussion, it is one in which participants are trying to understand how to use (or create) the best combination of tools that will enable [http://en.wikipedia.org/wiki/Constructionist_learning constructionist learning] [Wikipedia] in schools in the developing world.
There have been a lot of healthy discussions around [[ebooks]] and book [[library|libraries]] lately. This is not a simple tool selection discussion, it is one in which participants are trying to understand how to use (or create) the best combination of tools that will enable [[constructionist]] learning in schools in the developing world.


The challenge lies in the fact that there are multiple ways in which content can be provided (book, text book, article, magazine, multimedia, etc.), and then when constructionism is taken into consideration, there are multiple ways in which users can interact with those differnt mediums (create them, edit them, comment on them, share them, etc.)
The challenge lies in the fact that there are multiple ways in which content can be provided (book, text book, article, magazine, multimedia, etc.), and then when [[constructionist|constructionism]] is taken into consideration, there are multiple ways in which users can interact with those different mediums (create them, edit them, comment on them, share them, etc.)


The following use cases describe the most probable points of intersection between mediums, roles and interaction models:
The following use cases describe the most probable points of intersection between media, roles and interaction models:


# '''Curriculum distribution:''' Content that is currently taught in schools, need to be made available to students on their new laptops. This is going to be a major way in which governments in poor developing countries will be able to justify and allocate the financial resources needed to finance those freely-distributed laptops. They are effectively a replacement for text books. Any solution will need to allow for the distribution, and offline reading of curriculum.
# '''Curriculum distribution:''' Content that is currently taught in schools, need to be made available to students on their new laptops. This is going to be a major way in which governments in poor developing countries will be able to justify and allocate the financial resources needed to finance those freely-distributed laptops. They are effectively a replacement for text books. Any solution will need to allow for the distribution and offline reading of curricula.
# '''Content creation:''' Children should be able to create their own content. They should also be able to share their content and work together on developing it. Not just that, they should be able to modify pre-existing content, to edit it, update it or even modify it to be more relevent to their lives and experience. This means that they should be able to modify an ecology chapter to add local knowledge to it, by describing examples from the surrounding environment, or stating an local exception that is not governed by the rules described in the original text.
# '''Content creation:''' Children should be able to create their own content. They should also be able to share their content and work together on developing it. Not just that, they should be able to modify pre-existing content, to edit it, update it or even modify it to be more relevant to their lives and experience. This means that they should be able/ to modify an ecology chapter to add local knowledge to it, by describing examples from the surrounding environment, or stating a local exception that is not governed by the rules described in the original text.
:Before people make decisions on the technology to be used for this, they should make sure they have some hands-on experience with tools like Hyperstudio (Logo-based) or Hypercard. You can download a Hyperstudio Preview from http://www.hyperstudio.com. Hypercard itself has not been distributed for years but Revolution is available in a trial edition at http://www.runrev.com. This implements the hypercard language and the card stack model. Wiki may in fact be too simplistic for the kids because it does not allow active content, i.e. application code
# '''Easy third party book publishing:''' Any person with scientific, literary or artistic knowledge or experience should be able to easily author content and make it available to children. She also should be able to interact with readers of that content and be able to read and react to any changes made to that content or any comments posted about it.
# '''Easy third party book publishing:''' Any person with scientific, literary or artistic knowledge or experience should be able to easily author content and make it available to children. She also should be able to interact with readers of that content and be able to read and react to any changes made to that content or any comments posted about it.
:This is already available. Get a paper book, scan it, compress it with [[DJVU]] and display it with the [[Evince]] reader included on the OLPC.
# '''Easy search and discovery of third party content:''' On the other hand, children should be able to access, share and interact with the third party content being provided to them, in the same way that they are able to interact with content that they themselves created and shared.
# '''Easy search and discovery of third party content:''' On the other hand, children should be able to access, share and interact with the third party content being provided to them, in the same way that they are able to interact with content that they themselves created and shared.


There is no software that has been evaluated so far that provides such flexibility and integration of features that would fullfill all those requirements. One class of software though, embodies alot of the basic characteristics described above and has been in wide use for a while now. It is called [http://en.wikipedia.org/wiki/Wiki Wiki] [wikipedia].
There is no software that has been evaluated so far that provides such flexibility and integration of features that would fullfill all those requirements. One class of software though, embodies alot of the basic characteristics described above and has been in wide use for a while now. It is called [http://en.wikipedia.org/wiki/Wiki Wiki] [wikipedia].

Is the technology being addressed to meet the the needs of the learners? How has student scores changed in comparison to the wiki implementation? (Ashley P.)


== Wiki Required Changes ==
== Wiki Required Changes ==
Line 18: Line 22:


=== Synchronization Support===
=== Synchronization Support===
In order for wiki to be used as an offline reading tool, a version of it should be running on the local machine. This will have additional added advantages, such as providing the children with a more powerful way to store their notes and thoughts than the file system and text files, or maybe a way for them to solve their homework and submit it, etc.
In order for wiki to be used as an offline reading tool, a version of it should be running on the local machine. This would either be a Python-based server running locally or something like [http://www.tiddlywiki.com/ TiddlyWiki] which runs in the browser. This will have additional added advantages, such as providing the children with a more powerful way to store their notes and thoughts than the file system and text files, or maybe a way for them to solve their homework and submit it, etc.


But there will also be a school server where more content than can be stored on the laptops will be stored or cached (in case it is fetched from yet a higher level server). The children will need to be able to obtain content from the school server and store it on the local wiki for offline reading.
But there will also be a school server where more content than can be stored on the laptops will be stored or cached (in case it is fetched from yet a higher level server). The children will need to be able to obtain content from the school server and store it on the local wiki for offline reading.
Line 29: Line 33:
Unless there is some basic ability to group a set of wiki pages and download/upload them together, or browse them sequentially, it will be very difficult to have ebooks or any content of considerable size be distributable or readable through wiki.
Unless there is some basic ability to group a set of wiki pages and download/upload them together, or browse them sequentially, it will be very difficult to have ebooks or any content of considerable size be distributable or readable through wiki.


Without this feature, two alternatives come to mind. The first is to have all related content be in one page. This might result in articles that very large to the degree where they take alot of time to render, consume alot of energy form the processor to render them, and possibly run out of memory while viewing them.
Without this feature, two alternatives come to mind. The first is to have all related content be in one page. This might result in articles that very large to the degree where they take a lot of time to render, consume a lot of energy from the processor to render them, and possibly run out of memory while viewing them.


The second alternative is to have the ebooks consist of a normal set of wiki pages that are not grouped in any way. It is obvious why this would be an issue, including, but not limited to, the need to download many pages seperately, which would not be doable for the average size textbook.
The second alternative is to have the ebooks consist of a normal set of wiki pages that are not grouped in any way. It is obvious why this would be an issue, including, but not limited to, the need to download many pages separately, which would not be doable for the average size textbook.


Given those two alternative, it becomes aparent why some grouping of wiki pages is needed, so that operations can be applied to the group as a whole, and the viewing experience of a book of multiple pages can be enhanced with simple UI features.
Given those two alternative, it becomes apparent why some grouping of wiki pages is needed, so that operations can be applied to the group as a whole, and the viewing experience of a book of multiple pages can be enhanced with simple UI features.


: This seems to be solved for download. People produce [[Wikislices]], sets of static HTML pages of Wikipedia content, and make them available as [[Collections]]. [[WikiBrowse]] is a fancier solution involving a local web server that serves content from a compressed Wikipedia dump. -- [[User:Skierpage|Skierpage]] 03:59, 18 October 2008 (UTC)
=== Programmable Content ===
One of the greatest educational tool a constructionist has is programming. This was clearly demonstrated by the early success of logo in teaching children math skills. In order for this it to succeed though, children will need to author those programs, interact with them and interact with other children during the creation process.


Drupal is an open source content management platform that might be a solution to the page grouping requirement through the use of its [http://drupal.org/handbook/modules/book book module].
Given that most programs can be textually represented, wiki could be a great medium for storing and sharing those programs. The challenge though is that wiki servers today are designed to be stores of formatted text.


=== Programmable Content ===
Wiki could benefit alot of as a constructionist education tool if its front end is integrated with educational programming language's interactive frontend interface.
One of the greatest educational paths for a [[constructionist]] is programming. This was clearly demonstrated by the early success of [[LOGO]] in teaching children math skills. In order for this to succeed, children will need to author those programs, interact with them and interact with other children during the creation process.


Given that most programs can be textually represented, wiki could be a great medium for storing and sharing those programs. The challenge is that wiki servers today are designed to be stores of formatted text.
=== Simplified User Interface ===
Wikis today are fairly simple compared to other kind of applications that try to achieve the same goal. They are not simple enough though to allow young children to use and easily interact with that interface. The function it provides is simple, which means that it should be possible to create an interface that reflects that simplicity. That user interface does not exist yet.


Wiki could be enhanced as a constructionist education tool if its front end were to be integrated with an educational programming language as an interactive front-end interface.
==Landscape format or Portrait Format or a Choice by an Author?==


For example, try the toy demo [http://www.LogoWiki.net LogoWiki] which has a [[LOGO]] interpreter in JavaScript and does not need the wiki-mode for programming, testing or saving. (This is an experiment so be gentle.)
[Start of note by William Overington 19 March 2006]


This can be used without downloads, which are sometimes difficult or forbidden in schools, and can be taken further to make a Hypercard-like WYSIWYG system.
Should the display be landscape format like a web page or portrait format like most hardcopy books or should the author be able to choose?


Via a plug-in such as [[Squeak]] uses (e.g. see [http://www.squeakland.org/pdf/etoys_n_learning.pdf Squeak Etoys], [http://www.squeakland.org/pdf/etoys_n_authoring.pdf Squeak Etoys Media]), a full range of authoring, compatible with the web can be accomplished within a browser.
[End of note by William Overington 19 March 2006]


=== Simplified User Interface ===
Additional question: Should the READER be able to choose?
Wikis today are fairly simple compared to other kind of applications that try to achieve the same goal. They are not simple enough though to allow young children to use and easily interact with that interface. The function it provides is simple, which means that it should be possible to create an interface that reflects that simplicity. That user interface does not exist yet.
: NB: UNICEF has a small group working on making such an interface.


==Existing Wiki Code to Start With==
==Format control of text==
There is already a Wiki called [[Wikidpad]] that may be appropriate as the base for this type of ebook reader. Unlike most wikis, it is designed to be run as a GUI application, not as a web server. In addition, it is targetted at the individual user who needs to organize their work, their plans or their writing.


Strong points of Wikidpad:
I am not sure why we wouldn't just use CSS, which allows both author and reader control and is certainly up to the job in terms of pagination. There could readily be a style-sheet for BW portrait and landscape, as well as Color portrait and landscape. (Maybe this should move to the discussion page?) --[[User:Walter | Walter]]
* Open source tool written in [[Python]]
* Uses a single [[SQLite]] database as datastore
* has minimal GUI requirements making it easy to port from wxWindows to GTK


[Start of note by William Overington 19 March 2006]


==Landscape format or Portrait Format or a Choice by an Author?==
An issue which needs to be addressed is as to how a content author can insert format control commands into text so as to produce the content author's desired results upon the screen.


Should the display be landscape format like a web page or portrait format like most hardcopy books or should the author be able to choose? -- William Overington 19 March 2006
Unicode is good for specifying characters for text and symbols, yet does not, by choice, for the most part address formatting issues, leaving that to higher level protocols. Such protocols will be needed so as to address point size of type, choice of font, whether regular or italic or bold or bold italic, justification, page throws, colour of text, colour of paper. location of diagrams, wrapping of text around diagrams.


Additional question: Should the READER be able to choose?
There are various possibilities.

One is to make the whole text an HTML or XML style document in every case.

Another is to use Unicode Private Use Area codes to signal such things, so that a default format text is just ordinary plain text and any special characters are added as needed. The format characters could have authoring-time symbols for display if desired at authoring-time though they would be zero-width at display time. So there could be three modes. Authoring mode ordinary, authoring mode special and display mode. Authoring mode ordinary and display mode would have formatting symbols as zero-width, authoring mode special would show the authoring-time symbols so that situations where a formatting symbol is to be deleted or a sequence of two or more formatting symbols needs to be inspected or edited could easily be managed by a content author.

I had a try at defining various such formatting characters some years ago.

http://www.users.globalnet.co.uk/~ngo/court000.htm

That page was the start and the first two pages linked from it were part of an attempt to produce a comprehensive system.

However, a selection of the codes would be much easier to use, for example by children.

For example, on the web page http://www.users.globalnet.co.uk/~ngo/courtcol.htm
are sixteen direct colours
U+F3E0 BLACK,
U+F3E1 BROWN,
U+F3E2 RED,
U+F3E3 ORANGE,
U+F3E4 YELLOW,
U+F3E5 GREEN,
U+F3E6 BLUE,
U+F3E7 MAGENTA,
U+F3E8 GREY,
U+F3E9 WHITE,
U+F3EA CYAN,
U+F3EB PINK,
U+F3EC DARK GREY,
U+F3ED LIGHT GREY,
U+F3EE LAVENDER and
U+F3EF MINT so a word could be made red by inserting a U+F3E2 character before it and another character after it to change the colour back. That could either be the code for the colour of the text or a code to indicate that the colour should be popped back to the previous colour (the system did not include a code to pop the colour, though one could easily be added to the set of formatting codes if that is seen as desirable).

[Supplementary note by William Overington 20 March 2006

I produced some authoring-time glyphs for the colours.
They are in my Quest text font.

http://www.users.globalnet.co.uk/~ngo/QUESTTXT.TTF


[Reader should definitely be able to choose. I frequently read long PDF documents sideways on my laptop using the rotation ability of Adobe Acrobat and the full-screen display. I can hold the laptop sideways like a book and my left thumb is next to the right-arrow key which tells Acrobat to turn to the next page.]
http://www.users.globalnet.co.uk/~ngo/questtxt.txt


: Indeed, the user can press a button to rotate screen display on the XO, so ideally content should re-lay out to both formats (well-written HTML and SVG can do this); other content will require the user to zoom in/out in the [[Browse]] or [[Read]] activity.
The designs are based upon the Petra Sancta method of depicting colours in old black and white books about heraldry. For example, vertical lines for red, horizontal lines for blue.


===Existing book readers and formats===
They can each be accessed using WordPad on a Windows 98 PC using an Alt code.


Before people make too many decisions about book storage formats and digital book reader features, it would be a good idea to spend some time surveying previous efforts. If you don't have experience using a PDA, then buy an old Palm or [[Zaurus]], install Plucker or OpieReader and try them out. As far as Ebooks are concerned, the OLPC is just a bigger and faster PDA.
Alt 62432 to Alt 62447 repespectively for the colours listed above.


In addition, I think it would be a good idea to build a prototype e-book reader in Python that uses a standard XML document format. Get lots of people to try this out, add and subtract document formatting features, fiddle with the UI, etc. Then, once you have something that works well, if someone has actual data showing that a non-xml format takes less space on the [[JFFS2]] filesystem than plain XML, then the e-book format can be changed.
If these were used in a page authoring system for the future wiki system here being discussed then the content author would not need to use the code numbers. The content author would simply, in authoring mode ordinary, position the cursor line before a text character and then click on one of sixteen coloured buttons within a text colour palette on the screen and the code would be inserted into the text and the display would alter. If the content author chose to go into authoring mode special, the authoring-time symbol for the colour would be shown in a black and white display.
]


Personally, I think that the ideal book format for the OLPC will be a modified form of XML. Each document will have two parts. The first part will be a compressed form of the XML hierarchical structure with byte offsets into the second part. The second part will contain the pure UNICODE text. However, the sequence of the text will not be the same as in the original document. It will have been "scrambled" in order to take advantage of the JFFS2 compression capabilities. [[JFFS2]] compression works only on a single virtual block so if you can move the text around so that similar runs of text are close to each other, then you will get better compression. The byte offsets allow for this to be done. It will be processor intensive to create an ebook but decoding it will be as quick as parsing XML.
It seems to me that one way of looking at it is that this project is trying to produce an open source wiki engine which produces an end user effect of as much as possible of what the Adobe Portable Document Format does yet with having a very straightforward way of a content author entering the content and formatting it.


== See also ==
It seems to me that a good way to approach this is to have a system which allows good black and white effect to be produced just by a content author keying text. The text can then be edited and the formatting characters would override the default settings of the reader.


* [[Wikis for children]];
My own view is that using Unicode Private Use Area codes, (whether those in my attempt of some years ago or starting again with a blank specification) is a good way to go. However, I cannot say that it is necessarily the best way to go as I do not have much experience of other ways of doing it. It is only right and proper that I mention that the idea of using Unicode Private Use Area codes for formatting in this way is seen in some quarters as entirely the wrong way to proceed. My wish is that the best system possible is produced so I have mentioned my ideas here as input for discussions on designing such a wiki engine and if they are used or not used then that is just how it goes.
* [[Wiki Activity Module]].
* [[Read]];
** [[Book reader]]
** [[Book reader feature set]]


[[Category:General Public]]
[End of note by William Overington 19 March 2006]
[[Category:Pedagogical ideas]]
[[Category:Software ideas]]
[[Category:Use cases]]

Latest revision as of 15:51, 15 October 2012

The Use Cases

There have been a lot of healthy discussions around ebooks and book libraries lately. This is not a simple tool selection discussion; it is one in which participants are trying to understand how to use (or create) the best combination of tools to enable constructionist learning in schools in the developing world.

The challenge is that content can be provided in many forms (book, textbook, article, magazine, multimedia, etc.) and, once constructionism is taken into consideration, users can interact with those different media in many ways (create them, edit them, comment on them, share them, etc.).

The following use cases describe the most probable points of intersection between media, roles and interaction models:

  1. Curriculum distribution: Content that is currently taught in schools needs to be made available to students on their new laptops. This is going to be a major way in which governments in poor developing countries will be able to justify and allocate the financial resources needed to finance those freely-distributed laptops. They are effectively a replacement for textbooks. Any solution will need to allow for the distribution and offline reading of curricula.
  2. Content creation: Children should be able to create their own content. They should also be able to share their content and work together on developing it. Beyond that, they should be able to modify pre-existing content: to edit it, update it, or adapt it to be more relevant to their lives and experience. This means that they should be able to modify an ecology chapter to add local knowledge to it, by describing examples from the surrounding environment, or stating a local exception that is not governed by the rules described in the original text.
Before people make decisions on the technology to be used for this, they should make sure they have some hands-on experience with tools like Hyperstudio (Logo-based) or Hypercard. You can download a Hyperstudio Preview from http://www.hyperstudio.com. Hypercard itself has not been distributed for years, but Revolution is available in a trial edition at http://www.runrev.com. This implements the Hypercard language and the card stack model. Wiki may in fact be too simplistic for the kids because it does not allow active content, i.e. application code.
  3. Easy third party book publishing: Any person with scientific, literary or artistic knowledge or experience should be able to easily author content and make it available to children. She also should be able to interact with readers of that content and be able to read and react to any changes made to that content or any comments posted about it.
This is already available. Get a paper book, scan it, compress it with DJVU and display it with the Evince reader included on the OLPC.
  4. Easy search and discovery of third party content: On the other hand, children should be able to access, share and interact with the third party content being provided to them, in the same way that they are able to interact with content that they themselves created and shared.

No software evaluated so far provides the flexibility and integration of features needed to fulfill all those requirements. One class of software, though, embodies a lot of the basic characteristics described above and has been in wide use for a while now. It is called Wiki [Wikipedia].

Is the technology being addressed to meet the needs of the learners? How have student scores changed in comparison to the wiki implementation? (Ashley P.)

Wiki Required Changes

There are many ways in which those challenges could be solved, and wiki is by no means the only option available. But this write-up will concentrate on wiki, and discuss which basic features wiki will need to gain before it can be considered a viable option for solving those challenges.

Synchronization Support

In order for wiki to be used as an offline reading tool, a version of it should be running on the local machine. This would either be a Python-based server running locally or something like TiddlyWiki, which runs in the browser. This will have additional advantages, such as providing the children with a more powerful way to store their notes and thoughts than the file system and text files, or maybe a way for them to solve their homework and submit it, etc.

But there will also be a school server where more content than can be stored on the laptops will be stored or cached (in case it is fetched from yet a higher level server). The children will need to be able to obtain content from the school server and store it on the local wiki for offline reading.

They also might want to modify the downloaded content and upload the changes, or create new content and publish it on the school server to share with other students or even for backup reasons (although other more complete backup solutions might need to be provided).

The ability to synchronize specific content between two wikis is needed to allow for this kind of functionality. The user should be able to choose which pages to upload or download, and then the system will need to be able to detect differences in versions and give the user the ability to edit conflicts and make merge decisions when needed, or simply decide not to go on with the upload/download.
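The decision step described above can be sketched as a three-way comparison. This is only an illustration under stated assumptions: the function name `plan_sync`, the `Action` values, and the idea of keeping a `base` copy of each page as of the last synchronization are all hypothetical, and pages are modelled as plain dictionaries mapping title to text rather than any real wiki's storage.

```python
from enum import Enum


class Action(Enum):
    UNCHANGED = "unchanged"
    DOWNLOAD = "download"   # only the school server copy changed
    UPLOAD = "upload"       # only the local copy changed
    CONFLICT = "conflict"   # both changed: the user merges or cancels


def plan_sync(selected, base, local, remote):
    """Decide what to do with each page the user selected.

    base, local and remote map page title -> page text. `base` is the
    text recorded at the last synchronization (an assumption of this
    sketch). A missing page is represented by None.
    """
    plan = {}
    for title in selected:
        b = base.get(title)
        l = local.get(title)
        r = remote.get(title)
        if l == r:
            plan[title] = Action.UNCHANGED
        elif l == b:
            plan[title] = Action.DOWNLOAD
        elif r == b:
            plan[title] = Action.UPLOAD
        else:
            plan[title] = Action.CONFLICT
    return plan
```

Pages reported as `CONFLICT` are the ones where the user would be shown both versions and asked to merge or to abandon the transfer; the other cases can proceed without prompting.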

Page Grouping

Unless there is some basic ability to group a set of wiki pages and download/upload them together, or browse them sequentially, it will be very difficult to have ebooks or any content of considerable size be distributable or readable through wiki.

Without this feature, two alternatives come to mind. The first is to put all related content in one page. This might result in articles so large that they take a lot of time to render, consume a lot of processor energy, and possibly run out of memory while being viewed.

The second alternative is to have the ebooks consist of a normal set of wiki pages that are not grouped in any way. The problems with this include, but are not limited to, the need to download many pages separately, which would not be practical for a textbook of average size.

Given those two alternatives, it becomes apparent why some grouping of wiki pages is needed, so that operations can be applied to the group as a whole, and the viewing experience of a book of multiple pages can be enhanced with simple UI features.
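A minimal sketch of what such a grouping could look like follows. The `Book` class, the `bundle` and `unbundle` helpers, and the archive layout (`manifest.json` plus numbered page files) are assumptions made for illustration, not part of any existing wiki engine. The point is only that an ordered list of page titles is enough to support both sequential browsing and single-file transfer.

```python
import io
import json
import zipfile
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    pages: list = field(default_factory=list)  # page titles in reading order

    def next_page(self, current):
        i = self.pages.index(current)
        return self.pages[i + 1] if i + 1 < len(self.pages) else None

    def previous_page(self, current):
        i = self.pages.index(current)
        return self.pages[i - 1] if i > 0 else None


def bundle(book, page_store):
    """Pack every page of the book into one zip archive (as bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        manifest = {"title": book.title, "pages": book.pages}
        zf.writestr("manifest.json", json.dumps(manifest))
        for n, title in enumerate(book.pages):
            zf.writestr(f"pages/{n:04d}.txt", page_store[title])
    return buf.getvalue()


def unbundle(data):
    """Inverse of bundle: return the Book and a title -> text mapping."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        book = Book(manifest["title"], manifest["pages"])
        pages = {
            title: zf.read(f"pages/{n:04d}.txt").decode("utf-8")
            for n, title in enumerate(book.pages)
        }
    return book, pages
```

With this shape, a "download book" operation moves one archive instead of many separate pages, and "next page" and "previous page" buttons only need the ordered title list.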

This seems to be solved for download. People produce Wikislices, sets of static HTML pages of Wikipedia content, and make them available as Collections. WikiBrowse is a fancier solution involving a local web server that serves content from a compressed Wikipedia dump. -- Skierpage 03:59, 18 October 2008 (UTC)

Drupal is an open source content management platform that might be a solution to the page grouping requirement through the use of its book module.

Programmable Content

One of the greatest educational paths for a constructionist is programming. This was clearly demonstrated by the early success of LOGO in teaching children math skills. In order for this to succeed, children will need to author those programs, interact with them and interact with other children during the creation process.

Given that most programs can be textually represented, wiki could be a great medium for storing and sharing those programs. The challenge is that wiki servers today are designed to be stores of formatted text.

Wiki could be enhanced as a constructionist education tool if its front end were integrated with the interactive front end of an educational programming language.
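To make the idea concrete, here is a toy sketch of running a program that is stored as ordinary page text. It is not the LogoWiki implementation and does not cover real LOGO; it assumes a hypothetical subset with only FORWARD, BACK, RIGHT, LEFT and REPEAT n [ ... ], and it records the turtle's path as coordinates instead of drawing.

```python
import math


def tokenize(source):
    """Split page text into words, treating brackets as separate tokens."""
    return source.replace("[", " [ ").replace("]", " ] ").split()


def find_close(tokens, open_index):
    """Return the index of the ']' matching the '[' at open_index."""
    depth = 0
    for j in range(open_index, len(tokens)):
        if tokens[j] == "[":
            depth += 1
        elif tokens[j] == "]":
            depth -= 1
            if depth == 0:
                return j
    raise ValueError("unmatched '['")


def run(tokens, state=None):
    """Execute tokens; heading 0 points up and RIGHT turns clockwise."""
    if state is None:
        state = {"x": 0.0, "y": 0.0, "heading": 0.0, "path": [(0.0, 0.0)]}
    i = 0
    while i < len(tokens):
        word = tokens[i].upper()
        if word in ("FORWARD", "BACK"):
            dist = float(tokens[i + 1])
            if word == "BACK":
                dist = -dist
            rad = math.radians(state["heading"])
            state["x"] += dist * math.sin(rad)
            state["y"] += dist * math.cos(rad)
            state["path"].append((round(state["x"], 6), round(state["y"], 6)))
            i += 2
        elif word in ("RIGHT", "LEFT"):
            angle = float(tokens[i + 1])
            state["heading"] += angle if word == "RIGHT" else -angle
            i += 2
        elif word == "REPEAT":
            count = int(tokens[i + 1])
            if tokens[i + 2] != "[":
                raise ValueError("REPEAT needs a [ ... ] block")
            close = find_close(tokens, i + 2)
            body = tokens[i + 3:close]
            for _ in range(count):
                run(body, state)
            i = close + 1
        else:
            raise ValueError(f"unknown word: {tokens[i]}")
    return state


# The text of a wiki page is the program.
page_text = "REPEAT 4 [ FORWARD 10 RIGHT 90 ]"
square = run(tokenize(page_text))
```

Because the program is just text, children could edit, share, and version it like any other wiki page; only the "run" step needs the interpreter.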

For example, try the toy demo LogoWiki which has a LOGO interpreter in JavaScript and does not need the wiki-mode for programming, testing or saving. (This is an experiment so be gentle.)

This can be used without downloads, which are sometimes difficult or forbidden in schools, and can be taken further to make a Hypercard-like WYSIWYG system.

Via a plug-in such as the one Squeak uses (e.g. see Squeak Etoys, Squeak Etoys Media), a full range of authoring, compatible with the web, can be accomplished within a browser.

Simplified User Interface

Wikis today are fairly simple compared to other kinds of applications that try to achieve the same goal. They are not simple enough, though, for young children to use and interact with easily. The function they provide is simple, which means that it should be possible to create an interface that reflects that simplicity. That user interface does not exist yet.

NB: UNICEF has a small group working on making such an interface.

Existing Wiki Code to Start With

There is already a Wiki called Wikidpad that may be appropriate as the base for this type of ebook reader. Unlike most wikis, it is designed to be run as a GUI application, not as a web server. In addition, it is targeted at the individual user who needs to organize their work, their plans or their writing.

Strong points of Wikidpad:

  • Open source tool written in Python
  • Uses a single SQLite database as datastore
  • Has minimal GUI requirements, making it easy to port from wxWindows to GTK


Landscape format or Portrait Format or a Choice by an Author?

Should the display be landscape format like a web page or portrait format like most hardcopy books or should the author be able to choose? -- William Overington 19 March 2006

Additional question: Should the READER be able to choose?

[Reader should definitely be able to choose. I frequently read long PDF documents sideways on my laptop using the rotation ability of Adobe Acrobat and the full-screen display. I can hold the laptop sideways like a book and my left thumb is next to the right-arrow key which tells Acrobat to turn to the next page.]

Indeed, the user can press a button to rotate the screen display on the XO, so ideally content should reflow to fit both formats (well-written HTML and SVG can do this); other content will require the user to zoom in/out in the Browse or Read activity.

Existing book readers and formats

Before people make too many decisions about book storage formats and digital book reader features, it would be a good idea to spend some time surveying previous efforts. If you don't have experience using a PDA, then buy an old Palm or Zaurus, install Plucker or OpieReader and try them out. As far as ebooks are concerned, the OLPC is just a bigger and faster PDA.

In addition, I think it would be a good idea to build a prototype e-book reader in Python that uses a standard XML document format. Get lots of people to try this out, add and subtract document formatting features, fiddle with the UI, etc. Then, once you have something that works well, if someone has actual data showing that a non-XML format takes less space on the JFFS2 filesystem than plain XML, then the e-book format can be changed.

Personally, I think that the ideal book format for the OLPC will be a modified form of XML. Each document will have two parts. The first part will be a compressed form of the XML hierarchical structure with byte offsets into the second part. The second part will contain the pure Unicode text. However, the sequence of the text will not be the same as in the original document. It will have been "scrambled" in order to take advantage of the JFFS2 compression capabilities. JFFS2 compression works only on a single virtual block, so if you can move the text around so that similar runs of text are close to each other, then you will get better compression. The byte offsets allow for this to be done. It will be processor-intensive to create an ebook, but decoding it will be as quick as parsing XML.
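The role of the byte offsets can be illustrated with a small sketch. The proposal above does not specify a layout, so everything here is an assumption: the structure part is simplified to a flat list of (tag, offset, length) records instead of a compressed hierarchy, the function names `pack` and `unpack` are hypothetical, and sorting the text runs alphabetically is only a stand-in for a real strategy that places similar runs near each other.

```python
def pack(runs):
    """Split a document into a structure part and a text part.

    runs: list of (tag, text) pairs in document order.
    Returns (structure, blob): structure keeps document order and
    stores byte offsets into blob, while blob holds the UTF-8 text
    in a different ("scrambled") order.
    """
    # Stand-in reordering: sort runs by their text so that identical
    # or similar runs end up adjacent in the blob.
    order = sorted(range(len(runs)), key=lambda i: runs[i][1])
    blob = bytearray()
    placed = {}
    for i in order:
        data = runs[i][1].encode("utf-8")
        placed[i] = (len(blob), len(data))
        blob += data
    structure = [(runs[i][0], *placed[i]) for i in range(len(runs))]
    return structure, bytes(blob)


def unpack(structure, blob):
    """Rebuild the (tag, text) runs in document order from the offsets."""
    return [(tag, blob[off:off + n].decode("utf-8")) for tag, off, n in structure]
```

Decoding is a single pass over the structure records with one slice per run, which matches the expectation that reading the book stays cheap even if producing the reordered text is expensive.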

See also