DJVU: Difference between revisions

From OLPC
Revision as of 16:46, 25 June 2006

What is DJVU?

The main site for information on the DJVU compression format for ebooks is http://www.djvuzone.org/
Recently a good overview article was published on News Forge.

In a nutshell, DJVU was invented to solve this problem:

Conventional web formats such as JPEG, GIF, and PNG produce prohibitively large image files at decent resolution. As a result, Web site content developers have been largely unable to leverage existing printed materials.

DJVU is intended for scanned images of book pages, either black & white or full color. It compresses those scans into files far smaller than the conventional image formats produce.

Given that the target countries for the OLPC have poorly developed computing infrastructure, scanning existing printed documents into DJVU format may be the fastest way of making a wide variety of educational material and ebooks available to the kids.

DJVU is supported by the Evince reader, which is being used by the OLPC project.

If you want to contribute to the DJVU project in any way, here is the site: http://djvulibre.djvuzone.org/

Why is DJVU important?

In regions where computers are scarce and there is little support for native scripts, DJVU allows existing paper books to be scanned and distributed as ebooks. Even handwritten books can be distributed this way. Tie this together with the OLPC chat application's support for SVG input and Gecko's support for displaying SVG graphics, and it is conceivable to distribute a computer with no font support and NO TEXT AT ALL in its user interface. Icons would substitute for text in the UI and handwriting would be a primary mode of input. Note that the OLPC has a wider-than-normal touchpad that can be used as a handwriting input device.

Of course, this is a bootstrap scenario. Once the OLPC is deployed in this way, native language speakers will begin to work on fonts and a keyboard layout to enable text use on the OLPC. This could take months or years to sort out, but in the meantime, the kids have an educational tool to use.

How Do We Produce DJVU Documents?

Workflow Planning

First, you need to think of this in terms of setting up a workflow. There are several steps, some of which require technical expertise and some of which do not. In addition, the expertise required to set up and maintain the workflow is different from that required to make encoding decisions and check the quality of scans.

Scanning

Some scanners can handle bound books, but they cost considerably more. If you can spare a copy of the book, you can instead take it apart and scan the loose pages on a flatbed scanner. Save the files in uncompressed TIFF format because they will be processed further. Pages should be scanned in color because the DJVU compression software produces a better result that way.
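
Uncompressed color scans are large, so it helps to estimate disk space before starting. The sketch below is only a rough planning aid. It assumes a letter-size page scanned at 300 dpi in 24-bit color; neither the page size nor the resolution comes from this page, so substitute your own values.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Approximate size of one uncompressed scan, ignoring TIFF header overhead."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed values: 8.5 x 11 inch page, 300 dpi, 3 bytes per pixel (24-bit color).
page_bytes = uncompressed_scan_bytes(8.5, 11, 300)
print(f"one page: {page_bytes / 2**20:.1f} MiB")
print(f"300-page book: {300 * page_bytes / 2**30:.1f} GiB")
```

The intermediate TIFF files can be deleted once the encoded book has been checked, so this space is only needed while a book is in production.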

Encoding the Pages

Next, you need to process the individual page scans with various tools to encode the pages. Different encoding tools may be used for different pages depending on the presence of illustrations, photos, colored text, etc. Pages can be segmented into a black and white layer and a color layer so that different encoders can be used on each. In addition, if you have an OCR program for the script that the book is written in, you can run the black and white segment through it.
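
As a minimal sketch of choosing an encoder per page, the following builds command lines for two DjVuLibre encoders: cjb2 for black and white pages and c44 for color pages. It assumes the TIFF scans have already been converted to PBM (black and white) or PPM (color) with a separate image tool, that a person has flagged each page as color or not, and that 300 dpi matches the scan resolution. The file names are hypothetical, and splitting a single page into separate layers is not covered here.

```python
import subprocess
from pathlib import Path


def encode_command(src, has_color, dpi=300):
    """Build the encoder command line for one page without running it."""
    src = Path(src)
    out = src.with_suffix(".djvu")
    if has_color:
        # c44: wavelet encoder, suited to photos and color illustrations.
        return ["c44", "-dpi", str(dpi), str(src), str(out)]
    # cjb2: bitonal encoder, suited to black and white text pages.
    return ["cjb2", "-dpi", str(dpi), str(src), str(out)]


def encode_page(src, has_color, dpi=300):
    """Run the encoder; requires the DjVuLibre tools to be on PATH."""
    subprocess.run(encode_command(src, has_color, dpi), check=True)


# Hypothetical page list: (file name, contains color content).
pages = [("page001.pbm", False), ("page002.ppm", True)]
for name, has_color in pages:
    print(" ".join(encode_command(name, has_color)))
```

Keeping the command construction separate from the code that runs it makes it easy to review the planned commands for a whole book before any encoding starts.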

Bundling and Postprocessing

After this you have various pieces which you need to bundle together into a multipage book file. Then, you may wish to further process the book to add text annotations, precompute thumbnail images of pages, etc. Perhaps the book is written in an archaic form of the language and you wish to annotate it with a glossary similar to what we do with Shakespeare's plays.
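
The bundling and thumbnail steps might be scripted roughly as follows, using the DjVuLibre tools djvm (to create a multipage file) and djvused (to precompute thumbnails). This is a sketch rather than a prescribed procedure: the file names and the 128-pixel thumbnail size are assumptions, and text annotations such as a glossary are not shown.

```python
import subprocess


def bundle_command(pages, book):
    """Command that bundles single-page files, in order, into one book."""
    return ["djvm", "-c", book, *pages]


def thumbnail_command(book, size=128):
    """Command that precomputes thumbnails and saves the book in place."""
    return ["djvused", "-e", f"set-thumbnails {size}", "-s", book]


def build_book(pages, book):
    """Run both steps; requires the DjVuLibre tools to be on PATH."""
    for cmd in (bundle_command(pages, book), thumbnail_command(book)):
        subprocess.run(cmd, check=True)


# Hypothetical file names for a two-page book.
print(bundle_command(["page001.djvu", "page002.djvu"], "book.djvu"))
print(thumbnail_command("book.djvu"))
```

The order of the page list determines the page order in the finished book, so sort the file names before bundling.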

Testing

Don't forget to test your book thoroughly using Evince to make sure that there are no problems with using it on the OLPC.
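
Before opening each book by hand, an automated sanity check can catch files that are truncated or were never encoded. The sketch below only inspects the file header, which for a DJVU file begins with the bytes AT&TFORM followed by a length field and a form type of DJVU (single page) or DJVM (multipage). It does not replace viewing the book in Evince, and other form types are not handled.

```python
def looks_like_djvu(path):
    """Return True if the file header matches a single or multipage DJVU file."""
    with open(path, "rb") as f:
        header = f.read(16)
    return (
        len(header) == 16
        and header[:8] == b"AT&TFORM"
        and header[12:16] in (b"DJVU", b"DJVM")
    )
```

For example, calling looks_like_djvu on every file in the output directory gives a quick list of books that need to be rebuilt before anyone spends time reviewing them.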

Tools

If you would rather have the scanning done by a company with expertise in the field, that is possible. Once the first pilot country is deployed, there will likely be other companies who can offer this service. But the tools needed are all open source so you can also set up your own production line for scanning books.

Articles and Papers