Book reader meetings
- Today's topics: producing better downloadable collections, testing new Read features, and gnubook support.
We'll meet again next Friday and every two weeks after that; time TBA.
Sayamindu is close to having gnubook support ready to test, and it looks great for short books with full-resolution page images (though these are roughly 5-10x the size of the equivalent PDF, so there's more work to be done here). To show off and compare gnubook, pdf, and html support, we discussed a few next steps:
- 1) get a list of ~5000 titles: draw from the Archive, Gutenberg, CK-12 (15 textbooks), & a mobile classics list. Make sure all are available as html, pdf, and flipbook [some instance-to-work association is needed here, and potentially pdf-to-html conversion with image placement].
- 2) define an AJAX html reader (based on something very simple) that works on local collections, provides a simple stylesheet with omnipresent links back to the metadata and to an online high-quality version [a flipbook] where available, and styles text well for reading: margins & a good font size for longer reading sessions in wide columns.
- 3) write a bundling script that understands the Open Library (or other) metadata API and can generate an .xol collection index for a list of books. Each work's metadata should include a link to the original and a link to each of its formats online.
- 4) define related projects:
- a. define a file extension for flipbook books, since they need their own reader, or find another way to auto-launch the reader when someone clicks a link to a book file
- b. make each of the above smaller: shrink flipbook images, use text-only PDFs, and compress a shelf of html books, unpacking only the one being read (all handled in the reader)
- c. publish a toolchain for converting from each format to the others
- d. use Wikisource to publish and correct OCR'd html from pristine PDFs for a wikibook version of each book; this should be rate-limited and on demand, so as not to flood Wikisource.
- e. extend Read testing to epub and djvu formats
- f. figure out who will host and maintain updates for each collection (completing its metadata and tracking the URL to watch for updates)
- g. make 5-10 large HTML-only bundles, with links to the flipbook for each book. Provide a single-book bundling link next to each flipbook, or simply a download link in a unique format. [You don't want Read to load all the zip files for you.]
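Items 3 and 4b above (a per-work metadata index with a link to the original and to each online format, plus a compressed shelf of html books where only the book being read gets unpacked) could be prototyped roughly as below. This is a minimal Python sketch, assuming a plain zip file with a hypothetical `index.json` member; it is not the .xol collection format, and the function names, member layout, and example.org URLs are all placeholders.

```python
import io
import json
import zipfile


def build_shelf(books):
    """Pack several html books into one in-memory zip "shelf".

    `books` maps a hypothetical book id to (html_text, metadata_dict).
    An index.json member records each work's metadata (link to the
    original, links to each online format) and where its file lives.
    """
    buf = io.BytesIO()
    index = {}
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for book_id, (html, meta) in books.items():
            member = f"books/{book_id}.html"
            zf.writestr(member, html)  # str is stored UTF-8 encoded
            index[book_id] = {"file": member, **meta}
        zf.writestr("index.json", json.dumps(index, indent=2))
    return buf.getvalue()


def list_shelf(shelf_bytes):
    """Read only the index; no book member is decompressed."""
    with zipfile.ZipFile(io.BytesIO(shelf_bytes)) as zf:
        return json.loads(zf.read("index.json"))


def open_book(shelf_bytes, book_id):
    """Decompress just the one book being read."""
    with zipfile.ZipFile(io.BytesIO(shelf_bytes)) as zf:
        index = json.loads(zf.read("index.json"))
        return zf.read(index[book_id]["file"]).decode("utf-8")


if __name__ == "__main__":
    shelf = build_shelf({
        "alice": (
            "<html><body><p>Alice was beginning to get very tired...</p></body></html>",
            {
                "title": "Alice's Adventures in Wonderland",
                "original": "https://example.org/original/alice",  # placeholder URL
                "formats": {
                    "pdf": "https://example.org/alice.pdf",          # placeholder URL
                    "flipbook": "https://example.org/alice/flipbook",  # placeholder URL
                },
            },
        ),
    })
    print(list_shelf(shelf)["alice"]["formats"])
    print(open_book(shelf, "alice"))
```

Because zip members are compressed independently, a reader can show the whole shelf from the index and decompress a single book on demand, which is the behavior item 4b asks for.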
Now that people will have a choice of formats, let them give real feedback. Define a 30-minute test suite for picking a collection and a book in it and testing the various reader options. Find heavy readers who are already addicted to reading longer works on their laptops or mobile devices, and get their input.
In terms of finding a better html viewer, some design and layout ideas were tossed around:
- use the flipbook frame/design? The html is already chunked into pages.
- which books have page breaks in their html? Add this to the metadata somehow.
- correlate with the OCR, which has page breaks, and insert breaks into the html based on statistics.
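One way to try the last idea, sketched in Python: align the OCR's word stream (which knows where each page starts) against the html's words, and drop a marker where each OCR page begins. This swaps in plain sequence alignment via the standard library's `difflib.SequenceMatcher` for whatever statistical method is eventually chosen; the marker string and function names are hypothetical, and tags are assumed not to contain a literal `>` inside attribute values.

```python
import difflib
import re

# Tags (<...>) are matched first so that words inside them are skipped.
TOKEN = re.compile(r"<[^>]*>|\w+")


def html_words(html):
    """Return (lowercased word, char offset) for each word outside a tag."""
    return [(m.group().lower(), m.start())
            for m in TOKEN.finditer(html)
            if not m.group().startswith("<")]


def insert_page_breaks(html, ocr_pages, marker='<hr class="pagebreak"/>'):
    """Insert `marker` into `html` where each OCR page (after the first) begins."""
    hw = html_words(html)
    h_tokens = [w for w, _ in hw]

    # Flatten the OCR pages into one word list, remembering where each page starts.
    o_tokens, starts = [], []
    for page in ocr_pages:
        starts.append(len(o_tokens))
        o_tokens.extend(w.lower() for w in re.findall(r"\w+", page))

    # Runs of identical words shared by the OCR text and the html text.
    blocks = difflib.SequenceMatcher(
        None, o_tokens, h_tokens, autojunk=False).get_matching_blocks()

    cuts = []
    for start in starts[1:]:  # no break before the first page
        # First aligned run that contains this page start or comes after it.
        blk = next((b for b in blocks if b.size and b.a + b.size > start), None)
        if blk is None:
            continue  # nothing after this point aligned; skip this break
        j = blk.b + max(start - blk.a, 0)
        cuts.append(hw[j][1])

    # Insert from the end so earlier offsets stay valid.
    for pos in sorted(cuts, reverse=True):
        html = html[:pos] + marker + html[pos:]
    return html


if __name__ == "__main__":
    book = "<p>It was a dark and stormy night. The rain fell.</p><p>Suddenly a shot rang out.</p>"
    pages = ["It was a dark and stormy nigbt.", "The rain fell. Suddenly", "a shot rang out."]
    print(insert_page_breaks(book, pages, marker="|PB|"))
```

If the words right at a page boundary were garbled by OCR, the marker falls forward to the next word that does align, so it may land a few words late rather than being dropped.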
On the Read side, a couple of other new features should be landing soon; please send in your own feature requests. On the Open Library side, Edward in London is working on some harder library metadata problems, and we should get him on this list.