Talk:Wikislices: Difference between revisions

From OLPC

Revision as of 07:10, 9 May 2008

See Talk:Bundles for the scripts used here.

Universalism

The question of universal use of this content needs to be considered. Do we run this project under OLPC entirely? Or do we try to create logical bundles for anyone with a wikireader? What are our ideas that may differ from other Wikipedians?

The Wikireader project aims to develop a toolchain for rendering, revising and searching collections of wiki pages as a single browsable, readable collection. Two popular subprojects are the Wikislice project, which adds a toolchain for selecting and updating a list of articles from a large collection to turn into a wikireader, and the Wikibooks offline project, which focuses on updating pages in a wikibook and on a toolchain for generating clean HTML and PDF, for better display once converted into an offline wikireader.
OLPC is focusing on wikireaders for basic education, with space constraints, in many languages, with a minor testing focus on readability and browsability on an XO (this has not been an issue yet). The material and the readers themselves are for anyone with any wikireader. When our ideas seem to differ from those of other Wikipedians, the differences simply highlight one niche audience for wikireaders. Language simplicity and special attention to space conservation, including ways to include some images with larger dumps, may be among these. --Sj talk 22:47, 7 May 2008 (EDT)


Meeting minutes 2008-02-2?

care of mel

Meeting minutes moved to Wikislice meetings/2008-02.


Meeting notes 2/21/08

Overall goal of meeting: wiki-hacking session to improve on the tools that Zdenek and others are currently using to make & refine wikislices. Held in #olpc-content on freenode.

Wikipedia snapshots

Developing snapshots of Wikipedia at every order of magnitude from 10MB to 100GB.

snapshot tools

We need...

  • libraries for producing different styles of wikipedia snapshots (wikitext, html, txt, pdf) from categories (special:export), topics/pages (wikiosity), and index pages (wikislices)
  • libraries that can do intelligent things with metadata from history and wikimedia-commons pages
  • libraries that support no-image/thumbnail/mid-res image selection
  • libraries that recalculate blue v. red links given a wikislice
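As one illustration of the last item, here is a minimal Python sketch of recalculating blue versus red links for a given wikislice. It is not taken from any of the projects mentioned on this page. The names `classify_links` and `normalize_title` are hypothetical, the regular expression only handles plain `[[Target]]`, `[[Target|label]]` and `[[Target#section|label]]` links, and the title normalization (underscores to spaces, first letter upper-cased) is a simplifying assumption about how MediaWiki treats titles.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#section|label]].
# Namespaced links (File:, Category:) are not treated specially in this sketch.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalize_title(title):
    """Approximate title normalization: underscores to spaces, first letter upper-cased."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def classify_links(wikitext, slice_titles):
    """Return (blue, red): link targets that are inside / outside the slice."""
    in_slice = {normalize_title(t) for t in slice_titles}
    blue, red = [], []
    for match in WIKILINK.finditer(wikitext):
        target = normalize_title(match.group(1))
        if not target:
            continue
        bucket = blue if target in in_slice else red
        if target not in bucket:  # report each target once, in order of appearance
            bucket.append(target)
    return blue, red


if __name__ == "__main__":
    text = "The [[cow]] and the [[Domestic_sheep|sheep]] eat [[grass#Types]]."
    print(classify_links(text, ["Cow", "Domestic sheep"]))
```

A renderer could then keep blue links as links and either drop the red ones or render them as plain text.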

Wiki format glue

We need glue code and scripts to interface between similar projects: WP WikiReaders, Wikibooks, Wikipedia wikislice projects, Webaroo wikislices, Kiwix snapshots, schools-wikipedia snapshots, Ksana snapshots, and WP 1.0 revision-vetting. At a minimum, these projects should be able to share index selections and a list of "good revisions" for included articles.
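To make the idea of a shared index concrete, here is a rough Python sketch of a manifest that pairs each selected article with a vetted revision. The JSON layout, the field names (`slice`, `articles`, `title`, `revid`) and the revision numbers are all invented for illustration; none of the projects listed above is known to use this format.

```python
import json


def dump_manifest(name, entries):
    """Serialize a slice name and (title, revision id) pairs to JSON text."""
    doc = {
        "slice": name,
        "articles": [{"title": title, "revid": revid} for title, revid in entries],
    }
    return json.dumps(doc, indent=2, sort_keys=True)


def load_manifest(text):
    """Parse JSON text produced by dump_manifest back into (name, entries)."""
    doc = json.loads(text)
    return doc["slice"], [(a["title"], a["revid"]) for a in doc["articles"]]


if __name__ == "__main__":
    # Revision ids below are placeholders, not real Wikipedia revisions.
    text = dump_manifest("livestock-en", [("Cattle", 1001), ("Goat", 1002)])
    print(text)
    print(load_manifest(text))
```

A plain list of title and revision pairs is about the least that two tools would need to agree on, which is why the sketch stops there.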

Offline readers

As a recent focal point, Zvi Boshernitzan and Ben Lisbakken have both made offline Wikipedia readers using Google Gears that are pretty fast and offer some nice features: they let you select a set of articles, cache them locally, and browse an index. We talked last week about how to integrate Gears more tightly into a browsing experience, with hopes of pursuing a prototype within a couple of weeks. It would be helpful to inform such a client with lists of good revisions of articles, such as those Martin Walker and Andrew Cates have developed for their own projects, and to plan for it to support offline editing as well as reading, using synchronization tools such as Mako's distributed wiki client.

What can people do?

  • wikipediaondvd - Pascal and Guillame are trying to help


Older notes

Code libraries

  • KsanaForge and their KsanaWiki project have a set of scripts that process raw xml dumps from MediaWiki. They are working on producing read-only flash drives and SD cards for distribution.
  • Linterweb, developer of one of the freely-available static selections of Wikipedia, has an open source toolchain for building it; they are also working on wiki search engines (see Kiwix) and have offered to help build the local-filesystem search for the journal.
  • The Moulinwiki project and Renaud Gaudin have a toolchain for processing html output from the MediaWiki parser. They are now combining forces with Linterweb.
  • PediaPress has an "mwlib" library for parsing mediawiki text which is freely available.
  • The "Wikipedia 1.0" team and Andrew Cates (user:BozMo on en:wp) are using their own scripts to generate and review static collections from a list of constantly changing wiki articles.

Design thoughts for tools

There should be some consideration of different modes of wikislice selection criteria:

  1. Slice by filtering, for large slice generation (e.g. all featured articles, all good articles, all articles with Category:foo, and so forth).
  2. Slice based on an input list of hand-selected articles. See Animal_health/Livestock for an example slice index in English/Spanish on farm animals.
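The two modes above can be sketched in Python roughly as follows. This is only an illustration: the article records, their `quality` and `categories` fields, and the function names `slice_by_filter` and `slice_by_list` are assumptions, not part of any existing wikislice tool.

```python
def slice_by_filter(articles, predicate):
    """Mode 1: keep every article for which predicate(article) is true."""
    return sorted(a["title"] for a in articles if predicate(a))


def slice_by_list(articles, wanted_titles):
    """Mode 2: keep hand-selected titles, reporting any that are not available."""
    known = {a["title"] for a in articles}
    found = [t for t in wanted_titles if t in known]
    missing = [t for t in wanted_titles if t not in known]
    return found, missing


# Hypothetical metadata; a real tool would read this from a dump or an API.
ARTICLES = [
    {"title": "Cattle", "quality": "featured", "categories": {"Livestock"}},
    {"title": "Goat", "quality": "good", "categories": {"Livestock"}},
    {"title": "Tractor", "quality": "stub", "categories": {"Machinery"}},
]

if __name__ == "__main__":
    print(slice_by_filter(ARTICLES, lambda a: "Livestock" in a["categories"]))
    print(slice_by_list(ARTICLES, ["Goat", "Llama"]))
```

Returning the missing titles in the second mode lets a hand-built list be checked against what a given language edition actually contains.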

Large slices will probably need to exclude pictures because of size constraints, but small slices such as the Animal_health/Livestock example should be able to include images, because the intent there is a small but media-rich picture book.

The input format for the list should be simple (say, created in a text editor by copying and pasting URLs), and the wikislicer tool should ideally have a GUI to allow local creation of wikislices (e.g. based on which relevant articles are available in the local-language version of Wikipedia). Imagine a teacher building their own slices as class prep.
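As a sketch of how simple such an input format could be, the Python below reads one pasted article URL per line, skips blank lines and lines starting with `#`, and returns (language, title) pairs. The format itself and the function name `parse_slice_list` are assumptions, and the parsing presumes Wikipedia-style URLs of the form `https://<lang>.wikipedia.org/wiki/<Title>`.

```python
from urllib.parse import unquote, urlparse


def parse_slice_list(text):
    """Turn a pasted list of article URLs into (language, title) pairs."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parsed = urlparse(line)
        if "/wiki/" not in parsed.path:
            continue  # not an article URL in the assumed form
        lang = parsed.netloc.split(".")[0]
        title = unquote(parsed.path.split("/wiki/", 1)[1]).replace("_", " ")
        entries.append((lang, title))
    return entries


if __name__ == "__main__":
    sample = """
    # farm animals, English and Spanish
    https://en.wikipedia.org/wiki/Cattle
    https://es.wikipedia.org/wiki/Ganado_bovino
    """
    print(parse_slice_list(sample))
```

Because the language code is taken from the host name, one list can mix articles from several language editions, as the English/Spanish livestock example does.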

For bonus points, wikislicer should have a pre-run "quick-spider" mode that provides a rapid estimate of the output size of the target slice, allowing interactive (or at least quickly iterative) adjustment of parameters (with or without images, scraping one or two links deep within the wiki from pages meeting the criteria, etc.) before actually downloading and building the slice.
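A rough Python sketch of such an estimate follows. It assumes page sizes and outgoing links are already available in memory (the `PAGES` table is made up), does no downloading, and uses a hypothetical function name, `estimate_slice`. It walks links breadth-first up to a chosen depth and sums text sizes, optionally adding image sizes. Images shared between pages would be counted more than once, so the result is an upper-bound style estimate.

```python
from collections import deque


def estimate_slice(pages, seeds, depth=0, with_images=False):
    """Return (page_count, total_bytes) for seeds plus pages up to `depth` links away."""
    seen = set()
    queue = deque((title, 0) for title in seeds)
    total = 0
    while queue:
        title, dist = queue.popleft()
        if title in seen or title not in pages:
            continue  # already counted, or not available in this wiki
        seen.add(title)
        page = pages[title]
        total += page["text_bytes"]
        if with_images:
            total += page["image_bytes"]
        if dist < depth:
            for link in page["links"]:
                queue.append((link, dist + 1))
    return len(seen), total


# Hypothetical sizes in bytes, for illustration only.
PAGES = {
    "Cattle": {"text_bytes": 40000, "image_bytes": 300000, "links": ["Milk", "Goat"]},
    "Goat": {"text_bytes": 30000, "image_bytes": 200000, "links": ["Milk"]},
    "Milk": {"text_bytes": 20000, "image_bytes": 100000, "links": ["Cheese"]},
    "Cheese": {"text_bytes": 10000, "image_bytes": 50000, "links": []},
}

if __name__ == "__main__":
    for depth in (0, 1, 2):
        for with_images in (False, True):
            print(depth, with_images, estimate_slice(PAGES, ["Cattle"], depth, with_images))
```

Running the estimate for a few depth and image settings before building anything is the kind of quick iteration the paragraph above describes.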