Talk:One encyclopedia per child: Difference between revisions

*obtain xml
**From Special:Export for small (maybe order 1000 or so) page sets.
**Maybe from dump files for larger sets.
**See a sample wikipedia XML dump by going to http://en.wikipedia.org/w/index.php?title=Special:Export
*Assemble xml back into wikitext
* fields needed:
** <title> - the page title
** <text> - page text (in wiki syntax)
** <timestamp> - for use in determining if this page needs to be upgraded (see below)
** ...
*modify wikitext
**template blacklist
*Here the pipeline forks. There are two paths:
**kid specified, runs on xo, low quality
***handwritten python wikitext->html conversion
**country specified, high quality
***use Save in a project subpage to render modified wikitext on the original server.
***scrape resulting html and images
***extract core html, and rewrap with olpc "frame".

Re handwritten python wikitext->html conversion:
* Pilaf's [http://en.wikipedia.org/wiki/User:Pilaf/InstaView InstaView] [http://en.wikipedia.org/w/index.php?title=User:Pilaf/instaview.js&action=raw&ctype=text/javascript&dontcountme=s js script] is one existing prototype.
* [http://openwetware.org/wiki/OpenWetWare:Dewikify OpenWetWare's Dewikify System]
*Is there anything better out there, to use as tool or prototype?
* The conversion should be able to take the wikitext on [[Talk:One encyclopedia per child/sample_wiki_text]] and convert it to the same html that mediawiki produces. (This defines a set of unit tests that the converter must pass.)

Latest revision as of 10:57, 14 February 2008

Culturally inappropriate heading images

Three of the images in the alphabet heading page are culturally inappropriate in some countries and should be replaced.

G - Gun - these weapons are illegal in the United Kingdom and it is considered inappropriate to use them in an educational context outside of anti-crime lessons.

P - Pig - this is considered by Muslims and Jews to be an unclean animal and children are discouraged from toys, books, etc. which contain pig characters or images. In the UK, Islam is a common religion, in the USA, Judaism is common. Best to choose something else for P.

D - Dog - in Arab culture, the dog is seen as an unclean animal. Again, there is a significant Arab immigrant community in both the USA and the UK. In addition, English is used in education in Arab countries, especially Iraq. Again, better to use some other image.

In reply

As a result of very local criticism earlier on, we had removed the bullets, and are still seeking something better than G - Gun, also because it is "sexist". Other ideas we have seen are G - Grapes; Gorilla; Goat; Girl; Giraffe; Gecko; Glass; Gift; but there are no three-letter words except Gap, Gas, Gel, Gem, Gin, Gnu, Gum, and Gym, which will create problems with finding images. We have chosen "Gift".

The next choices would be P - Peg or Pen. We have chosen Peg. We have elected not to change "Dog". --155.232.250.35 00:25, 9 November 2006 (EST)

Peg? What kids in the 21st century English speaking world have ever seen a clothespeg? Of those, what percentage believe that clothespeg begins with "C", not "P"? On the other hand, 100% of kids in the 21st century English speaking world know what a PEN is, have seen various pens and have most likely had the personal experience of writing with one. A ballpoint pen seen from the side, with the pocket clip and the plunger knob that clicks it in and out, would be ideal.
Oh, and by the way, grapes are a common fruit in the English speaking world. Why confuse the kids with a picture of a Present? They know that Present begins with P, not G.
I think Gecko works better for G. As for D, he's right: Dog would not be the best choice here, and Duck would probably be a much better neutral fit. Asian countries don't really do a lot of milk drinking, do they? And how many countries actually have Yo-Yos? For M I suggest Monkey, and for Y I suggest Yak. As for P, I assume that is a clothespin? Since these are children, animals and shapes should be used whenever possible; the less abstract the better, so that they learn it and forget it and don't have to re-acquire the context every time they access the OEPC interface. For P I suggest a Pelican facing left. --Basique 10:41, 25 November 2006 (EST)
It's also sexist to exclude the gun, and non-multicultural to exclude the pig. One could sort of solve the problem by adding more pictures rather than taking pictures away, but really this idea of using pictures for letters is rather poor. Pictures for categories sort of might not be too terrible. As for the current pictures:
  • A bug/insect
  • B package
  • C pussy
  • D wolf
  • E broken
  • F picture (fans have spinning blades)
  • G package/present
  • H cover
  • I bottle/perfume (only old people have ever seen ink like that!)
  • J vase
  • K key -- hey, no confusion there!
  • L tree
  • M pour/glass
  • N lace/fishnets/mesh
  • O bird
  • P art supply???
  • Q cards/hearts
  • R silver/jewelry
  • S light/circle/storm
  • T water/faucet/drop
  • U shade
  • V bottle/jug
  • W drop/splash
  • X hands/bones
  • Y toy/string/cord
  • Z undress/clothes

Single theme for alphabet images

Why not stick to a single theme (animals or household items) for the alphabet images? It would make them easier to identify, even for non-native speakers (e.g. those learning a second language). Also, when the alphabet is "translated" to other languages, having a common theme for the images would allow a more uniform look for the project.

In reply

A theme of animals was considered, but it introduces "foreign" animals such as Y - Yak. A problem with household objects is that in developing countries, the implements in the hut are not so diverse. A similar look for different languages is at the bottom of the list of priorities in what we have discovered to be a highly over-constrained system. --Olpcme 14:51, 19 September 2006 (EDT)

Browsing Method

Although the alphabet is the traditional organisation method for print encyclopedias it has not been used on Wikipedia and I'm not convinced it is the best scheme to use for OEPC.

I suggest the main page have big navigation buttons, as here. The top row to include general functions

  • Random Page,
  • Browse by subject (leading to Category hierarchy),
  • Search.

Below that buttons leading to Portal pages for each of the main subject areas:

  • Science
  • History
  • The World
  • Mathematics
  • Biography
  • Art and Culture

All with lots of hyper links so that you can arrive at the information you want via various routes.

In reply

We have four spare squares on the corners which could be used for alternative accesses. Possible buttons are "Surprise" "Subjects" "Search" and "Spare". The icons could have a flat diagonal design / \ and \ / to hint that these are different or be 3-D shaded as buttons to indicate that they are different. --Olpcme 14:50, 19 September 2006 (EDT)

Method to contain the full wikipedia

I had an idea. If I understand correctly, these laptops will form a network and help route each other's data. If that's the case, perhaps we can distribute the encyclopedia between them.

All laptops would have the basic summary version of the encyclopedia as a fallback that always works, and would also hold random bits of the media and appendices, so that together several laptops contain the full thing.

cheers, yair

English as first choice?

Taking into consideration the OLPC linguistic deployment environment, wouldn't it be more sensible to target a non-English language as a starter? Let's face it: in the developing world, English is either non-existent to the common people or used as the official language due to colonial legacy and the lack of a 'dominant' local language. English is currently a lingua franca, but the OLPC (afaik) is focusing on basic education in local/native languages. IOW, you may speak Quechua at home, be schooled in Spanish, and later (if lucky) learn English 'as a second language' in high school or later.

If another language is chosen, instead of English, as the 'initial' language for the OEPC, I think that the effort will be more in line with the objectives. Looking at the (current) target population of the OLPC I see three interesting languages: Portuguese, Spanish and Arabic, in three countries: Brazil, Argentina and Libya. Nigeria and Thailand are different cases.

Why are they 'interesting'?

  • Portuguese is so because of sheer size and density: 190 million Brazilians organized under a single government. That is a vast population to satisfy, where 'economies of scale' could apply.
  • Spanish is native/official to practically all of Latin America (~350 million people excluding Brazil), and you start with about 10% of it: Argentina (40 million). That allows creating a core and then reusing/expanding/modifying it for country specifics, i.e. 'incremental development'.
  • Arabic in the case of Libya could be similar to Spanish/Argentina: initially 5.6 million people generating a core that may expand to cover ~300(+?) million globally. (Note: as far as I know, Arabic can be quite different among regions and countries, so this 'incremental' approach may not apply.)

Another interesting side effect of not choosing English is that it forces one to think globally, or at least outside 'comfort zones'. Regardless of good intentions and good faith, cultural, religious, political and other faux pas are quite common when propagating or 'replicating' from one culture to another. Sometimes the language barrier helps to enforce the cultural barrier and avoids assumptions that lead to awkward moments or situations.

All that said, English is still a very good starting point! :) I just felt it worthwhile noting that globally, English, although very useful and a 'natural' starting point, may not necessarily be the best first choice... --Xavi 13:39, 26 November 2006 (EST)

Portuguese fits with Orkut

There are huge numbers of Brazilians online, many of them using Google's social networking site, Orkut. Set up an Orkut group to develop content for a Portuguese encyclopedia for OLPC and you will quickly have thousands of helping hands.

It is silly to think that translating an English encyclopedia will work. After all, Brazilian kids should learn about tapirs, not bears, guavas, not apples, the Itaipu dam, not the Hoover dam, etc.

Alternate images

How about the older, more iconic versions of Key for K, Vase for V and Ring for R below? Yo-Yo still doesn't work for Y; how about Yellowjacket? And a clothespin does not work well for P; Penguin works much better. All images below are Public Domain files.

Key0.jpg, Vase0.jpg, Ring0.jpg, Yellowjacket0.jpg, Penguin0.jpg

Scripting - designing revision 2

This section is for discussion of the new revision two scripting.

Objectives:

  • ...

Pipeline:

  • obtain xml
  • Assemble xml back into wikitext
  • fields needed:
    • <title> - the page title
    • <text> - page text (in wiki syntax)
    • <timestamp> - for use in determining if this page needs to be upgraded (see below)
    • ...
  • modify wikitext
    • template blacklist
  • Here the pipeline forks. There are two paths:
    • kid specified, runs on xo, low quality
      • handwritten python wikitext->html conversion
    • country specified, high quality
      • use Save in a project subpage to render modified wikitext on the original server.
      • scrape resulting html and images
      • extract core html, and rewrap with olpc "frame".
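
As a rough illustration of the "obtain xml" and "fields needed" steps above, here is a minimal Python sketch that pulls title, text and timestamp out of an export-style XML document. The sample XML, its namespace URI, and the names `extract_pages` and `local_name` are made up for this example; they are not taken from any of the existing scripts. It ignores the namespace so it should not care which export schema version is used, and it keeps only the first revision listed for each page.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a Special:Export result (assumption).
SAMPLE_EXPORT = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Electricity</title>
    <revision>
      <timestamp>2007-07-21T10:00:00Z</timestamp>
      <text>'''Electricity''' is related to [[magnetism]].</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def extract_pages(xml_text):
    """Return one dict per <page> with the title, text and timestamp fields."""
    wanted = ("title", "text", "timestamp")
    pages = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "page":
            continue
        record = dict.fromkeys(wanted)
        for node in element.iter():
            name = local_name(node.tag)
            # Keep the first occurrence only (the first revision listed).
            if name in record and record[name] is None:
                record[name] = node.text or ""
        pages.append(record)
    return pages


pages = extract_pages(SAMPLE_EXPORT)
```

The timestamp is kept as a plain string here; the update step could compare it against the value stored with the previous snapshot.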

Re handwritten python wikitext->html conversion:

Current work

  • There is the existing perl script. See jblucks's git for a copy. Not currently under active development?
  • jblucks has been looking at a python solution, and has a copy of the current perl script, at:
git clone http://slyjbl.hopto.org/~projects/wikipedia_oplpc_scripts.git
  • Arael72 is working on an html-modifier written in php, and compressing image files. http://dictionary.110mb.com/olpc2/
  • MitchellNCharity has been looking at a command-line oriented version, based on wikitext and using the wikis to expand templates and render html.
http://www.vendian.org/mncharity/olpc/ something

feature requests

break out the task into subtasks, such as the following:

  1. how to get xml dumps
  2. how to do screen scraping (may be deprecated, but a short description pointing to code)
  3. how to whitelist templates; the kinds of templates to watch out for. what they look like in wikitext, what they look like rendered
  4. how to take in a list of keywords
  5. an algorithm for finding other languages' articles from an original list... [via wikitext or html]
    • perhaps these links are listed at the bottom of the english page?
  6. an algorithm for pruning or keeping external links (can be based on a preference)
  7. a script for making links local
    1. an algorithm for checking to see which targets exist and removing links whose targets do not
  8. a script for grabbing thumbnails and resized images from articles.
    1. option to only take the first N images from a page
    2. option to leave out images larger than X
    3. option to include the source high-res image
    4. a script for grabbing metadata about the image, and a way to store that locally (also image: pages? if so, fine; make sure the href around the image points to the right place)
  9. a script for getting author data -- say, sorting through all authors from a dump and ordering unique authors by name with links to their contrib history online (as wikitravel does)
    1. --or just grabbing the last 5 non-minor authors from the page history
  10. a snapshot-update script that doesn't repeat all of this, keeps the same keyword list, but checks to see which pages have changed since last grab
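
For item 7 (making links local and dropping links whose targets do not exist), one possible approach on the wikitext side is sketched below in Python. The function name `localize_links` and the regular expression are assumptions for illustration only. It treats titles as exact, case-sensitive matches and does not handle image or category links, nested brackets, or the empty-label "pipe trick".

```python
import re

# Matches [[Target]] and [[Target|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def localize_links(wikitext, available_titles):
    """Keep links whose target is in the local set; unlink the rest."""
    def replace(match):
        target = match.group(1).strip()
        label = match.group(2) if match.group(2) is not None else target
        page = target.split("#", 1)[0]  # ignore any section anchor
        if page in available_titles:
            return match.group(0)       # target exists locally: keep the link
        return label                    # target missing: keep only the visible text
    return WIKILINK.sub(replace, wikitext)


sample = "[[Electricity]] and [[Magnetism|magnets]] power the [[Itaipu dam]]."
result = localize_links(sample, {"Electricity", "Magnetism"})
```

With the sample above, the first two links are kept and the third is reduced to its plain text.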
  • sj/irc 2007jul21:
ideally take the english body text from schools-wikipedia
add the external links section from the current en:wp if it exists
and grab the wp page for other languages...
be sure to capture the article's oldid for anything that comes from wp
so we can selectively update just a few oldid's if we find things that should be removed.
the script should know the difference between "get this list of oldid's,
 with a blacklist of id's that should be replaced with sth new"
 and "generate a set of articles from scratch, with the newest pageids"
we would do the latter every major revision, and the former for minor updates.
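
The two modes described in the irc note could be sketched roughly as follows in Python. Everything here is hypothetical (the name `plan_snapshot`, the dict layout, the sample ids), and it assumes the "newest" id for each article has already been looked up somewhere else.

```python
def plan_snapshot(pinned, latest, blacklist=frozenset(), from_scratch=False):
    """Return {title: oldid} for the next snapshot.

    pinned    -- {title: oldid} captured in the previous snapshot
    latest    -- {title: oldid} newest revision known for each title
    blacklist -- oldids that must be replaced with something new
    """
    if from_scratch:
        # Major revision: take the newest revision of every article.
        return dict(latest)
    plan = {}
    for title, oldid in pinned.items():
        # Minor update: only swap out revisions that were flagged.
        if oldid in blacklist:
            plan[title] = latest.get(title, oldid)
        else:
            plan[title] = oldid
    return plan


pinned = {"Electricity": 101, "Pig": 202}
latest = {"Electricity": 150, "Pig": 260}
minor = plan_snapshot(pinned, latest, blacklist={202})
major = plan_snapshot(pinned, latest, from_scratch=True)
```

In the minor update only "Pig" moves to a new revision; in the major one every article does.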

Browsing relationships between articles

In addition to searching and browsing for articles, the encyclopedia could provide several ways to explore the relationships between articles, such as...

  • Time relationship between articles - When the concepts were invented/discovered, via a timeline.
  • Space relationship between articles - The same/nearby countries/places, via maps or lists.
  • Person relationship between articles - Ideas by the same mathematician, scientist, composer, etc, as a list/tree.
  • Subset/superset relationship between articles - Such as mathematics = geometry + algebra, etc, as a tree.
  • Family relationship between articles - Kings and queens, business/artistic-dynasties, as a line/tree.
  • System/sub-system relationship between articles - Sub-systems of a spacecraft, parts of the human body, parts of a computer, etc, as trees or diagrams.


The relationship-information could be gathered automatically, from the hyperlinks between articles, etc, or manually, by people preparing timelines, etc, as in many existing CD-ROM encyclopedias.


The automatically-gathered relationship information might be from...

  • Hyperlinks between two articles, such as ‘electricity’ and ‘magnetism’.
  • Finding third articles which hyperlink to both ‘electricity’ and ‘magnetism’.
  • Finding third articles containing the words ‘electricity’ and ‘magnetism’.
  • Checking all possible pairs of article-names in the encyclopedia using the above methods, to create a list of relationships between article-pairs.
  • Tags that the authors have applied to similar/related articles.
  • Tags that users have applied to similar/related articles in previous online editions.
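
The first two methods in the list above (direct hyperlinks, and third articles that link to both) could look something like this in Python. The function `relate` and the tiny link graph are invented for illustration; a real link table would have to be built from the article text first.

```python
def relate(a, b, links):
    """Describe how articles a and b are connected in a link graph.

    links maps each article title to the set of titles it links to.
    """
    direct = b in links.get(a, set()) or a in links.get(b, set())
    # "Third" articles: anything other than a and b that links to both.
    shared = sorted(
        title for title, targets in links.items()
        if title not in (a, b) and a in targets and b in targets
    )
    return {"direct": direct, "linked_from": shared}


links = {
    "Electricity": {"Magnetism", "Energy"},
    "Magnetism": {"Iron"},
    "Electromagnetism": {"Electricity", "Magnetism"},
    "Iron": {"Magnetism"},
}
relation = relate("Electricity", "Magnetism", links)
```

Running this over all pairs of article names is quadratic in the number of articles, so for a large encyclopedia it would probably have to be done once at publishing time rather than on the laptop.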


As well as preparing fixed relationship information when the encyclopedia is published, the user could be provided with search facilities to explore the relationships. They would enter/select the names of any two articles and the relationships between them would be found dynamically, at that time. They could specify constraints, such as date-ranges, maximum distance between locations, artist/inventor’s name/nationality, etc.

The user might also be provided with a natural-language interface to retrieve articles, so they can enter queries like...

rivers near Cairo

parts of a computer

paintings by Michelangelo after 1495

Preparing an encyclopedia is a lot of work, so the first version of the encyclopedia could just have the standard Wikipedia ways of accessing information. These relationship features could be added in the second edition.

--Ricardo 05:25, 9 August 2007 (EDT)

Esperanto

The goal is to encourage and plan the creation of OEPC, One Encyclopedia Per Child, in the Esperanto language. We supporters of Esperanto feel that incorporating an opportunity to learn Esperanto into the OLPC project is an invaluable opportunity to give the end recipients of the laptops (the children) a more international, more effective language education. (See Talk:Esperanto)

Foreseen obstacles

To create an Esperanto version of OEPC would require a large volunteer effort. Needless to say, the Esperanto-speaker population is not huge (2-5 million), and probably, most volunteers would need proficiency in both Esperanto and English to effectively contribute.

Additionally, it appears that the English OEPC is based on articles pulled from the Simple English Wikipedia, to make all articles easy for young English-learners to read. While Esperanto as a whole is generally considered easier and quicker to learn than English, the vocabulary used in Esperanto's Vikipedio is likely more advanced, more precise, than the Simple English vocabulary.

Introduction in Esperanto / Prezento en esperanta lingvo

Earlier I noticed an article (Talk:Esperanto) proposing that an opportunity to teach Esperanto be included in the OLPC project. At present it seems that the most sensible route to that goal (which, to me and others, appears to have great potential to help the future recipients) is to translate OEPC (One Encyclopedia Per Child) into Esperanto and make it available, so that it is on hand when OLPC presents software options to each country's ministry of education. If you are interested in contributing to this project in any way (although knowledge of English will probably be essential for many tasks), do not hesitate to write on the discussion page (Discussion) or to contact the project lead, currently at hunt.topher@yahoo.com (since at present nobody else is taking part).