Talk:One encyclopedia per child: Difference between revisions
(→Scripting - designing revision 2: added link to sample_wiki_text for unit testing purposes)
(→feature requests: added another feature request)
Revision as of 15:34, 21 July 2007
Culturally inappropriate heading images
Three of the images in the alphabet heading page are culturally inappropriate in some countries and should be replaced.
G - Gun - these weapons are illegal in the United Kingdom and it is considered inappropriate to use them in an educational context outside of anti-crime lessons.
P - Pig - this is considered by Muslims and Jews to be an unclean animal, and children are discouraged from toys, books, etc. which contain pig characters or images. Islam is a common religion in the UK, and Judaism is common in the USA. Best to choose something else for P.
D - Dog - in Arab culture, the dog is seen as an unclean animal. Again, there is a significant Arab immigrant community in both the USA and the UK. In addition, English is used in education in Arab countries, especially Iraq. Again, better to use some other image.
In reply
As a result of very local criticism earlier on, we had removed the bullets, and are still seeking something better than G - Gun, also because it is "sexist". Other ideas we have seen are G - Grapes; Gorilla; Goat; Girl; Giraffe; Gecko; Glass; Gift; but no three-letter words except for Gap, Gas, Gel, Gem, Gin, Gnu, Gum, and Gym, which will create problems with finding images. We have chosen "Gift".
The next choices would be P - Peg or Pen. We have chosen Peg. We have elected not to change "Dog". --155.232.250.35 00:25, 9 November 2006 (EST)
- Peg? What kids in the 21st century English speaking world have ever seen a clothespeg? Of those, what percentage believe that clothespeg begins with "C", not "P"? On the other hand, 100% of kids in the 21st century English speaking world know what a PEN is, have seen various pens and have most likely had the personal experience of writing with one. A ballpoint pen from a side view, with the pocket-clip and the plunger knob to click it in and out, would be ideal.
- Oh, and by the way, Grapes are a common fruit in the English speaking world. Why confuse the kids with a picture of a Present? They know that Present begins with P, not G.
- I think Gecko works better for G. As for D, he's right: Dog would not be the best choice here, and Duck would probably be a much better neutral fit. Asian countries don't really do a lot of Milk drinking, do they? And how many countries actually have Yo-Yos? For M I suggest Monkey, and for Y I suggest Yak. As for P, I assume that is a clothespin? Since these are children, animals and shapes should be used whenever possible; the less abstract the better, so that they learn it and forget it and don't have to re-acquire the context every time they access the OEPC interface. For P I suggest a Pelican facing left. --Basique 10:41, 25 November 2006 (EST)
- It's also sexist to exclude the gun, and non-multicultural to exclude the pig. One could sort of solve the problem by adding more pictures rather than taking pictures away, but really this idea of using pictures for letters is rather poor. Pictures for categories sort of might not be too terrible. As for the current pictures:
- A bug/insect
- B package
- C pussy
- D wolf
- E broken
- F picture (fans have spinning blades)
- G package/present
- H cover
- I bottle/perfume (only old people have ever seen ink like that!)
- J vase
- K key -- hey, no confusion there!
- L tree
- M pour/glass
- N lace/fishnets/mesh
- O bird
- P art supply???
- Q cards/hearts
- R silver/jewelry
- S light/circle/storm
- T water/faucet/drop
- U shade
- V bottle/jug
- W drop/splash
- X hands/bones
- Y toy/string/cord
- Z undress/clothes
Single theme for alphabet images
Why not stick to a single theme (animals or household items) for the alphabet images? It would make the images easier to identify, even for non-native speakers (e.g. those learning a second language). Also, when the alphabet is "translated" to other languages, having a common theme for the images would allow a more uniform look for the project.
In reply
A theme of animals was considered, but it introduces "foreign" animals such as Y - Yak. A problem with household objects is that in developing countries, the implements in the hut are not so diverse. A similar look for different languages is at the bottom of the list of priorities in what we have discovered to be a highly over-constrained system. --Olpcme 14:51, 19 September 2006 (EDT)
Browsing Method
Although the alphabet is the traditional organisation method for print encyclopedias, it has not been used on Wikipedia and I'm not convinced it is the best scheme to use for OEPC.
I suggest the main page have big navigation buttons, as here. The top row would include general functions:
- Random Page,
- Browse by subject (leading to Category hierarchy),
- Search.
Below that buttons leading to Portal pages for each of the main subject areas:
- Science
- History
- The World
- Mathematics
- Biography
- Art and Culture
All with lots of hyperlinks, so that you can arrive at the information you want via various routes.
In reply
We have four spare squares on the corners which could be used for alternative accesses. Possible buttons are "Surprise", "Subjects", "Search" and "Spare". The icons could have a flat diagonal design / \ and \ / to hint that these are different, or be 3-D shaded as buttons to indicate that they are different. --Olpcme 14:50, 19 September 2006 (EDT)
Method to contain the full wikipedia
I had an idea: if I understand correctly, these laptops will form a network and help route each other's data. If that's the case, perhaps we can distribute the encyclopedia between them.
All laptops would have the basic summary version of the encyclopedia as a fallback that always works, and would also hold random bits of the media and appendices, so that together several laptops contain the full thing.
cheers, yair
English as first choice?
Taking into consideration the OLPC linguistic deployment environment, wouldn't it be more sensible to target a non-English language as a starter? Let's face it: in the developing world, English is either nonexistent among the common people or used as the official language due to colonial legacy and the lack of a 'dominant' local language. English is currently a lingua franca, but the OLPC (afaik) is focusing on basic education in local/native languages. IOW, you may speak Quechua at home, be schooled in Spanish, and later (if lucky) learn English 'as a second language' in high school or later.
If another language is chosen instead of English as the 'initial' language for the OEPC, I think that the effort will be more in line with the objectives. Looking at the (current) target population of the OLPC I see three interesting languages: Portuguese, Spanish and Arabic, in three countries: Brazil, Argentina and Libya. Nigeria and Thailand are different cases.
Why are they 'interesting'?
- Portuguese is so because of sheer size and density: 190 million Brazilians organized under a single government. That is a vast population to satisfy, where 'economies of scale' could apply.
- Spanish is native/official to practically all of Latin America (~350 million people excluding Brazil), and you start with 10% of it: Argentina (40 million). That allows a core to be created and then reused/expanded/modified for country specifics, i.e. 'incremental development'.
- Arabic in the case of Libya could be similar to Spanish/Argentina: initially 5.6 million people generating a core that may expand to cover ~300(+?) million globally. (Note: as far as I know, Arabic can be quite different among regions and countries, so this 'incremental' approach may not apply).
Another interesting side effect of not choosing English is that it forces one to think globally, or at least outside 'comfort zones'. Regardless of good intentions and good faith, cultural, religious, political and other faux pas are quite common when propagating or 'replicating' from one culture to another. Sometimes the language barrier helps to enforce the cultural barrier and avoids assumptions that lead to awkward moments or situations.
All that said, English is still a very good starting point! :) I just felt it worthwhile to note that globally, English, although very useful and a 'natural' starting point, may not necessarily be the best first choice...--Xavi 13:39, 26 November 2006 (EST)
Portuguese fits with Orkut
There are huge numbers of Brazilians on-line, many of them using Google's social networking site, Orkut. Set up an Orkut group to develop content for a Portuguese encyclopedia for OLPC and you will quickly have thousands of helping hands.
It is silly to think that translating an English encyclopedia will work. After all, Brazilian kids should learn about tapirs, not bears, guavas, not apples, the Itaipu dam, not the Hoover dam, etc.
Alternate images
How about the older, more iconic versions of Key for K, Vase for V and Ring for R below? Yo-Yo still doesn't work for Y; how about Yellowjacket? And clothes pin does not work well for P; Penguin works much better. All images below are Public Domain files.
Scripting - designing revision 2
This section is for discussion of the new revision two scripting.
Objectives:
- ...
Pipeline:
- obtain xml
- From Special:Export for small (maybe order 1000 or so) page sets.
- Maybe from dump files for larger sets.
- See a sample wikipedia XML dump by going to http://en.wikipedia.org/w/index.php?title=Special:Export
- Assemble xml back into wikitext
- fields needed:
- <title> - the page title
- <text> - page text (in wiki syntax)
- <timestamp> - for use in determining if this page needs to be upgraded (see below)
- ...
- modify wikitext
- template blacklist
- Here the pipeline forks. There are two paths:
- kid specified, runs on xo, low quality
- handwritten python wikitext->html conversion
- country specified, high quality
- use Save in a project subpage to render modified wikitext on the original server.
- scrape resulting html and images
- extract core html, and rewrap with olpc "frame".
- kid specified, runs on xo, low quality
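The "obtain xml" and "fields needed" steps above might be sketched roughly as follows. This is only an illustration: the function name parse_export and the SAMPLE string are made up here, and the sample imitates the general shape of a Special:Export file rather than being a real export. Namespace prefixes are stripped because the export schema version in the xmlns attribute varies.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the general shape of Special:Export output.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Giraffe</title>
    <revision>
      <id>12345</id>
      <timestamp>2007-07-21T15:34:00Z</timestamp>
      <text>The '''giraffe''' is a tall [[mammal]].</text>
    </revision>
  </page>
</mediawiki>"""


def local(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_export(xml_string):
    """Return one dict per <page> holding the title, timestamp and text fields.

    If a page carries several revisions, the last one in document order wins.
    """
    pages = []
    root = ET.fromstring(xml_string)
    for page in root.iter():
        if local(page.tag) != "page":
            continue
        record = {"title": None, "timestamp": None, "text": None}
        for node in page.iter():
            name = local(node.tag)
            if name in record:
                record[name] = node.text or ""
        pages.append(record)
    return pages


if __name__ == "__main__":
    for page in parse_export(SAMPLE):
        print(page["title"], page["timestamp"])
```

For dump-sized inputs, ET.iterparse would probably be a better fit than loading the whole file with fromstring, but that is left out of this sketch.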
Re handwritten python wikitext->html conversion:
- Pilaf's InstaView js script is one existing prototype.
- OpenWetWare's Dewikify System
- Is there anything better out there, to use as tool or prototype?
- The conversion should be able to take the wikitext on Talk:One encyclopedia per child/sample_wiki_text and convert it to the same html that mediawiki produces. (This defines a set of unit tests that the converter must pass.)
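To give a feel for the low-quality path, here is a very small sketch of a handwritten wikitext->html converter. It is not InstaView or Dewikify, it covers only headings, bold, italic and [[internal links]], and its output is not claimed to match mediawiki's html byte for byte. The function name and the local "Title.html" link scheme are assumptions made for this example.

```python
import html
import re

LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")
HEADING = re.compile(r"(={1,6})\s*(.+?)\s*\1")


def _link(match):
    """Turn [[Target]] or [[Target|label]] into a link to a local file (assumed naming)."""
    target = match.group(1).strip()
    label = (match.group(2) or target).strip()
    href = target.replace(" ", "_") + ".html"
    return f'<a href="{href}">{label}</a>'


def wikitext_to_html(text):
    """Convert a tiny subset of wikitext to html, one line at a time."""
    out = []
    for line in text.splitlines():
        # Escape &, < and > but leave apostrophes alone: they carry the markup.
        line = html.escape(line, quote=False)
        # Bold must be handled before italic, since ''' contains ''.
        line = re.sub(r"'''(.+?)'''", r"<b>\1</b>", line)
        line = re.sub(r"''(.+?)''", r"<i>\1</i>", line)
        line = LINK.sub(_link, line)
        heading = HEADING.fullmatch(line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
        elif line.strip():
            out.append(f"<p>{line}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    print(wikitext_to_html("== Animals ==\nThe '''giraffe''' is a ''tall'' [[mammal]]."))
```

The unit tests mentioned above could then be a list of (wikitext, expected html) pairs cut from the sample page and from a saved copy of the server's rendering; templates, tables, lists and combined bold-italic are all outside what this sketch handles.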
feature requests
break out the task into subtasks, such as the following:
- how to get xml dumps
- how to do screen scraping (may be deprecated, but a short description pointing to code)
- how to whitelist templates; the kinds of templates to watch out for. what they look like in wikitext, what they look like rendered
- how to take in a list of keywords
- an algorithm for finding other languages' articles from an original list... [via wikitext or html]
- perhaps these links are listed at the bottom of the english page?
- an algorithm for pruning or keeping external links (can be based on a preference)
- a script for making links local
- an algorithm for checking to see which targets exist and removing links whose targets do not
- a script for grabbing thumbnails and resized images from articles.
- option to only take the first N images from a page
- option to leave out images larger than X
- option to include the source high-res image
- a script for grabbing metadata about the image, and a way to store that locally (also image: pages? if so, fine; make sure the href around the image points to the right place)
- a script for getting author data -- say, sorting through all authors from a dump and ordering unique authors by name with links to their contrib history online (as wikitravel does)
- --or just grabbing the last 5 non-minor authors from the page history
- a snapshot-update script that doesn't repeat all of this, keeps the same keyword list, but checks to see which pages have changed since last grab
- sj/irc 2007jul21:
ideally take the english body text from schools-wikipedia
add the external links section from the current en:wp if it exists
and grab the wp page for other languages...
be sure to capture the article's oldid for anything that comes from wp
so we can selectively update just a few oldid's if we find things that should be removed.
the script should know the difference between "get this list of oldid's,
with a blacklist of id's that should be replaced with sth new"
and "generate a set of articles from scratch, with the newest pageids"
we would do the latter every major revision, and the former for minor updates.
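The two update modes in the irc note could be sketched like this. Everything here is hypothetical: the function names, the dict layout (article title mapped to revision id) and the idea that a "latest" mapping has already been fetched from the wiki by some other script. The note's "newest pageids" is read here as "newest revision ids", which is an assumption.

```python
def plan_minor_update(snapshot, blacklist, latest):
    """Keep the pinned oldid for every article unless that oldid is blacklisted.

    snapshot:  {title: oldid} captured at the last grab
    blacklist: set of oldids that must not be shipped again
    latest:    {title: newest revision id}, assumed to be fetched elsewhere
    """
    plan = {}
    for title, oldid in snapshot.items():
        if oldid not in blacklist:
            plan[title] = oldid
            continue
        newest = latest.get(title)
        # Replace with something new; drop the article if nothing usable exists.
        if newest is not None and newest not in blacklist:
            plan[title] = newest
    return plan


def plan_major_revision(titles, latest):
    """Build the article set from scratch, taking the newest revision of each title."""
    return {title: latest[title] for title in titles if title in latest}


if __name__ == "__main__":
    snapshot = {"Giraffe": 100, "Gecko": 200}
    latest = {"Giraffe": 150, "Gecko": 250, "Yak": 300}
    print(plan_minor_update(snapshot, {200}, latest))
    print(plan_major_revision(["Giraffe", "Yak"], latest))
```

Storing the snapshot as a plain title-to-oldid mapping keeps minor updates cheap: only the blacklisted entries need to be fetched again.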