Afghanistan School Server

Introduction

This page is an overview of the School Server work done in Afghanistan. Our objective is to maximize the content available where connectivity is limited, and to minimize whatever bandwidth is required.

To achieve this, we use three key systems:

Webdump: a small PHP script, based on HTTrack, designed to run on remote servers. It copies websites for offline use and then compresses them so that each can be downloaded as a single .tar.gz file. This lets us select good educational websites and keep local copies on each school server.

Nutch: an open-source search engine from Apache, based on Lucene. It builds a search index of all the content on the school server, including HTML files, media files, PDFs, etc.

Joomla: used to build a simple school portal where teachers can post announcements, school news, galleries, etc.

All of this is mostly a work in progress, and so is the documentation. We will put a school server on the Internet as soon as possible so that everyone can try it out.

Webdump

Documentation to be added.

Nutch

Rough notes follow. Note that localizing Nutch is not entirely straightforward; see the notes I have made on this topic in the Nutch Wiki.

Extract the Nutch 1.0 archive in /opt.
Create a plain text file in /opt/nutch-1.0/urls (e.g. /opt/nutch-1.0/urls/schoolserver) containing the line http://schoolserver/afbase/. This file lists the seed URLs from which the crawl starts.
Add the line +http://schoolserver/ to /opt/nutch-1.0/conf/crawl-urlfilter.txt so that pages on the school server are accepted. Place it above the final "-." (skip everything else) line, because the first matching pattern wins.
Copy nutch-1.0.war to the webapps directory of your Apache Tomcat installation (apache-tomcat-dir/webapps).
- In the folder that Tomcat extracts from the WAR under webapps, modify the properties so that they point to the crawl folder; see http://zillionics.com/resources/articles/NutchGuideForDummies.htm
Edit conf/nutch-site.xml to set the crawler properties; see http://zillionics.com/resources/articles/NutchGuideForDummies.htm
Edit conf/nutch-default.xml and add msword and pdf to plugin.includes so that Word and PDF documents are parsed; see http://www.mail-archive.com/nutch-user@lucene.apache.org/msg09580.html
Edit conf/nutch-default.xml so that entire documents are downloaded rather than truncated: set every content.limit property to -1, for example:

<property>
  <name>http.content.limit</name>
  <value>-1</value>
  <description>The length limit for downloaded content, in bytes.
  If this value is nonnegative (>=0), content longer than it will be truncated;
  otherwise, no truncation at all.
  </description>
</property>
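The crawler-side settings above can also be collected in one conf/nutch-site.xml. The following is a minimal sketch, not a tested configuration: it assumes the stock Nutch 1.0 value of plugin.includes with only msword and pdf added to the parse group, so compare it with the plugin.includes entry in your own nutch-default.xml before copying. The agent name SchoolServerCrawler is a hypothetical placeholder.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Nutch 1.0 will not fetch pages unless http.agent.name is set.
       "SchoolServerCrawler" is a hypothetical placeholder name. -->
  <property>
    <name>http.agent.name</name>
    <value>SchoolServerCrawler</value>
  </property>
  <!-- Assumed stock Nutch 1.0 plugin list, with msword and pdf added
       to the parse-(...) group so Word and PDF files are parsed. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html|js|msword|pdf)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <!-- A negative value disables truncation of downloaded content. -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
</configuration>
```

Properties in nutch-site.xml override those in nutch-default.xml, so keeping the changes here leaves the default file untouched. The search webapp reads a file in the same format from WEB-INF/classes/nutch-site.xml inside its extracted folder; there, the searcher.dir property is set to the path of the crawl folder.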
