Nepal:Redundancy

From OLPC
Revision as of 16:54, 15 February 2008 by 64.102.254.33 (One View of Failure Cases - GS)

This page lays out redundancy plans for Nepal's spring pilot of OLPC. See the Nepal page for more details on the pilot.

The Basic Heuristics for Redundancy

(a) Determine the minimum setup and identify every component that can fail individually.
(b) For each failure, determine the impact on the entire system.
(c) Determine an N+1 or N+2 configuration that could address the concern.

One View of Failure Cases - GS

Minimum setup:
- 150 XOs
- XS with 3 active antennas
- Squid server
- Wireless AP/router

Failure cases:

1 - XO fails
Recovery Actions
- Restore image via USB
- Restore user-generated content from XS?

Impact During Downtime
A single student is down for some time and may need admin help.

2 - XS fails completely or goes offline
Recovery Actions
- Manually bring a backup XS with an identical image online.

Impact During Downtime
- Mesh stays up?
- Off the Internet
- All activities local (eToys only, no web sites)

3 - Squid box fails
Recovery Actions
- Update the XS routing table to reach the internet directly?
- Bring up a backup Squid box?
- Shut down internet access but leave Moodle online?

Impact During Downtime
- All internet offline
- Admin intervention needed
- Moodle and local eToys only

4 - Wireless AP/router fails
Recovery Actions
- Backup Wireless AP/Router?
- Connect XS or Squid box directly to DSL Modem?

Impact During Downtime
- All internet offline
- Admin intervention needed
- Moodle and local eToys only

5 - Mesh overloaded until it goes offline
Recovery Actions
- Take down the mesh (how?)
- Associate XOs directly with the wireless AP/router (how? A prebuilt script, or do kids click on something?)
-- If the wireless AP takes over, change the wireless router's gateway so traffic still passes through the Squid box and XS before going over the WAN

Impact During Downtime
- XOs offline from each other and the internet
- Admin intervention needed
- Local eToys only
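Heuristics (a) and (b) can be applied mechanically once the dependencies are written down. The sketch below is a hypothetical model, not part of the plan: it assumes, from the failure cases above, that XOs reach the XS over the mesh, that Moodle runs on the XS, and that web traffic flows XS -> Squid box -> wireless AP/router -> DSL modem. It also assumes the mesh stays up when the XS fails, which the page still marks as an open question. Failing each shared component in turn reproduces the "Impact During Downtime" lists; case 1 (a single XO) affects only one student and is left out.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical topology read off this page's failure cases (an assumption,
# not a measured network diagram).
SERVICE_DEPENDENCIES: Dict[str, Set[str]] = {
    "local eToys": set(),  # runs entirely on the XO
    "XO-to-XO collaboration": {"mesh"},
    "Moodle": {"mesh", "XS"},
    "Internet": {"mesh", "XS", "Squid box", "wireless AP/router"},
}


def single_failure_impact(
    deps: Dict[str, Set[str]],
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Fail each component alone and split services into (lost, still up)."""
    components = sorted(set().union(*deps.values()))  # heuristic (a)
    impact = {}
    for failed in components:  # heuristic (b)
        lost = sorted(s for s, needs in deps.items() if failed in needs)
        still_up = sorted(s for s, needs in deps.items() if failed not in needs)
        impact[failed] = (lost, still_up)
    return impact


for component, (lost, still_up) in single_failure_impact(SERVICE_DEPENDENCIES).items():
    print(f"{component} fails -> lost: {lost}; still up: {still_up}")
```

Adding a component (for example the DSL modem) or a service only means editing the dictionary; the loop then shows which single failures need an N+1 spare.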

Individual XOs

  • LiveCD+USB with the correct image and settings
  • Possible to restore over the network?
  • Need a way to back up and restore individual student files (see XS_backup_restore)
  • Need extra XOs for teachers: at least N + 1, where N is the number of teachers
  • How many extra XOs for kids?
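Backing up and restoring one student's files amounts to copying that student's data to a per-XO folder on the XS and copying it back after the XO is re-imaged. The sketch below assumes a hypothetical layout: the XO's datastore is reachable as a local directory, backups live under one root directory on the XS, and each student's folder is named by an XO identifier such as "XO-0001". None of these paths or names come from the XS software; it only illustrates the backup/restore round trip.

```python
import shutil
from pathlib import Path


def backup_student(xo_id: str, datastore: Path, backup_root: Path) -> Path:
    """Copy one student's datastore into backup_root/<xo_id>/ (hypothetical layout)."""
    dest = backup_root / xo_id
    # dirs_exist_ok lets repeated backups overwrite files in the same folder.
    shutil.copytree(datastore, dest, dirs_exist_ok=True)
    return dest


def restore_student(xo_id: str, backup_root: Path, datastore: Path) -> None:
    """Copy a student's backup onto a freshly re-imaged XO's datastore."""
    src = backup_root / xo_id
    if not src.is_dir():
        raise FileNotFoundError(f"no backup found for {xo_id}")
    # Files with the same name are overwritten; other files are left in place.
    shutil.copytree(src, datastore, dirs_exist_ok=True)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        journal = root / "xo" / "datastore"
        journal.mkdir(parents=True)
        (journal / "drawing.etoys").write_text("student work")
        backup_student("XO-0001", journal, root / "xs_backups")
        shutil.rmtree(journal)  # simulate re-imaging the XO via USB
        restore_student("XO-0001", root / "xs_backups", journal)
        print((journal / "drawing.etoys").read_text())
```

Keying backups by XO rather than by student name is a design choice for the sketch; if XOs are swapped between students, a student-based key would be needed instead.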

Active Antennas

  • Need 3 antennas
  • 1 active antenna per 100 students
  • 2 antennas in use

How many clients can an active antenna support?
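The antenna counts above follow from heuristic (c): 150 students at 1 active antenna per 100 students rounds up to N = 2 antennas in use, and one spare gives the N+1 total of 3. The 100-clients-per-antenna figure is this page's planning number and is still the open question above; the function name and signature are illustrative only.

```python
import math
from typing import Tuple


def units_needed(clients: int, clients_per_unit: int, spares: int = 1) -> Tuple[int, int]:
    """Return (units in use, units to deploy) for an N + spares configuration."""
    in_use = math.ceil(clients / clients_per_unit)  # N, rounded up
    return in_use, in_use + spares


print(units_needed(150, 100))            # N+1 antennas: (2, 3)
print(units_needed(150, 100, spares=2))  # N+2 antennas: (2, 4)
```

The same function covers spare teacher XOs with `clients_per_unit=1` once the number of teachers is known, and it can be rerun if the per-antenna capacity turns out lower than 100.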


School Server

There should be two School Servers, one for the 2nd grade class, and one for the 6th grade class. They should mirror each other.

  • Disk failure
    • LiveCD + USB stick
    • Possibly use LVM (included in Fedora) for disk mirroring
  • CPU failure
    • Have a spare CPU fan on hand
    • Have a spare School Server on hand
  • System backups?

Library Server

  • Need backup Library server that mirrors the production Library Server

NOTE: The Library Server will be in a centralized location
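Both the two School Servers and the backup Library Server are meant to mirror a primary. Whatever does the copying, it helps to check that a mirror has not drifted. The sketch below compares two directory trees by SHA-256 checksum; it assumes both trees are reachable as local paths (for example, the mirror mounted over the network), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_sha256(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large media files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def mirror_drift(primary: Path, mirror: Path) -> Dict[str, List[str]]:
    """List files that are missing from, extra on, or different on the mirror."""
    ours, theirs = tree_digests(primary), tree_digests(mirror)
    return {
        "missing_on_mirror": sorted(ours.keys() - theirs.keys()),
        "extra_on_mirror": sorted(theirs.keys() - ours.keys()),
        "content_differs": sorted(k for k in ours.keys() & theirs.keys() if ours[k] != theirs[k]),
    }
```

Three empty lists mean the mirror matches the primary. Hashing every file is slow on a large library, so a nightly run is more realistic than a continuous one.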


Internet Connection

Need a commitment from the local ISP on both support and service levels
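A service-level commitment is easier to judge when an availability percentage is turned into hours of outage. The sketch below does that conversion for a 30-day month; the targets in the loop are illustrative values, not figures offered by any Nepali ISP.

```python
def allowed_downtime_hours(availability: float, days: int = 30) -> float:
    """Hours of outage per period that an availability target still permits."""
    return (1.0 - availability) * days * 24


# Illustrative targets only.
for target in (0.95, 0.99, 0.999):
    print(f"{target:.1%} uptime -> {allowed_downtime_hours(target):.1f} h per {30} days")
```

For example, 99% availability still allows about 7.2 hours of outage a month, which matters when deciding whether the school needs the offline fallbacks (Moodle and local eToys) listed in the failure cases.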

Power

Monitoring

  • Nagios for remote monitoring of the internet connection?
  • Another tool to report system usage for the school server? Zenoss?
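Nagios runs plugins that print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The stock check_ping or check_tcp plugins may already cover the internet link; the sketch below only shows that contract for a custom check. The function names, the script name check_uplink.py, the host/port to probe (for example the ISP gateway or a known external server) and the 1-second warning threshold are all hypothetical.

```python
import socket
import time
from typing import List, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes


def probe_uplink(host: str, port: int, timeout: float = 5.0,
                 warn_seconds: float = 1.0) -> Tuple[int, str]:
    """Return (exit code, status line) for a TCP connect to host:port."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError as exc:  # refused, unreachable, DNS failure or timeout
        return CRITICAL, f"CRITICAL - cannot reach {host}:{port} ({exc})"
    if elapsed > warn_seconds:
        return WARNING, f"WARNING - {host}:{port} connected in {elapsed:.2f}s"
    return OK, f"OK - {host}:{port} connected in {elapsed:.2f}s"


def main(argv: List[str]) -> int:
    """Entry point: expects HOST PORT, prints the status line, returns the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_uplink.py HOST PORT")
        return UNKNOWN
    code, message = probe_uplink(argv[0], int(argv[1]))
    print(message)
    return code
```

As a standalone plugin, the script would end with `sys.exit(main(sys.argv[1:]))` so Nagios sees the exit code. Probing from the school side only shows whether the school can reach the internet; remote monitoring from outside would need the reverse direction as well.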

Tony Pearson has contributed extensively to this plan.