Nepal:Redundancy
Revision as of 20:51, 15 February 2008
This page lays out redundancy plans for Nepal's spring pilot of OLPC. See the Nepal page for more details on the pilot.
The Basic Heuristics for Redundancy
(a) Determine the minimum setup and identify all components that can fail individually.
(b) For each failure, determine the impact on the entire system.
(c) Determine an N+1 or N+2 configuration that might address the concern.
Minimum setup:
150 x XOs
XS w/3 x Active Antennas
Squid Server
Wireless AP/Router
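Heuristics (a) and (c) can be sketched against this minimum setup. The snippet below is only an illustration: the names `MINIMUM_SETUP`, `single_points_of_failure` and `with_spares` are made up for this page, and a quantity of 1 is assumed wherever the list above gives no count.

```python
# Minimum setup from the list above. Quantities of 1 are an assumption
# for items listed without a count (XS, Squid Server, Wireless AP/Router).
MINIMUM_SETUP = {
    "XO": 150,
    "XS": 1,
    "Active Antenna": 3,
    "Squid Server": 1,
    "Wireless AP/Router": 1,
}


def single_points_of_failure(setup):
    """Heuristic (a): components present as exactly one unit."""
    return [name for name, count in setup.items() if count == 1]


def with_spares(setup, names, extra=1):
    """Heuristic (c): copy of the setup with N+extra units for `names`."""
    return {
        name: count + (extra if name in names else 0)
        for name, count in setup.items()
    }


spofs = single_points_of_failure(MINIMUM_SETUP)
print("Single points of failure:", spofs)
print("N+1 plan:", with_spares(MINIMUM_SETUP, spofs, extra=1))
```

Passing `extra=2` gives the N+2 variant. Whether a spare is a cold standby or a live mirror is a separate decision that this sketch does not capture.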
Failure cases:
1 - XO fails
Recovery Actions
- Restore image via USB
- Restore user-generated content from XS?
Impact During Downtime
- A single student is down for some time and may need admin help.
2 - XS fails completely or offline
Recovery Actions
- Manually bring online backup XS with identical image.
Impact During Downtime
- Mesh stays up?
- Off the Internet
- All activities local (eToys only, no web sites)
3 - Squid box fails
Recovery Actions
- Update XS routing table to reach internet directly?
- Bring up backup Squid box?
- Shut down internet access but leave Moodle online?
Impact During Downtime
- All internet offline
- Admin intervention needed
- Moodle and local eToys only
4 - Wireless AP/Router Fails
Recovery Actions
- Backup Wireless AP/Router?
- Connect XS or Squid box directly to DSL Modem?
Impact During Downtime
- All internet offline
- Admin intervention needed
- Moodle and local eToys only
5 - Mesh overloads until it goes offline
Recovery Actions
- Take down mesh (how?)
- Associate XOs directly with wireless AP/router (how? prebuilt script or kids click on something?)
- Change routing to point wireless AP back to Squid and XS before going over WAN
Impact During Downtime
- XOs offline from each other and the internet
- Admin intervention needed
- Local eToys only
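The five failure cases above could also be kept as data, so that open questions and cases needing an admin can be listed automatically. This is a rough sketch only: `FailureCase`, `RUNBOOK` and the helper functions are hypothetical names, the entries are copied from the list above (a few are shortened), and items still marked with "?" stay as questions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCase:
    component: str   # what fails
    recovery: tuple  # recovery actions, open questions kept as written
    impact: tuple    # impact during downtime


RUNBOOK = (
    FailureCase(
        "XO",
        ("Restore image via USB", "Restore user-generated content from XS?"),
        ("Single student down for some time and may need admin help",),
    ),
    FailureCase(
        "XS",
        ("Manually bring online backup XS with identical image",),
        ("Mesh stays up?", "Off the Internet",
         "All activities local (eToys only, no web sites)"),
    ),
    FailureCase(
        "Squid box",
        ("Update XS routing table to reach internet directly?",
         "Bring up backup Squid box?",
         "Shut down internet access but leave Moodle online?"),
        ("All internet offline", "Admin intervention needed",
         "Moodle and local eToys only"),
    ),
    FailureCase(
        "Wireless AP/Router",
        ("Backup Wireless AP/Router?",
         "Connect XS or Squid box directly to DSL Modem?"),
        ("All internet offline", "Admin intervention needed",
         "Moodle and local eToys only"),
    ),
    FailureCase(
        "Mesh",
        ("Take down mesh (how?)",
         "Associate XOs directly with wireless AP/router (how?)",
         "Route wireless AP back through Squid and XS before the WAN"),
        ("XOs offline from each other and the internet",
         "Admin intervention needed", "Local eToys only"),
    ),
)


def lookup(component):
    """Return the failure case for a component, or raise KeyError."""
    for case in RUNBOOK:
        if case.component == component:
            return case
    raise KeyError(component)


def open_questions():
    """Return (component, action) pairs still phrased as questions."""
    return [(c.component, a) for c in RUNBOOK for a in c.recovery if "?" in a]


def cases_needing_admin():
    """Return components whose downtime impact mentions an admin."""
    return [c.component for c in RUNBOOK
            if any("admin" in line.lower() for line in c.impact)]


for component, action in open_questions():
    print(f"{component}: {action}")
```

Listing the open questions this way makes it easier to see which recovery actions still need a decision before the pilot starts.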
Individual XOs
- LiveCD+USB w/ correct image and settings
- Possible to restore over the network?
- Need a way to back up and restore individual student files (see XS_backup_restore)
- Need extra XOs for teachers, at least N + 1 where N is the # of teachers
- How many extra XOs for kids?
Active Antennas
- Need 3 antennas
- 1 active antenna per 100 students
- 2 antennas in use
- How many clients can an active antenna support?
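The antenna and teacher counts above follow from simple N+1 arithmetic, sketched below. The function names are made up for illustration, the ratio of 1 antenna per 100 students is the planning figure from this page (the real client capacity is still an open question), and the teacher count in the example is a placeholder, not a pilot figure.

```python
import math


def antennas_needed(students, students_per_antenna=100, spares=1):
    """Return (antennas in use, antennas to have on hand)."""
    in_use = math.ceil(students / students_per_antenna)
    return in_use, in_use + spares


def teacher_xos(teachers, spares=1):
    """N + spares XOs for N teachers."""
    return teachers + spares


# 150 students at 1 antenna per 100 students: 2 in use, 3 on hand.
print(antennas_needed(150))  # (2, 3)

# Placeholder teacher count, for illustration only.
print(teacher_xos(4))  # 5
```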
School Server
There should be two School Servers, one for the 2nd grade class, and one for the 6th grade class. They should mirror each other.
- Disk Failure
- LiveCD + USB stick
- Possibly use Fedora's LVM for disk mirroring
- CPU failure
- Have spare CPU fan on hand
- Have spare School Server on hand
- System Backups?
Library Server
- Need backup Library Server that mirrors the production Library Server
NOTE: The Library Server will be in a centralized location
Internet Connection
Need a commitment from the local ISP on both support and service levels.
Power
Monitoring
- Nagios for remote monitoring of Internet connection?
- Another tool to report system usage for the school server? Zenoss?
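Until a monitoring tool is chosen, a basic reachability check could serve as a stopgap. The sketch below is not Nagios or Zenoss; it is a plain TCP connect test using Python's standard `socket` module, and the function names and any host names passed to it are assumptions to be replaced with real targets.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def check_all(targets, timeout=3.0):
    """Map each (label, host, port) target to its reachability."""
    return {
        label: is_reachable(host, port, timeout)
        for label, host, port in targets
    }
```

For example, `check_all([("squid", "squid.example", 3128)])` would report whether a proxy at that placeholder address accepts connections. Run from cron on a machine outside the school, the same check gives a crude view of the Internet connection.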
Tony Pearson has contributed extensively to this plan.