XS Blueprints:Datastore Simple Backup and Restore

Revision as of 14:03, 31 March 2009 by Martinlanghoff (talk | contribs)


At a school that has an XS in place, the Datastore Simple Backup (aka ds-backup) provides an automatic and invisible backup of the documents present in the Journal of each XO. This can be used for recovery of old documents if they have been deleted or overwritten and when the laptop is replaced or reflashed.

The storage of documents in the XS is also useful for other uses, such as a simple publishing mechanism.

A strong DS-Backup facility and good availability of the XS allow users to delete large documents to free up space on their XOs, knowing that the XS will hold their docs.

Scenarios

  • Jim has deleted or changed a TurtleArt activity he did last month, and now he wants it back to use it as a starting point for a new activity.
  • Jocinta's XO broke, and has been repaired and reflashed, so her documents are gone. She wants to retrieve them from the backup on the XS.

Implementation Notes

XO side

A script checks every 30 minutes whether it is appropriate to attempt a backup run. This script has a random sleep so that clients hit the XS staggered over time. The script checks whether we've completed a backup today, whether we can reach the XS, power status, and other factors.
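
The pre-flight checks described above could look roughly like the following minimal sketch. It is not the actual ds-backup code: the function names, the "same UTC day" reading of "today", and the boolean inputs for reachability and power status are all assumptions made for illustration. Only the flag file path is taken from the test plan on this page.

```python
import os
import random
import time

# Flag file path as given in the test plan on this page.
DONE_FLAG = "/home/olpc/.sugar/default/ds-backup-done"
# Assumed upper bound for the random sleep that staggers clients.
MAX_STAGGER_SECONDS = 30 * 60


def backed_up_today(flag_path=DONE_FLAG, now=None):
    """Return True if the flag file was last modified on the current UTC day.

    Treating "today" as a UTC calendar day is an assumption of this sketch.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(flag_path)
    except OSError:
        return False  # no flag file: no successful backup recorded
    return time.gmtime(mtime)[:3] == time.gmtime(now)[:3]


def should_attempt_backup(flag_path, server_reachable, power_ok, now=None):
    """Combine the checks; the two booleans are computed elsewhere (hypothetical)."""
    return bool(not backed_up_today(flag_path, now)
                and server_reachable and power_ok)


def stagger_delay(rng=random):
    """Pick a random delay so that clients do not all hit the XS at once."""
    return rng.uniform(0, MAX_STAGGER_SECONDS)
```

A caller would sleep for `stagger_delay()` seconds and then go on only if `should_attempt_backup(...)` returns True.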

If it passes all of those tests, the script running on the XO grabs a lock and asks the XS for permission to start a backup. If the XS is too busy, the process on the XO will retry a few times, with exponential back-off between the attempts.
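
As an illustration of retrying with exponential back-off, here is a minimal sketch. The number of attempts, the base delay and the doubling factor are assumptions, not the values used by ds-backup, and `ask_permission` is a hypothetical callable standing in for the request to the XS.

```python
import random
import time


def backoff_delays(attempts=4, base=60.0, factor=2.0, jitter=0.0, rng=random):
    """Yield the wait before each retry: base, base*factor, base*factor**2, ...

    `jitter` optionally adds up to that fraction of each delay at random.
    """
    for n in range(attempts):
        delay = base * (factor ** n)
        yield delay + rng.uniform(0, jitter * delay)


def request_with_backoff(ask_permission, attempts=4, base=60.0,
                         sleep=time.sleep):
    """Call ask_permission() until it returns True or the retries run out."""
    if ask_permission():
        return True
    for delay in backoff_delays(attempts, base):
        sleep(delay)  # wait longer after every refusal
        if ask_permission():
            return True
    return False
```

Passing `sleep` as a parameter keeps the function easy to exercise without actually waiting.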

Once running, the backup process saves a copy of the full XO Journal. It uses rsync over SSH to the XS. The previous backup attempt is preserved on the XS, so only incremental updates are transferred, and an incomplete run can be completed by the next attempt. Once an rsync run completes successfully, the client performs a second, separate run to "touch" a flag-completed file on the XS, marking success.
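
The two-step transfer can be sketched as below. This is not the ds-backup client: the SSH user, the remote directory name, the local paths and the way the flag file is sent are all hypothetical placeholders. The rsync options shown (`-a`, `-z`, `--partial`, `-e ssh`) are standard ones, chosen here as an assumption.

```python
import subprocess


def build_backup_commands(serial, journal_dir, flag_file,
                          server="schoolserver",
                          remote_dir="datastore-current"):
    """Return (sync_cmd, flag_cmd) as argv lists; nothing is executed here.

    `serial` as the SSH user and `remote_dir` are hypothetical choices.
    """
    remote = "%s@%s:%s/" % (serial, server, remote_dir)
    # Trailing slash on the source: copy the directory's contents.
    sync_cmd = ["rsync", "-a", "-z", "--partial", "-e", "ssh",
                journal_dir.rstrip("/") + "/", remote]
    # Second, separate run that only delivers the "completed" flag file.
    flag_cmd = ["rsync", "-e", "ssh", flag_file, remote]
    return sync_cmd, flag_cmd


def run_backup(sync_cmd, flag_cmd, run=subprocess.check_call):
    """Run the sync, then the flag transfer.

    check_call raises on a non-zero exit status, so the flag is never
    sent after a failed or interrupted sync.
    """
    run(sync_cmd)
    run(flag_cmd)
```

Because rsync only transfers what differs from the files already on the destination, rerunning the same command after an interrupted attempt picks up where the previous one left off.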

See instructions below on how to restore files from the XS to the XO.

XS side

The XS side has three processes:

Traffic control

A simple mod_python script that checks if the XO is registered, and provides basic "traffic control" to keep the load from the backup processes from swamping the XS (and potentially the network).

Backup-complete script

When the client transfers the "flag completed" file to the XS, incrond (an inotify monitor) fires off an execution of `postprocess.py`, which makes a hardlinked copy of the just-transferred directory. It also updates the "datastore-latest" symlink to point to the latest snapshot.
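
The snapshot step can be illustrated with the following minimal sketch. It is not the real `postprocess.py`: the function names are hypothetical, and it handles only regular files and directories. It shows the two ideas from the paragraph above, a hardlinked copy of a directory tree and an update of a "latest" symlink.

```python
import os


def hardlink_tree(src, dst):
    """Recreate the directory tree of src at dst, hard-linking each file.

    The copy costs almost no extra disk space because file contents are
    shared with the original. dst must not exist yet.
    """
    os.makedirs(dst)
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_root = dst if rel == "." else os.path.join(dst, rel)
        for name in dirs:
            os.mkdir(os.path.join(target_root, name))
        for name in files:
            os.link(os.path.join(root, name), os.path.join(target_root, name))


def point_latest(snapshot_dir, link_path):
    """Repoint the symlink at link_path to snapshot_dir.

    A temporary symlink is created and renamed over the old one, so
    readers never see a missing link.
    """
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(snapshot_dir, tmp)
    os.replace(tmp, link_path)
```

Hard links require the snapshot and the transferred directory to be on the same filesystem.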

Daily cleanup

Executed on cron, it

  • Enforces a per-user "soft" quota. The quota for each XO is set by taking the size of the disk that holds the /library directory and assuming that 70% of it is for backups. That space is then divided by the number of XOs registered on the XS. Once a user is over the quota, the oldest snapshots for that user are deleted.
  • For snapshots over a given age (1 month?) it only keeps one per month - removing intermediary snapshots for that user
  • It attempts to hardlink copies across users
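
The quota arithmetic and the monthly thinning rule above can be sketched as follows. This is an illustration rather than the cleanup script itself: the function names are hypothetical, the 30-day age threshold stands in for the "1 month?" above, and keeping the earliest snapshot of each month is an assumption (the text does not say which one is kept).

```python
import datetime


def per_xo_quota(disk_bytes, registered_xos, backup_percent=70):
    """Bytes each XO may use: backup_percent of the disk, split evenly."""
    if registered_xos < 1:
        raise ValueError("need at least one registered XO")
    return disk_bytes * backup_percent // 100 // registered_xos


def snapshots_to_thin(snapshot_dates, today, max_age_days=30):
    """Return the snapshot dates to delete among those older than max_age_days.

    For each calendar month, the earliest old snapshot is kept (an
    assumption of this sketch) and the other old ones are returned.
    """
    kept_months = set()
    doomed = []
    for day in sorted(snapshot_dates):
        if (today - day).days <= max_age_days:
            continue  # recent snapshots are all kept
        month = (day.year, day.month)
        if month in kept_months:
            doomed.append(day)
        else:
            kept_months.add(month)
    return doomed
```

For example, `per_xo_quota(100 * 10**9, 35)` gives 2000000000 bytes per XO. On a real server the disk size could be read with `shutil.disk_usage("/library").total`.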

Important note: You must ensure that the XS has a sufficient quota to hold a backup of the full Journal from each XO. If the XS cannot store one full backup of each XO, it will not back up. A rule of thumb is to ensure that the XS has 2 GB available for each XO that will be backed up.

Test plans and user walkthrough

Testing the backup run

  1. Start with an unregistered XO, register it with the XS - either over an Active Antenna mesh connection, or a regular AP wifi connection.
  2. After registration, you need to restart the XO (this is part of the XO side of the registration process as of build 708 / joyride 2121, might not be needed in later builds).
  3. Create some documents on the XO - or have them created before registration.
  4. Wait until the backup run happens - it is triggered once a day. To recognize that it has happened:
    • On the XO, run `stat /home/olpc/.sugar/default/ds-backup-done` and look for the 'modified' time, which shows the last time it ran successfully. Note that the XO clock is set to GMT, and might be off. Try `TZ=America/New_York stat /home/olpc/.sugar/default/ds-backup-done` to see it in local time.
    • To run the script as soon as possible, `rm /home/olpc/.sugar/default/ds-backup-done` - the script should start within 30 minutes. Note that the script also has a large random delay of up to 30 minutes!
      • If you don't want the first "within 30 minutes" wait, just run `/usr/bin/ds-backup.sh` from a Terminal activity (as the olpc user). You will still have to face the large random delay of up to 30 minutes.
    • To log the output of the script that runs the backup, or to attempt to run it manually, see the file `/etc/cron.d/ds-backup`. Note that the script has a large random delay of up to 30 minutes!
    • On the XS, a successful registration will have created a directory `/library/users/<Serial Number>` - and each successful backup run creates a new directory under `/library/users/<Serial Number>/datastore`. The directories have a datestamp, and when the backup run completes successfully, a symlink is updated to point to the latest one (called "datastore-latest").

Restore a single document - XS 0.5.x

Note: this describes a temporary facility which allows users to download the contents of the backup of any user (there is no authentication).

  1. On the XO, ensure you are hooked up to the School Server network, and open Browse
  2. Follow the link to the Schoolserver
  3. Add "ds-restore" to the URL in the url bar, so that it reads http://schoolserver/ds-restore
  4. You should see a listing of backup dates - pick a date
  5. You will see a listing of the documents available for that date, pick a document
  6. Clicking on that document will download it, and it will appear in the Journal. The restored Journal entry will be placed at the top of the Journal with a new date and time.

Restore a single document with Moodle

Note: this describes the workflow for XS 0.6 and newer, using Browse-101 or newer.

  1. On the XO, ensure you are hooked up to the School Server network, and open Browse
  2. Follow the link to the Schoolserver
  3. In the top-right corner you will see "You are logged in as 'Nickname'" - follow the link on the nickname.
  4. The page shows several tabs - click on the "Backup" tab (may later be renamed to 'WebJournal', see WebJournal_Project)
  5. You should see
    • a message indicating when the latest backup completed
    • a link to older backups
    • a listing of the Journal entries backed up, each with an indication of when it was created/edited
  6. Optional: If choosing from the older backups you will see a listing of backup dates - pick a date
  7. You will see a listing of the documents available for that date, pick a document
  8. Clicking on that document will download it, and it will appear in the Journal (and it may auto-open - need to check that ;-) )

TODOs and future work

  • Automagic authentication & Moodle integration
  • In order to be simple, the initial implementation does not cover a "complete restore" scenario, which requires more work
    • on the Sugar UI to trigger, display progress and manage (cancel/retry) a "complete restore"
    • on the user aliasing that needs to take place in the "replaced laptop" scenario
  • We need to test, time and tune the traffic control & backoff settings.
  • Must confine rsync-over-ssh with a chroot jail or tight SELinux policies
  • To better support the "backup as extra storage" model
    • Allow users to "pin" a resource to avoid it being deleted
    • Teach the Journal to browse & request the backups transparently (WebDAV-based browsing?)
  • Extend into the WebJournal_Project concept that Robson Mendonça is working on.
  • Some cron.d files could be swapped out and in depending on our power situation. This can probably save some juice...

This feature was requested by OLPC Peru. It is part of the School server subsystem.