Revision as of 19:25, 15 September 2008

At a school that has an XS in place, the Datastore Simple Backup (aka ds-backup) provides an automatic and invisible backup of the documents present in the Journal of each XO. This can be used for recovery of old documents if they have been deleted or overwritten and when the laptop is replaced or reflashed.

The storage of documents in the XS is also useful for other uses, such as a simple publishing mechanism.

A strong DS-Backup facility and good availability of the XS allow users to delete large documents to free space on their XOs, knowing that the XS will hold their documents.

Scenarios

Jim has deleted or changed a TurtleArt activity he did last month, and now he wants it back to use as a starting point for a new activity.
Jocinta's XO broke, and has been repaired and reflashed, so her documents are gone. She wants to retrieve them from the backup on the XS.

Implementation Notes

XO side

A script checks every 30 minutes whether it is appropriate to attempt a backup run. This script has a random sleep so that clients hit the XS staggered over time. The script checks whether we've completed a backup today, whether we can reach the XS, power status, and other factors.
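The checks described above can be sketched roughly as follows. This is a minimal illustration, not the actual ds-backup script: the function names are hypothetical, "today" is assumed to mean the current GMT calendar day, and the power check is assumed to reduce to a single "battery is low" flag. Only the stamp file path comes from this page.

```python
import os
import random
import time

# Stamp file named elsewhere on this page; its mtime marks the last good run.
STAMP = "/home/sugar/.default/ds_backup-done"

def backed_up_today(stamp_path=STAMP, now=None):
    """Return True if the stamp file was modified on the current GMT day."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(stamp_path).st_mtime
    except OSError:
        return False  # no stamp file: no successful run recorded
    # Compare (year, month, day) of both timestamps in GMT.
    return time.gmtime(mtime)[:3] == time.gmtime(now)[:3]

def should_attempt(done_today, xs_reachable, battery_low):
    """Combine the checks (assumed policy: all must be favourable)."""
    return (not done_today) and xs_reachable and (not battery_low)

def stagger_delay(max_seconds=30 * 60, rng=random):
    """Pick a random sleep so that clients hit the XS staggered over time."""
    return rng.uniform(0, max_seconds)
```

A caller would sleep for `stagger_delay()` seconds and then proceed only if `should_attempt(...)` returns True.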

If all of those tests pass, the script running on the XO grabs a lock and asks the XS for permission to start a backup. If the XS is too busy, the process on the XO will retry a few times with exponential back-off between the attempts.
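As an illustration of the retry behaviour, here is a small sketch of exponential back-off. The retry count, base delay and the `ask_permission` callable are assumptions made for the example; the real values still need tuning (see the TODOs).

```python
import time

def backoff_delays(retries=4, base=60.0, factor=2.0):
    """Return the wait in seconds before each retry: base, base*factor, ..."""
    return [base * factor ** i for i in range(retries)]

def request_with_backoff(ask_permission, retries=4, base=60.0, sleep=time.sleep):
    """Call ask_permission() until it returns True or the retries run out.

    ask_permission is a hypothetical callable standing in for the request
    to the XS; it returns True when the XS allows the backup to start.
    """
    if ask_permission():
        return True
    for delay in backoff_delays(retries, base):
        sleep(delay)  # wait longer after every refusal
        if ask_permission():
            return True
    return False  # give up; the next scheduled check will try again
```

Passing `sleep` as a parameter keeps the function easy to exercise without actually waiting.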

Once running, the backup process saves a copy of the full XO journal. It uses rsync over SSH to the XS. This preserves the previous backup attempt, so only incremental updates are transferred, and an incomplete run can be completed by the next attempt. Once an rsync run completes successfully, the client runs a second, separate run to "touch" a flag-completed file on the XS, marking success.
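The two-step transfer might look something like the sketch below. The rsync options shown (`-a`, `-z`, `--partial`, `-e ssh`) are real rsync options, but whether ds-backup uses exactly these is an assumption, and every path, user and host name passed in is hypothetical.

```python
import subprocess

def rsync_command(src_dir, user, host, dest_dir):
    """Build an rsync-over-SSH command that copies the contents of src_dir."""
    return ["rsync", "-az", "--partial", "-e", "ssh",
            src_dir.rstrip("/") + "/",            # trailing slash: copy contents
            "%s@%s:%s" % (user, host, dest_dir)]

def run_backup(src_dir, flag_file, user, host, dest_dir, flag_dest,
               run=subprocess.call):
    """Transfer the journal, then send the flag file only if that succeeded."""
    if run(rsync_command(src_dir, user, host, dest_dir)) != 0:
        return False  # incomplete run; a later attempt can pick it up
    # Second, separate run: its only job is to mark the backup as complete.
    flag_cmd = ["rsync", "-e", "ssh", flag_file,
                "%s@%s:%s" % (user, host, flag_dest)]
    return run(flag_cmd) == 0
```

Keeping the flag transfer separate means the XS only sees the "completed" marker after the data transfer has returned success.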

See instructions below on how to restore files from the XS to the XO.

XS side

We have 3 processes:

Traffic control

A simple mod_python script that checks whether the XO is registered, and provides basic "traffic control" to keep the load from the backup processes from swamping the XS (and potentially the network).
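The decision such a script has to make can be sketched as a plain function. This is not the mod_python handler itself: the use of the 1-minute load average, the threshold, and the status codes returned are all assumptions chosen for illustration.

```python
import os

def may_start_backup(serial, registered, load1=None, max_load=3.0):
    """Return an HTTP-style status: 200 go ahead, 403 unknown XO, 503 busy.

    serial     -- serial number sent by the XO
    registered -- collection of serial numbers known to this XS
    load1      -- 1-minute load average; read from the system if omitted
    max_load   -- hypothetical threshold above which backups are refused
    """
    if serial not in registered:
        return 403
    if load1 is None:
        load1 = os.getloadavg()[0]  # available on Unix-like systems only
    return 200 if load1 < max_load else 503
```

A client that receives the "busy" answer would then fall back to its retry schedule.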

Backup-complete script

When the client transfers the "flag completed" file to the XS, incrond (an inotify monitor) fires off an execution of `postprocess.py`, which makes a hardlinked copy of the just-transferred directory. It also updates the "datastore-latest" symlink to point to the latest snapshot.
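To make the snapshot step concrete, here is a rough sketch of a hardlinked copy followed by a symlink update. It is not the code of `postprocess.py`; the function names are hypothetical, and it assumes the snapshot directory and the symlink live in the same parent directory.

```python
import os

def hardlink_tree(src, dst):
    """Recreate the directory tree of src under dst, hardlinking every file."""
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target)
        for name in files:
            # A hardlink shares the file's data, so the copy costs no extra space.
            os.link(os.path.join(root, name), os.path.join(target, name))

def update_latest(snapshot_dir, link_path):
    """Point link_path at snapshot_dir, replacing any previous link."""
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(os.path.basename(snapshot_dir), tmp)  # relative link target
    os.rename(tmp, link_path)  # rename() swaps the symlink in one step
```

Creating the new symlink under a temporary name and renaming it avoids a moment where "datastore-latest" does not exist.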

Daily cleanup

Executed from cron, it:

Enforces a per-user "soft" quota. The quota for each XO is set by taking the size of the disk that holds the /library directory and assuming that 70% of it is for backups. That space is then divided by the number of registered XOs on the XS. Once a user is over the quota, the oldest snapshots for that user are deleted.
For snapshots over a given age (1 month?), keeps only one per month, removing intermediary snapshots for that user.
Attempts to hardlink copies across users.
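The quota arithmetic above works out as in this small sketch. The function names are hypothetical, and summing per-snapshot sizes is a simplification: because snapshots share hardlinked files, deleting one may free less space than its apparent size.

```python
def per_xo_quota(disk_bytes, registered_xos, backup_percent=70):
    """Share of the /library disk each XO may use for backups, in bytes."""
    if registered_xos < 1:
        raise ValueError("no registered XOs")
    return disk_bytes * backup_percent // 100 // registered_xos

def snapshots_to_delete(snapshots, quota):
    """Pick the oldest snapshots to delete until the user is within quota.

    snapshots -- list of (name, size_bytes) tuples, oldest first
    """
    total = sum(size for _, size in snapshots)
    doomed = []
    for name, size in snapshots:
        if total <= quota:
            break
        doomed.append(name)
        total -= size
    return doomed
```

For example, a 100 GB disk shared by 50 registered XOs gives 70 GB / 50 = 1.4 GB per XO.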

Test plans and user walkthrough

Testing the backup run

Start with an unregistered XO, register it with the XS - either over an Active Antenna mesh connection, or a regular AP wifi connection.
After registration, you need to restart the XO (this is part of the XO side of the registration process as of build 708 / joyride 2121, might not be needed in later builds).
Create some documents on the XO - or have them created before registration.
Wait until the backup run happens - it will be triggered once a day. How to recognize that it has happened?
- On the XO, run `stat /home/sugar/.default/ds_backup-done` and look for the 'modified' time, which shows the last time it ran successfully. Note that the XO clock is set to GMT, and might be off. Try `TZ=America/New_York stat /home/sugar/.default/ds_backup-done` to see it in local time.
- To run the script as soon as possible, `rm /home/sugar/.default/ds_backup-done`; the script should start within 30 minutes. Note that the script has a large random delay of up to 30 minutes!
  - If you don't want the first "within 30 minutes" wait, just run `/usr/bin/ds-backup.sh` from a Terminal activity (as the olpc user). You will still have to face the large random delay of up to 30 minutes.
- To log the output of the script that runs the backup, or to attempt to run it manually, see the file `/etc/cron.d/ds-backup`. The same random delay of up to 30 minutes applies.
- On the XS, a successful registration will have created a directory `/library/users/<Serial Number>` - and each successful backup run creates a new directory under `/library/users/<Serial Number>/datastore`. The directories have a datestamp, and when the backup run completes successfully, a symlink is updated to point to the latest one (called "datastore-latest").

Restore a single document

While we get the Moodle side of things sorted, we have a provisional restore UI to support this. This describes a temporary system which allows users to download the contents of the backup of any user (there is no authentication).

On the XO, ensure you are hooked up to the School Server network, and open Browse
Follow the link to the Schoolserver
Add "ds-restore" to the URL in the url bar, so that it reads http://schoolserver/ds-restore
You should see a listing of backup dates - pick a date
You will see a listing of the documents available for that date, pick a document
Clicking on that document will download it, and it will appear in the Journal

TODOs and future work

Restore a single document with Moodle

Note: this describes a future feature.

On the XO, ensure you are hooked up to the School Server network, and open Browse
Follow the link to the Schoolserver
In the top-right corner you will see "You are logged in as 'Nickname'"; follow the link on the nickname.
The page shows several tabs - click on the "WebJournal" tab (see WebJournal_Project)
You should see the latest documents, and a link to "older journal entries"
Optional: If choosing from the older journal entries you will see a listing of backup dates - pick a date
You will see a listing of the documents available for that date, pick a document
Clicking on that document will download it, and it will appear in the Journal (and it may auto-open - need to check that ;-) )

UPDATE Ds-backup (or deprecate it) when you make releases!
Automagic authentication & Moodle integration
In order to be simple, the initial implementation does not cover a "complete restore" scenario, which requires more work
- on the Sugar UI to trigger, display progress and manage (cancel/retry) a "complete restore"
- on the user aliasing that needs to take place in the "replaced laptop" scenario
We need to test, time and tune the traffic control & backoff settings.
Must confine rsync-over-ssh with a chroot jail or tight SELinux policies
To better support the "backup as extra storage" model
- Allow users to "pin" a resource to avoid it being deleted
- Teach the Journal to browse & request the backups transparently (WebDAV-based browsing?)
Extend into the WebJournal_Project concept that Robson Mendonça is working on.
Some cron.d files could be swapped out and in depending on our power situation. This can probably save some juice...

XS Blueprints:Datastore Simple Backup and Restore: Difference between revisions
