XS Blueprints:Datastore Simple Backup and Restore

Latest revision as of 04:03, 3 June 2010


At a school that has an XS in place, the Datastore Simple Backup (aka ds-backup) provides an automatic and invisible backup of the documents present in the Journal of each XO. This can be used for recovery of old documents if they have been deleted or overwritten and when the laptop is replaced or reflashed.

The storage of documents in the XS is also useful for other uses, such as a simple publishing mechanism.

A strong ds-backup facility and good availability of the XS allow users to delete large documents to free space on their XOs, knowing that the XS will hold their documents.

Scenarios

  • Jim has deleted or changed a TurtleArt activity he did last month, and now he wants it back to use it as a starting point for a new activity.
  • Jocinta's XO broke, and has been repaired and reflashed, so her documents are gone. She wants to retrieve them from the backup on the XS.

Implementation Notes

XO side

A script checks every 30 minutes whether it is appropriate to attempt a backup run. This script has a random sleep so that clients hit the XS staggered over time. The script checks whether we've completed a backup today, whether we can reach the XS, power status, and other factors.

If it passes all of those tests the script running on the XO grabs a lock and asks the XS for permission to start a backup. If the XS is too busy, the process on the XO will retry a few times with exponential back-off between the attempts.
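The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the ds-backup code: `ask_permission` is a hypothetical callable standing in for the request to the XS, and the attempt count, base delay and jitter are assumed values.

```python
import random
import time


def request_with_backoff(ask_permission, max_attempts=5, base_delay=60.0,
                         sleep=time.sleep):
    """Ask the XS for permission, backing off exponentially while it is busy.

    ask_permission is a hypothetical callable that returns True when the XS
    accepts the backup. Returns True on success, False if every attempt failed.
    """
    for attempt in range(max_attempts):
        if ask_permission():
            return True
        if attempt < max_attempts - 1:
            # The delay doubles each time: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt)
            # Small random jitter (an assumption) so that XOs which were
            # refused together do not all retry at the same moment.
            sleep(delay + random.uniform(0, delay * 0.1))
    return False
```

Passing `sleep` as a parameter keeps the function easy to test without actually waiting.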

Once running, the backup process saves a copy of the full XO Journal. It uses rsync over SSH to the XS. The previous backup attempt is preserved on the XS, so only incremental updates are transferred, and an incomplete run can be completed by the next attempt. Once an rsync run completes successfully, the client runs a second, separate run to "touch" a flag-completed file on the XS, marking success.
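As a rough sketch of the two-run pattern (data first, completion flag second), something like the following could be used. The remote user, the `datastore-current` target and the flag file are hypothetical names chosen for illustration; the real ds-backup script may use different paths and rsync options.

```python
import subprocess


def rsync_command(source, remote_user, remote_host, remote_path):
    """Build an rsync-over-SSH command (all arguments are hypothetical)."""
    return [
        "rsync",
        "-a",          # archive mode: recurse, keep times and permissions
        "--partial",   # keep partially transferred files for the next attempt
        "-e", "ssh",   # use SSH as the transport
        source,
        f"{remote_user}@{remote_host}:{remote_path}",
    ]


def run_backup(journal_dir, flag_file, remote_user, remote_host="schoolserver"):
    """Transfer the Journal, then send the flag file in a separate run."""
    # First run: the data itself. check=True raises if rsync exits non-zero,
    # so the flag is never sent after a failed or interrupted transfer.
    subprocess.run(
        rsync_command(journal_dir, remote_user, remote_host, "datastore-current"),
        check=True,
    )
    # Second run: the completion flag that marks success on the XS.
    subprocess.run(
        rsync_command(flag_file, remote_user, remote_host, "."),
        check=True,
    )
```

Keeping the flag in its own run means the XS only sees it once the data transfer has exited cleanly.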

See instructions below on how to restore files from the XS to the XO.

XS side

We have 3 processes

Traffic control

A simple mod_python script that checks if the XO is registered, and provides basic "traffic control" to keep the load from the backup processes from swamping the XS (and potentially the network).

Backup-complete script

When the client transfers the "flag completed" file to the XS, incrond (an inotify monitor) fires off an execution of `postprocess.py`, which makes a hardlinked copy of the just-transferred directory. It also updates the "datastore-latest" symlink to point to the latest snapshot.
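The hardlinked copy and the symlink update could look roughly like this. It is a sketch of the general technique, not the contents of `postprocess.py`; the function names and the temporary-link approach are assumptions.

```python
import os


def hardlink_snapshot(src_dir, snapshot_dir):
    """Recreate src_dir's tree under snapshot_dir, hardlinking every file."""
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(snapshot_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            # A hardlink shares the data blocks with the original file,
            # so the snapshot needs almost no extra disk space.
            os.link(os.path.join(root, name), os.path.join(target_root, name))


def point_latest_at(snapshot_dir, link_path):
    """Repoint the 'latest' symlink at snapshot_dir without a gap."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(snapshot_dir, tmp_link)
    # Renaming over the old symlink replaces it in a single step on POSIX.
    os.replace(tmp_link, link_path)
```

Hardlinks only work within one filesystem, so the snapshot directory has to live on the same disk as the transferred directory.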

Daily cleanup

Executed on cron, it

  • Enforces a per-user "soft" quota. The quota for each XO is set by taking the size of the disk that holds the /library directory, and assuming that 70% is for backups. Then that space is divided by number of registered XOs on the XS. Once over the quota, the oldest snapshots for the user are deleted.
  • For snapshots over a given age (1 month?) it only keeps one per month - removing intermediary snapshots for that user
  • It attempts to hardlink copies across users
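
The quota rule in the first bullet can be written out as a small calculation. As an illustrative example (the numbers are made up), a 160 GB disk with 50 registered XOs gives 0.70 × 160 GB / 50 = 2.24 GB per XO. The sketch below assumes the per-snapshot sizes are already known and that the newest snapshot is never deleted; neither detail comes from the actual cleanup script.

```python
import shutil


def per_xo_quota(library_path, registered_xos, backup_share=0.70):
    """Bytes each XO may use: 70% of the /library disk, split evenly."""
    if registered_xos < 1:
        raise ValueError("need at least one registered XO")
    total_bytes = shutil.disk_usage(library_path).total
    return int(total_bytes * backup_share) // registered_xos


def snapshots_to_delete(snapshots, quota_bytes):
    """Pick the oldest snapshots to drop until the user fits the quota.

    snapshots is a list of (name, size_bytes) ordered oldest first, where
    size_bytes is the space that deleting the snapshot would free.
    """
    used = sum(size for _name, size in snapshots)
    doomed = []
    # Assumption: the newest snapshot is always kept, even when over quota.
    for name, size in snapshots[:-1]:
        if used <= quota_bytes:
            break
        doomed.append(name)
        used -= size
    return doomed
```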

Important note: You must ensure that the XS has a sufficient quota to make a backup of the full Journal from each XO. If the XS cannot store one full backup of each XO, it will not back up. A rule of thumb is to ensure that the XS has 2 GB available for each XO that will be backed up.

Test plans and user walkthrough

Testing the backup run

  1. Start with an unregistered XO, register it with the XS - either over an Active Antenna mesh connection, or a regular AP wifi connection.
  2. After registration, you need to restart the XO (this is part of the XO side of the registration process as of build 708 / joyride 2121, might not be needed in later builds).
  3. Create some documents on the XO - or have them created before registration.
  4. Wait until the backup run happens - it will be triggered once a day. How to recognize that it has happened?
    • On the XO, run `stat /home/olpc/.sugar/default/ds-backup-done` and look for the 'modified' time, which shows the last time it ran successfully. Keep in mind that the XO clock is set to GMT, and might be inaccurate. Try `TZ=America/New_York stat /home/olpc/.sugar/default/ds-backup-done` to see it in local time.
    • To run the script asap, rm /home/olpc/.sugar/default/ds-backup-done -- the script should start within 30 minutes. Note that the script has a large random delay of up to 30 minutes!
      • If you don't want the first "within 30 minutes" wait, just run /usr/bin/ds-backup.sh from a Terminal activity (as the olpc user). You will still have to face the large random delay of up to 30 minutes.
    • To log the output of the script that runs the backup or to attempt to run it manually, see the file /etc/cron.d/ds-backup . Note that the script has a large random delay of up to 30 minutes!
    • On the XS, a successful registration will have created a directory `/library/users/<Serial Number>` - and each successful backup run creates a new directory under `/library/users/<Serial Number>/datastore`. The directories have a datestamp, and when the backup run completes successfully, a symlink is updated to point to the latest one (called "datastore-latest").
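
The `stat` check on the XO can also be scripted. The following is a small sketch that reads the flag file's modification time and prints it in the system's local time zone; only the file path comes from the steps above, the rest is illustrative.

```python
import os
from datetime import datetime, timezone

# Path taken from the testing steps above.
FLAG_FILE = "/home/olpc/.sugar/default/ds-backup-done"


def last_backup_time(flag_path=FLAG_FILE):
    """Return the flag file's modification time as an aware UTC datetime,
    or None if no backup has completed yet."""
    try:
        mtime = os.stat(flag_path).st_mtime
    except FileNotFoundError:
        return None
    return datetime.fromtimestamp(mtime, tz=timezone.utc)


if __name__ == "__main__":
    when = last_backup_time()
    if when is None:
        print("no successful backup recorded")
    else:
        # astimezone() with no argument converts to the system's local zone.
        print("last successful backup:", when.astimezone().isoformat())
```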

Restore a single document - XS 0.5.x

Note: this describes a temporary facility which allows users to download the contents of the backup of any user (there is no authentication).

  1. On the XO, ensure you are hooked up to the School Server network, and open Browse
  2. Follow the link to the Schoolserver
  3. Add "ds-restore" to the URL in the url bar, so that it reads http://schoolserver/ds-restore
  4. You should see a listing of backup dates - pick a date
  5. You will see a listing of the documents available for that date, pick a document
  6. Clicking on that document will download it, and it will appear in the Journal. The restored Journal entry will be placed at the top of the Journal with a new date and time.

Restore a single document with Moodle

Note: this describes the workflow for XS 0.6 and newer, using Browse-101 or newer.

  1. On the XO, ensure you are hooked up to the School Server network, and open Browse
  2. Follow the link to the Schoolserver
  3. In the top-right corner you will see "You are logged in as 'Nickname'" - follow the link on the nickname.
  4. The page shows several tabs - click on the "Backup" tab (may later be renamed to 'WebJournal', see WebJournal_Project)
  5. You should see
    • a message indicating when the latest backup completed
    • a link to older backups
    • a listing of the Journal entries backed up, each with an indication of when they were created/edited
  6. Optional: If choosing from the older backups you will see a listing of backup dates - pick a date
  7. You will see a listing of the documents available for that date, pick a document
  8. Clicking on that document will download it, and it will appear in the Journal (and it may auto-open - need to check that ;-) )

See Also

XS Blueprints:User account aliasing

TODOs and future work

  • Done: Automagic authentication & Moodle integration
  • In order to be simple, the initial implementation does not cover a "complete restore" scenario, which requires more work.
    • on the Sugar UI to trigger, display progress and manage (cancel/retry) a "complete restore"
    • on the user aliasing that needs to take place in the "replaced laptop" scenario
  • We need to test, time and tune the traffic control & backoff settings.
  • Done: Must confine rsync-over-ssh with a chroot jail or tight SELinux policies
  • To support better the "backup as extra storage" model
    • Allow users to "pin" a resource to avoid it being deleted
    • Teach the Journal to browse & request the backups transparently (WebDAV-based browsing?)
  • Extend into the WebJournal_Project concept that Robson Mendonça is working on.
  • Some cron.d files could be swapped out and in depending on our power situation. This can probably save some juice...

This feature was requested by OLPC Peru and is part of the School server subsystem.