XS backup restore: Difference between revisions

From OLPC
(New page: <pre> XO wants to initiate a backup. In the following text, all timestamps are integers representing seconds elapsed since the UNIX epoch. XO side ------- 1. Issue a HTTP GET to XS wi...)
 
(wikify)

Revision as of 19:09, 17 October 2007

XO initiating a backup

In the following text, all timestamps are integers representing seconds elapsed since the UNIX epoch.


XO side

1. Issue an HTTP GET to the XS with path
  /backup/<protocol version>/last/<this_XO_serial_number>
  
  <protocol version> is the integer representing the latest
  backup protocol version supported by this XO. In protocol version 1,
  a successful reply consists of two comma-separated integers:
  
      timestamp -- timestamp of latest backed up item for this user                                    
                   or 0 if there are no previous backups

      nonce -- a random 64-bit integer

  If the sent protocol version is not supported by the school server,
  it will return a 404 Not Found error whose body contains only
  a comma-separated list of integers representing the backup protocol
  versions supported by this school server.

  If this school server refuses to provide backup service for this XO,
  it will return a 403 forbidden error.

2. If the request in step 1 succeeded, go to step 3. Otherwise, if none
  of the backup protocol versions on the XO (multiple may be
  present) appear in the version list returned with the 404 error, abort
  until the next scheduled backup time (we cannot back up to this XS). If
  a version was returned that also exists locally, go back to step 1
  and use that protocol version.

3. Let to_backup_all be the collection of all items currently in the
  XO's datastore. If the timestamp returned in step 1 is 0, let to_backup
  be the same collection.
  
  If not 0, let to_backup be the collection of all items whose
  timestamp is greater than or equal to the returned timestamp.

4. Write out a plaintext index of all items in to_backup, where the
  index format is defined by the backup protocol version selected in
  step 2. For version 1, I propose a list of lines where the first
  line is a single integer stating the backup protocol version, and
  each following line is a JSON-encoded list describing a single
  entry in to_backup (metadata and filename). This list may include
  references to other files (e.g. thumbnails) as part of the metadata.

  Move this index to a file called backup.idx in the datastore
  directory, overwriting any existing file of that name.

5. Write out a plaintext index of all items in to_backup_all where
  the index format is defined by the backup protocol version selected
  in step 2. For version 1, I propose a list of lines where the first
  line is a single integer stating the backup protocol version, and
  each following line is just a UUID of each object in the list
  (meaning currently on the XO).

  Move this index to a file called backup-state.idx in the datastore
  directory, overwriting any existing file of that name.
  
6. For every item in to_backup, also write a line to a text file
  called backup-files.idx in the datastore directory, overwriting
  the old file.  Each output line contains only the full path to the
  binary data for each object. For objects that have additional binary
  files associated (such as thumbnails), output an additional line
  per file.

7. Run rsync, telling it to read the list of input files
  from backup-files.idx and write to a directory called backup-new/
  in the user's home directory on the school server.
 
  Check the exit value from rsync. If non-zero, retry step 7 up to 3 times.
  If still non-zero, abort until next backup. Otherwise, proceed to step 8.
 
8. Issue a GET request to the XS, with path /backup/<protocol version>/new/<XO_serial_number>.
  For protocol version 1, include a Backup-Auth header whose content is the hex digest of
  SHA-1(<nonce>+<XO_UUID>), where <nonce> is the value received in
  step 1 and <XO_UUID> is this XO's UUID.
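
  The XO-side steps above can be sketched in Python roughly as follows.
  This is a minimal sketch under stated assumptions, not the actual OLPC
  implementation: the item dictionaries (keys uuid, timestamp, metadata,
  filename), every function name, and the remote target string are
  hypothetical. The '+' in SHA-1(<nonce>+<XO_UUID>) is assumed to mean
  concatenating the decimal nonce with the UUID string, and choosing the
  highest common version in step 2 is just one choice the text leaves
  open. The only rsync options used are -a and --files-from.

```python
import hashlib
import json


def last_path(version, serial):
    """Step 1: path of the request that asks for the last backup state."""
    return f"/backup/{version}/last/{serial}"


def parse_last_reply(body):
    """Step 1: a successful reply is 'timestamp,nonce' (two integers)."""
    timestamp, nonce = body.strip().split(",")
    return int(timestamp), int(nonce)


def pick_version(local_versions, body_404):
    """Step 2: pick a version both sides support, or None to abort.

    Assumption: when several versions match, the highest one is used.
    """
    offered = {int(v) for v in body_404.split(",") if v.strip()}
    common = offered & set(local_versions)
    return max(common) if common else None


def select_items(items, last_timestamp):
    """Step 3: everything on first backup, else items at or after the timestamp."""
    if last_timestamp == 0:
        return list(items)
    return [i for i in items if i["timestamp"] >= last_timestamp]


def build_backup_idx(version, to_backup):
    """Step 4: version line, then one JSON list (metadata, filename) per item."""
    lines = [str(version)]
    lines += [json.dumps([i["metadata"], i["filename"]]) for i in to_backup]
    return "\n".join(lines) + "\n"


def build_state_idx(version, to_backup_all):
    """Step 5: version line, then one UUID per object currently on the XO."""
    return "\n".join([str(version)] + [i["uuid"] for i in to_backup_all]) + "\n"


def rsync_command(index_path, remote):
    """Step 7: rsync reading its file list from backup-files.idx.

    The index holds full paths, so the source root is '/'.
    `remote` is a hypothetical target such as 'user@xs:backup-new/'.
    """
    return ["rsync", "-a", "--files-from=" + index_path, "/", remote]


def run_with_retries(run, retries=3):
    """Step 7: call run() once, then up to `retries` more times on failure."""
    for _ in range(1 + retries):
        if run() == 0:
            return True
    return False


def backup_auth(nonce, xo_uuid):
    """Step 8: hex digest of SHA-1 over the nonce followed by the UUID."""
    return hashlib.sha1(f"{nonce}{xo_uuid}".encode("ascii")).hexdigest()
```

  In use, run_with_retries would be given something like
  lambda: subprocess.call(rsync_command(...)), and the value returned by
  backup_auth would be sent as the Backup-Auth header of the /new request.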
   

XS side

request for last

On the school server, when getting a request for /backup/<protocol version>/last/<SN>:

1. Check if we support the protocol version. If not, return 404 and a list
  of supported versions. Otherwise, proceed.

protocol version 1

2. Check if we know this machine (can find it in our registration DB on
  the XS). If not, return 403. We will not offer it backup service.
  Otherwise, proceed.
  
3. Check if backups for this machine exist. In protocol version 1, if
  backups don't exist, let timestamp be 0. Otherwise, let timestamp be
  the timestamp of the last backed-up object for this machine.

  (I deliberately don't specify where the school server stores the timestamp,
  as it might use mysql/sqlite/plain files for this, and the XO doesn't 
  and must not care.)

4. If backups for this machine don't exist yet, let nonce be
  0. Otherwise, find the file nonce in the backup hierarchy for
  this XO (e.g. /backups/<SN>/nonce) and load its contents into the
  variable nonce.

5. Return comma-separated timestamp and nonce in the body of a 200 OK
  response.
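
  The handling of a /last request can be sketched as a single function.
  This is only an illustration under assumptions: the function name and
  its arguments are hypothetical, with plain Python sets and a dictionary
  standing in for the registration DB, the backup DB and the nonce file,
  whose real storage the text deliberately leaves unspecified.

```python
def handle_last(version, serial, supported, registered, backups):
    """Return (status, body) for GET /backup/<version>/last/<serial>.

    supported  -- set of protocol versions this server speaks
    registered -- set of serial numbers found in the registration DB
    backups    -- dict: serial -> {"last_timestamp": int, "nonce": int}
                  (hypothetical stand-in for the backup DB and nonce file)
    """
    # Step 1: unsupported version -> 404 with the list of supported ones.
    if version not in supported:
        return 404, ",".join(str(v) for v in sorted(supported))

    # Step 2: unknown machine -> 403, no backup service offered.
    if serial not in registered:
        return 403, ""

    # Steps 3 and 4: no backups yet means timestamp 0 and nonce 0.
    state = backups.get(serial)
    if state is None:
        timestamp, nonce = 0, 0
    else:
        timestamp, nonce = state["last_timestamp"], state["nonce"]

    # Step 5: comma-separated timestamp and nonce in a 200 OK body.
    return 200, f"{timestamp},{nonce}"
```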
  

request for new

On the school server, when getting a request for /backup/<protocol version>/new/<SN>:

1. Check if we support the protocol version. If not, return 404 and a list
  of supported versions. Otherwise, proceed.

protocol version 1

2. If no 'Backup-Auth' header is present, return 403, otherwise
  proceed.
  
3. Load the contents of the nonce file from the backup hierarchy for
  this XO (e.g. /backups/<SN>/nonce) into the nonce variable. If there
  is no nonce file, use '0' for the nonce variable.

4. Find the XO's UUID in the local database and load it into the XO_UUID
  variable. Verify that the contents of the Backup-Auth header
  match exactly the hex digest of SHA-1(<nonce>+<XO_UUID>). If not,
  return 403; otherwise return an empty (no body) 200 OK response to the
  client and proceed to the next step.

  (Note: the nonce circus is required to keep a malicious actor
  from inhibiting all backups on the network by watching for /last
  GETs, then issuing /new GETs 5 seconds later for the same XO. As
  the backup won't have completed, getting an updater running on the
  server would invalidate the backup, as will be seen in the following
  steps.)
   
5. Spawn an updater process in the background that does this:

  5.1. Issue a call to a setuid helper command that makes the
       'backup-new' folder in the XO's home directory (on the server)
       writable by the updater UID.
  
  5.2. Check if a file exists in the XO's home directory, within the
       dir backup-new, called backup.idx.processing. If the file
       does not exist, go to step 5.3.
       
       If its timestamp is NOT older than 10 minutes, exit the
       updater.  (We don't allow users to force us to do index
       updates for backups more frequently than once in 10 minutes.)
        
       If the timestamp is older than 10 minutes:

       * Check if a file called backup.idx.processing.pid exists
         AND is owned by us. If so, read its contents -- it contains
         a PID of the updater that tried to deal with the new backup
         -- and issue a SIGKILL to that PID.

       * Go to step 5.4.
      
  5.3. Move backup.idx to backup.idx.processing and write our own
       PID to backup.idx.processing.pid. If the move
       fails (because backup.idx doesn't exist), go to last step.
       Check if 'backup-state.idx' exists.  If not, go to last step.

  5.4. Read backup.idx.processing line by line. The first line is a
       single integer giving the backup protocol version. If this
       updater doesn't support this version, the client sent a backup
       even though we told it not to. In that case, go to last step.

       For every following line, check that the object filename it
       references exists in the backup-new folder. If it exists,
       move this file to the server's real backup hierarchy,
       e.g. /backups/<SN>/ and add a record to the server's backup DB
       backend (whatever it is) for this object.  If the file doesn't
       exist, move to next line.

  5.5. Move backup-state.idx to server backup hierarchy, e.g.
       /backups/<SN>/backup-state.idx. Generate a 64-bit nonce
       and write it out to /backups/<SN>/nonce.

  5.6. Delete everything in backup-new and exit the updater.
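
The Backup-Auth check (steps 2 to 4) and the age test in step 5.2 can be sketched like this. It is a rough illustration under assumptions: the spec only says SHA1(<nonce>+<XO_UUID>), so reading "+" as string concatenation and encoding the result as a lowercase hex digest are guesses, and all function names are hypothetical.

```python
import hashlib
import hmac

LOCK_MAX_AGE = 10 * 60  # seconds; the 10-minute limit from step 5.2

def expected_backup_auth(nonce, xo_uuid):
    """SHA1(<nonce>+<XO_UUID>); hex digest of the concatenation (assumed)."""
    return hashlib.sha1((nonce + xo_uuid).encode("utf-8")).hexdigest()

def new_request_status(backup_auth, nonce, xo_uuid):
    """Steps 2-4: 403 if the header is missing or wrong, otherwise 200."""
    if backup_auth is None:
        return 403
    expected = expected_backup_auth(nonce, xo_uuid)
    # Constant-time comparison of the header against the expected digest.
    if not hmac.compare_digest(backup_auth.encode("utf-8"),
                               expected.encode("utf-8")):
        return 403
    return 200

def lock_action(lock_mtime, now):
    """Step 5.2: decide what to do about backup.idx.processing.

    lock_mtime is the file's timestamp, or None if the file doesn't exist.
    """
    if lock_mtime is None:
        return "claim"      # no lock file: go to step 5.3
    if now - lock_mtime <= LOCK_MAX_AGE:
        return "exit"       # NOT older than 10 minutes: exit the updater
    return "take_over"      # stale: kill the old updater, go to step 5.4
```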


XO initiates a restore

XO side

1. Issue an HTTP GET to the XS with path
  /backup/<protocol version>/restore/<this_XO_serial_number>

-- Everything below describes protocol version 1:

  The response is 0 or a single absolute path on the XS, pointing to
  the location of this XO's backup files in the backup hierarchy. If
  the response is 0, abort and report to user; there are no backups
  to restore. Otherwise store the path variable for future use.

  If the request returns a 500, abort and report to user that they
  must pick out restore files individually from the web interface.

  If the request returns a 503, wait 1 minute, then retry step 1,
  otherwise proceed.

2. Let variable index_path be the concatenation of the path variable
  from step 1 and the string restore.idx. Rsync the file whose path
  is index_path from the XS to the XO.  This file is a set of lines
  formatted like the contents of a backup.idx file -- produced
  exactly like in step 4 of the XO-side backup.  In other words, the
  first line repeats the protocol version, and every next line
  describes a single DS object. If the rsync fails, retry 3 times;
  if still failing, abort restore and report to user.

3. For every item in this list, parse out any paths to files,
  and write each one (e.g. one for the binary object, one for the
  thumbnail) to a local file called restore-files.idx, one per line.

  Note that the paths contained in 'restore.idx' that we
  received in step 2 are absolute paths on the schoolserver,
  e.g. /backups/<SN>/<filename>, and those paths MUST be preserved
  when writing to restore-files.idx.

4. Run a rsync on the XO, going from the schoolserver to the XO, and
  pass it restore-files.idx as the list of files to rsync.

5. Check the rsync exit value. If non-zero, retry 3 times. If still
  non-zero, abort and report failure to the user.

6. Go back through the list received in step 2 line by line.  For
  every file path in the current line (there might be several for
  e.g. binary object, thumbnail, etc), strip everything except the
  filename -- remove the directory components. Verify that the files
  exist locally on the XO.

  If they don't exist, rsync didn't get all the files back, but there
  should have been some (because we didn't get 0 in step 1) AND rsync
  thinks it succeeded (because of step 5). Abort restore and report to
  user that something is wrong.

  If the files exist, issue a request to the DS to create the object
  based on the metadata in the line, and pass in stripped file paths
  for the contents/thumbnail.

  (Note: if the DS does not support setting creation timestamps or
  thumbnails through the present API, another function might have to
  be added specifically for the restore system to use, where such
  functions are allowed.)

7. If the last line in the list received in step 2 is processed and
  stored in the DS, we have succeeded with the restore. Inform
  user. Eat some ice cream. Do the Macarena.

 

XS side

request for restore

On the school server, when getting a request for /backup/<protocol version>/restore/<SN>:

1. Check if we support the protocol version. If not, return 404 and a list
  of supported versions. Otherwise, proceed.

protocol version 1

2. Check if backups for this machine exist. If not, return 200 OK whose
  only body contents is 0. Otherwise, proceed.

3. Check if a file called restore.idx exists in the backup hierarchy
  for the XO. If so, return absolute path to this XO's files in the
  server backup hierarchy (e.g. /backups/<SN>/) as sole body of a 200
  OK response. If it doesn't exist, proceed.
   
4. Check if a file called restore-state.idx in the backup hierarchy
  for the XO exists. If not, return error 500. For some reason we don't
  have a state file for this machine; this shouldn't happen, but it means
  the user has to pick out objects to restore individually from the
  web interface.
   
5. Return 503 service unavailable, and in the background, spawn
  a restore process that does the following:

    5.1. Check if a file called restore-state.processing.idx in the
         backup hierarchy for this XO exists. If not, proceed to next
         step. If it exists and its timestamp is younger than 10
         minutes, exit. If its timestamp is older than 10 minutes, we
         tried to prepare a restore list for this machine already and
         somehow failed (e.g. database timeouts). Check if
         restore-state.processing.idx.pid exists and is owned by us;
         if so, load its contents and send SIGKILL to the PID. Then
         move to step 5.3.
     
    5.2. Move restore-state.idx in the XO's backup hierarchy to
         restore-state.processing.idx

    5.3. Write our own PID to restore-state.processing.idx.pid.

    5.4. To a temporary file, write a line containing the backup
         protocol version.

    5.5. For each line in restore-state.processing.idx
         (representing a UUID), query all the relevant metadata from
         the XS store and write it, one JSON dictionary-encoded line
         per object, to the temporary file. Paths of any referenced
         files (binary objects, thumbnails) must be absolute paths on
         the XS. If any queries fail, retry with a timeout, and if
         failure continues, exit the restore process.

    5.6. When finished, move temporary file to restore.idx in the
         backup hierarchy for this XO. Unlink
         restore-state.processing.idx.
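
Steps 5.4 to 5.6 can be sketched as below. This is an illustration under assumptions, not the real restore process: lookup stands in for whatever query the XS store offers and is hypothetical, the file paths are passed in by the caller, and the retry-with-timeout behaviour from step 5.5 is omitted.

```python
import json
import os
import tempfile

def build_restore_index(state_path, index_path, lookup, version=1):
    """Write restore.idx from the in-progress state file.

    lookup(uuid) is a hypothetical callable returning a metadata dict
    whose file paths are absolute paths on the XS.
    """
    # Create the temporary file next to the target so the final move
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(index_path))
    with os.fdopen(fd, "w") as tmp, open(state_path) as state:
        tmp.write("%d\n" % version)              # step 5.4
        for line in state:                       # step 5.5
            uuid = line.strip()
            if uuid:
                tmp.write(json.dumps(lookup(uuid)) + "\n")
    os.replace(tmp_path, index_path)             # step 5.6: move into place
    os.unlink(state_path)                        # step 5.6: unlink state file
```

Moving the finished file into place last means the check in step 3 never sees a half-written restore.idx.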