Rainbow/DataStore Access
This is a brainstorming page discussing how to implement the two basic access modes of the DataStore: read-only and write.
- To prevent excessive copying of files, the datastore should have a way to give a specific instance of an Activity access to a specific file in the store.
- Since we want to avoid API changes to the DS, we could tell activity authors that they need to do proper locking when writing, e.g. using POSIX fcntl(F_SETLK) or lockf(). - MS, on behalf of JG
- All groups and users are normal Unix entries in /etc/group and /etc/passwd. Because we are going to be writing to these files a lot, we need a locking mechanism!
- Locking is not needed if we reserve a range of uids and gids for sugar/rainbow's use and manage them atomically via a spool directory. - MS
- All installed activities get their own group called 'ActivityName'. This group will be used for the file permissions on per-activity settings (this ignores name clashes...)
- I desperately want to avoid race conditions when grabbing names. I'll explain one mechanism for accomplishing this below. -MS
- All activity instances get their own UID and GID. These will be between 10000 and 20000 and should, for simplicity, always MATCH. Unix requires names for users and groups; let's call them 'ActivityNNN', where Activity is the first 3 letters of the ActivityName and NNN is the UID number.
- What do the activities use their unique gid for? - MS
- That is what can make the files private to the instance. - PS
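The advisory locking suggested above for activity authors could look roughly like the following. This is only a sketch: `locked_append` is a hypothetical helper name, and it uses Python's `fcntl.lockf`, which wraps the POSIX fcntl() record-locking calls. The lock is advisory, so it only helps if every writer takes it.

```python
import fcntl
import os

def locked_append(path, data):
    """Append bytes to path while holding an exclusive advisory lock.

    Hypothetical helper for illustration; not part of any DataStore API.
    """
    with open(path, "ab") as f:
        # Blocks until no other cooperating process holds the lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```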
Atomically update /etc/group
- Read /etc/group
- Write the new version as /etc/group.tmp
- Copy /etc/group to /etc/group.old
- Move /etc/group.tmp to /etc/group (a rename, which is atomic on the same filesystem)
- A better mechanism is just to make /etc/group a symlink, then to atomically swing the symlink. - MS
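Both variants above can be sketched in a few lines. The function names `atomic_rewrite` and `swing_symlink` are made up for this illustration, and the sketch assumes the temporary file lives on the same filesystem as the target, since that is what makes the rename atomic.

```python
import os
import shutil

def atomic_rewrite(path, transform):
    """Follow the read / write .tmp / copy to .old / rename steps."""
    with open(path, "r") as f:
        old = f.read()
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(transform(old))
        f.flush()
        os.fsync(f.fileno())
    shutil.copymode(path, tmp)          # keep the original permission bits
    shutil.copy2(path, path + ".old")   # backup of the previous version
    os.rename(tmp, path)                # atomic replacement of the old file

def swing_symlink(link, new_target):
    """Repoint an existing symlink by renaming a fresh symlink over it."""
    tmp = link + ".new"
    os.symlink(new_target, tmp)
    os.rename(tmp, link)                # readers see old or new, never neither
```

Neither function protects against two writers running at once; that still needs locking or the reservation scheme discussed further down.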
Read-Only
Example: 'Write' needs access to a document during load:
- The Sugar shell creates UID/GID 10001 for the instance, and updates 'WriteGroup', adding Wri10001 to the WriteGroup
- The DataStore creates the tree:
  user.group     permissions  file
  olpc.olpc      755          /ds
  olpc.Wri10001  750          /ds/<instance-uid>/
  hardlink: ln /home/olpc/..../file-in-ds.ext /ds/<instance-uid>/somefile.doc
  olpc.olpc      644          /ds/<instance-uid>/somefile.doc
- Sugar Shell invokes read_file()
- When read_file() returns, the DataStore sets permission 600 on somefile.doc and/or unlinks it.
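The gate-directory steps above might be sketched as follows. `expose_read_only` and `revoke` are hypothetical names, and the ownership changes (olpc.Wri10001 on the gate directory) are left out because chown needs privileges; only the directory, hardlink, and mode handling are shown.

```python
import os

def expose_read_only(ds_root, instance_id, stored_file, visible_name):
    """Create the per-instance gate dir and hardlink the stored file into it."""
    gate = os.path.join(ds_root, str(instance_id))
    os.makedirs(gate, exist_ok=True)
    os.chmod(gate, 0o750)               # only owner and instance group may enter
    link = os.path.join(gate, visible_name)
    os.link(stored_file, link)          # hardlink: no copy, same filesystem only
    # Mode bits live on the inode, so this also affects the stored file's name.
    os.chmod(link, 0o644)
    return link

def revoke(link):
    """Withdraw access once read_file() has returned."""
    os.chmod(link, 0o600)               # again changes the shared inode's mode
    os.unlink(link)                     # removes only this name, not the data
```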
Problems to solve
- Cleanup: this needs to happen periodically and/or at startup (crashes happen...)
- Cleanup can be done by a cleaner-process that is forked off every few activity launches. - MS
- Two activities need read-only access to the same file, at the same time
- This can be handled by controlling access at the gate-dir and providing read access via the 'other' bits. - MS
- The datastore gets a write request while another activity has read-only access to the same file.
- This is a bit ugly. A correct solution is to manually break the hardlink by having the DS copy the file to a new name, deleting the old link, then moving the copy into position. - MS
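Breaking the hardlink as described in the last point could look like this. `break_hardlink` is a hypothetical name, and the `.copy` suffix is an arbitrary choice for the sketch.

```python
import os
import shutil

def break_hardlink(path):
    """Give path its own inode so later writes do not reach other links."""
    tmp = path + ".copy"
    shutil.copy2(path, tmp)   # copy the contents to a new name (new inode)
    os.unlink(path)           # delete the old link; other names keep the data
    os.rename(tmp, path)      # move the copy into position
```

On POSIX, the rename alone would replace the old name in one step; the explicit unlink is kept here only to mirror the steps as described, and it leaves a brief window in which the name does not exist.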
UID / GID Allocation Design
Several premises about how to use the filesystem.
- Directories are finite functions.
- Files are name-value bindings.
- Atomic updates are achieved by swinging symlinks, making hardlinks, renaming files, or opening files with O_EXCL | O_CREAT.
Say that we have been asked to start an instance of Paint-3.
This instance needs a restricted UID, an instance-specific GID, an activity-version specific GID, and perhaps some GIDs to access devices.
We need to acquire all these resources atomically with regard to other activity launches. Since we will be allocating these resources from reserved ranges, we are not overly concerned about contention from other sources (though we will endeavor to detect it).
We run some risk of racing with other processes that are manipulating /etc/passwd and /etc/group, but I don't know what we can do about this given the fundamentally non-composable access patterns that those files and the tools for manipulating them require.
Primitive Operations
We need two primitive operations:
- don't-care-reservation - we need to atomically reserve a value from a finite set, but we don't care what value is chosen - we only care that no one else will reserve the same value.
- This can be implemented by repeatedly attempting to open files with O_CREAT | O_EXCL in a reservation-dir for as long as we believe that an unreserved value exists. Efficient implementations exist for choosing small integers.
- first-customer-chooses - we need to define a value for a well-known name but don't care what value is chosen - we only care that everyone agrees on the choice that was made.
- This can be implemented by having each actor atomically reserve a value they like and then having them attempt to make a symlink at a well-known name in a reservation dir pointing to their chosen value-reservation. Actors who fail to write their symlink are guaranteed to be able to learn the value of the choice by reading the symlink that prevented their write.
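A minimal sketch of don't-care-reservation, assuming one empty file per reserved integer in the reservation dir. `reserve_any` is a hypothetical name, and the linear scan is the naive approach rather than one of the efficient implementations mentioned above.

```python
import os

def reserve_any(reservation_dir, lo, hi):
    """Reserve some integer in [lo, hi]; return it, or None if all are taken."""
    for value in range(lo, hi + 1):
        path = os.path.join(reservation_dir, str(value))
        try:
            # O_CREAT | O_EXCL fails if the file exists, so exactly one
            # caller can ever succeed in creating a given name.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            continue          # someone else holds this value; try the next
        os.close(fd)
        return value
    return None
```

Releasing a reservation would then presumably be a matter of unlinking the file, though the text above does not spell that out.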
Since no one cares what uid or what instance-specific gid we pick, don't-care-reservation is adequate to pick these values.
We can pick the activity-version specific and device-specific gids with the first-customer-chooses pattern.
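The first-customer-chooses pattern can be sketched as below. `first_customer_chooses` is a hypothetical name, and the caller is assumed to have already reserved the value whose path it passes in.

```python
import os

def first_customer_chooses(binding_dir, name, my_reservation_path):
    """Bind a well-known name to a value; return the value everyone agrees on."""
    link = os.path.join(binding_dir, name)
    try:
        # symlink() fails if the name already exists, so only the first
        # caller's choice is ever recorded.
        os.symlink(my_reservation_path, link)
    except FileExistsError:
        pass                  # we lost the race; adopt the winner's choice
    return os.readlink(link)
```

A caller whose returned value differs from the path it proposed has lost the race and would probably want to release its own unused reservation; that step is an assumption and is not shown.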
Once these values are picked, reverse indices can be created for client programs that would benefit from increased ease of access.
Filesystem Design and Resource Usage
UIDs SHALL be reserved in the range [10000, 60000]. GIDs SHALL be reserved in the range [10000, 60000].
- The uid reservation dir SHALL be /var/spool/reservation/uid.
- The gid reservation dir SHALL be /var/spool/reservation/gid.
- The activity-version specific gid binding dir SHALL be /var/spool/binding/activity_version_gid.
- The device-specific gid binding dir SHALL be /var/spool/binding/device_gid.