Rainbow/DataStore Access
Latest revision as of 05:57, 1 March 2008
This page is a brainstorming page discussing how to implement the two basic access modes of the DataStore: read-only and write.
- To prevent excessive copying of files, the datastore should have a way to give a specific instance of an Activity access to a specific file in the store.
- Since we want to avoid API changes to the DS, we could tell activity authors that they need to do proper locking when writing, using e.g. POSIX fcntl(F_SETLK) and lockf(). - MS, on behalf of JG
- Since most activity developers will be writing in Python, we should point them to this discussion of file locking in Python (http://www.voidspace.org.uk/python/pathutils.html#file-locking). It includes a way to lock read-only files.
- All users and groups are normal Unix /etc/passwd and /etc/group entries. Because we are going to be writing to these files a lot, we need a locking mechanism!
- Locking is not needed if we reserve a range of uids and gids for sugar/rainbow's use and manage them atomically via a spool directory. - MS
- If you grab a pty pair (/dev/ptmx and /dev/pts/*), which is useful for logging anyway, you have a unique ID. Add an offset of 10000 to your pty number and you have a UID. PID allocation can also be used, with the PID coming from a helper process that does not change UID. - AC
- All installed activities get their own group called 'ActivityName', which will be used for per-activity file permissions (this ignores name clashes...)
- I desperately want to avoid race conditions when grabbing names. I'll explain one mechanism for accomplishing this below. - MS
- All activity instances get their own UID and GID. These will be between 10000 and 20000 and should, for simplicity, always MATCH. Unix requires names for users and groups; let's call them 'ActivityNNN', where Activity is the first 3 letters of the ActivityName and NNN is the UID number.
- What do the activities use their unique gid for? - MS
- That is what can make the files private to the instance. - PS
- It's perfectly fine to use a UID number that has no name. If there is no meaningful name, then just don't bother. The kernel knows nothing of these names, and nothing else cares all that much. - AC
- I as a developer care - BF
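The advisory locking that MS suggests activity authors use when writing could look roughly like the following in Python. This is a minimal sketch, not an agreed DataStore convention: the helper name append_locked is hypothetical, and it relies on Python's standard fcntl.lockf(), which is built on POSIX fcntl() record locks.

```python
import fcntl
import os


def append_locked(path, data):
    """Append bytes to path while holding an exclusive advisory lock.

    Hypothetical helper for illustration; not part of any DataStore API.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # Blocks until the lock is granted. OR in fcntl.LOCK_NB to fail
        # immediately with OSError instead of waiting.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            os.write(fd, data)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

These locks are advisory: they only exclude other processes that also take the lock, so every writer has to cooperate.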
Atomically update /etc/group
- Read /etc/group
- Write the new version as /etc/group.tmp
- Copy /etc/group to /etc/group.old
- Move /etc/group.tmp to /etc/group
- A better mechanism is just to make /etc/group a symlink, then to atomically swing the symlink. - MS
- No, that will get corrupted because it isn't standard. Multiple programs implement the logic necessary for updating these files. Fortunately though, you needn't bother at all. Linux is perfectly happy with users and groups that don't have names. - AC
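The read / write-temp / back-up / move sequence listed above can be sketched in Python as follows. The function name atomic_rewrite and the transform callback are assumptions made for illustration, and the sketch uses a uniquely named temporary file from tempfile.mkstemp() rather than a fixed .tmp name. It works on any path and is not meant to imply that /etc/group should actually be edited this way, given AC's objection.

```python
import os
import shutil
import tempfile


def atomic_rewrite(path, transform):
    """Replace path with transform(old contents), keeping a .old backup.

    Hypothetical helper: readers see either the old or the new file,
    never a partially written one.
    """
    with open(path, "r") as f:
        new_text = transform(f.read())

    dirname = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_text)
            f.flush()
            os.fsync(f.fileno())
        shutil.copymode(path, tmp)         # keep the original permission bits
        shutil.copy2(path, path + ".old")  # back up the previous version
        os.replace(tmp, path)              # atomic rename over the original
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

The rename only protects readers. Two concurrent writers can still overwrite each other's changes unless they also serialize on a lock.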
Read-Only
Example: 'Write' needs access to a document during load:
- The Sugar shell creates UID/GID 10001 for the instance, and updates 'WriteGroup', adding Wri10001 to the WriteGroup
- The DataStore creates the tree:
  user.group      permissions  file
  olpc.olpc       755          /ds
  olpc.Wri10001   750          /ds/<instance-uid>/
  hardlink: ln /home/olpc/..../file-in-ds.ext /ds/<instance-uid>/somefile.doc
  olpc.olpc       644          /ds/<instance-uid>/somefile.doc
- Sugar Shell invokes read_file()
- When read_file() returns, the DataStore sets permission 600 on somefile.doc and/or unlinks it.
- Hard links are no good because they don't support atomic updates via rename. You need to hand over a directory containing just that one file. Hard links are prohibited on directories, but bind mounts will do the job. "mount --bind /some/src/dir /some/dst/dir" is about what you want. BTW, bind mounts can be made read-only, and can of course be made process-specific via CLONE_NEWNS. - AC
Problems to solve
- Cleanup: this needs to happen periodically and at startup (crashes happen...)
- Cleanup can be done by a cleaner-process that is forked off every few activity launches. - MS
- Two activities need read-only access to the same file, at the same time
- This can be handled by controlling access at the gate-dir and providing read access via the 'other' bits. - MS
- The datastore gets a write request while another activity has read-only access to the same file.
- This is a bit ugly. A correct solution is to manually break the hardlink by having the DS copy the file to a new name, deleting the old link, then moving the copy into position. - MS
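MS's hardlink-breaking step for the write-during-read case might be sketched like this. The name break_hardlink is hypothetical, and the sketch folds "delete the old link" and "move the copy into position" into a single os.replace() call, which performs both atomically on POSIX systems.

```python
import os
import shutil
import tempfile


def break_hardlink(store_path):
    """Give store_path its own inode so that later writes to it no longer
    show up through other hard links to the old inode.

    Hypothetical helper for illustration only.
    """
    dirname = os.path.dirname(os.path.abspath(store_path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    os.close(fd)
    try:
        shutil.copy2(store_path, tmp)  # copy data and metadata to a new name
        os.replace(tmp, store_path)    # drop the old link, move copy into place
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

After the call, the reading activity's link still refers to the old contents, while the DataStore can write to store_path freely.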
Resource Allocation Design
Several premises about how to use the filesystem.
- Directories are finite functions.
- Files are name-value bindings.
- Atomic updates are achieved by swinging symlinks, making hardlinks, renaming files, or by opening files O_EXCL | O_CREAT.
Say that we have been asked to start an instance of Paint-3.
This instance needs a restricted UID, an instance-specific GID, an activity-version specific GID, and perhaps some GIDs to access devices.
We need to acquire all these resources atomically with regard to other activity launches. Since we will be allocating these resources from reserved ranges, we are not overly concerned about contention from other sources (though we will endeavor to detect it).
We run some risk of racing with other processes that are manipulating /etc/passwd and /etc/group, but I don't know what we can do about this given the fundamentally non-composable access patterns that those files and the tools for manipulating them require.
- Simply do not place the temporary UID and GID values into those files. It is not needed. - AC
Primitive Operations
We need two primitive operations:
- don't-care-reservation - we need to atomically reserve a value from a finite set, but we don't care what value is chosen - we only care that no one else will reserve the same value.
- This can be implemented by repeatedly attempting to open files O_CREAT | O_EXCL in a reservation-dir for as long as we believe that an unreserved value exists. Efficient implementations exist for choosing small integers.
- This can be done via pty allocation. - AC
- This can be done via PID allocation. Simply add 10000 to the PID of a process that will be watching over the temporary user (probably the parent of the process that changes UID). By default this gives a range of 10300...42767, which can be reduced to 10300...19999 by putting kernel/pid_max=10000 into the /etc/sysctl.conf file. - AC
- This can be done via a SysV semaphore array or via POSIX semaphores. - AC
- first-customer-chooses - we need to define a value for a well-known name but don't care what value is chosen - we only care that everyone agrees on the choice that was made.
- This can be implemented by having each actor atomically reserve a value they like and then having them attempt to make a symlink at a well-known name in a reservation dir pointing to their chosen value-reservation. Actors who fail to write their symlink are guaranteed to be able to learn the value of the choice by reading the symlink that prevented their write.
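Both primitives, in their filesystem-based variants, can be sketched in a few lines of Python. The function names, the linear scan over the range, and the convention of naming each reservation file after the integer it reserves are all assumptions made for illustration; the text only requires O_CREAT | O_EXCL opens in a reservation dir and a symlink at a well-known name.

```python
import os


def reserve_any(pool_dir, lo, hi):
    """don't-care-reservation: claim some free integer in [lo, hi].

    O_CREAT | O_EXCL makes the create fail if the file already exists,
    so at most one caller can win each value.
    """
    for value in range(lo, hi + 1):
        try:
            fd = os.open(os.path.join(pool_dir, str(value)),
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            continue  # someone else holds this value; try the next one
        os.close(fd)
        return value
    raise RuntimeError("no unreserved value left in pool")


def first_customer_chooses(binding_dir, pool_dir, name, value):
    """Bind name to an already reserved value unless a binding exists.

    symlink() fails if the link exists, so the first caller's choice
    sticks and everyone else reads it back.
    """
    link = os.path.join(binding_dir, name)
    try:
        os.symlink(os.path.join(pool_dir, str(value)), link)
    except FileExistsError:
        pass  # we lost the race; fall through and read the winner's choice
    return int(os.path.basename(os.readlink(link)))
```

A caller that loses the race in first_customer_chooses still holds its own reservation. Whether it should release that value immediately or leave it to garbage collection is not specified here.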
Since no one cares what uid or what instance-specific gid we pick, don't-care-reservation is adequate to pick these values.
We can pick the activity-version specific and device-specific gids with the first-customer-chooses pattern.
Once these values are picked, reverse indices can be created for client programs that would benefit from increased ease of access.
Filesystem Design and Resource Usage
- UIDs SHALL be reserved in the range [10000, 60000].
- GIDs SHALL be reserved in the range [10000, 60000].
- The uid reservation dir SHALL be /activities/uid_pool
- The gid reservation dir SHALL be /activities/gid_pool
- The activity-version specific gid binding-dir SHALL be /activities/bundle_id_to_gid
- The device-specific GIDs SHALL be pre-allocated.
Reverse indices of the form
- /activities/uid_to_instance_dir
- /activities/gid_to_data_dir
SHALL be created.
Access Control
First, the entry-point receiving DS messages from activities needs to record the UID of the process sending the message.
The DS can then perform an access check and place the requested file in /activities/uid_to_instance_dir/<uid> if the check succeeds.
If the file is being made available read-only, the file should be owned by olpc/olpc with permissions 604 and can be hardlinked into place.
- There is no need to change permissions, which causes NAND wear. Bind mounts can be made read-only. Simply bind-mount a directory containing the file onto a directory where the activity can get at it. - AC
Permissions and copying/locking semantics required for making a file available RW are currently undefined.
- If the file is provided in a directory, then a rename can be used for atomic update. Any file descriptor which references the old file will continue to do so and be perfectly valid. - AC
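AC's point about rename-based updates can be demonstrated with a short Python sketch. The publish helper is hypothetical, and because RW permissions are undefined above, the sketch leaves the new file with the 0600 mode that tempfile.mkstemp() gives it.

```python
import os
import tempfile


def publish(directory, name, data):
    """Atomically install data as directory/name.

    Hypothetical helper: the file is written under a temporary name in
    the same directory and then renamed into place.
    """
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, os.path.join(directory, name))
```

A process that opened the file before a second publish() keeps reading the old contents through its existing descriptor, while any later open() sees the new version.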
Resource Deallocation
This has not been fully worked out. The basic idea is that resources become subject to garbage collection 30 minutes after they are reserved or after rebooting.
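Since the design is still open, the following is only one possible shape for the age-based part of garbage collection, assuming reservations are plain files in a pool directory and that their modification time records when they were reserved. The name collect_expired is hypothetical. The sketch does not check whether the owning activity instance is still running, which a real collector would need to do before freeing a UID or GID.

```python
import os
import time

MAX_AGE_SECONDS = 30 * 60  # the 30-minute threshold mentioned above


def collect_expired(pool_dir, now=None, max_age=MAX_AGE_SECONDS):
    """Remove reservation files older than max_age; return their names.

    Hypothetical helper: age is judged by mtime only, with no liveness
    check on the process that holds the reservation.
    """
    now = time.time() if now is None else now
    with os.scandir(pool_dir) as it:
        entries = list(it)  # snapshot before unlinking anything
    freed = []
    for entry in entries:
        if not entry.is_file(follow_symlinks=False):
            continue
        if now - entry.stat(follow_symlinks=False).st_mtime > max_age:
            os.unlink(entry.path)
            freed.append(entry.name)
    return sorted(freed)
```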