Concurrent activity instances

From OLPC

Situation: #5476

Browse is a web-browsing activity that is currently implemented on top of Mozilla's Gecko and NSS libraries. Unfortunately, these libraries create files in a profile directory with very restrictive permissions. As a result, Browse may work beautifully the first time it is run and fail miserably the second time, when it finds itself unable to manipulate shared resources such as its web cache and certificate database.

General Solutions

There are several ways to solve this problem, including:

  1. Patch the Mozilla libraries so that they respect our choice of umask.
  2. Patch the Mozilla libraries by adding a configuration option that indicates appropriate file creation permissions.
  3. Use an LD_PRELOAD wrapper to impose appropriate permissions on the libraries.
  4. Use the libraries as-is and attempt to clean up the permissions after the fact, either in the activity itself or in the shell.
  5. Create an adapter library which, under the control of the activity, will synchronize sanitized versions of the profile directory with other activity instances.
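
To make the difference between options 1 and 4 concrete, here is a minimal Python sketch. It assumes (this page does not confirm the exact mechanism) that the libraries pass a restrictive mode such as 0600 when creating files; in that case a permissive umask cannot help, because a umask can only remove permission bits, never add them. The helper names and the file name are hypothetical.

```python
import os
import stat
import tempfile


def create_like_library(path, mode=0o600):
    """Create a file the way a library that hard-codes its mode might.

    Assumption: the restrictive mode is passed explicitly to open(2).
    """
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, mode)
    os.close(fd)
    return stat.S_IMODE(os.stat(path).st_mode)


def widen_for_group(path):
    """After-the-fact cleanup (option 4): give the group the owner's bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    group_bits = (mode & stat.S_IRWXU) >> 3  # shift owner rwx into group rwx
    os.chmod(path, mode | group_bits)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    old_umask = os.umask(0o007)  # permissive: would allow group access
    try:
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "example.db")  # hypothetical file name
            print(oct(create_like_library(path)))  # 0o600 despite the umask
            print(oct(widen_for_group(path)))      # 0o660 after cleanup
    finally:
        os.umask(old_umask)
```

The cleanup in `widen_for_group` is inherently racy, since another instance may look at the file between creation and chmod, which is one reason option 4 is less attractive than it first appears.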

General Plan

Solutions which can be sent upstream are clearly preferable in the long term, but we are unsure how to approach the upstream maintainers.

In the meantime, we intend to use the adapter library approach because we think it is the safest choice available (i.e. the least prone to race conditions, privilege misuse, and unintended side effects) that will satisfy our schedule.

Nevertheless, there are many choices to make when designing an adapter library. In particular, which of the attributes 'atomic', 'consistent', 'isolated', and 'durable' are necessary? Consistency means that you never see half a profile. Durability means, roughly, that profiles don't get carelessly overwritten.

Atomicity and consistency are fairly easy to achieve by themselves with careful manipulation of the filesystem. Isolation and durability require more thought because they require that contention be either resolved (e.g. by merging) or avoided (e.g. by careful naming).
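
As an illustration of the "careful manipulation of the filesystem" mentioned above, here is a minimal Python sketch of an atomic symlink swing. It relies on the POSIX guarantee that rename() replaces its destination atomically: a reader that resolves the link sees either the old target or the new one, never a missing or half-written link. The function name `swing_symlink` is hypothetical.

```python
import os
import tempfile


def swing_symlink(link_path, new_target):
    """Atomically repoint link_path at new_target.

    A fresh symlink is created under a private temporary directory next to
    link_path and then renamed over it. Both names live on the same
    filesystem, so the rename is a single atomic step.
    """
    link_dir = os.path.dirname(os.path.abspath(link_path))
    tmp_dir = tempfile.mkdtemp(prefix=".swing-", dir=link_dir)
    tmp_link = os.path.join(tmp_dir, "link")
    try:
        os.symlink(new_target, tmp_link)
        os.replace(tmp_link, link_path)  # rename(2) under the hood
    finally:
        if os.path.lexists(tmp_link):    # only true if the rename failed
            os.unlink(tmp_link)
        os.rmdir(tmp_dir)
```

Note that this gives atomicity and consistency only. Two instances swinging the link at nearly the same time will not corrupt it, but the last writer wins, which is exactly the isolation and durability gap described above.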

At present, we believe that atomicity and consistency are sufficient to solve Browse's permissions issues, because Browse already uses a locking scheme that attempts to ensure that only one Browse process is running at a time. In more detail: Browse's locking scheme should let us avoid merging profiles by forcefully ensuring that updates are made to only one profile at a time.

Detailed Solution

  1. Each instance of Browse, when it starts, should:
    1. create a tmpdir in $SAR/data with permissions at least as permissive as 0770. We'll call this $NEW_PROFILE.
    2. check for the existence of a shared profile located at $SAR/data/profile.
      1. If found, the new instance of Browse should shallow-copy this directory into $SAR/instance/profile.
      2. If not found, the new instance of Browse should initialize a clean profile in $SAR/instance/profile.
  2. Having constructed $SAR/instance/profile, the instance of Browse should begin running.
  3. From time to time (e.g. when the foreground activity changes) and upon graceful shutdown, an instance of Browse should update the shared profile.
    1. First, it should shallow-copy the contents of $SAR/instance/profile into a tmpdir in $SAR/instance, which we'll call $STAGING_AREA.
    2. Next, it should change permissions on all the files and directories in $STAGING_AREA to allow manipulation by the instance's primary group.
    3. Then it should clone the contents of the $STAGING_AREA into $NEW_PROFILE.
    4. Finally, it should atomically swing the symlink $SAR/data/profile to point to $NEW_PROFILE.
  4. From time to time, instances of Browse should perform a garbage-collection step in order to reclaim the resources used by old revisions of the profile. It is safe to collect a directory $D in $SAR/data when two conditions hold:
    1. No processes are running under the uid owning $D.
    2. $SAR/data/profile does not point to $D.
    NB: Processes performing garbage collection should defend against the possibility of encountering directories with restricted permissions.
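
The steps above can be sketched roughly as follows in Python (3.8 or later, for `dirs_exist_ok`). This is an illustration under stated assumptions, not Browse's actual code: all function names are hypothetical, "shallow-copy" is read as a plain recursive copy, a "clean profile" is taken to be an empty directory, and the process check assumes a Linux-style /proc in which each /proc/PID directory is owned by that process's effective uid. The symlink swing of step 3.4 is left out.

```python
import os
import shutil
import stat
import tempfile


def prepare_instance_profile(sar):
    """Step 1: create $NEW_PROFILE and seed $SAR/instance/profile."""
    data = os.path.join(sar, "data")
    new_profile = tempfile.mkdtemp(prefix="profile-", dir=data)
    os.chmod(new_profile, 0o770)  # mkdtemp creates the directory as 0700
    shared = os.path.join(data, "profile")
    private = os.path.join(sar, "instance", "profile")
    if os.path.isdir(shared):
        # Assumption: "shallow-copy" is read as a plain recursive copy.
        shutil.copytree(shared, private, symlinks=True)
    else:
        # Assumption: a "clean profile" is just an empty directory here.
        os.makedirs(private)
    return new_profile, private


def open_to_group(root):
    """Step 3.2: grant the group every permission the owner already has."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths += [os.path.join(dirpath, n) for n in dirnames + filenames]
    for path in paths:
        if os.path.islink(path):
            continue  # chmod would follow the link, so skip symlinks
        mode = stat.S_IMODE(os.lstat(path).st_mode)
        os.chmod(path, mode | ((mode & stat.S_IRWXU) >> 3))


def stage_and_clone(private, new_profile, instance_dir):
    """Steps 3.1 to 3.3: copy to staging, fix modes, fill $NEW_PROFILE."""
    staging = tempfile.mkdtemp(prefix="staging-", dir=instance_dir)
    try:
        shutil.copytree(private, staging, symlinks=True, dirs_exist_ok=True)
        open_to_group(staging)
        shutil.copytree(staging, new_profile, symlinks=True,
                        dirs_exist_ok=True)
    finally:
        shutil.rmtree(staging, ignore_errors=True)


def uid_is_idle(uid, proc="/proc"):
    """Step 4.1: True if no process directory under proc is owned by uid."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            if os.stat(os.path.join(proc, entry)).st_uid == uid:
                return False
        except OSError:
            pass  # the process exited while we were looking
    return True


def collectable(candidate, profile_link, idle=uid_is_idle):
    """Step 4: True only when both garbage-collection conditions hold."""
    try:
        owner = os.lstat(candidate).st_uid
    except OSError:
        return False  # restricted or vanished directory: leave it alone
    if os.path.realpath(candidate) == os.path.realpath(profile_link):
        return False  # step 4.2: the shared profile still points here
    return idle(owner)
```

A garbage collector built on `collectable` would still need to tolerate failures while deleting, since a directory created with restrictive permissions by another uid may be visible but not removable.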


Failures

Unfortunately, Marco thinks that Browse's web-cache is too big to copy around each time we start the activity. Michael can't tell from a cursory inspection of the web-cache code (in src/netwerk/cache in a xulrunner checkout) whether it is safe to hardlink the files instead of copying them.
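
The hardlinking question comes down to how the cache code updates its files, which is precisely what we could not determine. The following Python sketch, using hypothetical file names, shows the two cases: a write made in place through one hardlinked name is visible through every other name, whereas replacing a file by renaming a new one over it leaves the other names untouched. It says nothing about which pattern the Mozilla cache actually uses.

```python
import os
import tempfile


def hardlink_demo(root):
    """Return what the shared name sees after two kinds of update."""
    shared = os.path.join(root, "shared-entry")      # hypothetical names
    linked = os.path.join(root, "instance-entry")
    with open(shared, "w") as f:
        f.write("v1")
    os.link(shared, linked)  # no data is copied; both names share an inode

    # Case 1: in-place write through one name shows up under the other.
    with open(linked, "r+") as f:
        f.write("v2")
    with open(shared) as f:
        after_in_place = f.read()

    # Case 2: replace-by-rename gives `linked` a new inode; `shared` keeps
    # the old contents.
    tmp = linked + ".tmp"
    with open(tmp, "w") as f:
        f.write("v3")
    os.replace(tmp, linked)
    with open(shared) as f:
        after_replace = f.read()

    return after_in_place, after_replace


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(hardlink_demo(d))  # ('v2', 'v2')
```

Hardlinking would therefore be safe only if every cache update follows the second pattern; a single in-place write would leak changes between instances.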

Since this is a blocking bug against Update.1, we decided to de-isolate Browse until we find a workable solution. We also started talking to Mozilla about the problem (specifically with Boris Zbarsky and Benjamin Smedberg), but we haven't got any solid fixes yet. Michael is also pondering how Rainbow could know that all the Web-based activities are supposed to be run under one uid (uid != 500).