Talk:Activity bundles

From OLPC
Revision as of 14:41, 9 April 2008 by Homunq (talk | contribs) (From the top)
Jump to: navigation, search

How to embed external applications?

From the LWN interview with Jim Gettys:

Chris' team is putting together a python based environment 
(into which conventional applications can be embedded) 
aimed at young children, temporarily called "sugar" (...)

How does this "embedding" work? How would a simple python "wrapper" script look like that embeds a simple X app?

Bundle activities in filesystem images?

Why not pack the entire bundle into a "disk image", e.g., using (z)isofs or cramfs? Hence, one application would be exactly contained within one file (similar to AppDirs, but one file instead of one directory).

This image would be mounted at the runtime, and unmounted again after usage has ended.

That way, the bundle becomes more "robust" (in the sense that it is read-only and cannot alter itself), plus it can be more easily exchanged (e.g., via mail/wlan) without the need to re-package it.

The impact of the additional overhead on speed is minimal, as my tests have shown.

I am in the process of providing a demo implementation. Some ideas are also provided on http://klik.atekon.de/wiki/index.php/OLPC -- [probono]

Bundle versioning? Bundle member versioning?

Will component members of a bundle be versioned?

If a computer joins an activity, and discovers that it has an earlier bundle in place (rather than a missing bundle), will it import the newer bundle? In other words, will a newer bundle version automatically propagate to all who join an activity, in the same way that a missing bundle would? -- [kritchie]

If a bundle is not a single (monolithic) file, can the computer that has an older bundle import just the newer compenent members of the bundle to bring its own bundle up to date without importing the whole bundle? -- [kritchie]

The contents of the bundle, as currently defined, are pretty much opaque. So it's effectively monolithic. Considering the disk constraints on the machine, bundles probably can't get too large, so maybe doing incremental upgrades of activities isn't worth the effort. -- Ian Bicking 13:13, 27 November 2006 (EST)

Version Keys: Interface Version Key vs. Build Version Key

On reviewing the .info structure, I am wondering why we don't have an *interface* version key (IVK). An IVK is a key that changes *only* when a component's "contract" changes with regard to either the interface structure (syntax, signature, parameter usage) or the agreed and expected functionality (semantics). Thus, newer builds that may be more efficient or have other performance benefits but which do not change the contract will have a higher *build* version key (BVK) while their compatibility remains unchanged (signified by no change in IVK). An unchanged IVK would usually indicate that upgrading is optional rather than mandatory to join the activity. If we want to declaratively force an upgrade, in the event that backward-compatibility is broken in a new interface contract, then we must use a third key, a *compatibility* version key (CVK). The CVK value sets the lower bound on acceptable values of the IVK. For example, (IVK=5, CVK=3) indicates that the new interface contract is simply an extension that is compatible back to IVK=3. Thus, a new IVK issued with a matching CVK would mandate upgrade before use, while later builds (of the same IVK, CVK) are optional and could be deferred without risk of harm to the activity.

So, in the present .info structure, if "activity_version" serves as a build version tag, we must introduce an "activity_contract" (or some such tag name) to serve as the interface version tag. On the other hand, if "activity_version" was already intended to signify interface version (IVK), then we could simply add an "activity_build" (or whatever) as BVK. The same line of reasoning applies to "host_version" key(s).

Observe that there is an implicit major/minor grouping of IVK/BVK (BVK may vary within IVK). We can introduce a new build version (BVK) within the scope of a given interface version (IVK), but it is not possible to change the interface version (IVK) without also introducing a new build. Given monotonically increasing integer keys, the only interesting implementation decision is whether to combine them (structured, as ivk.bvk) and whether to restart BVK when IVK increments. I see the most value in keeping them separate, as two distinct integer keys, and also in never restarting the build number. That allows the simplest logic for subproblems (e.g., "is this build newer than that one?").

Over the past 30 years, I have found the use of an Interface Version Key (IVK) to be remarkably valuable in automating the maintenance of distributed and componentized architectures, whether coupled by messages or remote methods (although I much prefer message-based coupling). The IVK also greatly simplifies many problem-solving strategies. I strongly recommend incorporation of something similar into the OLPC projects. -- [kritchie]

Security Question

Forgive my ignorance, but the idea of automatically downloading and running any 'activity' that announces itself on the network (subject to it having an exciting enough name for a child to try it) sounds like a huge security hole! are they intended to run in some sort of sandbox? if not, it seems to me almost as scary as ActiveX... [agundy]

The idea is that everything always runs in "some sort of sandbox" - see Bitfrost for details. Homunq 13:00, 29 January 2008 (EST)

Testing

In addition to running the activity, it would be nice to indicate how to run the tests for the activity. Running the tests is a key part of editing the activity; it's also an entry point for doing automated testing. If the tests look similar to how something like buildbot expects tests to look (non-zero exit code on failure, just keep the text output for the report) then we could test Sugar updates or other system updates against real code. -- Ian Bicking 13:18, 27 November 2006 (EST)

initial install?

Some bundles come with the OS. They can be transferred somehow, at least if there is a mesh network. They can be edited if they are Python. That's all nice, but... how does a developer get a bundle onto a machine?

Is it enough to just copy it into /usr/share/activities via the Linux console? Will the bundle be immediately usable, usable after reboot, ignored forever, deleted...?

From the web browser, can I download a bundle? (or anything...) Does that install it? Does the web server have to provide some non-standard MIME type? Would a my-new-project.xo.tar.bz2 file work?

It would be great to have a "Hello World" bundle that I could install directly from this wiki. Something that displays my test image would be nice.

AlbertCahalan 23:49, 9 March 2007 (EST)

looks like bad info

-bash-3.1# pwd
/usr/share/activities
-bash-3.1# rpm -qf Etoys.activity/
etoys-2.0.1224-2
-bash-3.1# ls */setup.py
BlockParty.activity/setup.py  Memosono.activity/setup.py    Web.activity/setup.py
GroupChat.activity/setup.py   NewsReader.activity/setup.py  Write.activity/setup.py
Journal.activity/setup.py     TamTam.activity/setup.py      Xbook.activity/setup.py
-bash-3.1# ls
BlockParty.activity  Etoys.activity      Journal.activity   NewsReader.activity  TamTam.activity  Write.activity
Camera.activity      GroupChat.activity  Memosono.activity  Slideshow.activity   Web.activity     Xbook.activity
-bash-3.1# find . -name localized
-bash-3.1# 

So the Etoys activity is really an RPM. Say, that looks like a workable solution with decent documentation. Etoys also massivly violates the rules about being self-contained; it depends on a separate RPM which isn't useful for anything else.

Actually, it is useful. You can take the etoys rpm and run Etoys on any regular Linux machine without Sugar. Squeak is one of the scripting languages on the XO, so it must be in the base image. You can write other Squeak-based activities which will use the Squeak kernel included in the etoys rpm, just as other activities depend on Python. Due to the monolithic nature of Squeak it is not possible to separate the Squeak kernel from the Etoys specific parts, otherwise they could be moved to the XO bundle indeed. Bert 17:21, 9 October 2007 (EDT)

There are 12 activities, but only 9 setup.py files. Obviously that isn't needed.

The 'activity/localized/' directory which "must be present" does not in fact need to be present at all. Not a single activity provides it.

24.110.145.57 20:10, 22 March 2007 (EDT)

Proposals for update

  • basic definition necessary for simplifying vocabulary in both of the proposals below: Each activity has a signing key which is used across versions as long as developer continuity allows. There is a predefined 1-to-1 mapping between bundle-id and signing key, so the two terms are interchangeable.

Desirable (this is a proposal for abstract properties, it does not define implementation):

  1. Auto-update for activities
    1. The dream is that if you have a better version of an activity than I do, and I share an instance of that activity with you, I will automatically start using the better version. But how do we define "better"?
      1. As bitfrost states explicitly, trust in a person != trust in code from that person
      2. A proposed definition: something is better if it:
        1. is signed as "created by" with the same key / has the same bundle-id
          1. There should be a predefined mapping between signing keys and bundleid (!!)
        2. is signed as "tested, confirmed improvement" by same key (separate kind of signature. Note that further such "tested" signatures are possible but ignored by Sugar. This is essentially "passes smoke test".)
        3. is a newer version
        4. claims to be able to reopen all relevant instances currently in the journal (or, for easier checking, all the same instances)
        5. Old version was not marked as "preferred stable version" on this xo.
          1. happens when user explicitly marks it as favorite
          2. activity sets can mark "preferred stable version" or not independently of favorite.
          3. can be set by user, though pretty well-hidden
    2. Even if Sugar cannot decide what is "better", it should at least associate "different versions of the same activity" and be able to collapse them together in the activity list, the donut, the "start with" palette, and other activity searches.
      1. possibility of forks inevitable, merges may be slightly desirable too.
      2. Definition: two activities are "the same activity" if:
        1. same bundle id / signing key OR
        2. the one with a lower version has a bundle id / signing key which is explicitly referred to as an ancestor of the higher version one; AND the lower version number is lower than or equal to the "version of forking" claimed by the higher one.
      3. Note that this is not a proper identity relation: the same older activity can be "the same as" two nonidentical (forked) recent activities. In this case, the older activity could be listed twice, or there could be a way to choose which fork was "mainline", or the UI could show the branching somehow
      4. Note also that this basic logic would work for forking versions of other files:
        1. each item has a UID which is stable across unproblematic version changes
        2. when doing something that could result in a fork (changing owners, opening an old version) change the UID but keep a reference to the old UID.
    3. Ability to resume an instance created by a different bundle-id/ signing signature.
      1. This is inherently insecure on that data, should never be done implicitly.
      2. In my scheme, the ability to do this can be synonymous with "same activity" as above.
        1. this uses version number of activity as a proxy for instance file format.
  1. Bundles can be generated in a standard way
    1. Right now, everybody has their own way
      1. maintenance nightmare
    2. Sugar should be able to regenerate for sharing
    3. There are several future possible uses for a well-defined hash of a bundle
      1. For instance, sugar could embed it into signatures whenever signing with a user's private key at the request of that bundle
    4. Since Develop uses this file format, bundle generation is a part of development
      1. source of bugs if activity authors use a different method to build
  1. Translations can be added without breaking signatures on a bundle
    1. editing an activity and adding translations are separate steps, and somebody qualified to judge the quality of one might not be for the other.
    2. using the same key for both is a hassle, and security which causes hassles will be circumvented at the cost of security.

Homunq 10:16, 7 April 2008 (EDT)

Signatures

This is a more-concrete proposal for an implementation. It does not include the same-activity stuff (1.2... and 1.3... above).

  1. UTF-8 filenames are acceptable, but carriage returns are not.
  2. MANIFEST must not include itself, TRANSLATABLES, HASHES, or any files in SIGNATURES, must not include directories, and must not have './'
  3. TRANSLATABLES is an optional file which uses the same format as .gitignore to indicate what files from MANIFEST should *not* be included in HASHES
    1. Obviously, this does introduce a security risk, as an unsigned TRANSLATABLES file could theoretically cause a buffer overflow (or, indeed, be deliberately run by malicious signed code). However, since the average python program is immune to buffer overflows, and since there is a separate (less-secure) signing mechanism for TRANSLATABLES files, this is considered acceptable.
Activities do not have to be written in Python so this is no good argument. Also, why except "translatables" at all? --Bert 03:21, 3 April 2008 (EDT)
L10n should not require bugging the original activity author - imagine having to sign multiple versions of dozens of languages for every activity version, besides the inconvenience it is a security risk because the underlying code could change between "pure-l10n" versions without the author realizing. Homunq 22:01, 6 April 2008 (EDT)
Correct me if I'm misunderstanding the process, but if TRANSLATABLES can be modified by anybody, then they can use that to remove signature coverage from any/all files in the bundle. Surely TRANSLATABLES should be signed? --morgs 11:08, 9 April 2008 (EDT)
  1. HASHES is an auto-generated file with the first line '#HASH-VERSION: 1.0; HASH-FUNCTION:sha256'. Linefeeds are unix-style.
  2. The further lines of HASHES alternate; one line with a path as in MANIFEST, and the base64-encoded sha256 hash of the binary contents of the file on the line which follows. There is no limit to line length.
Rather use the sha256 hexdigest, to use the same format as sha1sum, so that can be used on the command line to check HASHES. --morgs 11:08, 9 April 2008 (EDT)
    1. [The first two lines are TRANSLATABLES and its hash, if it exists]
    2. The rest of HASHES follows MANIFEST, in the same order, excluding those that match patterns in TRANSLATABLES.
  1. SIGNATURES is an optional directory with signatures
    1. to start out with, just have individual signatures of HASHES. (((REMOVED FROM PROPOSAL: Later it will have a list of maintainers, a list of developers, and the infrastructure to chain a history of who on which list. Maintainers sign the maintainers list (to leave the list, you need your own or everyone else's signature; to add, everyone currently on the list) and can individually grant or revoke developer's rights, developers just sign the code.)))
    2. files in TRANSLATABLES get per-file signatures by their authors
      1. the list of trusted translators can be broad.
    3. The signature format is a prefix, then the signature of (the prefix followed by the given data). This allows Sugar to attach metadata, such as date and activity hash, to a signature, in a way that is secure against malicious apps (but of course NOT so against malicious users with developer's keys).
Proposal Added by User:Homunq about 3 April

While this is a well written proposal, it does try to move the activity bundles towards becoming a package.

  1. Note that MANIFEST is not required and is not used.
  2. There is no consistency check on the OLPC to check directory contents against a HASH file. This type of consistency lives in the GIT system during development and building, the package management system during distribution and BitFrost once installed. I'm unclear what it adds by being in the .XO file.
  3. What specific benefit are you trying to achieve? Are you trying to reliably match built and installed activities against the GIT?

CharlesMerriam 13:11, 5 April 2008 (EDT)

This proposal doesn't address the actual signatures. I have some comments there:
  • Either you need a key per user (in which case there is already owner.key in ~/.sugar/default) if keys are not to be shared, or you need a key per activity. (I maintain 4 activities. If I co-maintain one of them I want to share a unique key for that activity with the co-maintainer, so they can't sign my other activities.)
  • Assuming the latter from IRC conversation (#olpc-meeting 8 April 2008) you need to store multiple private keys where Develop can access them to sign bundles. Although Develop may store these unencrypted (thus requiring no password to sign), I as a developer would want encrypted keys, and I imagine most upstream authors would as well.
  • Developers should not need to use Develop - there should be command line tools for those that want them. You therefore need tools to generate keys, view them, sign a bundle, verify a bundle etc.
  • Keys almost always need some identity bound with them - at least so you can differentiate your own keys - again, assuming multiple keys. Since the public key would be included with the signature, there should be a way (at least with abovementioned command line tool) to view the signature and see the identity. I think this identity could be any of: A person's name/email, A generic name ("Chat maintainer key"), or completely anonymous.
  • IMO the public key should be distributed self-signed with the signature, so that the public key itself can be verified. With this identity, you end up with a format like X.509 or PGP. It's best to use a well known format - in addition to tried and tested algorithms etc.
  • What I'm leading up to is that I think GPG may be a good fit for the key management (keyrings) as well as the signature format. All the command line options would be available.
--morgs 11:08, 9 April 2008 (EDT)

Related proposals

  • neuralis / ivan krstic : a proposal on dealing with default prefences / preference versions. Interesting but in my view does not belong in bundle spec, it is an installation issue.
  • bemasc / Benjamin M. Schwartz: a proposal which mostly deals with key management. [1]. m_stone suggests we should model this aspect after a reduced SPKI and I agree that the ideal system would use that as inspiration. However, to reduce scope, we may want to leave this aspect for later, and assume one eternal special-purpose key per activity/developer-group pair for now.

From the top (edited copy of my email)

Here's a copy of the email I sent. After pasting it, I will try to edit it in-place to use friendlier terminology.

The story so far:

The Develop activity uses the bundle format natively. Thus, it relies on bundlebuilder.py to create bundles and activitybundle.py to install them. But many activities have developed their own way of creating bundles, and do not follow the bundle spec (missing or incorrect MANIFEST). This means that their bundles do not rebuild correctly in Develop, which could lead to data loss in the current incarnation.

So, I have to fix bundlebuilder to have more error checking, and activitybundle to throw warnings to encourage people to follow the spec.

Also, there is a major part of the sugar spec unimplemented - the part where people can install activities by joining a shared instance of them. Implementing this will, almost inevitably, mean changes in the bundle spec. If I am going to start enforcing the spec, it would seem like the logical time to update the spec to work for updates.

Some terminology, and an idea from the mini-conference:

By "a bundle" I mean a specific set of bytes. Any change in those bytes means it's a different bundle. Yet actually there are some changes I generally don't care about. Using a different zip algorithm on the same set of files, for instance. Another *possible* instance is a change in translation strings/ localizable icons that does not touch any executable code. I have a proposal (see below) in which such "extras" would be exempt from signature. Thus, when precision is important, I will use "executable bundle" to denote everything with the same valid main signature - that is, everything which has identical executable code.

At the mini-conference, a shortcut to key management was proposed, where a developer would create a special-purpose key when they started a new activity, and sign all versions of that activity with their key. This is called the "activity's key" and the signature using this key is the "main signature". This idea assumes that key management is manual - either they send the key itself to collaborating developers, or they accept and apply patches and sign the results locally. If we decide that there are important key management issues missing from this picture, we could do something like SPKI. Thus the original developer would become an authority, and they would grant signing rights to other keys. Even in this case, "main signature" will refer to whatever set of data constitutes a valid signature under this scheme. Generally speaking, such data should not require network confirmation, although a finite lifetime is acceptable.

All bundles signed with the same activity's key are known as an "unbroken activity thread". It is assumed that key management would work and that there would be no forking within an activity thread (aside from, possibly, temporary experimental branches which never move beyond one machine). I propose that the unbroken activity thread should be the basic unit of analysis in sugar, and that each such thread should have a unique "bundle ID" (like org.laptop.392A7F). Any forks within an unbroken activity thread are very bad and are called "invisible forks".

If the key is lost or somebody wants to create a new version (and risk forking), there could be different bundle ID's that are ancestor and descendant (and handle the same files). These are part of the same "[broken] activity thread", which inevitably means that forks are possible. In my proposal, broken activity threads can be identified using the alleged history from the later version, which consists of prior bundle IDs and last-common-versions.

Also relevant to this discussion is the new planned journal design at Designs/Journal. Note that there are two kinds of things in the journal: "actions", which, as m_stone puts it, are "like UI completions", and "objects", which are more like normal files (pictures, documents, etc.) with mime types. Actions contain objects, but those objects are also accessible independently.

See also Designs/Activity_Management. "Favorite" bundles are visible in the "Home screen" (or "donut"), while all activities are available in the "activity list [view]".

The issues

The point of all of this is the user experience that it enables. There are three basic possibilities; sugar can understand just the bundle level, it can understand the unbroken activity thread level, or it can understand activity threads including breaks and forks. In the email I had some names for those based on sugar's perspective of what exists, I have renamed them below.:

Main options

  1. bundle level (was called "no such thing as versions"): all actions are associated with a given executable bundle, and can only be opened with that bundle. The favorites can be any set of bundles, whether or not these have an ancestry relationship. The XO does not garbage collect (GC) old bundles until there are no more instances which use them.
  1. unbroken thread level (was called "latest version, but no such thing as forks"): All actions are 100% upward-compatible across unbroken activity threads (when they aren't, you just break the thread). All actions open with latest version in an unbroken thread and "favorite" is an attribute of an unbroken thread - the latest version available is the one that opens. Broken activity threads are treated as different activities, as in bundle level.
  1. broken thread level (was called "no such thing as security"): As NSTAF, but auto-updates cross breaks in activity threads. If you have both sides of a fork, whichever one you got second shows up as a separate activity.

Ways of modifying one of the main options:

  • There could be some way to manually open an action with a different bundle. What is the UI to make this easier?
  • cute extra possibility: when you update your favorite activity to a new version, the UI asks you "why did you do that?". If you give an answer, this answer is visible in your shared instances of that activity to those with lower versions. This is safer than advertising new versions with changelogs from the author, since this way by nature they come from friends/ known sources. Dubbed "user-generated changelogs" on IRC, which moniker prompted "<cjb> homunq: OH MY GOD STOP".
  • "offloading garbage collection": The lower options above can easily lead to many actions on the same machine which refer to different bundles from the same thread. If disk space is short, it is possible to aggressively upload these to the school server, and download them as needed. This can lead to actions which do not work until you have connectivity. Note, however, that these actions would still be *visible* in the journal and that their object contents (the actual files) would still be accessible from there. Since we've all lived with just objects, no actions, until now (ca. 1987 MacOS "Switcher", and other "save workspace" gizmos, aside), I think this is acceptable.

Ways of combining two of the main options

  • "friendly reminders": Basic behavior is as one of the lower above options, but when you get a new bundle which, by one of the higher above options, would count as a different version of the same activity, there is some UI reminder (icon badges in the favorite view and on actions?) to update your favorite and your actions to the new version. Possibilities: bundle level with friendly reminders for unbroken threads (1 fr 2); bundle level with friendly reminders for broken threads (1 fr 3); unbroken thread level with friendly reminders for broken threads (2 fr 3).
  • "Serious magic": keep usage statistics of all bundles on the school server, including who manually chooses which bundle version and what their choices were. If these statistics show a clear and stable preference for version Y over version X, tell all local XOs to make Y a default over X. Possibilities: 1 sm 2, 1 sm 3, 2 sm 3.
  • "Serious local magic", where switching from X to Y is auto-defaulted the Nth time you do it manually on a given machine. Possibilities: 1 slm 2, 1 slm 3, 2 slm 3.

My vote

I would vote for "2 fr 3" - that is, pretty agressive about auto-updates. This allows for a decent level of garbage collection. One weakness that I do see with this option is its relatively strong assumption that later versions are better; I am open to proposals on how to weaken this assumption, though I do think it is good in 90% of the cases.

Added paragraph: I think that straight-out level 2 would be ignoring the real reasons that people fork (intentionally or unintentionally). By trying to legislate that "anything that might be a fork is a separate activity", it would create social pressures for poor key managment, that would eventually cause some combination of: extra trojans and extra invisible forks (from compromised activity keys); and on the other hand, extra breaks in threads (from overzealously protected activity keys). The scenarios leading to these possibilities are left as an exercise for the reader.

My proposal

I made the proposals above before I got pushback on this. They essentially assumed something like "2 fr 3".

My timeline

I have enough little bug fixes pending to occupy me for this week. I would like to have some kind of game plan starting next week. That could be just "this is too hard to decide now, just start enforcing the activity bundle format as it stands" but I would prefer to have a clear bundle format, or at least an agreement on the UI goals it should support, by that time.

Homunq 14:14, 9 April 2008 (EDT)