Bundles and updates
From the top (edited copy of my email)
Here's a copy of the email I (homunq) sent after my conversation with Mstone and others. After pasting it, I will try to edit it in-place to use friendlier terminology.
The story so far:
The Develop activity uses the activity bundle format natively. Thus, it relies on bundlebuilder.py to create bundles and activitybundle.py to install them. But many activities have developed their own way of creating bundles, and do not follow the bundle spec (missing or incorrect MANIFEST). This means that their bundles do not rebuild correctly in Develop, which, in Develop's current incarnation, could lead to data loss.
So, I have to fix bundlebuilder to have more error checking, and activitybundle to throw warnings to encourage people to follow the spec.
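As an illustration of the kind of error checking meant here, the following is a minimal sketch (not code from bundlebuilder.py) that compares MANIFEST against the files actually present in an activity directory. It assumes MANIFEST sits at the top of the activity directory with one relative path per line; the function name `check_manifest` and the choice to skip `.git` are hypothetical.

```python
import os
from typing import List


def check_manifest(activity_dir: str) -> List[str]:
    """Compare MANIFEST with the files on disk and return warning strings.

    Hypothetical helper for illustration; not part of bundlebuilder.py.
    """
    manifest_path = os.path.join(activity_dir, "MANIFEST")
    if not os.path.isfile(manifest_path):
        return ["MANIFEST is missing"]

    with open(manifest_path, encoding="utf-8") as f:
        listed = [line.strip() for line in f if line.strip()]

    # Collect every regular file under the activity directory, using
    # '/'-separated paths relative to the activity root.
    on_disk = set()
    for root, dirs, files in os.walk(activity_dir):
        dirs[:] = [d for d in dirs if d != ".git"]  # ignore version-control data
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), activity_dir)
            on_disk.add(rel.replace(os.sep, "/"))

    warnings = []
    for path in listed:
        if path not in on_disk:
            warnings.append(f"listed in MANIFEST but missing: {path}")
    for path in sorted(on_disk - set(listed) - {"MANIFEST"}):
        warnings.append(f"present but not in MANIFEST: {path}")
    return warnings
```

In keeping with the plan above, bundlebuilder could refuse to build when this list is non-empty, while activitybundle might only print the same warnings at install time.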
Also, a major part of the sugar spec is still unimplemented: the part where people can install activities by joining a shared instance of them. Implementing this will almost inevitably mean changes to the bundle spec. If I am going to start enforcing the spec, this seems like the logical time to update it to support updates.
Some terminology, and an idea from the mini-conference:
By "a bundle" I mean a specific set of bytes. Any change in those bytes means it's a different bundle. Yet actually there are some changes I generally don't care about. Using a different zip algorithm on the same set of files, for instance. Another *possible* instance is a change in translation strings/ localizable icons that does not touch any executable code. I have a proposal (see below) in which such "extras" would be exempt from signature. Thus, when precision is important, I will use "executable bundle" to denote everything with the same valid main signature - that is, everything which has identical executable code.
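The "same set of files, different zip algorithm" case can be made concrete with a minimal sketch that hashes the paths and contents inside a zip instead of its raw bytes. This digest is purely illustrative and is not defined by the bundle spec; excluding the "extras" as well would additionally require a list of exempt files, which is what the proposal below uses TRANSLATABLES for.

```python
import hashlib
import io
import zipfile


def content_digest(zip_bytes: bytes) -> str:
    """Hash the (path, contents) pairs of a zip, ignoring how it was compressed.

    Illustration only: the bundle spec does not define this digest.
    """
    h = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith("/"):
                continue  # directory entries carry no content
            data = zf.read(name)
            # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
            for field in (name.encode("utf-8"), data):
                h.update(len(field).to_bytes(8, "big"))
                h.update(field)
    return h.hexdigest()


def make_zip(files: dict, compression: int) -> bytes:
    """Build an in-memory zip from {path: bytes} with the given compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

Zipping the same files with `zipfile.ZIP_STORED` and with `zipfile.ZIP_DEFLATED` gives different bytes (so, strictly, different bundles) but the same `content_digest`.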
At the mini-conference, a shortcut to key management was proposed, where a developer would create a special-purpose key when they started a new activity, and sign all versions of that activity with their key. This is called the "activity's key" and the signature using this key is the "main signature". This idea assumes that key management is manual - either they send the key itself to collaborating developers, or they accept and apply patches and sign the results locally. If we decide that there are important key management issues missing from this picture, we could do something like SPKI. Thus the original developer would become an authority, and they would grant signing rights to other keys. Even in this case, "main signature" will refer to whatever set of data constitutes a valid signature under this scheme. Generally speaking, such data should not require network confirmation, although a finite lifetime is acceptable.
All bundles signed with the same activity's key are known as an "unbroken activity thread". It is assumed that key management would work and that there would be no forking within an activity thread (aside from, possibly, temporary experimental branches which never move beyond one machine). I propose that the unbroken activity thread should be the basic unit of analysis in sugar, and that each such thread should have a unique "bundle ID" (like org.laptop.392A7F). Any forks within an unbroken activity thread are very bad and are called "invisible forks".
If the key is lost, or somebody wants to create a new version (and risk forking), there could be different bundle IDs that are ancestor and descendant (and handle the same files). These are part of the same "[broken] activity thread", which inevitably means that forks are possible. In my proposal, broken activity threads can be identified using the alleged history from the later version, which consists of prior bundle IDs and last-common-versions.
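A minimal sketch of how the alleged history could be used, following the "same activity" rule given later in the proposals: two bundles belong to the same thread if they share a bundle ID, or if the higher-versioned one names the other's bundle ID as an ancestor and the lower version does not exceed the claimed version of forking. The `BundleInfo` type and its `history` field are hypothetical; nothing like them exists in the current bundle format, and `org.example.PaintFork` is an invented ID.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BundleInfo:
    bundle_id: str  # stands in for the activity's signing key
    version: int
    # Alleged history: (prior bundle ID, last common version) pairs.
    # Hypothetical field; the current bundle spec has nothing like it.
    history: Tuple[Tuple[str, int], ...] = ()


def _claims_ancestor(newer: BundleInfo, older: BundleInfo) -> bool:
    """True if `newer` names `older`'s ID as an ancestor forked at or after `older`."""
    return any(
        prior_id == older.bundle_id and older.version <= fork_version
        for prior_id, fork_version in newer.history
    )


def same_activity(a: BundleInfo, b: BundleInfo) -> bool:
    if a.bundle_id == b.bundle_id:
        return True  # unbroken thread
    # Broken thread: the higher version must claim the lower one as ancestor.
    if a.version <= b.version and _claims_ancestor(b, a):
        return True
    if b.version <= a.version and _claims_ancestor(a, b):
        return True
    return False


original_v5 = BundleInfo("org.laptop.392A7F", 5)
original_v8 = BundleInfo("org.laptop.392A7F", 8)
fork_v7 = BundleInfo("org.example.PaintFork", 7, (("org.laptop.392A7F", 6),))
print(same_activity(original_v5, fork_v7))      # True
print(same_activity(original_v5, original_v8))  # True
print(same_activity(original_v8, fork_v7))      # False
```

The example shows why this is not a proper identity relation: version 5 of the original is "the same as" both the fork and version 8 of the original, yet those two are not the same as each other.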
Also relevant to this discussion is the new planned journal design at Designs/Journal. Note that there are two kinds of things in the journal: "actions", which, as m_stone puts it, are "like UI completions", and "objects", which are more like normal files (pictures, documents, etc.) with mime types. Actions contain objects, but those objects are also accessible independently.
See also Designs/Activity_Management. "Favorite" bundles are visible in the "Home screen" (or "donut"), while all activities are available in the "activity list [view]".
The issues (from a user experience point of view)
The point of all of this is the user experience that it enables. There are three basic possibilities: sugar can understand just the bundle level, the unbroken activity thread level, or activity threads including breaks and forks. In the email I named these after sugar's perspective on what exists; I have renamed them below. I believe all of the options below are technically possible, although of course some are easier than others.
Statement of problem (my biased view)
From a user point of view, when are two different bundles examples of "doing the same thing"? When do I want to stop using an old one and start using a new one?
Technically speaking, the closest thing to an incarnation of this issue is the "actions" in the journal. An action is a title, some tags or other journal-oriented metadata, a bundle of well-mime-typed "objects", a list of shared companions and some activity-specific data: "I am painting 'our houses', these two pictures, with Kim; mine is on the left and I'm using this brush". Of all of that, the activity-specific data is most subject to format change, but it is also the least important. If I want to come back and resume that action in 6 months or a year, and I get the pictures and the friend but they are in the wrong order and the wrong brush, no big loss. For me, fundamentally, the "same thing" is defined operationally (painting) and by object(s); therefore, versioning should attempt to have approximately the same granularity.
Another primary consideration is the user experience of using "develop" that this will support. Is it easy to make an experimental branch of an existing project? Is it easy yet minimally secure (that is, user-explicit in some way) to try to use my branch on my existing actions or objects? What happens when I get an update but I'm still working on my branch? How can my branch become "official"? Consider that most such branches will take on a minor feature and should not break format compatibility at all.
All of this means that the focus is more on what the UI presents as "another version of the same bundle". File format compatibility is a part of this question, but is not in itself necessary (or even sufficient, in a few cases).
Main options
- bundle level (was called "no such thing as versions"): all actions are associated with a given executable bundle, and can only be opened with that bundle. The favorites can be any set of bundles, whether or not these have an ancestry relationship. The XO does not garbage collect (GC) old bundles until there are no more instances which use them.
- unbroken thread level (was called "latest version, but no such thing as forks"): All actions are 100% upward-compatible within an unbroken activity thread (when they aren't, you just break the thread). All actions open with the latest version in their unbroken thread, and "favorite" is an attribute of an unbroken thread: the latest version available is the one that opens. Broken activity threads are treated as different activities, as in bundle level.
- broken thread level (was called "no such thing as security"): As with level 2, but auto-updates cross breaks in activity threads. If you have both sides of a fork, whichever one you got second shows up as a separate activity.
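The three levels differ mainly in which installed bundle is used to open an existing action. A minimal sketch, assuming each bundle is described by its bundle ID, a version number and a digest identifying the exact executable bundle (all hypothetical field names). Level 3 takes the thread test as a parameter, and the sketch does not model the rule that the second-received side of a fork shows up as a separate activity.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class Bundle(NamedTuple):
    bundle_id: str
    version: int
    digest: str  # identifies the exact executable bundle


def bundle_for_action(
    created_with: Bundle,
    installed: Iterable[Bundle],
    level: int,
    same_thread: Optional[Callable[[Bundle, Bundle], bool]] = None,
) -> Optional[Bundle]:
    """Pick the installed bundle that should open an action (None if none fits).

    level 1: only the exact bundle that created the action
    level 2: newest installed bundle with the same bundle ID (unbroken thread)
    level 3: newest installed bundle in the same, possibly broken, thread
    """
    installed = list(installed)
    if level == 1:
        candidates = [b for b in installed if b.digest == created_with.digest]
    elif level == 2:
        candidates = [b for b in installed if b.bundle_id == created_with.bundle_id]
    elif level == 3:
        if same_thread is None:
            raise ValueError("level 3 needs a same_thread predicate")
        candidates = [b for b in installed if same_thread(created_with, b)]
    else:
        raise ValueError(f"unknown level: {level}")
    # Prefer the newest candidate; at level 1 all candidates are identical anyway.
    return max(candidates, key=lambda b: b.version, default=None)
```

At level 1 the sketch returns None once the exact bundle is gone, which is why that level cannot garbage-collect bundles that actions still use.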
Ways of modifying one of the main options:
- There could be some way to manually open an action with a different bundle. What is the UI to make this easier?
- cute extra possibility: when you update your favorite activity to a new version, the UI asks you "why did you do that?". If you give an answer, this answer is visible in your shared instances of that activity to those with lower versions. This is safer than advertising new versions with changelogs from the author, since this way by nature they come from friends/ known sources. Dubbed "user-generated changelogs" on IRC, which moniker prompted "<cjb> homunq: OH MY GOD STOP".
- "offloading garbage collection": The lower options above can easily lead to many actions on the same machine which refer to different bundles from the same thread. If disk space is short, it is possible to aggressively upload these to the school server, and download them as needed. This can lead to actions which do not work until you have connectivity. Note, however, that these actions would still be *visible* in the journal and that their object contents (the actual files) would still be accessible from there. Since we've all lived with just objects, no actions, until now (ca. 1987 MacOS "Switcher", and other "save workspace" gizmos, aside), I think this is acceptable.
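A minimal sketch of how an XO might sort bundles under this scheme, assuming threads are identified by bundle ID alone (unbroken threads) and that the journal can report which exact bundles its actions reference. Unreferenced, superseded bundles can simply be deleted; superseded bundles that actions still refer to are the candidates for uploading to the school server. The names, and the policy of always keeping the newest version and any favorites, are assumptions.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Bundle(NamedTuple):
    bundle_id: str  # thread identity (unbroken threads only in this sketch)
    version: int
    digest: str     # identifies the exact executable bundle


def plan_cleanup(
    installed: Iterable[Bundle],
    action_refs: Iterable[str],
    favorites: Iterable[str],
) -> Tuple[List[Bundle], List[Bundle]]:
    """Split superseded bundles into (deletable, offloadable).

    action_refs: digests of bundles referenced by journal actions.
    favorites: digests of bundles the user has marked as favorite.
    """
    installed = list(installed)
    referenced = set(action_refs)
    favorite = set(favorites)

    # The newest installed version of each thread is always kept.
    newest: Dict[str, Bundle] = {}
    for b in installed:
        if b.bundle_id not in newest or b.version > newest[b.bundle_id].version:
            newest[b.bundle_id] = b

    deletable, offloadable = [], []
    for b in installed:
        if b == newest[b.bundle_id] or b.digest in favorite:
            continue
        if b.digest in referenced:
            offloadable.append(b)  # actions still need it: upload, fetch on demand
        else:
            deletable.append(b)    # nothing refers to it any more
    return deletable, offloadable
```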
Ways of combining two of the main options
- "friendly reminders": Basic behavior is as one of the lower above options, but when you get a new bundle which, by one of the higher above options, would count as a different version of the same activity, there is some UI reminder (icon badges in the favorite view and on actions?) to update your favorite and your actions to the new version. Possibilities: bundle level with friendly reminders for unbroken threads (1 fr 2); bundle level with friendly reminders for broken threads (1 fr 3); unbroken thread level with friendly reminders for broken threads (2 fr 3).
- "Serious magic": keep usage statistics of all bundles on the school server, including who manually chooses which bundle version and what their choices were. If these statistics show a clear and stable preference for version Y over version X, tell all local XOs to make Y a default over X. Possibilities: 1 sm 2, 1 sm 3, 2 sm 3.
- "Serious local magic", where switching from X to Y is auto-defaulted the Nth time you do it manually on a given machine. Possibilities: 1 slm 2, 1 slm 3, 2 slm 3.
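"Serious local magic" is simple enough to sketch directly: count manual switches from X to Y on this machine and, once the count reaches N, make Y the default for X. The class name and the default threshold of 3 are arbitrary choices for illustration; the text does not fix N, and undoing a default after the user switches back is not modeled.

```python
from collections import Counter
from typing import Dict, Tuple


class LocalDefaults:
    """Auto-default X -> Y after the user manually switches N times (hypothetical)."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.switches: Counter = Counter()   # (from, to) -> manual switch count
        self.defaults: Dict[str, str] = {}   # from -> auto-default target

    def record_manual_switch(self, old: str, new: str) -> None:
        key: Tuple[str, str] = (old, new)
        self.switches[key] += 1
        if self.switches[key] >= self.threshold:
            self.defaults[old] = new

    def resolve(self, bundle: str) -> str:
        # Follow chains X -> Y -> Z, stopping if a cycle is detected.
        seen = set()
        while bundle in self.defaults and bundle not in seen:
            seen.add(bundle)
            bundle = self.defaults[bundle]
        return bundle
```

The school-server variant ("serious magic") would feed the same kind of counts from many machines into one decision instead of counting per machine.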
Not considered
- "Push" - type updates
- Blacklists of known trojans (this is only remotely useful if there is a limited store of keys usable for signing, which means some kind of SPKI, probably including checking with school servers to see if a key is from an XO).
- Key management, esp. revocation (same problem, mostly)
Votes and arguments
homunq: 2 fr 3 - that is, pretty aggressive about auto-updates.
- This allows for a decent level of garbage collection. One weakness that I do see with this option is its relatively strong assumption that later versions are better; I am open to proposals on how to weaken this assumption, though I do think it is good in 90% of the cases.
- I think that straight-out level 2 would be ignoring the real reasons that people fork (intentionally or unintentionally). By trying to legislate that "anything that might be a fork is a separate activity", it would create social pressures for poor key management, that would eventually cause some combination of: extra trojans and extra invisible forks (from compromised activity keys); and on the other hand, extra breaks in threads (from overzealously protected activity keys). The scenarios leading to these possibilities are left as an exercise for the reader. (One reason for forking is to create an experimental development branch. I think we should support that.) (Oh, and by the way, I think that support for forking versions should be in the Journal, too, for any document. <trac>6007</trac>)
My proposal
I made the proposals below before I got pushback on this. They essentially assume something like "2 fr 3". The level of complexity required in the format is essentially dependent on the highest level offered by the interface. That is, both 1fr2 and plain 2 would be equally simple bundle-format-wise, and simpler than my proposal.
My timeline
I have enough little bug fixes pending to occupy me for this week. I would like to have some kind of game plan starting next week. That could be just "this is too hard to decide now, just start enforcing the activity bundle format as it stands" but I would prefer to have a clear bundle format, or at least an agreement on the UI goals it should support, by that time.
Homunq 14:14, 9 April 2008 (EDT)
Proposals for update
- basic definition necessary for simplifying vocabulary in both of the proposals below: Each activity has a signing key which is used across versions as long as developer continuity allows. There is a predefined 1-to-1 mapping between bundle-id and signing key, so the two terms are interchangeable.
Desirable (this is a proposal for abstract properties, it does not define implementation):
- Auto-update for activities
- The dream is that if you have a better version of an activity than I do, and I share an instance of that activity with you, I will automatically start using the better version. But how do we define "better"?
- As Bitfrost states explicitly, trust in a person != trust in code from that person
- A proposed definition: something is better if it:
- is signed as "created by" with the same key / has the same bundle-id
- There should be a predefined mapping between signing keys and bundleid (!!)
- is signed as "tested, confirmed improvement" by same key (separate kind of signature. Note that further such "tested" signatures are possible but ignored by Sugar. This is essentially "passes smoke test".)
- is a newer version
- claims to be able to reopen all relevant instances currently in the journal (or, for easier checking, all the same instances)
- Old version was not marked as "preferred stable version" on this xo.
- happens when user explicitly marks it as favorite
- activity sets can mark "preferred stable version" or not independently of favorite.
- can be set by user, though pretty well-hidden
- Even if Sugar cannot decide what is "better", it should at least associate "different versions of the same activity" and be able to collapse them together in the activity list, the donut, the "start with" palette, and other activity searches.
- possibility of forks inevitable, merges may be slightly desirable too.
- Definition: two activities are "the same activity" if:
- same bundle id / signing key OR
- the one with a lower version has a bundle id / signing key which is explicitly referred to as an ancestor of the higher version one; AND the lower version number is lower than or equal to the "version of forking" claimed by the higher one.
- Note that this is not a proper identity relation: the same older activity can be "the same as" two nonidentical (forked) recent activities. In this case, the older activity could be listed twice, or there could be a way to choose which fork was "mainline", or the UI could show the branching somehow.
- Note also that this basic logic would work for forking versions of other files:
- each item has a UID which is stable across unproblematic version changes
- when doing something that could result in a fork (changing owners, opening an old version) change the UID but keep a reference to the old UID.
- Ability to resume an instance created by a different bundle-id/ signing key.
- This is inherently insecure on that data, should never be done implicitly.
- In my scheme, the ability to do this can be synonymous with "same activity" as above.
- this uses version number of activity as a proxy for instance file format.
- Bundles can be generated in a standard way
- Right now, everybody has their own way
- maintenance nightmare
- Sugar should be able to regenerate for sharing
- There are several future possible uses for a well-defined hash of a bundle
- For instance, sugar could embed it into signatures whenever signing with a user's private key at the request of that bundle
- Since Develop uses this file format, bundle generation is a part of development
- source of bugs if activity authors use a different method to build
- Translations can be added without breaking signatures on a bundle
- editing an activity and adding translations are separate steps, and somebody qualified to judge the quality of one might not be for the other.
- using the same key for both is a hassle, and security which causes hassles will be circumvented at the cost of security.
Homunq 10:16, 7 April 2008 (EDT)
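The proposed definition of "better" reads naturally as a single predicate. A minimal sketch, assuming signature checks have already been done elsewhere and are summarized as booleans, and using the proposal's idea that the version number stands in for instance file format: each bundle declares the oldest version whose instances it claims to reopen. `Candidate`, `min_reopen_version` and `is_better` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    bundle_id: str           # 1-to-1 with the activity's signing key
    version: int
    created_by_sig: bool     # valid "created by" signature with the activity key
    tested_sig: bool         # valid "tested, confirmed improvement" signature, same key
    min_reopen_version: int  # oldest version whose instances it claims to reopen


def is_better(
    new: Candidate,
    old: Candidate,
    journal_versions: Iterable[int],
    old_is_preferred_stable: bool,
) -> bool:
    """Return True if `new` should automatically replace `old`.

    journal_versions: versions that created this activity's journal instances.
    old_is_preferred_stable: whether `old` is pinned as the preferred stable
    version on this XO (e.g. because the user marked it as favorite).
    """
    return (
        new.bundle_id == old.bundle_id
        and new.created_by_sig
        and new.tested_sig
        and new.version > old.version
        and all(v >= new.min_reopen_version for v in journal_versions)
        and not old_is_preferred_stable
    )
```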
Signatures
This is a more-concrete proposal for an implementation. It does not include the same-activity material above.
- UTF-8 filenames are acceptable, but carriage returns in filenames are not.
- MANIFEST must not include itself, TRANSLATABLES, HASHES, or any files in SIGNATURES; it must not include directories, and its paths must not start with './'
- TRANSLATABLES is an optional file which uses the same format as .gitignore to indicate what files from MANIFEST should *not* be included in HASHES
- Obviously, this does introduce a security risk, as an unsigned TRANSLATABLES file could theoretically cause a buffer overflow (or, indeed, be deliberately run by malicious signed code). However, since the average python program is immune to buffer overflows, and since there is a separate (less-secure) signing mechanism for TRANSLATABLES files, this is considered acceptable.
- Activities do not have to be written in Python so this is no good argument. Also, why except "translatables" at all? --Bert 03:21, 3 April 2008 (EDT)
- L10n should not require bugging the original activity author - imagine having to sign multiple versions of dozens of languages for every activity version, besides the inconvenience it is a security risk because the underlying code could change between "pure-l10n" versions without the author realizing. Homunq 22:01, 6 April 2008 (EDT)
- Correct me if I'm misunderstanding the process, but if TRANSLATABLES can be modified by anybody, then they can use that to remove signature coverage from any/all files in the bundle. Surely TRANSLATABLES should be signed? --morgs 11:08, 9 April 2008 (EDT)
- HASHES is an auto-generated file with the first line '#HASH-VERSION: 1.0; HASH-FUNCTION:sha256'. Linefeeds are unix-style.
- The further lines of HASHES alternate; one line with a path as in MANIFEST, and the base64-encoded sha256 hash of the binary contents of the file on the line which follows. There is no limit to line length.
- Rather use the sha256 hexdigest, to use the same format as sha1sum, so that can be used on the command line to check HASHES. --morgs 11:08, 9 April 2008 (EDT)
- [The first two lines are TRANSLATABLES and its hash, if it exists]
- The rest of HASHES follows MANIFEST, in the same order, excluding those that match patterns in TRANSLATABLES.
- SIGNATURES is an optional directory with signatures
- to start out with, just have individual signatures of HASHES. (((REMOVED FROM PROPOSAL: Later it will have a list of maintainers, a list of developers, and the infrastructure to chain a history of who on which list. Maintainers sign the maintainers list (to leave the list, you need your own or everyone else's signature; to add, everyone currently on the list) and can individually grant or revoke developer's rights, developers just sign the code.)))
- files in TRANSLATABLES get per-file signatures by their authors
- the list of trusted translators can be broad.
- The signature format is a prefix, then the signature of (the prefix followed by the given data). This allows Sugar to attach metadata, such as date and activity hash, to a signature, in a way that is secure against malicious apps (but of course NOT so against malicious users with developer's keys).
- Proposal Added by User:Homunq about 3 April
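A minimal sketch of generating HASHES as specified above: the header line, then TRANSLATABLES and its hash if present, then alternating path and base64 sha256 lines in MANIFEST order, skipping files matched by TRANSLATABLES. Two assumptions: TRANSLATABLES patterns are approximated with Python's fnmatch, which does not implement full .gitignore semantics, and invalid MANIFEST entries raise an error rather than a warning. `build_hashes` is a hypothetical name.

```python
import base64
import fnmatch
import hashlib
import os
from typing import List

HEADER = "#HASH-VERSION: 1.0; HASH-FUNCTION:sha256"
RESERVED = {"MANIFEST", "TRANSLATABLES", "HASHES"}


def _b64_sha256(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(hashlib.sha256(f.read()).digest()).decode("ascii")


def _is_translatable(path: str, patterns: List[str]) -> bool:
    # Approximation of .gitignore matching: try the whole path and the basename.
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch.fnmatch(path, p) or fnmatch.fnmatch(name, p) for p in patterns)


def build_hashes(bundle_dir: str) -> str:
    # Read MANIFEST as bytes so a stray carriage return is not silently translated.
    with open(os.path.join(bundle_dir, "MANIFEST"), "rb") as f:
        manifest = [p for p in f.read().decode("utf-8").split("\n") if p]

    lines = [HEADER]
    patterns: List[str] = []
    translatables = os.path.join(bundle_dir, "TRANSLATABLES")
    if os.path.isfile(translatables):
        with open(translatables, encoding="utf-8") as f:
            patterns = [line.strip() for line in f
                        if line.strip() and not line.startswith("#")]
        lines += ["TRANSLATABLES", _b64_sha256(translatables)]

    for path in manifest:
        full = os.path.join(bundle_dir, path)
        # Enforce the MANIFEST rules listed above.
        if (path in RESERVED or path.startswith(("SIGNATURES/", "./"))
                or "\r" in path or os.path.isdir(full)):
            raise ValueError(f"invalid MANIFEST entry: {path!r}")
        if _is_translatable(path, patterns):
            continue
        lines += [path, _b64_sha256(full)]

    return "\n".join(lines) + "\n"  # unix-style linefeeds
```

If morgs's hexdigest suggestion were adopted instead, each entry would become one line of `hashlib.sha256(data).hexdigest()`, two spaces and the path, which is the layout `sha256sum -c` checks.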
comments
While this is a well written proposal, it does try to move the activity bundles towards becoming a package.
- Note that MANIFEST is not required and is not used.
- There is no consistency check on the OLPC to check directory contents against a HASH file. This type of consistency lives in the GIT system during development and building, the package management system during distribution and Bitfrost once installed. I'm unclear what it adds by being in the .XO file.
- What specific benefit are you trying to achieve? Are you trying to reliably match built and installed activities against the GIT?
CharlesMerriam 13:11, 5 April 2008 (EDT)
- This proposal doesn't address the actual signatures. I have some comments there:
- Either you need a key per user (in which case there is already owner.key in ~/.sugar/default) if keys are not to be shared, or you need a key per activity. (I maintain 4 activities. If I co-maintain one of them I want to share a unique key for that activity with the co-maintainer, so they can't sign my other activities.)
- Assuming the latter from IRC conversation (#olpc-meeting 8 April 2008) you need to store multiple private keys where Develop can access them to sign bundles. Although Develop may store these unencrypted (thus requiring no password to sign), I as a developer would want encrypted keys, and I imagine most upstream authors would as well.
- Developers should not need to use Develop - there should be command line tools for those that want them. You therefore need tools to generate keys, view them, sign a bundle, verify a bundle etc.
- My assumption is that bundlebuilder (either inside or outside of sugar - no dependencies on sugar) would provide most or all of these functions. Develop would be only one user of bundlebuilder. I think that all introspection should be moved from activitybundle to bundlebuilder - activitybundle should only be responsible for actual installation.
- Keys almost always need some identity bound with them - at least so you can differentiate your own keys - again, assuming multiple keys. Since the public key would be included with the signature, there should be a way (at least with abovementioned command line tool) to view the signature and see the identity. I think this identity could be any of: A person's name/email, A generic name ("Chat maintainer key"), or completely anonymous.
- Good point.
- IMO the public key should be distributed self-signed with the signature, so that the public key itself can be verified. With this identity, you end up with a format like X.509 or PGP. It's best to use a well known format - in addition to tried and tested algorithms etc.
- Agreed. (I did look at jar and rpm formats for bundle format itself, decided there were enough differences to justify a separate format).
- What I'm leading up to is that I think GPG may be a good fit for the key management (keyrings) as well as the signature format. All the command line options would be available.
- --morgs 11:08, 9 April 2008 (EDT)
- I agree re: keyrings. I have some ideas about signature format which are not in GPG... more on that later.
- Homunq 15:00, 9 April 2008 (EDT)
I'm apparently a bugbear of very little brain. Could you please explain? 1.1 "..uses the bundle format natively", and 1.2 a bundle is "a specific set of bytes". So what's a bundle? How does it relate to the common usage, such as in the 2008_Debate_of_Build_and_Release#Naming_Scheme_Proposal
- Bundle, as in activity bundles. That link seems to be talking about bundles of bundles.
1.1 What is this spec you speak of? Most people use some form of setup.py which makes something almost, but not quite, entirely unlike Python Eggs' setup.py.
- ouch, you're right, I need to make a more prominent link to activity bundles. The generic setup.py uses bundlebuilder, which is the de facto spec. Many people use it, some abuse it, and some use their own made-up code. Homunq 12:02, 10 April 2008 (EDT)
1.2 You want to have separate content signing for a subset of a package because of translations? This still doesn't address that many people will only want to automatically pitch all but one or two language translations on installation, for space. How are you addressing this issue? Language translations generally accrete, so simply pulling the translations separately from the Activity Bundle should work.
- For the bundle format itself, I am not addressing that issue. That is an installation issue. By having the translations unsigned, though, I am making it *possible to* address that issue.
The branch thing is a bit weird. Many projects make branches all the time. Branches are created by feature and merged in when the feature quality is on par with the mainline.
- Good point.
Without going further, I keep falling over the scoping problem. On one side, you want Develop to be the build manager and packager. On the other?
- I don't understand you. I want bundlebuilder to do the work of building and package management, and I want Develop to use that functionality. I want bundlebuilder to have no dependencies on sugar, so that people can just put it in their pythonpath however they want and then setup.py will work on their non-xo box. I do not see any "other side".
BTW, why no talk page? CharlesMerriam 18:49, 9 April 2008 (EDT)
- No reason, you can talk there if you want. I just copied this *from* a talk page, so that's why comments ended up here. Homunq 12:02, 10 April 2008 (EDT)
Related proposals
- neuralis / ivan krstic: a proposal on dealing with default preferences / preference versions. Interesting, but in my view it does not belong in the bundle spec; it is an installation issue.
- bemasc / Benjamin M. Schwartz: a proposal which mostly deals with key management. [1]. m_stone, on the other hand, suggests we should model this aspect after a reduced SPKI and I agree that the ideal system would use that as inspiration. However, to reduce scope, we may want to leave this aspect for later, and assume one eternal special-purpose key per activity/developer-group pair for now.
- m_stone / Michael Stone: fragmented commentary which ought to be merged into the present document.