Shared revisions

From OLPC
Jump to: navigation, search

There are a number of different ways to share the state of an activity with a group of collaborators / editors.

One is an unordered set of actions, which don't depend on one another. Any action can be undone independently of the others. (imagine a group of people adding bits of colored light to a large, prismatic collage; you can add or subtract light and color independent of all other lights/colors).

Another is a set of causal actions, which are partially ordered and do depend on one another. For instance a Write or Paint session, some of whose actions change actions that have come before.

There is also a punctuated equilibrium model which is fairly general...

punctuated equilibrium

Punctuated equilibrium supports a set of actions:

 sync
 undo to last sync
 soft-undo last action [of mine]
 soft-undo last action [of anyone's]
 save state
 fork
 soft-undo last action (of other) would be the last action before the sync, right?
 attempt merge [can happen while forked]
 push/suggest merge to others
<homunq> what is difference between merge and sync?
<_sj_> in between syncs, you have soft updates and attempted undo's
<_sj_> if it's obvious how to undo and there is nothing else that depends on the last action
<_sj_> you can undo it, whoever else did it; 
<_sj_> you can also try to undo your own action even if other changes have occurred elsewhere, say if you are working in one part of a shared system and other changes are independent of it
<_sj_> (it = the part you are working on)
<_sj_> sync is basically a 'save state + push merge' where the merge seems automatic
<_sj_> you can also fork and make changes on your own, no longer accepting changes made by others, and later come back and attempt/suggest a merge to those being shared with
<_sj_> you need UI indicators and elements that let you know when you are forked / out of sync
<_sj_> and when someone is pushing a sync
<_sj_> and that show you the expected diff b/t your current state and a suggested new state

punctuated soup (homunq's proposal)

A proposal for how to use a set of primitives to give self-syncing power to arbitrary data structures. For such generality to apply, the primitives must be reimplemented for each data structure. Generic primitives would give "one side wins" behavior which would be suboptimal -- usable for synchronous collaboration with good communications, but pretty poor for asynchronous or unreliable situations.

required primitives

  • underlying state data
  • diffs: a two-way map from a pair of (old data state, new data state) to (old data state, diff)
  • diffs will generally be reversible (inverse of A is ~A); if not, you can store both directions for a reversible diff.
  • (optional) the ability to decompose a diff into disconnected subsets
  • merges: a map from (old data state, diff1, diff2) to (compatible changes, subdiff1, subdiff2) where (compatible changes) is independent of order of diffs, subdiff1 and subdiff2 are "minimal conflicting subsets" of diff1 and diff2. Note that a stupid algorithm is always available when all else fails: the identity mapping.
  • When a merge is applied, there is a rule to decide which one is given precedence. If diff1 wins, the new state is the result of (compatible changes, subdiff1), but the diff from that to (compatible changes, subdiff2) is also computed and placed on the redo (not undo) "stack".
  • If
        state1   ---diffB-->   state3
        \     ^                ^
         \     \              /
        diffA  diffD         diffC
           \     \          /
            V     \        /
             ------===state2
and (state2, diffC, diffD) gives (something, <empty>, <empty>), then (state1, diffA, diffB) should give (state2, <empty>, diffC).

how this gives self-syncing

With the above and causal ordering, it is possible to make a data structure which synchronizes itself, follows a generalized undo semantics, and may have desirable efficiency.

All participants in an activity have a "rank", and the highest ranking one visible is treated as the current hub. Assume that all instances start out aware that they are synchronized in "best benchmark" state A, which to start out with is also the "latest confirmed benchmark" between the hub and any other instance. Any actions by a non-hub instance are sent to the hub as a single compound diff from the instances latest possible benchmark with the hub. The hub performs a merge with any other actions (or compunded series of actions) posterior to that benchmark, and then broadcasts the diff from that benchmark to the new state, which becomes the new best benchmark. (...would be easier to code than to write in English. Sharing enough knowledge about who knows which benchmark so that you can, if space is short, forget benchmarks which are known to be superceded by something for every pair of instances; starting out by sending diffs from the latest possibly-shared-knowledge benchmark and falling back to working from an older, confirmed-common-knowledge benchmark...)

If an instance gets a broadcast merge result after it has performed new changes, it merges the new changes with the result, giving precedence to the result. The diff from the result to the merge is now sent back to the hub as a new action, superceding more-intelligent merges that the hub may have done meantime (this can only happen when communication is one-way for some reason). Again, changes that are inapplicable due to conflict are pushed on the local redo stack.

The redo stack is cleared whenever the local instance performs a new, incompatible action. However, a new action or merge result newdiff coming in from another instance will cause the redo to be recalculated from the current state (that is, the diff to the merge of the redo).

So, at any time, any instance will have a set of document histories and/or the equivalent chain of diffs leading to its current state. It can, if desired, forget (and/or compound bracketing diffs for) histories which are not possibly fork points (last common benchmarks) between it and another instance. Undo and redo are both stacks, but you can try to (un)apply them out-of-order: if you have 1 -A-> 2 -B-> 3 (now), then trying to undo A without undoing B gives(2, B, ~A) -- the merge between B and ~A, with B given precedence. This is treated as a new action and placed on the stack. I can imagine some cool UIs for exploiting this possibility.

comparisons with other proposals

bemasc's dObject proposal (shared add-only sets, dicts, and strings)

  • all the sharing magic happens behind the scenes, and data consumers are alerted by callbacks of changes due to sharing.
  • you get super-undo functionality for free, too.
  • the same system works for synchronous and asynchronous collaboration.

This is different from bemasc's proposal, as I understand it, because:

  • arbitrary data structures can be made to self-sync.
  • For text data, you can drop in different existing three-way merge algorithms to change the overall behavior.
  • Some negotiation is occasionally required to choose a common benchmark for a merge. If communication is good, this should work on the first try. Results can depend, even dramatically, on which basis is chosen; whereas bemasc's idea is deterministic, once all messages percolate.
  • In cases where communication is intermittent between more than two participants, and where the merge algorithm is pathological for some reason, this could lead to changes getting lost or applied twice.
  • Except in cases with many participants and poor communication, this scheme allows a lot more space optimization by forgetting intermediate steps. This also allows the diff algorithm to be optimized for truly 3-way diffs, instead of trying to combine entire step-by-step series of actions. (In this sense, bemasc's algorithm is akin to the no-forgetting special case of mine).

sj's punctuated equilibrium proposal

As far as I can see, my proposal is a bit (not much) more specific about how things are achieved, and sj's unstated specific ideas may well differ from (and be better than) mine. But the main difference I can see now is that sj is looking at merges as optional: you may want to stay in a forked state for some temporary but significant time. I can't remember whether he thinks of that option as manual for the user, or programmatically decided. I think that, if you ever want to be intentionally but temporarily forked, it would be in the form of a "work offline" option on a document, not in the form of manually choosing "sync" whenever you remembered to and felt like it.

Homunq 13:03, 20 November 2008 (UTC) 05:20, 20 November 2008 (UTC)