Plan of Record-2008/Draft 2
This is the second draft of my thoughts on how the OLPC technical community should decide what software to try to build and release in the next several months. It is based on substantial feedback on the original August planning notes and on several subsequent conversations with people like Wad, Kim, and SJ. More conversations will follow soon, hopefully with the Functional Directors.
Foreword
Think of a toddler struggling to learn to walk, to talk, and to think. Like that toddler, OLPC's software mixes flashes of talent and insight with a charming toothless grin and a general unsteadiness on its feet. Picture the poor parents of that software toddler, desperate for a few hours of peace in each day. Picture their search for time set apart from the daily struggle to quit bad habits for the sake of the child, for resources to sustain, enliven, and enlarge the family, and for the wisdom to fit old (technical) dreams to the new (support) reality. Picture us.
We have many "unsatisfied technical dreams" in every category of software development including real-time collaboration, long-term collaboration, battery life, software isolation, "undoability", explorability, hackability, and cost to deploy. Consequently, we face a wrenching question: what should we hope to achieve in our next few releases without neglecting our child or further straining the frayed strands binding us together?
Overview and Goals
This document is a statement of the "who, what, where, when, why, and how" of the development of the OLPC software platform as viewed from April 2008. It is being written to address concerns, aired in several fora (the development IRC channels, the technical mailing lists, the OLPC News forum, and internal conversations), that OLPC's software development effort lacks focus, direction, and predictability.
Its intended scope is generally the next eight months of development and particularly the next four months. It has four intended audiences; namely, the OLPC Functional Directors who will approve or amend it, the community of people who will carry out its directives (both OLPC employees and members at large), the people who use software produced by the OLPC software community, and the general public.
In short, this document must:
- represent the diversity of views within the technical community about the risks and opportunities afforded by the recognized paths forward,
- state and justify one primary path that we intend to take, and
- explain our fallback plan if we encounter insurmountable roadblocks along our chosen path.
In its final form, it will also explain what feedback we have received from sales & deployment. In its draft form, it will propose a reasonable deadline for revisions based on new feedback from those teams.
"Who": Dramatis Personae
The following table gives a rough overview of the people and talents that the OLPC technical community can readily call upon in order to fulfill its goals.
Legend: (c) community member, (p) partner, (o) employee/contractor
- Support: Carol Lerché (c), Gary C. Martin (c), John Gilmore (c), Sandy Culver (c), Adam Holt (o), Emily Smith (o)
- Organization: Charles Merriam (c), Aaron Kaplan (c), Kim Quirk (o), Jim Gettys (o), Greg de Koenigsberg (p), Sharon Lally (o), Mike Fletcher (c), Gustavo Mariotto (p) (also Michael and SJ at need)
- Deployment: Walter Bender (c), Ivan Krstić (c), Greg Smith (c), Bryan Berry (c), Arjun Sarwal (o), Carla Gomez Monroy (o), Edgar Ceballo (p), Habib Khan (o), Fiorella Haim (p), Daniel G.S. (p), Javier Rodriguez (c)
- Sales: Nicholas Negroponte (o), Jeff Mandell (o), Walter de Brouwer (o), Matt Keller (o)
- Infrastructure: Noah Kantrowitz (c), Titus Brown (c), Dennis Gilmore (o), Henry Hardy (o), Bernardo Innocenti (o) (XO-chat? Collabora?)
- User-facing Software: Benjamin Schwartz (c), Benjamin Mako Hill (c), Jameson Chema Quinn (c), Marco Pesenti Gritti (p), Tomeu Vizoso (o), Simon Schampijer (o), Eben Eliason (o), Sayamindu Dasgupta (o), SJ Klein (o), Bert Freudenberg (c), Richard Boulanger (c)
- System Software: Dave Woodhouse (p), Jordan Crouse (p), Chris Ball (o), Scott Ananian (o), Mitch Bradley (o), Michael Stone (o), Andres Salomon (o)
- Network: James Cameron (c), Marcus Leech (p), Michail Bletsas (o), Poly Ypodimatopolous (c), Ricardo Carrano (o)
- Hardware: Richard Smith (o), V. Michael Bove (o), John Watlington (o)
- Collaboration: Dafydd Harries (o), Guillaume Desmottes (o), Sjoerd Simons (o), Robert McQueen (o), Morgan Collett (o)
- School Server: Martin Langhoff (o)
- Quality Assurance: sporadic volunteer efforts by members of the technical community; episodic support from interns
Release Management Thoughts
- Pay close attention to the costs of releasing.
  - When balancing proposed release dates, we must account both for the costs of developing software and for the costs of releasing the software we develop. (To date, the lack of adequate measurement and release resources has forced us to rob Peter in order to pay Paul.)
  - (Kim: we might improve the accuracy of our forecasts by waiting to propose a release date until we have formed a change-controlled release stream with a fixed set of features.)
- To date, OLPC has made "omnibus" releases which change many use cases or qualities simultaneously, rather than releases which change only one use case or quality at a time.
  - This approach has a serious defect: it bottlenecks on QA resources which, as §2 demonstrates, are extremely scarce. Exacerbating the defect, interacting changes are harder to QA than non-interacting changes because the number of control-flow paths that must be tested to produce good release notes grows combinatorially (see the sketch after this list). As a result, releases are expensive to make and are produced infrequently, so people who want changes must wait a long time to receive them.
  - My conclusion is that by making smaller, more focused releases, we may increase "improvement throughput" simply by fitting our releases to the available QA and release-management resources.
- Countries have substantial investments in existing builds.
  - Perhaps the release cycle should be decoupled from the deployment cycle, so that countries have more freedom to choose and to request builds appropriate to their needs (i.e. balancing sunk costs in training and infrastructure against desires for mature "warhorse" builds and for "fresh" builds)?
  - (Would this also address Walter's comment that countries would benefit from more lead time in which to conduct their own training and testing?)
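To make the combinatorial point concrete, here is a back-of-the-envelope sketch in Python. The change counts and the pairwise-interaction assumption are illustrative guesses, not measurements:

    def qa_surface(n_changes):
        """Rough lower bound on test cases for a release: one test per
        change, plus one per pair of potentially interacting changes."""
        return n_changes + n_changes * (n_changes - 1) // 2

    # One omnibus release carrying 12 interacting changes:
    print(qa_surface(12))      # 12 + 66 = 78 test cases

    # Four incremental releases of 3 changes each:
    print(4 * qa_surface(3))   # 4 * (3 + 3) = 24 test cases

The absolute numbers are invented; the shape is the point. The interaction term grows quadratically with release size, so splitting one large release into several small ones shrinks the total QA bill even though the same changes ship.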
"What": Labor and Software Quality
OLPC's recent software development effort has been divided among more than ten different technical projects (some of which have been paired below because they are in ongoing competition for scarce resources):
[Diagram: the ten projects below, arranged around the layers of the stack they touch (sugar, system, kernel, and server): Compatibility, Power management, Isolation, Networking, Stability, Collaboration, Performance, Datastore, UI Interaction, and XO/XS Coordination.]
Nicholas, Martin, and Wad strongly argue that we should be working toward "stability"; however, stability is a notoriously tricky concept to apply to software. I attribute this to the fact that the word's fundamental sense is so irredeemably physical: "the power of remaining erect; freedom from liability to fall or be overthrown" (OED).
After stumbling on this ambiguity in many conversations, I wish to propose a definition that I believe describes the sort of stability which we desire:
- "We desire: local convergence of what happened, what the user wanted to happen, and what the user thinks happened."
This is a good metric against which to judge proposed changes and resource allocations because it accurately describes some important failings of our present software, for example:
- the failure of an activity launch is bad because it violates the user's immediate intention, and the pulsing icon that remains is worse because it misdirects the user
- invisible memory leaks are bad both because they separate the user's belief about the system state from its actual state and because they lead to violations of the user's intentions
- activity sharing failure is bad both because it is counter to the user's intention and because it implies that the system was wrong when it led the user to believe that activity sharing was easy and reliable
It is also good because it is easy to reason about locally and because different areas of our software can be incrementally and independently changed to optimize it.
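To illustrate how this metric could be made actionable in triage, here is a hypothetical Python sketch; the bug fields and the scoring rule are my invention, not an existing tool:

    # Score a bug by counting the pairwise divergences among "what
    # happened", "what the user wanted", and "what the user thinks
    # happened". The field names below are invented for illustration.
    LEGS = (
        "violates_intent",     # what happened != what the user wanted
        "misleads_user",       # what the user believes != what happened
        "leaves_misdirection", # the UI keeps asserting the wrong state
    )

    def convergence_violations(bug):
        """Return 0-3: how many legs of the convergence this bug breaks."""
        return sum(1 for leg in LEGS if bug.get(leg, False))

    launch_failure = {
        "violates_intent": True,     # the activity never started
        "misleads_user": True,       # the user cannot tell it failed
        "leaves_misdirection": True, # the pulsing icon claims progress
    }
    print(convergence_violations(launch_failure))  # 3: triage this first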
One important limitation of this metric is that it tends to devalue innovation and learning insofar as it places no innate value on changes to the set of use cases supported by the system or the user.
- (NB: SJ feels that this claim is inaccurate because, to paraphrase, I am willfully neglecting use cases, such as system-exploration experiences, that specifically optimize the metric.)
Question: Can someone please encapsulate our remaining notions of software quality in an equally concise and actionable principle?
"How": Several Paths Forward
So far, I've argued that we can reliably deliver "progress" at higher throughput and lower latency by making smaller and faster releases, in particular by shrinking the critical sections of the release process, starting with the time required to perform QA. I have also suggested one concise standard against which we might agree to judge claims that "progress" has been made.
I have not yet succeeded in articulating positive claims about what we should work on in order to satisfy these goals; fortunately, Greg Smith has outlined an algorithm for deciding this question in some detail.
Perhaps we should simply
- begin executing his algorithm in order to write this section,
- prepare our first "incremental" (rather than "omnibus") release with, say, Tomeu's "faster" work, and
- try to queue up some more release-worthy changes by closing bugs deemed important according to the "convergence" standard I outlined in §4? (A sketch of such a queue follows.)
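A minimal sketch of such a queue, assuming the hypothetical convergence scores sketched in §4; the candidate list and batch size are illustrative:

    def plan_releases(candidates, batch_size=2):
        """Order (change, score) pairs by convergence score, then cut
        them into small, single-focus release batches."""
        ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        return [ranked[i:i + batch_size]
                for i in range(0, len(ranked), batch_size)]

    candidates = [("activity-launch failure", 3),
                  ("invisible memory leak", 2),
                  ("sharing failure", 2),
                  ("Tomeu's 'faster' work", 2)]
    for n, batch in enumerate(plan_releases(candidates), 1):
        print("release", n, ":", [name for name, _ in batch])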
Afterword: Strategy or Tactics?
The analysis presented above is the best I have yet developed, but I am somewhat uncomfortable with the quality of the information that justifies it and with the dearth of long-term strategic insight it contains. I regard it as an essentially tactical "survival" plan: it takes us a few steps in the right direction, but it fails to change the balance of power between our resources and the obstacles which face us. In particular, I am unpersuaded that it addresses the long-term sustainability goals of our (software) enterprise by, for example, teaching our users that they too bear responsibility for creating the software they need and for teaching others what they have learned.