User:Mstone/Commentaries/Releases 1

From OLPC
Revision as of 09:20, 11 April 2008 by 89.24.24.58 (talk) (Corrected my name)

Chris Ball, Marco Pesenti Gritti, Michael Stone, Tomeu Vizoso, and several others conducted a 2-hour planning session this morning. I've created a transcript of that discussion, reproduced below. If you're interested, please review the questions that were raised and contribute your thoughts (preferably inline in this document).

The end goal of this effort is a convincing written statement of where we want to go in the next four months, why we want to go there, and how we intend to get there.

Questions

What are our goals for the next four months?

  • Some advocate trying to "fix the network."
  • Some advocate trying to "stabilize the UI, particularly the Journal and the Datastore."
  • Some think we should deliver working power management, even if we can only do so in limited circumstances.

Should we divide our effort among several goals or not?

  • We could pour labor into our most critical goal until we reach the point of diminishing returns, then let the remainder spill over to the second goal, and so on.
  • Alternately, we can try to make sure that our top three goals all receive enough effort to achieve limited gains.
  • Alternately, we can try to make sure that all of our subsystems receive at least minimal maintenance.

How can we get better information with which to make our decision?

The team should go to a pilot site and see for themselves how the XOs and Sugar are being used. There is a limit to how much information the deployment teams can communicate back to the developers. Direct customer/developer interaction is an essential aspect of agile software development. And we are all agile, right? Berrybw 22:48, 10 April 2008 (EDT)


Annotated Transcript

Initial Unhappiness

 m_stone> I take it that you guys are concerned that you'll have sugar in a
          releasable state say around mid-June or early July? and that you'll
          be unable to make big changes for the remainder of the 1-1.5 months
          until the release?
   tomeu> well, I think the issue is not releasing something like the last
          releases (something that more or less works), but having a better
          API, based on components that are more predictable
   tomeu> "something that more or less works" == "something that passes the
          minimal QA work that has been done but that shows critical problems
          after being deployed" - we are not happy with how new features have
          been rolled out


 m_stone> tomeu: keep explaining, please. I really want to understand what
          things look like from your vantage point. (incidentally, that sounds
          more like a QA failure rather than a software team failure to me.)


   tomeu> m_stone: building on an orphaned component like the DS is not very
          nice, either. well, these are project issues that need to be solved -
          we cannot blame QA because it has been composed of interns
     cjb> and is currently composed of no-one :)
   tomeu> we already complained last summer about interns leaving just when
          they started being useful. Also, I'm not happy with the parts of the
          sugar API that I have contributed most, either




Present Unhappiness

 m_stone> tomeu: help me understand how this affects the work you think we
          should undertake to do for August 1 or the next release beyond.
   tomeu> m_stone: we want to have a software architecture more resilient to
          bugs, more maintainable, and want to give activity authors a better
          API. none of that counts as new features from the POV of the project
          manager, I think. we just need to dedicate time to it ;). Then
          discuss, discuss, discuss, then implement, deploy.


   tomeu> so well, we have a problem deciding what to spend time on - every
          few months we are promised a new DS. the activity API depends on that
          so it's not evolving, and it really sucks because it depends on
          important implementation details like storing entries with delta
          compression. so well, we would like to plan things better, and spend
          effort on invisible features


 m_stone> tomeu: cscott, cjb, and I spent several minutes yesterday debating
          whether cscott should try to fix networking things or DS things. (and
          consequently whether cjb should fix networking things or power
          things). (and whether the kernel folks should fix power things or
          ???). (and whether I should concentrate on collaboration or
          stability/compatibility things). we have two plans and we haven't
          figured out yet which one is better.


   tomeu> m_stone: right, that's also our problem. I was supposed to work on
          the journal, but at different points had to choose between sugar,
          read, browse, ds, performance, etc.


How should we allocate effort?

 m_stone> marcopg_: plan a) is cscott, cjb, & kernel folks on networking,
          mstone & collabora on collaboration, and sugar on ??? where ??? is
          probably the as yet unspecified "UI features that are already in the
          bag + any accessible stability". the other plan is cscott on DS, cjb
          on limited cases of power, kernel folks on power, mstone on ???, and
          sugar on ??? where ??? is the same as above.
     cjb> I think there's a third plan of cscott, mstone and collabora on
          networking, cjb and kernel on power, sugar on ???.
marcopg_> I'm not sure I understand why we are going through the two extremes
          about networking
marcopg_> we certainly need someone on it, but I doubt putting the whole team
          on it will help


 m_stone> I'll try to answer marcopg's question and I'll try to explain the
          other flaws here. first flaw: someone has to manage the release. at
          present, I'm the default candidate (though feel free to suggest
          others). that's not immediately a full time job. but it will turn
          into one and we can't forget it. also, effort will need to shift in
          the second half of the cycle toward making it happen.  i.e. the
          release manager will need to impose on erikos, or cjb, or... :)


 m_stone> second problem: there is lots of debate about how many people can
          profitably work on networking, on power, on collaboration, on UI, on
          "stability", or on "compatibility". in part, debate exists because
          there are different approaches to each of these problems.


Power Management

 m_stone> i.e. one way to do power is to make it work in limited circumstances.
          I think of this as the "low-hanging fruit" approach. It mainly
          requires cjb + some kernel, we think.
marcopg_> some kernel == not full time kernel person?
     cjb> Yeah.  I'd be surprised if we can't come up with an acceptable
          limited/incremental use of power that takes less than a day to
          implement.  I'll try to write something about this. For example, we
          don't do power management if you're currently connected to a jabber
          server; otherwise we do.
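cjb's example rule is a single policy check. A minimal sketch in Python, assuming the system can report whether a jabber connection is open; `SystemState` and `may_suspend` are hypothetical names for illustration, and how the connection state would actually be detected is left out.

```python
from dataclasses import dataclass


@dataclass
class SystemState:
    """Hypothetical snapshot of the state the limited policy inspects."""
    jabber_connected: bool


def may_suspend(state: SystemState) -> bool:
    """Limited power policy from the discussion: allow automatic
    suspend only when no jabber server connection is open."""
    return not state.jabber_connected


if __name__ == "__main__":
    print(may_suspend(SystemState(jabber_connected=False)))
    print(may_suspend(SystemState(jabber_connected=True)))
```

Keeping the policy as one pure function makes it easy to widen later (for example, adding further conditions) without touching whatever code actually triggers the suspend.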


 m_stone> the other way to deal with power (over a longer time frame) is to
          comprehend the networking stack (so that we know what timeouts &
          wakeups are needed), to fix the kernel to make those timeouts &
          wakeups occur, then to do more userland power management. similarly
          for networking.
     cjb> (There are still the timeouts.  It's hard to know how to tackle them,
          when no particular misbehavior has ever been reported due to them.)


Networking

 m_stone> one way to approach networking is to write down a believable
          networking architecture, then to follow through. (where a believable
          networking architecture would contain a small number of protocols
          that "stretch" to cover all of our present use cases). the other way
          is to try to follow the critical path toward one use case (e.g.
          reliable read sharing on a school server wifi with K/N laptops) and
          to fix every damn bug you find along the way until it works or until
          you recognize defeat. the latter might get you a single working use
          case (or maybe 3), but it's not going to get you reliability,
          stability, or leverage. (they also take different amounts of time)


Focus or Divide and Conquer?

   morgs> Are the approaches mutually exclusive from a resource point of view?
     cjb> morgs: my third approach does a fairly even split of three groups
          into three things. we can argue that this *sort* of approach has been
          the main problem with OLPC software development and has led to power
          management that doesn't really work, a mesh that doesn't really work,
          collaboration that doesn't really work, and a datastore that doesn't
          really work.
 m_stone> and, as cjb alluded, we presently feel like we have N demos rather
          than 1 solid product. some people believe this is because we sent N
          people in N+2 directions rather than either sending 10N people in N+2
          directions or sending N people in, say, 3 directions


marcopg_> I don't think that's a matter of people; it's a matter of too many
          ambitious goals


   tomeu> I don't think that having twice the number of people would have been
          a waste of resources - in many projects it is, but in our case...
 m_stone> marcopg_: the complete absence of a QA team definitely made itself
          felt here... (and the absence of software attacking the QA problem,
          e.g. test suites)
   tomeu> the absence of a DS maintainer, too


   tomeu> the problem is that it depends on what we aim to do and, regarding
          sugar, we just don't know right now... perhaps later today


marcopg_> let me rephrase: I don't think moving everyone to work on the network
          will help the network a lot, at least on a 4-month schedule.
          gradually adding people to the current team would be much more
          sensible, imho


 m_stone> marcopg_: a) I think scott proposes to make sure that every aspect of
          networking has at least one person working on it; not to make
          everyone work on it regardless of whether they are doing something
          useful
 m_stone> marcopg_: b) why, precisely, do you anticipate that (a) would fail?
          (or would be too expensive to pursue?)


marcopg_> because it takes time for people to figure out how to work together
          in a productive way. if we have uncovered areas then we should cover
          them. which aspects of networking are we not covering right now?


   tomeu> m_stone: do you think that many of our orphaned or under-resourced
          areas can only be tackled efficiently by people already on the team?
          (in this case, without wasting most of the new resources' potential)
 m_stone> depends what you mean by "efficiently"; specifically: I think that we
          have individuals who have the experience required to 'begin labor' on
          many areas but I don't feel that they'll have the ability to close
          noticeable numbers of bugs


   tomeu> if we talk about how to distribute resources, we need to characterize
          every area by how high its entry point is. some areas will require
          a longer effort to start being productive, some less


OLE Nepal: Journal & Datastore

 BryanWB> hey guys, if you don't mind i have a few ideas of where OLPC should
          put its resources. I think OLPC should focus on the datastore and
          Sugar over the next several months, whether that means putting all
          or most of your human resources towards that. W/ the
          datastore/Journal being most important. the networking and power work
          well enough but it is fairly difficult to use the Journal to find
          stuff. Also, in my week's worth of QA, Sugar just does unexpected
          stuff. Over the last 2 weeks of working w/ os699, then 702, now 703,
          I have had the most problems w/ the Journal/datastore then Sugar
          weirdness.
marcopg_> I tend to agree with BryanWB, simply because the Journal is more of a
          base/fundamental use case than network sharing; i.e. losing your work
          is more critical than not being able to share an activity


     cjb> I think my counter-argument would be that if you live somewhere with
          ample power, have a wireless AP and only one XO, of course you're
          going to say that networking and power work fine.  :)
 BryanWB> cjb: we are using an AP, not an active antenna
     cjb> BryanWB: Yes.  That's why I say it's natural that you would think it
          works fine. What *doesn't* work is the mesh (non-AP) layer.
 m_stone> cjb: rather, our choice of protocols and implementations is
          incompatible with the mesh. (I speak relationally because I suspect
          that we are more likely to change the protocols & implementations
          than we are to change the firmware)


 BryanWB> cjb: I know, but that doesn't work anyway when you've got 70 XO's at
          one school, AFAIK. mesh is less important than basic usability of
          datastore
   tomeu> BryanWB: that's another problem we have, knowing the very different
          problems of every deployment location. I think peru's next
          deployments will use mainly mesh, but I'm not sure.
 BryanWB> tomeu: they can buy a small AP and power backup for Peruvian schools.
          Power backup is pretty ubiquitous anywhere they have electricity and
          load-shedding. ** at least in south and east asia **  don't know
          south america
     cjb> BryanWB: what about when the kids go home from school, though?


  erikos> i think both mesh and datastore are important use cases and should
          both be addressed - i hope that for august we can do both - at least
          parts. we once started to provide mesh usability - and should
          continue to push for it; i think the statement that the datastore
          should work is beyond question
     cjb> the problem is that no-one is confident we can get there by pushing,
          rather than by throwing out what we have and redesigning. I guess
          then we come to a question of whether the datastore can also be made
          to work properly incrementally.


 BryanWB> cjb: it needs to work at school first, that's where the other kids
          are and they spend 8+ hours per day. Home should be a lower priority
     cjb> since cscott's time is one of the main variables here, and he'd be
          doing olpcfs.
 m_stone> cjb: well, we might be able to have two people work on it. for
          example, one person could start implementing the testing!
  erikos> cjb: sure - i mean if we feel we have to throw something out - we
          might have to keep a beta solution in hand as well. in the case
          of the datastore - if we decide to rewrite (i think that is what
          people feel) we should start now
     cjb> I think the short answer is that we shouldn't have to be making these
          decisions ourselves.  But if we write them up clearly enough, we can
          pass the decision up the chain.
   tomeu> having to choose between mesh and datastore is like choosing between
          drinking and eating - both are needed


Feedback from Peru?

 m_stone> so two questions: 1) what have the peruvian teachers discovered about
          703? carla was clearly upset about a mesh bug.
marcopg_> m_stone: that's something that the deployment team should figure out
          - we just don't have enough information at the moment to say it
 m_stone> marcopg_: in which case the priority should be to get that
          information, yes? or to change the decision so that it's no longer
          necessary.
marcopg_> m_stone: I think that should be the priority, yeah


What is "compatibility"?

 m_stone> marcopg_: you say the "compatibility" requirement is "unknown"?
marcopg_> m_stone: I don't know why it has been suddenly added to the list of
          top reqs. (I'm not against, I'd just like to understand better)


 m_stone> marcopg_: first, let's make sure we mean the same thing.
          compatibility means two things: 1) our software running on other
          people's platforms and 2) other people's software running on our
          platform.


 m_stone> we mean both of these, I think.
marcopg_> but there are a lot of possible approaches with pretty different end
          results and that's the main reason I want to know why it was pushed -
          to figure out solutions that meet the exact requirements
  bemasc> cscott specifically referred to #2, not #1


Why do we want it?

 m_stone> agreed that the reasoning behind making it a priority has not been
          explained - I'll try to articulate the justification. 1) we wish more
          people were supporting our platform. We think that there are two ways
          to do this - first, to bring our software to them, and second, to
          give them an immediate userbase running on our platform and 2) we're
          concerned, from time to time, that the days of funded support for our
          platform are limited; therefore, we wish it to be less expensive to
          maintain.


marcopg_> how limited? 1-2 years or 1-2 months? ;)
 m_stone> marcopg_: we've been greatly reassured on this point in recent weeks.
          (specifically by the fact that kim's budget was accepted). but I
          think that anxiety is a driving factor in the desire to increase
          compatibility. as for your question: "both, with different
          likelihoods"


   tomeu> we are having to guess too much, considering we have kids using the
          laptops in the thousands and adding anxiety to the guessing...
 m_stone> tomeu: I say the same thing all the time but I don't know what to do
          about it. the anxiety has greatly abated. we are now confident enough
          in survival to begin hiring.


Who wants it and how?

marcopg_> m_stone: do we expect countries to actually deploy machines with
          standard applications installed? and are we going to encourage or
          discourage it?
marcopg_> the degree of compatibility we need is also an important factor for
          the datastore work even though I think we will be forced to do
          *something* about the datastore situation by august
   tomeu> m_stone: which legacy software needs to be run on the laptops and
          which problems does it have? if it's flash apps, then the
          compatibility stuff gains a totally different meaning. if it's java,
          we may have problems other than APIs


 m_stone> marcopg_, tomeu: excellent questions which I do not have
          authoritative answers for.
marcopg_> m_stone: ok, that's a question we need to answer I think


How does it affect the DS (and other APIs)?

   tomeu> m_stone: I wonder why the compatibility thing is so tied to a new DS
 m_stone> tomeu: because of the desire to use the journal as the only
          navigation tool pointing to local persistent data
   tomeu> from my pov, the same mechanisms that olpcfs has for that, could be
          added on top of the current implementation. we could add a fuse
          thingy on top of the current DS
 m_stone> tomeu: I'm skeptical that the current DS would withstand that much
          stress...
marcopg_> yeah what m_stone said
   tomeu> m_stone: ok, I'm only concerned about mixing so many different issues
 m_stone> tomeu: it's a good concern. I have no good answers for you :(


   tomeu> we talk about compatibility, but do something because of performance
marcopg_> there is value in the idea of reusing the API though and to start by
          fixing the implementation
 m_stone> marcopg_: except for the fact that it's an API that we know is
          terrible.
     cjb> I don't think we should be too afraid about breaking APIs, if we have
          a principled reason for doing it.  It happens.


  bemasc> I don't think there is very much legacy software worth using
marcopg_> m_stone: it is, but we have tons of activities using it out there
 m_stone> marcopg_: not very many, actually.
marcopg_> so in some way we will probably need to keep supporting it


   tomeu> I'm perhaps the person who has suffered most because of the DS. I
          have been promised lots of DS replacements - let me count them... 4!
          Four times, I have been promised a DS, and have received only half!


Can we drop the current DS?

marcopg_> m_stone: do you think we can drop it and ask authors to port? (real
          question)
 m_stone> marcopg_: yes.
     cjb> And we can offer to update all activity code in our git repo to the
          new one.
     cjb> marcopg_: I think so too.
marcopg_> ok, I tend to agree. (I'm sure there will be people that don't
          though). but that's fine


 m_stone> marcopg_: I'm not sure that we should commit to doing much else. but
          I think we could be confident of achieving that thing.


   tomeu> I think there are cheap alternatives to breaking existing activities,
          worth discussing, at least
 m_stone> tomeu: I don't think there are cheap alternatives to breaking
          activities that will noticeably improve stability. I don't really
          believe it. But I'm happy to hear your thoughts.
marcopg_> yeah that's a good point


   tomeu> m_stone: using python namespaces, having daemons exporting two
          different dbus interfaces, etc
 m_stone> tomeu: what does that have to do with the causes of failure in the
          present system? (which I believe are primarily bitrot and invalid
          assumptions)
   tomeu> m_stone: I was referring to maintaining API compatibility while
          revamping all of it


marcopg_> personally I think that if we are confident cscott can come up with
          something solid by august, then it's totally worth having him focus
          on the ds
 m_stone> marcopg_: I'm very confident in him (modulo the risk that he gets
          sick, gets run over by a bus, gets "promoted", etc.) :) - recall
          olpc-update... once he understands a problem, he's quite difficult to
          stop.
marcopg_> then let's do it ;) and let's figure out how to properly resource
          the network work without Scott, it doesn't seem impossible
  erikos> m_stone: i think as well that it is worth having at least one person
          focusing on the ds reimplementation


  bemasc> the high-level API (read(), save(), etc.) will presumably remain
          unchanged, so most activities won't require any modification
   tomeu> bemasc: we have some problems with that API, I'm not sure yet how to
          solve them... well, we'd overcome those by redesigning, and then use
          those techniques to keep old activities from breaking afterwards


 m_stone> marcopg_: let's see if we can get some better information from the
          deployment & sales folks first. even if it takes us a week to make a
          decision, it will be time well spent if it winds up being a better
          decision. many times over.
marcopg_> m_stone: I agree


In more detail, what do we want?

  bemasc> m_stone: it would be very nice to have a list of desired legacy-Linux
          programs. the only ones I know of are gnumeric and inkscape
 m_stone> bemasc: agreed. (though tomeu & marcopg rightly suggest that flash &
          java might also be good compatibility targets)
  bemasc> m_stone: well, flash doesn't write to disk, so it's immaterial.  It's
          essentially out of our hands
 m_stone> bemasc: totally false. we can decide to ask developers to work on
          gnash or not. rsavoye has repeatedly said that gnash is shorthanded.
          we've just never tried to actively assist him.


   tomeu> about gnumeric, embedding it like we do with abiword shouldn't be
          very complicated
marcopg_> tomeu: it would require refactoring the code heavily
   tomeu> marcopg_: yeah, but how many months would it take you?
marcopg_> tomeu: that doesn't scale - we can't embed every possible application
 m_stone> tomeu: this would not improve compatibility though.


   tomeu> inkscape had some memory usage problems; I don't know if they have
          been solved yet
 m_stone> tomeu: scott apparently looked at inkscape when he was writing his
          icon editor - he thinks it would be quite hard to sugarize. (I don't
          fully understand why)


  bemasc> but for both gnumeric and inkscape, the datastore is still not the
          problem because both of them use the standard OS-provided open/save
          dialogs, which sugar can replace with DS-based dialogs
   tomeu> marcopg_: I have heard gnumeric and inkscape, only
  bemasc> marcopg_: there really aren't very many applications
marcopg_> it's not much a problem of existing applications


   tomeu> marcopg_: we are getting back to the same point: we need to have
          someone from the ministries of education to tell us which legacy
          applications they feel the need for. we are struggling to provide
          features that maybe nobody will appreciate, while failing to provide
          the important ones


marcopg_> if I want to write a cool application for sugar, I know that it will
          be always confined to that platform and honestly I can see how it's
          not an appealing perspective. we have the possibility to change this
          situation without a lot of effort and so I think we should do it.
  bemasc> marcopg_: yes.  It's just a question of priority.
  homunq> marcopg_++ on should, no opinion on can


   tomeu> marcopg_: you, as a good software developer, will abstract the
          platform dependent code.
marcopg_> tomeu: don't design your platform for very good software developers
   tomeu> marcopg_: my point is not whether we should do something or not; what
          I mean is that I'm not convinced of the urgency of this
          code-on-xo-run-everywhere requirement. the current DS API is
          insufficient no matter what we decide. and we have already devised a
          scheme for legacy applications to store their files in the journal.
          where are we differing?


What compatibility might we try to provide?

marcopg_> I think it would make for a much better experience for developers
          who start working on sugar. it would put us in a better position in
          the case of an end-of-funding and it would make it more likely that
          our applications are reused outside sugar. I think those are pretty
          strong reasons and I argue the effort is minimal.


  bemasc> tomeu: (I think we need to distinguish between legacy FD.o compliant
          applications and legacy non-FD.o-compliant)


   tomeu> marcopg_: we can make files dropped in ~/Documents appear in the
          journal as well as possible. but those apps won't be able to give as
          good an experience as an activity. no matter what we do, "You created
          the document My Dog with the application Gimp" is not as good as "You
          drew My Dog with Tim and Silvia".
marcopg_> tomeu: apps not specifically designed for sugar will never give you
          as good an experience as an activity, but that's fine
   tomeu> as I said, we agree on what needs to be done
marcopg_> then let's just do it :P


How should we try to get there?

   tomeu> and I think we agree that the current DS API will need to be
          scrapped, right? well, the problem is that nobody is discussing what I
          think needs to be discussed
marcopg_> tomeu: what do you think needs to be discussed... :)
   tomeu> marcopg_: what will be the new API(s)
marcopg_> my suggestion about that would be to look into scott's prototype and
          build a prototype UI on top of it. that would give us a much
          better base to discuss on. Scott is in charge of designing the API -
          the more feedback he gets, the better the API will be. I'm not really
          a fan of a long abstract discussion of an API, without experimenting
          with it.
   tomeu> marcopg_: well, activity authors may be able to provide something
          closer to their real needs - I don't think we are at the point
          where activity authors' feedback is useful, yet
marcopg_> when olpcfs is more mature, that will certainly be useful


Why is the current DS API unhappy?

 Blaketh> tomeu: Can you point me towards some failings of the current DS API?
  bemasc> activity authors are pretty much happy with the current high-level
          API, which is what almost all of them use. the only significant issue
          is an inability to save multiple files as a single entry
marcopg_> bemasc: well that's your opinion. I heard a bunch of complaints about
          the current API, and I've seen a bunch of confused developers
   tomeu> Blaketh: see http://wiki.laptop.org/go/DatastoreOpenIssues. being
          able to update a file without having to copy it around is one missing
          point.
   tomeu> bemasc: the issue here is API. right now, activity authors need to
          copy the whole file to a temp dir, modify it, and submit it again to
          the DS. it's the same in the end, but it would be good if activity
          authors weren't concerned about how the DS stores data. the DS would
          pass a path the activity can read from or write to, so the DS should
          make the copy. once the activity is happy with it, it commits to the
          DS, no matter if the activity is Read, VideoEditor, etc
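tomeu's proposal amounts to a checkout/commit protocol: the DS makes the working copy and hands the activity a path, and the stored entry only changes when the activity commits. A minimal Python sketch of that idea; `HypotheticalDataStore`, `checkout` and `commit` are illustrative names, not the real Sugar datastore API, and versioning, metadata and delta compression are deliberately ignored.

```python
import os
import shutil
import tempfile


class HypotheticalDataStore:
    """Toy checkout/commit store; NOT the real Sugar datastore API."""

    def __init__(self) -> None:
        # All files live in one scratch directory, so os.replace() below
        # renames within a single filesystem.
        self._root = tempfile.mkdtemp(prefix="ds-")
        self._entries = {}  # entry id -> path of the committed file
        self._next_id = 0

    def create(self, data: bytes) -> str:
        """Store a new entry and return its id."""
        entry_id = str(self._next_id)
        self._next_id += 1
        path = os.path.join(self._root, "entry-" + entry_id)
        with open(path, "wb") as f:
            f.write(data)
        self._entries[entry_id] = path
        return entry_id

    def checkout(self, entry_id: str) -> str:
        """The DS makes the copy and returns a path the activity may edit."""
        fd, work_path = tempfile.mkstemp(dir=self._root, prefix="work-")
        os.close(fd)
        shutil.copyfile(self._entries[entry_id], work_path)
        return work_path

    def commit(self, entry_id: str, work_path: str) -> None:
        """Replace the stored entry with the activity's edited copy."""
        os.replace(work_path, self._entries[entry_id])

    def read(self, entry_id: str) -> bytes:
        with open(self._entries[entry_id], "rb") as f:
            return f.read()

    def close(self) -> None:
        """Delete the scratch directory and everything in it."""
        shutil.rmtree(self._root)


if __name__ == "__main__":
    ds = HypotheticalDataStore()
    eid = ds.create(b"hello")
    path = ds.checkout(eid)          # activity gets a private working copy
    with open(path, "ab") as f:
        f.write(b" world")
    assert ds.read(eid) == b"hello"  # stored entry unchanged until commit
    ds.commit(eid, path)
    assert ds.read(eid) == b"hello world"
    ds.close()
```

The activity never copies files itself; it only edits the path it was given and then calls `commit`, which is the separation of concerns tomeu describes.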


  bemasc> anyone thinking about legacy compatibility should know about
          http://www.pygtk.org/docs/pygtk/class-gtkfilechooser.html - Sugar can
          provide its own filechooser, thus integrating any GTK application
          with the datastore without fancy filesystem footwork, and without
          ever showing paths to the user
   tomeu> bemasc: well, using the fs as a means for getting things into the
          journal is better than that, right?