Community testing meetings/2008-11-06/Prioritizing activities to test
There is some distinction between building the tools that enable meaningful community testing and building the *community* for community testing, but the two efforts are inter-related and, done well, mutually reinforcing.
Test Activities we ship first
I'm not 100% sure that spending a whole lot of time thinking it through will yield significantly better results than taking the list of activities in images used for large deployments (including both countries and G1G1) and ordering them by which activity author steps up first. Ideally all activities hosted on dev get tested at some point; I don't know that there is much meaningful distinction in the order in which they get their "first testing".
After a first round, events that would merit a second round of testing would include a significant version change (e.g. rebasing on a changed upstream element: abiword, xulrunner, etc.), a significant build change, or new inclusion by a significant deployment (e.g. Conozco Uruguay). Ideally community testing is something that a new user can just jump into solo (after the hard work of building test cases and infrastructure is done), but there are distinct advantages to scheduled, joint cooperative activity in that initial round of test case creation and testing. Activity author involvement seems critical to me during this phase.
Gathering user data
If there is indeed some interest in gathering real data about usage from users, I do not think any covert means could or should be employed. On the other hand, there is theoretically a "Poll Builder" activity that could possibly be leveraged to this end. A "customer survey" using it would also serve the purpose of providing a demonstrable use case for this activity to users.
Classes and weightings
Another method of prioritizing (for the first round) would be to define certain classes and weightings and assign semiquantitative scores. I will assign weightings somewhat arbitrarily for discussion purposes. Start with a semi-arbitrary scoring scheme and adjust intuitively until it gives a sensible ranking. Add or delete classes as needed to a list like this. Don't get hung up on the numbers themselves: their only meaning is in roughly assigning weighting to factors that would otherwise be entirely subjective.
Deployment (0-30) Higher score for use by larger (or multiple) deployments (include G1G1 as single deployment)
Target user sophistication (0-20) Higher score for less sophisticated (younger) user on basis that they may be less fault-tolerant
Educational focus (0-20) Higher score for more educational activities
Activity maturity (0-10) Higher score for higher version number, reward authors who produce revisions
Localized (+5) Increase weighting for activities with translations
Brand new activity bonus (+5) New activities (version 1) probably need scrutiny
Sharing bonus (+5) Reward author effort to leverage sharing/collaboration features
Upstream bonus (+1) Increase weighting for activities leveraging upstream development
Hardware (+1) Increase weighting for activities taking advantage of XO hardware features
Fudge factor (0-3)
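To make the scheme above concrete, here is a minimal sketch of how such scores could be tallied and used to order activities. The class ranges and bonus values are copied from the list; everything else (the function names, the dictionary keys, and the "Activity A"/"Activity B" entries with their numbers) is a made-up placeholder for discussion, not real scoring of any actual activity. With the ranges as listed, the maximum possible total happens to work out to 100.

```python
# Maximum points for each ranged class, taken from the list above.
RANGED_MAX = {
    "deployment": 30,
    "user_sophistication": 20,
    "educational_focus": 20,
    "maturity": 10,
    "fudge": 3,
}

# Flat bonuses from the list above, awarded in full or not at all.
BONUSES = {
    "localized": 5,
    "brand_new": 5,
    "sharing": 5,
    "upstream": 1,
    "hardware": 1,
}


def priority_score(ranged, bonuses=()):
    """Sum the ranged scores (clamped to each class range) plus flat bonuses."""
    total = 0
    for name, value in ranged.items():
        total += max(0, min(value, RANGED_MAX[name]))
    for name in set(bonuses):  # a bonus counts once even if listed twice
        total += BONUSES[name]
    return total


def rank(activities):
    """Return activity names ordered from highest to lowest score."""
    return sorted(
        activities,
        key=lambda name: priority_score(*activities[name]),
        reverse=True,
    )


# Hypothetical placeholder entries; the numbers are invented for illustration.
example = {
    "Activity A": (
        {"deployment": 25, "user_sophistication": 15,
         "educational_focus": 10, "maturity": 8},
        ["localized", "sharing"],
    ),
    "Activity B": (
        {"deployment": 5, "user_sophistication": 10,
         "educational_focus": 18, "maturity": 1},
        ["brand_new"],
    ),
}

if __name__ == "__main__":
    for name in rank(example):
        print(name, priority_score(*example[name]))
```

Adding or deleting a class only means editing one of the two dictionaries, which fits the advice to adjust the scheme until the ranking looks sensible.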
From 1cc's internal QA meeting on 2008-11-05
There were some differing opinions on this from OLPC's internal QA staff - it was agreed, however, that the decision was totally up to the community test group.
- Volunteers should test activities we don't test at 1cc. We test the behavior of Activities in a particular system situation.
- Volunteers should focus on testing activities that OLPC directly works on because we consider them core functionality (Browse, Write, Record, and Chat), since we have more direct contact with the developers that make them.
- Connect to jabber.laptop.org [so you can test Activity behavior independent of the mesh network] because it tests both collaboration and the XS.
- People with one laptop can create test cases and help write up the 'documentation' of how an activity works.
From Sameer Verma
Here it is. It saw a bunch of traffic on one of the lists...some criticism, some "whaaa..." type responses. I did have a good chat with Walter about it though.
Let me know what you think. I think moving in a direction of weighted scoring is generally good because it forces us to get a more stratified input as opposed to "Yay!"