Community testing meetings/2008-11-06/Displaying testing metrics in motivating ways


From Cjl

Semantic MediaWiki

1) Semantic MediaWiki forms of some sort will be used to gather results from testers running test cases.

There is a certain level of comfort with using SMW on the OLPC wiki. It is very public facing, which is good for gathering community input. In a hand-wavy way (glossing over technical details) it satisfies some of the requirements of "gathering many individual elements (test cases, results) and presenting them in a summarized form" that one might otherwise achieve with a database application. In particular, it provides some affordance for the semi-automated summarization and presentation of those results in multiple forms, which offers the possibility of using (and reusing) such metrics to satisfy the interests of multiple stakeholders (OLPC mgmt, activity authors, activity testers, activity users).
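
As a very rough sketch of what that could look like (the property and category names below are hypothetical placeholders, not an agreed schema), a result page could carry SMW annotations inline in its running text:

    This run of [[Tested activity::Write]] on build [[Tested build::767]]
    by [[Tested by::ExampleTester]] gave the result [[Has result::pass]].

Any summary page could then pull those annotations together with an inline query:

    {{#ask: [[Category:Test result]] [[Tested activity::Write]]
     |?Tested build
     |?Tested by
     |?Has result
     |format=table
    }}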

Format of test data

2) The exact form of these test results is not entirely clear at present, but at a minimum let's assume that we gather the following data as a minimal subset.

Test X run by Tester T of Activity A, Version V on Build B with result R.

For the present, let's assume that R is a simple pass/fail result for test case X.
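
One possible way to capture that tuple (the template name, parameters and properties here are only illustrative, not a settled design) is a single template call that a tester fills in by hand or via an SMW form:

    {{Test result
     |test     = Write-01 basic text entry
     |tester   = ExampleTester
     |activity = Write
     |version  = 60
     |build    = 767
     |result   = pass
    }}

Inside the (hypothetical) Template:Test result, each parameter would be stored as a semantic property, e.g. [[Has result::{{{result|}}}]] and [[Tested build::{{{build|}}}]], and the page would be added to [[Category:Test result]] so queries can find it.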

An activity may have multiple test cases, but there is an expressed preference for short (5-10 minute) test-case scripts, to a) improve the chances that someone will bother to run the case, and b) reduce the chances that problems encountered during testing are in fact related to other factors that may not be reproducible (aim for high test-retest reliability of the test itself).

Stakeholders: OLPC management

3) There are a variety of ways in which such metrics can be used to satisfy the interests of various stakeholders.

3a) From the point of view of OLPC mgmt, having some record of activity testing builds confidence in recommending deployment of an activity that is not "owned" or supported by OLPC / Sugar Labs. Very few activities will truly merit direct OLPC involvement or maintenance (Journal, Terminal, the few that can be classified by inclusion in Fructose), while at the same time a "bad activity user experience" will nonetheless reflect on OLPC, and so there is an OLPC interest in supporting quality mechanisms for activities.

Stakeholders: Activity authors

3b) From the point of view of activity authors, these metrics could be an additional mechanism for feedback.

Of course Trac exists, but it is developer-focused rather than oriented toward the general public, it doesn't collect statistically useful numbers of results, and it captures mostly negative information, with positive information appearing only on a resolution (ticket closure). If, hypothetically, an activity works as designed every time, there is unlikely to be a Trac ticket on it. Consider the motivational aspects of a little badge on the activity's wiki page saying "75 passes, no fails", this badge being automatically updated by SMW tabulation of results as they are submitted. Authors become more engaged in recruiting additional testing for their activities and in addressing community-reported failures. We're not talking UA or Good Housekeeping; simple awareness-raising of testing (and the need for community input) may be its most important feature.
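
Such a badge could plausibly be driven by a pair of count queries embedded in a template on the activity's wiki page (property names as in the earlier sketch, purely illustrative); the counts recompute themselves as new result pages are saved:

    Passes: {{#ask: [[Category:Test result]] [[Tested activity::Write]] [[Has result::pass]] |format=count}}
    Fails:  {{#ask: [[Category:Test result]] [[Tested activity::Write]] [[Has result::fail]] |format=count}}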

Stakeholders: Activity testers

3c) From the point of view of activity testers, it is essential for there to be a mechanism for recording the results of their testing. The metrics could be used motivationally as a record of their service to the community: consider how Wikipedians proudly self-report metrics on the number of edits or other community-supporting activities they take part in; similarly for ticket tracking systems (RT or Trac), metrics are motivational as a measure of contribution. Running a series of test cases becomes a means of collecting OLPC merit badges / karma, as submitted test results are tabulated to the tester's credit. Much of the testing will still be "itch-scratching" of the most popular activities, but this would provide an incentive for *recording* those results, and maybe for trying a few others to scratch the community-recognition itch.
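
The same kind of query, filtered by tester rather than by activity, could serve as a simple per-tester tally on a user page (again assuming the illustrative property names above):

    Test results submitted: {{#ask: [[Category:Test result]] [[Tested by::ExampleTester]] |format=count}}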

Stakeholders: Activity users

3d) From the point of view of potential activity users, the benefit is having some confidence (despite the usual disclaimers) that downloading and trying an activity will not be disappointing, or worse, disruptive.