Activity testing automation

  This page is part of the OLPC Community testing Project. How to test an Activity | Reporting test results | Meetings

The Plan

We still need to determine the scope of work and effort that each step will take. Investigation volunteers? Mchua 05:56, 13 November 2008 (UTC)

Step 1: We can write test scripts.

Make a Python library for testing that can be run externally against an Activity and returns True or False depending on whether the Activity passed. This may be a resurrection or adaptation of Sugarbot.

For a tester or developer, the procedure to run a test on Foo Activity might look like this:

  1. Download Foo Activity (let's say the files for this are inside a folder called Foo.Activity)
  2. Create the folder Foo.Activity/tests
  3. Create the file Foo.Activity/tests/foo_test.py (by following yet-to-be-written instructions for usage of this Python library)
  4. Run 'python foo_test.py' from inside Foo.Activity/tests
  5. Observe results
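
Since the testing library does not exist yet, the following is only a minimal sketch of what a test file such as foo_test.py might look like, using Python's standard unittest module in place of the future library. FooActivityStandIn, type_text, and run_tests are hypothetical names invented for illustration; a real test would drive the actual Activity (for example through Sugarbot) instead of a stand-in class.

```python
import sys
import unittest


class FooActivityStandIn:
    """Hypothetical stand-in for the Activity under test (not a real Sugar API)."""

    def __init__(self):
        self.text = ""

    def type_text(self, text):
        # Pretend the user typed some text into the Activity.
        self.text += text


class FooTest(unittest.TestCase):
    def test_typing_appends_text(self):
        activity = FooActivityStandIn()
        activity.type_text("hello")
        activity.type_text(" world")
        self.assertEqual(activity.text, "hello world")


def run_tests():
    """Run every test in FooTest and return True if all passed, else False."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FooTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    passed = run_tests()
    print("PASS" if passed else "FAIL")
    # A non-zero exit status lets an automated runner detect failure.
    sys.exit(0 if passed else 1)
```

Running 'python foo_test.py' prints PASS or FAIL, and the exit status carries the same True/False answer for scripts that call it.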

Step 2: Test repository

  • Make a central repository for such tests.
  • It's still unclear whether tests will be in a separate branch, whether they'll be included inside the Activity they refer to, or something else. When you download an Activity, should that also download the tests for that Activity? This is an open question.

Step 3: Automation

  • Automate the running of all the tests from Step 2 on an XO somewhere, maintained by somebody.
  • In practice, this probably means "at 1cc, maintained by Mchua," for starters - but it doesn't have to.
  • Make the running of all the tests for an Activity trigger when a new version of that Activity is released.

Step 4: Replication

Make that "somewhere, maintained by somebody" XO able to be set up by anybody, anywhere. This likely involves a resurrection of Tinderbox.

Transcripts

Brainstormed ideas list

* less than 1hr setup required for testers
* less than 1hr setup required for developers
* centrally administered VM or box that testers/devs post scripts to
* new Activities can be made compatible with testing tool by their
developers in <1hr
* central repository for tests, separate from (but clearly linked to)
the repo for that Activity
* process for translating natural language bug reports into Activity
test scripts
* all bugs submitted to developers to fix have such test scripts attached
(test-driven development)
* tests should live in the project's source repo
* all automated scripts for an Activity (from fixed bugs or otherwise)
are automatically run against new builds of that Activity
* pass/fail for Activity tests conveyed to the maintainer immediately
* interface for users should be about as approachable as Etoys or Turtle Art
* first phase of capturing bug data should be able to be carried out by
the average 10-year-old
* should catch memory leaks
* should test i18n
* Activities be drivable without modification
* easily extensible - Activity test automation scripts should be in
a format that's completely specified and published, so that automated
tools could also later generate said test scripts
* inspired by openqa's selenium and mozilla's litmus
* have XML output like openqa's Selenium
* tests need to be aimed at correct releases
* tests need to be available for old releases for historical purposes
* tests need to stay in the repo effectively forever
* tests should be separately downloadable and locally runnable outside of
the centralized tinderbox-ish setup so that those who wish to experiment
with local changes before pushing to central repo can
* testsuites can be targeted to releases or families (test these for 8.2.x)
* developers doing true test driven development by having test frameworks
in the code itself...
* test framework should be open-source
* The harness has to run on a single XO as well as on our virtual behemoth
* guidelines on how devs can make their code "more testable" (this
sentence full of vagaries, not sure how/who can do this)
* different testsuites for full integration and for each tiny dev cycle ..
* developers should be able to opt-out of having their code tested,
*but* know that all shipped-with-XOs code *must* be tested
* Ponies for everyone!

IRC

<mchua> dah, sorry. getting water took longer than I thought.
<mchua> Ok, so brainstorm time... finding our charter... one sec...
* mchua notes that others are welcome to participate as well
* robertofaga (n=robertof@201-43-56-79.dsl.telesp.net.br) has joined #olpc
* aa has quit (Read error: 110 (Connection timed out))
<mchua> Dream up automation designs to present to group next week. "I'm a
tester. I want to automate this boring thing. What is my ideal interface
to do so / the most beautiful tool I could imagine for it?"
<garycmartin> mchua: sugarbot
<mchua> garycmartin: what makes sugarbot so beautiful?
<garycmartin> mchua: I've emailed zach to see if it runs on XOs, but the
downside seems to be it's been more degigned as an infrastructure tool
(jubuild and buildbot).
<mchua> http://code.google.com/p/sugarbot/
<mchua> http://code.google.com/p/sugarbot/wiki/HowDoesSugarbotWork has
a usage example
<mchua> qualities of sugarbot that I really like:
<garycmartin> mchua: we write test cases for a bunch of the bugs found
as sugarbot scripts. That makes us define them well, and test when fixed.
<mchua> * it's a completely separate thing from the Activity code itself,
and from Sugar
<mchua> garycmartin: in my ideal world, a bug report would not go to
developers for fixin' without having a test attached that fails due to
the bug, and will pass when the bug is resolved.
<mchua> (in my very, very ideal world.)
<garycmartin> mchua: We then just need to focus on collecting natural
language type reports and turning them into scripts, that over time
hopefully cover a good chunk of test issues/cases.
<mchua> garycmartin: agreed! what kinds of things and tools - and
qualities/features of those things and tools - do we want?
* dsaxena (n=dsaxena@c-67-160-162-157.hsd1.or.comcast.net) has joined
#olpc
<garycmartin> mchua: Your world is truly a perfect sphere ;-)
<mchua> garycmartin, adricnet: do you want to just take 10 minutes and
type out wishlists as fast as possible, and then review?
<mchua> the brainstorming "quality < quantity" thing
<mchua> garycmartin: yeah, I know. :) I like to start with a platonic
ideal, and then figure out how close to it I can get.
<adricnet> Maybe, I was looking for a linka nd got pulled away. If the
levers and button in Sugarbot are solid enough it sound pretty ideal..
* ivazquez1 (n=ivazquez@fedora/ignacio) has joined #olpc
<garycmartin> mchua: usually ends up 'oblate spheroid' when all said
and done :-)
<mchua> adricnet: I think Sugarbot is one possible (and very attractive)
option for Activity test automation - the fact that I can't figure out
how to generate and run a test with it in <15min, starting cold, means
that there's work to be done yet
* vpovirk` (n=urk@c-76-17-237-120.hsd1.mn.comcast.net) has joined #olpc
* vpovirk has quit (Read error: 113 (No route to host))
<mchua> I'll start spewing out my wishes, with a 10min timer set (I may
run out before them)
<mchua> these are wishes for an Activity testing framework/procedure,
and will get pulled into sanity-land later ;)
<mchua> * less than 1hr setup required for testers
<mchua> * less than 1hr setup required for developers
<garycmartin> mchua: sugarbot is better as a centraly admined VM or box
that we would post scripts to.
<mchua> * new Activities made compatible with testing tool also in < 1hr
<mchua> garycmartin: like [[Tinderbox]] in hypothetical?
<garycmartin> mchua: yep
<mchua> * central repository for tests, separate from (but clearly linked
to) the repo for that Activity
* dwmw2 is now known as dwmw2_gone
* J5 has quit (Read error: 110 (Connection timed out))
<mchua> * process for translating natural language bug reports into
Activity test scripts
<mchua> * all bugs submitted to developers to fix have such test scripts
attached (test-driven development)
<adricnet> this suggests that the tests should live in the projects
source repo, as $deity intended ..
<mchua> * all automated scripts for an Activity (from fixed bugs or
otherwise) are automatically run against new builds of that Activity,
with a pass/fail display conveyed to the maintainer immediately
<adricnet> I'd like the Sugarbot interface for users to be about like
etoys or turtleart.
* ivazquez has quit (Operation timed out)
* vpovirk` has quit (Remote closed the connection)
<mchua> adricnet: awesome ideas, keep going!
<mchua> garycmartin: I'm trying to rephrase yours into feature
requests (which may or may not already be implemented in
sugarbot/tinderbox/otherwise), but I'm sure you have a lot more good ideas
* ivazquez1 is now known as ivazquez
<mchua> * should be usable by the average 10-year-old
<mchua> (to report the first pass of the Activity bug, not necessarily
to finish the automation script)
<garycmartin> mchua: sugarbot works through X as far as I can tell,
so all activities 'should' be drivable without modification
<mchua> * easily extensible - Activity test automation scripts should be
in a format that's completely specified and published, so that automated
tools could also later generate said test scripts
<adricnet> openqa's selenium and mozilla's litmus
<adricnet> sugarbot needs to have XML output like Selenium seems to..
* vpovirk (n=urk@c-76-17-237-120.hsd1.mn.comcast.net) has joined #olpc
<mchua> garycmartin: it doesn't work through X, iirc; it uses XML-RPC
which calls the functions of the Activity itself (not through the GUI,
I believe)
<garycmartin> * tests need to be aimed at correct releases (do we keep
old tests for supporting old deployments not yet upgraded?)
<mchua> garycmartin: I'd need to read the code more to confirm that
too, though
<mchua> * tests need to be available for old releases for historical
purposes
<garycmartin> mchua: me too by the sounds of it :-)
<adricnet> tests need to stay in repor for effectively forever.
<mchua> * tests should be separately downloadable and locally runnable
outside of the centralized tinderbox-ish setup so that those who wish
to experiment with local changes before pushing to central repo can
<adricnet> although testsuites can be targeted to releases or familys
(test these for 8.2.x)
<adricnet> mchua:  ++
<morgs> I'd love to see us (developers) doing true test driven development
by having test frameworks in the code itself...
<mchua> oh! also, test framework should be open-source...
<adricnet> The harness has to run on a single XO as well as on our
virtual mehemoth
<mchua> i.e. *no* proprietary anything should be required to do testing
or development for an Activity
<adricnet> pre-depends OSS/Free software and data, eyah
<garycmartin> morgs: is that a war I can hear starting... ;-)
<mchua> morgs: <3
<adricnet> morgs: They opt to have the option. Later, we break out
the stick.
<morgs> It has to be a really big stick :)
<mchua> * guidelines on how devs can make their code "more testable"
(this sentence full of vagaries, not sure how/who can do this)
<garycmartin> morgs: (test driven developent can be very slow, painful,
and boring for devs)
<mchua> ding ding ding! my timer says 10 minutes, do we want to keep
spewing ideas for another 5? we have 40 minutes left in this brainstorm
<morgs> garycmartin: yes. So can weeks of lost time tracking down
unnoticed regressions :)
<garycmartin> morgs: (well get even less devs scratching that itch...)
<adricnet> different testsuites for full integration and for each tiny
dev cycle ..
* morgs isn't ready to scratch that itch, so it will lie until somebody
actually makes it happen
<adricnet> mchua: Seems like we ahve enough to start arguing about
* vpovirk has quit (Remote closed the connection)
<mchua> * developers should be able to opt-out of having their code
tested, *but* know that all shipped-with-XOs code *must* be tested
<adricnet> Errr .. g'luck on that one..
<mchua> adricnet: yeah, it's a wishlist :)
<mchua> for that matter
<mchua> * a ponoy
<mchua> er,
<mchua> * pony
<adricnet> Ponies for everyone!
<adricnet> And cash. Yay cash!
* GoatCheezWork (n=Miranda@rrcs-97-76-61-66.se.biz.rr.com) has joined
#olpc
<garycmartin> morgs: I know, swings and roundabouts. Do you like a
life of grey coding bordom, or 2 months of joy followed by 2 months of
hell. Some where in the middle is a likely sweet spot :-)
<mchua> adricnet, garycmartin, morgs - I'm gonna clean this up into a list
of ideas and pastebin so it's easier to read, can you guys elucidate on
the current state of what it's like to test Activities from the tester/dev
perspective in the meantime? (should take me 5min or less)
<mchua> (particularly interesting: "My god, it's full of pain!" areas,
and "I love this part of the process" things.)
* vpovirk (n=urk@c-76-17-237-120.hsd1.mn.comcast.net) has joined #olpc
<adricnet> hmm .. some want to name a victim activity
<garycmartin> mchua: with my Moon dev hat on, I test it all through
before each release, but that's because it's simple and I can use all
the possible inputs.
<adricnet> For Capture there's trying to aim the screen at the cat..
<kevix> mel, is this wishlist going on w.l.o or somewhere else?
<garycmartin> mchua: I make sure I both resume and clean start through
all it's view modes.
<garycmartin> mchua: I make sure I've tested in the primary languages
to make sure translation sctrings all come through.
<mchua> kevix: yes - at the end of this we should decide where everything
is getting published
* vpovirk has quit (Client Quit)
<mchua> (aside from on the testing mailing list - kevix, are you
subscribed?)
<garycmartin> mchua: I leave it running for long durations watching both
memory and cpu (for leeks or hogging).
* mchua adds * should catch memory leaks and * should test i18n to
wishlist
<kevix> no. I get the mails from the bug list. not from testing-*-*
<adricnet> Lol. *wishes really hard*
<garycmartin> mchua: I look at other sources of information to make sure
it's not telling me fibs.
<garycmartin> mchua: Oh darn, I'm beeping. Sorry all have to go now
(hard stop for me).
* vpovirk (n=urk@c-76-17-237-120.hsd1.mn.comcast.net) has joined #olpc
<kevix> bye, gary.
<garycmartin> kevix: bye!
* garycmartin has quit (Remote closed the connection)
<mchua> thanks, garycmartin!
<mchua> dah, too late
<mchua> pastebin!
<mchua> kevix, adricnet: http://pastebin.ca/1254589
<mchua> referring to these by line number... which do you think are (1)
highest priority, (2) unrealistic, and (3) already done?
<kevix> snarking the URL
<adricnet> If someone could tell me the state of Sugarbot today ...
* rgs_ (n=rgs@190.128.250.238) has joined #olpc
<mchua> My 'highest priority' list: 27, 18, 5, 11, 8
<mchua> adricnet: http://code.google.com/p/sugarbot/ is all we have,
unless we can track down the project owners (no luck so far)
<mchua> adricnet: it looks like no work has been done on it since
september 8, 2008
<mchua> my 'unrealistic' list: 22 (in that we shouldn't write test
scripts for old and no-longer-relevant releases; that's a lot of burden
- we should keep all the scripts we write, though, so when the current
builds become no-longer-relevant they'll still have their associated
tests with them), and 31-22 are... probably... yeah.
<mchua> I don't think any of these are completely implemented, to my
knowledge (where "implemented" == "in widespread use by the olpc dev/test
community") but [[Tinderbox]] and [[Sugarbot]] have some elements of
these requests in them
<adricnet> What does (a) OLPC Tinderbox setup test for?
<adricnet> Mozilla tiderbox is more useful when there is compliation?
<mchua> adricnet: OLPC-tinderbox is afaik currently not maintained
<mchua> adricnet: the Big Cool Thing it does is that it takes hw
measurements as well
<adricnet> mchua: Roger. do we know what it did?
<adricnet> "hw measurements" ?
<mchua> (there's an XO at 1cc that has tiny voltage probes sticking out
of it for measuring stuff like power consumption for the different builds)
<mchua> adricnet: Not... very well. I mean, we can always read the
code. http://dev.laptop.org/git/projects/tinderbox
* bjordan (n=bjordan@cpc2-hitc2-0-0-cust908.lutn.cable.ntl.com) has
joined #olpc
<mchua> adricnet: this is something I'm supposed to clean up and work on
<adricnet> High voltage!
<adricnet> sorry, random Dolby moment.
<kevix> so activities can be tested by the developers framework and by
an OLPC framework
<mchua> adricnet: ...ostensibly after the g1g1 crazy washes over
<mchua> kevix: what would the difference between the two be?
* robertofaga has quit (Read error: 60 (Operation timed out))
<adricnet> mchua: Sure. Afaik, tbox would be great for does ita ll still
build, but I dunno if it does functional testing
* ctyler has quit ("returning to Spare Oom")
<adricnet> Well, there's unit tests, functional tests, and QA ... they
should all at least share some trade languages
<adricnet> I _think_ the comm testting for activities is going to be QA,
which hopefully will come up with some unit/functional tests to feed
the devs..
* ctyler (n=chris@global.proximity.on.ca) has joined #olpc
<mchua> adricnet: tbox does really, really minimal func testing
<mchua> adricnet: is my understanding
<mchua> adricnet: maybe a better way of putting it... *rummages for words*
<morgs> tinderbox seems to basically test that things boot up and
start up.
<mchua> adricnet: "Whenever a new build goes out, tinderbox runs a certain
(python) script to be automatically run on an individual XO that's hooked
up in 1cc."
<mchua> This script (afaik) currently tests whether the build loads,
whether Activities start, and also logs some power measurements
(and... possibly makes sure those measurements fall within a certain
numerical range.)
<adricnet> Ah, kk
<adricnet> That about syncs up with what I was thinking, cool. And this
will need to be ressurrected, later. Cool.
* morgs -> $HOME
<mchua> The script currently running on 1cc-tinderbox can be modified. It
is also possible (but difficult, right now) for others to set up their
own tinderbox machines, and run the same (or different) scripts on them.
<mchua> adricnet: Yeah, and I don't yet have a good view of the scope
of work required for that resurrection.
<mchua> (Alas.)
<mchua> adricnet: since we have 12 min to wrap up, how does this sound,
in order of implementation?
<mchua> 1) make a python library for testing, which can be run externally
against an Activity and return a True/False value as to whether it passed
(possibly/probably resurrecting sugarbot code or design)
<mchua> (this would be separate python files, and import
sugarbot-or-something-like-it, and import the Activity as specified in
some filename in the test-python-file code, and run, and return True
or False.)
<adricnet> Right. Might want to start with Py Test::Unit stuffs
<mchua> Yep.
<mchua> So for a tester or developer, the procedure to run a test would
look like this
<mchua> * download foo_test.py
<mchua> * open Foo.Activity folder
<marcopg> I missed all of this meeting...
<mchua> * throw foo_test.py into Foo.Activity/tests
<mchua> * python foo_test.py
<mchua> * observe results
<marcopg> something that I would like to see, is these scripts to not
be tinderbox specific
<marcopg> I'd like to run them also on SL buildbot
<marcopg> (sorry to interrupt your attempt to summarize!)
<mchua> marcopg: not at all!
<adricnet> marcopg: we're dreaming up a harness interface that should
run everwhere, yes :)
<marcopg> adricnet: great!
<mchua> marcopg: ooh, I should ask you about buildbot in about 6
minutes ;)
<marcopg> :)
<mchua> adricnet: after that, 2) would be "make a central repository
for such tests"
<adricnet> that's tricky .. but yeah they have to be kept somewhere
<mchua> adricnet: and then 3) automate the running of all the tests in
(2) on an XO...somewhere... maintained by somebody....
<marcopg> central repository or per activity?
<mchua> adricnet: and then 4) make that "somewhere, maintained by
somebody" XO able to be set up by anybody, anywhere (this probably is the
"resurrect tinderbox" part)
<marcopg> that's something we discussed with zach and I'm not sure what
is better
<adricnet> well yes on three where someone is 1..x people and somewhere
is 1..x XO
<adricnet> Yeah, the virtual thingy ..
<mchua> marcopg: btw, do you know how to get in touch with zach? (or
titus, or grig?) They probably have figured much of this out already
<adricnet> marcopg: It's up for argument .. should these QA level tests
be in repo with the software or all live together somewhere?
<marcopg> mchua: you mean other than sending them mail? ;)
<mchua> adricnet: maybe another way of rephrasing that question is
"when you download an Activity, should that also download the tests for
that Activity?"
<mchua> marcopg: yeah.
<marcopg> adricnet: right, I don't have an answer unfortunately :)
<adricnet> Need to clarify our terms, but yes
<marcopg> mchua: nope, but I sent them mail and they have been responsive
usually
* shenki has quit (Read error: 104 (Connection reset by peer))
<mchua> marcopg: ah ok, I'll try again. maybe it's my mail acting up
(it has been, lately. I'm not sure why.)
<marcopg> if you post about the plans somewhere
<mchua> adricnet: whoo. almost at time for today - anything else?
<marcopg> I can have a look too
<adricnet> l.o mail was down yesterday?
<marcopg> I thought a *little* bit about the issues already
<marcopg> and worked some on infrastructure (buildbot only)
<mchua> I think I have a much better idea of what I want from an Activity
test framework, at least, so this has been helpful to me
<adricnet> Yay helpful.
<mchua> adricnet: anything we should do to make this more adric-helpful,
too?
<mchua> (I'm planning on writing up the notes/plans on the wiki, mailing
to the testing list, asking people to shoot at it)
<adricnet> mchua: Not yet. Need to have some examples of these Comm
Testing tests so that we can argue about format and where to keep them
<mchua> adricnet: Aye, right. Concrete examples, working code...
<mchua> I think there's enough constraints in the design doc that we'll
get from this discussion to toss out a couple prototypes, though
<adricnet> mchua: Well, complete-ish PoCs at least
<marcopg> are you thinking to write your custom scripts? or to base on
existing frameworks?
<adricnet> prototypes, yeah
<mchua> adricnet: cool, then we're done, I think
<mchua> adricnet: thank you!
<mchua> marcopg: base on existing, whenever possible
<mchua> marcopg: I am a lazy bum ;)
<marcopg> heh
<adricnet> Laziness is a virtue
<marcopg> there is sugarbot
<marcopg> and also another one which I can't remember right now mmm
<marcopg> (both for gtk)
<adricnet> Selenium ?
<adricnet> Oh, notthat low-level oops
<marcopg> oh dogtail
<mchua> ah! I think you sent me dogtail
<mchua> I haven't gotten a chance to look at it, yet - been working on
remote testing scripts (...incredibly slowly, alas)
<marcopg> yeah me neither...
<marcopg> would be nice to compare sugarbot and dogtail
<mchua> it sure would.
<marcopg> would be also useful to just ask zach about it
<marcopg> perhaps he looked into dogtail before doing his own thing
<mchua> Ooh, that's a great question to ask zach.
<marcopg> zach was supposed to integrate sugarbot into jhbuild btw
<mchua> marcopg: btw, do you have this channel logged, or do you want
me to send you the log from the start of the brainstorm?
<marcopg> but we haven't heard anything from him about ti
<marcopg> busy with school I guess :/
* kevix has quit (Read error: 110 (Connection timed out))
<marcopg> mchua: don't think I have all of it, logs would be great