Activity testing automation
This page is part of the OLPC Community testing Project. How to test an Activity | Reporting test results | Meetings
The Plan
We still need to determine the scope of work and effort that each step will take. Investigation volunteers? Mchua 05:56, 13 November 2008 (UTC)
Step 1: We can write test scripts.
Make a python library for testing, which can be run externally against an Activity and return a True/False value as to whether it passed. This may be a resurrection/readaptation of Sugarbot.
For a tester or developer, the procedure to run a test on Foo Activity might look like this:
- Download Foo Activity (let's say the files for this are inside a folder called Foo.Activity)
- Create the folder Foo.Activity/tests
- Create the file Foo.Activity/tests/foo_test.py (by following yet-to-be-written instructions for usage of this Python library)
- Run 'python foo_test.py' from the Foo.Activity/tests folder
- Observe results
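As a rough illustration of the procedure above, a test file might look something like the sketch below. The testing library itself does not exist yet, so this uses Python's standard unittest module as a stand-in (roughly what was suggested in the IRC discussion), and FakeFooActivity, FooActivityTest and run_tests() are hypothetical names invented for the example. A real test would drive the actual Activity instead of a fake object.

```python
# Hypothetical Foo.Activity/tests/foo_test.py (a sketch, not the real library).
import unittest


class FakeFooActivity(object):
    """Stand-in for the Activity under test; purely illustrative."""

    def __init__(self):
        self.started = False
        self.title = "Foo"

    def start(self):
        self.started = True


class FooActivityTest(unittest.TestCase):
    def setUp(self):
        # A fresh fake Activity for every test.
        self.activity = FakeFooActivity()

    def test_starts(self):
        self.activity.start()
        self.assertTrue(self.activity.started)

    def test_title(self):
        self.assertEqual(self.activity.title, "Foo")


def run_tests():
    """Run the suite and return True if every test passed, else False."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FooActivityTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    # "Observe results": print a single pass/fail verdict.
    print("PASS" if run_tests() else "FAIL")
```

Keeping the True/False verdict in one function (here run_tests()) is one way to let both a person at the command line and an automated harness use the same file.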
Step 2: Test repository
- Make a central repository for such tests.
- It's still unclear whether tests will be in a separate branch, whether they'll be included inside the Activity they refer to, or something else. When you download an Activity, should that also download the tests for that Activity? This is an open question.
Step 3: Automation
- Automate the running of all the tests from Step 2 on an XO somewhere, maintained by somebody.
- In practice, this probably means "at 1cc, maintained by Mchua," for starters - but it doesn't have to.
- Trigger a run of all the tests for an Activity whenever a new version of that Activity is released.
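One possible shape for the "run all the tests for an Activity" part of this step is sketched below. It assumes Python 3 and two conventions that are not decided anywhere on this page: test files live in the Activity's tests folder with names ending in _test.py, and each one defines a run_tests() function returning True or False. The function name run_activity_tests is likewise hypothetical. Detecting a new release and notifying the maintainer are left out.

```python
import importlib.util
import os


def run_activity_tests(activity_dir):
    """Run every *_test.py in <activity_dir>/tests.

    Returns a dict mapping each test file name to True (passed) or False.
    """
    tests_dir = os.path.join(activity_dir, "tests")
    results = {}
    if not os.path.isdir(tests_dir):
        return results
    for name in sorted(os.listdir(tests_dir)):
        if not name.endswith("_test.py"):
            continue
        path = os.path.join(tests_dir, name)
        try:
            # Load the test file as a module without touching sys.path.
            spec = importlib.util.spec_from_file_location(
                "activity_test_" + name[:-3], path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            # Assumed convention: run_tests() returns True or False.
            results[name] = bool(module.run_tests())
        except Exception:
            # A test file that crashes or lacks run_tests() counts as a failure.
            results[name] = False
    return results
```

Calling run_activity_tests("Foo.Activity") would then give a per-file pass/fail summary that a Tinderbox-style machine could report back to the Activity's maintainer.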
Step 4: Replication
Make that "somewhere, maintained by somebody" XO able to be set up by anybody, anywhere. This likely involves a resurrection of Tinderbox.
Transcripts
Brainstormed ideas list
* less than 1hr setup required for testers
* less than 1hr setup required for developers
* centrally administered VM or box that testers/devs post scripts to
* new Activities can be made compatible with testing tool by their developers in <1hr
* central repository for tests, separate from (but clearly linked to) the repo for that Activity
* process for translating natural language bug reports into Activity test scripts
* all bugs submitted to developers to fix have such test scripts attached (test-driven development)
* tests should live in the project's source repo
* all automated scripts for an Activity (from fixed bugs or otherwise) are automatically run against new builds of that Activity
* pass/fail for Activity tests conveyed to the maintainer immediately
* interface for users to be about like etoys or turtleart
* first phase of capturing bug data should be able to be carried out by the average 10-year-old
* should catch memory leaks
* should test i18n
* Activities should be drivable without modification
* easily extensible - Activity test automation scripts should be in a format that's completely specified and published, so that automated tools could also later generate said test scripts
* inspired by openqa's selenium and mozilla's litmus
* have XML output like openqa's Selenium
* tests need to be aimed at correct releases
* tests need to be available for old releases for historical purposes
* tests need to stay in repo for effectively forever
* tests should be separately downloadable and locally runnable outside of the centralized tinderbox-ish setup so that those who wish to experiment with local changes before pushing to central repo can
* testsuites can be targeted to releases or families (test these for 8.2.x)
* developers doing true test driven development by having test frameworks in the code itself...
* test framework should be open-source
* The harness has to run on a single XO as well as on our virtual behemoth
* guidelines on how devs can make their code "more testable" (this sentence full of vagaries, not sure how/who can do this)
* different testsuites for full integration and for each tiny dev cycle
* developers should be able to opt-out of having their code tested, *but* know that all shipped-with-XOs code *must* be tested
* Ponies for everyone!
IRC
<mchua> dah, sorry. getting water took longer than I thought.
<mchua> Ok, so brainstorm time... finding our charter... one sec...
* mchua notes that others are welcome to participate as well
* robertofaga (n=robertof@201-43-56-79.dsl.telesp.net.br) has joined #olpc
* aa has quit (Read error: 110 (Connection timed out))
<mchua> Dream up automation designs to present to group next week. "I'm a tester. I want to automate this boring thing. What is my ideal interface to do so / the most beautiful tool I could imagine for it?"
<garycmartin> mchua: sugarbot
<mchua> garycmartin: what makes sugarbot so beautiful?
<garycmartin> mchua: I've emailed zach to see if it runs on XOs, but the downside seems to be it's been more degigned as an infrastructure tool (jubuild and buildbot).
<mchua> http://code.google.com/p/sugarbot/
<mchua> http://code.google.com/p/sugarbot/wiki/HowDoesSugarbotWork has a usage example
<mchua> qualities of sugarbot that I really like:
<garycmartin> mchua: we write test cases for a bunch of the bugs found as sugarbot scripts. That makes us define them well, and test when fixed.
<mchua> * it's a completely separate thing from the Activity code itself, and from Sugar
<mchua> garycmartin: in my ideal world, a bug report would not go to developers for fixin' without having a test attached that fails due to the bug, and will pass when the bug is resolved.
<mchua> (in my very, very ideal world.)
<garycmartin> mchua: We then just need to focus on collecting natural language type reports and turning them into scripts, that over time hopefully cover a good chunk of test issues/cases.
<mchua> garycmartin: agreed! what kinds of things and tools - and qualities/features of those things and tools - do we want?
* dsaxena (n=dsaxena@c-67-160-162-157.hsd1.or.comcast.net) has joined #olpc
<garycmartin> mchua: Your world is truly a perfect sphere ;-)
<mchua> garycmartin, adricnet: do you want to just take 10 minutes and type out wishlists as fast as possible, and then review?
<mchua> the brainstorming "quality < quantity" thing
<mchua> garycmartin: yeah, I know. :) I like to start with a platonic ideal, and then figure out how close to it I can get.
<adricnet> Maybe, I was looking for a linka nd got pulled away. If the levers and button in Sugarbot are solid enough it sound pretty ideal..
* ivazquez1 (n=ivazquez@fedora/ignacio) has joined #olpc
<garycmartin> mchua: usually ends up 'oblate spheroid' when all said and done :-)
<mchua> adricnet: I think Sugarbot is one possible (and very attractive) option for Activity test automation - the fact that I can't figure out how to generate and run a test with it in <15min, starting cold, means that there's work to be done yet
* vpovirk` (n=urk@c-76-17-237-120.hsd1.mn.comcast.net) has joined #olpc
* vpovirk has quit (Read error: 113 (No route to host))
<mchua> I'll start spewing out my wishes, with a 10min timer set (I may run out before them)
<mchua> these are wishes for an Activity testing framework/procedure, and will get pulled into sanity-land later ;)
<mchua> * less than 1hr setup required for testers
<mchua> * less than 1hr setup required for developers
<garycmartin> mchua: sugarbot is better as a centraly admined VM or box that we would post scripts to.
<mchua> * new Activities made compatible with testing tool also in < 1hr
<mchua> garycmartin: like [[Tinderbox]] in hypothetical?
<garycmartin> mchua: yep
<mchua> * central repository for tests, separate from (but clearly linked to) the repo for that Activity
* dwmw2 is now known as dwmw2_gone
* J5 has quit (Read error: 110 (Connection timed out))
<mchua> * process for translating natural language bug reports into Activity test scripts
<mchua> * all bugs submitted to developers to fix have such test scripts attached (test-driven development)
<adricnet> this suggests that the tests should live in the projects source repo, as $deity intended ..
<mchua> * all automated scripts for an Activity (from fixed bugs or otherwise) are automatically run against new builds of that Activity, with a pass/fail display conveyed to the maintainer immediately
<adricnet> I'd like the Sugarbot interface for users to be about like etoys or turtleart.
* ivazquez has quit (Operation timed out)
* vpovirk` has quit (Remote closed the connection)
<mchua> adricnet: awesome ideas, keep going!
<mchua> garycmartin: I'm trying to rephrase yours into feature requests (which may or may not already be implemented in sugarbot/tinderbox/otherwise), but I'm sure you have a lot more good ideas
* ivazquez1 is now known as ivazquez
<mchua> * should be usable by the average 10-year-old
<mchua> (to report the first pass of the Activity bug, not necessarily to finish the automation script)
<garycmartin> mchua: sugarbot works through X as far as I can tell, so all activities 'should' be drivable without modification
<mchua> * easily extensible - Activity test automation scripts should be in a format that's completely specified and published, so that automated tools could also later generate said test scripts
<adricnet> openqa's selenium and mozilla's litmus
<adricnet> sugarbot needs to have XML output like Selenium seems to..
* vpovirk (n=urk@c-76-17-237-120.hsd1.mn.comcast.net) has joined #olpc
<mchua> garycmartin: it doesn't work through X, iirc; it uses XML-RPC which calls the functions of the Activity itself (not through the GUI, I believe)
<garycmartin> * tests need to be aimed at correct releases (do we keep old tests for supporting old deployments not yet upgraded?)
<mchua> garycmartin: I'd need to read the code more to confirm that too, though
<mchua> * tests need to be available for old releases for historical purposes
<garycmartin> mchua: me too by the sounds of it :-)
<adricnet> tests need to stay in repor for effectively forever.
<mchua> * tests should be separately downloadable and locally runnable outside of the centralized tinderbox-ish setup so that those who wish to experiment with local changes before pushing to central repo can
<adricnet> although testsuites can be targeted to releases or familys (test these for 8.2.x)
<adricnet> mchua: ++
<morgs> I'd love to see us (developers) doing true test driven development by having test frameworks in the code itself...
<mchua> oh! also, test framework should be open-source...
<adricnet> The harness has to run on a single XO as well as on our virtual mehemoth
<mchua> i.e. *no* proprietary anything should be required to do testing or development for an Activity
<adricnet> pre-depends OSS/Free software and data, eyah
<garycmartin> morgs: is that a war I can hear starting... ;-)
<mchua> morgs: <3
<adricnet> morgs: They opt to have the option. Later, we break out the stick.
<morgs> It has to be a really big stick :)
<mchua> * guidelines on how devs can make their code "more testable" (this sentence full of vagaries, not sure how/who can do this)
<garycmartin> morgs: (test driven developent can be very slow, painful, and boring for devs)
<mchua> ding ding ding! my timer says 10 minutes, do we want to keep spewing ideas for another 5? we have 40 minutes left in this brainstorm
<morgs> garycmartin: yes. So can weeks of lost time tracking down unnoticed regressions :)
<garycmartin> morgs: (well get even less devs scratching that itch...)
<adricnet> different testsuites for full integration and for each tiny dev cycle ..
* morgs isn't ready to scratch that itch, so it will lie until somebody actually makes it happen
<adricnet> mchua: Seems like we ahve enough to start arguing about
* vpovirk has quit (Remote closed the connection)
<mchua> * developers should be able to opt-out of having their code tested, *but* know that all shipped-with-XOs code *must* be tested
<adricnet> Errr .. g'luck on that one..
<mchua> adricnet: yeah, it's a wishlist :)
<mchua> for that matter
<mchua> * a ponoy
<mchua> er,
<mchua> * pony
<adricnet> Ponies for everyone!
<adricnet> And cash. Yay cash!
* GoatCheezWork (n=Miranda@rrcs-97-76-61-66.se.biz.rr.com) has joined #olpc
<garycmartin> morgs: I know, swings and roundabouts. Do you like a life of grey coding bordom, or 2 months of joy followed by 2 months of hell. Some where in the middle is a likely sweet spot :-)
<mchua> adricnet, garycmartin, morgs - I'm gonna clean this up into a list of ideas and pastebin so it's easier to read, can you guys elucidate on the current state of what it's like to test Activities from the tester/dev perspective in the meantime? (should take me 5min or less)
<mchua> (particularly interesting: "My god, it's full of pain!" areas, and "I love this part of the process" things.)
* vpovirk (n=urk@c-76-17-237-120.hsd1.mn.comcast.net) has joined #olpc
<adricnet> hmm .. some want to name a victim activity
<garycmartin> mchua: with my Moon dev hat on, I test it all through before each release, but that's because it's simple and I can use all the possible inputs.
<adricnet> For Capture there's trying to aim the screen at the cat..
<kevix> mel, is this wishlist going on w.l.o or somewhere else?
<garycmartin> mchua: I make sure I both resume and clean start through all it's view modes.
<garycmartin> mchua: I make sure I've tested in the primary languages to make sure translation sctrings all come through.
<mchua> kevix: yes - at the end of this we should decide where everything is getting published
* vpovirk has quit (Client Quit)
<mchua> (aside from on the testing mailing list - kevix, are you subscribed?)
<garycmartin> mchua: I leave it running for long durations watching both memory and cpu (for leeks or hogging).
* mchua adds * should catch memory leaks and * should test i18n to wishlist
<kevix> no. I get the mails from the bug list. not from testing-*-*
<adricnet> Lol. *wishes really hard*
<garycmartin> mchua: I look at other sources of information to make sure it's not telling me fibs.
<garycmartin> mchua: Oh darn, I'm beeping. Sorry all have to go now (hard stop for me).
* vpovirk (n=urk@c-76-17-237-120.hsd1.mn.comcast.net) has joined #olpc
<kevix> bye, gary.
<garycmartin> kevix: bye!
* garycmartin has quit (Remote closed the connection)
<mchua> thanks, garycmartin!
<mchua> dah, too late
<mchua> pastebin!
<mchua> kevix, adricnet: http://pastebin.ca/1254589
<mchua> referring to these by line number... which do you think are (1) highest priority, (2) unrealistic, and (3) already done?
<kevix> snarking the URL
<adricnet> If someone could tell me the state of Sugarbot today ...
* rgs_ (n=rgs@190.128.250.238) has joined #olpc
<mchua> My 'highest priority' list: 27, 18, 5, 11, 8
<mchua> adricnet: http://code.google.com/p/sugarbot/ is all we have, unless we can track down the project owners (no luck so far)
<mchua> adricnet: it looks like no work has been done on it since september 8, 2008
<mchua> my 'unrealistic' list: 22 (in that we shouldn't write test scripts for old and no-longer-relevant releases; that's a lot of burden - we should keep all the scripts we write, though, so when the current builds become no-longer-relevant they'll still have their associated tests with them), and 31-22 are... probably... yeah.
<mchua> I don't think any of these are completely implemented, to my knowledge (where "implemented" == "in widespread use by the olpc dev/test community") but [[Tinderbox]] and [[Sugarbot]] have some elements of these requests in them
<adricnet> What does (a) OLPC Tinderbox setup test for?
<adricnet> Mozilla tiderbox is more useful when there is compliation?
<mchua> adricnet: OLPC-tinderbox is afaik currently not maintained
<mchua> adricnet: the Big Cool Thing it does is that it takes hw measurements as well
<adricnet> mchua: Roger. do we know what it did?
<adricnet> "hw measurements" ?
<mchua> (there's an XO at 1cc that has tiny voltage probes sticking out of it for measuring stuff like power consumption for the different builds)
<mchua> adricnet: Not... very well. I mean, we can always read the code. http://dev.laptop.org/git/projects/tinderbox
* bjordan (n=bjordan@cpc2-hitc2-0-0-cust908.lutn.cable.ntl.com) has joined #olpc
<mchua> adricnet: this is something I'm supposed to clean up and work on
<adricnet> High voltage!
<adricnet> sorry, random Dolby moment.
<kevix> so activities can be tested by the developers framework and by an OLPC framework
<mchua> adricnet: ...ostensibly after the g1g1 crazy washes over
<mchua> kevix: what would the difference between the two be?
* robertofaga has quit (Read error: 60 (Operation timed out))
<adricnet> mchua: Sure. Afaik, tbox would be great for does ita ll still build, but I dunno if it does functional testing
* ctyler has quit ("returning to Spare Oom")
<adricnet> Well, there's unit tests, functional tests, and QA ... they should all at least share some trade languages
<adricnet> I _think_ the comm testting for activities is going to be QA, which hopefully will come up with some unit/functional tests to feed the devs..
* ctyler (n=chris@global.proximity.on.ca) has joined #olpc
<mchua> adricnet: tbox does really, really minimal func testing
<mchua> adricnet: is my understanding
<mchua> adricnet: maybe a better way of putting it... *rummages for words*
<morgs> tinderbox seems to basically test that things boot up and start up.
<mchua> adricnet: "Whenever a new build goes out, tinderbox runs a certain (python) script to be automatically run on an individual XO that's hooked up in 1cc."
<mchua> This script (afaik) currently tests whether the build loads, whether Activities start, and also logs some power measurements (and... possibly makes sure those measurements fall within a certain numerical range.)
<adricnet> Ah, kk
<adricnet> That about syncs up with what I was thinking, cool. And this will need to be ressurrected, later. Cool.
* morgs -> $HOME
<mchua> The script currently running on 1cc-tinderbox can be modified. It is also possible (but difficult, right now) for others to set up their own tinderbox machines, and run the same (or different) scripts on them.
<mchua> adricnet: Yeah, and I don't yet have a good view of the scope of work required for that resurrection.
<mchua> (Alas.)
<mchua> adricnet: since we have 12 min to wrap up, how does this sound, in order of implementation?
<mchua> 1) make a python library for testing, which can be run externally against an Activity and return a True/False value as to whether it passed (possibly/probably resurrecting sugarbot code or design)
<mchua> (this would be separate python files, and import sugarbot-or-something-like-it, and import the Activity as specified in some filename in the test-python-file code, and run, and return True or False.)
<adricnet> Right. Might want to start with Py Test::Unit stuffs
<mchua> Yep.
<mchua> So for a tester or developer, the procedure to run a test would look like this
<mchua> * download foo_test.py
<mchua> * open Foo.Activity folder
<marcopg> I missed all of this meeting...
<mchua> * throw foo_test.py into Foo.Activity/tests
<mchua> * python foo_test.py
<mchua> * observe results
<marcopg> something that I would like to see, is these scripts to not be tinderbox specific
<marcopg> I'd like to run them also on SL buildbot
<marcopg> (sorry to interrupt your attempt to summarize!)
<mchua> marcopg: not at all!
<adricnet> marcopg: we're dreaming up a harness interface that should run everwhere, yes :)
<marcopg> adricnet: great!
<mchua> marcopg: ooh, I should ask you about buildbot in about 6 minutes ;)
<marcopg> :)
<mchua> adricnet: after that, 2) would be "make a central repository for such tests"
<adricnet> that's tricky .. but yeah they have to be kept somewhere
<mchua> adricnet: and then 3) automate the running of all the tests in (2) on an XO...somewhere... maintained by somebody....
<marcopg> central repository or per activity?
<mchua> adricnet: and then 4) make that "somewhere, maintained by somebody" XO able to be set up by anybody, anywhere (this probably is the "resurrect tinderbox" part)
<marcopg> that's something we discussed with zach and I'm not sure what is better
<adricnet> well yes on three where someone is 1..x people and somewhere is 1..x XO
<adricnet> Yeah, the virtual thingy ..
<mchua> marcopg: btw, do you know how to get in touch with zach? (or titus, or grig?) They probably have figured much of this out already
<adricnet> marcopg: It's up for argument .. should these QA level tests be in repo with the software or all live together somewhere?
<marcopg> mchua: you mean other than sending them mail? ;)
<mchua> adricnet: maybe another way of rephrasing that question is "when you download an Activity, should that also download the tests for that Activity?"
<mchua> marcopg: yeah.
<marcopg> adricnet: right, I don't have an answer unfortunately :)
<adricnet> Need to clarify our terms, but yes
<marcopg> mchua: nope, but I sent them mail and they have been responsive usually
* shenki has quit (Read error: 104 (Connection reset by peer))
<mchua> marcopg: ah ok, I'll try again. maybe it's my mail acting up (it has been, lately. I'm not sure why.)
<marcopg> if you post about the plans somewhere
<mchua> adricnet: whoo. almost at time for today - anything else?
<marcopg> I can have a look too
<adricnet> l.o mail was down yesterday?
<marcopg> I thought a *little* bit about the issues already
<marcopg> and worked some on infrastructure (buildbot only)
<mchua> I think I have a much better idea of what I want from an Activity test framework, at least, so this has been helpful to me
<adricnet> Yay helpful.
<mchua> adricnet: anything we should do to make this more adric-helpful, too?
<mchua> (I'm planning on writing up the notes/plans on the wiki, mailing to the testing list, asking people to shoot at it)
<adricnet> mchua: Not yet. Need to have some examples of these Comm Testing tests so that we can argue about format and where to keep them
<mchua> adricnet: Aye, right. Concrete examples, working code...
<mchua> I think there's enough constraints in the design doc that we'll get from this discussion to toss out a couple prototypes, though
<adricnet> mchua: Well, complete-ish PoCs at least
<marcopg> are you thinking to write your custom scripts? or to base on existing frameworks?
<adricnet> prototypes, yeah
<mchua> adricnet: cool, then we're done, I think
<mchua> adricnet: thank you!
<mchua> marcopg: base on existing, whenever possible
<mchua> marcopg: I am a lazy bum ;)
<marcopg> heh
<adricnet> Laziness is a virtue
<marcopg> there is sugarbot
<marcopg> and also another one which I can't remember right now mmm
<marcopg> (both for gtk)
<adricnet> Selenium ?
<adricnet> Oh, notthat low-level oops
<marcopg> oh dogtail
<mchua> ah! I think you sent me dogtail
<mchua> I haven't gotten a chance to look at it, yet - been working on remote testing scripts (...incredibly slowly, alas)
<marcopg> yeah me neither...
<marcopg> would be nice to compare sugarbot and dogtail
<mchua> it sure would.
<marcopg> would be also useful to just ask zach about it
<marcopg> perhaps he looked into dogtail before doing his own thing
<mchua> Ooh, that's a great question to ask zach.
<marcopg> zach was supposed to integrate sugarbot into jhbuild btw
<mchua> marcopg: btw, do you have this channel logged, or do you want me to send you the log from the start of the brainstorm?
<marcopg> but we haven't heard anything from him about ti
<marcopg> busy with school I guess :/
* kevix has quit (Read error: 110 (Connection timed out))
<marcopg> mchua: don't think I have all of it, logs would be great