XO-1.75/Kernel/Testing

From OLPC
Revision as of 22:40, 3 January 2012 by Greenfeld (5ba0b446 - arm-3.0-wip-wfi)

Kernel developers: put each new kernel in a new section here, latest at top. Testers: place links to reports, indented under the kernel.


5ba0b446 - arm-3.0-wip-wfi

  • olpc-ec: allow unknown commands to be executed
  • test dependencies: EC logging (S7), kernel logging with EC debug enabled ("echo 1 > /sys/module/olpc_ec_1_75/parameters/ec_debug").
  • purpose: improve the EC driver, with fewer (ideally no) races and crashes.
  • testing by
    • Samuel Greenfeld
      • C1 SKU 201 EC communications failure (#3), OLS runin disabled, Q4C11 modified EC code ecimage-0.3.07pgf-668.bin. host ec

ff199462 - arm-3.0-wip-wfi

  • olpc-ec: ensure gpio cmd is left low if something screws up
  • EC logged in S7
  • Still has FIQ debugger
  • Should help reset the EC bus, avoiding subsequent failures after the first command fails.
  • testing by
    • Samuel Greenfeld
      • C1 hang on resume (#3), OLS runin disabled, Q4C11 normal EC code host [1]
      • C1 hang on resume (#6), OLS runin disabled, Q4C11 normal EC code
      • C1 hang on resume (#3), OLS runin disabled, Q4C11 modified EC code ecimage-0.3.07pgf-668.bin host ec

3d4cf36c - arm-3.0-wip-wfi

  • mmp2_fiq_debugger.c: add file missing from the other merge
  • EC logged in S7
  • This enables a FIQ debugger. Reproduce a kernel hang, then send a Break from a serial console; this should drop you to a debug prompt. Run bt to see what is going on.
  • testing by
    • James Cameron, C1 SKU201 host (known audio problem that need not be reported again), C1 SKU202 host (known audio problem), B4, and two B1 all hung, with no response to BREAK. FIQ was verified as working before starting.

26f404e3 - arm-3.0-wip-wfi

  • Revert "olpc-ec: don't process/ack packets when there's an underrun error"
  • EC logged in S7
  • further code to work around the EC communications race; a discussion with pgf brought up another issue, and a patch from pgf is included. Also test that the audio pop during suspend/resume is gone (change from jnettlet).
  • testing by
    • Samuel Greenfeld
      • 3xC1 running os23 with olpc-runin-tests-0.16.7-1 installed instead of the bringup build, Q4C11, all tests (2 with EC serial); 3xC1 running with the battery test disabled. Test in progress.
        • 1xC1 SKU201 (#6) running aggressive (10s on/10s off) suspend failed during resume after 15 minutes, with the power and full-battery LEDs on and all other LEDs off. All tests except the battery test were running on this unit at the time of failure. host
        • 1xC1 SKU201 (#3) running aggressive (10s on/10s off) suspend failed at a point TBD with EC communications failure and the eMMC root filesystem remounted read-only. It continued suspend & resume testing after the failure point(s). host ec
    • James Cameron
      • C1, C1, B4, B1, B1, B1, os20, runin 0.16.7, 10sec/10sec: hangs still occur (one, two), and audio pops during suspend and resume are still present.

7dad6c10 - arm-3.0-wip-wfi (BROKEN)

afa391a5 - arm-3.0-wip-wfi

9177e6a8 - arm-3.0-wip-wfi

  • olpc-1.75: back off the hardware clock gating for MMC devices
  • purpose: kill off the SET_BLOCK_COUNT errors seen by Quozl in the prior tests
  • result of test: no change (the patch did not affect the outcome).
  • testing by
    • James Cameron, Q4C11, os20, 10sec/10sec S/R runin,
      • C1 SKU201 passed, ec host,
      • C1 SKU202 hung, after 11 minutes, SET_BLOCK_COUNT eMMC failure at kernel timestamp 1159.762854, host
      • B1 SKU199 hung, after 2.5 hours, SET_BLOCK_COUNT eMMC failure at kernel timestamp 7409.481875, host
      • B1 SKU199 hung,
      • B4 SKU199 hung, (connecting the serial port afterwards showed no streaming eMMC messages),
      • B1 SKU198 hung.
    • James Cameron, Q4C11, os20, single, dortc,
      • C1 SKU202 manually stopped early, had been needing keyboard wakeup,
      • B1 SKU199 manually stopped early,
      • B1 SKU199 manually stopped early, had been needing keyboard wakeup,
      • B4 SKU199 manually stopped early.
  • purpose: test the theory that battery state-of-charge changes are associated with hangs
    • Samuel Greenfeld, os23, 4 C1 SKU201 and 2 C1 SKU202, olpc-runin-tests-0.16.7-1 installed instead of the bringup build, runin-battery test disabled
      • 2xC1 SKU201 hung on resume after 10.5 hours host#3/ec#8 & host #6
      • 1xC1 SKU202 hung after 11.75 hours, due to disabling pm_async or to keyboard events during runin, host#4; possibly caused while system #3 in front of it was being reset.
      • All systems were then reset with pm_async disabled: 4 C1 SKU201 and 3 C1 SKU202 in total.
      • 1xC1 SKU202 hung citing phantom keyboard events followed by an illegal instruction, pm_async disabled [2]
      • 1xC1 SKU201 hung with MMC problems, pm_async disabled [3]
    • Samuel Greenfeld, os23, 3 B1 SKU198 and 4 B1 SKU199, olpc-runin-tests-0.16.7-1 installed instead of the bringup build, runin-battery test enabled, runin-camera and runin-wlan disabled, pm_async enabled
      • 1xB1 SKU199 failed to properly handle an EC interrupt on resume [4]
    • James Cameron, Q4C11, os20, 10sec/10sec S/R runin, without runin-battery and without a battery inserted.
      • C1 SKU201 hung, after 67 minutes, host
      • C1 SKU202 hung, after 15 minutes, host
      • B1 SKU199 x 2
      • B4 SKU199 hung, after 90 minutes, host tail
      • B1 SKU198.
  • purpose: disable asynchronous S/R
  • purpose: no-suspend-contention runin branch
    • five units ran to 100 suspend cycles, 2.5 hours each, with no issues.
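
The two S/R settings explored above (asynchronous suspend/resume disabled, repeated suspend/resume cycles such as the 10sec/10sec timing used elsewhere on this page) can be approximated by hand when reproducing a hang outside runin. This is a minimal sketch, not the runin implementation: it assumes the standard Linux /sys/power/pm_async knob and the util-linux rtcwake tool are available on the build under test, and the function names disable_pm_async and suspend_cycles are hypothetical.

```shell
#!/bin/sh
# Sketch only (assumption): approximates short S/R cycles with pm_async off.
# This is NOT how olpc-runin-tests implements its suspend test.

# Sysfs knob for asynchronous device suspend/resume (1 = async, 0 = sequential).
# Overridable so the functions can be exercised against a scratch file.
PM_ASYNC="${PM_ASYNC:-/sys/power/pm_async}"

# Hypothetical helper: make devices suspend and resume one at a time.
disable_pm_async() {
    echo 0 > "$PM_ASYNC"
}

# Hypothetical helper: run N cycles of "suspend to RAM, wake via RTC alarm
# after 10 s, stay awake 10 s". Stops at the first rtcwake failure.
suspend_cycles() {
    n="$1"
    i=0
    while [ "$i" -lt "$n" ]; do
        rtcwake -m mem -s 10 || return 1
        sleep 10
        i=$((i + 1))
    done
}
```

Run as root, e.g. `disable_pm_async && suspend_cycles 100`, with a serial console attached so that the kernel and EC logs survive a hang.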

58360582 - arm-3.0-wip-wfi

  • Revert "olpc-ec-1-75: clean up cmd state locking and other things"
  • purpose: test the old EC driver across runin and s/r.
  • result of test: no change (the patch did not affect the outcome).
  • additional tests: against normal runin, and 10sec/10sec S/R runin.
  • testing by
    • Richard Smith, os21, C1 SKU201, three runs, against two-stage runin, pass.
    • James Cameron, Q4C11, os20, 10sec/10sec S/R runin,
      • C1 SKU201 stopped, SOC display loss at EC timestamp 5380673 kernel timestamp 745.92 (same symptom as seen in #4239902), ec host,
      • C1 SKU202 stopped, eMMC failure, host
    • James Cameron, Q4C11, os20, 10sec/10sec S/R runin,
      • C1 SKU201 hung, ec host,
      • C1 SKU202 hung, within 30 minutes, host
      • B4 SKU199 hung, after two hours,
      • B1 SKU199 ran fine for 2:15 but was manually stopped, host
      • B1 SKU199 hung, after one hour,
      • B1 SKU198 hung.
    • Samuel Greenfeld, os23, 4 C1 SKU201 and 2 C1 SKU202, olpc-runin-tests-0.16.7-1 installed instead of the bringup build
      • C1 SKU201 hung on resume after 21 cycles (10s on/10s suspend), host#3 ec#8
      • C1 SKU201 hung on resume overnight (10s on/10s suspend), host#6
      • C1 SKU202 EC communications failure overnight host#1 ec#7
      • Three remaining C1 systems running default runin suspend cycle timings did not hang after 18 hours.

2d8e7cc - arm-3.0-wip-wfi

  • sdhci: ignore interrupts received after suspend
  • zImage-2d8e7cc-wip-wfi
  • os: os20, ofw: q4c08 or q4c09, runin in build, runin-fscheck disabled, runin-battery disabled, runin-sus set to 10sec/10sec
  • purpose: test for non-EC-related hangs.
  • result of test: no change (the patch did not affect the outcome).
  • testing by
    • Samuel Greenfeld
      • Passed, os21 not os20, B1 test bed, seven units.
    • James Cameron,
      • C1 SKU201 host ec, hung at 31 cycles with 23:46:01 remaining,
      • C1 SKU202 host, hung at 36 cycles with 23:43:41 remaining,
      • B1 SKU199 host, hung at 1737 cycles with 11:25:02 remaining,
      • B4 SKU199, hung at 398 cycles with 12:03:29 remaining.

6f125d7 - arm-3.0-wip

  • sdhci: ignore interrupts received after suspend (same commit comment but different git branch than above)
  • zImage-6f125d7-wip
  • os: os20, ofw: q4c08, runin in build, runin-sus disabled.
  • purpose: verification of a build for manufacturing testing.
  • result of test: success, the kernel is stable for runin testing in manufacturing if used without suspend and resume.
  • testing by
    • Samuel Greenfeld
      • Four C1 SKU201 and three C1 SKU202 passed 24 hr testing
    • James Cameron,
      • C1 SKU201 host ec, passed,
      • C1 SKU202 host, passed.
  • This kernel was added to build os23.
  • os23 testing by
    • James Cameron, C1 SKU201, C1 SKU202, B4, B1, B1, B1, all passed 24 hr testing.

4239902

  • os20 q4c09 runin 0.16.7 10sec 10sec 24hrs
  • testing by
    • James Cameron, fail, one unit hung, all units lost EC communications (blank SOC display, hang after runin pass),