XO-1.75/Kernel/Testing

From OLPC
Revision as of 18:18, 6 January 2012 by Dilinger (talk | contribs)

Kernel developers, put your new kernel in a new section here. Latest at top. Testers place links to reports indented under the kernel.

ebf24ea6 - arm-3.0-wip-wfi

196c2f806 - arm-3.0-wip-wfi

46e079fe - arm-3.0-wip-wfi

  • http://dev.laptop.org/git/olpc-kernel/commit/?h=arm-3.0-wip-wfi&id=46e079fe
  • EC logged in S7
  • Should fix the hang caused by a suspend being aborted due to a wakeup event. May also help with other IRQ hangs, as I have fixed a mistake I made earlier when moving the audio island code.
  • To test aborted suspends, run this command and then press a key on the keyboard right away:

echo $(cat /sys/power/wakeup_count) > /sys/power/wakeup_count && echo mem > /sys/power/state

  • testing by
    • James Cameron, with a local debugging patch, saw one instance of eMMC "error -110 sending SET_BLOCK_COUNT command", on C1 SKU201, see log tail, and several instances of the hang in dpm_resume().
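
The aborted-suspend test command in this entry can be wrapped in a small function so that repeated runs report which step failed. This is only a sketch: the function name try_suspend and its optional directory argument are assumptions added here for illustration, not part of any runin script. It relies on the kernel rejecting the write to wakeup_count when a wakeup event has arrived since the value was read.

```shell
#!/bin/sh
# try_suspend [POWER_DIR]
# Hypothetical helper: POWER_DIR defaults to /sys/power and is only a
# parameter so the function can be exercised against a scratch directory.
try_suspend() {
    power_dir="${1:-/sys/power}"

    # Read the current wakeup event count; give up if it is unreadable.
    count=$(cat "$power_dir/wakeup_count") || return 2

    # Writing the count back fails if a wakeup event arrived in between,
    # in which case the suspend is not attempted.
    if ! echo "$count" > "$power_dir/wakeup_count"; then
        echo "suspend not attempted: wakeup event pending" >&2
        return 1
    fi

    # Request suspend to RAM.
    echo mem > "$power_dir/state"
}
```

Run try_suspend as root and press a key immediately afterwards, as described above. A return status of 1 means the wakeup_count write was rejected before the suspend was requested.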

bfc1b92b - arm-3.0-wip-wfi

b7f22e1d - arm-3.0-wip-wfi

10ebd28f - arm-3.0-wip-wfi

ae48be89 - arm-3.0-wip-wfi

  • a tar.gz by James,
  • testing by
    • Richard?
    • James Cameron, 10sec/10sec, with audio disabled, patch,

5ba0b446 - arm-3.0-wip-wfi

  • olpc-ec: allow unknown commands to be executed
  • test dependencies: EC logging (S7), kernel logging with EC debug enabled ("echo 1 > /sys/module/olpc_ec_1_75/parameters/ec_debug"), pm_async disabled.
  • purpose: improve the EC driver; fewer (or no?) races/crashes.
  • testing by
    • Samuel Greenfeld
      • C1 SKU 201 EC communications failure (#3), OLS runin disabled, Q4C11 modified EC code ecimage-0.3.07pgf-668.bin. host ec
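
The software side of the test dependencies listed for this kernel could be set up with a short function like the one below. This is a sketch under stated assumptions: the ec_debug path is the one quoted in the entry, while /sys/power/pm_async is the standard kernel control for asynchronous suspend/resume and is not named in the entry. The function name enable_test_deps and its root argument are hypothetical.

```shell
#!/bin/sh
# enable_test_deps [SYSFS_ROOT]
# Hypothetical helper: SYSFS_ROOT defaults to /sys and exists only so the
# function can be tried against a scratch directory tree.
enable_test_deps() {
    root="${1:-/sys}"

    # Enable EC debug output in the kernel log (path from the entry above).
    echo 1 > "$root/module/olpc_ec_1_75/parameters/ec_debug" || return 1

    # Assumption: pm_async is disabled by writing 0 to /sys/power/pm_async.
    echo 0 > "$root/power/pm_async" || return 1
}
```

Both writes need root. EC logging on the serial port (S7) is a separate hardware setup and is not covered by this sketch.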

ff199462 - arm-3.0-wip-wfi

  • olpc-ec: ensure gpio cmd is left low if something screws up
  • EC logged in S7
  • Still has FIQ debugger
  • Should help reset EC bus to avoid subsequent failures after the first command fails.
  • testing by
    • Samuel Greenfeld
      • C1 hang on resume (#3), OLS runin disabled, Q4C11 normal EC code host (mmp2_pm_finish: Enable audio island ... mmp2_pm_finish: Done ... 72.616ms ... mmp-camera mmp-camera.0: resume) [1]
      • C1 hang on resume (#6), OLS runin disabled, Q4C11 normal EC code
      • C1 hang on resume (#3), OLS runin disabled, Q4C11 modified EC code ecimage-0.3.07pgf-668.bin host ec

3d4cf36c - arm-3.0-wip-wfi

  • mmp2_fiq_debugger.c add file missing from other merge
  • EC logged in S7
  • This enables the FIQ debugger. Reproduce a kernel hang, then send a Break from a serial console, which should drop you to a debug prompt. Run bt to see what is going on.
  • testing by
    • James Cameron, 10sec/10sec,
      • C1 SKU201 hang at mmp-camera mmp-camera.0: resume, host,
      • C1 SKU202 hang at mmp-camera mmp-camera.0: resume, host,
      • C1 SKU202 hang at mmp-camera mmp-camera.0: resume, host,
      • no response to BREAK,
    • James Cameron, 0sec/3sec, FIQ was verified as working before starting,
      • C1 SKU201 host (known audio problem that doesn't need to be reported again),
      • C1 SKU202 host (known audio problem),
      • B4, and two B1 hung, no response to BREAK.

26f404e3 - arm-3.0-wip-wfi

  • Revert "olpc-ec: don't process/ack packets when there's an underrun error"
  • EC logged in S7
  • Further code to work around the EC communications race; discussion with pgf brought up another issue, so this includes a patch from pgf. Also test that the audio pop during suspend/resume is gone (from jnettlet).
  • testing by
    • Samuel Greenfeld
      • 3xC1 running os23 with olpc-runin-tests-0.16.7-1 installed instead of bringup build, Q4C11, all tests (2 with EC serial), 3xC1 running with battery test disabled. Test in progress.
        • 1xC1 SKU201 (#6) running aggressive (10s on/10s off) suspend failed after 15 minutes during resume, power & full battery LEDs on, all other LEDs off. All tests except the battery test were running on this unit at the time of failure. host (mmp2_pm_finish: Enable audio island ... mmp2_pm_finish: Done)
        • 1xC1 SKU201 (#3) running aggressive (10s on/10s off) suspend failed at a point TBD with EC communications failure and the eMMC root filesystem remounted read-only. It continued suspend & resume testing after the failure point(s). host ec
    • James Cameron
      • C1 C1 B4 B1 B1 B1, os20, runin 0.16.7, 10sec/10sec, hangs still occur (one, two); audio pops during suspend and resume are still present.

7dad6c10 - arm-3.0-wip-wfi (BROKEN)

afa391a5 - arm-3.0-wip-wfi

9177e6a8 - arm-3.0-wip-wfi

  • olpc-1.75: back off the hardware clock gating for MMC devices
  • purpose: kill off the SET_BLOCK_COUNT errors seen by Quozl in the prior tests
  • result of test: no change (the patch did not affect the outcome).
  • testing by
    • James Cameron, Q4C11, os20, 10sec/10sec S/R runin,
      • C1 SKU201 passed, ec host,
      • C1 SKU202 hung, after 11 minutes, SET_BLOCK_COUNT eMMC failure at kernel timestamp 1159.762854, host
      • B1 SKU199 hung, after 2.5 hours, SET_BLOCK_COUNT eMMC failure at kernel timestamp 7409.481875, host
      • B1 SKU199 hung,
      • B4 SKU199 hung, (connecting serial port afterwards did not show streaming eMMC messages),
      • B1 SKU198 hung.
    • James Cameron, Q4C11, os20, single, dortc,
      • C1 SKU202 manually stopped early, had been needing keyboard wakeup,
      • B1 SKU199 manually stopped early,
      • B1 SKU199 manually stopped early, had been needing keyboard wakeup,
      • B4 SKU199 manually stopped early.
  • purpose: test theory that battery state of charge changes are associated with hangs
    • Samuel Greenfeld, os23, 4 C1 SKU201 2 C1 SKU202, olpc-runin-tests-0.16.7-1 installed instead of bringup build, runin-battery test disabled
      • 2xC1 SKU201 hung on resume after 10.5 hours host#3/ec#8 & host #6
      • 1xC1 SKU202 hung after 11.75 hours due to disabling pm_async or to keyboard events during runin host#4, possibly while resetting system #3 in front of it.
      • All systems then reset with pm_async disabled, 4 C1 SKU201 & 3 C1 SKU 202 total.
      • 1xC1 SKU202 hung citing phantom keyboard events followed by an illegal instruction, pm_async disabled [2]
      • 1xC1 SKU201 hung with MMC problems, pm_async disabled [3]
    • Samuel Greenfeld, os23, 3 B1 SKU198 4 B1 SKU199, olpc-runin-tests-0.16.7-1 installed instead of bringup build, runin-battery test enabled, runin-camera, runin-wlan disabled, pm_async enabled
      • 1xB1 SKU 199 failure to properly handle EC interrupt on resume [4]
    • James Cameron, Q4C11, os20, 10sec/10sec S/R runin, without runin-battery, without battery inserted.
      • C1 SKU201 hung, after 67 minutes, host,
      • C1 SKU202 hung, after 15 minutes, host,
      • B1 SKU199 x 2, B4 SKU199 hung, after 90 minutes, host tail,
      • B1 SKU198.
  • purpose: disable asynchronous S/R
  • purpose: no-suspend-contention runin branch
    • five units run to 100 suspend cycles, 2.5 hours each, no issues.
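
When checking a saved kernel log for the SET_BLOCK_COUNT failures reported in this entry, a small filter can count them and list their kernel timestamps. This is a sketch only: the function names are hypothetical, the match string is the message fragment quoted in these reports, and the timestamp extraction assumes the usual bracketed printk prefix at the start of each line.

```shell
#!/bin/sh
# count_sbc_errors LOGFILE
# Print how many lines of LOGFILE mention a failed SET_BLOCK_COUNT command.
count_sbc_errors() {
    [ -r "$1" ] || return 2
    # grep -c prints 0 and exits non-zero when nothing matches; treat that
    # as a normal result rather than an error.
    grep -c 'sending SET_BLOCK_COUNT command' "$1" || true
}

# sbc_error_times LOGFILE
# Print the kernel timestamp of each matching line, one per line.
# Assumption: lines start with a printk timestamp such as "[ 1234.567890]".
sbc_error_times() {
    [ -r "$1" ] || return 2
    sed -n 's/^\[ *\([0-9.]*\)].*sending SET_BLOCK_COUNT command.*/\1/p' "$1"
}
```

For example, count_sbc_errors on a captured host log gives a quick way to compare units before reading the log tail by hand.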

58360582 - arm-3.0-wip-wfi

  • Revert "olpc-ec-1-75: clean up cmd state locking and other things"
  • purpose: test the old EC driver across runin and s/r.
  • result of test: no change (the patch did not affect the outcome).
  • additional tests: against normal runin, and 10sec/10sec S/R runin.
  • testing by
    • Richard Smith, os21, C1 SKU201, three runs, against two-stage runin, pass.
    • James Cameron, Q4C11, os20, 10sec/10sec S/R runin,
      • C1 SKU201 stopped, SOC display loss at EC timestamp 5380673 kernel timestamp 745.92 (same symptom as seen in #4239902), ec host,
      • C1 SKU202 stopped, eMMC failure, host
    • James Cameron, Q4C11, os20, 10sec/10sec S/R runin,
      • C1 SKU201 hung, ec host,
      • C1 SKU202 hung, within 30 minutes, host
      • B4 SKU199 hung, after two hours,
      • B1 SKU199 was going fine for 2:15, but manually stopped, host
      • B1 SKU199 hung, after one hour,
      • B1 SKU198 hung.
    • Samuel Greenfeld, os23, 4 C1 SKU201 2 C1 SKU202, olpc-runin-tests-0.16.7-1 installed instead of bringup build
      • C1 SKU201 hung on resume after 21 cycles (10s on/10s suspend), host#3 ec#8
      • C1 SKU201 hung on resume overnight (10s on/10s suspend), host#6
      • C1 SKU202 EC communications failure overnight host#1 ec#7
      • Three remaining C1 systems running default runin suspend cycle timings did not hang after 18 hours.
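
For reproducing an aggressive "10s on/10s suspend" pattern outside of runin, a loop along these lines could be used. It is a sketch, not the runin-sus implementation: the function name sr_cycle is hypothetical, and the suspend command is passed in by the caller. The commented example assumes rtcwake from util-linux is installed and that the RTC can wake the unit.

```shell
#!/bin/sh
# sr_cycle COUNT AWAKE_SECS SUSPEND_CMD [ARGS...]
# Stay awake for AWAKE_SECS, run the suspend command, and repeat COUNT times.
sr_cycle() {
    count=$1
    awake=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        sleep "$awake"
        # The suspend command is expected to return after resume.
        "$@" || { echo "suspend command failed at cycle $i" >&2; return 1; }
        i=$((i + 1))
        echo "cycle $i of $count done"
    done
}

# Example (assumption, see above): 100 cycles of 10 s awake, 10 s suspended.
#   sr_cycle 100 10 rtcwake -m mem -s 10
```

Because each completed cycle prints a line, a hang on resume shows up as the output stopping, and the last line gives the cycle count to report.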

2d8e7cc - arm-3.0-wip-wfi

  • sdhci: ignore interrupts received after suspend
  • zImage-2d8e7cc-wip-wfi
  • os: os20, ofw: q4c08 or q4c09, runin in build, runin-fscheck disabled, runin-battery disabled, runin-sus set to 10sec/10sec
  • purpose: test looking for non-EC-related hangs.
  • result of test: no change (the patch did not affect the outcome).
  • testing by
    • Samuel Greenfeld
      • Passed, os21 not os20, B1 test bed, seven units.
    • James Cameron,
      • C1 SKU201 host ec, hung at 31 cycles 23:46:01 remaining,
      • C1 SKU202 host, hung at 36 cycles 23:43:41 remaining,
      • B1 SKU199 host, hung at 1737 cycles 11:25:02 remaining,
      • B4 SKU199, hung at 398 cycles 12:03:29 remaining,

6f125d7 - arm-3.0-wip

  • sdhci: ignore interrupts received after suspend (same commit comment but different git branch than above)
  • zImage-6f125d7-wip
  • os: os20, ofw: q4c08, runin in build, runin-sus disabled.
  • purpose: verification of a build for manufacturing testing.
  • result of test: success, the kernel is stable for runin testing in manufacturing if used without suspend and resume.
  • testing by
    • Samuel Greenfeld
      • Four C1 SKU201 and three C1 SKU202 passed 24 hr testing
    • James Cameron,
      • C1 SKU201 host ec, passed,
      • C1 SKU202 host, passed.
  • was added to build os23.
  • os23 testing by
    • James Cameron, C1 SKU201, C1 SKU202, B4, B1, B1, B1, all passed 24 hr testing.

4239902

  • os20 q4c09 runin 0.16.7 10sec 10sec 24hrs
  • testing by
    • James Cameron, fail, one unit hung, all units lost EC communications (blank SOC display, hang after runin pass),