XO-1.75/Kernel/Issues: Difference between revisions

(the dpm_resume hang was proven to be caused by the serial driver not properly handling suspend and resume, workaround is to remove no_console_suspend)
 
(8 intermediate revisions by the same user not shown)
== hang, mmp2-pcm-audio.0 resume ==

This is a hang with the last serial console message:
mmp2-pcm-audio mmp2-pcm-audio.0: resume

No FIQ. No SysRq. Always preceded by a ''PM: Some devices failed to suspend''.

Jon Nettleton says this is a known problem. It is caused by an interrupted suspend; the hang occurs during the unwind of the resume.

Can be worked around by removing the audio driver from the kernel.

Disabling runin-sound has no effect.

Seen within half an hour if SUS_TIME is set to 3000. Rarely or never seen if SUS_TIME is set to 10000.

Theory: once a suspend takes too long, the RTC alarm goes off before suspend has finished, and the suspend is interrupted, leading to this issue.
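
The theory can be expressed as a small model. This is only a sketch: it assumes SUS_TIME is a delay in milliseconds and that the RTC alarm is armed when suspend starts, neither of which is confirmed here. The function names and the durations are hypothetical, invented for illustration.

```python
def suspend_is_interrupted(suspend_duration_ms, alarm_delay_ms):
    """True if the wake alarm fires before the suspend sequence has finished.

    Assumption: the alarm is armed at the moment suspend begins.
    """
    return alarm_delay_ms <= suspend_duration_ms


def interrupted_cycles(durations_ms, alarm_delay_ms):
    """Return the indices of the suspend cycles the alarm would interrupt."""
    return [
        index
        for index, duration in enumerate(durations_ms)
        if suspend_is_interrupted(duration, alarm_delay_ms)
    ]


# Hypothetical suspend durations in ms; these are NOT measurements.
example_durations_ms = [800, 1200, 3500, 900, 4100]

print(interrupted_cycles(example_durations_ms, 3000))   # [2, 4]
print(interrupted_cycles(example_durations_ms, 10000))  # []
```

Under these assumptions a short alarm delay is overtaken by any slow suspend, while a long delay is not, which matches the observation that the hang shows up with 3000 and rarely or never with 10000.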

Impact: may also trigger with other wake sources, leading to a failed runin.

Instances:
* http://dev.laptop.org/~quozl/z/1RiDms.txt
* http://dev.laptop.org/~quozl/z/1RiEIr.txt

Might be fixed by http://dev.laptop.org/git/olpc-kernel/commit/?h=arm-3.0-wip-wfi&id=92ef8264199818b91518f7fe7af365c5998381fa

== hang, dpm_resume ==

This is a hang shortly after the serial console message:
mmp2_pm_finish: Enable audio island

The time period may vary.

Only occurs when SUS_TIME is set to a value other than 3000.

Instances from previous testing:

* http://dev.laptop.org/~quozl/z/1RiHT6.txt (75.904ms)
* http://dev.laptop.org/~greenfeld/temp/175bringup/os23-ff199462/screenlog.3-resumehang2ecmod.bz2 (80.341ms)
* http://dev.laptop.org/~greenfeld/temp/175bringup/os23-26f404e/screenlog.6-resumehang.bz2 (2.683ms)
* http://dev.laptop.org/~greenfeld/temp/175bringup/os23-26f404e/screenlog.3-ecfail1.bz2 (72.616ms)
* http://dev.laptop.org/~quozl/z/1Rhvmt.txt
* http://dev.laptop.org/~quozl/z/1RhvpV.txt
* http://dev.laptop.org/~quozl/z/1RiK9g.txt (proving it does not require the body of the function)
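
Captured serial logs such as the instances above could be triaged by the console messages quoted on this page. The following is a minimal sketch, not an existing tool: the function name, the labels, and the window sizes (last 20 lines, last 5 lines) are assumptions chosen for illustration, and real logs may need different thresholds.

```python
def classify_hang(log_text):
    """Guess which hang a serial console log ended in, from its final lines."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    if not lines:
        return "unknown"

    tail = lines[-20:]

    # A repeating SET_BLOCK_COUNT error near the end of the log.
    if sum("SET_BLOCK_COUNT" in line for line in tail) >= 2:
        return "SET_BLOCK_COUNT"

    # The audio driver resume message is the very last line printed.
    if "mmp2-pcm-audio.0: resume" in lines[-1]:
        return "mmp2-pcm-audio.0 resume"

    # The hang occurs shortly after this message, so allow a few later lines.
    if any("mmp2_pm_finish: Enable audio island" in line for line in tail[-5:]):
        return "dpm_resume"

    return "unknown"


sample = "PM: resume of devices\nmmp2_pm_finish: Enable audio island\n"
print(classify_hang(sample))  # dpm_resume
```

Substring matching is used rather than exact line matching so that timestamp prefixes in a log do not prevent a match.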

Diagnosis.

Tracing the point of the hang by gradually adding printk calls has shown that the problem occurs in dpm_resume(), before dpm_resume_end() calls dpm_complete().

Adding a 60ms mdelay() per device inside the list-processing loop of dpm_resume() ([http://dev.laptop.org/~quozl/z/1Rijw0.txt patch]) gave these results:
* [http://dev.laptop.org/~quozl/z/1Rijy3.txt C1 SKU201 after 27 suspend cycles],
* [http://dev.laptop.org/~quozl/z/1RijyO.txt C1 SKU202 after 9 suspend cycles],
* [http://dev.laptop.org/~quozl/z/1Rim3S.txt C1 SKU202 after 123 suspend cycles],
* [http://dev.laptop.org/~quozl/z/1Rim4h.txt B1 after 211 suspend cycles],
* [http://dev.laptop.org/~quozl/z/1Rim62.txt B4 after 137 suspend cycles].

Note that the elapsed time shows no correlation with the previous instances of the issue, which suggests that the hang depends on the operations being performed rather than on when they are performed.

Latest revision as of 05:41, 13 January 2012

== hang, SET_BLOCK_COUNT ==

This is a hang with a repeating message:

mmcblk0: error -110 sending SET_BLOCK_COUNT command, response 0x0, card status 0x700

<trac>11137</trac> <trac>11525</trac> <trac>11528</trac>