XO-1.75/Kernel/Issues: Difference between revisions

From OLPC
Revision as of 23:08, 5 January 2012

hang, mmp2-pcm-audio.0 resume

This is a hang with the last serial console message:

mmp2-pcm-audio mmp2-pcm-audio.0: resume

No FIQ. No SysRq. Always preceded by the message "PM: Some devices failed to suspend".

Jon Nettleton says this is a known problem. It is caused by an interrupted suspend; the hang occurs during the unwind of the resume.

Can be worked around by removing the audio driver from the kernel.

Disabling runin-sound has no effect.

Seen within half an hour if SUS_TIME is set to 3000. Rarely or never seen if SUS_TIME is set to 10000.

Theory: once a suspend takes too long, the RTC alarm goes off before suspend has finished, and the suspend is interrupted, leading to this issue.
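The theory above amounts to a race between two times. The sketch below only illustrates that comparison; it assumes SUS_TIME is the RTC alarm delay in milliseconds, which the notes do not state, and the 4000 ms suspend duration in the usage note is a hypothetical figure, not a measurement.

```python
def suspend_interrupted(suspend_duration_ms, alarm_delay_ms):
    """Return True if the RTC alarm fires before suspend has finished.

    Both arguments are illustrative: alarm_delay_ms stands in for SUS_TIME
    (assumed to be milliseconds), suspend_duration_ms for how long the
    kernel takes to suspend all devices.
    """
    return alarm_delay_ms < suspend_duration_ms
```

With a hypothetical 4000 ms suspend, `suspend_interrupted(4000, 3000)` is true and `suspend_interrupted(4000, 10000)` is false, which would match the observation that the hang is seen with 3000 and rarely or never with 10000.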

Impact: may also trigger with other wake sources, leading to a failed runin.

Instances:

Might be fixed by http://dev.laptop.org/git/olpc-kernel/commit/?h=arm-3.0-wip-wfi&id=92ef8264199818b91518f7fe7af365c5998381fa

hang, dpm_resume

This is a hang shortly after the serial console message:

mmp2_pm_finish: Enable audio island

The time period may vary.

Only seen to occur with olpc-runin-tests in aggressive mode: a ten-second awake time, followed by a ten-second rtcwake suspend. Easily reproduced.
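The aggressive cycle can be approximated with a small loop. This is a minimal sketch, not the olpc-runin-tests implementation: the function names and the "mem" suspend mode are assumptions, and only the two ten second intervals come from the text. rtcwake is the util-linux tool, where -m selects the suspend mode and -s the number of seconds until the RTC alarm.

```python
import subprocess
import time

AWAKE_SECONDS = 10    # awake time, from the text
SUSPEND_SECONDS = 10  # rtcwake suspend time, from the text


def rtcwake_command(seconds, mode="mem"):
    """Build the rtcwake command line; the "mem" mode is an assumption."""
    return ["rtcwake", "-m", mode, "-s", str(seconds)]


def run_cycles(count, runner=subprocess.call, sleeper=time.sleep):
    """Alternate an awake period with an rtcwake suspend, count times.

    runner and sleeper are injectable so the loop can be dry-run without
    suspending the machine. Returns the number of cycles in which rtcwake
    exited with status 0.
    """
    completed = 0
    for _ in range(count):
        sleeper(AWAKE_SECONDS)
        if runner(rtcwake_command(SUSPEND_SECONDS)) == 0:
            completed += 1
    return completed
```

On a real unit this needs root, and a hang means the call never returns, so the returned count is only meaningful for runs that survive.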

Instances from previous testing:

* http://dev.laptop.org/~quozl/z/1RiK9g.txt (proving it does not require the body of the function)

Analysis.

Tracing the point of hang by gradually adding printk has shown the problem occurs in dpm_resume(), before dpm_complete() is called by dpm_resume_end().

A patch adding a 60 ms mdelay() per device, inside the list processing of the dpm_resume() function, has shown results:

Note that the elapsed time shows no correlation with the previous instances of the issue, which suggests that the hang depends on the operations being performed rather than on when they are performed.

James is using the watchdog on six units to capture as many instances of the hang as possible, to help identify the next step in the analysis.
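When many serial logs are captured, they can be sorted by the two signatures described on this page. The following is a rough sketch, not a tool used in the testing: the function name, the return labels and the five-line window used for "shortly after" are assumptions; the three message strings are the ones quoted above.

```python
AUDIO_RESUME = "mmp2-pcm-audio mmp2-pcm-audio.0: resume"
SUSPEND_FAILED = "PM: Some devices failed to suspend"
AUDIO_ISLAND = "mmp2_pm_finish: Enable audio island"


def classify_hang(log_text):
    """Guess which hang a captured serial console log ends in.

    Returns "mmp2-pcm-audio.0 resume", "dpm_resume" or "unknown".
    """
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    # First signature: the audio resume message is the last line, and a
    # failed suspend was reported somewhere earlier in the log.
    if AUDIO_RESUME in lines[-1] and any(SUSPEND_FAILED in l for l in lines):
        return "mmp2-pcm-audio.0 resume"
    # Second signature: the hang comes shortly after the audio island
    # message; the five-line window is an arbitrary assumption.
    if any(AUDIO_ISLAND in l for l in lines[-5:]):
        return "dpm_resume"
    return "unknown"
```

Substring matching is used so that any timestamp prefix on the console lines does not defeat the comparison.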