XO Troubleshooting Guide

From OLPC
Revision as of 19:38, 28 May 2008 by 64.203.28.150 (talk) (Added repair procedures for "bad RTC" systems.)

This is a troubleshooting guide for the XO laptop. It is geared toward troubleshooting production units running firmware version Q2D03 or greater.

Still very much a work in progress! Feel free to add to it.

Problems powering on

For an introduction to the different boot options selectable while powering on, see Cheat codes.

All of these tests assume that a known good source of power is available, either a charged battery or power adapter. To debug problems with the power source, and battery charging issues, see Power and Battery problems.

Is the Power LED On ?

When the power button is pressed once, the power LED doesn't turn on.

Please try Resetting the Embedded Controller.

If that doesn't work, confirm on another laptop that the power adapter and battery used in the test are working. A laptop may be unable to power itself from its battery or power adapter for a number of reasons.

If both power sources are working, then the motherboard is broken. The possible failure modes are numerous, and deserve a second troubleshooting guide.

The display doesn't light up

When the power button is pressed once, the power LED turns on, but the screen doesn't light up.

Display dead and no boot sound plays

This usually indicates a broken motherboard.

Early production machines shipped with firmware earlier than Q2D07, which had a bug that caused this symptom if the RTC (real-time clock) lost all power, including the power from the internal "coin cell" battery. To make matters worse, some early units were manufactured with defective holders for that coin cell battery, so the batteries became loose or even popped out entirely. To determine conclusively that this is the problem, connect a serial terminal via a low-voltage interface adapter to J1, the processor serial port, which is inside the enclosure. If the system has this problem, Open Firmware will halt the boot process and display "Page Fault" on the serial console. This problem cannot be repaired without opening the enclosure.

Repairing a "bad RTC setting" system

Repairing a system with the "early firmware, bad RTC setting" problem requires several steps. If you have lots of XO machines, you might consider using the bad system for spare parts. Otherwise, you can repair it as follows:

  • First, you must re-seat the coin cell battery in its holder and secure it so it doesn't come out again. One way to secure it is to put a drop of glue where the battery contacts the holder, away from the metallic contact. The best glue that I have found for this purpose is clear solvent-based household cement. Technically, it is nitrocellulose cement. It is also known as "model airplane glue" and is marketed under various trade names such as "Duco Cement" and "Tarzan's Grip". Stronger adhesives like epoxy or cyanoacrylate (super glue) would probably work too, but they might make it difficult to remove the battery later without damaging the holder.
  • After securing the battery, you must replace the boot firmware with a version that is newer than Q2D06 (use the latest official release). There are two ways:
  1. Option 1 requires electronic rework (soldering and desoldering) skills and tools, and access to a standalone SPI FLASH programmer device. You unsolder the SPI FLASH chip, reprogram it (or a new one) with a recent firmware version, then resolder the fresh chip.
  2. Option 2 requires a serial terminal with a low-voltage interface adapter that will connect to J1 inside the XO. The steps are:
    1. Insert a USB key containing the new firmware version.
    2. Connect the serial terminal (115200,8,n,1) and power on the machine. You should see "Page Fault" on the serial terminal, followed by an "ok" prompt.
    3. Type the following command line at the ok prompt, substituting the correct .rom filename:
 ok probe-pci probe-usb  flash u:\q2d14.rom

After you have updated the firmware, you must set the RTC clock. There are several ways:

  • If you can boot to Linux (because the machine is permanently activated or you have a developer key), you can use the Linux "hwclock" command.
  • If you have an open wireless network that is connected to the internet, you can type this at the ok prompt (replacing "MY SSID" as appropriate):
ok ssid MY SSID
ok ntp-set-clock
  • Otherwise, you can use OFW to set the date as follows. First determine the date and time in UTC, then:
ok select /rtc
ok decimal  0 10 22  27 5 2008  set-time
ok unselect

The above example sets 2008-05-27 22:10:00 UTC. The numbers are entered least significant first: second, minute, hour, day, month, year.

  • If you can't get to the ok prompt in the normal way because the machine is secure and you don't have a developer key, you can still set the date via a serial terminal. Power up the system and immediately start typing "iiiiii..." on the serial terminal. You should get an ok prompt within about 2 seconds. (You can't reflash the firmware from this ok prompt, but you can set the clock).

Erase any extra 'i' characters with Backspace, then type the "select /rtc ..." command sequence as shown above.
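Because the argument order of set-time is easy to get backwards, the command sequence above can be generated from a UTC timestamp. The following is a minimal Python sketch, not part of Open Firmware; the function name ofw_set_time_lines is hypothetical, and it only formats the text to type at the ok prompt.

```python
from datetime import datetime, timezone


def ofw_set_time_lines(moment):
    """Return the three ok-prompt lines that set the RTC to `moment`.

    `moment` should be a timezone-aware datetime; it is converted to UTC.
    """
    utc = moment.astimezone(timezone.utc)
    # set-time takes its arguments least significant first:
    # second minute hour, then day month year.
    return [
        "select /rtc",
        f"decimal  {utc.second} {utc.minute} {utc.hour}  "
        f"{utc.day} {utc.month} {utc.year}  set-time",
        "unselect",
    ]


if __name__ == "__main__":
    example = datetime(2008, 5, 27, 22, 10, 0, tzinfo=timezone.utc)
    for line in ofw_set_time_lines(example):
        print("ok " + line)
```

Run with the timestamp from the example above, the middle line reproduces the set-time command shown there.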

Machine boots normally, but no boot sound plays

If no boot sound is played, but the machine boots normally and has audio, it is likely that the user has changed the default boot volume to 0. While the boot sound is playing, a user can adjust the volume using the volume adjust keys. This modified volume setting is saved and used for future boots. Try increasing the volume right after starting the laptop a few times, and see if the boot sound returns.

If no boot sound is played, and the machine boots normally but has no audio, see Audio Problems.

and the boot sound plays

If the display doesn't light up, but the boot sound plays, see Display Problems.

The display says "Connect to power to proceed"

Not quite the correct wording. Anyone remember the exact words ?

Early versions of the firmware (before Q2D14) would stop execution if a firmware update was scheduled but two sources of power (a battery and a power adapter) weren't both present ([#5422]). If this is the problem, provide both sources of power and reboot. The laptop should proceed with the firmware update and boot normally.
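Several symptoms so far depend on the firmware version, so a small comparison helper can make the thresholds explicit. This is a rough Python sketch under one loudly stated assumption: that versions such as Q2D07 sort by series ("Q2"), then release letter, then number. The function names are hypothetical; only the two thresholds come from this guide.

```python
import re

# Thresholds taken from this guide.
RTC_FIX = "Q2D07"          # earlier firmware halts if the RTC lost all power
POWER_CHECK_FIX = "Q2D14"  # earlier firmware stops a scheduled update
                           # unless battery and adapter are both present


def parse_version(version):
    """Split e.g. 'Q2D14' into a sortable key ('Q2', 'D', 14).

    ASSUMPTION: versions order by series, then letter, then number.
    """
    match = re.fullmatch(r"(Q\d)([A-Z])(\d+)", version.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised firmware version: {version!r}")
    series, letter, number = match.groups()
    return (series, letter, int(number))


def known_issues(version):
    """List the firmware issues described in this guide that apply."""
    key = parse_version(version)
    issues = []
    if key < parse_version(RTC_FIX):
        issues.append("dead display and no boot sound if the RTC lost all power")
    if key < parse_version(POWER_CHECK_FIX):
        issues.append("scheduled firmware update stops without both power sources")
    return issues
```

For example, known_issues("Q2D06") reports both issues, while known_issues("Q2D14") reports none.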

The display is showing an XO icon

This means that Open Firmware has started the boot process.

with a single dot below it

If the laptop powers up but stops while displaying only the XO icon in the middle of the screen, with a single dot below it, something went wrong as the Linux OS started. Try booting with the

One solution to try is upgrading or reinstalling the software.

with a "sad face"

This means that Open Firmware couldn't find a signed operating system on the internal flash memory. (It will also look on USB memory sticks and SD cards.)

Try upgrading or reinstalling the software.

with a serial number and three icons below it

If the laptop powers up but stops while displaying the XO icon in the middle of the screen, followed by a serial number (e.g. CSN74902B22) and three icons (SD disk, USB disk, network signal strength), it is looking for its activation lease. It should eventually print "Activation lease not found" at the top of the screen and power off soon thereafter.

The solution is to re-activate the laptop. Obtain a copy of the lease (or a new lease) from your country's activation manager, place it (named "lease.sig") in the root directory of a USB key, and boot the laptop. See What to do with your activation keys.
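The three XO icon states described above can be summarized as a lookup table. This is only an illustrative Python sketch: the state labels and the function name are hypothetical, and the advice strings paraphrase the sections above.

```python
# Keys are hypothetical labels chosen for this sketch; the advice
# paraphrases the "XO icon" sections of this guide.
XO_ICON_STATES = {
    "single dot": "Something went wrong as Linux started; "
                  "try upgrading or reinstalling the software.",
    "sad face": "Open Firmware found no signed operating system; "
                "try upgrading or reinstalling the software.",
    "serial number and three icons": "The laptop is looking for its activation "
                                     "lease; re-activate it with lease.sig on a USB key.",
}


def advice_for(state):
    """Return the guide's advice for what is shown below the XO icon."""
    return XO_ICON_STATES.get(state.strip().lower(), "Not covered by this guide.")


if __name__ == "__main__":
    print(advice_for("Sad face"))
```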

Display problems

The display doesn't turn on

Shine a strong light on the display to confirm that the display itself is off, and not just the backlight.

Use a known good display to check whether the display or the motherboard is broken.

If the known good display also does not turn on, the motherboard is broken and should be replaced.

One half of the display looks bad

Is the display cable properly connected ?

Disassemble the laptop to access the larger flex cable from the display to the motherboard. Make sure that it is properly seated in its connector and properly clamped down. The white stripe on the cable should be close to and parallel to the black tab clamping the cable down.

Photo needed here, a closeup of the display connector

If the connector is broken (the small black tab fails to stay in place), the motherboard will need replacement.

Is the display broken ?

Use a known good display to check whether the display or the motherboard is broken.

If the known good display shows the same problem, the motherboard is broken and should be replaced.

The display fades to white

The backlight won't turn on

The backlight isn't even

In this case, a vertical pattern of light and dark is seen on the display, with the difference between light and dark regions strongest at the bottom of the screen.

Is the backlight cable properly connected ?

Disassemble the laptop to access the small flex cable from the display to the motherboard. Make sure that it is properly seated in its connector, and properly clamped down.

Photo needed here, a closeup of the backlight connector

If the connector is broken (the small black tab fails to stay in place), the motherboard will need replacement.

A temporary solution is to wind a small strip of paper several times, jamming a small piece of it into the connector to hold the flex cable in place. A piece of tape covering the connector provides additional stability.

Is the lightbar broken ?

If the cable is well seated, then it is likely that the actual lightbar in the display has failed. This lightbar can be replaced (the display itself is probably fine).

Is the motherboard backlight driver broken ?

If replacing the lightbar fails to correct the problem, it is a motherboard problem. Check the voltages across R147, R148, and R149 when the backlight is operating. They should be equal. If they are not, replace the corresponding switch transistor (Q13, Q14, or Q15, respectively) with a generic NPN low power switching transistor (2N3904 equivalent).
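The voltage comparison above can be sketched in Python as follows. The resistor-to-transistor pairing comes from this guide; the function name, the 0.05 V tolerance, and the sample readings are hypothetical, since the guide only says the three voltages should be equal. Comparing against the median assumes that at most one of the three channels has failed.

```python
from statistics import median

# Resistor -> switch transistor pairs, as given in this guide.
DRIVER_PAIRS = {"R147": "Q13", "R148": "Q14", "R149": "Q15"}


def suspect_transistors(voltages, tolerance=0.05):
    """Return the transistors whose resistor voltage differs from the median.

    `voltages` maps "R147", "R148", "R149" to measured volts.
    `tolerance` (volts) is a HYPOTHETICAL threshold, not from the guide.
    """
    missing = set(DRIVER_PAIRS) - set(voltages)
    if missing:
        raise ValueError(f"missing readings for: {sorted(missing)}")
    reference = median(voltages[name] for name in DRIVER_PAIRS)
    return [
        DRIVER_PAIRS[name]
        for name in DRIVER_PAIRS
        if abs(voltages[name] - reference) > tolerance
    ]


if __name__ == "__main__":
    # Made-up readings for illustration only.
    print(suspect_transistors({"R147": 0.50, "R148": 0.50, "R149": 0.10}))
```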

Audio and Video problems

A stereo sweep pattern is output for testing the speakers/headset jack when running the Open Firmware hardware tests. A microphone test, in which audio is recorded and then played back at low volume, is also included. See Hardware Self-Test.

No sound through the speakers

Garbled sound through the speakers

No sound through the headphones

If the speakers function correctly, this is caused by the headphone connector.

The microphone doesn't work

The microphone jack doesn't work

Images from the camera are black

Keyboard and Touchpad problems

Does the keyboard generate unexpected responses ?

Stuck keys

Does the touchpad only allow vertical (horizontal) movement ?

USB problems

A full set of eye patterns is generated for testing the wiring of each USB port when running the Open Firmware hardware tests. See Hardware Self-Test.

One of the USB ports doesn't work

This is typically due to a broken connector or solder joint at the affected connector.

All of the USB ports receive power from a common switch, so a power problem confined to one port is most likely caused by that port's connector or solder joint.

None of the USB ports work

Measure the +5V supply at the USB connector (or across C509 or C551). If it is not present, the USB power switch and current limiter, U56, may be broken. It may be bypassed by a 0 ohm resistor at R466 for testing purposes.

If the audio (speakers) is also non-functional, then the problem is the +5V supply on the motherboard.
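The two checks above amount to a small decision procedure, sketched here in Python. The function name and the wording of the results are hypothetical; the case where +5V is present at the connector is not covered by this guide and is reported as such.

```python
def diagnose_dead_usb(five_volts_at_connector, speakers_work):
    """Suggest a suspect when none of the USB ports work.

    five_volts_at_connector: True if +5V is measured at the USB connector
        (or across C509 or C551).
    speakers_work: True if audio through the speakers is functional.
    """
    if five_volts_at_connector:
        return "+5V is present at the connector; not covered by this guide"
    if not speakers_work:
        # Audio is also dead, so suspect the shared supply.
        return "motherboard +5V supply"
    return "U56 (USB power switch and current limiter); bypass with 0 ohm at R466 to test"


if __name__ == "__main__":
    print(diagnose_dead_usb(five_volts_at_connector=False, speakers_work=True))
```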

Power and Battery problems

The laptop won't run using the battery

The laptop won't run using the power adapter

The laptop emits a high pitched whine when using the power adapter

The laptop can't charge the battery

Common Procedures

Resetting the Embedded Controller

The XO embedded controller (EC) occasionally becomes confused. To reset it, remove all power sources from the laptop:

  1. Take the battery out and remove the power adapter
  2. Wait 10 seconds to allow the embedded controller to lose power and reset
  3. Replace at least one source of power (battery or power adapter)

The battery LED should flash orange momentarily (about a quarter of a second) when power is first reapplied. If you do not see this flash, you either have a motherboard hardware problem or faulty EC firmware installed. The general solution to both of these problems is to replace the motherboard.

Boot options

Different boot options are available by pressing buttons around the screen during the initial boot process (immediately after pressing the power button). These are included here for completeness. More information is available at Cheat codes.

  • '✓' (check) game pad key: forces a more detailed display while booting. See Startup Diagnosis for more details. This is useful for debugging activation problems.
  • 'O' game pad key: alternates between the current boot image and a previous one. In laptops which have never been upgraded, there is no previous boot image to use.
  • Rocker left: invokes the hardware self-test.

Hardware Self-Test

Open Firmware includes hardware diagnostic routines for most major components of the laptop. They are triggered by pressing down the left-hand side of the "Rocker switch" to the left of the screen while booting a laptop.

The following components of the laptop are tested sequentially by the hardware diagnostics:

  • Battery - The current status of the battery is read from it and printed out
  • SPI flash - The manufacturing data of the laptop is printed out
  • Memory - The SDRAM on the motherboard is quickly tested.
  • Processor - The Processor is exercised. Press any key to skip to the next test.
  • USB - The USB ports are exercised (for use with an oscilloscope).
  • Audio - A stereo sweep is output over the speakers (headphones, if plugged in), and then audio is recorded using the microphone and played back (at low volume) over the speakers.
  • Camera - Video is displayed on the screen from the camera for twenty seconds
  • SD Storage - Any SD storage is quickly (and non-destructively) tested
  • NAND Flash - The motherboard's internal NAND Flash storage is quickly (and non-destructively) tested.
  • Display - The display is only marginally tested with color bars, then the drawing capabilities of the CPU are displayed for a while. Press any key to skip to the next test.
  • WLAN - The firmware is loaded, and the network co-processor is booted and communicated with.
  • RTC
  • Timer
  • Touchpad - Press any key to exit.
  • Keyboard - Press ESC to exit.

If using firmware later than Q2D08, you can pause between individual tests by holding down the "rotate" button on the left hand side of the screen (below the "Rocker switch").

Reinstalling Software