Mesh Testing

From OLPC
Revision as of 11:06, 23 February 2008 by 209.237.225.236 (talk) (Try basic ping connectivity among the laptops, and with the school server.)

This page describes the network testing that will be performed on Monday, Feb 25, 2008 at 1CC.

Setup:

  • Start with ten machines, keep adding ten at a time while it is useful to do so.

Measurements to make during each test (in addition to workload-specific measurements):

  • Spectrum utilization -- as measured with a spectrum analyzer and/or from Wireshark
    • Wireshark may be able to break down bandwidth by packet type
  • Remaining bandwidth -- attempt to download a large file on one machine during test, record time taken or bandwidth achieved.
  • Total number of laptops seen in the mesh view, summed over all machines (should be n^2 for n machines).
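
The two numeric measurements above can be reduced to single figures per test run. The following is a minimal sketch, not part of the original test plan: the function names are hypothetical, and it assumes each laptop's mesh view count includes the laptop itself, which is what makes the expected total n^2.

```python
def throughput_kbps(bytes_transferred: int, seconds: float) -> float:
    """Achieved bandwidth in kilobits per second for one timed download."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_transferred * 8 / 1000 / seconds


def mesh_view_completeness(seen_counts: list[int]) -> float:
    """Fraction of the expected n^2 sightings that were actually observed.

    seen_counts[i] is the number of laptops shown in machine i's mesh view.
    Assumption: each machine counts itself; if it does not, the expected
    total would be n * (n - 1) instead of n * n.
    """
    n = len(seen_counts)
    if n == 0:
        raise ValueError("need at least one machine")
    return sum(seen_counts) / (n * n)
```

For example, a 1,000,000-byte download that takes 8 seconds gives 1000 kbit/s, and four machines reporting 4, 4, 2 and 2 laptops give a completeness of 0.75.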

Workloads -- tests to perform, along with their quantitative metrics:

  1. Idle load.
  2. Every machine coming out of suspend (or booting).
  3. Every machine trying to register with school server -- Number of machines that failed the first attempt, failed second attempt, etc.
  4. Try pinging each node from each other node to establish whether you have basic unicast connectivity. Record success/failure for each pair, or record the final packet's ping time (zero if no packets get through).
  5. Try pinging the "all-nodes" IPv6 multicast address from each node. Record which other nodes respond to each node's multicast ping. Establish whether you have basic multicast connectivity.
  6. Ricardo's web spider at various rates of download (download 1k page/second, etc)
  7. Read -- if one laptop shares a PDF, how many laptops fail to retrieve it?
  8. Distance -- binary success/fail. Are there other metrics?
    • From the GUI, the relevant failure mode is apparent because the initiator's One Big Button never becomes clickable. This indicates that the stream-tube server is not being notified of a new socket connection. It is not clear whether this is purely binary for any mesh, or whether it is statistical. In January, at 1CC, I had 100% failure rate. Ben 00:38, 22 February 2008 (EST)
  9. Write -- cjb has automated pressing a character every second; run this and look at received rate/update time, increase number of participants per document.
  10. olpc-update -- number of machines upgraded in 1 hour
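
The all-pairs unicast ping workload could be recorded with something like the sketch below. It is only an illustration under stated assumptions: the helper names are hypothetical, the script is meant to be run on each node against a list of the other nodes' addresses, and `run_ping` assumes the Linux iputils `ping` flags `-c` (count) and `-W` (timeout in seconds). It follows the convention above of recording zero when no packets get through.

```python
import re
import subprocess

# Matches the per-reply RTT, e.g. "time=1.25 ms"; the summary line
# ("... time 1001ms") has no "=" and is deliberately not matched.
TIME_RE = re.compile(r"time[=<]([\d.]+) ?ms")


def last_rtt_ms(ping_output: str) -> float:
    """Return the RTT of the final reply in ping's output, or 0.0 if none."""
    times = TIME_RE.findall(ping_output)
    return float(times[-1]) if times else 0.0


def run_ping(target: str, count: int = 3) -> str:
    """Run the system ping (Linux iputils flags assumed) and return stdout."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", "2", target],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def ping_row(targets, pinger=run_ping):
    """Ping every target from this node; map target -> last RTT (0.0 = none)."""
    return {t: last_rtt_ms(pinger(t)) for t in targets}


def failed_pairs(matrix):
    """List (source, target) pairs with no reply, given {source: {target: rtt}}."""
    return [(s, t) for s, row in matrix.items()
            for t, rtt in row.items() if rtt == 0.0]
```

Collecting one `ping_row` result per node into a dictionary keyed by source node gives the full matrix, and `failed_pairs` then lists the pairs that lack unicast connectivity. The `pinger` argument exists so the parsing can be checked against saved ping output without touching the network.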

Variables to investigate:

  • Set mesh ttl to 1 for every packet
  • Change bcast/mcast rate on every node
  • Jim's Avahi config 30% fixes?
  • Presence: Benchmark bandwidth use of Avahi vs. Cerebro vs. no presence?
  • Collaboration: Benchmark switching from multicast to unicast?
  • Suspend/resume: Off vs. on, wake-on-unicast vs. wake-on-multicast
  • Block multicast in route table (are there sources of multicast packets other than the above?)
  • Would turning down the tx power (globally, since we can't do it per-packet) help with the dense mesh problems?
    • "iwconfig eth0 txpower 3mw" appears to work until there is an active connection, at which point the firmware's automatic txpower adjustment appears to take over. If this analysis is correct, setting transmit power is not possible with the current firmware. Ben 00:38, 22 February 2008 (EST)
  • Are there other mesh parameters to tweak? Path request timeout, for example.
    • I suggest running the same test, having every machine associated to a single access point (rather than the mesh). This will give you a baseline value to which you can compare how the mesh behaves. I found so many failures in mesh multicast (e.g. #6527) that I think the results of mesh testing will be dubious without fixes. --gnu
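
If the single-access-point baseline suggested above is collected, each mesh result can be expressed relative to it. This is a rough sketch only: the function name and the metric names in the example are hypothetical, and it assumes both runs record the same metrics as plain numbers.

```python
def compare_to_baseline(mesh: dict[str, float],
                        baseline: dict[str, float]) -> dict[str, float]:
    """Return the mesh/baseline ratio for every metric present in both runs.

    Metrics missing from the mesh run, or with a zero baseline value,
    are skipped rather than guessed at.
    """
    ratios = {}
    for name, base_value in baseline.items():
        if name in mesh and base_value != 0:
            ratios[name] = mesh[name] / base_value
    return ratios
```

For example, a mesh throughput of 500 kbit/s against an access-point baseline of 1000 kbit/s yields a ratio of 0.5 for that metric.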