Mesh Testing

From OLPC

Latest revision as of 20:19, 25 February 2008

This page describes the network testing that will be performed on Monday, Feb 25, 2008 at 1CC.

Goals:

It would be good to state the goal(s) of the testing. What are you proving or disproving? What are you measuring? Why?

Setup:

  • Start with ten machines, keep adding ten at a time while it is useful to do so.

Measurements to make during each test (in addition to workload-specific measurements):

  • Spectrum utilization -- as measured with a spectrum analyzer and/or from Wireshark captures
    • Wireshark may be able to break down bandwidth by packet type
  • Remaining bandwidth -- attempt to download a large file on one machine during test, record time taken or bandwidth achieved.
  • Total number of laptops seen in the mesh view, summed over all machines (should be n^2 for n machines).
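
The n^2 check in the last measurement can be tallied with a few lines of Python. This is only a sketch: the function name and the input list are hypothetical, and it assumes each machine counts itself in its own mesh view (if it does not, the expected total would be n*(n-1) instead).

```python
def mesh_view_coverage(seen_counts):
    """Summarize mesh-view completeness.

    seen_counts is a hypothetical list with one entry per machine: the
    number of laptops that machine shows in its mesh view (assumed to
    include the machine itself).
    Returns (total_seen, expected_total, fraction_of_expected).
    """
    n = len(seen_counts)
    expected = n * n  # every one of n machines sees all n laptops
    total = sum(seen_counts)
    fraction = float(total) / expected if expected else 0.0
    return total, expected, fraction
```

For example, ten machines that each see all ten laptops give a total of 100 out of an expected 100; any shortfall shows how incomplete the mesh view is at that population size.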

Workloads -- tests to perform, along with their quantitative metrics:

  1. Idle load.
  2. Every machine coming out of suspend (or booting).
  3. Every machine trying to register with school server -- Number of machines that failed the first attempt, failed second attempt, etc.
  4. Ping each node from every other node to establish whether you have basic unicast connectivity. Record success/failure for each pair, or record the final packet's ping time (zero if no packets get through).
  5. Ping the "all-nodes" IPv6 multicast address from each node to establish whether you have basic multicast connectivity. Record which other nodes respond to each node's multicast ping. [ping6 -I msh0 ff02::1]
  6. Ricardo's web spider at various rates of download (download 1k page/second, etc)
  7. Read -- if one laptop shares a PDF, how many laptops fail to retrieve it?
    • We found a problem with salut stream tubes for Read, resulting in failure to send all of the document (#6483). The fix is in joyride-1721 so please use that build to test PDF sharing. --morgs 06:11, 23 February 2008 (EST)
  8. Distance -- binary success/fail. Are there other metrics?
    • From the GUI, the relevant failure mode is apparent because the initiator's One Big Button never becomes clickable. This indicates that the stream-tube server is not being notified of a new socket connection. It is not clear whether this is purely binary for any mesh, or whether it is statistical. In January, at 1CC, I had 100% failure rate. Ben 00:38, 22 February 2008 (EST)
    • We should also try invitations. In the past, at 1CC, I have seen Activity invitations (for any Activity) fail to arrive. Ben 12:12, 23 February 2008 (EST)
  9. Write -- cjb has automated pressing a character every second; run this and look at received rate/update time, increase number of participants per document.
  10. olpc-update -- number of machines upgraded in 1 hour [from what prior release to what new release?]
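
The pairwise unicast ping test (workload 4) could be scripted roughly as follows. This is a sketch under assumptions, not an agreed test harness: the helper names are made up for illustration, it assumes the Linux iputils `ping` with its `-c` (count) and `-W` (timeout in seconds) options, and it assumes each laptop runs `probe_peers` itself and the per-node results are collected afterwards.

```python
import subprocess


def ping_once(host, timeout_s=2):
    """Return True if a single echo request to host gets a reply.

    Assumes the Linux iputils ping, which exits with status 0 on success.
    """
    cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    status = subprocess.call(cmd, stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
    return status == 0


def probe_peers(peers, ping=ping_once):
    """Ping every peer from this node; return {peer: reachable}."""
    return {peer: ping(peer) for peer in peers}


def failed_pairs(results_by_node):
    """List the (source, destination) pairs that had no connectivity.

    results_by_node is {source: {destination: reachable}}, i.e. the
    probe_peers output gathered from every node.
    """
    return sorted(
        (src, dst)
        for src, results in results_by_node.items()
        for dst, ok in results.items()
        if not ok
    )
```

The `ping` argument is injectable so the bookkeeping can be checked without a network. The list returned by `failed_pairs` is the success/failure record per pair that the workload asks for.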

Variables to investigate:

  • Set mesh ttl to 1 for every packet
  • Change bcast/mcast rate on every node
  • Jim's Avahi config 30% fixes?
  • Presence: Benchmark bandwidth use of Avahi vs. Cerebro vs. no presence?
  • Collaboration: Benchmark switching from multicast to unicast?
  • Suspend/resume: Off vs. on, wake-on-unicast vs. wake-on-multicast
  • Block multicast in route table (are there other sources of multicast packets other than the above?)
  • Would turning down the tx power (globally, since we can't do it per-packet) help with the dense mesh problems?
    • "iwconfig eth0 txpower 3mw" appears to work until there is an active connection, at which point the firmware's automatic txpower adjustment appears to take over. If this analysis is correct, setting transmit power is not possible with the current firmware. Ben 00:38, 22 February 2008 (EST)
  • Are there other mesh parameters to tweak? Path request timeout, for example.
    • I suggest running the same test, having every machine associated to a single access point (rather than the mesh). This will give you a baseline value to which you can compare how the mesh behaves. I found so many failures in mesh multicast (e.g. #6527) that I think the results of mesh testing will be dubious without fixes. --gnu
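
If the access-point baseline suggested above is run, the comparison could be as simple as the difference in failure rates between the two runs. The sketch below assumes each run is recorded as a list of per-machine True/False outcomes; the function names and that data layout are hypothetical.

```python
def failure_rate(outcomes):
    """Fraction of failed attempts in a list of True/False outcomes.

    Returns None for an empty list, since no rate can be computed.
    """
    outcomes = list(outcomes)
    if not outcomes:
        return None
    failures = sum(1 for ok in outcomes if not ok)
    return float(failures) / len(outcomes)


def compare_to_baseline(mesh_outcomes, ap_outcomes):
    """Mesh failure rate minus access-point (baseline) failure rate.

    A positive result means the mesh run failed more often than the
    baseline. Returns None if either run has no data.
    """
    mesh = failure_rate(mesh_outcomes)
    baseline = failure_rate(ap_outcomes)
    if mesh is None or baseline is None:
        return None
    return mesh - baseline
```

A difference near zero would suggest the failures are not specific to the mesh, while a large positive value points at mesh behavior.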


Opportunistic reception: pcapy Python module: http://hany.sk/mirror/fedora/updates/8/i386/pcapy-0.10.5-1.fc8.i386.rpm --ypod