Path Discovery Mechanism/Sanity
Revision as of 16:36, 10 May 2008
This page is dedicated to the following investigation: does the Path Discovery Mechanism (PDM) behave as expected in terms of path length?
There have been observations that the implemented PDM may wrongly result in unnecessarily long paths. The canonical example is the ultra-dense mesh cloud of a classroom, where all the nodes are within transmission range of each other. In such a scenario, does the PDM form multihop paths too often? How often?
And if so, can we expect all the paths to be single hop in this scenario? Should we try to fix or at least reduce this, and how could we do it?
In trying to answer the questions above, others will come. For instance: how do data traffic and PDM traffic influence each other? Will the burstiness of PDM traffic significantly degrade data throughput? And will data traffic prevent path formation or induce unnecessarily long paths?
This page is dedicated to the formulation of these questions and, hopefully, some of the answers.
Question #1: Is it true that multihop paths form even if all the nodes are close to each other?
For this initial step, we'll use a Ten Kids Under the Tree testbed in which ten XOs are placed in a very dense configuration (less than 2 square metres) and every node is within transmission range of every other.
In such a situation one would expect the paths to be mostly, if not all, single hop, because a successful transmission at 54Mbps between any two nodes is theoretically always possible. A path with two or more hops would have to be explained and, the longer the path, the more suboptimal and suspicious the result.
The test
The tests were performed in a quiet environment (noise floor < -96 dBm), and the build used was 703 (Update.1-703) with Libertas firmware 22.p10. A special test kernel was used in order to run this firmware release. Also, the following configurations were applied:
- mesh ttl was set to 5
- multicast pings were enabled
The workload used for this particular test was multicast pings (addressed to 224.0.0.1, default size, 1 Hz). The test was performed three times (1 minute per run).
What are we measuring?
One possibility would be to simply log the forwarding table of each node during the test and then check the path lengths. That's a good test, but this method was designed to be non-intrusive and, because of that, applicable to any traffic captures available (for instance from the Collaboration testbed).
So, to have a taste (or measure, if you will) of how (in)sane the PDM is in terms of path length, we'll analyse traffic captures and calculate an index for every path: the Data Path Length Index (DPLI).
The DPLI is calculated over the average hop count of the data frames exchanged between two nodes (see example in Table 1). Of course this index is not strictly necessary (the table itself brings interesting information) and it has no meaning by itself: it is not an actual path length, but an indicator of the actual path lengths that formed during the test. Apart from confirming that multihop paths are being formed, it gives us baselines for future comparisons and makes it easier to analyse overall results.
Table 1 shows the DPLIs for a node (B) in a real test. We can see that the paths to C, J and H are pure single-hop paths, but for every other neighbour a longer path did exist at some point. One should keep in mind that paths are constantly rediscovered, because of (1) path expiration (which defaults to 10 seconds) or (2) a failure to forward traffic. We can also clearly see that the path from B to D was much longer than any other path in this test, and this may be worth investigating.
neigh | 1hop | 2hops | 3hops | 4hops | 5hops | DPLI |
---|---|---|---|---|---|---|
G | 90.0% | 10.0% | 0.0% | 0.0% | 0.0% | 1.05 |
A | 82.1% | 17.9% | 0.0% | 0.0% | 0.0% | 1.09 |
J | 100.0% | 0.0% | 0.0% | 0.0% | 0.0% | 1.00 |
D | 59.0% | 24.0% | 4.0% | 6.0% | 7.0% | 1.33 |
F | 87.1% | 8.6% | 4.3% | 0.0% | 0.0% | 1.08 |
H | 100.0% | 0.0% | 0.0% | 0.0% | 0.0% | 1.00 |
E | 89.6% | 10.4% | 0.0% | 0.0% | 0.0% | 1.05 |
I | 86.5% | 13.5% | 0.0% | 0.0% | 0.0% | 1.07 |
C | 100.0% | 0.0% | 0.0% | 0.0% | 0.0% | 1.00 |
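The page does not spell out the exact DPLI formula. As a sketch, the values in Table 1 are reproduced by taking the square root of the frame-weighted mean hop count; note that this square root is an assumption inferred from the table, not something stated in the text:

```python
from math import sqrt

def dpli(hop_pcts):
    """Sketch of a DPLI computation from a hop-count distribution.

    hop_pcts lists the percentage of frames carried over 1, 2, 3, ... hops.
    ASSUMPTION: the square root of the frame-weighted mean hop count
    reproduces the values in Table 1; the page does not give the formula.
    """
    mean_hops = sum(pct / 100.0 * hops
                    for hops, pct in enumerate(hop_pcts, start=1))
    return round(sqrt(mean_hops), 2)

# Rows for neighbours G and D from Table 1:
print(dpli([90.0, 10.0, 0.0, 0.0, 0.0]))  # 1.05
print(dpli([59.0, 24.0, 4.0, 6.0, 7.0]))  # 1.33
```

Under this reading, D's mean hop count of 1.78 maps to the 1.33 shown in the table, and a pure single-hop neighbour maps to exactly 1.00.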
Test results
Table 2 consolidates the results for the 3 runs of the multicast ping test described above. It clearly indicates that multihop paths do form in the ten kids under the tree scenario.
node | DPLI | stdev |
---|---|---|
A | 1.07 | 0.08 |
C | 1.05 | 0.07 |
B | 1.07 | 0.08 |
E | 1.04 | 0.06 |
D | 1.06 | 0.07 |
G | 1.05 | 0.07 |
F | 1.05 | 0.07 |
I | 1.04 | 0.05 |
H | 1.06 | 0.07 |
J | 1.06 | 0.08 |
Global result: 1.05 (stdev = 0.01)
The answer
Yes. It is true that multihop paths form even in a ten kids under the tree scenario. Exactly 9.99% of the data frames captured in this test were transmitted over a multihop path. Is this excessive? What are the root causes? Keep reading ;-)
Question #2 - Does PDM traffic self-interfere?
We have already confirmed that multihop paths do form in the ten kids under the tree scenario; now it is time to investigate more deeply.
In the ten kids under the tree scenario the measured DPLI was 1.05. Maybe not too high, but can it become intolerable as more nodes are added or more traffic is generated? Another suspicion worth investigating is that path discovery traffic can self-interfere. This makes a lot of sense, since PDM traffic is bursty.
The test
To investigate this, we varied the route expiration time in the ten kids under the tree scenario and repeated the same test as above (*** link). We calculated the DPLI and also the percentage of data frames that were transmitted over multihop paths.
Test results
Route exp time (s) | Mean DPLI | Stdev DPLI | Multihop frames % |
---|---|---|---|
5 | 1.09 | 0.02 | 16.58 |
10 | 1.05 | 0.01 | 9.99 |
15 | 1.05 | 0.01 | 8.72 |
20 | 1.05 | 0.01 | 9.65 |
25 | 1.04 | 0.01 | 7.44 |
30 | 1.03 | 0.01 | 6.24 |
The answer
Yes. The more path discovery traffic there is, the longer the paths get. But why? That's question #3.
Question #3 - Why does more traffic cause longer paths?
Since PDM traffic self-interferes it seems safe to assume that any traffic will interfere with the path discovery traffic and, as a result, increase the average path length. But why?
Cause #1: Not all nodes decode the same frames
A common misunderstanding about dense wireless environments is that every node will capture the same traffic simply because the nodes are very close and share a common transmission area. If this were the case, either a frame would be received by all the nodes or a collision would render the frame useless to all of them. But this is not true at all.
- Obstructions and reflections (the unpredictable multipath effect) change the probability that a frame is successfully decoded by a node. Even nodes some centimetres apart will experience different signal levels.
- The internal condition of the node (from the signal levels to the protocol stack) may cause a frame that was successfully decoded by a neighbour to be lost.
- Slight differences in antenna gain and radio sensitivity between two neighbours (to name only two items) are normal.
- And even if there were no obstructions in the path between any two nodes and no reflections were introduced by walls, for example; even if two nodes were perfectly identical and all their internal state were completely synchronized; even under these ideal circumstances, two nodes will not always successfully decode the same subset of the transmitted frames, because of the capture effect (not to be confused with the channel capture effect).
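A back-of-the-envelope calculation illustrates the last point. Assuming, purely for illustration, that each node independently decodes each broadcast frame with some probability p (the 98% figure below is made up, not measured on this page), even two perfectly identical receivers rarely end up with exactly the same decoded subset:

```python
def same_subset_prob(p: float, frames: int) -> float:
    """Probability that two identical receivers decode exactly the same
    subset of `frames` independent broadcast frames, each decoded with
    probability p. (Illustrative model, not data from the tests here.)"""
    # Per frame, the receivers agree if both decode it or both miss it.
    per_frame_agreement = p * p + (1 - p) * (1 - p)
    return per_frame_agreement ** frames

# With a hypothetical 98% decode rate over 100 frames, two perfect
# twins agree on the full subset less than 2% of the time.
print(same_subset_prob(0.98, 100))
```

So identical decoded subsets are the exception, not the rule, even before obstructions, hardware spread and the capture effect are taken into account.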
To prove that the assumption in cause 1 is true, we performed some simple tests.
Why does the fact that cause #1 is true lead to multihop paths?
The answer to question #3 might be related to two things:
- The way link costs are associated with given data rates, i.e. the metric calculations of the PDM
- The fact that not all nodes decode the same frames
Path costs and path length
Table 4 shows the first twelve combinations of link costs (the complete table is here).
total cost | datarates for each link |
---|---|
13 | 54 |
26 | 54,54 |
28 | 36 |
39 | 54,54,54 |
41 | 54,36 |
46 | 11 |
52 | 54,54,54,54 |
54 | 54,54,36 |
56 | 36,36 |
59 | 54,11 |
64 | 1 |
65 | 54,54,54,54,54 |
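Assuming that a path's total cost is simply the sum of its per-hop costs, with per-hop costs read off the single-link rows of Table 4 (an assumption consistent with the table, not a statement of the firmware's actual algorithm), the first twelve rows can be regenerated by brute force:

```python
from itertools import combinations_with_replacement

# Per-hop link cost by data rate, taken from the single-link rows of Table 4.
LINK_COST = {54: 13, 36: 28, 11: 46, 1: 64}

def cheapest_paths(max_hops=5, limit=12):
    """Enumerate rate combinations ordered by total path cost (a sketch:
    assumes total cost = sum of per-hop costs)."""
    paths = []
    for hops in range(1, max_hops + 1):
        for rates in combinations_with_replacement(
                sorted(LINK_COST, reverse=True), hops):
            paths.append((sum(LINK_COST[r] for r in rates), rates))
    paths.sort()
    return paths[:limit]

for cost, rates in cheapest_paths():
    print(cost, rates)
```

Note the corner case this exposes: two 54Mbps hops cost 26, which beats a single 36Mbps hop at 28. This is precisely the situation the hypothesis below relies on.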
Hypothesis Formulation
Although the receivers experience roughly the same radio environment, it is not true that they will successfully decode the same subset of the transmitted frames, and this makes the following scenario possible:
Node "A" broadcasts a path request (PREQ) at 54Mbps to find node "B". Node "B" does not decode this frame, but some other node around, say node "C", does. A few milliseconds later, node "C" rebroadcasts the PREQ from A; this time "B" decodes it and responds with a PREP, and you have a two-hop path where a single hop would (in principle) be preferred. Note that it makes no difference whether B receives any of the other PREQs sent from A (at the data rates of 36, 11 or 1Mbps), because two 54Mbps links (second line of Table 4) will beat one 36Mbps link (third line of Table 4).
But how can we design a test that proves (or disproves) this hypothesis?
Testing the hypothesis
[work in progress]
If we can prove that a PREQ would be successfully decoded by a group of nodes, but not by the intended destination of the PREQ, that would at least indicate that our hypothesis is correct. But would this be the only possible answer to question #3? If Hypothesis #1 is correct, how much would it account for the observed phenomenon that traffic causes longer paths in dense mesh clouds?
This test was designed to compare what two nodes can decode...
Node | total captured frames | from A | to A | PREQs from A | PREQs from A 54 | from J | to J | PREQs from J | PREQs from J 54 |
---|---|---|---|---|---|---|---|---|---|
E | 13,618 | 1678 | 469 | 145 | 29 | 1790 | 430 | 145 | 36 |
F | 13,496 | 1687 | 466 | 150 | 32 | 1681 | 428 | 140 | 36 |
S | 15,140 | 1637 | 340 | 142 | 27 | 1341 | 310 | 107 | 16 |
Node | total PDM frames | PREQs | PREPs | 54 | 36 | 11 | 1 |
---|---|---|---|---|---|---|---|
E | 7334 | 326 | 4525 | 1741 | 1947 | 2992 | |
F | 7298 | 328 | 4484 | 1735 | 1923 | 2978 | |
S | 6389 | 330 | 3565 | 3813 | 1976 | 3373 |
wlan_mgt.fixed.mesh_action == 0x0000
(wlan_mgt.fixed.mesh_action == 0x0000 ) && (wlan_mgt.fixed.sa == 00:17:c4:02:2f:59) && (wlan.sa == 00:17:c4:02:2f:59)
(wlan_mgt.fixed.mesh_action == 0x0000 ) && (wlan_mgt.fixed.sa == 00:17:c4:0c:e8:eb) && (wlan.sa == 00:17:c4:0c:e8:eb)
In another experiment, 4 XOs (B, D, G, I) captured traffic while 2 other XOs (A, J) exchanged UDP datagrams.
node | fromA | from J | from A via J | from J via A |
---|---|---|---|---|
A | 100 | 97 | - | - |
B | 98 | 100 | 98 | 97 |
D | 98 | 100 | 98 | 97 |
G | 98 | 98 | 97 | 97 |
I | 98 | 100 | 97 | 97 |
J | 98 | 100 | - | - |
S | 98 | 100 | 98 | 97 |
Not only do the captured quantities differ; in some cases the number is the same but the subset of frames captured is different.
Although G and I both captured 97 frames retransmitted by J, the subsets are slightly different (96 frames are common to both).
Not only are some frames lost; different nodes lose different frames.
wlan.bssid == 00:17:c4:0c:e8:eb and wlan.sa == 00:17:c4:0c:e8:eb and ip.dst == 224.2.3.72
wlan.bssid == 00:17:c4:02:2f:59 and wlan.sa == 00:17:c4:02:2f:59 and ip.dst == 224.2.3.72
wlan.bssid == 00:17:c4:0c:e8:eb and wlan.sa != 00:17:c4:0c:e8:eb and ip.dst == 224.2.3.72
wlan.bssid == 00:17:c4:02:2f:59 and wlan.sa != 00:17:c4:02:2f:59 and ip.dst == 224.2.3.72