Path Discovery Mechanism/Details

From OLPC
Jump to navigation Jump to search

This is a (more) detailed version of this brief introduction to the Path Discovery Mechanism.

The Mechanism in detail

PREQ clusters

After the initial cluster of PREQs were broadcasted, intermediary nodes will begin to rebroadcast them and flood the entire mesh cloud – a process called Network Wide Broadcast, or NWB – in an attempt to reach the destination. In the current implementation, the first PREQ received by an intermediary node will be immediately rebroadcast, but a delay time will be respected before additional PREQs are forwarded. During this delay time, the node may receive multiple PREQs, not only with different metrics, but also from different transmitters, but only the PREQ with the best metric will be rebroadcast after this time expires. This mechanism will avoid unnecessary consumption of airtime. After all, reactive protocols can be extremely bursty and rebroadcasting every single PREQ a node receives would make things even worse.

After the PREQ delay period is finished (by default this time is 10ms) the intermediary node may still receive additional PREQs. In this case, a new timer will be fired and again, the node will retransmit only the best PREQ received, when the timer expires.

But an intermediary node will not simply rebroadcast a single PREQ, it must actually make a new PREQ cluster based on the first PREQ received and also for every best PREQ received over the delay time period. This means that an intermediary node will rebroadcast at least a complete cluster (4 frames) and maybe more, depending on the configured delay time and on the mesh density. It is an inherent treat of a dense cloud that an intermediary node will receive many PREQs coming from different paths, with different metrics and hop counts. It is not impossible that PREQs with lower costs (better metrics) are received after PREQs with higher metrics. Thus, in a dense mesh, a short delay time will pose a higher control overhead on the cloud.

PREPs

When D finally receives the PREQ, it will answer with a PREP frame destined to A. Contrary to the PREQs, which are broadcast frames, a Path Pesponse (PREP) is a unicast frame that will follow the reverse path from D to A (see next section).

It is possible, even probable, that D gets more than one PREQ, coming from different paths. That's when the metrics associated to each of the PREQs come into scene - D will answer to the PREQ with the lowest cost.

Being unicast frames, PREPs are inherently more reliable than PREQs (which on the other hand are more numerous) since unicast frames are expected to be acknowledged by the receiver (or the transmitter will send them again)

Forward and reverse paths

But how can a PREP travel back from D to S in a series of unicast transmissions? The answer is the forming of a reverse path. Every time an intermediary node forwards (rebroadcasts) a PREQ it keeps the mac address of the node from which the PREQ was received (its predecessor). This will eventually form a reverse path from D to S, that will be traversed by the PREP if the path happens to be selected by D

Likewise, as the PREP is forwarded from D to S every intermediary node that participates in the newly formed path will know that it is part of the selected path from S to D - the forward path.

PERRs and Path maintenance

Every time a node fails to transmit a frame to the next node in a forward path, it should notify the source of such transmission (S to keep with the example) that this path is not working anymore. In order to do this the former will unicast a PERR (Path ERRor) frame all the way back to S (another use of the reverse path), which should start a new path discovery cycle.

Even if functional, paths will expire after some time (current default is 10 seconds). This will prevent the mesh to be frozen into a sub-optimal topology, i.e. will keep the paths fresh.


Additional remarks

Management and Maintenance

Once discovered, paths are kept in the forwarding table and will be periodically refreshed (current default is 10 seconds). More information on this can be found here.

IEEE802.11s flags

For those familiar with the IEEE802.11s draft it is worth noting that there is no DO or RF flags present in the XO path discovery frames. In fact, there is an implicit DO flag, since an intermediary node will not respond to PREQs.

Coverage and reliability

Since broadcast frames are not acknowledged they lack any retransmission mechanism, this fact alone advocates for some level of redundancy as necessary or many path discovery cycles would not succeed. On the other hand, another means for improving broadcast/multicast reliability is reducing the transmission data for multicast/broadcast frames. But lower transmission rates mean longer transmission times. There is a clear tension between coverage and reliability on one side and spectrum efficiency on the other.


The Mechanism in action (some examples)

A simple step-by-step

In short, a description of the path discovery mechanism implemented in the XO follows:

(0) S wants to discover a path to D.

(1) S will check its forwarding table for a valid path to D. A valid path will be one not expired. Route expiration time defaults to 10 seconds, currently.

(2) If (1) fails, S will broadcast a cluster of PREQs, consisting of four frames sent back to back at the data rates of 54, 36, 11 and 1Mbps. Each of these frames will have an associated cost.

(3) All intermediary nodes will rebroadcast the first PREQ received in a new cluster. If, for example, the intermediary node successfully decodes the 54Mbps PREQ from S, it will immediately broadcast a new cluster with values 13+13, 13+28, 13+42 and 13+64 for PREQs at 54, 36, 11 and 1Mbps, respectively.

(4) An intermediary node will wait a configurable time – the rreq_delay – before relaying another PREQ cluster if a PREQ with a better associated cost happens to be received later. Note that it is perfectly possible for a node to listen to the 11Mbps PREQ from S (with cost 42) and only then receive another PREQ, relayed by another intermediary node, in a two hops path, with a lower cost of 26. All but the first PREQ received (3) will be stored for rreq_delay before being retransmitted. And the intermediary node will only retransmitted the best PREQ received during this period. The rreq_delay currently defaults to 10ms.

(5) The PREQ will eventually hit the destination node D which will respond with a PREP. At this point the procedure is the same of what is described in the 802.11s proposal, besides the fact that if D needs to send traffic back to A, a new path discovery will start. Note that this is only necessary if the application that triggered the path discovery mechanism at A is bidirectional, which is the case for all TCP applications, but not necessarily the case for UDP-based applications.

An ARP resolution

The following example can give an idea of how basic network operations can become complex when performed over an ad-hoc multihop network. In this case, we suppose that a node A wants to ping a node D that is not within the limits of A's transmission range.

(1) A broadcasts an ARP request for finding the MAC address of node D.

(2) The ARP packet must reach all the corners of the mesh cloud, so every node will rebroadcast it (Network-Wide Broadcast).

(3) Eventually the ARP request reaches D which will start a path discovery in order to send an ARP reply. So, Dwill broadcast a PREQ cluster to discover a path to A.

(4) There is a new flood. Every intermediary node will rebroadcast the PREQ.

(5) Eventually D's PREQ will reach A which in turn will respond with an RREP.

(6) Nodes in the return path (A -> D) will retransmit the PREP.

(7) The PREP reaches D which is now able to send the ARP reply to A.

(8) The ARP reply will be send through the forward path (D -> A)

(9) Node A receives the ARP reply and must now start a path discovery to D. So it will broadcast a PREQ cluster to discover a path to D.

(10) The PREQs will be rebroadcast as in step 4.

(11) D will respond A's PREQs with a PREP

(12) Nodes in the return path (D -> A) will retransmit the PREP

(13) The PREP reaches A which is now able to send the first icmp echo request (ping) to D

(14) The echo request will follow the path from A to D

(15) D will receive the echo request and respond with an echo reply

(16) The echo reply will follow the path from D to A (which is not necessarily the same path as A to D)

(17) A receives the echo request.

(18 on) If new echo requests are to be sent from A, steps 13 to 17 will be repeated until the route expires or one of the links break, generating an PERR element. In both cases, a new path discovery mechanism will be started to establish a new path.


Analyzing PDM traffic

Wireshark Tool

In order to see the PDM traffic in action, you will need a patched version [provide links] of the Wireshark tool.

The specific filters for each of the PDM frames are:

For all PDM frames:

wlan_mgt.fixed.mesh_action

For Path Requests:

wlan_mgt.fixed.mesh_action == 0x0000

For Path Responses:

wlan_mgt.fixed.mesh_action == 0x0001

For Path Errors:

wlan_mgt.fixed.mesh_action == 0x0002

Airtime tool

In order to check the airtime consumption of the PDM traffic, you can use the airtime tool with the filters (-w field) given above.

PDMAnalysis tool

This tool will be released soon and will give a lot of statistics on the PDM mechanism (due to end June -- Ricardo Carrano)