Presentations/August 2008 Networking Talk
On August 26, 2008, Ricardo Carrano presented "Recent Investigations and Future Developments in the Wireless Front".
Slides are here.
Notes
- Ricardo's talk on networking
- quasi-transcription notes by cscott
- implementation suggestions
- I1: detect and adapt
- a user mode daemon that estimates network environment
- sparse vs dense, etc
- infra vs mesh
- xo vs active antenna
- and tweaks various network parameters to match
- long digression as we fail to agree on what a "dense mesh" means
- I contend the measure should be local; others argue for measures based on the total number of connected machines, etc.
- another parameter for estimation: overall noise level -- quiet, or noisy like 1cc?
- "Mesh Adaptation Daemon"
- next slide. parameters to be measured by the MAD:
- idle denseness / active denseness / congestion
- mobility / link quality
- ac powered / battery powered / low battery
- e.g., we forward packets for the mesh only if we have sufficient battery.
- Density vs Multicast Rate: increase speed (which also increases error rate) as density increases.
- AC powered: if AC powered, we can also assume mobility is low, which then means we can increase route expiration time and rreq_delay.
- we can also use path errors, and denseness/congestion status (as well as power status) to estimate mobility
- back to density vs multicast rate. increased speed also decreases reception distance, so increase speed only if we think we're dense enough
- power vs metrics: if a node runs on battery, we should advertise worse metrics, so that it is not preferred for routes.
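The adaptation rules above can be sketched as a single function mapping an estimated environment to mesh parameters. This is a minimal, hypothetical illustration: the class names, the thresholds (DENSE_NEIGHBORS, LOW_BATTERY_PCT, HIGH_MOBILITY_ERRORS) and every output value are placeholders invented for the example, not numbers from the talk or from the XO firmware.

```python
from dataclasses import dataclass

# Hypothetical thresholds: placeholders, not values from the talk or firmware.
DENSE_NEIGHBORS = 10         # treat the mesh as "dense" above this many peers
LOW_BATTERY_PCT = 20.0       # below this, stop forwarding for the mesh
HIGH_MOBILITY_ERRORS = 5.0   # path errors per minute suggesting movement


@dataclass
class Environment:
    neighbor_count: int          # local density estimate
    ac_powered: bool
    battery_pct: float           # 0-100, ignored when AC powered
    path_errors_per_min: float


@dataclass
class MeshParams:
    multicast_rate_mbps: float
    route_exp_time_s: int
    rreq_delay_ms: int
    forward_for_mesh: bool
    metric_penalty: int          # added to the metrics this node advertises


def low_mobility(env: Environment) -> bool:
    # AC power implies a stationary node; frequent path errors imply movement.
    return env.ac_powered or env.path_errors_per_min < HIGH_MOBILITY_ERRORS


def adapt(env: Environment) -> MeshParams:
    dense = env.neighbor_count >= DENSE_NEIGHBORS
    # Faster multicast shortens reception range, so speed up only when dense.
    rate = 11 if dense else 2
    stationary = low_mobility(env)
    # Stationary nodes can keep routes longer and delay route requests.
    route_exp = 20 if stationary else 10
    rreq_delay = 100 if stationary else 50
    low_battery = not env.ac_powered and env.battery_pct < LOW_BATTERY_PCT
    # Battery-powered nodes advertise worse metrics so routes avoid them.
    penalty = 0 if env.ac_powered else 100
    return MeshParams(rate, route_exp, rreq_delay, not low_battery, penalty)
```

A real daemon would re-run something like adapt() periodically and push the results to the driver; how that is done on the XO is not covered by these notes.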
- OK, moving on to implementation point 2
- I2: Management traffic
- we're talking about beacons, probe request/response, etc.
- reduce the amount of traffic we generate here
- graph of beacon frequency vs number of nodes
- for # of nodes from 1-10
- 1 XO: 9Hz beacon
- 10 XOs: 12 Hz
- anecdotally: 50 XOs: <20 Hz.
- but still: 1Hz would be enough. That would save 1% of airtime.
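The airtime cost of beacons grows linearly with beacon frequency. The helper below makes that arithmetic explicit. Frame size, PHY rate and per-frame overhead (preamble, PLCP header, inter-frame spacing) are caller-supplied assumptions; the sketch does not reproduce the 1% figure quoted above.

```python
def beacon_airtime_fraction(beacons_per_s: float, frame_bytes: int,
                            rate_mbps: float, overhead_us: float) -> float:
    """Fraction of each second spent transmitting beacons.

    overhead_us lumps together preamble/PLCP header and inter-frame spacing;
    its value depends on the PHY and is an assumption of the caller.
    """
    payload_us = frame_bytes * 8 / rate_mbps   # bits / (Mbit/s) = microseconds
    per_beacon_us = payload_us + overhead_us
    return beacons_per_s * per_beacon_us / 1_000_000
```

For example, cutting the beacon rate from 12 Hz to 1 Hz reduces beacon airtime by a factor of 12, whatever frame size and rate are assumed.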
- next slide: probe storms.
- one XO sends a probe, everyone tries to respond at once, fails, and then we start trying to retry, etc.
- the slide shows that only 10 laptops are enough to saturate the network during one of these storms.
- proposal: only retry twice, not 9 times.
- this should raise the limit from roughly 20 to 25 laptops; not a huge improvement, but worthwhile.
- Michalis: the source of these probe storms is NetworkManager scans; NM scans every 2 seconds
- every 2 seconds if not associated; if associated, it backs off to every 2 minutes or so.
- unless you are in ad hoc mode, you can get away with a totally passive scan; we should do this.
- proposal: we should be doing a passive scan.
- we have a switch to do a partly-passive scan: we send out the probes, but we disable the responses from the XO. this isn't turned on by default.
- what we *do* currently is reduce the number of retries from 10 (the default) to 2.
- Implementation Proposal 3
- I3: Rate Adaptation Logic
- XO can transmit frames at many data rates; we should use the highest we can get away with
- the higher the rate, the less airtime it consumes (but the higher the probability of corruption)
- Marvell's firmware uses ARF, the first algorithm created to do rate adaptation.
- we try to transmit at the highest rate. If it fails three times, fall back to the next lower rate, and repeat.
- if we are successful 10 times, then try to increase the rate by one step.
- main issue: no distinction between failures due to noise and those due to congestion.
- so in a congestion environment, we fail and thus lower the rate, which makes things worse: now even more congestion!
- so more transmissions fail, and we lower the rate even further, etc.
- this is mesh mode only; in infrastructure mode the AP mediates the rate adaptation algorithm.
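A minimal sketch of the ARF behaviour described above: three consecutive failures step the rate down, ten consecutive successes step it up. It is an illustration, not Marvell's firmware code; the rate ladder is an assumed 802.11b/g set, and refinements found in published ARF variants (e.g. timers, immediate fallback after a failed probe) are omitted. Note that report() only sees success or failure, so a collision counts exactly like a noise loss, which is the congestion spiral described above.

```python
class ArfRateControl:
    """Simplified Auto Rate Fallback, following the description in the notes."""

    def __init__(self, rates, fail_limit=3, success_limit=10):
        self.rates = sorted(rates)
        self.idx = len(self.rates) - 1      # start at the highest rate
        self.fail_limit = fail_limit
        self.success_limit = success_limit
        self.fails = 0
        self.successes = 0

    @property
    def rate(self):
        return self.rates[self.idx]

    def report(self, success: bool) -> None:
        """Record the outcome of one transmission; the cause is unknown."""
        if success:
            self.fails = 0
            self.successes += 1
            if self.successes >= self.success_limit and self.idx < len(self.rates) - 1:
                self.idx += 1               # try one step faster
                self.successes = 0
        else:
            self.successes = 0
            self.fails += 1
            if self.fails >= self.fail_limit and self.idx > 0:
                self.idx -= 1               # fall back one step
                self.fails = 0
```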
- in this next slide, CBR = constant bitrate. We're adding a steady stream of 1500 byte frames at 50ms intervals.
- This problem can't be fixed in the current generation of the Marvell chipset, due to memory limitations
- the workaround for the current generation is MAD (again)
- we estimate congestion, and determine when the rate adaptation algorithm is just making things worse, and set the hardware to forbid rates below (say) 22Mb.
- this prevents us from falling all the way down to 1Mb and making the congestion 50x worse.
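The MAD workaround amounts to restricting the rates the rate-adaptation logic may choose whenever congestion is detected. In the sketch below the congestion signal (a measured medium-busy fraction) and busy_threshold are hypothetical; the notes do not say how MAD would estimate congestion. The floor defaults to the 22 Mb figure mentioned above, used as a threshold.

```python
def allowed_rates(rates_mbps, busy_fraction, floor_mbps=22.0, busy_threshold=0.5):
    """Return the rates the rate-adaptation algorithm may use.

    When the medium looks congested (busy_fraction above busy_threshold,
    both hypothetical), forbid rates below floor_mbps so that collision
    losses cannot drag the rate all the way down to 1 Mb/s.
    """
    if busy_fraction > busy_threshold:
        permitted = sorted(r for r in rates_mbps if r >= floor_mbps)
        if permitted:               # never return an empty rate set
            return permitted
    return sorted(rates_mbps)
```

Feeding the restricted list into a rate controller such as ARF keeps its step-down logic intact while bounding how far congestion can push it.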
- Implementation note 6
- I6: Metrics
- Costs associated with probe requests at various bit rates.
- Currently 54Mbps=11, 36=28, 11=46, 1=64.
- Proposed values: 54Mb=963, 36=1073, 11=1997, 1=12906
- and for active antenna: 54Mb=962, 36=1072, 11=1996, 1=12905 (i.e., one lower, so slightly better).
- this prefers routes via the active antenna
- also, the difference between 11Mb and 1Mb more accurately reflects the amount of airtime taken by the lower rate.
- Better yet, use MAD to take other metrics into account, like battery and mobility.
- http://wiki.laptop.org/go/Path_discovery_metric
- it's not all about airtime, although airtime is important.
- also: queuing frames at intermediate nodes has memory and CPU costs.
- we renormalize the costs so that everything is time based
- factoring in the times required to queue a hop, so that they are directly comparable to the airtimes.
- by biasing the active antenna slightly down, we go via the active antenna when it's convenient.
- some confusion here
- ricardo clarified that *path metrics* are not a reasonable means to fix congestion issues
- even though other *network parameters* can be used to address congestion (like, say, beacon rate)
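The effect of the proposed cost tables can be checked by summing per-hop costs along candidate paths. The tables are copied from the notes; treating a path's cost as the plain sum of its hop costs, and marking each hop as going via an active antenna or not, are simplifying assumptions for the example.

```python
# Per-hop costs from the notes (keys: rate in Mb/s).
CURRENT = {54: 11, 36: 28, 11: 46, 1: 64}
PROPOSED_XO = {54: 963, 36: 1073, 11: 1997, 1: 12906}
PROPOSED_ACTIVE_ANTENNA = {54: 962, 36: 1072, 11: 1996, 1: 12905}


def path_cost(hops, xo_table, aa_table=None):
    """Sum per-hop costs; hops is a list of (rate_mbps, via_active_antenna)."""
    total = 0
    for rate, via_aa in hops:
        table = aa_table if (via_aa and aa_table is not None) else xo_table
        total += table[rate]
    return total


def best_path(paths, xo_table, aa_table=None):
    """Pick the candidate path with the lowest total cost."""
    return min(paths, key=lambda p: path_cost(p, xo_table, aa_table))
```

With the current table, one 1 Mb/s hop (cost 64) beats two 11 Mb/s hops (cost 92); with the proposed table the two 11 Mb/s hops win (3994 versus 12906), which matches the airtime argument above.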
- Implementation note 7
- I7: NWB efficiency
- NWB = Network Wide Broadcast
- We are using a simple flood fill algorithm when we need to reach all the nodes in the mesh.
- we can't remove broadcast entirely, because some information inherently needs to reach all the nodes: presence info, and path discovery mechanism.
- proposal: SBA (Scalable Broadcast Algorithm)
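The core of SBA, as published by Peng and Lu, is a coverage test. A node that knows its 2-hop neighbourhood rebroadcasts only if some of its neighbours are not already covered by the nodes it has heard the broadcast from. The sketch models that test only. The full algorithm also waits a random backoff before rebroadcasting and re-runs the test as more copies arrive, which corresponds to calling should_rebroadcast() again with a longer heard_from list. The neighbour table is assumed to come from some hello mechanism not described here.

```python
from typing import Dict, Hashable, Iterable, Set


def uncovered_neighbors(me: Hashable,
                        neighbors: Dict[Hashable, Set[Hashable]],
                        heard_from: Iterable[Hashable]) -> Set[Hashable]:
    """Neighbours of `me` not already reached by any sender we heard.

    neighbors maps each node to its set of 1-hop neighbours (2-hop knowledge).
    heard_from lists nodes we have received copies of this broadcast from.
    """
    remaining = set(neighbors[me])
    for sender in heard_from:
        remaining -= neighbors.get(sender, set())
        remaining.discard(sender)
    return remaining


def should_rebroadcast(me, neighbors, heard_from) -> bool:
    """Flood-fill would always return True; SBA only if coverage is incomplete."""
    return bool(uncovered_neighbors(me, neighbors, heard_from))
```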
- Skipping I8, I9, which are Nortel recommendations.
- I10: Route Expiration Time
- Paths time out after X seconds.
- X=10, according to Ricardo.
- Slide: colorful graph
- 10 laptops, pinging a multicast address once a second
- x axis is real time, showing periodicity of the network utilization
- y-axis is airtime utilization.
- tradeoff between timeout and mobility
- however, we redo path discovery if we see the path is broken
- so this is really a path optimality tradeoff: how long do we keep using a suboptimal path that is not completely broken?
- proposal: immediately double the route timeout to 20 s
- iwpriv msh0 route_exp_time 20 <- something like this.
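A minimal route-cache sketch illustrating the tradeoff: entries expire route_exp_time seconds after installation, and an expired or explicitly broken route forces a new path discovery. It is a hypothetical model, not the Marvell firmware's route table. In particular it assumes routes are not refreshed on use; under that assumption an active flow triggers a periodic rediscovery about once per route_exp_time, so doubling the timeout from 10 s to 20 s roughly halves that background traffic.

```python
import time


class RouteCache:
    """Hypothetical route table with fixed expiry; not the firmware's design."""

    def __init__(self, route_exp_time=20.0, clock=time.monotonic):
        self.route_exp_time = route_exp_time
        self.clock = clock
        self._routes = {}   # dest -> (next_hop, expiry timestamp)

    def install(self, dest, next_hop):
        """Store the result of a path discovery."""
        self._routes[dest] = (next_hop, self.clock() + self.route_exp_time)

    def mark_broken(self, dest):
        """A path error invalidates the route immediately."""
        self._routes.pop(dest, None)

    def lookup(self, dest):
        """Return the next hop, or None if the caller must rediscover."""
        entry = self._routes.get(dest)
        if entry is None or self.clock() >= entry[1]:
            self._routes.pop(dest, None)
            return None
        return entry[0]
```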
- I11: Contention window
- the range from which a node draws its random backoff, i.e. how long we wait to see whether the medium is in use before transmitting
- currently XO uses [7,15] window
- standard values are [31, 1023]
- proposal: use the standard.
- in one experiment, we were retrying 71% of the time; switching to the standard values dropped this to 25%.
- for scaling to larger numbers of contending nodes, we may need to investigate more sophisticated contention management strategies
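Why a wider contention window helps can be seen with a simplified single-round model: each contender picks a backoff slot uniformly from [0, CW], and a station collides if any other station picks the same slot. The model ignores backoff freezing and capture, so it only shows the trend. next_cw() is the usual binary exponential backoff step, capped at CWmax; with the XO's [7, 15] window it can double only once.

```python
def collision_probability(cw: int, contenders: int) -> float:
    """P(a given station's slot in [0, cw] matches at least one other station's).

    Simplified single-round model: ignores freezing/resuming of backoff counters.
    """
    if contenders < 2:
        return 0.0
    p_no_clash_per_peer = cw / (cw + 1)
    return 1.0 - p_no_clash_per_peer ** (contenders - 1)


def next_cw(cw: int, cw_max: int) -> int:
    """Binary exponential backoff: after a failure, CW = 2*(CW+1)-1, capped."""
    return min(2 * (cw + 1) - 1, cw_max)
```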
- skipping diagnose and test slide.
- princeton slide: what they've got ready to go
- HashCache: much more efficient than Squid
- Squid: needs memory equal to 10% of its storage for the index
- TCP improvements: tell TCP up front what bandwidth to expect.
- applicable to single-hop links; may be applicable to multi-hop mesh (not clear)
- PlanetLab: mechanism to deploy and manage school servers
- next slide: thin firmware
- it's in Linux 2.6.27.
- thin firmware enables: XO as access point
- we can also then run open80211s (o11s)
- slide lists the stuff which is implemented to date.
- digression here about open80211s; apparently the o11s implementation adds even more management traffic than the spec
- trying to allow multiple ESSIDs to share the same spectrum
- 802.11s is targeting in-home multimedia networks
- 8.2 recommendations.
- wireless: new driver in 2.6.25; firmware 22.p18.
- collaboration: need to generate failure logs and send them to Collabora