Network2
Introduction
Last updated: Michael Stone 19:10, 25 July 2009 (UTC)
This document proposes a networking design based on the previously articulated Network Principles. It elaborates the design through analysis, example configurations, and experimental results, then concludes by crediting those who have contributed to the design and by describing future work suggested by the current results.
Its abstract purpose is to advance the Network Principles project by explaining how you might build a system based on those principles with currently available tools and by doing a first round of modeling and prototyping in order to gain some analytic and empirical evidence about whether those principles are sound.
Its concrete purpose is to provide XO users (and other interested parties) with internetworking and naming technology that supports the XO's most important low-latency network scenarios as seamlessly and predictably as existing software allows.
Its social and conceptual purpose is to provide a design that is satisfactory in several ways in which previous networking and collaboration substrates were not, as described in the following principles of design quality:
- do no harm -- our users are not volunteers, so don't waste their time
- play well with others, since we want a large ecosystem and lots of testing
- be realistic, so that we don't promise the impossible
- be predictable, so that we can tell people what will work and what will fail in advance
- prevent failure, by means of proof, simulation, and wise habits
- tolerate failure, by removing inappropriate single points of failure
- route around failure, by means of self-test procedures and preplanned maneuvers (manual overrides)
Notes on quality principles:
- do no harm means that we believe previous designs harmed their users unnecessarily by wasting scarce resources (time, trust, capacity to learn), and through confusion, over-promising, lock-in, and failure to meet reasonable "go/no-go" requirements, e.g. on availability.
- realism and predictability are intended to evoke the following "litmus test" questions:
- how well does the design conform to the physical realities (bandwidth, latency, power, failure, and error) and to the social realities (ignorance, interdiction, authority, and autonomy) that define its niche?
- is there a public, written, and peer-reviewed design document describing the design?
- prevent, tolerate, and route around failure are all direct usability goals that no networking design intended for use by real humans (particularly teachers!) should ignore
When judging, please also note that the design is not yet complete in several important respects:
- it has only a stub of a bandwidth model, so we do not yet know how much it costs to scale it up
- its self-test algorithm is not yet written (though good diagnostic primitives have been systematically identified)
- it lacks truly clear implementation guidance and comprehensive sample code
- there are unresolved questions about
- how routing and timeouts should be configured so that peers search their target address space in a useful fashion
- how communications security might best be provided.
- it lacks an "integration and deployment" plan outlining how to get it adopted.
Design
Network Architecture
We want to offer maximally efficient and robust support for our ideal network scenarios (scenarios 1 and 9 in the list below) while seamlessly supporting optional network enhancements, such as fancy links, routers, tunnel endpoints, and transit agreements, that may be provided by the surrounding ecosystem of deployment organizations, universities, individuals, and commercial entities.
Network Scenarios:
1. access to at least one shared-media link
2. a more efficient link, like an 802.3 switch or an 802.11 access point
3. a bridge, like an XS or a good access point, between two or more otherwise separate single-link networks
4. a local router, like an XS, routing between two or more otherwise separate (but potentially complicated) local networks
5. a restrictive local router which provides some IPv4 connectivity but which drops IPv6 traffic
6. credentials for some sort of dedicated local tunnel endpoint (like a SOCKS proxy or an HTTP proxy)
7. a remote router offering us some sort of access to a larger internetwork, typically via (perhaps restricted) IPv4
8. credentials for some sort of dedicated remote tunnel endpoint (like an SSL or IPsec VPN or a 6to4 tunnel, etc.)
9. a remote router offering great access to a larger internetwork
Based on these scenarios, we imagine our network as being organized into three kinds of composable layers:
- a link layer, usually implemented via 802.3 wired Ethernet, 802.11b/g wifi in either ad-hoc or infrastructure mode, or various sorts of tunneling over IPv4, perhaps across NATs and firewalls,
- an internetworking layer, based on IPv6 (tutorial documentation), and
- a naming layer, based on DNS, for binding logical addresses from networks with different failure modes to stable human-memorable names
We find this layered conceptual model helpful for estimating dependency ("what has to work before this layer can work?") and cost ("what does it cost to pass through this layer?").
IPv6 Configuration
Peers:
Your job is to be an IPv6 node. Consequently, when you bring up your interfaces,
- You might discover an IPv6 router advertising on one of your links.
- (See sysctl net.ipv6.conf.all.accept_ra and related variables.)
- You might try out dhcp6c.
- You might have some kind of IPv4 connectivity. If so, connect to the Internet or to other internetworks of your choice.
- Use dnshash to add guessable link-local addresses to all your interfaces.
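dnshash's actual hashing scheme is not described on this page. The sketch below only illustrates the general idea behind a "guessable" link-local address: any host that knows the DNS name can compute the same address without asking anyone. The choice of SHA-1 (via coreutils `sha1sum`), the use of the first 64 bits of the digest as the interface identifier, and the helper name `guess_link_local` are assumptions for illustration, not dnshash's real algorithm.

```sh
#!/bin/sh
# Hypothetical sketch, NOT dnshash's real algorithm: derive a link-local IPv6
# address from a DNS name so that anyone who knows the name can guess it.
# Assumes coreutils sha1sum, cut, and sed are available.
guess_link_local() {
    # Hash the name, keep the first 16 hex digits (64 bits) as the interface
    # identifier, and split them into four 16-bit groups.
    groups=$(printf '%s' "$1" | sha1sum | cut -c1-16 |
        sed 's/\(....\)\(....\)\(....\)\(....\)/\1:\2:\3:\4/')
    printf 'fe80::%s\n' "$groups"
}

# Usage (as root; IFACE is whichever link you are configuring):
#   ip -6 addr add "$(guess_link_local cscott.laptop.org)/64" dev "$IFACE"
```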
Servers:
Your job is to be an IPv6 router and a DNS server. One of several situations might obtain:
- You might discover an IPv6 router advertising one or more IPv6 prefixes on your outbound link(s).
- You might have some kind of IPv4 connectivity. If so, connect to the Internet or to other internetworks of your choice.
- You might be under a tree (i.e. have no upstream connectivity at all). If so, generate a Unique Local Address prefix.
- (Use dnshash to add guessable link-local addresses to all your links?)
When done, use radvd or dhcp6s to share addresses.
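For the "under a tree" case, a Unique Local Address prefix (RFC 4193) is fd00::/8 followed by a pseudo-random 40-bit Global ID. A minimal sketch, assuming that five random bytes from /dev/urandom are an acceptable stand-in for the SHA-1-based Global ID procedure that RFC 4193 suggests; `make_ula_prefix` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: generate a random Unique Local Address /48 prefix (RFC 4193).
# Assumption: /dev/urandom replaces RFC 4193's suggested SHA-1 procedure;
# either way the result is a pseudo-random 40-bit Global ID.
make_ula_prefix() {
    # Read five random bytes as two-digit hex values, e.g. "3a 4f 12 9c 0e".
    set -- $(od -An -N5 -tx1 /dev/urandom)
    # "fd" = the fc00::/7 ULA block with the L (locally assigned) bit set.
    printf 'fd%s:%s%s:%s%s::/48\n' "$1" "$2" "$3" "$4" "$5"
}

# Usage:
#   ULA=$(make_ula_prefix)    # e.g. fd3a:4f12:9c0e::/48
# Each link then gets one /64 out of the /48 (subnet IDs 0 to ffff).
```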
DNS Configuration
One of the server's most important jobs is to get itself on appropriate internetworks so that it can dynamically map stable (DNS) names to unstable names (IPv6 addresses) for itself and its peers.
Discovery:
Peers need help locating one or more DNS servers. See RFC 4339 for available mechanisms; pay particular attention to RDNSS discovery.
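One way a server can offer RDNSS is to put the option into the router advertisements it already sends with radvd. A minimal sketch, assuming a radvd build with RDNSS support; the helper name `radvd_stanza` and the interface, prefix, and DNS server address in the usage line are placeholders. The lifetime of 2 x MaxRtrAdvInterval follows the upper bound suggested by RFC 5006, the RDNSS specification current when this page was written.

```sh
#!/bin/sh
# Sketch: print a radvd.conf stanza that advertises one /64 prefix and one
# recursive DNS server (RDNSS option) on a link.
# Assumptions: radvd with RDNSS support; arguments are values you supply.
radvd_stanza() {
    iface=$1 prefix=$2 dns=$3
    cat <<EOF
interface $iface
{
    AdvSendAdvert on;
    MinRtrAdvInterval 30;
    MaxRtrAdvInterval 100;
    prefix $prefix
    {
        AdvOnLink on;
        AdvAutonomous on;
    };
    # 200 s = 2 x MaxRtrAdvInterval
    RDNSS $dns
    {
        AdvRDNSSLifetime 200;
    };
};
EOF
}

# Usage (as root):
#   radvd_stanza eth0 2001:db8:1:0::/64 2001:db8:1::53 > /etc/radvd.conf
```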
Update
Here are two approaches for solving the update problem, based on how peers might want to communicate with DNS servers:
- Use a DNS UPDATE client such as ipcheck or ddclient, authenticated with shared keys, against a DNS server such as BIND.
- Run a bespoke control protocol over an existing secure tunnel, e.g. something based on XML-RPC over HTTPS with client certificates, or on access to a restricted shell over SSH.
(NB: In order to perform this update, it will usually have been necessary for the peer to have been cryptographically introduced to the server.)
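The first approach can be sketched with BIND's own DNS UPDATE client, nsupdate, swapped in here for ipcheck or ddclient. A minimal sketch, assuming a TSIG key shared between the peer and the server; the helper name `nsupdate_script` and the server, zone, host name, address, and key-file path in the usage lines are hypothetical placeholders.

```sh
#!/bin/sh
# Sketch: build an nsupdate script that replaces a peer's AAAA record with
# its current address (DNS UPDATE, RFC 2136).
# Assumptions: BIND's nsupdate; all names and paths below are placeholders.
nsupdate_script() {
    server=$1 zone=$2 name=$3 addr=$4 ttl=${5:-300}
    cat <<EOF
server $server
zone $zone
update delete $name AAAA
update add $name $ttl AAAA $addr
send
EOF
}

# Usage: nsupdate signs the request with the TSIG key given by -k.
#   nsupdate_script ns.example.org example.org. peer.example.org. 2001:db8:1::42 \
#       | nsupdate -k /etc/peer-update.key
```

A short TTL (300 s by default here) keeps stale addresses from lingering in caches after a peer's connectivity changes.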
Unfinished Ideas
Security
This optional section is included merely to offer some hints about where we think communications security ought to be headed.
- Spoofing, Integrity, Confidentiality. See communications security and petnames for some background. A very rough road along which something reasonable might lie:
- Use physical introduction to CNAME cscott.michael.laptop.org to <key>.cscott.laptop.org.
- Then, my dnscurve-compatible DNS resolver will refuse to give me addresses unless the nameserver I contact for cscott proves knowledge of cscott's private key.
- Then I have a nice basis with which to configure IPsec security associations.
- System Integrity
- DoS
Performance
Wad points out that people writing software will probably want some help figuring out which routes are best to use (e.g. in terms of bandwidth, latency, jitter, integrity, confidentiality, availability, ...).
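Latency is the easiest of these metrics to measure with tools already used on this page. A minimal sketch, assuming Linux iputils `ping6` summary output; the helper names `avg_rtt` and `rank_by_rtt` are hypothetical, and bandwidth, jitter, and the security properties are ignored entirely.

```sh
#!/bin/sh
# Sketch: rank a peer's candidate addresses by average round-trip time.
# Assumes iputils ping6; only latency is considered.

# Pull the average RTT out of ping's summary line, e.g.
# "rtt min/avg/max/mdev = 0.041/0.050/0.062/0.008 ms" -> 0.050
avg_rtt() {
    awk -F/ '/^(rtt|round-trip)/ { print $5 }'
}

# Print "address avg_rtt_ms" for each reachable address, fastest first.
rank_by_rtt() {
    for addr in "$@"; do
        rtt=$(ping6 -c 3 "$addr" 2>/dev/null | avg_rtt)
        [ -n "$rtt" ] && echo "$addr $rtt"
    done | LC_ALL=C sort -n -k2,2
}

# Usage:
#   rank_by_rtt 2001:db8:1::42 2001:db8:2::42 | head -n 1
```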
Analysis
Bandwidth Usage
Several important numbers that we need to predict and to measure:
Legend: tx = transmit, rx = receive, btx = broadcast transmit.
- router discovery (RD): btx/tx/rx over ICMPv6+IPv6+phys
- duplicate address detection (DAD): btx/rx over ICMPv6+IPv6+phys
- neighbor discovery (ND) via neighbor solicitations (NS): tx/rx over ICMPv6+IPv6+phys
- DNS query: tx/rx over UDP+IPv6+phys
- DNS update: tx/rx over JSON+SSH+TCP+IPv6+phys
Here "phys" stands for each equation's dependence on the underlying "physical" layer's frame overhead and MTU. Notable "phys" layers:
- Ethernet -- ad-hoc wifi, infrastructure wifi, 802.11s mesh, switch, hub
- TLS+UDP+IPv4 -- openvpn
- L2TP+IPsec+IPv4 -- racoon, isakmpd, openswan, etc.
- UDP+IPv4 -- teredo
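As a first step toward the missing bandwidth model, here is the per-packet arithmetic for the case where "phys" is Ethernet. A minimal sketch under stated assumptions: Ethernet II framing with no VLAN tag, no IPv6 extension headers, and preamble and inter-frame gap ignored. 802.11 framing adds different per-frame overhead and is not modeled. The message sizes in the comments follow the ICMPv6 formats of RFC 4861 and assume a DNS query without EDNS0; `frame_bytes` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: bytes on the wire for one IPv6 packet in an Ethernet II frame.
# Assumptions: no VLAN tag, no IPv6 extension headers, preamble and
# inter-frame gap ignored.
ETH_HDR=14 ETH_FCS=4 IPV6_HDR=40 ETH_MIN_FRAME=64

# $1 = bytes carried above IPv6 (an ICMPv6 message, or UDP header + payload)
frame_bytes() {
    n=$(( ETH_HDR + IPV6_HDR + $1 + ETH_FCS ))
    # Short frames are padded up to the 64-byte Ethernet minimum.
    [ "$n" -lt "$ETH_MIN_FRAME" ] && n=$ETH_MIN_FRAME
    echo "$n"
}

# DAD neighbor solicitation: 24-byte ICMPv6 message, no link-layer option.
#   frame_bytes 24    # -> 82
# Router solicitation plus source link-layer address option: 8 + 8 bytes.
#   frame_bytes 16    # -> 74
# DNS AAAA query for cscott.laptop.org: 8 (UDP) + 12 (header) + 23 (question).
#   frame_bytes 43    # -> 101
```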
Debugging Techniques
Start recording a typescript so that we can see what you did.
TESTDIR=`pwd`/testing
mkdir -p "$TESTDIR" && cd "$TESTDIR"
script
ulimit -c unlimited
Check that you've got the right DNS name for the person you want to talk to.
NAME=the.right.person
echo $NAME > peer
Dump your addresses, routes, and perhaps your open connections.
hostname --fqdn | tee host
ip addr show | tee addrs
ip route show | tee ipv4_routes
ip -6 route show | tee ipv6_routes
netstat -anp | tee conns
If you have wireless devices,
iwconfig | tee iwconfig
iwlist scan | tee iwlist_scan
Fire up tcpdump:
tcpdump -w packets -s0 &
Resolve that name to addresses. Check that the addresses seem sane.
dnshash lookup $NAME | tee peer_addrs_dnshash
dig AAAA $NAME | tee peer_addrs_dig
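"Sane" mostly means that each address falls in the block you expect for the situation you are in (link-local on an isolated link, global or unique-local behind a router). A minimal sketch of such a check; the helper name `classify_ipv6` is hypothetical, only a few well-known blocks are covered, and addresses are assumed to be written in lowercase.

```sh
#!/bin/sh
# Sketch: label an IPv6 address with the well-known block it falls in.
# Only common blocks are covered; input is assumed to be lowercase.
classify_ipv6() {
    case "$1" in
        ::1)                    echo loopback ;;
        fe[89ab]?:*)            echo link-local ;;      # fe80::/10
        f[cd]??:*)              echo unique-local ;;    # fc00::/7
        2001:db8:*|2001:0db8:*) echo documentation ;;   # 2001:db8::/32
        2002:*)                 echo 6to4 ;;            # 2002::/16
        2001:0:*|2001:0000:*)   echo teredo ;;          # 2001::/32
        ff??:*)                 echo multicast ;;       # ff00::/8
        [23]???:*)              echo global-unicast ;;  # 2000::/3
        *)                      echo unexpected ;;
    esac
}

# Usage:
#   for a in $(dig +short AAAA "$NAME"); do echo "$a $(classify_ipv6 "$a")"; done
```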
See who's answering broadcasts (set IFACE to the interface you are testing, e.g. eth0 or tap0):
ping6 -I $IFACE ff02::1
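Each host on the link answers the all-nodes ping, so ping reports many replies (marked DUP!) per sequence number. To see who answered rather than how often, the sender addresses can be extracted and deduplicated. A minimal sketch, assuming iputils `ping6` reply lines of the form "64 bytes from ADDRESS: icmp_seq=..."; `responders` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: list the distinct hosts that answered a ping to ff02::1.
# Assumes iputils reply lines like
#   "64 bytes from fe80::1%eth0: icmp_seq=1 ttl=64 time=0.05 ms (DUP!)"
responders() {
    # Field 4 is the sender address with a trailing colon; strip it.
    awk '/bytes from/ { sub(/:$/, "", $4); print $4 }' | LC_ALL=C sort -u
}

# Usage:
#   ping6 -c 3 -I $IFACE ff02::1 | responders | tee responders
```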
Route to the addresses (set ADDR to one of the addresses found above):
ping6 -I $IFACE $ADDR | tee ping
traceroute6 $ADDR | tee traceroute
tracepath6 $ADDR | tee tracepath
Connect to the address (the commented lines are alternatives):
nc6 $ADDR $PORT
# echo "SSH-2.0-Hi" | nc6 $ADDR 22
# printf "GET / HTTP/1.0\r\n\r\n" | nc6 $ADDR 80
# ssh $ADDR
# curl -I "http://[$ADDR]/"
# ...
Conduct a bandwidth test against a peer that is running iperf -s -V:
iperf -V -c $ADDR
Collect logs from your application and send them to developers:
kill -SIGINT %1    # stop tcpdump
exit               # end the typescript session started by script
cd ..
tar cf - $TESTDIR | lzma -c > logs.tar.lzma
Self-Test Algorithm
In order for things to "just work", there are many subgoals that need to be satisfied. The purpose of the self-test algorithm is to speed up debugging by quickly and reliably identifying subgoals whose named requirements are satisfied but whose characteristic test fails.
The form of the self-test algorithm will be a decision-list which may, in the future, be incorporated into software.
A rough outline of that decision list is:
Do we have all the network interfaces that we should?
Is each interface attached to a link?
Does each interface have a link-local address?
Is every interface able to ping itself?
Does link-layer broadcast return responses?
Does network-layer broadcast return responses?
# assuming that we have a partner on the same link
Can we ping our partner?
Can we hear our partner pinging us?
Does there seem to be reasonable bandwidth on our link?
# assuming we have a link-local partner with a name
Do we and our partner have byte-identical names written down?
Can we both resolve the name to a link-local address?
Do we get the same address?
Can we both ping the address?
Can I connect to a service running at the address (e.g. ssh)?
# assuming that we have a router
Can we ping our router?
Can we traceroute someone upstream of the router?
...
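The first few questions of the decision list are already answerable with standard tools. A minimal sketch of a runner that asks the questions in order and stops at the first failure, so that the earliest unsatisfied subgoal is the one reported. The check names are hypothetical, and the sketch assumes iproute2's `ip`, `ping6`, and an IFACE variable naming the interface under test.

```sh
#!/bin/sh
# Sketch: run checks in order; report each result and stop at the first
# failure. Check functions are hypothetical helpers for the first questions.
run_decision_list() {
    for check in "$@"; do
        if "$check" >/dev/null 2>&1; then
            echo "ok:   $check"
        else
            echo "FAIL: $check"
            return 1
        fi
    done
    echo "all checks passed"
}

# Does the interface exist?
have_interface() { ip link show dev "$IFACE"; }
# Print the interface's first link-local address, without the /64.
link_local_addr() {
    ip -6 addr show dev "$IFACE" scope link |
        awk '/inet6/ { sub(/\/.*/, "", $2); print $2; exit }'
}
# Does the interface have a link-local address?
has_link_local() { [ -n "$(link_local_addr)" ]; }
# Can the interface ping itself?
can_ping_self() { ping6 -c 1 -I "$IFACE" "$(link_local_addr)"; }
# Does network-layer broadcast (all-nodes multicast) get any replies?
hears_all_nodes() { ping6 -c 2 -I "$IFACE" ff02::1 | grep -q 'bytes from'; }

# Usage:
#   IFACE=eth0
#   run_decision_list have_interface has_link_local can_ping_self hears_all_nodes
```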
Advice for Coders
There are three critical changes that you'll need to make to your design in order to really make it sing.
First, you'll want to add some mechanism for your users to type in hostnames that they want you to connect to. This lets them do all sorts of cool stuff like:
- copy-and-paste links from websites or cerebro
- type in names from a physical display like a blackboard or a handout
Second, you'll want to be prepared to re-resolve names in order to get fresh addresses each time your connectivity changes. For the time being, you should do this by calling libc's getaddrinfo() function.
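From a shell, glibc's `getent ahostsv6` resolves names through getaddrinfo(), which makes it convenient for trying out the re-resolve-on-every-attempt pattern. A minimal sketch, assuming glibc's getent and nc6; the helper names `unique_addrs`, `resolve_fresh`, and `connect_fresh` are hypothetical.

```sh
#!/bin/sh
# Sketch: resolve a name afresh before every connection attempt and try
# each returned address in getaddrinfo()'s preferred order.
# Assumes glibc getent (which calls getaddrinfo) and nc6.

# getent prints one line per (address, socket type) pair, so an address
# usually appears several times; keep only its first occurrence.
unique_addrs() {
    awk '!seen[$1]++ { print $1 }'
}

resolve_fresh() {
    getent ahostsv6 "$1" | unique_addrs
}

connect_fresh() {
    name=$1 port=$2
    for addr in $(resolve_fresh "$name"); do
        # Stop at the first address that accepts a connection.
        nc6 "$addr" "$port" && return 0
    done
    return 1
}

# Usage: call again after every connectivity change, never cache addresses.
#   connect_fresh cscott.laptop.org 22
```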
Third, go check out SCTP (wikipedia, man page). Its support for multi-homing, for multi-streaming with and without ordering guarantees, and for updating on the fly the addresses you use to talk to your peer seems particularly serendipitous.
Advice for Deployers
Ask your ISPs to provide IPv6 prefixes or tunnel endpoints. After all -- if none of their customers ask, then what incentive will they ever have to upgrade?
Failing that, see if you (or a local university?) can afford a public IPv4 address -- even if it's dynamic. If so, you can be many sorts of tunnel endpoint.
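One cheap kind of tunnel endpoint available to a host with a public IPv4 address is 6to4 (RFC 3056), one of the tunnel options listed in the network scenarios: the address determines a whole IPv6 /48 of the form 2002:WWXX:YYZZ::/48. A minimal sketch of that mapping; `sixto4_prefix` is a hypothetical helper name and the argument is assumed to be a valid dotted-quad address.

```sh
#!/bin/sh
# Sketch: derive the 6to4 /48 prefix owned by a public IPv4 address.
# Assumption: $1 is a valid dotted-quad IPv4 address.
sixto4_prefix() {
    # Split "a.b.c.d" into four decimal octets.
    set -- $(echo "$1" | tr . ' ')
    # Each octet becomes two hex digits after the 2002::/16 prefix.
    printf '2002:%02x%02x:%02x%02x::/48\n' "$1" "$2" "$3" "$4"
}

# Usage:
#   sixto4_prefix 192.0.2.1    # -> 2002:c000:0201::/48
```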
Regardless, if you manage to get a globally reachable IPv6 address by any means, then you can provide a DNS server for your kids and it can direct them to one another and to any other services that you feel like pointing them at.
Experiments
Link-local configuration
Try out dnshash on an isolated access point, ad-hoc network, switch, or hub.
Observations: very pleasant!
VPN server configuration
In this experiment, we're going to configure openvpn and radvd on a machine (teach.laptop.org) with a public IPv4 address. Truthfully, this combination is probably overkill, but constructing it seemed likely to offer valuable experience, e.g. for someone who wants to bridge multiple kinds of tunnel endpoint or to load-balance lots of peers between a couple of endpoints.
# Install our VPN and route advertisement software.
apt-get install openvpn radvd
# yum -y install openvpn radvd
# add nobody:nobody
groupadd nobody
useradd nobody
usermod -a -G nobody nobody
# Configure radvd
cat > /etc/radvd.conf <<EOF
interface tap0 {
    AdvSendAdvert on;
    MinRtrAdvInterval 30;
    MaxRtrAdvInterval 100;
    prefix 2001:db8:1:0::/64 {
        AdvOnLink on;
    };
};
EOF
# enable forwarding everywhere
sysctl -w net.ipv6.conf.all.forwarding=1
# flush the ip6tables FORWARD chain (firewall rules, not the routing table)
ip6tables -F FORWARD
# really, I /want/ a multi-user version of
#   openvpn --dev tap --user nobody --group nobody --verb 6
# but I'm not sure how to get that. Instead, I'll use some fake keys and no ciphers.
mkdir -p keys && cd keys
wget http://teach.laptop.org/~mstone/sample-keys.tar.bz2
tar xf sample-keys.tar.bz2 && cd sample-keys
# create a multi-user tunnel
openvpn --mode server --client-to-client --dev tap \
    --user nobody --group nobody --verb 6 \
    --opt-verify --tls-server \
    --client-connect /bin/true \
    --auth-user-pass-optional --duplicate-cn \
    --auth-user-pass-verify /bin/true via-env \
    --dh ./dh1024.pem --ca ./ca.crt --cert client.crt --key client.key \
    --script-security 3 --auth none --cipher none &
# at any rate, bring up the interface so that we get link-local addresses
ip link set tap0 up
# turn on the route advertisement daemon
radvd -d 5 -m stderr &
VPN client configuration
The purpose of this experiment was to test the VPN configuration described immediately above.
# install vpn client
apt-get install openvpn
# yum -y install openvpn
# add nobody:nobody
groupadd nobody
useradd nobody
usermod -a -G nobody nobody
# download fake keys.
mkdir -p keys && cd keys
wget http://teach.laptop.org/~mstone/sample-keys.tar.bz2
tar xf sample-keys.tar.bz2 && cd sample-keys
# connect to the vpn
openvpn --user nobody --group nobody --dev tap --remote teach.laptop.org \
    --tls-client --ca ca.crt --cert ./client.crt --key client.key \
    --auth none --cipher none &
# bring up the interface
ip link set tap0 up
# find other people
ping6 -I tap0 ff02::1
# if using dnshash, attach
dnshash attach <your>.<domain>.<name>
# ... test, as described above ...
Observations:
- TLS imposes a high latency cost, even with null algorithms.
- TAP devices work rather nicely, at least for tiny networks.
- Be careful of firewall rules!
- radvd is perhaps unnecessary with a single virtual ethernet -- dnshash "suffices" -- though it might be useful for routing between several load-balanced ethernets.
- The default IP sorting rules and route priorities mean that it may take a long time for a connecting app like ssh or nc6 to connect to the /correct/ dnshash address.
Credits
(If you've contributed and don't see your name, don't fret -- just add yourself with a word or two explaining your contribution!)
- Michael Stone [none] (writing)
- C. Scott Ananian [OLPC] (architecture,teaching)
- John Watlington [OLPC] (architecture,editing)
- Robert McQueen [Collabora] (prior work,critique)
- Dafydd Harries [Collabora] (prior work,critique)
- Polychronis Ypodimatopolous [MIT] (prior work,critique)
- Cortland Setlow [Tower Research Capital] (testing)
- Andres Ambrois [] (design,testing)
- Benjamin Schwartz [Harvard] (critique,publicity)
- Tabitha Roder [] (testing)
- Avi Kelman [] (editing)
Future Work
- Per-host networks and per-app IPs and names.
- Sample code.
- Designs for higher protocols like discovery, presence, and health.
- Analysis of the costs of our guarantees, in the style of Stuart Cheshire's "network dynamics".
- Relationship with delay-tolerant networking and sneakernets.