DNS
---

Distributed systems: common features
  - content name: what a user is looking for
      [e.g., file name in FS, URL]
  - host address: where the resource is
      [e.g., IP address, inode, phone #]
  - routing mechanism: how to get to destination
      [e.g., BGP, inode structure]
  - lookup mechanism: binding between name and address,
      resolve name to address

Name servers – basically a distributed system that supports two operations
  - address = resolve(name)
  - bind(name, address)

Three structures you need to think about when building a name system...
-----------------------------------------------------------------------

  - structure of the name space
      - syntax of names (also types of records, extensibility)
      - what structures are possible? (flat, hierarchy, ...?)
  - administrative structure
      - which servers store records for particular names? how does lookup proceed?
      - what structures are possible? (flat, hierarchy, ...?)
  - authoritative structure
      - who is allowed to bind names? how is authority delegated?
      - claim: probably should follow same structure as administrative

  ==> DNS: all three structures are the same
      - name space == authority
          -- delegation of control over sub-namespaces
          -- hierarchy of trust
          -- must believe parents behave correctly
      - name space == administrative
          -- use namespace as clues for navigating lookup path
          -- hierarchy means without caching, lookups intense at root
      - authority == administrative
          -- person that creates bindings is person who operates server
          -- convenient from management perspective

DNS structure
-------------

  - show a picture
      - Root servers [a.root-servers.net. – m.root-servers.net.]
      - TLD servers [gTLDs .com/.biz… and ccTLDs .us/.ca/..]
      - etc. downwards
      - [ have .Berkeley.edu zone include math but not cs]
  - zone transfers and secondaries
  - zone = unit of name distribution and replication
      -- delegation of authority at zone boundary ["SOA" records]
  - three cases for name resolution
      -- 1. name is in server's zone --> server does local DB lookup
      -- 2. name is in delegated zone --> server refers client to child server
      -- 3. other --> server refers client to root server
  - stub resolver (requests recursive lookup on behalf of client)
    vs. normal resolver (does iterative lookup)
      -- most clients use stub resolvers --> most traffic gets absorbed by
         stub behind stub/server link!
  - namespace: (name + type) --> {resource records with TTL}
      -- many types:
          - A = address [name to address]
          - CNAME = canonical name [alias to name]
          - PTR [reverse DNS]
          - NS/SOA = zone demarcation
  - protocol: UDP with client-driven retransmission
      - clients typically proceed in rounds
          -- within a round, query multiple servers with a fixed timeout for each
          -- across rounds, double the timeout
          -- timeouts usually start at O(1 second)
      - sequence numbers (the 16-bit query ID) to match responses to requests
  - caching: possible in many places
      - application
          -- e.g., web browser caches DNS results. why?
      - stub
          -- e.g., BIND library
          -- usually doesn't cache, library tends to be stateless
      - local server
          -- e.g., Comcast's DNS server
          -- aggressive caching
          -- plenty of temporal and spatial locality within a single client's
             lookup stream
          -- temporal locality within a client population [basically Zipf]
          -- saturates with O(10-20) clients in population; no need to do
             cooperative caching

Engineering considerations
--------------------------

  - everybody is doing fine with TTL-based consistency, in spite of the weak
    guarantees it provides
  - recursive vs. iterative lookup?
      - recursive: contact any server, it gives you full answer
          + simpler clients
          - DNS laundering attacks
      - iterative: server points you closer towards answer
          + simpler servers?
          + clients ultimately responsible for timeout/retransmit decision
          + lookup stream exposed to client, hence clients can cache more
  - NS record caching of paths down the DNS tree was crucial
      -- basically means TLDs just see bad queries (bogus 2nd-level names)
         or traffic due to unavailable servers
  - negative caching turned out to be hugely important
      -- why so many bad lookups? DNS "search list", e.g., if you type in "google"
  - UDP matters
      -- TCP congestion window unimportant, since never send more than one
         datagram's worth of data
      -- retransmission timers (reliability) matter more
      -- connectionless --> no initial handshake, so eliminates RTTs
  - security: turned out to be underengineered in DNS
      -- mostly about authenticity and integrity of records, not secrecy
      -- DNSSEC having trouble getting deployed
          -- deals with spoofed responses via message signatures
      -- cache poisoning attacks
  - scalability
      - ideally, none of the following should grow (linearly or faster) with
        size of system:
          + DB state maintained per server
          + routing metadata maintained per server
          + network load on any server (including root)
          + iterative name resolution path
          + consistency traffic, if any, in the face of caches
      how well does DNS do?

Some DNS cool tricks
--------------------

  - DNS round-robin for load balancing
      -- return multiple answers in response to an "A" lookup, in random order
      -- client round-robins across possible answers
  - combine ip2geo with low TTL
      -- nearest replica selection
      -- Akamai!
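
A minimal Python sketch of the round-robin trick above, under stated assumptions: the replica addresses, `answer_a_query`, and `RoundRobinClient` are hypothetical names made up for illustration (they are not part of any DNS library), and no real DNS traffic is sent. The server side returns all A records for a name in random order; the client side simply cycles through whatever order it received.

```python
import itertools
import random

# Hypothetical replica addresses, taken from the 192.0.2.0/24 documentation range.
REPLICAS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def answer_a_query(records, rng=random):
    """Server side: return every A record for the name, in random order.

    `rng` may be the `random` module or a seeded `random.Random` instance.
    """
    shuffled = list(records)   # copy so the zone data itself is not reordered
    rng.shuffle(shuffled)
    return shuffled


class RoundRobinClient:
    """Client side: rotate across the addresses from a single response."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("response contained no addresses")
        self._cycle = itertools.cycle(addresses)

    def next_address(self):
        """Return the next address to try, wrapping around at the end."""
        return next(self._cycle)


if __name__ == "__main__":
    response = answer_a_query(REPLICAS)
    client = RoundRobinClient(response)
    for _ in range(5):
        print(client.next_address())
```

Shuffling per response spreads load across clients that only ever use the first answer; the client-side rotation spreads load further for clients that make several connections between lookups.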