_____________

 LECTURE 1

 Irene Zhang
_____________


Table of Contents
_________________

1 Introduction
2 Administrivia
3 Project
4 Readings
5 Blog posts
6 Problem sets
7 Distributed Systems Intro
.. 7.1 What is a distributed system?
.. 7.2 Why care about distributed computing?
.. 7.3 Why so hard?
8 RPC
.. 8.1 RPC implementation
.. 8.2 But can you really treat RPC like a normal function? Not quite
.. 8.3 What kinds of failures are there?
.. 8.4 RPC semantics
.. 8.5 RPC semantics example: buy a book
.. 8.6 What does a client do in the case of an exception?
.. 8.7 Impact: RPC is used everywhere
9 Next paper: MapReduce


1 Introduction
==============

- Course title: Distributed Systems (both 452 and M552)
- My name: Irene Zhang, not Tom Anderson
- TAs: me and Lisa
- I'm teaching sections and Lisa is grading, so come to me for help
  and go to Lisa with complaints
- About me: undergrad and master's at MIT, 3rd-year PhD student,
  working on distributed systems research
- Lisa?
- I'm excited about this class
- Distributed systems is probably the most complex and difficult topic
  in all of CS
- It is also probably one of the most relevant today: name any app you
  interact with daily (Facebook, Google), and it is probably a
  distributed system


2 Administrivia
===============

- Homework
  - 1 project
  - 2-3 problem sets
  - 10 blog posts
  - no final exam
- Grades
  - 20% problem sets, 10% blog posts, 70% project


3 Project
=========

- Will be done in Go, a new programming language developed at Google
  for building highly concurrent distributed systems
- Build a highly available, scalable, fault-tolerant key-value store
- Some people might not know what a key-value store is: it's basically
  a big hash table (see the sketch at the end of this section)
- Will cover the core techniques for building something like BigTable,
  which Google uses to power Google
- Developed at MIT for their grad DS class. We've made some
  modifications to make the design assumptions more explicit, but no
  big changes
- A series of labs split into 7 assignments, roughly one due every
  week and a half. At each step you add more guarantees, so you go
  from something with no fault tolerance, to something that can handle
  some types of faults, to something that tolerates a large range of
  faults.
- I have done the project, so ask me questions
- We'll be using GitLab, a new locally hosted GitHub-style server.
  Feel free to keep your code there as well; you can just fork our
  repo
- First assignment is already up, but we'll cover Go in section on
  Thursday, so it's up to you whether you want to get started early
- Projects will be done in teams of 2 for 452 and solo for M552
- 7 slip days for the project, but since the assignments form a
  sequence and each depends on the previous one, you'll have to be
  careful about using them
- Last week is hack week: no classes or section
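To make "basically a big hash table" concrete, here is a minimal
single-machine sketch in Go. The KVStore name and Get/Put interface
are made up for illustration; the labs layer RPC, replication, and
fault tolerance on top of something like this.

  // A minimal, single-machine key-value store: just a hash table
  // behind Get/Put. Illustrative sketch only, not the lab interface.
  package kv

  import "sync"

  type KVStore struct {
      mu   sync.Mutex        // serializes concurrent access
      data map[string]string // the "big hash table"
  }

  func NewKVStore() *KVStore {
      return &KVStore{data: make(map[string]string)}
  }

  // Put stores a value under a key.
  func (s *KVStore) Put(key, value string) {
      s.mu.Lock()
      defer s.mu.Unlock()
      s.data[key] = value
  }

  // Get returns the value for a key and whether the key exists.
  func (s *KVStore) Get(key string) (string, bool) {
      s.mu.Lock()
      defer s.mu.Unlock()
      v, ok := s.data[key]
      return v, ok
  }

The project's job is to keep this interface while making the store
survive machine and network failures across many servers.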
4 Readings
==========

- 15 research papers, no textbook
- How many of you have read a research paper? The key is to not let
  yourself get bogged down if you don't understand something.


5 Blog posts
============

- Students in both 452 and M552 are required to post a short, unique
  comment, observation, or question to the discussion board by 10:00am
  on the day of each class
- For each assigned reading, we will post one or two discussion
  questions to the discussion board
- We will grade you based on your top 10 blog posts, so there is no
  need to post more than 10 times


6 Problem sets
==============

- to be done individually
- collectively, they make up an open-note, take-home final


7 Distributed Systems Intro
===========================

7.1 What is a distributed system?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- multiple interconnected computers that cooperate to provide some
  service
- Examples: Akamai, Google, Catalyst -- basically anything you use
  today


7.2 Why care about distributed computing?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Everything is a distributed system now
2. Conquer geographic separation (there are people on the other side
   of the world who want Facebook to be fast too!). Also, the speed of
   light is not getting any faster
3. Build reliable systems out of unreliable components (even if you
   had a perfectly reliable computer, there could still be an
   earthquake -- something our Japanese customers at VMware really
   worry about!)
4. Aggregate many computers for high capacity (Google chose to do this
   because buying commodity hardware was cheaper, but you also might
   simply not be able to build a single big enough computer)
   - aggregate cycles+memory (TreadMarks, Dryad)
   - aggregate bandwidth (Coral, Shark)
   - aggregate disks (Frangipani)
5. Customize computers for specific tasks (e.g., email server, backup
   server)


7.3 Why so hard?
~~~~~~~~~~~~~~~~

1. System design
   - partitioning of responsibilities (what does the client do? what
     does the server do? which servers?)
   - what are the right protocols? what are the right abstractions?
2. Failures
   - "A distributed system is one in which the failure of a computer
     you didn't even know existed can render your own computer
     unusable" - Leslie Lamport
   - Most apps don't have to think *too* hard about failures, because
     failure is all or nothing: either the whole thing crashes (and
     then you don't care) or it doesn't
   - communication failures vs. hardware failures: how do you tell the
     difference?
3. Concurrency and consistency
   - A non-distributed app can get away with doing one thing at a
     time, but in a distributed app lots of things are happening at
     once
   - Also, there are lots of readers and writers of shared data
   - Lots of replicas floating around for caching or fault tolerance,
     and updates need to be synchronized across all of these copies
4. Performance
   - How do we make a system fast when it needs to coordinate across
     multiple machines? (e.g., generating 1 Facebook page takes calls
     to 1000s of machines)
   - Performance can be extremely variable and unpredictable; some
     systems include both very slow and very fast networks (e.g., a
     storage system with some servers in the same datacenter and
     others across the wide area, or some nodes that are servers and
     others that are mobile devices)
   - sidebar: how do you tell whether a machine is down or just slow?
   - tail latency: a request is as slow as the slowest machine it
     touches
5. Implementation and testing
   - often lots of hard-to-find bugs
   - hard to test all failure conditions (or to reproduce them)
   - hard to have a controlled test environment
6. Security
   - an adversary may compromise machines or manipulate messages
     (without you being able to tell)


8 RPC
=====

- One of the most basic techniques for communication in distributed
  systems
- Important concept for the project; there will be a lot of RPC in
  your project code
- Basically a function call that is executed on the server instead of
  the client


8.1 RPC implementation
~~~~~~~~~~~~~~~~~~~~~~

  client                             server
     | ----- request (w/ args) ----->  |
  waiting                          processing
     | <--------- response ----------  |

- Automate this exchange so that the RPC looks like a function call to
  the client, but the call really goes into a stub
- The client stub packs up which function was called, along with the
  arguments, and sends it to the server
- The server unpacks the arguments and executes the function, then
  packs up the return value and sends it back
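To make the stub picture concrete, here is a sketch using Go's
standard net/rpc package (the project's RPC library may differ; the
Arith service, its Multiply method, and the port number are made-up
examples). Call is where the client stub lives: it marshals the method
name and arguments, sends them to the server, and unmarshals the
reply, so the call site reads almost like a local function call.

  // Sketch of one RPC round trip with Go's standard net/rpc package.
  package main

  import (
      "fmt"
      "log"
      "net"
      "net/rpc"
  )

  type Args struct{ A, B int }

  type Arith struct{}

  // Multiply runs on the server. net/rpc requires this
  // (args, reply) error signature for exported methods.
  func (*Arith) Multiply(args *Args, reply *int) error {
      *reply = args.A * args.B
      return nil
  }

  func main() {
      // Server side: register the service and accept connections.
      if err := rpc.Register(new(Arith)); err != nil {
          log.Fatal(err)
      }
      l, err := net.Listen("tcp", "127.0.0.1:1234")
      if err != nil {
          log.Fatal(err)
      }
      go rpc.Accept(l)

      // Client side: the stub packs up the method name and args,
      // ships them over, and unpacks the reply.
      client, err := rpc.Dial("tcp", "127.0.0.1:1234")
      if err != nil {
          log.Fatal(err)
      }
      var product int
      err = client.Call("Arith.Multiply", &Args{A: 6, B: 7}, &product)
      if err != nil {
          // This is exactly where an RPC stops looking like a local
          // call: the error may mean "never ran" or "ran, but the
          // reply was lost".
          log.Fatal(err)
      }
      fmt.Println(product) // 42
  }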
8.2 But can you really treat RPC like a normal function? Not quite
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- performance
  - local call: maybe 10 cycles = ~3 ns
  - RPC: 0.1-1 ms on a LAN => ~10K-100K times slower
  - in the wide area: can easily be millions of times slower
- failures
  - what happens if messages get dropped, the client or server
    crashes, etc.?
- also security, concurrent requests, etc.


8.3 What kinds of failures are there?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. communication failures (messages may be delayed, take a variable
   round trip, or never arrive)
2. machine failures -- either client or server
3. sometimes you can't tell whether it was a dropped request message
   or a dropped reply message, whether it was a communication failure
   or a machine failure, or, if it was a machine failure, whether the
   machine crashed before or after processing the request


8.4 RPC semantics
~~~~~~~~~~~~~~~~~

- What semantics do we get with our RPC implementation above?
  - It just hangs if there's a failure. Not good.
  - Slightly better: time out and tell the application we failed
  - Also: we might execute a request twice due to a duplicate packet
- Alternative: at-least-once -- retry until we get a successful
  response
- Alternative: at-most-once -- give each request an ID and have the
  server keep track of whether it has been seen before (see the sketch
  after section 8.6)
  - but then you have to deal with server failures
- Which do you think is best: at-least-once or at-most-once?


8.5 RPC semantics example: buy a book
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. if the client and server stay up, the book will arrive on your
   Kindle
2. if the client fails, the user may not know whether the book was
   purchased (need a plan!)
3. if the server fails, the client may have bought the book, or not
   - at-least-once: the client keeps trying
   - at-most-once: the client will receive an exception


8.6 What does a client do in the case of an exception?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Need some application-specific protocol
  - Ex: ask the server, "did the user buy the book?"
  - This means the server needs a plan for remembering state across
    reboots
- at-least-once (if we never give up)
  - clients keep trying; the server may run a procedure several times
  - the server must use application state to handle duplicates if
    requests are not idempotent (and it is difficult to make all
    requests idempotent; for example, is reading a web page
    idempotent? normally, but not always!)
- at-most-once
  - Ex: the server logs purchases on disk with a request ID and checks
    the log before each request, so if this is a retry it can squelch
    it
  - (Actually, Amazon doesn't do this -- they sell you the book a
    second time. Better to be fast and sorry than slow and correct.)
  - e.g., the server stores on disk who holds the lock, along with the
    request ID, and checks the table for each request; even if the
    server fails and reboots, we get the correct semantics
- What is right? It depends on where the RPC is used.
  - What does Facebook do? Better to give up and have the user try
    again; sometimes you might post twice!
  - more sophisticated applications need an application-level plan in
    both cases, so it is not clear at-most-once gives you a leg up
- => Handling machine failures makes RPC different from procedure
  calls
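Here is a minimal sketch of both semantics, under stated assumptions:
all names (Request, Server, AtLeastOnceCall, etc.) are made up,
timeouts are simulated with a fixed retry count, and the duplicate
table lives only in memory, so a server reboot still loses it
(handling that is exactly the server-failure problem noted in 8.4).

  // At-least-once client + at-most-once server, as a sketch.
  package rpcsemantics

  import (
      "errors"
      "sync"
      "time"
  )

  type Request struct {
      ID   int64  // unique per request, reused across retries
      Body string
  }

  // AtLeastOnceCall retries until a reply arrives or we give up.
  // Crucially, every retry carries the same request ID.
  func AtLeastOnceCall(send func(Request) (string, error),
      req Request) (string, error) {
      for i := 0; i < 5; i++ {
          if reply, err := send(req); err == nil {
              return reply, nil
          }
          time.Sleep(100 * time.Millisecond) // timed out; retry
      }
      return "", errors.New("gave up: request may or may not have run")
  }

  // Server gets at-most-once execution by remembering, per request
  // ID, the reply it already sent.
  type Server struct {
      mu   sync.Mutex
      seen map[int64]string // request ID -> cached reply
  }

  func NewServer() *Server {
      return &Server{seen: make(map[int64]string)}
  }

  func (s *Server) Handle(req Request,
      execute func(string) string) string {
      s.mu.Lock()
      defer s.mu.Unlock()
      if reply, ok := s.seen[req.ID]; ok {
          return reply // duplicate: squelch it, resend the old reply
      }
      reply := execute(req.Body)
      s.seen[req.ID] = reply
      return reply
  }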
8.7 Impact: RPC is used everywhere
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- automatic marshalling is really useful; Go has an especially nice
  library for this, and another popular framework is Apache Thrift
- client stubs and transparency are useful, but transparency only goes
  so far
- dealing with failures is still hard, and it typically still requires
  application involvement


9 Next paper: MapReduce
=======================

- a system for coordinating computation over a large number of nodes
- You will implement this in your first project assignment, using RPC
  to communicate between the master, which hands out jobs, and the
  workers that run them
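As a preview, here is a sketch of the MapReduce programming model
using the classic word-count example. The KeyValue type and these
exact signatures are illustrative assumptions, not necessarily the
interface the lab code will hand you; the framework's job is to run
Map over every input split on the workers, group the emitted pairs by
key, and then run Reduce once per key.

  // Word count in the MapReduce programming model (sketch).
  package wordcount

  import (
      "strconv"
      "strings"
      "unicode"
  )

  type KeyValue struct {
      Key   string
      Value string
  }

  // Map runs on workers over one input split: emit ("word", "1")
  // for every word in the document.
  func Map(document string) []KeyValue {
      words := strings.FieldsFunc(document, func(r rune) bool {
          return !unicode.IsLetter(r)
      })
      kvs := make([]KeyValue, 0, len(words))
      for _, w := range words {
          kvs = append(kvs, KeyValue{Key: strings.ToLower(w), Value: "1"})
      }
      return kvs
  }

  // Reduce runs once per key with every value emitted for that key;
  // for word count, the answer is just how many values there are.
  func Reduce(key string, values []string) string {
      return strconv.Itoa(len(values))
  }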