Go

Becoming widely used for many of the things that Java/Python had been used for; garbage collected, so not for real-time, performance-critical code.

One thing: it will take you a while to get used to Go. One of the designers said last year that it takes a month -- you bring the paradigms you are familiar with, and only later figure out the easier way to do the same thing in Go. That's ok -- we're *not* grading on the "go"ness of your code. Just get it to work. It *is* helpful if we can understand your code, so commenting is essential.

There are often multiple ways to do things -- your code doesn't need to be the most concise or do things "the go way". But you probably do want to keep asking: is this the easiest way to do something? If it feels clumsy, then there's probably a shortcut.

0. Irene covered many of the basic language features. I plan to cover the other things I needed in doing the project (so far).

1. slices

An array has a fixed size.
A slice is a generalization of an array to make it more like a container; use it when you would otherwise use a list or set.
(Go does have lists -- specifically for polymorphic lists; slices only work if every element has the same type.)
    elements can be referenced the same way: x[1], x[2]
    elements can be pushed/popped
    slice resizes itself -- user only specifies initial size
    slices can be carved up into new slices, without changing the old slice

ex:
    // set up a list of workers
    var workers []string

    // append a new worker when available
    // note I didn't "make" workers:
    // append treats an uninitialized (nil) slice as empty
    workers = append(workers, nextWorker)

    // pop a worker
    worker, workers = workers[len(workers)-1], workers[:len(workers)-1]

    // a[i:j] -- a slice containing elements i through j-1
    // a[:j]  -- all the elements up to, but not including, j
    // a[i:]  -- all the elements from i on up
    workers = workers[1:]               // all but the first item
    workers = workers[:len(workers)-1]  // all but the last item

    // draw a picture -- the slice variable points to the underlying data
    // so multiple pointers to the same underlying data is ok -- just a GC problem

2. functions as data

    // define a function
    f := func(c rune) bool { return !unicode.IsLetter(c) }

    // type of f
    var f func(rune) bool

    // can then pass f to a function, e.g., FieldsFunc takes an f
    // to determine where it's safe to split words
    tokens := strings.FieldsFunc(value, f)

    // can also define and call an anonymous function ("function literal")
    // within a procedure
    func() {
        fmt.Println("Hello World!")
    }()
    // this isn't all that helpful -- it just does
    fmt.Println("Hello World!")

    // we can pass arguments
    // this prints Hello World! 5 times
    func(n int) {
        for i := 0; i < n; i++ {
            fmt.Println("Hello World!")
        }
    }(5)
    // obviously, that's still not all that helpful

3. Concurrency

    // anonymous functions help when you want to create a
    // thread (goroutine) to do something
    // this runs the hello world function in the background
    go func() {
        fmt.Println("Hello World!")
    }()

    // of course, you can also use go on a normal procedure
    go fmt.Println("Hello World!")

    // but neither of these waits for the function to finish

4. Channels

    // I'm going to just define what they do, then I'll come back
    // and explain how to use them

    // channel = typed bounded buffer
    // for example, I created a list of work to do, as a channel, to allow
    // another thread to pull items of work off, and do them
    // I could have used a slice, but a slice isn't a concurrent data structure,
    // so that would mean locking/unlocking the slice, etc. And that's ok, but
    // you can also use channels.
    toDo := make(chan int, 10)

    // put work in
    toDo <- 1

    // take work out
    i := <-toDo

5. Select

    // where channels really help is if you have many things to
    // wait for. Then you can use select as an event loop; wait for one of
    // the channels to have work.
    for {
        select {
        case address := <-mr.registerChannel:
            // a new worker is registering
        case nextTask := <-mr.mapToDo:
            // there's a map task to do
        case nextTask := <-mr.reduceToDo:
            // there's a reduce task to do
        ...
        }
    }

    // you can also add conditions. Go has no syntax for a condition on a
    // case, but a receive from a nil channel never fires, so swap in nil
    // to turn a case off (worker is true when a worker is free)
    for {
        mapToDo, reduceToDo := mr.mapToDo, mr.reduceToDo
        if !worker {
            mapToDo, reduceToDo = nil, nil
        }
        select {
        case address := <-mr.registerChannel:
            // a new worker is registering
        case nextTask := <-mapToDo:
            // there's a worker, and a map task to do
        case nextTask := <-reduceToDo:
            // there's a worker, and a reduce task to do
        ...
        }
    }

5. Unicode

Please remember that a string is NOT an array of one-byte characters. Any particular character can be variable size, so you need to use the appropriate library code for parsing strings.

6. Libraries

    // all the docs are online; google is your friend
    // e.g.
    import "strings"

    func FieldsFunc(s string, f func(rune) bool) []string

    FieldsFunc splits the string s at each run of Unicode code points c
    satisfying f(c) and returns an array of slices of s. If all code points
    in s satisfy f(c) or the string is empty, an empty slice is returned.
    FieldsFunc makes no guarantees about the order in which it calls f(c).
    If f does not return consistent results for a given c, FieldsFunc may crash.

7. CSP

ok, let me come back to the question of how to use channels.
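As a concrete starting point, the pieces from the Channels and Select sections above can be put together into one small runnable program. This is only a sketch under assumptions: eventLoop, register, toDo and the worker names are made up for illustration and are not the lab's code. It also closes its channels so the loop can end, which a real master may never do.

```go
package main

import "fmt"

// eventLoop waits on two channels at once until both are closed.
// It returns the worker addresses it saw and the sum of the task numbers.
// (Hypothetical helper, not part of the lab code.)
func eventLoop(register <-chan string, toDo <-chan int) ([]string, int) {
	var workers []string
	total := 0
	for register != nil || toDo != nil {
		select {
		case address, ok := <-register:
			if !ok {
				// Closed. A nil channel never becomes ready,
				// so this case is switched off from now on.
				register = nil
				continue
			}
			workers = append(workers, address)
		case task, ok := <-toDo:
			if !ok {
				toDo = nil
				continue
			}
			total += task
		}
	}
	return workers, total
}

func main() {
	register := make(chan string, 10)
	toDo := make(chan int, 10)

	// One thread produces work, another registers workers.
	go func() {
		for i := 1; i <= 3; i++ {
			toDo <- i
		}
		close(toDo)
	}()
	go func() {
		register <- "worker-a"
		register <- "worker-b"
		close(register)
	}()

	workers, total := eventLoop(register, toDo)
	fmt.Println(len(workers), total) // prints "2 6"
}
```

Which case fires first when both channels are ready is not specified, so the loop only relies on per-channel order, never on the interleaving between channels.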
Go has locks and condition variables, and it's ok to just use them. In lab 2, for example, we need to set up a server, and we want the server to do one operation at a time. That's an ok way to use a lock.

But sometimes it's easier to use channels. One model is if your code is a computational pipeline, e.g.,
    op1 | op2 | op3
    create two channels, and three (or more) threads

But you can also use channels to replace a mutex:
    create a thread to manage the object
    create a channel for incoming requests
    create a channel for outgoing replies
    thread loops waiting for work to come in on the incoming channel
        does some operation
        puts result on outgoing channel
    can also have multiple channels, e.g., if work can be of different types,
        like above where we have both new workers arriving, and work to do

This is called CSP (Communicating Sequential Processes).
It is a dual of monitors -- just a different way to write the same code.
    monitors: set of threads that acquire a lock before calling into an object, so that only one thread executes inside the monitor at a time
    CSP: one thread executes all the operations on the object; other threads invoke object methods by sending the object a message (on a channel)

Also: you can wait until state variables are set in a particular way before doing some action. This is why select and switch look alike in Go: you list the cases you are willing to handle, and you can check a condition before deciding whether it's ok to consume a token from a particular channel.

In particular, you'll notice deadlock occurs in CSP exactly where you might get deadlock in monitors. If you use channels for communicating between threads, then when a channel is full, a thread putting into the channel will wait. But that thread might also need to pull data off a channel to allow the thread running inside the object to continue to operate.

Net: ok to use channels/CSP style or locks for 452.

8. Lab 1 notes

a. hopefully you know enough now to do part i, that is, to write a simple MapReduce program to do word count. The master runs the worker code directly.

b. part ii is to write the master, where workers run as separate processes (in fact, as separate go threads, but they could be in a different process).

four files:
    common.go -- the RPC spec, shared between client and server
    mapreduce.go -- code for splitting the initial file into chunks, creating file names, etc.
    master.go -- code for managing workers
    worker.go

For mapreduce, master and workers -- which is the client and which is the server? Hah! Both.

It can be a bit confusing, since the code implementing the RPCs is intermixed with the rest of the code. Recall that RPCs need to have names that are capitalized -- the system treats every capitalized method as an RPC, whether you intended it or not. E.g., you may get errors of the following form; just ignore them.

    go test
    2012/12/28 14:51:47 method Kill has wrong number of ins: 1

By convention, all RPCs have two arguments, arg *FuncArgs and reply *FuncReply, and return error.

At start up, the master initializes itself, then creates the workers; but how does it know when the workers are ready to receive work? Within a go program, it could just create a channel and put work into it. But since we're a distributed system, it needs to create a socket to the worker -- which only works once the worker is listening on the socket.

So: the workers initialize, then register with the master, saying it's ok to send work.

    // called from worker.go -- tell master I'm ready to accept work
    // defined in mapreduce.go
    // args are defined in common.go
    func (mr *MapReduce) Register(args *RegisterArgs, res *RegisterReply) error

    // also need an RPC to do work
    // called from master
    // implemented in worker
    func (wk *Worker) DoJob(arg *DoJobArgs, res *DoJobReply) error

RPCs can fail -- e.g., communication failure, worker failure, etc. If so, the caller will get back an error.

    // if ok is true, call performed
    // if ok is not true, was call performed?
    // For now, failures are fail-stop: if ok = false, worker did not and will
    // never touch the intermediate files that it was asked to create
    // We'll remove this assumption in Lab 2.
    ok := call(worker, "Worker.DoJob", args, &reply)

    // There's also an RPC to tell worker to shut down
    func (wk *Worker) Shutdown(args *ShutdownArgs, res *ShutdownReply) error

Finally, RPCs are blocking, so in order for the master to do more than one MapReduce task at the same time, it needs multiple RPCs to be outstanding. Hence you'll need to create a thread per worker or a thread per RPC, and you'll need to keep track of which tasks are still to be done, and which are complete.

Only start Reduce tasks once all Map tasks are done.

c. part iii is to handle worker failures

I ended up implementing part ii and part iii at the same time. If you hand a Map task to a worker, and the RPC fails, then you need to hand it to a different worker -- e.g., put it back on the task list.

9. Concurrency of RPCs

This will be more relevant in Lab 2, but worth mentioning here. At the server, RPCs from different clients will run concurrently. RPCs from the same client run sequentially (I think). Here, we have one master, so it's not an issue. But an implication is that if you have shared state accessed by an RPC, you'll need to implement a monitor or use channels to implement CSP.
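
To make the two options concrete, here is a minimal sketch of the same shared counter written both ways: once as a monitor (a lock around the state) and once in CSP style (one thread owns the state, everyone else sends it messages). All the names here (lockedCounter, cspCounter, request, hammer) are invented for the example and are not part of the lab code; the goroutines started by hammer just stand in for concurrent RPC handlers.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedCounter is the monitor style: every method takes the lock first,
// so only one thread is inside the object at a time.
type lockedCounter struct {
	mu sync.Mutex
	n  int
}

func (c *lockedCounter) Add(delta int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n += delta
	return c.n
}

// request is one message to the CSP-style counter (hypothetical type).
type request struct {
	delta int
	reply chan int // where the manager thread sends the result
}

// cspCounter is the CSP style: only the manager goroutine touches n.
type cspCounter struct {
	requests chan request
}

func newCSPCounter() *cspCounter {
	c := &cspCounter{requests: make(chan request)}
	go func() {
		n := 0 // owned by this goroutine alone, so no lock is needed
		for req := range c.requests {
			n += req.delta
			req.reply <- n
		}
	}()
	return c
}

func (c *cspCounter) Add(delta int) int {
	reply := make(chan int, 1)
	c.requests <- request{delta: delta, reply: reply}
	return <-reply
}

// hammer calls add(1) from many goroutines at once, then reads the value.
func hammer(add func(int) int, callers int) int {
	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			add(1)
		}()
	}
	wg.Wait()
	return add(0)
}

func main() {
	locked := &lockedCounter{}
	csp := newCSPCounter()
	fmt.Println(hammer(locked.Add, 100), hammer(csp.Add, 100)) // prints "100 100"
}
```

One design choice worth noting: the sketch gives each request its own reply channel rather than one shared outgoing channel, so that replies cannot be picked up by the wrong caller when several callers are waiting. The manager goroutine is never shut down here, which is fine for a sketch but would leak in a long-running program.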