Go

Becoming widely used for many of the things that Java/Python had been used for;
garbage collected so not for real-time, performance critical code.

One thing is: it will take you a while to get used to using Go.  One of the designers said last year that it takes a month -- because you bring the paradigms you are familiar with, before you figure out the easier way to do the same thing in Go.

That's ok -- we're *not* grading on the "go"ness of your code. Just get it to work.  It *is* helpful if we can understand your code, so commenting is essential.
There are often multiple ways to do things -- your code doesn't need to be the most concise or do things "the go way".  But you probably do want to keep thinking -- is this the easiest way to do something?  If it feels clumsy, then there's probably a shortcut.

0. Irene covered many of the basic language features.  

I plan to cover the other things I needed in doing the project (so far).

1. slices

An array has a fixed size

A slice is a generalization of an array to make it more like a container;
use it when you would otherwise use a list or set. (Go does have lists -- specifically for polymorphic lists; slices only work if every element has the same type.)

elements can be referenced the same way x[1], x[2]
elements can be pushed/popped 
slice resizes itself -- user only specifies initial size
slices can be carved up into new slices, without changing the old slice

ex: 

// set up a list of workers
var workers []string 

// append a new worker when available
// note I didn't "make" workers
//  append to an uninitialized slice just assumes it's empty
workers = append(workers, nextWorker)

// pop a worker
worker, workers = workers[len(workers)-1], workers[:len(workers)-1]

// a[i:j] -- a slice containing elements i through j-1
// a[:j] -- all the elements up to, but not including, j
// a[i:] -- all the elements from i on up
workers = workers[1:]  // all but the first item
workers = workers[:len(workers)-1]  // all but the last item

// draw a picture -- the slice variable points to the underlying data
// so having multiple slices point to the same underlying data is ok --
// freeing it is the GC's problem, not yours
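A minimal runnable sketch of the slice operations above. The helper name `pop` and the worker names are made up for illustration. The last few lines show what the picture is meant to convey: a sub-slice shares storage with the slice it was carved from, so a write through one is visible through the other.

```go
package main

import "fmt"

// pop removes and returns the last element of s.
// It assumes s is non-empty; indexing an empty slice panics.
func pop(s []string) (string, []string) {
	return s[len(s)-1], s[:len(s)-1]
}

func main() {
	var workers []string // nil slice; append treats it as empty

	workers = append(workers, "w0", "w1", "w2", "w3")

	last, workers := pop(workers)
	fmt.Println(last, workers) // w3 [w0 w1 w2]

	// a[i:j] holds elements i through j-1
	middle := workers[1:2]
	fmt.Println(middle) // [w1]

	// middle shares storage with workers, so a write through one
	// is visible through the other
	middle[0] = "changed"
	fmt.Println(workers) // [w0 changed w2]
}
```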

2. functions as data

// define a function
f := func(c rune) bool {
 	return !unicode.IsLetter(c)
}

// type of f
var f func(rune) bool

// can then pass f to a function, e.g., FieldsFunc takes an f
// to determine where it's safe to split words

tokens := strings.FieldsFunc(value, f)

// can also define and call an anonymous function "function literal" 
// within a procedure

func(){
  fmt.Println("Hello World!")
}()

// this isn't all that helpful -- it just does
  fmt.Println("Hello World!")

// we can pass arguments
// this prints Hello 5 times
func(n int){
  for i := 0; i < n; i++ {
    fmt.Println("Hello World!")
  }
}(5)

// obviously, that's still not all that helpful 
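Where function values do start to pay off is when you hand them to other code. A small sketch, assuming nothing beyond the standard library: `tokenize` and `countIf` are names invented here, not part of the lab code.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits text into words, treating every non-letter as a separator.
func tokenize(text string) []string {
	notLetter := func(c rune) bool {
		return !unicode.IsLetter(c)
	}
	return strings.FieldsFunc(text, notLetter)
}

// countIf applies an arbitrary predicate, passed in as data, to each word.
func countIf(words []string, keep func(string) bool) int {
	n := 0
	for _, w := range words {
		if keep(w) {
			n++
		}
	}
	return n
}

func main() {
	words := tokenize("Hello, world! It's 2012.")
	fmt.Println(words) // [Hello world It s]

	long := countIf(words, func(w string) bool { return len(w) > 2 })
	fmt.Println(long) // 2
}
```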

3. Concurrency 
// anonymous functions help when you want to create a 
// thread to do something

// this runs the hello world function in the background

go func(){
  fmt.Println("Hello World!")
}()

// of course, you can also use go on a normal procedure
go fmt.Println("Hello World!")

// but neither of these waits for the function to finish
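If you do need to wait, one option (not the only one; channels work too) is sync.WaitGroup from the standard library. The sketch below is an illustration, and `runAll` is a made-up name.

```go
package main

import (
	"fmt"
	"sync"
)

// runAll starts n goroutines and returns only after all of them finish.
// It returns how many ran, as a check that none were skipped.
func runAll(n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex // protects finished
	finished := 0

	for i := 0; i < n; i++ {
		wg.Add(1) // register before starting the goroutine
		go func(id int) {
			defer wg.Done()
			fmt.Println("Hello World from", id)
			mu.Lock()
			finished++
			mu.Unlock()
		}(i)
	}

	wg.Wait() // block until every Done has been called
	return finished
}

func main() {
	fmt.Println("finished:", runAll(3))
}
```

The order of the "Hello World" lines can change from run to run, since the goroutines are scheduled independently.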

4. Channels

// I'm going to just define what they do, then I'll come back
// and explain how to use them

// channel = typed bounded buffer
// for example, I created a list of work to do, as a channel, to allow 
// another thread to pull items of work off, and do them
// I could have used a slice, but a slice isn't a concurrent data structure,
// so that would mean locking/unlocking the slice, etc. And that's ok, but 
// you can also use channels.

toDo := make(chan int, 10)

// put work in

toDo <- 1

// take work out

i:= <- toDo 
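Putting the pieces together, here is a small sketch of one thread producing work and another consuming it. The function name `sumWork` and the second `result` channel are assumptions for the example; the "work" is just adding up integers.

```go
package main

import "fmt"

// sumWork sends the integers 1..n through a buffered channel to a
// consumer goroutine and returns the total that the consumer computed.
func sumWork(n int) int {
	toDo := make(chan int, 10) // holds up to 10 items before a send blocks
	result := make(chan int)   // unbuffered: a send waits for the receiver

	go func() {
		total := 0
		for item := range toDo { // loop ends once toDo is closed and drained
			total += item
		}
		result <- total
	}()

	for i := 1; i <= n; i++ {
		toDo <- i
	}
	close(toDo) // tell the consumer no more work is coming

	return <-result
}

func main() {
	fmt.Println(sumWork(100)) // 5050
}
```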

5. Select

// where channels really help is if you have many things to
// wait for.  Then you can use select as an event loop; wait for one of 
// the channels to have work.  

for {
	select {
	case address := <-mr.registerChannel:
		// a new worker is registering
	case nextTask := <-mr.mapToDo:
		// there's a map task to do
	case nextTask := <-mr.reduceToDo:
		// there's a reduce task to do
	...
	}
}

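// you can also add conditions: select has no guard syntax, but receiving
// from a nil channel blocks forever, so set a channel variable to nil
// to disable its case
for {
	mapToDo, reduceToDo := mr.mapToDo, mr.reduceToDo
	if !haveWorker { // haveWorker: a bool you maintain yourself
		mapToDo, reduceToDo = nil, nil
	}
	select {
	case address := <-mr.registerChannel:
		// a new worker is registering
	case nextTask := <-mapToDo:
		// there's a map task to do, and a worker to do it
	case nextTask := <-reduceToDo:
		// there's a reduce task to do, and a worker to do it
	...
	}
}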
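Go's select has no built-in guard syntax. One common way to get the effect of a condition is to receive from a nil channel: that receive blocks forever, so the case is never chosen. Below is a runnable sketch of an event loop in that style. The names `dispatch`, `register`, `toDo` and `idle` are invented for the example, and to keep it short each worker handles only one task and is never returned to the idle list.

```go
package main

import "fmt"

// dispatch pairs each task with a worker. Workers arrive on register;
// tasks arrive on toDo. A task is only pulled off toDo when an idle worker
// exists; otherwise that case is disabled by selecting on a nil channel.
func dispatch(register <-chan string, toDo <-chan int, nTasks int) map[int]string {
	assigned := make(map[int]string)
	var idle []string

	for len(assigned) < nTasks {
		tasks := toDo
		if len(idle) == 0 {
			tasks = nil // receiving from a nil channel blocks forever
		}

		select {
		case w := <-register:
			idle = append(idle, w)
		case t := <-tasks:
			assigned[t] = idle[len(idle)-1]
			idle = idle[:len(idle)-1]
		}
	}
	return assigned
}

func main() {
	register := make(chan string, 2)
	toDo := make(chan int, 2)
	toDo <- 1
	toDo <- 2
	register <- "workerA"
	register <- "workerB"

	// which worker gets which task can vary between runs
	fmt.Println(dispatch(register, toDo, 2))
}
```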

5. Unicode

Please remember that a string is NOT an array of one byte characters.
Any particular character can be variable size, so you need
to use the appropriate library code for parsing strings.
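A short sketch of the difference between bytes and characters, using only the standard unicode/utf8 package. `firstRune` is a made-up helper, and the string is written with a \u escape so the byte counts don't depend on how your editor saves the file.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune returns the first character of s and how many bytes it occupies.
func firstRune(s string) (rune, int) {
	return utf8.DecodeRuneInString(s)
}

func main() {
	s := "h\u00e9llo" // "héllo"

	fmt.Println(len(s))                    // 6: bytes, since é takes two
	fmt.Println(utf8.RuneCountInString(s)) // 5: characters

	// range decodes one rune at a time; i is the byte offset
	for i, c := range s {
		fmt.Printf("%d:%c ", i, c) // 0:h 1:é 3:l 4:l 5:o
	}
	fmt.Println()

	r, size := firstRune("\u00e9!")
	fmt.Printf("%c %d\n", r, size) // é 2
}
```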

6. Libraries

// all the docs are online; google is your friend
// eg.

import "strings"

func FieldsFunc(s string, f func(rune) bool) []string

FieldsFunc splits the string s at each run of Unicode code points c 
satisfying f(c) and returns an array of slices of s. 
If all code points in s satisfy f(c) or the string is empty, an empty slice is returned. FieldsFunc makes no guarantees about the order in which it calls f(c). If f does not return consistent results for a given c, FieldsFunc may crash.

7. CSP

Ok, let me come back to the question of how to use channels.  Go has locks and condition variables, and it's ok to just use them.  In lab 2, for example, we need to set up a server, and we want the server to do one operation at a time.  That's an ok way to use a lock.

But sometimes easier to use channels.  One model is if your code is a computational pipeline

e.g., op1 | op2 | op3  

create two channels, and three (or more) threads
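A sketch of that pipeline shape, with made-up stages (emit words, upper-case them, collect them) and a made-up function name `pipeline`. Two stages get their own goroutine; the third runs in the caller so it can return the result.

```go
package main

import (
	"fmt"
	"strings"
)

// pipeline connects three stages with two channels:
// emit words | upper-case them | collect them.
func pipeline(words []string) []string {
	raw := make(chan string)
	upper := make(chan string)

	// op1: feed the input into the first channel
	go func() {
		for _, w := range words {
			raw <- w
		}
		close(raw)
	}()

	// op2: transform each item and pass it on
	go func() {
		for w := range raw {
			upper <- strings.ToUpper(w)
		}
		close(upper)
	}()

	// op3: runs in the caller's goroutine, so we can return the result
	var out []string
	for w := range upper {
		out = append(out, w)
	}
	return out
}

func main() {
	fmt.Println(pipeline([]string{"a", "b", "c"})) // [A B C]
}
```

Each stage closes its output channel when its input runs out, which is how the end of the data propagates down the pipe.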

But also can use channels to replace a mutex:

create a thread to manage the object
create a channel for incoming requests
create a channel for outgoing replies
thread loops waiting for work to come in on the incoming channel
  does some operation
  puts result on outgoing channel

can also have multiple channels, e.g., if work can be of different types,
like above where we have both new workers arriving, and work to do
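The recipe above can be sketched as follows, with a counter standing in for "the object". One deliberate variation, flagged as such: instead of a single shared channel for outgoing replies, each request carries its own reply channel, so that with several callers a reply can't be picked up by the wrong one. All names (`request`, `startCounter`, `add`) are invented for the example.

```go
package main

import "fmt"

// request asks the counter goroutine to add delta and report the new total.
type request struct {
	delta int
	reply chan int // each caller supplies its own reply channel
}

// startCounter launches the single goroutine that owns the count.
// No lock is needed because only this goroutine ever touches count.
func startCounter() chan<- request {
	requests := make(chan request)
	go func() {
		count := 0
		for req := range requests {
			count += req.delta
			req.reply <- count
		}
	}()
	return requests
}

// add is the "method call": send a message, then wait for the answer.
func add(requests chan<- request, delta int) int {
	reply := make(chan int)
	requests <- request{delta: delta, reply: reply}
	return <-reply
}

func main() {
	counter := startCounter()
	add(counter, 2)
	fmt.Println(add(counter, 3)) // 5
}
```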

This is called: CSP (Communicating Sequential Processes)

It is a dual of monitors -- just a different way to write the same code.

monitors:
set of threads, that acquire a lock before calling into an object, 
so that only one thread executes inside the monitor at a time

CSP: 
one thread executes all the operations on the object
other threads invoke object methods by sending object a message
(on a channel)

Also: can wait until state variables are set in a particular
way before doing some action.  That's why select and switch look similar in Go --
each case names an event, and a case can be disabled (by using a nil channel)
until it's ok to consume a particular token.

In particular, you'll notice deadlock occurs in CSP, exactly where you
might get deadlock in monitors.  So if you use channels for communicating
between threads, then if a channel is full, thread putting into the
channel will wait.  But that thread might also need to pull data
off a channel to allow thread running inside the object to continue to operate.

Net: ok to use channels/CSP style or locks for 452.

8. Lab 1 notes

a. hopefully you know enough now to do part i, that is, to write a simple MapReduce program to do word count. The master runs the worker code directly.

b. part ii is to write the master, where workers run as separate processes  
(in fact, as separate go threads, but could be in a different process).

four files:
common.go -- the RPC spec, shared between client and server
mapreduce.go -- code for splitting the initial file into chunks,
creating file names, etc.
master.go -- code for managing workers
worker.go

For mapreduce, master and workers -- which is the client and which is the server?

Hah! Both.

Can be a bit confusing since the code implementing the RPCs is intermixed
with the rest of the code. Recall that RPCs need to have names that are
capitalized -- the system tries to treat every capitalized method as an RPC,
whether you intended it or not.

E.g., you may get errors of the following form; just ignore them.
 go test
2012/12/28 14:51:47 method Kill has wrong number of ins: 1

By convention, all RPC's have two arguments, arg *FuncArgs, reply *FuncReply
and return error.
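To make the RPC shape concrete, here is a self-contained sketch using the standard net/rpc package. It is not the lab's code: the `Echo` service, its `Shout` method, and the `EchoArgs`/`EchoReply` types are all hypothetical, and the client and server talk over an in-memory pipe (net.Pipe) rather than a real socket so the example runs anywhere.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// EchoArgs and EchoReply are hypothetical types for this sketch.
type EchoArgs struct{ Msg string }
type EchoReply struct{ Msg string }

// Echo is a hypothetical service.
type Echo struct{}

// Shout has the shape net/rpc expects: an exported method with two
// arguments (the second a pointer) and an error return.
func (e *Echo) Shout(args *EchoArgs, reply *EchoReply) error {
	reply.Msg = args.Msg + "!"
	return nil
}

// callShout wires a client and server together over an in-memory pipe
// and makes one blocking call.
func callShout(msg string) (string, error) {
	server := rpc.NewServer()
	if err := server.Register(new(Echo)); err != nil {
		return "", err
	}

	clientConn, serverConn := net.Pipe()
	go server.ServeConn(serverConn)

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var reply EchoReply
	err := client.Call("Echo.Shout", &EchoArgs{Msg: msg}, &reply)
	return reply.Msg, err
}

func main() {
	msg, err := callShout("hello")
	fmt.Println(msg, err) // hello! <nil>
}
```

If `Echo` also had an exported method with a different signature, Register would log a complaint about it like the one shown above, and skip that method.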

At startup, the master initializes itself and then creates the workers; but how does it know when the workers are ready to receive work?  Within a single Go program, it could just create a channel and put work into it.  But since this is a distributed system, the master needs to open a socket to the worker, and that only works once the worker is listening on its socket.

So: the workers initialize, then register with the master, saying it's ok to send work.

// called from worker.go - ready to accept work
// defined in mapreduce.go -- tell master I'm ready to accept work
// args are defined in common.go
func (mr *MapReduce) Register(args *RegisterArgs, res *RegisterReply) error

// also need an RPC to do work
// called from master
// implemented in worker
func (wk *Worker) DoJob(arg *DoJobArgs, res *DoJobReply) error 

RPC's can fail -- e.g., communication failure, worker failure, etc.
If so, the caller will get back an error

// if ok is true, call performed
// if ok is not true, was call performed? 
// For now, failures are fail-stop: if ok = false, worker did not and will
// never touch the intermediate files that it was asked to create
// We'll remove this assumption in Lab 2.
ok := call(worker, "Worker.DoJob", args, &reply)

// There's also an RPC to tell worker to shut down
func (wk *Worker) Shutdown(args *ShutdownArgs, res *ShutdownReply) error

Finally, RPCs are blocking, so in order for the master to do
more than one MapReduce task at the same time, it needs multiple RPCs
to be outstanding.  Hence you'll need to create a thread per worker
or a thread per RPC, and you'll need to keep track of which
tasks are still to be done, and which are complete.

Only start Reduce tasks once all Map tasks are done

c. part iii is to handle worker failures

I ended up implementing part ii and part iii at the same time.  If you hand a Map task to a worker, and the RPC fails, then you need to hand it to a different worker -- e.g., put it back on the task list
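One possible shape for this, as a sketch rather than the lab solution: a thread per worker pulls tasks off a channel, and a failed task goes back on the channel for someone else. Everything here is hypothetical. `runTasks` is an invented name and `doJob` stands in for the blocking RPC, returning true on success. The sketch assumes fail-stop workers and that at least one worker stays alive; otherwise it waits forever.

```go
package main

import (
	"fmt"
	"sync"
)

// runTasks hands tasks 0..nTasks-1 to the given workers, one goroutine per
// worker, and puts a task back on the list whenever doJob reports failure.
func runTasks(nTasks int, workers []string, doJob func(worker string, task int) bool) map[int]string {
	toDo := make(chan int, nTasks) // big enough that re-queuing never blocks
	for t := 0; t < nTasks; t++ {
		toDo <- t
	}

	var mu sync.Mutex // protects doneBy
	doneBy := make(map[int]string)
	var remaining sync.WaitGroup
	remaining.Add(nTasks)

	for _, w := range workers {
		go func(w string) {
			for t := range toDo {
				if doJob(w, t) {
					mu.Lock()
					doneBy[t] = w
					mu.Unlock()
					remaining.Done()
				} else {
					toDo <- t // failed: put it back for a different worker
					return    // fail-stop: stop using this worker
				}
			}
		}(w)
	}

	remaining.Wait() // every task finished, e.g. safe to start Reduce
	close(toDo)      // lets the surviving worker goroutines exit
	return doneBy
}

func main() {
	// pretend the worker named "bad" always fails
	doJob := func(w string, t int) bool { return w != "bad" }
	fmt.Println(runTasks(3, []string{"bad", "good"}, doJob))
	// map[0:good 1:good 2:good]
}
```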

9. Concurrency of RPCs

This will be more relevant in Lab 2, but worth mentioning here.

At the server, RPCs from different clients will run concurrently. 
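RPCs from the same client run sequentially only if the client waits for each
reply before sending the next call; the Go rpc server starts a new thread for each request.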

Here, we have one master, so it's not an issue. But an implication is that
if you have shared state accessed by an RPC, you'll need to implement a monitor or use channels to implement CSP.