Friday 4/25/03 Notes
Notetakers: Kuang Chen and Michelle Moravan

*** Main Point ***
Semaphores can be used for the two distinct purposes of mutual exclusion (binary) and scheduling (counting). This results in much confusion.
******************

What is the relationship between scheduling and mutual exclusion?

The difference between synchronization for mutual exclusion and synchronization 
for scheduling is that mutual exclusion allows only one thread to access a shared 
resource at a time, whereas scheduling controls progress by starting and stopping other threads. Mutual exclusion can be viewed as a special case of scheduling, in which at most one thread may proceed at a time.

Any scheduling policy must link the problem requirements and the desired behavior.

Returning to the MAILMAN ANALOGY:

Diagram:
person <--------> [mailbox w/ n slots] <---------> mailman

-- begin --

~ Mailbox Owner Requirements ~
if (box not full)
    deposit letter
else 
    wait until box not full

~ Mailman Requirements ~
again:
if (box not empty)
    grab letters
else
    wait until box not empty
goto again

-- end --

Here we find a link between the mailbox owner and the mailman. While the owner produces mail and consumes slots, the mailman produces slots and consumes mail. Thus we see the potential for a *symmetrical* solution!

Somehow a signal must be exchanged between the two sides. Unfortunately, we can't model this situation with a binary semaphore / mutual exclusion alone. Why not?
 
~ Review: Binary Semaphore ~
A binary semaphore represents mutually exclusive access to a single resource, 
which is accessed inside the critical section. In particular, if I am using the resource then nobody else is, and if I am not using the resource then at most one other thread may be. Most importantly, however, since the count of a binary semaphore never exceeds one, we are limited to representing a single entity.
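
As a rough illustration of the binary case, here is a minimal Python sketch. It assumes Python's threading.Semaphore as a stand-in for the lecture's semaphore (acquire plays the role of P, release the role of V); the shared counter and the worker function are hypothetical names made up for this example.

```python
import threading

mutex = threading.Semaphore(1)   # binary semaphore: 1 means the resource is free
counter = 0                      # the single shared resource

def worker(iterations):
    """Increment the shared counter, one thread at a time."""
    global counter
    for _ in range(iterations):
        mutex.acquire()          # P(mutex): enter the critical section
        counter += 1             # at most one thread executes this line at a time
        mutex.release()          # V(mutex): leave the critical section

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

Because every increment happens inside the critical section, the final count is 4 * 10,000 = 40,000. Note that the semaphore only says "free" or "in use"; it has no way to say how many of something exist.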

So we see that since we would like to model multiple letters, a binary semaphore 
just won't do. Thus we recall the counting semaphore, which we will use for 
scheduling a shared resource for which there exists a multiplicity of entities.

Summary: a binary semaphore's counter cannot exceed 1, so it cannot handle a multiplicity of entities --> use a counting semaphore with its counter initialized to the number of free slots.
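
The summary above can be sketched in a few lines of Python. This is only an illustration, assuming threading.Semaphore as the counting semaphore and a made-up capacity of n = 3; the non-blocking acquire is used so the example reports "would have to wait" instead of actually sleeping.

```python
import threading

n = 3
empty = threading.Semaphore(n)   # counter starts at the number of free slots

# Each successful P consumes one free slot.
claimed = [empty.acquire(blocking=False) for _ in range(n)]

# The counter is now zero, so one more P would have to wait.
# With blocking=False it returns False instead of sleeping.
extra = empty.acquire(blocking=False)

# V: somebody frees a slot, so the next P succeeds again.
empty.release()
after_release = empty.acquire(blocking=False)

print(claimed, extra, after_release)
```

The first n acquires succeed, the next one would block, and a single release lets exactly one more acquire through.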

Returning to the MAILMAN EXAMPLE once more:

We begin with a certain number of free slots in the mailbox, which is its 
capacity. If this number drops to zero or less, then the mailbox cannot accept 
any more mail. If the owner would like to add another letter, he must wait. In 
terms of a counting semaphore on the free slots:

owner:    P // before consuming a free slot
mailman:  V // after producing a free slot

The reverse occurs for mail (as opposed to slots): the mailman does P before consuming a letter, and the owner does V after producing one.

Thus the entire problem can be modeled with two counting semaphores:
	* one for slots
	* the other for letters
and a binary semaphore on:
	* the shared mailbox

~ FINISHED CODE ~
-- begin --

Semaphore mutex(1); // binary lock on the mailbox, initially free
Semaphore empty(n); // # of free slots
Semaphore full(0); // # of letters in mbox

/** Mailbox Owner 
 *  The ticket P(empty) represents only that there is a slot available.
 *  The resource isn't actually grabbed until P(mutex).
 *  If P(mutex) occurred first, the owner might grab the mailbox when there were 
 *  no free slots. The owner would then sleep on P(empty) while holding the lock, 
 *  so the mailman could never get in to free a slot. In this case there would be 
 *  deadlock and the mailman would sleep forever. 
 */
// see if slots available, then consume or wait as appropriate
  P(empty); // have ticket, can safely consume
  P(mutex); 
    // deposit letter
  V(mutex); // release lock ASAP to maximize sharing
  V(full); // inform the mailman that he has work

/** Mailman 
 *  Mirror image of the owner: consumes letters and produces free slots.
*/
  P(full);
  P(mutex);
    // consume letter
  V(mutex);
  V(empty);

-- end --
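
For anyone who wants to run the finished code, here is a minimal Python sketch of the same structure. It assumes threading.Semaphore for all three semaphores (acquire = P, release = V), a made-up capacity of 4 slots and 20 letters, and a deque as the mailbox; it also has the mailman take one letter per trip. The names max_occupancy and delivered are hypothetical and exist only so the result can be checked.

```python
import threading
from collections import deque

N = 4            # mailbox capacity (assumed value)
LETTERS = 20     # number of letters to send (assumed value)

mutex = threading.Semaphore(1)   # binary: guards the mailbox itself
empty = threading.Semaphore(N)   # counting: # of free slots
full = threading.Semaphore(0)    # counting: # of letters in mbox

mailbox = deque()
delivered = []
max_occupancy = 0                # largest number of letters ever in the box

def owner():
    global max_occupancy
    for letter in range(LETTERS):
        empty.acquire()          # P(empty): have ticket, can safely consume a slot
        mutex.acquire()          # P(mutex)
        mailbox.append(letter)   # deposit letter
        max_occupancy = max(max_occupancy, len(mailbox))
        mutex.release()          # V(mutex): release lock ASAP
        full.release()           # V(full): inform the mailman that he has work

def mailman():
    for _ in range(LETTERS):
        full.acquire()           # P(full): wait until there is a letter
        mutex.acquire()          # P(mutex)
        delivered.append(mailbox.popleft())   # consume letter
        mutex.release()          # V(mutex)
        empty.release()          # V(empty): a slot is free again

threads = [threading.Thread(target=owner), threading.Thread(target=mailman)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(delivered == list(range(LETTERS)), max_occupancy <= N)
```

With one owner and one mailman the letters come out in the order they went in, and the box never holds more than N letters, because the owner cannot pass P(empty) once all slots are taken.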

And so we see how we can combine mutual exclusion with scheduling. These examples are confusing because we are using the same mechanism (semaphores) to provide two completely different services (scheduling and mutual exclusion). Thus we must keep track of a lot of unintuitive details in order to 
pull it off.
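
One of those unintuitive details is the order of the two P operations. The sketch below is a hypothetical, single-threaded walk-through in Python (threading.Semaphore again standing in for the lecture's semaphore) of what happens if the owner does P(mutex) before P(empty) while the mailbox is full. Non-blocking acquires are used so the script reports where each side would sleep instead of actually hanging.

```python
import threading

n = 2
mutex = threading.Semaphore(1)   # mailbox lock, initially free
empty = threading.Semaphore(0)   # mailbox is full: no free slots
full = threading.Semaphore(n)    # n letters are waiting

# Owner, WRONG order: takes the lock before checking for a free slot.
owner_has_mutex = mutex.acquire(blocking=False)     # succeeds
owner_gets_slot = empty.acquire(blocking=False)     # fails: owner would sleep holding mutex

# Mailman, correct order: there is mail, but the owner holds the lock.
mailman_sees_mail = full.acquire(blocking=False)    # succeeds
mailman_gets_mutex = mutex.acquire(blocking=False)  # fails: mailman would sleep too

print(owner_has_mutex, owner_gets_slot, mailman_sees_mail, mailman_gets_mutex)
```

The owner waits for a slot that only the mailman can free, and the mailman waits for a lock that only the owner can release. With real blocking calls neither would ever wake up.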

Monday: We will explore how to implement this using two different mechanisms and so get rid of our confusion.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ONE QUESTION ABOUT THIS TALK ON MIDTERM, GUARANTEED

John Byers - Boston University
www.cs.bu.edu/fac/byers

Topology discovery via traceroute:
=================================

Goal: 	Map the router-level graph used by IP, with traceroute as the measurement primitive, for the case of a source infrastructure tracing to a large set of destinations.

A (k,m) traceroute study is a snapshot in time, a simplification of the topology.
k: number of sources (small); each conducts a traceroute to the m destinations
m: number of destinations (large)
union of all traceroutes --> (k,m) traceroute study

Measurement philosophy for this task: more data is better.
However, more measurements and more infrastructure are expensive.
How many source servers are sufficient for a reasonably accurate portrayal? 

Open question: scaling laws
Given graph G=(V,E), routing algorithm R, k sources, m dests
Consider the subgraph G'=(V', E') induced by the routes that R chooses between all (src,dst) pairs
How do the expected sizes of the subgraph (|V'| and |E'|) scale as a function of k and m?
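
To make the definition of G' concrete, here is a small Python sketch. Everything in it is an assumption made for illustration and is not taken from the talk: the routing algorithm R is replaced by breadth-first shortest paths with a fixed tie-break, the "network" is a toy 3x3 grid, and bfs_route, km_study and grid are made-up names.

```python
from collections import deque

def bfs_route(graph, src, dst):
    """Stand-in for R: one shortest path from src to dst (empty list if unreachable)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in sorted(graph[node]):      # sorted gives a deterministic tie-break
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    if dst not in parent:
        return []
    path = []
    node = dst
    while node is not None:                  # walk parent pointers back to src
        path.append(node)
        node = parent[node]
    return path[::-1]

def km_study(graph, sources, dests):
    """Union of all traced routes; returns (V', E') as sets."""
    seen_nodes, seen_edges = set(), set()
    for s in sources:
        for d in dests:
            path = bfs_route(graph, s, d)
            seen_nodes.update(path)
            seen_edges.update(frozenset(e) for e in zip(path, path[1:]))
    return seen_nodes, seen_edges

def grid(n):
    """Toy n x n grid graph as an adjacency dict keyed by (row, col)."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(r, c): [(r + dr, c + dc) for dr, dc in steps
                     if 0 <= r + dr < n and 0 <= c + dc < n]
            for r in range(n) for c in range(n)}

g = grid(3)                                   # 9 nodes, 12 links
total_edges = sum(len(nbrs) for nbrs in g.values()) // 2
v1, e1 = km_study(g, [(0, 0)], list(g))       # k = 1 source, m = 9 destinations
v2, e2 = km_study(g, [(0, 0), (2, 2)], list(g))  # k = 2 sources
print(len(v1), len(e1), len(e2), total_edges)
```

With a single source every traced route follows the same shortest-path tree, so the study sees all 9 nodes but only 8 of the 12 links. Adding sources can only add links to the union, which is one way to see why the scaling in k and m is the interesting question.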

They found a heavy-tailed distribution for the underlying graph (the Internet). The k-m sampling appeared to cause a significant bias. It seemed to show that the coverage was very thorough near the source but that it decayed with distance from the source.

How significant is this bias in practice?
Can we use known statistical methods to remove bias?