University of Washington

CSE 326, Data Structures

Programming Assignment #3

Simulating Network Traffic

Checkpoint: Monday, February 26, 2001.

Due: Monday, March 5, 2001, 1:30pm

Winter 2001 February 16, 2001

Donald Chinn

In all areas of engineering, simulators are used to discover design flaws, test engineering hypotheses, and determine performance. For example, the Boeing 777 airplane was extensively simulated --- almost exclusively in software --- before the first plane was built. In the computer industry, chips are simulated in software before millions of dollars are spent building the plants and manufacturing the actual chips.

If a simulator is constructed in the right way, it can be used to test different designs rapidly. Imagine how slow (and expensive) it would be to build a life-sized plane for each proposed design.

In this assignment, you are asked to write a simulator for network traffic. The environment the simulator is intended to simulate is a network of computers that communicate with each other. You could envision the network of computers to be the Internet, a loosely coupled set of workstations in a corporate network, or a set of processors that need to communicate with each other in a tightly coupled multiprocessor machine. You could even imagine a large number of cars on a network of roads. The exact environment we are trying to simulate is not that important for our purposes. As we shall see, we will be abstracting away many details of a real network of computers so that we can focus on the problem of traffic.

Our network can be represented, of course, as a graph, where each vertex of the graph represents a node in the network and each edge of the graph represents a connection between nodes in the network.

In our simulation, the smallest unit of communication is a packet. A packet is (typically) a fixed size sequence of bytes. At a minimum, it contains bytes that represent the destination node of the packet and the data that's actually being transmitted. It might also contain a representation of the source node and possibly other information, depending on whether that information is needed in routing the packet in the network. Routing is simply the method (or algorithm) by which packets are sent from source to destination.

Our model is as follows. The network is a graph, and time is measured in abstract units or steps. Each node has a central queue that contains packets located at that node at a given time step. The central queue has a maximum capacity of k packets. For reasons that will become clear later, the node also has an injection queue, which can contain an unlimited number of packets. Finally, for each connection to adjacent nodes, there is a corresponding in-queue that can contain at most one packet that was sent from the corresponding adjacent node.

Here's a picture of a node:

At each time step, the following happens (in the following order) in each node:

Delivery: All packets in a node's in-queues that have the node as its destination are "delivered" and ejected from the system.

Injection: Each node in the network is allowed to inject (at most) one packet into the network. That is, the node can insert a packet into its injection queue.

Centralization: The node is allowed to transfer packets from its injection queue and its in-queues to its central queue. It might be that there is not enough room left in the central queue to hold all the packets in the injection and in-queues. Whatever method that is used to determine which packets enter the central queue, it must never let more packets into the central queue than it can hold. Also, packets must not magically disappear from the network.

Packet assignment: The node is allowed to assign packets in its central queue to outgoing edges. The amount of information allowed to make the assignment might vary. More on this later.

Send: Packets that are assigned to edges are transmitted to the corresponding node's in-queue, but only if that in-queue is empty. If the in-queue is not empty, then the packet is not sent, and it stays in the central queue of the sending node.

The nodes are organized in a network, which can be described by a graph. In this assignment, the graph will come from a file as an input to our simulator. The format of the file is simple. The first line of the file contains the number of vertices the graph has. On each successive line, there will be a vertex number, followed by a colon, followed by a list of adjacent vertices to that vertex (separated by commas). The line ends with a period. Here’s a sample input graph (a 2 x 2 mesh):

0:1,2.

1:0,3.

2:0,3.

3:1,2.

The graphs we will use will all be undirected graphs: for any pair of vertices v and w in a graph G = (V, E) , either both (v , w) Î E and (w, v) Î E or neither are. We will be running our algorithms on two kinds of graphs: meshes and random graphs. A mesh network is a rectangular grid of vertices connected by edges in the vertical and horizontal directions, except along the edges of the mesh. Here is a 3 x 5 mesh:

In this project, when we talk about meshes, we will concern ourselves only with square meshes (n x n).

A random graph consists of a connected line of vertices (so that the entire graph is connected). For every other pair of vertices, the edge is included in E with some probability q.

A program called graphgen has been provided for you to generate both mesh and random graphs. Feel free to modify or add to the code to generate your own graphs.

You will need to write code to read the graph from the file, making the appropriate connections between nodes in your simulator.

The Assignment

Your task in this project is to write a simulator that can simulate routing algorithms. You will design and implement a routing algorithm for either mesh graphs or random graphs. (Random graphs are more general, and so it is possibly more difficult to design good algorithms for them. On the other hand, devising good algorithms for general graphs is more useful in practice.)

Measure of Performance

Our goal is to minimize the average amount of time it takes for a packet to arrive at its destination. To calculate this, we must keep track of how long each packet has been in the system. The easiest way to do this is to have a data member in the Packet class to store that information. As each packet is delivered, we can keep a running total of how many packets have been delivered and the total time it has taken to deliver those packets. (Dividing the total delivery time by the number of packets delivered will give us the average delivery time.)

Programming Details

The key data structures in your simulator will be the node and the packet. Your Node class will need to be organized so that you can easily implement your algorithm (use the picture above of the node as a guide). Your Packet class will need to have a data member that keeps track of how long it has been in the network.

Let us briefly look at each phase of the simulator:

delivery: This is easy to implement. Scan all the packets in the in-queues. Any packets in its destination node can be eliminated from the system, updating our packet delivery count and total delivery time.

injection: In this phase, flip a biased coin for each node so that with probability p a new packet is inserted into the injection queue of that node. If a new packet is injected, a random destination is chosen for it (equally likely for all nodes). If the destination of the packet is the node, then we treat it as though the coin flip failed (i.e., no packet generated).

centralization: The central queue can be any data structure. It can be as simple as a regular queue or as complex as a priority queue with many different operations (beyond insert and deleteMin). Your choice of data structure will depend on what routing algorithm you choose to implement. However, the size of our queues will be small, so a simple array is probably sufficient.

Generally speaking, if there is enough space in the central queue to hold all packets in the injection queue and in the in-queues, then they should be placed in the central queue. If there isn’t enough space, then you need to decide which packets get in and which ones stay where they are. A simple strategy (e.g., give priority to packets in the in-queues) is usually sufficient here.

packet assignment: This is the main component of a routing algorithm. For every adjacent node, a packet can be chosen to be sent to it in the next "send" phase. The assignment can be simple, such as picking the first packet in the queue that can use that edge to get closer to its destination. Or, it can be more complicated. See some examples below (under the Basic Algorithms section).

sending packets: This is easy to implement. For the assigned packets in a node, check to see if the corresponding in-queue in the adjacent node is occupied. If not, then send it. If it is full, then the packet stays.

A centralization policy and a packet assignment policy constitute a routing algorithm.

The Basic Algorithms

For the mesh, the basic algorithm is what is known as the dimension order algorithm. The centralization policy of this routing algorithm is that packets in the in-queues get priority over packets in the injection queue for space in the central queue. If not all of the packets in the in-queue can fit in the central queue, then there is some pre-set priority of the in-queues (e.g., the queue connected to the node to the "south" has highest priority, "west" is next, "east" is next, and then "north").

The packet assignment policy is that packets always move toward their destination and they always move horizontally in the mesh before moving vertically. So, it’s possible to calculate for each node which edge it must cross next. For each outgoing edge, the first packet in the central queue that must cross that edge next is chosen for that edge. (It’s called "dimension order" because the algorithm can be generalized to three-dimensional and higher-dimensional meshes, where the "coordinates" of the packet are corrected in a fixed order.)

For random graphs, the basic algorithm uses a greedy approach. We run breadth first search from every node to every other node. From the search, we can construct a routing table for each node. The routing table tells which edge a packet with a given destination must cross next. For each outgoing edge at a node, the packet assignment policy is to pick the first packet in the central queue that must cross that edge next.

As with the dimension order algorithm, the centralization policy is to give in-queue packets priority over injection queue packets.

Some Ideas For Algorithms

Here are some quick ideas for devising algorithms. If you imagine driving a car over busy city streets, these ideas will look familiar.

Using alternate paths: In both the basic algorithms, given any source node and destination node, there is one path a packet takes from that source to its destination. Intuitively, this seems potentially wasteful because there could be many alternative routes (some longer than the minimum) that packets could take.

Randomness: perhaps randomness could be used (either in the centralization policy or the packet assignment policy) to help evenly distribute the packets among the edges. For example, in the dimension order algorithm, several packets may all need to go out one edge even though they could use other edges and still move closer to their destinations. A coin flip could be used to decide which of the two directions a packet could take next and still make progress.

Derouting: Under some situations, it might be advantageous to send a packet further away from its destination so that there is less congestion. This can actually be useful, but it is important to avoid livelock (see below).

Deadlock, Livelock

All routing algorithms must avoid deadlock, the situation where a group of packets cannot advance because they are all competing for resources (space in the nodes) they are currently holding. Thus, none of them can move.

Routing algorithms that deroute packets have an additional concern called livelock. Livelock is the situation where a packet is guaranteed to eventually move, but it is not guaranteed to eventually reach its destination (because, for example, it keeps moving in a cycle in the network).

A Little Theory

Suppose each node injects a packet with probability p. Then on an n x n mesh, pn²/2 packets on average are generated on the left half of the mesh each time step. Half of these packets, pn²/4, have destinations on the right half of the mesh. Since there are only n edges that connect the two halves of the mesh, pn²/4 must be less than n or else we will get progressively worse congestion. So, p = 4 / n is a theoretical limit for how much traffic the mesh can accommodate.

If p = l 4 / n, where 0 < l < 1, then l is called the load on the network. (Note that N = n², where N is the number of nodes in the n x n mesh.)

We can perform a similar calculation for random graphs. Let q be the probability of creating an edge. There are (roughly) q N² / 4 edges between the two halves of the graph (vertices 0 to N/2 – 1 and N/2 to N). So, we must have p ( N / 2) (1/2) < q N² / 4, or p < q N. If p = l q N, then l is the load on the network.

What we will want to do is to see how our algorithms perform on a range of loads and on a range of graph sizes.

What To Do

After you have designed and implemented your routing algorithm for your chosen class of graph (mesh or general), it is time to see how it fares against the basic algorithm for that class. Run your simulator on a range of sizes of graphs (for meshes, try 8 x 8, 16 x 16, 32 x 32, and 64 x 64; for random graphs, try graphs with sizes 100, 250, 1000, and 4000). Run them on a variety of loads (for example, 0.2, 0.4, 0.6, 0.8, 0.9). Try to identify the point at which the delivery time starts to increase dramatically (the curve on the performance graph will start to go up fast).

You should run your simulator on a graph until 1,000,000 packets have been delivered.

Naming Files

In order to systematically experiment with your simulator, you will want to name your files in a way that indicates what graph it has in it. Here is a suggestion. Name mesh graphs as mesh.n for n x n meshes. Name random graphs as rand.N.XX, where N is the number of nodes and XX/100 is the probability used to generate the graph.

What To Turn In

1. At the halfway point (February 26), you should turn in a brief checkpoint report (see below).

2. You should turn in your code electronically as usual.

3. In your submission, you should have a README file that says who is in your group, what kind of algorithm (mesh or random) you investigated, and the results of your experiments (performance graphs that compare the load of the system to the average delivery time). You should also briefly describe how your algorithm works and why you believe it avoids deadlock and livelock.

4. You should have another file called MYSTRUCTURES that briefly describes your data structures (Node, Packet, and any other important data structures).

Checkpoint

On Monday, February 26, you will turn in a brief report that contains the following information: (1) who is in your group, (2) whether you intend to work on mesh routing algorithms or general graph algorithms, and (2) an outline of your original routing algorithm that you plan to implement.

Teams

This is intended to be a team project, with groups of no more than two people. You may work alone, but you will have to do nearly as much work as the two-person teams.

One-person projects need only implement one of the basic algorithms and run it on a range of graph sizes and loads on those sizes.

This project is worth 100 points (the other two projects were 50 points). Unless there is are extreme circumstances, both members of a team will receive the same grade for the project.

Software Engineering Tips

Start small. Before you start coding, decide what data structures you will need to implement the basic algorithm. Then decide what data structures you will need to implement your original algorithm. Be flexible with your design so that you can experiment with different routing algorithms. Then code it. Test your simulator on small graphs and check for correctness. (Write utility routines so that you can see what’s going on in a node at any time, similar to the kinds of functions we used for the AVL tree project. Set breakpoints so that you can step through your code.) It’s important to make sure your simulator is correct before you spend the (long) time you will need to run your simulator on a large number of graphs and loads.

If you develop using Visual C++, make sure to allow time to port it over to the UNIX systems to compile and test it there.

As a team, you can divide up the work and code separately or you can work together on the same code. In either case, make sure you know what your partner is doing. Nothing is worse than coding to a set of functions that unexpectedly change (or worse, don’t exist anymore).

Bonus Points

Bonus points (up to 10) will be awarded to group(s) that devise algorithms that are correct and have fast average delivery times. They must be deliver packets faster than the basic algorithm. Elegance of the algorithm and its implementation will factor into how the points are awarded.