This week's theme is "putting the 'systems' in 'distributed systems'".
The systems motto is:
Is the problem the system solves real?
Does the system actually solve the problem?
Is the system correct?
Is the system fast enough to be useful?
Correctness is the most important aspect of distributed system design, in part
because it is so much harder than single-node programming.
But performance is important too.
Systems
What is a system?
a set of interconnected components whose joint behavior is observed at the interface with its environment
the interface is a box separating the components in the system from "everything else"
outside the box is everything else, which we call "the environment"
inside the box is the system, including its components and their connections
each component is itself often a system, with its own internal structure, recursively
we observe the system at its boundary with the environment
observers don't look inside the box, they look at the behavior of the box
notice that this definition does not contain the word "computer" or even "electrical signals"
it applies equally well to biological systems, social systems, and mechanical systems
we will mostly be interested in computer systems, though many of the ideas also apply to other systems
Performance
"Performance" means "how many resources does the system require to do its work?"
Many ways of looking at this question.
In a single-node setting
from a theoretical perspective, we often talk about time and space complexity
big-oh notation
from a practical perspective, we implement a program and measure its performance
come up with some benchmark input and measure how long it takes to run (and how much memory)
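as a concrete illustration, here is a minimal Go benchmark sketch using the
standard testing package; sortInts is a made-up stand-in for whatever code
we actually want to measure:

    package perf

    import (
        "sort"
        "testing"
    )

    // sortInts is a stand-in for the code under test.
    func sortInts(xs []int) {
        sort.Ints(xs)
    }

    // BenchmarkSortInts runs sortInts b.N times on a fixed benchmark input.
    // Run with `go test -bench=. -benchmem` to see time and memory per call.
    func BenchmarkSortInts(b *testing.B) {
        input := make([]int, 10000)
        for i := range input {
            input[i] = len(input) - i // reverse-sorted input
        }
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            xs := make([]int, len(input))
            copy(xs, input)
            sortInts(xs)
        }
    }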
In a distributed setting, some things are different, especially historically:
historically, networks were slow, so a system spent most of its time
waiting for messages to be delivered.
this meant that it basically didn't matter how fast or slow the programs
running on the nodes were, because that time was always outweighed by network delay.
we refer to this situation as being "I/O bound", because the performance
of the system is determined by the performance of the network, not the CPU.
(a similar thing can happen in a single-node setting if you are spending
most of your time waiting for the hard drive.)
Modern networks are fast, especially within a datacenter, so distributed
performance depends a lot more on single-node performance than it did when
networks were slow.
To measure performance of a distributed system:
from a theoretical perspective, can talk about message complexity or
communication complexity, measuring how many messages and of what size are
exchanged by the system.
from a practical perspective, implement the system and measure its performance
come up with some benchmark client workload, measure how long it takes to execute on the system
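as a concrete illustration, a minimal sketch of a benchmark client in Go,
where call is a made-up stand-in for an RPC to the system under test:

    package main

    import (
        "fmt"
        "time"
    )

    // call stands in for an RPC to the system under test; here it just
    // simulates some network + processing delay.
    func call(req int) int {
        time.Sleep(100 * time.Microsecond)
        return req
    }

    func main() {
        const n = 10000
        start := time.Now()
        for i := 0; i < n; i++ {
            call(i) // one request at a time (a "closed loop" workload)
        }
        elapsed := time.Since(start)
        fmt.Printf("throughput: %.0f requests/sec\n", float64(n)/elapsed.Seconds())
        fmt.Printf("mean latency: %v\n", elapsed/time.Duration(n))
    }

note that with a single sequential client like this, throughput is just
1 / latency; real benchmarks usually run many concurrent clients to
measure the two separately.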
Throughput and Latency
Imagine we have some system that accepts requests, does some work to execute
the request, and then sends responses.
Clients communicate with the system over the network, so to understand the
performance of our system, we need to start by understanding the performance
of the network
Network performance
Imagine a simple network that is just one very long ethernet cable between the
client and the system, and ignore all actual details of ethernet :)
Two important measures of network performance:
(network) latency: the time between when a message is sent and received
network latency = propagation (speed of light) delay + transmission time
transmission time = message size in bytes / network throughput
(network) throughput: the maximum amount of data that can be sent per second
network throughput is a function of the physical construction of the network
typical value in the data center for one ethernet cable: 10 gigabits per second (1.25 gigabytes per second)
Notice that latency depends on throughput!
Consider the case of sending a very small message, say 1 byte.
Then the transmission time is essentially 0, so the latency is just the propagation delay
So latency depends only on the distance between the two sides of the network
If we then send back a similarly small response, then the total latency of
the request is two times the propagation delay. This is known as the
"round-trip time" (RTT), because it's the shortest time it can take to
send a message to the system and get back a response.
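for example (realistic but made-up numbers): light in fiber travels at
roughly 200,000 km/s, so a 4,000 km cross-country link has about 20 ms of
one-way propagation delay, giving an RTT of about 40 ms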
What about a large message, say 10 gigabytes, on a 10 gigabit network?
now the propagation delay is negligible compared to the transmission time, so
latency is dominated by transmission time = message size / network throughput
for example: 10 gigabytes / 1.25 gigabytes per second = 8 seconds
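a minimal sketch of both cases in Go, with made-up numbers (50 microseconds
of propagation delay, 10 gbps of throughput):

    package main

    import "fmt"

    // latency computes network latency in seconds using the formula above:
    // latency = propagation delay + message size / throughput
    func latency(propagationSec, sizeBytes, throughputBytesPerSec float64) float64 {
        return propagationSec + sizeBytes/throughputBytesPerSec
    }

    func main() {
        const (
            prop       = 50e-6  // 50 microseconds one-way (made-up)
            throughput = 1.25e9 // 10 gbps = 1.25 gigabytes per second
        )
        // tiny message: propagation delay dominates
        fmt.Println(latency(prop, 1, throughput)) // ~5.0e-05 seconds
        // huge message: transmission time dominates
        fmt.Println(latency(prop, 10e9, throughput)) // ~8 seconds
    }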
System performance: 1 server
Now let's add our system back to our analysis.
It's not just about getting a request message from the client to our system.
We also have to do whatever that request asks us to do, and then send back a
response.
Let's say our system is like the lab 1 RPC server: a single node that
processes requests sequentially in the order it receives them.
Now the latency of one request is:
total latency = network latency for the request + system processing time + network latency for the response
"network latency = propagation delay + transmission time" just like the previous section
The throughput of the combined system is:
end-to-end throughput = min(network throughput, system throughput)
why the min? it's useful to understand this formula in terms of "where is the bottleneck?"
consider the case where the network is low-throughput, say 1kbps, and
the system is high-throughput, say 1M requests per second, and say
that request and response messages are 1 byte big.
then the network can only deliver about 125 requests per second (1 kbps ÷ 8 bits per request)
even though the server would be able to process more, it doesn't have a chance
the network is the bottleneck
now consider the opposite case, where the network is high-throughput, say 10gbps,
and the system is low-throughput, say 1k requests per second
then the network is capable of delivering about 1.25 billion requests per second (10 gbps ÷ 8 bits per request)
but the server cannot serve requests that fast, so the requests will either wait in line or get dropped.
the server is the bottleneck
notice that in these two scenarios, the way the network and server
were connected ("the box diagram") did not change, but the "reason"
for the system's performance changed.
performance is often due to bottlenecks, but finding the specific bottleneck can be non-obvious
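a minimal sketch of the min rule in Go, with both stages converted to
requests per second and the numbers taken from the two scenarios above:

    package main

    import "fmt"

    // endToEndThroughput applies the min rule: whichever stage handles
    // fewer requests per second is the bottleneck and sets the overall rate.
    func endToEndThroughput(networkRPS, serverRPS float64) (float64, string) {
        if networkRPS < serverRPS {
            return networkRPS, "network"
        }
        return serverRPS, "server"
    }

    func main() {
        fmt.Println(endToEndThroughput(125, 1e6))    // slow network: 125 network
        fmt.Println(endToEndThroughput(1.25e9, 1e3)) // slow server: 1000 server
    }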
To measure the performance of the system, we can consider the offered load,
which is the number of requests per second being sent by clients.
suppose that:
requests and responses are 1 byte each
the network has a throughput of 10 gbps
the server has a throughput of 1M requests per second
Then the network is capable of delivering about 1.25 billion requests per second.
Since the server can only process 1 million of them, what will happen to the others?
Drop them?
Store them in a queue and process them in the order received?
Both are possible, but clearly we're in a badly overloaded situation here.
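a minimal sketch of both policies in Go (sizes and rates are made up): a
bounded queue in front of the server; a send that finds the queue full
drops the request instead of waiting:

    package main

    import (
        "fmt"
        "time"
    )

    type request struct{ id int }

    func main() {
        // bounded queue: requests wait in line here until the server is free
        queue := make(chan request, 4)
        dropped := 0

        // the server drains the queue much more slowly than clients send
        go func() {
            for range queue {
                time.Sleep(10 * time.Millisecond) // pretend processing
            }
        }()

        // offered load far exceeds what the server can handle
        for i := 0; i < 100; i++ {
            select {
            case queue <- request{id: i}: // space in line: queue it
            default: // queue full: drop it
                dropped++
            }
        }
        fmt.Println("dropped:", dropped) // most of the 100 requests are dropped
    }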