Lecture 16: Performance and Queueing theory

  • This week's theme is "putting the 'systems' in 'distributed systems'".
  • The systems motto is:
    • Is the problem the system solves real?
    • Does the system actually solve the problem?
      • Is the system correct?
      • Is the system fast enough to be useful?
  • Correctness is the most important aspect of distributed system design, in part because it is so much harder than single-node programming.
  • But performance is important too.

Systems

  • What is a system?
    • a set of interconnected components whose joint behavior is observed at the interface with its environment
  • the interface is a box separating the components in the system from "everything else"
  • outside the box is everything else, which we call "the environment"
  • inside the box is the system, including its components and their connections
    • each component is itself often a system, with its own internal structure, recursively
  • we observe the system at its boundary with the environment
    • observers don't look inside the box, they look at the behavior of the box
  • notice that this definition does not contain the word "computer" or even "electrical signals"
    • it applies equally well to biological systems, social systems, and mechanical systems
  • we will mostly be interested in computer systems, though many of the ideas also apply to other systems

Performance

  • "Performance" means "how many resources does the system require to do its work?"
  • Many ways of looking at this question.
  • In a single node setting
    • from a theoretical perspective, we often talk about time and space complexity
      • big-oh notation
    • from a practical perspective, we implement a program and measure its performance
      • come up with some benchmark input and measure how long it takes to run (and how much memory)
  • In a distributed setting, some things are different, especially historically:
    • historically, networks were slow, so a system spent most of its time waiting for messages to be delivered.
    • this meant that it basically didn't matter how fast or slow the programs running on the nodes were, because that time was always outweighed by network delay.
    • we refer to this situation as being "I/O bound", because the performance of the system is determined by the performance of the network, not the CPU.
    • (a similar thing can happen in a single-node setting if you are spending most of your time waiting for the hard drive.)
  • Modern networks are fast, especially intradatacenter, and so distributed performance relies a lot more on single-node performance than it would with slow networks.
  • To measure performance of a distributed system:
    • from a theoretical perspective, can talk about message complexity or communication complexity, measuring how many messages and of what size are exchanged by the system.
    • from a practical perspective, implement the system and measure its performance
      • come up with some benchmark client workload, measure how long it takes to execute on the system (a minimal measurement sketch follows this list)
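
A minimal sketch of the practical approach in Python. The `send_request` function here is a hypothetical stand-in for however the benchmark client issues one request (an RPC call, an HTTP request, etc.):

```python
import time

def run_benchmark(send_request, num_requests=1000):
    """Issue num_requests requests one at a time; report throughput and average latency."""
    latencies = []
    start = time.perf_counter()
    for _ in range(num_requests):
        t0 = time.perf_counter()
        send_request()  # hypothetical: issue one request and wait for its response
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    throughput = num_requests / elapsed            # requests per second
    avg_latency = sum(latencies) / len(latencies)  # seconds per request
    return throughput, avg_latency
```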

Throughput and Latency

  • Imagine we have some system that accepts requests, does some work to execute the request, and then sends responses.
  • Clients communicate with the system over the network, so to understand the performance of our system, we need to start by understanding the performance of the network

Network performance

  • Imagine a simple network that is just one very long ethernet cable between the client and the system, and ignore all actual details of ethernet :)
  • Two important measures of network performance:
    • (network) latency: the time between when a message is sent and received
      • network latency = propagation (speed of light) delay + transmission time
        • typical propagation delay: intradatacenter: 500\(\mu\)s (0.5ms); around-the-world: 40ms
      • transmission time = message size in bytes / network throughput
    • (network) throughput: the maximum amount of data that can be sent per second
      • network throughput is a function of the physical construction of the network
      • typical value in the data center for one ethernet cable: 10 gigabits per second (1.25 gigabytes per second)
  • Notice that latency depends on throughput!
  • Consider the case of sending a very small message, say 1 byte.
    • Then the transmission time is essentially 0, so the latency is just the propagation delay
    • So latency depends only on the distance between the two sides of the network
    • If we then send back a similarly small response, then the total latency of the request is two times the propagation delay. This is known as the "round-trip time" (RTT), because it's the shortest time it can take to send a message to the system and get back a response.
  • What about a large message, say 10 gigabytes, on a 10 gigabit network?
    • Transmission time = size / throughput = 10 gigabytes / (10 gigabits / sec) = 80 gigabits / (10 gigabits / sec) = 8 seconds
    • Latency = propagation delay + transmission time = 40ms (say) + 8 seconds = 8.040 seconds
    • so propagation delay is negligible compared to transmission time, and latency depends almost entirely on the message size (and the network throughput); both cases are worked through in the sketch below
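
A small sketch of the latency formula from this section, plugged with the example numbers above (40ms propagation delay, 10 gigabit/s link):

```python
def network_latency(message_bytes, propagation_s, throughput_bits_per_s):
    """latency = propagation delay + transmission time, as defined above."""
    transmission_s = (message_bytes * 8) / throughput_bits_per_s
    return propagation_s + transmission_s

PROP = 0.040   # 40 ms propagation delay (the around-the-world example)
LINK = 10e9    # 10 gigabits per second

print(network_latency(1, PROP, LINK))     # ~0.040 s: tiny message, propagation delay dominates
print(network_latency(10e9, PROP, LINK))  # ~8.04 s: 10 GB message, transmission time dominates
```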

System performance: 1 server

  • Now let's add our system back to our analysis.
  • It's not just about getting a request message from the client to our system. We also have to do whatever that request asks us to do, and then send back a response.
  • Let's say our system is like the lab 1 RPC server: a single node that processes requests sequentially in the order it receives them.
  • Now the latency of one request is:
    • total latency = network latency for the request + system processing time + network latency for the response
    • "network latency = propagation delay + transmission time" just like the previous section
  • The throughput of the combined system is:
    • end-to-end throughput = min(network throughput, system throughput)
    • why the min? it's useful to understand this formula in terms of "where is the bottleneck?"
      • consider the case where the network is low-throughput, say 1kbps, and the system is high-throughput, say 1M requests per second, and say that request and response messages are 1 byte big.
        • then the network can only deliver about 125 requests per second (1 kbps is 125 bytes per second)
        • even though the server would be able to process more, it doesn't have a chance
        • the network is the bottleneck
      • now consider the opposite case, where the network is high-throughput, say 10 gbps, and the system is low-throughput, say 1k requests per second
        • then the network is capable of delivering over 1 billion requests per second
        • but the server cannot serve requests that fast, so the requests will either wait in line or get dropped.
        • the server is the bottleneck
      • notice that in these two scenarios, the way the network and server were connected ("the box diagram") did not change, but the "reason" for the system's performance changed.
        • performance is often due to bottlenecks, but finding the specific bottleneck can be non-obvious
  • To measure the performance of the system, we can consider the offered load, which is the number of requests per second being sent by clients.
    • suppose that:
      • requests and responses are 1 byte each
      • the network has a throughput of 10 gbps
      • the server has a throughput of 1M requests per second
    • Then the network is capable of delivering about 1 billion requests per second.
    • Since the server can only process 1 million of them, what will happen to the others?
      • Drop them?
      • Store them in a queue and process them in the order received?
    • Both are possible, but clearly we're in a badly overloaded situation here (the sketch below works through these numbers).
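
A small sketch of the bottleneck analysis from this section, using the example numbers above (1-byte requests, a 10 gbps network, a server that handles 1M requests per second); the offered load value at the end is a hypothetical number for illustration:

```python
def network_capacity_rps(throughput_bits_per_s, request_bytes):
    """How many requests per second the network can deliver."""
    return throughput_bits_per_s / (request_bytes * 8)

net_rps = network_capacity_rps(10e9, 1)  # ~1.25 billion requests per second
server_rps = 1_000_000                   # server throughput: 1M requests per second

# End-to-end throughput is limited by whichever component is the bottleneck.
end_to_end_rps = min(net_rps, server_rps)
bottleneck = "network" if net_rps < server_rps else "server"
print(end_to_end_rps, bottleneck)        # 1000000 server

# Offered load beyond the end-to-end throughput must wait in a queue or be dropped.
offered_load = 5_000_000                 # hypothetical: clients send 5M requests per second
excess = max(0, offered_load - end_to_end_rps)
print(excess)                            # 4000000 requests per second over capacity
```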