From: Cem Paya 98 (Cem.Paya.98_at_Alum.Dartmouth.ORG)
Date: Wed Jan 28 2004 - 15:42:20 PST
Paper review: Lottery scheduling (Waldspurger, Weihl)
CSE551P, Cem Paya
This paper discusses a novel, randomized scheduling idea based on the concept of lotteries. Authors built a
proof-of-concept on top of Mach3.0 kernel and conducted some experiments. Proposed algorithm achieve
goods control over relative proportions of resources (such as CPU time) allocated to different processes
imposes near-negligible overhead compared to the built in schedule in spite of sub-optimal implementation that
scans all processes when searching for the winning ticket. Mach3.0 kernel for scheduling, the authors also hint
at the possibility of extending it to any resource under contention such as memory, bandwidth and ownership
of exclusion locks. There is even “reverse lottery” for reclaiming resources where the winner ends up losing a
resource such as physical memory page.
Basic idea involves two abstractions: a currency and a lottery ticket. Currencies form an acyclic graph rooted at
a single node for the base currency. Each entity receiving an allotment of lottery tickets (just like monetary
currency, one “physical” ticket may represent multiple “logical” tickets in the same way $5 bill represent five $1
bills) can issue their own tickets. Inactive processes are not eligible for “winning” in the lottery so their tickets
are not factored into the system. When it is time to allocate a resource such as quantum of CPU time, the
winning number is chosen randomly and the resource awarded to the task with that ticket.
Economic analogies have neat parallels in the computation world, all of them supported by the scheduler. For
example tickets can be transferred so that one process blocked on RPC call from another can simply pass its
tickets along to the server. This makes sense because tickets aren’t of any use to blocked process and for
that process to make any progress the server needs to execute on the call. Inflation corresponds to clients
minting more lottery tickets than they have been allocated, artifically bumping up their own priority. (That
would require a cooperative environment otherwise the process that can mint most tickets can end up
hogging all the resources.) Different currencies allow for the system to work across trust boundaries each
running their own lotteries by establishing an agreed-upon exchange rate. Compensation tickets partially
refund the winner who “declines” part of the prize by not using all of it, such as process that yields its
quantum before it is consumed. Lottery metaphor seems to be a good way of modelling resource allocation in
an operating system.
Probabilistic approach brings in interesting benefits: under the assumption of uniform distribution, law of large
numbers guarantees convergence on expected ratios—eg resource usage is proportional to number of tickets,
with the relative deviation going to zero because standard deviation scales as sqrt(n) while the mean goes as
n. This is a significant improvement over the opaque “priority numbers” used in UNIX or NT where the only
guarantee is that higher priority thread consumes more of the resource. Lottery scheduling offers probabilistic
guarantees about relative consumption rates. (Unfortunately pseudo-random number generator used here is
very poor-quality, linear congruential generator chosen for convenience of implementation in assembly
language instead of good randomness properties. Also the paper does not explain how PRNG is seeded—with
fixed seed material the algorithm is deterministic.) Experimental data supports this: for example a database
application with three simultaneous queries shows CPU rates proportional to the tickets allocated to each
query. Monte-Carlo simulation makes for another showcase for lottery scheduling: by deflating the number of
tickets available, long-running tasks receive less and less CPU time to allow for rapid initial convergence from
new simulations. Another experiment on video frame rates doesn’t fit that well however, with the authors
blaming the X-windowing server for the skew.
Overall this is an interesting idea. Basic implementation is straightforward and scheduler has low-enough
overhead (esp when tickets are arranged into tree structure) that it can schedule with very high granularity,
on the order of millisecond per quanta—in fact the authors assert a few thousand RISC instructions. Bigger
problems lurk around moving lottery tickets around. For example mods to Mach kernel were required to
support ticket transfer in RPC. Covering all cases when one process is blocked on another (eg exclusion) would
be very difficult. Similarly ticket inflation requires cooperation and planning between processes competing for
resources, which complicates programming model.
This archive was generated by hypermail 2.1.6 : Wed Jan 28 2004 - 15:42:26 PST