Lecture: verification
preparation
administrivia
- lab X
- you are encouraged to work in pairs; how to split the work is entirely up to you
- everyone needs to submit a report
- no late days
- if you want to do a demo on Friday, let us know before Wednesday 11am
- final exam
- similar to last year’s final (solution)
- focus on JOS and xv6; open book, open laptop, no communication
- also cover guest lectures and demos (see last year’s final)
history
- some examples of efforts
- compilers
- 1967: Correctness of a compiler for arithmetical expressions, McCarthy & Painter
- 1972: Proving compiler correctness in a mechanized logic, Milner & Weyhrauch
- 2009: CompCert
- OS kernels
- 197x-1980: UCLA Unix security kernel - finished 90%+ of the spec but under 20% of the proof
- 198x: Kit
- 2009: seL4
- what does verification provide
- a mechanical proof that the impl “meets” the spec
- assume a correct proof checker
- cost
- verification effort, run-time performance, compatibility, learning curve
- seL4: “about 25–30 person years, to do this again it would be about 10 person years”
- recent advances in SMT solving
- recall the lecture on KLEE
- still requires substantial human effort
- Ironclad: 3 person-years for 6500 lines of implementation code
- general questions
- what’s the TCB (trusted computing base): spec, tools, environment
- is the spec reasonable
- strong enough to prevent bugs
- simple enough for human review
- is the proof effort reasonable
- should we use verification in daily software development
- economic issues & human factors
- is the performance reasonable
- is the impl good for practical use (or too trivial, simplified to ease verification)
- is it possible for verification to catch up with implementation
overview
- example: develop a little-endian serializer/deserializer for 16-bit integers
- encode: n → bytes
- decode: bytes → n
- what’s the specification?
- spec v0: forall 16-bit integer n, decode(encode(n)) == n
- what bugs cannot be captured?
- is this good enough?
- verification: check if a given implementation meets the spec
- code: le16.c
- exhaustive testing: check the condition for n from 0 to 65535
- apply rewriting rules: show the LLVM -O2 output (the check folds to constant true)
- pros and cons?
- search for proofs
- in the input space: check that the claim holds for every input; easier to automate, but bounded
- in the “rewriting” space: rewrite the claim down to true; harder to automate
- discussion on spec complexity
- what about a spec that pins down the little-endian byte layout?
- what about a spec that requires only memory safety?
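The lecture's le16.c is not reproduced here; the sketch below is a Python stand-in (function names and the big-endian counterexample are my own, not the course code). It checks spec v0 exhaustively and shows why v0 is too weak: a big-endian pair satisfies the same round-trip spec, so v0 cannot capture the little-endian byte layout.

```python
# Little-endian 16-bit serializer/deserializer (illustrative sketch).
def encode(n):
    # low byte first, then high byte
    return bytes([n & 0xff, (n >> 8) & 0xff])

def decode(b):
    return b[0] | (b[1] << 8)

# spec v0: forall 16-bit n, decode(encode(n)) == n
# exhaustive testing: the input space is small enough to enumerate
assert all(decode(encode(n)) == n for n in range(65536))

# what spec v0 misses: a big-endian implementation also satisfies it,
# so v0 says nothing about the byte layout on the wire
def encode_be(n):
    return bytes([(n >> 8) & 0xff, n & 0xff])

def decode_be(b):
    return (b[0] << 8) | b[1]

assert all(decode_be(encode_be(n)) == n for n in range(65536))
print("spec v0 holds for both little- and big-endian implementations")
```

A layout-pinning spec would instead state encode(n)[0] == n & 0xff and encode(n)[1] == n >> 8, which the big-endian version fails.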
FS verification
- recent efforts
- Cogent
- Flashix
- FSCQ
- what’s the proof effort?
- what’s a “correct” FS
- what kinds of bugs one can make (and which of them verification can prevent)
- the Yggdrasil paper focuses on two properties
- non-crash functional correctness
- crash safety
- verification techniques
- leverage SMT solvers
- challenge: SMT scalability (i.e., need to limit the SMT expression size)
- idea: crash refinement
- enable modularity: layers
- enable incremental verification: disk layouts, crash consistency models
- easy to translate into boolean expressions
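The crash-safety idea can be illustrated with a toy bounded check (a hypothetical model in Python, not Yggdrasil's code): enumerate every crash point during a journaled write, run recovery on each crashed disk state, and assert the consistency invariant that the data cell holds either the old or the new value, never a torn state.

```python
# Toy bounded crash-safety check: a 2-cell "disk" plus a commit flag,
# written via a journal-then-install protocol. All names are made up
# for illustration; writes to a single cell are assumed atomic.

def crash_states(ops, disk):
    # A crash may happen after any prefix of the disk writes;
    # return every disk state the crash could leave behind.
    states = [dict(disk)]
    d = dict(disk)
    for addr, val in ops:
        d[addr] = val
        states.append(dict(d))
    return states

def write_protocol(new):
    # journal the new value, set the commit flag, install, clear the flag
    return [("log", new), ("committed", 1), ("data", new), ("committed", 0)]

def recover(d):
    # redo the journaled write iff the commit flag was set at crash time
    if d["committed"] == 1:
        d["data"] = d["log"]
        d["committed"] = 0
    return d

def check(old, new):
    disk = {"data": old, "log": 0, "committed": 0}
    for s in crash_states(write_protocol(new), disk):
        r = recover(dict(s))
        # invariant: after recovery, data is the old or the new value
        assert r["data"] in (old, new), (s, r)
    return True

print(check(old=7, new=42))  # every crash point recovers consistently
```

An SMT-based tool explores the same space symbolically: the disk states and the invariant translate directly into boolean/bit-vector expressions, which is why keeping layers small (to bound expression size) matters.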
- discussions on some excellent answers - do you agree with them?
Since Yggdrasil cannot (or has trouble) reasoning about non-finite operations, it would be unable to verify the garbage collection system used in deleting files.
– Anonymous
If a Yggdrasil user has
an incorrect specification, Yggdrasil may pass the consistency invariants, yet
produce a file system that does not have correct behavior. Similarly, if the
consistency invariants are insufficient, e.g. trivially satisfiable such as an
empty set of invariants, then Yggdrasil probably won’t be able to fail the
given implementation for the given specification.
– Sean Wammer
If the upper layers are affected by
details of the implementation of lower levels that do not appear in the
specification, there could be an incompatibility because there are no
integration tests run with the actual implementations, only their
specifications. Suppose that layer 2 satisfies its specification, but has a
side effect that is not noticed by Yggdrasil. Then layer 3 will be tested based
on the specification of 2, and that side effect will be lost until you actually
run the code all together in a comprehensive integration test. The bug may
propagate up several layers as well, making it less likely that Yggdrasil
catches it.
– Sean Wammer