Lecture: verification
preparation
administrivia
- lab X
- you are encouraged to work in pairs; how to split the work is entirely up to you
- everyone needs to submit a report
- no late days
- if you want to do a demo on Friday, let us know before Wednesday 11am
- final exam
- similar to last year’s final (solution)
- focus on JOS and xv6; open book, open laptop, no communication
- also cover guest lectures and demos (see last year’s final)
history
- some examples of efforts
- compilers
- 1967: Correctness of a compiler for arithmetical expressions, McCarthy & Painter
- 1972: Proving compiler correctness in a mechanized logic, Milner & Weyhrauch
- 2009: CompCert
- OS kernels
- 197x-1980: UCLA Unix security kernel - finished 90%+ of the spec but under 20% of the proof
- 198x: Kit
- 2009: seL4
- what does verification provide
- a mechanical proof that the impl “meets” the spec
- assume a correct proof checker
- cost
- verification effort, run-time performance, compatibility, learning curve
- seL4: “about 25–30 person years, to do this again it would be about 10 person years”
- recent advances in SMT solving
- recall the lecture on KLEE
- still requires substantial human effort
- Ironclad: 3 person-years for 6500 lines of implementation code
- general questions
- what’s the TCB (trusted computing base): spec, tools, environment
- is the spec reasonable
- strong enough to prevent bugs
- simple enough for human review
- is the proof effort reasonable
- should we use verification in daily software development
- economic issues & human factors
- is the performance reasonable
- is the impl good for practical use (or too trivial, simplified to ease verification)
- is it possible for verification to catch up with implementation
overview
- example: develop a little-endian serializer/deserializer for 16-bit integers
- encode: n → bytes
- decode: bytes → n
- what’s the specification?
- spec v0: forall 16-bit integer n, decode(encode(n)) == n
- what bugs cannot be captured?
- is this good enough?
- verification: check if a given implementation meets the spec
- code: le16.c
- exhaustive testing: check the condition for n from 0 to 65535
- apply rewriting rules: show the LLVM -O2 output (the check folds to constant true)
- pros and cons?
- search for proofs
- in the input space: check that the claim holds for every input; easier to automate, but bounded
- in the “rewriting” space: rewrite the claim down to true; harder to automate
- discussion on spec complexity
- what about a spec that pins down the little-endian byte layout?
- what about a spec that requires only memory safety?
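The lecture's le16.c is not reproduced here; the sketch below is a Python stand-in (function names and the big-endian counterexample are my own, not the course code). It checks spec v0 exhaustively and shows why v0 is too weak: a big-endian pair satisfies the same round-trip spec, so v0 cannot capture the little-endian byte layout.

```python
# Little-endian 16-bit serializer/deserializer (illustrative sketch).
def encode(n):
    # low byte first, then high byte
    return bytes([n & 0xff, (n >> 8) & 0xff])

def decode(b):
    return b[0] | (b[1] << 8)

# spec v0: forall 16-bit n, decode(encode(n)) == n
# exhaustive testing: the input space is small enough to enumerate
assert all(decode(encode(n)) == n for n in range(65536))

# what spec v0 misses: a big-endian implementation also satisfies it,
# so v0 says nothing about the byte layout on the wire
def encode_be(n):
    return bytes([(n >> 8) & 0xff, n & 0xff])

def decode_be(b):
    return (b[0] << 8) | b[1]

assert all(decode_be(encode_be(n)) == n for n in range(65536))
print("spec v0 holds for both little- and big-endian implementations")
```

A layout-pinning spec would instead state encode(n)[0] == n & 0xff and encode(n)[1] == n >> 8, which the big-endian version fails.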
FS verification
- recent efforts
- Cogent
- Flashix
- FSCQ
- what’s the proof effort?
- what’s a “correct” FS
- what kinds of bugs one can make (and which of them verification can prevent)
- the Yggdrasil paper focuses on two properties
- non-crash functional correctness
- crash safety
- verification techniques
- leverage SMT solvers
- challenge: SMT scalability (i.e., need to limit the SMT expression size)
- idea: crash refinement
- enable modularity: layers
- enable incremental verification: disk layouts, crash consistency models
- easy to translate into boolean expressions
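The crash-safety idea can be illustrated with a toy bounded check (a hypothetical model in Python, not Yggdrasil's code): enumerate every crash point during a journaled write, run recovery on each crashed disk state, and assert the consistency invariant that the data cell holds either the old or the new value, never a torn state.

```python
# Toy bounded crash-safety check: a 2-cell "disk" plus a commit flag,
# written via a journal-then-install protocol. All names are made up
# for illustration; writes to a single cell are assumed atomic.

def crash_states(ops, disk):
    # A crash may happen after any prefix of the disk writes;
    # return every disk state the crash could leave behind.
    states = [dict(disk)]
    d = dict(disk)
    for addr, val in ops:
        d[addr] = val
        states.append(dict(d))
    return states

def write_protocol(new):
    # journal the new value, set the commit flag, install, clear the flag
    return [("log", new), ("committed", 1), ("data", new), ("committed", 0)]

def recover(d):
    # redo the journaled write iff the commit flag was set at crash time
    if d["committed"] == 1:
        d["data"] = d["log"]
        d["committed"] = 0
    return d

def check(old, new):
    disk = {"data": old, "log": 0, "committed": 0}
    for s in crash_states(write_protocol(new), disk):
        r = recover(dict(s))
        # invariant: after recovery, data is the old or the new value
        assert r["data"] in (old, new), (s, r)
    return True

print(check(old=7, new=42))  # every crash point recovers consistently
```

An SMT-based tool explores the same space symbolically: the disk states and the invariant translate directly into boolean/bit-vector expressions, which is why keeping layers small (to bound expression size) matters.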
- discussions on some excellent answers - do you agree with them?
Since Yggdrasil cannot (or has trouble) reasoning about non-finite operations, it would be unable to verify the garbage collection system used in deleting files.
– Anonymous
If a Yggdrasil user has
an incorrect specification, Yggdrasil may pass the consistency invariants, yet
produce a file system that does not have correct behavior. Similarly, if the
consistency invariants are insufficient, e.g. trivially satisfiable such as an
empty set of invariants, then Yggdrasil probably won’t be able to fail the
given implementation for the given specification.
– Sean Wammer
If the upper layers are affected by
details of the implementation of lower levels that do not appear in the
specification, there could be an incompatibility because there are no
integration tests run with the actual implementations, only their
specifications. Suppose that layer 2 satisfies its specification, but has a
side effect that is not noticed by Yggdrasil. Then layer 3 will be tested based
on the specification of 2, and that side effect will be lost until you actually
run the code all together in a comprehensive integration test. The bug may
propagate up several layers as well, making it less likely that Yggdrasil
catches it.
– Sean Wammer