% KLEE

# symbolic execution week

- today: testing
- next: correct by construction
- take [507](http://courses.cs.washington.edu/courses/cse507/) if you want to learn more

# building block for modern sw

- symbolic computation: Mathematica, compilers
- breakthrough in SAT/SMT solving
- many products: MS Office ([FlashFill](https://support.office.com/en-us/article/Use-AutoFill-and-Flash-Fill-2e79a709-c814-4b27-8bc2-c4dc84d49464)), Visual Studio (IntelliTest, StaticDV), ...
- many tools: SymDrive driver testing (OSDI'12), Portend data race detection (ASPLOS'12), commutativity spec generation (SOSP'13), undefined behavior checker (SOSP'13), software dataplane verification (NSDI'14), Ironclad verification (OSDI'14), ...

# how to test systems code

- user/community testing
- standardized tests: SPEC, NIST crypto test vectors
- test suites by developers
- tool-generated tests

# challenges of test generation

- what spec to test against
- coverage
- comprehensible output
- environment modeling
- performance

# two examples

- OS kernel: complex input sources & state transitions
- compiler: deep pipeline
- [draw input & stages]
- q: use random bytes to test syscalls/gcc?

# problem: generate "useful" tests

- blackbox: infinite monkey theorem
- domain knowledge: Csmith, Trinity (syscalls), qemu img fuzzer
- whitebox (implementation knowledge): KLEE, SAGE, Catchconv
- [example: switch to Bill & James B.]

# optimizations

- algorithms & heuristics
- implementation: data structures (immutable/functional - recall RCU), caching, parallel checking, IR choice, symbolic-concrete trade-off, ...

# optimization: input equivalence

- observation: if two inputs cause a program to go through the same path (i.e., the same jumps), then testing both is a waste of time
- use symbolic execution to find distinctive inputs - how?

# approach A

- run with symbolic input
- invoke the scheduler at each branch
- scheduler: fork the world & prune infeasible branch
- q: how exactly does the fork-and-prune optimization work?


# approach B

- run with concrete input
- track symbolic branch conditions
- choose new input by flipping branch
- start over until no more new input
- q: how exactly does the optimization work?
- NB: there are many other approaches - check Rosette

# heuristics

- search space is huge & often cannot be exhausted
- prioritize important/likely bugs
- domain-specific heuristics
- q: what would you do to test kernels? distributed systems?

# demos

- [mini checker/fuzzer](https://github.com/xiw/mini-mc): test_me
- bad find-first-bit (ffs) impls
- optimized modulo
- q: bug finding vs verification?
- q: extend to support pointers/syscalls & more optimizations?