===========================================================================

Static analysis teaser

Consider the following code.

  x = 0;
  y = read_even_value();
  x = y+1;
  y = 2*x;
  x = y-2;
  y = x/2;

Assume all values are integers, there is no overflow, etc.

What do you know about the variable values at the end of execution?
For example, are they positive?  Are they even?  Do you know other facts?

The most accurate result is that y has the same value that was read from
input, and x is twice that.  We can determine this by doing symbolic
execution:  for each variable, determine an algebraic formula that
represents its value.  It follows that x and y are both even.

However, suppose that we ran the analysis with a simpler abstraction (a
simpler abstract domain), where each value is "even", "odd", or "unknown".
This abstraction is simpler and faster to compute, but it loses
information:  the final value for y is "unknown" instead of "even", because
half of an even number may be either even or odd.

The field of static analysis is primarily about choosing an appropriate
abstraction:  one that is simple enough for efficient computation, but
expressive enough to retain precision.

Abstract interpretation (analysis over an abstract domain such as the one
above) is useful in program analysis and optimization.  It has been used
for decades at Airbus to verify the safety of avionics systems.

----------------

Here is some jargon that you should know in order to be prepared for the
next class.  If you don't know it, please look it up or ask the course
staff a question about it.

Most important:
  AST (and distinction from a parse tree):  https://en.wikipedia.org/wiki/Abstract_syntax_tree
  control flow graph:  https://en.wikipedia.org/wiki/Control_flow_graph
  basic block:  https://en.wikipedia.org/wiki/Basic_block
  3-address form:  https://en.wikipedia.org/wiki/Three-address_code
  lattice:  https://en.wikipedia.org/wiki/Lattice_(order)
Less important:
  SSA (static single assignment form)
  SCC (strongly connected component)
  forward edges
  back edges
  cross edges
  spanning tree:  useful in profiling
  dominators:  useful in optimization
Not important:
  Scott domain
  Galois connection

===========================================================================

Teaser for lecture on Cousot & Cousot paper:

Give an explanation of the terminology -- a glossary

===========================================================================

Teaser for third lecture:

Why must the lub function be monotonic; that is, what might go wrong if lub
is not monotonic?
Give an example of a lub function that is not monotonic and that causes the
problem.  Using the same lattice and a lub function that is monotonic, show
that the problem does not occur.

----------------

Answer:  it is to guarantee that the analysis terminates (given a
finite-height lattice)

Example:

Lattice elements = Top, Bottom
lub =
  lub(T, T) = Bottom
  lub(T, Bot) = Top
  lub(Bot, T) = Top
  lub(Bot, Bot) = Top
Transfer functions:  none needed
code:
  x = input()
  loop:
  goto loop
or, equivalently,
  x = input()
  label:
  if (unanalyzable) goto label

The CFG looks like

    _   |
   / \  |
  |   v v
  |   join
  |    |
  |    v
  |   nop
   \_/ |
       v

The estimate for x starts out as Top, but on every iteration through the
loop the value at the join flip-flops between Top and Bottom, so the
analysis never terminates.

===========================================================================
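
The flip-flop in the third-lecture example can be reproduced with a few
lines of Python.  This is only a sketch under stated assumptions:  the
names (bad_lub, real_lub, analyze) are hypothetical, the back edge is
assumed to start at Bot, and the iteration cap max_iters exists only so
that the non-terminating case stops; a real analysis would have no such
cap.  The same loop is run once with the non-monotonic lub from the
example and once with the ordinary least upper bound on the same lattice.

```python
# Two-element lattice from the example: Bot is below Top.
BOT, TOP = "Bot", "Top"


def bad_lub(a, b):
    """Non-monotonic 'lub' from the example: (Top, Top) gives Bot, all else Top."""
    return BOT if (a, b) == (TOP, TOP) else TOP


def real_lub(a, b):
    """Ordinary least upper bound on {Bot, Top}; this one is monotonic."""
    return TOP if TOP in (a, b) else BOT


def analyze(lub, max_iters=10):
    """Iterate the join node of the loop CFG.

    Returns (converged, history), where history lists the estimate for x
    computed at the join on each visit.
    """
    entry = TOP   # x = input() leaves x unknown on the entry edge
    back = BOT    # assumption: the back edge has not been visited yet
    history = []
    for _ in range(max_iters):
        joined = lub(entry, back)
        history.append(joined)
        # nop has the identity transfer function, so the back edge
        # carries whatever the join produced.
        if joined == back:
            return True, history   # fixed point reached
        back = joined
    return False, history          # gave up: no fixed point within the cap


if __name__ == "__main__":
    print(analyze(bad_lub, 6))
    print(analyze(real_lub))
```

With bad_lub the history alternates Top, Bot, Top, Bot, ... until the cap
is hit.  With real_lub the join yields Top on the first visit and Top again
on the second, so the loop stops at a fixed point after two visits.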