Automatic Abstraction for Bit-Vectors using Decision Procedures
This dissertation is concerned with abstract interpretation of programs whose semantics is defined over finite machine words. Most notably, the considered class of programs contains executable binary code, the analysis of which turns out demanding due to the complexity and the sheer number of involved operations. Challenging for correct yet precise abstract interpretation of binary code are transfer functions, which simulate the execution of any concrete operation in a program in an abstract domain. Crucially for correctness, overand underflows need to be supported faithfully. This dissertation argues that transfer functions and abstractions for sequences of operations over finite machine words can precisely and efficiently be generated, which contrasts with classical methods that depend on handcrafted transfer functions. To support this statement, we present an approach that eliminates the time-consuming process of manually deriving transfer functions altogether. The core of our methods are specifications of the concrete semantics of sequences of operations, which are given in propositional Boolean logic. By utilizing SAT and SMT solvers, which can determine satisfiability of Boolean formulae, we show how to automatically synthesize optimal abstractions from such semantic specifications. The practicality of our method is highlighted using abstractions generated for a variety of numerical domains that are frequently used in abstract interpretation. The abstract domains considered in this dissertation are, most notably, intervals, value sets, octagons, convex polyhedra, arithmetical congruences, affine equalities, and polynomials of bounded degree. Importantly, all presented techniques automatically handle finiteness of machine words, which manifests itself in overand underflows. Once the analysis of a program has terminated, an abstract interpreter often emits a warning that highlights a potential error in the analyzed program. Since abstract interpretation computes an over-approximation of the states reachable in a concrete execution, such a warning may be spurious. For this setting, we present variations of our methods, which compute complete abstractions. Then, it is possible to provide guarantees about actually reachable states, which allows us to do both, identify a warning as spurious or generate a legitimate counterexample trace.
Jörg Brauer