P. Ferrara
Static analysis via abstract interpretation of multithreaded programs

PhD Thesis
Prepared at the Ecole Polytechnique and at the Computer Science Departement of the Università Ca' Foscari of Venice
Defended at Ecole Normale Superieure of Paris on May 22nd, 2009

Final version: Pdf (1.36 MB)
Slides of the PhD Defense: Pdf (926 KB)

Abstract:

The goal of this thesis is to present a generic static analysis of Java multithreaded programs.
Multithreaded programs execute many task, called threads, in parallel. Threads communicate through the shared memory implicitly, and they synchronize on monitors, wait-notify primitives, etc... Some years ago dual core architectures started being distributed on the broad market at low price. Today almost all the computers are at least dual core. Manycore, i.e. putting more and more cores on the same CPU, is now the current trend of CPU market. This multicore revolution yields to new challenges on the programming side too, asking the developers to implement multithreading programs. Multithreading is supported natively by the most common programming languages, e.g. Java and CSharp.
The goal of static analysis is to compute behavioral information about the executions of a program, in a safe and automatic way. An application of static analysis is the development of tools that help to debug programs. In the field of static analysis, many different approaches have been proposed. We will follow the framework of abstract interpretation, a mathematical theory that allows to define and soundly approximate semantics of programs. This methodology has been already applied to a wide set of programming languages.
The basic idea of generic analyzers is to develop a tool that can be plugged with different numerical domains and properties. During the last years many works addressed this issue, and they were successfully applied to debug industrial software. The strength of these analyzers is that the most part of the analysis can be re-used in order to check several properties. The use of different numerical domains allows to develop faster and less precise or slower and more precise analyses.

In this thesis, the design of a generic analyzer for multithreaded programs is presented.
First of all, we define the happens-before memory model (as an over-approximation of the Java memory model) in fixpoint form and we abstract it with a computable semantics. Memory models define which behaviors are allowed during the execution of a multithreaded program. Starting from the (informal) definition of the happens-before memory model, we define a semantics that builds up all the finite executions following this memory model. An execution of a multithreaded program is represented as a function that relates threads to traces of states. We show how to design a computable abstract semantics, and we prove the correctness of the resulting analysis, in a formal way.
Then we define and abstract a new property focused on the non-deterministic behaviors due to multithreading, e.g. the arbitrary interleaving during the execution of different threads. First of all, the determinism of a multithreaded program is defined as difference between executions. If two executions expose different behaviors because of values read from and written to the shared memory, then that program is not deterministic. We abstract it in two steps: in the first step we collect, for each thread, the (abstract) value that it may write into a given location of the shared memory. At the second level we summarize all the values written in parallel, while tracking the set of threads that may have written it. At the first level of abstraction, we introduce the new concept of weak determinism. We propose other ways in order to relax the deterministic property, namely by projecting traces and states, and we define a global hierarchy. We formally study how the presence of data races may afflict the determinism of the program.
We apply this theoretical framework to Java. In particular, we define a concrete semantics of bytecode language following its specification. Then we abstract it in order to track the information required by the analysis of multithreaded programs. The core is an alias analysis that approximates references in order to identify threads, to check the accesses to the shared memory, and to detect when two threads own a common monitor thereby inferring which parts of the code cannot be executed in parallel.
The generic analyzer described above has been fully implemented, leading to Checkmate, the first generic analyzer of Java multithreaded programs. We report and deeply study some experimental results. In particular, we analyze the precision of the analysis when applied to some common pattern of concurrent programming and some case studies, and its performances when applied to an incremental application and to a set of well-known benchmarks.
An additional contribution of the thesis is about the extension of an existing industrial generic analyzer, Clousot, to the checking of buffer overrun. It turns out that this analysis is scalable and precise. In summary, we present an application of an existing, industrial, and generic static analyzer to a property of practical interest, showing the strength of this approach in order to develop useful tools for developers.

Commission:

Bibtex:

@phdthesis{Ferr09,
  author = {Ferrara, Pietro},
  title = {Static analysis via abstract interpretation of multithreaded programs},
  school = {Ecole Polytechnique of Paris (France) and University "Ca' Foscari"
	of Venice (Italy)},
  year = {2009},
  month = {May},
}