Tue 6 Nov 2018 13:30 - 14:00 at Newbury - III

Research in program analysis is usually evaluated using specialized microbenchmarks or test suites, custom collections of real-world code, or established evaluation corpora such as the Qualitas Corpus, the XCorpus, or the DaCapo benchmark suite. In some cases, researchers are able to compare their approach directly to those of other researchers and can thus show improvement over a previously established baseline. Because this process establishes comparability between different approaches, or between different instances of an approach, it is considered desirable.

This is, however, not easily achieved. We as researchers use various methods and program corpora to evaluate our approaches, and reusing research artifacts such as the analysis itself or the evaluation results is hardly ever possible. The reasons are manifold: First, we all use different platforms and frameworks to drive our research, so input and output formats are not compatible. Second, our input and output data are sometimes not made available to the public. Third, our experiments are not repeatable, as their implementations are either not available or no longer runnable. And fourth, existing research to compare against is hard to find.

In the Delphi project we aim to mitigate these problems, which keep researchers from producing comparative evaluations in program analysis research. We will present our current solution for the normalization of input program sets, which helps researchers find representative input data for their analyses in a way that other researchers can repeat. We will also present our plans to extend the platform with features for better interconnectivity and findability of analysis implementations and research data. We believe these changes will foster the harmonization of output formats and the reuse of result data within the program analysis community.
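To make the idea of normalizing an input program set more concrete, the following is a minimal sketch, assuming per-project metrics such as class counts and lines of code are available. All project names, metric values, and the selection strategy (z-score normalization plus nearest-to-centroid) are hypothetical illustrations, not the Delphi platform's actual API or method.

    # Illustrative sketch only -- not Delphi's actual implementation.
    # Given hypothetical per-project metrics for a candidate corpus,
    # z-score-normalize the metrics and pick the project closest to the
    # corpus centroid as one "representative" input program.
    from statistics import mean, stdev

    corpus = {  # hypothetical metric values
        "project-a": {"classes": 120, "loc": 15000},
        "project-b": {"classes": 900, "loc": 210000},
        "project-c": {"classes": 150, "loc": 18500},
        "project-d": {"classes": 870, "loc": 195000},
    }
    metrics = ["classes", "loc"]
    stats = {m: (mean(p[m] for p in corpus.values()),
                 stdev(p[m] for p in corpus.values()))
             for m in metrics}

    def normalize(project):
        # z-scores make heterogeneous metrics comparable across projects
        return [(project[m] - stats[m][0]) / stats[m][1] for m in metrics]

    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # z-scores are centered at zero, so the corpus centroid is the zero vector
    centroid = [0.0] * len(metrics)
    representative = min(corpus,
                         key=lambda n: distance(normalize(corpus[n]), centroid))
    print("most representative project:", representative)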

Slides: Delphi.pdf (9.62 MiB)
