We are creating a Normalized Java Resource (NJR) that will speed up innovation in the area of software tools. Those tools include security enhancers, bug finders, and code synthesizers, all of which can benefit greatly from access to Big Code. Our vision is a diverse collection of 100,000 normalized Java projects that is executable, scriptable, and searchable. The Java projects stem from the Sourcerer collection, and we normalize their representation to enable large-scale processing with reproducible results. Such processing includes execution, static and dynamic analysis, scriptable interaction, and search for projects with specific dynamic characteristics. For each search of the collection, NJR returns both a file with Java projects and a container for a cloud service such as Amazon EC2. Thus, a researcher can run tools on those projects both locally and on a cloud service. Researchers will be both beneficiaries of and contributors to NJR. They benefit by searching for Java projects that fit their needs, and once their tools run on NJR, they contribute to an ever-increasing collection of measurements. Notice the powerful network effect: the more people run tools on NJR, the more data we get for search, and the more data we get for search, the more people will want to search and run tools on NJR.

Invited Talks


Call for Presentations

We welcome presentations on tools that may benefit from NJR, on efforts to create large collections of programs, and on experiences with existing collections.