Tue 6 Nov 2018 16:30 - 17:00 at Newbury - IV

Copying a code fragment and then reusing it by pasting and adapting (e.g., adding, modifying, or deleting statements) is a common practice in software development, resulting in a significant amount of duplicated code in software systems. Duplicated code, however, poses a number of threats to the maintenance of software systems; clones are the #1 “bad smell” in Fowler’s refactoring list. Software clones are thus considered to be one of the major contributors to the high cost of software maintenance, which could be up to 80% of the total software development cost. The era of Big Data has introduced new applications for clone detection. For example, clone detection has been used on large inter-project repositories to find similar mobile applications, to intelligently tag code snippets, and to identify code examples. The dual role of clones in software development and maintenance, along with these many emerging applications of clone detection, has led to a great many clone detection tools and analysis frameworks. In this talk, I will outline our experience in developing clone detection tools for large-scale inter-project code repositories that run even on a desktop machine with a standard hardware configuration. I will then talk about how we evaluate such large-scale clone detection tools using our BigCloneBench, a clone benchmark of more than eight million manually validated clone pairs in 25 thousand Java projects.

Displayed time zone: Guadalajara, Mexico City, Monterrey