Onward! 2016
Sun 30 October - Fri 4 November 2016 Amsterdam, Netherlands
co-located with SPLASH 2016
Fri 4 Nov 2016 11:45 - 12:10 at Matterhorn 2 - Session 4 Chair(s): Veselin Raychev

Program similarity is a central challenge in many programming-related applications, such as code search, clone detection, automatic translation, and programming education.

We present a novel approach for establishing the similarity of code fragments by:
(i) obtaining textual descriptions of code fragments captured in millions of posts on question-answering sites, blogs and other sources, and
(ii) using natural language processing techniques to establish similarity between textual descriptions, and thus between their corresponding code fragments.

To improve precision, we use a simple static analysis that extracts type signatures, and combine the results of textual similarity with similarity of the signatures.
Because our notion of code similarity is based on similarity of textual descriptions, our approach can determine semantic relatedness and similarity of code across different libraries and even across different programming languages, a task considered extremely difficult using traditional approaches.
To evaluate our approach, we use data obtained from the popular question-answering site Stack Overflow. To obtain ground truth to compare against, we developed a crowdsourcing system, Like2Drops, that allows users to label the similarity of code fragments. We used the system to collect similarity classifications for a massive corpus of 6,500 program pairs. Our results show that our technique is effective in determining similarity, achieving more than 85 percent precision, recall, and accuracy.
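
As a rough illustration of combining textual similarity with signature similarity, the following Python sketch scores two code fragments from their natural language descriptions and their type signatures. It is not the authors' implementation: the abstract does not specify the NLP technique or how the scores are combined, so the bag-of-words cosine measure, the Jaccard overlap on signatures, the weighted average, and all function and parameter names (`cosine_similarity`, `signature_similarity`, `combined_similarity`, `text_weight`) are assumptions made for this example. It also assumes type names have already been normalized so they are comparable across libraries and languages.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(desc_a, desc_b):
    """Cosine similarity between bag-of-words vectors of two descriptions.

    Assumption: a plain bag-of-words model stands in for whatever NLP
    technique the paper actually uses.
    """
    a, b = Counter(tokenize(desc_a)), Counter(tokenize(desc_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def signature_similarity(sig_a, sig_b):
    """Jaccard overlap between two type signatures given as sets of type names.

    Assumption: two empty signatures are treated as identical (score 1.0).
    """
    if not sig_a and not sig_b:
        return 1.0
    return len(sig_a & sig_b) / len(sig_a | sig_b)


def combined_similarity(desc_a, sig_a, desc_b, sig_b, text_weight=0.7):
    """Weighted average of textual and signature similarity.

    The weight 0.7 is a hypothetical default, not a value from the paper.
    """
    text_score = cosine_similarity(desc_a, desc_b)
    sig_score = signature_similarity(set(sig_a), set(sig_b))
    return text_weight * text_score + (1.0 - text_weight) * sig_score


if __name__ == "__main__":
    # Hypothetical descriptions and normalized signatures for two fragments
    # written against different libraries or languages.
    score = combined_similarity(
        "How to read a text file line by line",
        ["string", "list"],
        "Read a file line by line into a list",
        ["string", "list"],
    )
    print(f"combined similarity: {score:.2f}")
```

Because the score depends only on the descriptions and the normalized signatures, the same function applies to fragments in different languages; a threshold on the combined score would then turn it into a similar/not-similar decision that can be compared against crowdsourced labels.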

Fri 4 Nov

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:30 - 12:10
Session 4: Onward! Papers at Matterhorn 2
Chair(s): Veselin Raychev ETH Zurich, Switzerland
10:30
25m
Talk
Exploring the Role of Sequential Computation in Distributed Systems: Motivating a Programming Paradigm Shift
Onward! Papers
Ivan Kuraj MIT CSAIL, USA, Daniel Jackson MIT
10:55
25m
Talk
Gramada: Immediacy in Programming Language Development
Onward! Papers
Patrick Rein Hasso Plattner Institute, Marcel Taeumel Hasso Plattner Institute, Robert Hirschfeld HPI
11:20
25m
Talk
Helping Johnny Encrypt: Toward Semantic Interfaces for Cryptographic Frameworks
Onward! Papers
Soumya Indela University of Maryland at College Park, Mukul Kulkarni University of Maryland at College Park, Kartik Nayak University of Maryland at College Park, Tudor Dumitras University of Maryland at College Park
11:45
25m
Talk
Leveraging a Corpus of Natural Language Descriptions for Program Similarity
Onward! Papers
Meital Zilberstein Technion, Eran Yahav Technion