Robust Parsing of Cloned Token Sequences

Rainer Koschke, Ole Jan Lars Riemann


Token-based clone detection techniques are known for their
scalability, high recall, and robustness against syntax errors and
incomplete code. However, they may yield clones that are
syntactically incomplete, and they provide very little information
about the syntactic structure of the clones they report. Hence, their
results cannot be used directly for automated refactoring or for
syntax-based relevance filtering.

This paper explores robust parsing techniques for analyzing code
fragments reported by token-based clone detectors, in order to determine
whether the clones are syntactically complete and which kinds of
syntactic elements they contain.

This knowledge can be used to improve the precision of token-based
clone detection.

