About Metrics for Clone Detection

Thierry Lavoie, Ettore Merlo


Clone detectors rely on similarity and dissimilarity measures to identify cloned fragments. The choice of a specific distance function in a clone detector is, to some extent, arbitrary. However, a deeper understanding of similarity measures lets us constrain this choice to functions with properties that can help improve the scalability and quality of tools. This paper presents results, insights, and open questions about similarity and dissimilarity measures, including a somewhat counter-intuitive result on the cosine distance.
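One property that matters for scalability is whether a distance is a true metric. The triangle inequality in particular is what allows metric index structures to prune comparisons. The sketch below is a minimal illustration under stated assumptions. It is not a reproduction of this paper's result. It checks the triangle inequality for the commonly used cosine distance 1 - cos(θ) and for the angular distance θ/π on three 2-D vectors. The vectors and all function names (`cosine_distance`, `angular_distance`, `violates_triangle`) are hypothetical choices made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """The common 'cosine distance' 1 - cos(theta); not a metric in general."""
    return 1.0 - cosine_similarity(u, v)

def angular_distance(u, v):
    """Angle between u and v normalized to [0, 1]; satisfies the triangle inequality."""
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    c = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(c) / math.pi

def violates_triangle(d, a, b, c, eps=1e-12):
    """True if d(a, c) > d(a, b) + d(b, c), allowing a small float tolerance."""
    return d(a, c) > d(a, b) + d(b, c) + eps

# Hypothetical vectors at 0, 45, and 90 degrees.
a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)

print(violates_triangle(cosine_distance, a, b, c))   # 1 > 0.29 + 0.29
print(violates_triangle(angular_distance, a, b, c))  # 0.5 vs 0.25 + 0.25
```

With these vectors, the cosine distance from `a` to `c` is 1. The two intermediate hops sum to about 0.586, so the first check reports a violation. The angular distance meets the triangle inequality with equality, so the second check reports none.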

