Spreadsheet bibliography

Title Data clone detection and visualization in spreadsheets
Authors Felienne Hermans, Ben Sedee, Martin Pinzger, & Arie van Deursen
Year 2012
Type Article
Publication Delft University of Technology
Series Software Engineering Research Group, Technical Report Series

Spreadsheets are widely used in industry: it is estimated that end-user programmers outnumber programmers by a factor of 5. However, spreadsheets are error-prone, and numerous companies have lost money because of spreadsheet errors. One of the causes of spreadsheet problems is the prevalence of copy-pasting.

In this paper, we study this cloning in spreadsheets. Based on existing text-based clone detection algorithms, we have developed an algorithm to detect data clones in spreadsheets: formulas whose values are copied as plain text in a different location.
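To make the definition of a data clone concrete, the following minimal Python sketch flags plain-value cells whose value equals the computed value of a formula cell elsewhere. It is an illustration only and not the algorithm from the paper: the `Cell` class, the `find_data_clone_candidates` function, and the sample addresses are hypothetical, and exact matching of single values is a simplifying assumption. A single coinciding value is weak evidence of copying, so the results are candidates at best.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell model, assumed for this illustration."""
    address: str                   # e.g. "Sheet1!B4"
    value: float                   # displayed value (computed or typed)
    formula: Optional[str] = None  # formula text, or None for plain data


def find_data_clone_candidates(cells):
    """Map each formula cell's address to plain cells holding the same value."""
    # Index all plain-data cells by the value they contain.
    by_value = defaultdict(list)
    for cell in cells:
        if cell.formula is None:
            by_value[cell.value].append(cell.address)

    # A formula cell is a candidate source if its computed value
    # also appears somewhere as a plain (typed or pasted) value.
    candidates = {}
    for cell in cells:
        if cell.formula is not None and cell.value in by_value:
            candidates[cell.address] = by_value[cell.value]
    return candidates


cells = [
    Cell("Sheet1!B1", 10.0),
    Cell("Sheet1!B2", 20.0),
    Cell("Sheet1!B3", 30.0),
    Cell("Sheet1!B4", 60.0, "=SUM(B1:B3)"),
    Cell("Sheet2!A1", 60.0),  # plain value that matches the formula result
]
clones = find_data_clone_candidates(cells)
print(clones)
```

In this example the total computed in `Sheet1!B4` reappears as a plain number in `Sheet2!A1`, so that pair is reported as a candidate data clone. If the inputs in `B1:B3` later change, the pasted copy will not update, which is the kind of risk the abstract describes.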

To evaluate the usefulness of the proposed approach, we conducted two evaluations: a quantitative evaluation in which we analyzed the EUSES corpus, and a qualitative evaluation consisting of two case studies. The results of the evaluations clearly indicate that:

  • Data clones are common.
  • Data clones pose threats similar to those code clones pose.
  • Our approach supports users in finding and resolving data clones.

Full version Available

Clone detection pop-up
The clone detection pop-up shows the copy-paste dependency for our example. On the formula side, we show where the data is copied and on the data side, we indicate the source.