
Authors

Wensheng Dou, Shing-Chi Cheung, Chushu Gao, Chang Xu, Liang Xu, & Jun Wei

Abstract

Spreadsheets are widely used by end users for various business tasks, such as data analysis and financial reporting.

End users may perform similar tasks by cloning a block of cells (a table) in their spreadsheets. The corresponding cells in these cloned tables are supposed to keep the same or similar computational semantics. However, as spreadsheets evolve, these cloned tables can become inconsistent through ad-hoc modifications and, as a result, suffer from smells.

In this paper, we propose TableCheck to detect table clones and the smells caused by inconsistencies among them. We observe that two tables with the same header information at their corresponding cells are likely to be table clones.

Inspired by existing fingerprint-based code clone detection techniques, we developed an algorithm to detect this kind of table clone. We then identified outliers among corresponding cells in the detected table clones as smells.
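The page does not give TableCheck's algorithm in detail, so the following is only a minimal sketch of header-based, fingerprint-style clone grouping under stated assumptions: candidate table blocks are already known (table boundary detection is not shown), each block's first row holds column headers and its first column holds row headers, and a SHA-1 hash of the (row header, column header) sequence serves as the fingerprint. The names `Block`, `header_pairs`, `fingerprint`, and `find_table_clones` are hypothetical.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass

# A sheet is modeled as a sparse dict: (row, col) -> cell content.
Sheet = dict[tuple[int, int], object]


@dataclass(frozen=True)
class Block:
    """A hypothetical candidate table: a rectangle of cells with inclusive bounds."""
    top: int
    left: int
    bottom: int
    right: int


def header_pairs(sheet: Sheet, block: Block) -> list[tuple[str, str]]:
    """Return (row header, column header) for every data cell, in row-major order.

    Assumes the first column holds row headers and the first row holds column
    headers; the top-left corner cell (e.g. a year label) is ignored.
    """
    pairs = []
    for r in range(block.top + 1, block.bottom + 1):
        row_header = str(sheet.get((r, block.left), ""))
        for c in range(block.left + 1, block.right + 1):
            col_header = str(sheet.get((block.top, c), ""))
            pairs.append((row_header, col_header))
    return pairs


def fingerprint(sheet: Sheet, block: Block) -> str:
    """Hash the header layout so that blocks can be grouped in a single pass."""
    text = "\x1f".join(f"{rh}\x1e{ch}" for rh, ch in header_pairs(sheet, block))
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_table_clones(sheet: Sheet, blocks: list[Block]) -> list[list[Block]]:
    """Group blocks with identical header fingerprints; groups of 2+ are clones."""
    groups: dict[str, list[Block]] = defaultdict(list)
    for block in blocks:
        groups[fingerprint(sheet, block)].append(block)
    return [group for group in groups.values() if len(group) > 1]


# Two yearly summaries share the same headers; the staff table does not.
sheet: Sheet = {
    (0, 0): "2015", (0, 1): "Q1", (0, 2): "Q2",
    (1, 0): "Sales", (1, 1): 100, (1, 2): 120,
    (2, 0): "Cost", (2, 1): 80, (2, 2): 90,
    (4, 0): "2016", (4, 1): "Q1", (4, 2): "Q2",
    (5, 0): "Sales", (5, 1): 110, (5, 2): 130,
    (6, 0): "Cost", (6, 1): 85, (6, 2): 95,
    (8, 0): "Staff", (8, 1): "Mon",
    (9, 0): "Alice", (9, 1): 8,
}
blocks = [Block(0, 0, 2, 2), Block(4, 0, 6, 2), Block(8, 0, 9, 1)]
clones = find_table_clones(sheet, blocks)
print(clones)  # the 2015 and 2016 blocks form one clone group
```

Hashing the header sequence mirrors the idea behind fingerprint-based code clone detection: blocks are bucketed by a compact key instead of being compared pairwise. A fuller implementation would likely confirm that grouped blocks really share their headers rather than relying on the hash alone.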

We implemented our approach in TableCheck and applied it to real-world spreadsheets from the EUSES corpus.

Experimental results show that table clones commonly exist (21.8%), and 25.6% of the spreadsheets with table clones suffer from smells due to inconsistency among these clones.

TableCheck detected table clones and their smells with precisions of 92.2% and 85.5%, respectively, while existing techniques detected no more than 35.6% of the true smells that TableCheck could detect.

Sample

Table clones in a spreadsheet

TableCheck is based on the observation that two tables (blocks of cells) whose corresponding cells have the same row and column headers are likely to share the same computational semantics, and are therefore likely to be table clones.

To detect smells among grouped table clones, we analyze possible inconsistencies among corresponding cells in these table clones and mark the outliers among them as smells.
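The page does not state TableCheck's exact outlier criterion, so the sketch below substitutes a simple strict-majority vote: for each relative cell position across a group of clones, any cell whose content differs from a strict majority is reported as a smell. It assumes cell contents were already normalized so that corresponding cells are directly comparable (for example, formulas rewritten in relative R1C1 style); that preprocessing step and the name `find_smells` are hypothetical.

```python
from collections import Counter

# A table clone is modeled as a dict: relative (row, col) -> normalized content.
# Normalization (e.g. formulas rewritten in relative R1C1 style) is assumed to
# have happened already, so corresponding cells can be compared with ==.
Table = dict[tuple[int, int], str]


def find_smells(clones: list[Table]) -> list[tuple[int, tuple[int, int]]]:
    """Return (clone index, relative position) for cells that disagree with a strict majority."""
    smells = []
    positions = set().union(*(table.keys() for table in clones))
    for pos in sorted(positions):
        # A missing cell is treated as None and can itself be an outlier.
        contents = [table.get(pos) for table in clones]
        majority, freq = Counter(contents).most_common(1)[0]
        # Without a strict majority we cannot tell which variant is the odd one out.
        if freq * 2 <= len(contents):
            continue
        for index, content in enumerate(contents):
            if content != majority:
                smells.append((index, pos))
    return smells


total = "=SUM(R[-2]C:R[-1]C)"
clones = [
    {(0, 0): total, (0, 1): total},
    {(0, 0): total, (0, 1): total},
    {(0, 0): total, (0, 1): "350"},  # hard-coded value where the others use a formula
]
smells = find_smells(clones)
print(smells)  # the hard-coded cell in the third clone is flagged
```

A majority vote needs at least three clones to single out one cell; with only two disagreeing clones, this sketch reports nothing rather than guessing which copy is wrong.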

In the sample, cells marked with a red triangle in their corner are smelly.

Publication

24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, November 2016, pages 787-798

Full article

Detecting table clones and smells in spreadsheets