Authors
Chris Chambers
Abstract
We present the evolution of a reasoning system for inferring dimension information in spreadsheets. The three papers included in this thesis show how the initial system can be used to check the consistency of spreadsheet formulas and thus is able to detect errors in spreadsheets, and the evolution to a system that can check both label and dimension errors.
The approach for these systems is based on three static analysis components. First, the spatial structure of the spreadsheet is analyzed to infer the labels for specific cells. Second, those cells that are identified as labels are analyzed to determine dimension information. Once this is completed the system, will look at formulas and, using specific rules, will determine if the dimensions and labels are correct.
An important aspect of the rule system defining dimension inference is that it works bi-directionally, that is, not only "downstream" from referenced arguments to the current cell, but also "upstream" in the reverse direction. This flexibility makes the system robust and turns out to be particularly useful in cases when the initial dimension information that can be inferred from headers is incomplete or ambiguous.
These systems have been implemented as a add-in for Excel, and this prototype has allowed us to perform several evaluations on the systems. These evaluations show that the systems can be effective in detecting dimension errors, with the initial system detecting errors in 50% of the investigated spreadsheets, and the subsequent systems having similar success. In addition these evaluations show that by adding label checking, the effectiveness and efficiency of the system is improved with many previously undetected errors being found.
Sample
This table shows the types of dimensions found in the EUSES corpus spreadsheets. Money and time dimensions are, by far, the most common.
Publication
2010, Master's thesis, Oregon State University, June