
Authors

Thomas Schmitz & Dietmar Jannach

Abstract

Spreadsheet environments like MS Excel are the most widespread type of end-user software development tool, and spreadsheet-based applications can be found almost everywhere in organizations.

Since spreadsheets are prone to error, several approaches have been proposed in the research literature to help users locate formula errors. However, these methods were often designed based on assumptions about the nature of errors and were evaluated using artificially mutated versions of correct spreadsheets.

In this work, we propose a method and a tool to identify real-world formula errors in the Enron spreadsheet corpus. Our approach relies on heuristics that identify different versions of the same spreadsheet, and our software helps the user find those versions that we assume contain error corrections.
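The two steps mentioned above, grouping files that look like versions of one spreadsheet and then comparing their formulas, can be sketched as follows. This is a minimal illustration and not the authors' heuristics: it assumes that versions share a file name up to markers such as `v2`, `final` or `(1)`, and that the formulas of each version have already been extracted into a dictionary keyed by (sheet, cell). The names `base_name`, `group_versions` and `changed_formulas` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical version markers: "v2", "rev3", "final", "draft", "(1)",
# each delimited by a space, underscore, hyphen or the start/end of the name.
_VERSION_MARKER = re.compile(r"(?:^|[\s_\-])(?:v\d+|rev\d+|final|draft|\(\d+\))(?=$|[\s_\-])")


def base_name(filename: str) -> str:
    """Normalize a file name so that versions of one spreadsheet map to the same key."""
    stem = filename.rsplit(".", 1)[0].lower()  # drop the extension, ignore case
    stem = _VERSION_MARKER.sub("", stem)  # remove version markers
    return re.sub(r"[\s_\-]+", " ", stem).strip()  # unify separators


def group_versions(filenames: list[str]) -> dict[str, list[str]]:
    """Group file names by base name; keep only groups with two or more versions."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        groups[base_name(name)].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


def changed_formulas(old: dict, new: dict) -> dict:
    """Cells present in both versions whose formula text differs (candidate corrections)."""
    return {
        cell: (old[cell], new[cell])
        for cell in old.keys() & new.keys()
        if old[cell] != new[cell]
    }


print(group_versions(["Gas_Forecast_v1.xls", "Gas Forecast FINAL.xls", "budget.xls"]))
# → {'gas forecast': ['Gas Forecast FINAL.xls', 'Gas_Forecast_v1.xls']}

old = {("Sheet1", "C10"): "=SUM(C2:C8)", ("Sheet1", "D10"): "=SUM(D2:D9)"}
new = {("Sheet1", "C10"): "=SUM(C2:C9)", ("Sheet1", "D10"): "=SUM(D2:D9)",
       ("Sheet1", "E10"): "=SUM(E2:E9)"}
print(changed_formulas(old, new))
# → {('Sheet1', 'C10'): ('=SUM(C2:C8)', '=SUM(C2:C9)')}
```

A formula that changes between two versions (here `C10`, whose range is extended by one row) is only a candidate correction. Cells that exist in just one version (like `E10`) are ignored because they more likely reflect structural edits, and deciding whether a changed formula actually fixes an error still requires manual inspection.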

An initial manual inspection of a subset of such candidates led to the identification of more than two dozen formula errors.

We publicly share the new collection of real-world spreadsheet errors.

Sample

Analyzing differences of a spreadsheet

Tools and techniques for reducing model risk:

  • Model design review. Rapidly assess whether the model appears to be fit for the purpose intended and is built to an adequate standard.
  • Top level/analytical review. Review the model's "big picture" to detect potentially large errors.
  • Degree of integration and reconciliation of financial statement forecasts. Failure to properly integrate P&L, balance sheet and cash flow is a common error.
  • Parallel modelling. A re-performance technique either for the model as a whole or for key risk areas.
  • Flex and sensitivity testing. Review the reasonableness of sensitivity runs.
  • Macro review. Modellers are increasingly using more complex macros. We need to differentiate between low- and high-risk macros.
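Part of the integration and reconciliation review listed above can be automated with simple identity checks. The sketch below assumes a simplified model in which each forecast period reports a few summary figures; the `Period` fields, the tolerance `tol`, and the four identities tested (the balance sheet balances, retained earnings roll forward by net income less dividends, net cash flow reconciles to closing cash, and each period opens with the prior period's closing cash) are illustrative assumptions rather than a prescribed review procedure.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Summary figures for one forecast period (simplified, hypothetical model)."""
    net_income: float
    dividends: float
    opening_retained_earnings: float
    closing_retained_earnings: float
    opening_cash: float
    net_cash_flow: float
    closing_cash: float
    total_assets: float
    total_liabilities: float
    total_equity: float


def integration_errors(periods: list[Period], tol: float = 0.01) -> list[tuple[int, str]]:
    """Return (period index, message) for every failed integration check."""
    errors = []
    for i, p in enumerate(periods):
        # Balance sheet: assets must equal liabilities plus equity.
        if abs(p.total_assets - (p.total_liabilities + p.total_equity)) > tol:
            errors.append((i, "balance sheet does not balance"))
        # P&L to balance sheet: retained earnings roll forward by net income less dividends.
        expected_re = p.opening_retained_earnings + p.net_income - p.dividends
        if abs(expected_re - p.closing_retained_earnings) > tol:
            errors.append((i, "retained earnings do not roll forward"))
        # Cash flow to balance sheet: opening cash plus net cash flow gives closing cash.
        if abs(p.opening_cash + p.net_cash_flow - p.closing_cash) > tol:
            errors.append((i, "cash flow does not reconcile to closing cash"))
        # Period linkage: each period must open with the previous period's closing cash.
        if i > 0 and abs(periods[i - 1].closing_cash - p.opening_cash) > tol:
            errors.append((i, "opening cash differs from prior closing cash"))
    return errors


y1 = Period(net_income=10, dividends=2, opening_retained_earnings=50,
            closing_retained_earnings=58, opening_cash=20, net_cash_flow=5,
            closing_cash=25, total_assets=200, total_liabilities=142, total_equity=58)
y2 = Period(net_income=12, dividends=2, opening_retained_earnings=58,
            closing_retained_earnings=68, opening_cash=24, net_cash_flow=6,
            closing_cash=30, total_assets=210, total_liabilities=140, total_equity=68)
print(integration_errors([y1, y2]))
# → [(1, 'balance sheet does not balance'), (1, 'opening cash differs from prior closing cash')]
```

Checks like these only catch mechanical integration failures; whether the forecasts themselves are reasonable still has to be judged through the design, analytical and sensitivity reviews listed above.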

Publication

2016, IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), September

Full article

Finding errors in the Enron spreadsheet corpus