
Authors

Rishabh Singh, Benjamin Livshits, & Benjamin Zorn

Abstract

Spreadsheets are widely used for financial and other types of important numerical computations.

Spreadsheet errors have accounted for hundreds of millions of dollars of financial losses, but tools for finding errors in spreadsheets are still quite primitive. At the same time, deep learning techniques have led to great advances in complex tasks such as speech and image recognition.

In this paper, we show that applying neural networks to spreadsheets allows us to find an important class of error with high precision. The specific errors we detect are cases where an author has placed a number where there should be a formula, such as in the row totaling the numbers in a column.

We use a spatial abstraction of the cells around a particular cell to build a classifier that predicts whether a cell should contain a formula whenever it contains a number.
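
To make the idea of a spatial abstraction concrete, here is a minimal Python sketch. It assumes a sheet is a 2D list of raw cell contents and replaces every cell in a square window around the target with a coarse type code. The type codes, the window radius, the masking of the center cell, and the function names are illustrative assumptions, not the encoding used in the paper.

```python
from typing import List, Optional, Union

# Assumed representation: None or "" is empty, a string starting with "="
# is a formula, an int/float is a number, anything else is text.
Cell = Optional[Union[str, int, float]]


def cell_type(value: Cell) -> str:
    """Map a raw cell value to a coarse type code (hypothetical encoding)."""
    if value is None or value == "":
        return "E"  # empty
    if isinstance(value, str) and value.startswith("="):
        return "F"  # formula
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "N"  # number
    return "S"      # string or other content


def neighborhood(sheet: List[List[Cell]], row: int, col: int,
                 radius: int = 1) -> List[str]:
    """Return type codes for the (2*radius+1)^2 window around (row, col).

    Codes are listed row by row. The center cell is masked as "?" because
    it is the cell whose type a classifier would predict, and positions
    outside the sheet are encoded as "X".
    """
    codes = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if r == row and c == col:
                codes.append("?")
            elif 0 <= r < len(sheet) and 0 <= c < len(sheet[r]):
                codes.append(cell_type(sheet[r][c]))
            else:
                codes.append("X")
    return codes
```

A feature vector of this kind could then be fed to any classifier. Because the center is masked, cells that already hold formulas can serve as training examples without manual labeling, which is one plausible reading of how an approach like this avoids labeled data.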

Our approach requires no labeled data and allows us to rapidly explore potential new classifiers to improve the effectiveness of the technique.

Our classifier has a low false positive rate and finds more than 150 real errors in a collection of 70 benchmark workbooks. We also applied our tool, Melford, to almost all of the financial spreadsheets in the EUSES corpus and within hours confirmed real errors that were previously unknown to us in 26 of the 696 workbooks.

We believe that applying neural networks to help individuals reason about the structure and content of spreadsheets has great potential.

Sample

Example of a number in place of a formula

In the example, the totals in the other columns are computed using SUM, but column D has a plain number in row 7 that, while numerically close, is not actually the sum of the values in the column.

We call such errors "number-where-formula-expected" (NWFE) errors. The job of the classifier is to identify cells like D7 and highlight them to the user.
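
The situation in the example can be reproduced with a small Python sketch. The sheet contents below are invented to mirror the description (SUM formulas in row 7 except for column D), and the detection rule is a deliberately simple hand-written stand-in, not the neural network classifier described above: it flags a number when formulas outnumber numbers in its row.

```python
# Hypothetical sheet mirroring the example; all values are invented.
# Rows are 1-indexed in spreadsheet terms, columns run A..D.
example_sheet = [
    ["Item",   "Q1",           "Q2",           "Q3"],
    ["Rent",   300,            310,            300],
    ["Power",  120,            115,            310],
    ["Water",  80,             85,             290],
    ["Phone",  60,             60,             305],
    ["Misc",   40,             45,             295],
    ["Total",  "=SUM(B2:B6)",  "=SUM(C2:C6)",  1498],  # D7: number, true sum is 1500
]


def is_formula(value) -> bool:
    return isinstance(value, str) and value.startswith("=")


def is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def nwfe_candidates(sheet):
    """Flag numeric cells in rows where formulas outnumber numbers.

    A simple rule-based baseline for illustration only. Column letters
    are limited to A..Z to keep the sketch short.
    """
    flagged = []
    for r, row in enumerate(sheet):
        formula_count = sum(is_formula(v) for v in row)
        number_cols = [c for c, v in enumerate(row) if is_number(v)]
        if formula_count > len(number_cols):
            flagged += [f"{chr(ord('A') + c)}{r + 1}" for c in number_cols]
    return flagged


print(nwfe_candidates(example_sheet))  # ['D7']
```

A fixed rule like this breaks down on less regular layouts, which is the kind of case where a classifier learned from the surrounding cells would be expected to do better.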

Publication

Microsoft Tech Report MSR-TR-2017-5, January 2017, pages 1-13

Full article

Melford: Using neural networks to find spreadsheet errors