Spreadsheet bibliography

Title: A grammar for spreadsheet formulas evaluated on two large datasets
Authors: Efthimia Aivaloglou, David Hoepelman, & Felienne Hermans
Year: 2015
Type: Proceedings
Publication: 15th IEEE International Working Conference on Source Code Analysis and Manipulation
Series: September
Abstract

Spreadsheets are ubiquitous in industry and, across many domains, often play a role similar to that of other computer programs. However, no reliable grammar exists that is concise enough to facilitate research on spreadsheet formula code bases.

This paper presents a grammar for spreadsheet formulas that is compatible, is compact enough to feasibly implement with a parser generator, and produces parse trees suited for further manipulation and analysis.
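
To make "a grammar that produces parse trees" concrete, here is a minimal recursive-descent parser for a deliberately tiny subset of formula syntax. It is a sketch under stated assumptions, not the paper's grammar: the toy rules, the tuple-based tree shape, and the names `tokenize`, `Parser`, and `parse` are all hypothetical, only uppercase names and A1-style references are accepted, and strings, booleans, comparisons, and sheet references are left out. A parser generator would derive equivalent code from the grammar; writing it by hand here just makes the precedence levels visible.

```python
import re

# Hypothetical toy grammar (NOT the paper's grammar), one method per rule:
#   formula := "=" expr
#   expr    := term (("+" | "-") term)*
#   term    := power (("*" | "/") power)*
#   power   := unary ("^" unary)*          # left-associative, as in Excel
#   unary   := "-" unary | primary
#   primary := NUMBER | CELL [":" CELL] | NAME "(" [expr ("," expr)*] ")" | "(" expr ")"
TOKEN_RE = re.compile(
    r"\s*(?:"
    r"(?P<func>[A-Z][A-Z0-9.]*)(?=\()"    # function name: only when followed by "("
    r"|(?P<cell>\$?[A-Z]{1,3}\$?[0-9]+)"  # A1-style reference, optionally absolute
    r"|(?P<num>[0-9]+(?:\.[0-9]+)?)"      # numeric literal
    r"|(?P<op>[-+*/^(),:=])"              # operators and punctuation
    r")"
)


def tokenize(text):
    """Split a formula into (kind, text) pairs; raise on unknown input."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}: {text[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self, value=None):
        kind, val = self.peek()
        if kind is None or (value is not None and val != value):
            raise SyntaxError(f"expected {value or 'more input'!r}, got {val!r}")
        self.i += 1
        return kind, val

    def formula(self):
        self.take("=")
        tree = self.expr()
        if self.i != len(self.tokens):
            raise SyntaxError(f"trailing input: {self.peek()[1]!r}")
        return tree

    def binary(self, operand, ops):
        """Parse a left-associative chain: operand (op operand)*."""
        left = operand()
        while self.peek()[0] == "op" and self.peek()[1] in ops:
            _, op = self.take()
            left = (op, left, operand())
        return left

    def expr(self):
        return self.binary(self.term, {"+", "-"})

    def term(self):
        return self.binary(self.power, {"*", "/"})

    def power(self):
        return self.binary(self.unary, {"^"})

    def unary(self):
        if self.peek() == ("op", "-"):
            self.take()
            return ("neg", self.unary())
        return self.primary()

    def primary(self):
        kind, val = self.take()
        if kind == "num":
            return ("num", float(val))
        if kind == "cell":
            if self.peek() == ("op", ":"):
                self.take()
                end_kind, end = self.take()
                if end_kind != "cell":
                    raise SyntaxError(f"expected a cell after ':', got {end!r}")
                return ("range", val, end)
            return ("cell", val)
        if kind == "func":
            self.take("(")
            args = []
            if self.peek() != ("op", ")"):
                args.append(self.expr())
                while self.peek() == ("op", ","):
                    self.take()
                    args.append(self.expr())
            self.take(")")
            return ("call", val, args)
        if val == "(":
            inner = self.expr()
            self.take(")")
            return inner
        raise SyntaxError(f"unexpected token {val!r}")


def parse(text):
    return Parser(tokenize(text)).formula()


if __name__ == "__main__":
    print(parse("=SUM(A1:B2)*-2"))
    # ('*', ('call', 'SUM', [('range', 'A1', 'B2')]), ('neg', ('num', 2.0)))
```

Each precedence level becomes one method, so the nesting of the tree mirrors operator precedence. Because negation is handled below `^`, `=-2^2` parses as `(-2)^2`, which matches how Excel evaluates it.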

We evaluate the grammar against more than one million unique formulas extracted from the well-known EUSES and Enron spreadsheet datasets, successfully parsing 99.99% of them. Additionally, we use the grammar to analyze these datasets and to measure how frequently individual language features are used in spreadsheet formulas.
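
A feature-frequency measurement of this kind can be approximated, very roughly, with a lexical scan. This sketch is not the paper's method: it uses regular expressions rather than parse trees, so it also matches text inside quoted string literals, and the feature list, the patterns, and the `feature_usage` function are hypothetical. It deduplicates its input first, mirroring the focus on unique formulas.

```python
import re
from collections import Counter

# Hypothetical feature patterns. This is a lexical approximation only: unlike a
# parse-tree analysis, it also matches text inside quoted string literals.
FEATURES = {
    "function call": re.compile(r"\b[A-Z][A-Z0-9.]*\("),
    "range": re.compile(r"\$?[A-Z]{1,3}\$?[0-9]+:\$?[A-Z]{1,3}\$?[0-9]+"),
    "absolute reference": re.compile(r"\$[A-Z]{1,3}\$?[0-9]+|[A-Z]{1,3}\$[0-9]+"),
    "sheet reference": re.compile(r"!\$?[A-Z]{1,3}\$?[0-9]+"),
}
FUNCTION_NAME = re.compile(r"\b([A-Z][A-Z0-9.]*)\(")


def feature_usage(formulas):
    """Return (share of unique formulas using each feature, function-name counts)."""
    unique = set(formulas)  # count each distinct formula once
    per_feature = Counter()
    functions = Counter()
    for formula in unique:
        for name, pattern in FEATURES.items():
            if pattern.search(formula):
                per_feature[name] += 1
        functions.update(FUNCTION_NAME.findall(formula))
    total = max(len(unique), 1)  # avoid division by zero on empty input
    shares = {name: per_feature[name] / total for name in FEATURES}
    return shares, functions


if __name__ == "__main__":
    corpus = [
        "=SUM(A1:A10)",
        "=A1+B1",
        "=SUM(A1:A10)",  # duplicate, counted once
        "=IF(A1>0,VLOOKUP(A1,Sheet2!$A$1:$B$9,2,FALSE),0)",
    ]
    shares, functions = feature_usage(corpus)
    for name, share in shares.items():
        print(f"{name:20s} {share:.0%}")
    print(functions.most_common())
```

Counting over parse trees instead, as a full grammar makes possible, avoids false matches such as a function-like pattern inside a quoted string and distinguishes constructs that look alike at the character level.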

Finally, we identify smelly constructs and edge cases in the syntax of formulas.

Sample
Grammar for Formula
This is the syntax diagram of the Formula production rule, with most of its production rules expanded.