Really, that statistic needs to be repeated: 95% of spreadsheets have errors.
This is not just a random, made-up statistic. Extensive research into spreadsheets has consistently found that almost all spreadsheets have errors.
Large spreadsheets are more likely to have errors, meaning that they're almost certainly wrong.
Research and experience agree
Our assertion that 95% of spreadsheets have errors is based on:
- Academic audits of real spreadsheets and experiments in which users create spreadsheets. These studies typically produce high error rates – even for simple spreadsheets. For a summary of spreadsheet error rates across many studies, see What we know about spreadsheet errors.
- In practice, professional spreadsheet auditors say that they have never seen a major spreadsheet that was free of errors.
How can a 95% error rate be true?
It seems surprising that a spreadsheet almost certainly has errors. So, how does that happen?
Most people think that they are very accurate when doing most activities – and they're right. Research has shown that, for a wide range of cognitive tasks, humans make an error in only 1% to 5% of tasks.
For example, typical error rates for simple, non-trivial activities are:
- Type a short number: 1.0% (per number).
- Grammatical errors: 1.1% (of words).
- Simple arithmetic: 2.0% (of calculations).
- Software development: 3.7% (per line of code).
- Type 10 digits: 5.0% (per number).
The spreadsheet Cell Error Rate (CER)
Experiments in spreadsheet development have seen similar rates, with errors in 1% to 5% of cells – this is called the "Cell Error Rate" (CER).
Although a CER of 1% to 5% seems low, the cascading nature of spreadsheet calculations means that errors accumulate through the calculations down to the bottom-line results.
Most spreadsheets have hundreds or thousands of formulae. Even with a small probability of error in each formula, the accumulated probability that bottom-line results have errors is bound to be high. Even for a small spreadsheet, the accumulated probability of error tends towards 100%.
The probability that a spreadsheet has at least one error is shown in the following chart.
The chart assumes that each cell has the same independent probability of containing an error. For example, if a spreadsheet has only 100 used cells (which is small), and assuming a moderate cell error rate of 3%, then the probability of at least one error in the spreadsheet is about 95%. This aligns well with experience.
The more used cells a spreadsheet has, the more likely it is that there are many errors. Even with a low cell error rate of 1%, a spreadsheet that has 1,000 used cells has a 97% probability of having at least 5 errors, and the expected number of errors is 10.
In general, errors seem to occur in a few percent of all cells, meaning that for large spreadsheets, the issue is how many errors there are, not whether an error exists.
Panko, What we know about spreadsheet errors
For more information about the modelling of cascading spreadsheet errors and accumulated error probabilities, see Calculation cascade: A common cause of catastrophe.