Spreadsheet bibliography

Title: Understanding data analysis workflows on spreadsheets: Roadblocks and opportunities
Authors: Pingjing Yang, Cheng Ti-Chung, Sajjadur Rahman, Mangesh Bendre, Karrie Karahalios, & Aditya Parameswaran
Year: 2020
Type: Proceedings
Publication: Proceedings of Workshop on Human-In-the-Loop Data Analytics (HILDA'20)
Series: June

Spreadsheets are widely used for data management and analysis by individuals and teams with varying degrees of programming expertise across a spectrum of domains.

While several papers have studied the prevalence of errors on spreadsheets and performed ethnographic studies on spreadsheet use, little is known about how spreadsheet users approach and address computational tasks on spreadsheets, especially on relatively large datasets.

To understand how users analyze data on spreadsheets, we conducted a study consisting of eight common analytical tasks, with thirty-two participants. Participants developed an execution strategy for each task and then attempted to operationalize this strategy within the spreadsheet system. From examining the study results and transcripts, we identified the successful and unsuccessful strategies participants adopted in addressing the tasks.

In general, we find that unsuccessful spreadsheet users had difficulties mapping spreadsheet models to their predetermined execution strategies, comprehending online help documents when trying to learn how to use new formulae, and identifying workarounds when confronted with roadblocks.

We identify opportunities to reduce barriers in computational task completion, including improvements to the spreadsheet interface and better training/educational methodologies and tools.

Full version: Available
Sankey diagram of task progression and outcome

The figure shows a Sankey diagram summarizing how participants attempted a task.

Out of 24 participants, 6 gave an incorrect answer after carrying out their planned approach. Of these, one then used a different approach to reach the correct result, while the other five gave up.

We identified three typical flows for participants when attempting to address tasks:

  • Successful submission: participants completed the task on the first attempt.
  • Refined successful submission: participants initially failed but refined their strategy and then completed the task.
  • Unsuccessful submission: participants did not recover from a failure.
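
The counts quoted for the figure can be checked with a small tally. The sketch below is only an illustration: the node names and the `links` layout are assumptions made for this example, not the labels used in the actual diagram, and only the numbers stated in the text (24, 6, 1, 5) are taken from the source. The text does not break down the remaining participants, so the code reports them as a single remainder without assigning them to a flow.

```python
# Numbers come from the figure description; node names are hypothetical.
TOTAL_PARTICIPANTS = 24

# Each link is (source node, target node, number of participants).
links = [
    ("planned approach", "incorrect answer", 6),
    ("incorrect answer", "different approach, correct result", 1),
    ("incorrect answer", "gave up", 5),
]


def inflow(node):
    """Participants arriving at a node."""
    return sum(count for _, target, count in links if target == node)


def outflow(node):
    """Participants leaving a node."""
    return sum(count for source, _, count in links if source == node)


# In a Sankey diagram, what flows into an intermediate node must flow out.
assert inflow("incorrect answer") == outflow("incorrect answer")

# Participants who did not end up on the "incorrect answer" branch.
# The text does not say how this remainder splits, so it stays one number.
remaining = TOTAL_PARTICIPANTS - inflow("incorrect answer")

refined_successful = inflow("different approach, correct result")
unsuccessful = inflow("gave up")

print("recovered after an incorrect answer:", refined_successful)
print("gave up after an incorrect answer:", unsuccessful)
print("not on the incorrect-answer branch:", remaining)
```

Under these assumptions, the participant who recovered corresponds to the refined successful flow and the five who gave up correspond to the unsuccessful flow.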