Understanding data analysis workflows on spreadsheets: Roadblocks and opportunities

Authors

Pingjing Yang, Cheng Ti-Chung, Sajjadur Rahman, Mangesh Bendre, Karrie Karahalios, & Aditya Parameswaran

Abstract

Spreadsheets are widely used for data management and analysis by individuals and teams with varying degrees of programming expertise across a spectrum of domains.

While several papers have studied the prevalence of errors on spreadsheets and performed ethnographic studies on spreadsheet use, little is known about how spreadsheet users approach and address computational tasks on spreadsheets, especially on relatively large datasets.

To understand how users analyze data on spreadsheets, we conducted a study consisting of eight common analytical tasks, with thirty-two participants. Participants developed an execution strategy for each task and then attempted to operationalize this strategy within the spreadsheet system. From examining the study results and transcripts, we identified the successful and unsuccessful strategies participants adopted in addressing the tasks.

In general, we find that unsuccessful spreadsheet users had difficulties mapping spreadsheet models to their predetermined execution strategies, comprehending online help documents when trying to learn how to use new formulae, and identifying workarounds when confronted with roadblocks.

We identify opportunities to reduce barriers in computational task completion, including improvements to the spreadsheet interface and better training/educational methodologies and tools.

Sample

Sankey diagram of task progression and outcome

The figure shows a Sankey diagram summarizing how participants attempted a task.

Of the 24 participants, 6 gave an incorrect answer after carrying out their planned approach. Of those six, one switched to a different approach and reached the correct result, while five gave up.

We identified three typical flows for participants when attempting to address tasks:

Successful submission: participants completed a task successfully on the first attempt.
Refined successful submission: participants initially failed, but were able to refine their strategies to complete a task.
Unsuccessful submission: participants did not recover from a failure.
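
The counts reported for the sample task can be mapped onto these three flows. The following is a minimal sketch in Python, not the authors' analysis code. The function name `classify` and the tuple layout are hypothetical, and it assumes that the 18 participants not reported as incorrect succeeded on their first attempt, which the summary does not state explicitly.

```python
from collections import Counter


def classify(first_attempt_correct: bool, recovered: bool) -> str:
    """Map one participant's outcome onto one of the three flows."""
    if first_attempt_correct:
        return "successful"
    return "refined successful" if recovered else "unsuccessful"


# Counts from the sample task: 24 participants, 6 incorrect after their
# planned approach; 1 of those recovered with a different approach, 5 gave up.
# ASSUMPTION: the remaining 18 were correct on the first attempt.
outcomes = (
    [(True, False)] * 18    # correct on first attempt (assumed)
    + [(False, True)] * 1   # failed first, then recovered
    + [(False, False)] * 5  # failed and gave up
)

# Tally how many participants fall into each flow.
flows = Counter(classify(ok, rec) for ok, rec in outcomes)

for name, count in flows.items():
    print(f"{name}: {count}")
```

Under that assumption, the three tallies sum to the 24 participants, and the two failure flows together account for the 6 incorrect first attempts.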

Publication

Proceedings of the Workshop on Human-In-the-Loop Data Analytics (HILDA'20), June 2020

Full article

Understanding data analysis workflows on spreadsheets: Roadblocks and opportunities