Characterizing Scalability Issues in Spreadsheet Software using Online Forums


Spreadsheets are a ubiquitous tool for storing and querying data. However, as our society produces more data, spreadsheets become unusable, sometimes hanging for minutes and other times crashing. We call the usability issues that arise in spreadsheets from large amounts of data scalability issues. To better understand how these issues develop as data grows, we seek out complaints from users, with the goal of informing the creation of better spreadsheet software for large quantities of data.

Our Approach

Figure: The 5 main themes and their subthemes.

We used Reddit's API to scrape posts from the Microsoft Excel subreddit. We first scraped random posts and separated them into themes. We then searched the forum for terms related to scalability issues, such as "slow" and "crash", and categorized the resulting posts.
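The keyword search step can be illustrated with a minimal sketch. It is not the authors' pipeline: the subreddit name `excel`, the use of Reddit's public JSON search listing, the prefix-matching rule, and the function names are all assumptions made for illustration. Only the terms "slow" and "crash" come from the text above, and sorting posts into themes was a separate step that this sketch does not attempt.

```python
import re
from urllib.parse import urlencode

# Only "slow" and "crash" are named in the text; any longer list would be an assumption.
SEARCH_TERMS = ("slow", "crash")


def build_search_url(term, subreddit="excel", limit=100):
    """Build a URL for Reddit's public JSON search listing.

    The subreddit name and the choice of this endpoint are assumptions.
    """
    query = urlencode({"q": term, "restrict_sr": 1, "limit": limit})
    return f"https://www.reddit.com/r/{subreddit}/search.json?{query}"


def matching_terms(text, terms=SEARCH_TERMS):
    """Return the terms that start a word in text, so "crash" also matches "crashes"."""
    lowered = text.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t), lowered)]


def flag_posts(posts, terms=SEARCH_TERMS):
    """Keep posts whose title or body mentions at least one search term."""
    flagged = []
    for post in posts:
        text = post["title"] + " " + post.get("selftext", "")
        hits = matching_terms(text, terms)
        if hits:
            flagged.append((post["title"], hits))
    return flagged


# Made-up example posts, used only to show the shape of the data.
example_posts = [
    {"title": "Excel crashes on 500k rows", "selftext": ""},
    {"title": "How to use VLOOKUP", "selftext": "simple question"},
    {"title": "Workbook is very slow", "selftext": "takes minutes"},
]
flagged_examples = flag_posts(example_posts)
```

Here `flagged_examples` keeps the first and third posts and drops the VLOOKUP question. Matching on word prefixes is a deliberately loose rule, so posts flagged this way would still need to be read by hand before being counted as scalability complaints.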

Of the 712 posts we collected, 83 related to scalability. Both the scalability posts and the random posts fell into 5 main themes: importing data, managing data, querying data, presenting data, and miscellaneous. From these results, we discuss possible ways to improve Microsoft Excel or to inform the development of other spreadsheet tools meant to handle large quantities of data.


Kelly Mack, John Lee, Kevin Chen-Chuan Chang, Karrie Karahalios, and Aditya Parameswaran. Characterizing Scalability Issues in Spreadsheet Software using Online Forums. CHI 2018.


