Why Use Excel and Why Not

Excel is an extraordinarily popular tool, but are we sure it’s the right tool for everything?

Glauco Uri
Geek Culture
6 min read · Apr 27, 2021


Introduction

Excel was born to organize data and is typically used to perform (sometimes not so) basic analysis and to build models. It is widespread in offices across the globe, used in all working areas and in companies of all sizes.
If one billion people use Microsoft Office globally [1], and we assume that about a third of them mainly use Excel, that is a surprising ~300M users [2], and the number is definitely not going down!

It is the de facto professional productivity tool for pretty much anything that requires data organization.

But looking deeper, its strengths are focused on:

  • Data entry and data management
  • Reporting
  • Planning
  • Time and task management
  • Financial analysis and modeling

The potential uses are seemingly endless, clearly not limited to data that needs to be organized, but let’s look a little deeper at the use cases.

Kinds of users

There are mainly three different kinds of users. Leaving aside the first phase of the data life cycle, data entry, for which a spreadsheet is obviously very effective, the driver I’ve used for these categories is the type of output produced and, moreover, how that output is deployed. From this point of view, data are always loaded into one or more sheets, cleaned, organized, and augmented in the same manner, but these use cases trigger different data flows with different criticalities.

Single user — Manager case

In this use case, the outcome is consumed from the sheet itself. This is the typical case for management roles, e.g. time, task, or people management, planning such as Gantt charts or timesheets, and manual data entry such as reports or polls. The produced data are cut and pasted into a presentation or sometimes a reporting tool. File sizes tend to remain small, so data can circulate via email.

Data analysis — Analyst case

When an analyst works on a spreadsheet, they introduce a bit more complexity in data manipulation, validation, data augmentation, and some reporting, but the result is required elsewhere. Sometimes the file is stored in a shared folder; after all, it has low concurrency, since it is mainly used by one user at a time. Tabular data could go into a database via an intermediate file like a CSV, or a report built of colored charts must be published or imported into a PowerPoint. Perhaps these operations must be done on a regular schedule. In these cases the workspace begins to feel tight, because there are some weak points:

  • It is not easy to integrate Excel into an automated process.
  • It’s very hard to insert it into a modern cloud-based infrastructure, because there is no API available to handle it without its GUI.
  • It can only work on a limited amount of data.

This means that repetitive operations such as seasonal analyses will have to be done by hand. It probably seems obvious: Excel was created to be used by a single user on limited data, so the necessary integration tools are lacking, but this can put teams in a cul-de-sac.
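As a concrete illustration, the manual "export tabular data to a CSV" step can be scripted with the openpyxl library, so it can run on a schedule instead of by hand. This is only a minimal sketch under assumptions: the file names, sheet layout, and data are invented for the example.

```python
import csv
from openpyxl import Workbook, load_workbook

# Hypothetical input: a sheet of tabular data an analyst refreshes each season.
wb = Workbook()
ws = wb.active
ws.append(["region", "sales"])
ws.append(["north", 120])
ws.append(["south", 95])
wb.save("season.xlsx")

# The manual "save as CSV" step, scripted so it can run on a schedule
# or inside a pipeline, with no Excel GUI involved:
sheet = load_workbook("season.xlsx").active
with open("season.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in sheet.iter_rows(values_only=True):
        writer.writerow(row)
```

From here, the CSV can be bulk-loaded into a database by the same scheduled job, which is exactly the kind of glue that plain Excel does not offer.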

Advanced — Modeler case

In some cases, unfortunately many, Excel is loaded with a lot of tasks. In addition to the previous ones, data manipulation effectively becomes a model, with all its phases: calibration, backtesting, and validation. Models and time-series analysis are frequently involved in forecasting, with a lot of parameters and variables to project. Sometimes the data to handle become huge and more people get involved. In addition to the tabular data and reports, as in the previous use cases, we now have new parameters and the models themselves as outcomes.

This scenario is the most critical one for the IT departments that must handle it.

  • No versioning tools can be used.
  • No change-management tools are available.
  • Concurrency over shared files leads to the loss of a lot of work.
  • Excel is often extended with VBA code or add-ins, which adds more critical elements.
  • Data processing can only be done on one computer, almost never on a server.
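To make the versioning and validation gap concrete, here is a hypothetical sketch of what a single spreadsheet formula looks like once rewritten as plain code. The formula and the numbers are invented for the illustration; the point is that, unlike a cell, a function can be version-controlled, code-reviewed, unit-tested, and executed on a server.

```python
# A spreadsheet cell like  =B1*(1+$B$2)^A1  becomes a named, testable function.
def project_value(base: float, growth_rate: float, years: int) -> float:
    """Compound projection: the kind of formula often buried in a cell."""
    return base * (1 + growth_rate) ** years

# Unlike a cell, this can live in git and be checked automatically:
assert round(project_value(100.0, 0.05, 2), 2) == 110.25
```

Backtesting and calibration then become ordinary test and script runs instead of manual spreadsheet recalculations.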

Excel does not seem to be enough

In recent years, some very useful tools have emerged to try to tame these issues. Below I report the tools in the Python integration field that I find the most interesting. These libraries and add-ins are widely used for tool integration, data integration, automation, and functional enhancement.

  • pyxll add-in, developed by Tony Roberts. “No VBA, just Python!”: the manifesto is all about extending Excel with UDFs written in Python, and it discourages the use of VBA.
  • xlwings add-in, developed by Felix Zumstein. “Make Excel Fly!”: with this tool too, Excel can be extended with functions written in Python; moreover, some functionality of the underlying COM object is exposed via a REST API.
  • openpyxl library, developed by Eric Gazoni and Charlie Clark. This is lower-level than the previous libraries, but AFAIK it is widely used thanks to its capability to read xlsx files, both data and formulas.
  • pycel, developed by Dirk Gorissen. It is, unfortunately, an outdated library, but the project has an innovative and very interesting approach “that can translate an Excel spreadsheet into executable python code”. This is the least common library of the four; I have never met it in production environments.
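As a small taste of this headless access, here is a sketch using openpyxl to read a formula out of an xlsx file without Excel’s GUI. The workbook contents are invented for the example; note that openpyxl reads formulas as text but does not evaluate them.

```python
from openpyxl import Workbook, load_workbook

# Build a tiny workbook in code (it stands in for a modeler's file).
wb = Workbook()
ws = wb.active
ws["A1"] = 10
ws["A2"] = 32
ws["A3"] = "=SUM(A1:A2)"  # stored as a formula string, not a computed value
wb.save("model.xlsx")

# Reload it headlessly: the raw formula text is accessible without Excel.
book = load_workbook("model.xlsx")
print(book.active["A3"].value)  # the formula text, '=SUM(A1:A2)'

# Caveat: openpyxl does not compute formulas. load_workbook(..., data_only=True)
# only returns values that Excel itself previously calculated and cached.
```

This read-formulas-as-text capability is exactly what tools like pycel build on to translate a sheet into executable code.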

For further information on these use cases, I have written some in-depth articles on these topics, related to a tool I’m working on. The XLtoy library is not currently ready for production environments, but it can already be used to solve some critical issues in the deployment and maintenance processes.

Conclusions

Despite the impressive number of users and the libraries mentioned above, there is still a lot to be done to integrate the data produced in Excel into increasingly modern processes.

Some rewrite their models in other languages like SAS, R, or Python and move the database onto more robust infrastructure, but every time they must spend a lot of effort redoing the same work without automation.

Others prefer to cage the Excel sheet in a semi-automatic environment and control it through external scripts that automate the import and execution of models.

Others prefer a hybrid approach, mixing manual operations, like loading the database, with scripts… but the substance does not change: these are fragile solutions.

From my point of view, using Excel to write complex models that will run frequently or outside the Excel environment can become a painful choice. Excel is simple to learn and widely used, but it is perceived as an expensive nuisance by IT departments, with too many gaps in the development process, too many gaps in the maintenance process, too many gaps in the integration processes. This means that globally, for example in the banking sector, the cost of these fragile solutions is far too high. Data and software engineers spend most of their time chasing performance, translation errors, and execution errors in hostile environments. An expensive nuisance.

Resources


Pragmatic programmer | Data Engineer | Python architect