Why no one uses Copilot for Excel

Jesse Kim · Published in Cruising Altitude · 3 min read · Jun 15, 2024

(Same question as why no one uses ChatGPT to work its magic on numbers)

(Image: “The Chosen Twenty-Six”)

The answer is simple. A language model produces approximations of what it has seen and heard. It does not compute. Neither does it reason.

At work, I was lucky enough to get an early-access licence to the various Copilot services. By now, the “GenAI”-friendly use cases are clear. Summarisation, proofreading, translation, citation, and illustration are all time-saving acts of approximation, based on how materials known to the language model are treated in ways known to the language model. A finding like “in this study, AI hallucinated 17–31% of the time” may indeed be great news, enough for a seal of approval in certain applications of generative AI: simply weed out the hallucinations and take what’s left. That may still represent a quantum leap in efficiency compared to pre-AI efforts.

Hallucinations may in fact contribute to creativity. Prompt ChatGPT to draw an illustration of a classroom with 5 pupils in it, and it can produce one with 26 pupils, subsequently chosen as the final work (a true story). But even in such applications, one must watch out for side effects. Try repeatedly generating a logo that contains two English dictionary words, and observe how often ChatGPT spells those words correctly in each image.

In any case, I strongly doubt there is a defensible hallucination allowance when a C-level executive asks generative AI to analyse a fresh set of financial statements. When faced with a new dataset, an analyst typically starts by profiling what they are looking at, working out what the fields and values represent and picking out any nuances or characteristics to be aware of. Feed the same data to ChatGPT or Copilot for Excel to have specific questions answered, and it struggles and hallucinates in a variety of forms, despite the expectation that a language model should have no problem understanding the context, structure, plain-language terminology, and nuances of the data.
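To make that profiling step concrete, here is a minimal sketch of the kind of deterministic first pass an analyst might run, assuming the data arrives as a CSV loadable into a pandas DataFrame. The file name and columns are hypothetical placeholders, not anything from the dataset discussed here.

```python
# A deterministic first pass over an unfamiliar dataset.
# File name and columns are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("financial_statements.csv")

print(df.shape)                    # how many rows and columns there are
print(df.dtypes)                   # what kind of value each field holds
print(df.isna().sum())             # where values are missing
print(df.describe(include="all"))  # ranges, counts, and summary statistics
print(df.head(10))                 # a raw look at the first few rows
```

Every line of that output is reproducible: run it twice on the same file and the answers match, which is exactly the property the generative layer cannot promise.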

The output, if produced at all, tends to be the work of qualitative pretence (as in “this is how I know a bar chart is commonly drawn”) as opposed to the results of the requested computations. More importantly, there is no way to figure out or trace how it got to that output. If something can’t be reproduced, it does not qualify as science.

Sure, the problem may be in the prompt. In addition to the raw data, I kindly verbalise all the context, structure, plain-language terminology, and nuances that I think are fundamental to analysis. AI thanks me for that. I then approach the problem step by step rather than jumping straight to the question that needs to be answered. AI follows along better this time, producing correct responses, but only up to the point where it decides that I have made it sufficiently dizzy. I then ask myself, why am I doing the job of a high school statistics teacher preparing exam questions, instead of producing output?

Luckily, when it comes to quantitative analysis, there is AI that brings actual results — one that predates both ChatGPT and Copilot by a fat margin — it’s called Excel. It’s also called Matlab, R, WolframAlpha, SAS, Maple, Tableau, and Power BI. Just let computers compute.
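For contrast, the kind of answer those tools return is exact and traceable. A tiny, self-contained illustration of what “letting computers compute” means (the figures are invented for the example):

```python
# Computation, not approximation: the same inputs always yield the same
# output, and every step is traceable. Figures are invented for illustration.
revenue = {"Q1": 4_200_000, "Q2": 4_830_000}

growth = (revenue["Q2"] - revenue["Q1"]) / revenue["Q1"]
print(f"Quarter-on-quarter revenue growth: {growth:.1%}")  # 15.0%
```

Run it a thousand times and the answer never drifts; express the same arithmetic in Excel, R, or SQL and the result is identical.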

Adding a layer of generative approximation (Copilot) to the very tool for computational analysis (Excel) is one of the most redundant ideas I have heard of. That is why Copilot for Excel remains an also-ran among the Copilot family, a very distant one.
