Data Science Collective

Advice, insights, and ideas from the Medium data science community


Missing Data? Use Explainable AI to Fill the Gaps (Correctly)

9 min read · Mar 14, 2025


I used to work as a scientist in theoretical particle physics. It was super data-heavy, but I generated most of that data myself with Monte Carlo simulations. If a data point was missing, I just generated a new one.

These days, I work with real-world data in finance. I process tons of data points that companies produce and figure out what they mean for a company’s ability to make money.

You might be surprised to hear this, but even blue-chip companies can be incredibly sloppy about reporting consistent data. This complicates my job, and I know that for many other data professionals out there the situation is even worse.

If you’ve ever built a model from imperfect data and found that it only returned garbage, then you know exactly what I mean. Garbage-in-garbage-out is true. But incomplete-in-garbage-out is true, too. Sadly.

Many data scientists respond to this challenge with simple tools. Techniques like mean imputation and forward fill are easy to implement. But these techniques are just band-aids, not a cure. In many cases, they improve models but still lead to distorted results.
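To make that concrete, here is a minimal sketch of both techniques in plain Python. The revenue figures and the function names `mean_impute` and `forward_fill` are made up for illustration; they are assumptions, not data or code from a real project.

```python
from statistics import mean, pstdev

# Hypothetical quarterly revenue with two missing reports (None).
revenue = [100.0, None, 140.0, 160.0, None, 200.0]


def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def forward_fill(values):
    """Replace each missing value with the last observed value before it."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if nothing was observed yet
    return filled


observed = [v for v in revenue if v is not None]
mean_filled = mean_impute(revenue)
ffilled = forward_fill(revenue)

print(mean_filled)  # [100.0, 150.0, 140.0, 160.0, 150.0, 200.0]
print(ffilled)      # [100.0, 100.0, 140.0, 160.0, 160.0, 200.0]

# Mean imputation pulls values toward the center, so the spread shrinks.
print(pstdev(mean_filled) < pstdev(observed))  # True
```

Both results look plausible at first glance, and that is the problem. The mean-imputed series has a smaller spread than the observed data, and the forward-filled series pretends that revenue stood still for a quarter. With pandas, `Series.fillna(series.mean())` and `Series.ffill()` should give the same results in one line each.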


Published in Data Science Collective



Written by Ari Joury, PhD

Founder of Wangari. Sustainable finance & ESG-financial modeling. Get all articles 3 days in advance: https://wangari.substack.com
