A Good Data Model Starts Humbly

David Richards
3 min read · May 9, 2018


Photo by Jeremy Bishop on Unsplash

A good data model, the kind that makes me wet my pants for jealousy, grows up naturally. Full disclosure, I’ve never wet my pants over data models, brilliant or otherwise. I’m just saying I get excited when someone’s gotten it so right they are at the pinnacle of their discovery. My envy leads to learning about their algorithms and data pipelines and trying to mimic them. That’s backward.

A better way is to start small. If you can start a model on a napkin over lunch, you’re probably on track. On a napkin I can see the shape my data might take and how it generally works. Am I making intuitive leaps? Have I overcomplicated things? Do I actually have this data, or know what data I have? If I had to do this without sophisticated systems, could I pull it off?

Ray Dalio started his market predictions with an HP calculator and a composition notebook. He incrementally improved his work until it made him and his clients at Bridgewater Associates a lot of money.

Once you have an idea of what to do, see if you can explain it to someone. Are they skeptical? Can they see problems? Is this model interesting, timely, and relevant?

Next, use basic tools like a spreadsheet or a Jupyter notebook to build something. You’ll probably explore the data a little and copy standard solutions from tutorials. You’re not inventing anything here.
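That first notebook model can be humbler than people expect. A minimal sketch of the idea, with invented toy numbers and only the standard library, might look like this: predict the next value as the mean of what you've already seen.

```python
# A baseline "model" can be as simple as predicting the historical mean.
# The numbers below are invented for illustration.
from statistics import mean

history = [12.0, 15.0, 11.0, 14.0, 13.0]  # past observations (made up)

def baseline_predict(past):
    """Predict the next value as the mean of everything seen so far."""
    return mean(past)

prediction = baseline_predict(history)
print(prediction)  # the napkin-level forecast
```

It isn't impressive, and it isn't supposed to be. It gives you a number to beat.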

Finally, start to build the pants-wetting insight you know you can do. Take incremental steps and compare your results with your cheap model’s predictions. You’ll eventually get into larger systems. You’ll automate some of the feature engineering and validation work. You’ll build interfaces for your models so other systems can use them. Now is the time to explore interactive data visualizations.

The experience isn’t taking a few small steps and then taking a giant leap into a large system. You have a working model that’s simple, and you make incremental improvements to it. Agility means working from where you are. Use it.

In a recent TWIML & AI episode, John Bohannon discussed how he developed deep insight on hard problems. Bohannon’s approach isn’t different from the one that’s worked for me; only his results are of the pant-wetting kind. He and his team have figured out powerful ways to map large amounts of data into useful insights at Primer.ai.

When things are small, there is more room for data exploration, feature engineering, and baseline models. Once a model starts to fulfill a role in the organization, I hesitate to revisit the first steps. With a shaky start, who knows where a rocket will go?

Photo by Daniel Mayovskiy on Unsplash

The time between delivering steps on a model is time for imagination. What if I could enhance that data? Why not estimate the best I can expect from my work before I do the work? Do people want the work I want to do? These kinds of questions come up when I’m working on a model without rushing it.

With a baseline model and people who care about the results, I still like to inject days into the development of a project. It’s better to put something together and let it train or sit overnight. In the morning I write down what the model is doing. In the afternoon or evening, I come back to it or a different one.

Take it from a person who’s gotten it wrong quite a few times: you can develop a workflow that increases your chances of delivering your best work.
