How a Simple Cleaning Framework Helps Startups Organize Data for Growth

Pierre DeBois
Published in The Startup
4 min read · Aug 26, 2020

Getting data into a clean format can be the most fraught step in creating a data model. It is the lengthiest aspect of data hygiene, yet it involves a number of steps that a small startup team may not anticipate. Startups rely on software, be it a website, an app, or a platform service. All of that software and activity triggers a need to apply data hygiene to keep the platform, and the business model, operating smoothly.

It can be daunting for a startup to know where to begin with managing data. But keeping a few concepts in mind can help organize the work needed to prepare data for advanced analysis, such as a regression model or a tensor for TensorFlow.

Here are three general concepts for a startup team to consider. The key benefit of each statement is that it frames the right questions, and the tasks that follow from them, for data cleaning.

Clean data is identifiable to you.

This first statement means that when you look at a data table, you understand what the fields are meant to contain and can map how they are arranged. There may be a recognizable ID in each row. You may see duplicate entries that should not exist. In short, your knowledge of the subject behind the data determines how much data literacy you need to clean it.
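The checks described above (recognizing the ID field, spotting duplicate entries, and confirming that each field holds what it should) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample table, the field names (`customer_id`, `email`, `plan`), and the helper functions are hypothetical and do not come from any particular startup's data or library.

```python
from collections import Counter

# Hypothetical sample table; field names and values are illustrative only.
rows = [
    {"customer_id": "C001", "email": "ana@example.com", "plan": "pro"},
    {"customer_id": "C002", "email": "ben@example.com", "plan": "free"},
    {"customer_id": "C002", "email": "ben@example.com", "plan": "free"},
    {"customer_id": "C003", "email": "", "plan": "pro"},
]


def find_duplicate_ids(rows, id_field):
    """Return the ID values that appear in more than one row."""
    counts = Counter(row[id_field] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def is_blank(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or not str(value).strip()


def find_incomplete_rows(rows, required_fields):
    """Return the indexes of rows where any required field is missing."""
    return [
        i for i, row in enumerate(rows)
        if any(is_blank(row.get(field)) for field in required_fields)
    ]


def drop_exact_duplicates(rows):
    """Keep the first copy of each fully identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


duplicate_ids = find_duplicate_ids(rows, "customer_id")
incomplete = find_incomplete_rows(rows, ["customer_id", "email", "plan"])
cleaned = drop_exact_duplicates(rows)

print("Duplicate IDs:", duplicate_ids)
print("Rows with missing fields:", incomplete)
print("Rows after removing exact duplicates:", len(cleaned))
```

Which field counts as the ID, and which fields are required, is a judgment that depends on knowing the subject behind the data. The code only automates the check once that decision has been made.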

Clean data highlight programming format and libraries

#analytics |#datascience |#JS |#rstats |#marketing services for #smallbiz | #retail | #nonprofits Contrib @CMSWire @smallbiztrends #blackbusiness #BLM