Why data collection is like baking

To care about data — first, think of a cake…

We sat down with data scientist Juan Pane, who led the team we worked with on our COVID-19 Data Taxonomy for Pandemic Preparedness project. Juan drew an apt analogy between data and cake. Here are five key takeaways from our conversation with him:

Photo by Brands&People on Unsplash

Takeaway 1: Think about data as the ingredients for a cake — it might make you care more about data collection.

You can look at data as the ingredients for a cake. The better the ingredients, the better the cake. It is impossible to make a delicious cake with rotten eggs. We forget that the eggs, the flour, and all the other ingredients were produced by someone. Producers are an essential part of the process, and we need to care about who they are and how involved they are in making the ingredients. Applying this to data, we need to care more about data collection, data cleaning, and data modeling.

Takeaway 2: Data is the means to an end, not an end in itself.

When we take on the role of data producer, we make the mistake of focusing only on the problem itself. Even in the open data community, we tend to forget that the data is not the end; it is just a means to an end. Later, the data will be used for something else. As in the data-as-ingredients-for-cake metaphor, think again of a raw egg or the flour. You can consume these on their own, but to gain real value from them, they need to be mixed, baked, and so on. So the more experienced we are in working with data, the more value we can get out of it.

Takeaway 3: Data literacy among governments requires a shift in focus — from documents to data.

One of the biggest producers of data, though this is not always true, is the government. Especially in Latin America, there is a huge gap in how public servants manage, work with, and even produce data. That’s where we as computer scientists can come in: we can teach and guide them on how to work with data. Once people learn, the focus changes a lot. Public servants are used to producing documents in their work. Switching their focus from producing documents to producing data is a good start.

Takeaway 4: Accept your data capacity and learn to manage expectations.

Data knowledge and capacity have to be communicated properly so that people do not form huge expectations. The downside, which I’ve seen in other projects, is that people want to reach an advanced state of implementation because they want to be the best. There are countries with a lot of resources that are trying to be the best but don’t even have a basic data infrastructure. Before any project, I set my expectations low so I don’t get disappointed. I expect a lot of questions and a lot of basic implementation that could go wrong at the start. In my more than six years of experience with data, I have found this to be unavoidable. I’ve learned that implementing data systems is hard.

Takeaway 5: Have a spectrum of perspectives, not just one.

One interesting insight from this COVID-19 Data Taxonomy project concerns the use of ethnic variables. We cannot define needs or the future based on only one point of view. We cannot have the views of the global north without sharing the views of the global south. We always need a mix of the two, especially when we are building standards. Different countries have different cultures and therefore different needs and perspectives when analyzing their data. This variety of perspectives is necessary when we are dealing with data, and especially when we are dealing with things that could potentially impact our communities.

If you’re a government with advanced open data capabilities, we encourage you to explore these cards and make use of them. If you are a civil society organization, you may learn from these data cards (in English or in Spanish) to demand this type of data standardization from your governments.

This work was started before the efforts to vaccinate began all over the world. If you would like to help us expand it with the latest data and information, we’d love to collaborate! Please e-mail us at




Learn how we are working towards a culture of open and responsible data use by governments and their citizens.

Open Data Charter

Collaborating with governments and organisations to open up data for pay parity, climate action and combatting corruption.
