ODESLA uses Kedro for Social Good

A non-profit organisation shares how it is benefitting communities across LatAm using Kedro

--

Jo Stichbury, Technical Writer; Yetunde Dada, Principal Product Manager, QuantumBlack; Lais Carvalho, Developer Advocate

This conversation with Carlos Gimenez, founder of Open Data Science LatAm (ODESLA), is part of a global series to understand how Kedro is used around the world.

Carlos Gimenez is a pioneer of data science best practices. He recognised how Kedro could benefit his teams soon after it was open sourced, and introduced it both at Naranja X and to his own organisation, Open Data Science LatAm (ODESLA).

Kedro is an open-source Python framework that helps Data Scientists create reproducible, maintainable and modular data science code. It is built on the collective knowledge of QuantumBlack, whose teams routinely deliver real-world machine learning applications as part of McKinsey. In this article, we won’t explain further what Kedro is, but you can read our introductory article to find out more, and find links to other articles and podcasts in our documentation.

ODESLA’s first meeting with its volunteers in 2019. Photo by Carlos Gimenez.

ODESLA is a 100-member non-profit organisation that uses public open data to design machine-learning solutions for social challenges in the community. It supports a network of Data Scientists who want to apply their skills for social good. Its mission focuses on humanising and democratising the transfer of Data Science knowledge in Latin America.

Building on efforts such as an inventory of open data for the Latin American Data Science community, ODESLA has launched two main projects, which the group calls challenges, each tackling a different problem in the community.

The first challenge used the distribution of lawsuits, provided by the Autonomous City of Buenos Aires electronic file system, to pinpoint courts that were overloaded with cases. A lack of visibility over caseloads caused staffing and resource constraints. The idea was that cases could be rerouted to equalise the workload across different courts.

The second challenge focused on gender-based violence, and it aimed to classify forms, types and patterns in real cases of domestic assault. Fifty volunteers set up a workflow that used written reports of incidents from five different sources to classify the text data and assign weights for the risk of each case, the type of violence, and so on.

ODESLA has also played a role during the global pandemic. The team introduced data science tools that helped the general public monitor the numbers of infected, deceased and recovered people in each country.

We spoke to Carlos about ODESLA and how Kedro has fit into his workflow.

The conversation has been edited for length and clarity.

What do you think about Kedro?

Kedro represents the evolution of Data Science frameworks. It normalises the complex parts of a Data Scientist’s job and acts as a guide to better practices. It’s similar to LaTeX, which is used to create high-quality scientific documents: it eliminates worries about code structure and allows us to focus on content creation.

Kedro helps us focus on the problem with the sureness that we are creating something maintainable, reproducible and scalable.

Why are you using Kedro at ODESLA?

We previously used Cookiecutter Data Science. When we picked up Kedro, I found the concept of nodes and pipelines very innovative. We kept using Kedro because it let us standardise projects, made things easy to maintain and allowed the teams to adapt and reuse project configurations and modular pipelines across different use cases.
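The node-and-pipeline idea Carlos mentions can be illustrated with a minimal, library-free sketch. This is plain Python standing in for Kedro’s actual `node`/`pipeline` API, and the dataset and function names are hypothetical:

```python
# Minimal sketch of the node-and-pipeline idea, in plain Python.
# Each "node" is a pure function with a named input and output; the
# "catalog" dict stands in for Kedro's Data Catalog.

def clean(raw_rows):
    """Node: drop empty records."""
    return [row for row in raw_rows if row]

def count(clean_rows):
    """Node: aggregate a simple statistic."""
    return len(clean_rows)

# Each entry: (node function, input dataset name, output dataset name)
PIPELINE = [
    (clean, "raw_rows", "clean_rows"),
    (count, "clean_rows", "row_count"),
]

def run(pipeline, catalog):
    """Run nodes in order, reading and writing datasets by name."""
    for func, inp, out in pipeline:
        catalog[out] = func(catalog[inp])
    return catalog

catalog = run(PIPELINE, {"raw_rows": ["a", "", "b"]})
print(catalog["row_count"])  # 2
```

Because nodes only refer to datasets by name, a node or a whole sub-pipeline can be reused on a different project simply by remapping those names, which is the standardisation benefit Carlos describes.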

Carlos is referring to the ease of creating projects with Kedro starters: customisable template projects that can be adapted to different needs.

In practical terms, how does Kedro help you extract information from data sources?

Kedro has a way of organising data sources, called the Data Catalog. This feature allows us to update and share data sources within the team without hardcoding paths, even when the project is in its early stages.
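As an illustration, Data Catalog entries are declared in YAML rather than hardcoded in code. The dataset name and filepath below are hypothetical, and the exact `type` string depends on the Kedro version:

```yaml
# conf/base/catalog.yml: datasets are referenced by name across the project
court_cases:
  type: pandas.CSVDataSet
  filepath: data/01_raw/court_cases.csv
```

A node then asks for `court_cases` by name; if the data later moves (say, to cloud storage), only this one entry changes and no pipeline code is touched.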

How we organise data sources with the Kedro Data Catalog. Photo by Tobias Fischer.

What would you say is the most-useful feature of Kedro?

I like how credentials for loading and saving data sources are managed (credentials.yml file under the conf folder) and how you can use Jupyter Notebooks and the Data Catalog together. The visualisation tool assists during team debates because it shows the logic of the solution. Sometimes, we even use it to debug our pipelines and make sense of where the project is going during the development phase.
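As a sketch of the credentials management Carlos mentions: secrets live in the credentials.yml file under the conf folder (typically kept out of version control) and catalog entries reference them by key. The key and values below are hypothetical:

```yaml
# conf/local/credentials.yml (not committed to version control)
dev_s3:
  key: YOUR_ACCESS_KEY
  secret: YOUR_SECRET_KEY
```

A catalog entry can then point at the key with `credentials: dev_s3`, so no secret ever appears in the codebase itself.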

The automatic generation of documentation (using the CLI command kedro build-docs, based on the Sphinx library) has saved us a lot of time, since it allows the team to focus on the solution without neglecting the readability of our codebase.

The “visualisation tool” Carlos refers to is Kedro-Viz, one of Kedro’s out-of-the-box plugins, which shows the structure of a Kedro pipeline.

How is Kedro used across your organisation?

Kedro is the standard for projects at ODESLA. We have more than 100 volunteers working on our data science challenges and Kedro is the only reason that we can work together. For example, our challenge that focuses on identifying types and patterns within gender-based violence is a collaborative effort of around 50 data scientists. They all work from their own laptops. Collaboration would be very challenging without the use of Kedro on this project.

Whenever we introduce Kedro to new joiners, we frame it as a best practice that enables seamless implementation.

I believe that best practice and Kedro go together.

As a Data Scientist, can you compare life before and after Kedro?

Before Kedro, I used to invest a lot of time explaining to new team members how my project was set up. To program in a modular way, I also had to define ad hoc structures, which changed from project to project. All of this added a considerable time overhead to my projects.

Now, we start a project directly with kedro new, and the conversations focus on the problem. The larger the project, the more critical it is to invest time in configuration, and the greater the benefit of starting from an initial template with the boilerplate structure in place.

And finally, in your opinion, what is the value that Kedro brings to a data scientist’s life?

Being open source allows Kedro to be a reference for Data Science frameworks and to encourage best practice in Data Science projects. Overall, it is a big win for our community that we have access to such a powerful tool.

Thank you, Carlos, for such an informative interview! We are excited to see Kedro used by Data Scientists working on social projects around the world.

Want to know more about ODESLA? Check out their website and Data Science challenges on GitHub.

Share your use case

Now it’s your turn! Have you used Kedro on an exciting project? Do you have a use case that you would like to share? Or do you have feedback on Kedro? Make sure to let us know, and we will contact you for more details.

Want to contribute to Kedro? Check out our Community Contribution Guide and make your Pull Request.

--


QuantumBlack, AI by McKinsey

We are the AI arm of McKinsey & Company. We are a global community of technical & business experts, and we thrive on using AI to tackle complex problems.