Data Science in Spanish

Jana Thompson
IBM Data Science in Practice
4 min read · Dec 3, 2021
multiple colorful baskets with distinct geometric patterns lined up in rows
Photo by Ricardo Gomez Angel on Unsplash

While many programs and bootcamps exist globally for data science and analytics, the primary language of all such curricula remains English. Since only 8.2% of the world’s population speaks English fluently enough to undertake educational efforts in English, the world of data science and machine learning remains accessible only to those who speak English as a first or relatively fluent second language.

About 6% of the world’s population, a total of 483 million people, speaks Spanish as a native language, placing it second only to Mandarin and ahead of English in native speakers among global languages [1]. Creating resources in Spanish seems an overlooked need in widening access to data science curricula and materials. In this post, I will cover some of the efforts being made by IBM and others to meet this crucial need and briefly discuss the perspectives of IBM Student Advocate Luis Geraldo Ayala Bertel.

In the community

During a recent conversation with Luis, we talked about his journey to learning data science and quantum computing through IBM’s available courses and support forums while he has both worked as a math teacher and studied mathematics at the University of Cartagena in Colombia. He has a deep commitment to education and community, reflected in his work as a math and accounting teacher in rural towns in Colombia and in his work as an IBM Student Advocate, coordinating learning and community among students at his own university and across Latin America.

Luis notes that learning English has been critical to his educational journey. Without it, he would have had limited access to many of the data science educational materials that have fueled his passion for the field and his development as a data scientist, as well as to the online community building, through community Slack and Discord channels and forums, that leads to growth in knowledge and experience. While passing an English proficiency examination is required for graduation at the University of Cartagena, he noted that for many students the language did not fully stick due to a lack of practice or application in their own fields. This lack of functional fluency leaves many potential data scientists unable to pivot to more technical roles in the way that so many Americans do throughout their working years, and it leaves Spanish-speaking countries with a gap in the growth of their own local data science and AI communities of practice.

What exists

Former IBMer and current Kyndryl employee Julio Marcelo Ripoll began Ciencia de datos en español in early 2020 to increase access to data science educational materials in the native language of his country. For Julio:

“I felt that study and work in other language was another “exam” to be completed to do my work better. It was like another barrier to be jumped to get our goals” [2]

Currently Ciencia de datos en español has sixteen tutorials available in Spanish, ranging from use of DB2 in IBM Cloud to creating voice agents in Watson Studio. Additionally, IBM currently offers three data science courses in Spanish on Coursera:

And five on edX:

Outside of IBM

While researching this topic further, I found that some universities, such as UNAM, already have a data science program, while others, such as the Universidad Ricardo Palma in Peru, are just starting one.

an overhead view of multiple seedlings planted in pots lined up in rows
Photo by Markus Spiske on Unsplash

Outside of universities, there is also a lack of data science communities and resources. The Women in AI community does not have a Latin American resource area or group, other than indicating that one in Mexico is “coming soon”. One bright note that I found was that the 2nd Latin American Conference of Women in Bioinformatics and Science was held virtually in September 2021, with much of its material available on YouTube.

Conclusion

Scholar Celia Medina Lloret noted in an early 2021 paper [3] that conducting an AI study using Spanish resources was an incredibly difficult task because of the wide disparity in the scholarly articles and computational linguistic resources available in Spanish. While resources and communities are starting to grow in the Spanish language, more work is critical to make data science and AI accessible to Spanish speakers around the world who wish to study them. While IBM’s work with Ciencia de datos en español is only one small piece of this effort, our Data Science Community is global and we encourage participants to join and expand our efforts to make data science learning accessible to all.

Sources cited:

[1] https://www.languagemagazine.com/2019/11/18/spanish-in-the-world/

[2] https://community.ibm.com/community/user/datascience/blogs/julio-marcelo-ripoll1/2020/06/18/que-es-ciencia-de-datos-en-espanol

[3] https://www.openscience.online/pub/state-of-the-art-of-data-science-in-spanish-language-and-its-applications/release/1
