TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Automate Flash Cards Creation for Language Learning with Python

My experience using Python to support my long journey in learning Mandarin as a native French speaker.

Samir Saci · Published in TDS Archive · 7 min read · Sep 15, 2022


Flowchart illustrating the process of validating language learning flashcards in Mandarin using Python. The process starts with a translation of the word into Mandarin (超市 for ‘supermarket’), followed by a phonetic check (pinyin ‘Chāoshì’). If both translation and phonetics are correct, the learner proceeds to the oral test. If the learner’s oral pronunciation is correct, they move on to the next flashcard. Failures at any stage lead to review and retry.
(Image by Author)

Do you need help with learning a new language?

Automate your flashcard creation with Python and data analytics tools to make the process easier and more efficient.

Read on to learn how I have improved my language skills and stayed motivated on my language-learning journey using Python.

Introduction

Learning a language can be a long journey, so staying motivated and having clear goals to aim for are essential.

Because Mandarin is written with hanzi (汉字), a logographic system in which each character stands for a syllable and a unit of meaning rather than for individual sounds, it can be even more challenging for learners without a background in a similar language.

I want to study Mandarin — (Image by Author)

In my quest for Chinese fluency, flashcards have been my best ally in improving my reading and pronunciation.

In this article, I will share my experience using data analytics tools with Python to automate the creation of flashcards to support my learning process.

Summary
I. Use Python to Support Language Learning
1. Lessons Learned: The importance of using flashcards
2. A personal teacher on your phone with Anki
II. Create Anki Flashcards with Python
1. Extracting keywords from emails with pywin32
2. Extracting keywords from PDF reports with PyPDF2
3. Extracting keywords from Excel files with Pandas
4. Final results with vocabulary lists, including translation
III. Add phonetic transcription with Google Translate
Google Translate API to generate the pinyin of Chinese words
IV. Add audio transcript with Google Text-To-Speech
Improve your pronunciation with the support of Text-To-Speech
V. Next Step: Boost your Learning Journey with GPT
Adaptive learning experience using GPT and custom visuals
1. Generative AI: Boost your learning experience with GPT
2. Automate the design of nice visuals
VI. Conclusion


Use Python to Support Language Learning

I am a French guy who moved to China to study engineering for a two-year double degree program.

In the end, I stayed for more than six years, and my main challenge was learning Mandarin for daily life and work.

Lessons Learned: The importance of using flashcards

The main mistake I made when I started learning Mandarin was ignoring the advice of the people who recommended flashcards.

Do you remember, as a kid, when one of your parents or a tutor held your book to help you prepare for the next day's history test?

She would ask you questions about the lesson:

  • If you answered well, she considered you ready for the test.
  • If you made mistakes, she asked you to read the lesson again and come back when you were ready.
Screenshot from the Anki flashcard app showing a Mandarin language flashcard. The card on the left displays the Chinese characters ‘你好!’ (Hello), prompting the user to guess the translation and pronunciation. The card on the right shows the answer with ‘Nǐ hǎo!’ in pinyin and ‘Hello!’ in English. This illustrates how Anki presents questions and answers to help users learn new words and improve their language skills.
Anki Flash Cards (Left: Question, Right: Answer) — (Image by Author)

Now, there is an open-source app for this, and it’s called Anki.

A personal teacher on your phone with Anki

In the picture above, you can see an example of a card to learn how to say ‘Hello!’ in Mandarin.

Step 1: Anki shows you the word in Chinese characters (hanzi).

Step 2: Anki shows you the answer with:

  • The pronunciation in pinyin, the romanization system: nǐ hǎo
  • The English translation: Hello!
  • The oral pronunciation as an mp3 sound
Screenshot from the Anki flashcard app displaying self-assessment buttons for a flashcard session. The options include ‘Edit,’ ‘Again,’ ‘Good,’ and ‘Easy.’ The user can select ‘Again’ to repeat the card within 1 minute, ‘Good’ to see the card again in 10 minutes, or ‘Easy’ to review the card in 4 days. This system helps users track their progress and adjust the frequency of flashcard review based on their performance.
Your self-assessment — (Image by Author)

Step 3: Perform your self-assessment

  • If you guessed correctly, press ‘Good’: the card will reappear in 10 minutes.
  • If you found it ‘Easy’, Anki will wait 4 days before asking you again.
  • If you guessed wrong, press ‘Again’: the card will reappear after about 1 minute.
Flowchart explaining the review process in the Anki flashcard app. The process starts with a flashcard, followed by the user’s self-assessment: ‘Easy,’ ‘Again,’ or ‘Good.’ If ‘Easy’ is selected, the card returns after 4 days. If ‘Again’ is selected, the card reappears after 1 minute. If ‘Good’ is selected, the card returns after 10 minutes. This structured repetition system optimizes learning and memory retention.
Review Process of Anki — (Image by Author)
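The three buttons above can be modelled as a small lookup table. This is a simplified sketch that only encodes the delays shown in the figure (1 minute, 10 minutes, 4 days), which match Anki's default behaviour for a new card; Anki's real scheduler grows the intervals after each successful review, so `next_review` is a hypothetical helper for illustration, not Anki's algorithm.

```python
from datetime import datetime, timedelta

# Delays from the review-process figure, i.e. Anki's default behaviour for a
# new card. Simplified: the real scheduler grows intervals after each review.
REVIEW_DELAYS = {
    "again": timedelta(minutes=1),
    "good": timedelta(minutes=10),
    "easy": timedelta(days=4),
}


def next_review(answer: str, now: datetime) -> datetime:
    """Return when a card should reappear, given the self-assessment button pressed."""
    key = answer.lower()
    if key not in REVIEW_DELAYS:
        raise ValueError(f"unknown answer {answer!r}, expected one of {sorted(REVIEW_DELAYS)}")
    return now + REVIEW_DELAYS[key]
```

For example, pressing ‘Easy’ at 08:00 on 15 September schedules the card for 08:00 on 19 September.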

Objective

To support your learning journey, you want to feed Anki with thousands of cards and practise 2 hours per day during your commute and other idle moments.


Create Anki Flashcards with Python

In this section, I will explain how to use Python to build these cards by:

  • Extracting common words or sentences for daily life or work
  • Adding the phonetic transcription using a Python library
  • Adding an audio transcription using the Google TTS API
Diagram showing the flashcard creation process for language learning using Python. Data is sourced from social networks, Excel files, emails, and PDF documents. The data goes through processing, followed by adding phonetics and pronunciation. The final output is a flashcard that includes both written and audio components to enhance language learning. This process demonstrates how Python automates the creation of personalized flashcards.
Flash Card Creation Process — (Image by Author)

This framework can be applied to any language, not only Mandarin Chinese.

As a foreigner working in China, my main priority was to have a basic vocabulary to communicate with my colleagues.

Extracting Keywords from Emails with pywin32

Because my first objective was to read emails in Mandarin, I planned to extract the most frequently used words in the emails in my Outlook mailbox.

Using pywin32, you can extract the body of all your emails and store them in a list.
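A minimal sketch of this step, assuming a Windows machine with desktop Outlook and the pywin32 package: it reads the default Inbox through COM (6 is Outlook's `olFolderInbox` constant) and counts the most frequent Chinese words. The helper names `get_email_bodies`, `has_chinese` and `top_words` are hypothetical, and the tokenizer is passed in as a parameter so you can plug in jieba's `jieba.lcut` for Chinese word segmentation. The `has_chinese` filter only checks the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which covers common simplified characters.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple

OL_FOLDER_INBOX = 6  # Outlook's constant for the default Inbox folder


def get_email_bodies(max_items: Optional[int] = None) -> List[str]:
    """Return up to max_items non-empty plain-text bodies from the default Outlook Inbox.

    Requires Windows, a desktop Outlook profile and the pywin32 package.
    """
    import win32com.client  # lazy import: pywin32 is only available on Windows

    namespace = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
    inbox = namespace.GetDefaultFolder(OL_FOLDER_INBOX)
    bodies: List[str] = []
    for item in inbox.Items:
        if max_items is not None and len(bodies) >= max_items:
            break
        body = getattr(item, "Body", "")  # some item types may not expose a body
        if body:
            bodies.append(body)
    return bodies


def has_chinese(token: str) -> bool:
    """True if the token contains at least one CJK Unified Ideograph."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in token)


def top_words(
    texts: Iterable[str],
    tokenize: Callable[[str], List[str]],
    n: int = 50,
) -> List[Tuple[str, int]]:
    """Count the Chinese tokens across all texts and return the n most frequent."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(tok for tok in tokenize(text) if has_chinese(tok))
    return counts.most_common(n)


# Usage (Windows + Outlook only):
#   import jieba
#   print(top_words(get_email_bodies(max_items=500), tokenize=jieba.lcut, n=20))
```

Passing the tokenizer as an argument keeps the counting logic independent of the segmentation library, so the same function works for any language with a word splitter.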

Extracting Keywords from PDF Reports with PyPDF2

Some reports and documentation I received from suppliers can be a good source of technical words.

Therefore, I built a simple script with PyPDF2 to extract the text from PDF reports.
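A minimal sketch of such a script, assuming PyPDF2 2.0 or later (where the reader class is `PdfReader`; the newer pypdf package exposes the same class). It only works for PDFs with a text layer, not scanned images. The helper names and the regular expression that keeps runs of Chinese characters are hypothetical additions; these runs are phrases rather than words, so they would still go through a segmenter such as jieba before counting.

```python
import re
from typing import List

# One or more consecutive characters from the CJK Unified Ideographs block
CHINESE_RUN = re.compile(r"[\u4e00-\u9fff]+")


def extract_pdf_text(path: str) -> str:
    """Concatenate the text layer of every page of a PDF file."""
    from PyPDF2 import PdfReader  # lazy import; PyPDF2 >= 2.0 (pypdf offers the same class)

    reader = PdfReader(path)
    # extract_text() returns little or nothing for scanned, image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def chinese_segments(text: str) -> List[str]:
    """Return every uninterrupted run of Chinese characters in the text."""
    return CHINESE_RUN.findall(text)


# Usage (the file name is hypothetical):
#   segments = chinese_segments(extract_pdf_text("supplier_report.pdf"))
```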

Extracting Keywords from Excel Files with Pandas

Another major source was the monthly financial reports in Excel, which can be processed with the Pandas library.
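A minimal sketch, assuming pandas with openpyxl installed for `.xlsx` files: `pd.read_excel(path, sheet_name=None, header=None)` returns a dict of DataFrames, one per sheet, and the sketch keeps every non-empty text cell (labels, account names, comments) while dropping numbers and empty cells. The helpers `collect_text_cells` and `extract_excel_text` are hypothetical names.

```python
from typing import Iterable, List, Sequence


def collect_text_cells(rows: Iterable[Sequence[object]]) -> List[str]:
    """Keep the non-empty string cells of a table, row by row (numbers and NaN are skipped)."""
    cells: List[str] = []
    for row in rows:
        for value in row:
            if isinstance(value, str) and value.strip():
                cells.append(value.strip())
    return cells


def extract_excel_text(path: str) -> List[str]:
    """Read every sheet of a workbook and return all of its text cells."""
    import pandas as pd  # lazy import; reading .xlsx files also requires openpyxl

    # sheet_name=None loads all sheets; header=None keeps the header row as data
    sheets = pd.read_excel(path, sheet_name=None, header=None)
    cells: List[str] = []
    for df in sheets.values():
        cells.extend(collect_text_cells(df.itertuples(index=False, name=None)))
    return cells
```

The returned strings can then go through the same word-frequency count as the emails.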

Final Results with Vocabulary lists, including translation

After processing, I get a list of words like the one below:

Example of Chinese Vocabulary List — (http://samirsaci.com)
List of vocabulary extracted from the operational document at work — (Image by Author)

Add phonetic transcription with Google Translate

You need a phonetic transcription to practise your pronunciation and use the tones correctly.

Example of Vocabulary List with Phonetic Transcription — (http://samirsaci.com)
Add Phonetics Transcription — (Image by Author)

For Mandarin, I use the jieba library to split the text into words, and then convert each word's Chinese characters into its phonetic transcription (pinyin).

You can find a similar library for your language: for instance, fonem for French and epitran for Italian.
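jieba on its own only segments Chinese text into words; it does not output pinyin. The sketch below therefore assumes the separate pypinyin package for the transcription step, where `lazy_pinyin(word, style=Style.TONE)` returns one tone-marked syllable per character. The helpers `join_syllables` and `to_pinyin` are hypothetical names, and the joined, optionally capitalized format mirrors the ‘Chāoshì’ example in the opening flowchart.

```python
from typing import List


def join_syllables(syllables: List[str], capitalize: bool = False) -> str:
    """Join per-character pinyin into one word, e.g. ['chāo', 'shì'] -> 'chāoshì'."""
    word = "".join(syllables)
    if capitalize and word:
        word = word[0].upper() + word[1:]
    return word


def to_pinyin(word: str, capitalize: bool = False) -> str:
    """Convert Chinese characters to tone-marked pinyin using pypinyin."""
    from pypinyin import Style, lazy_pinyin  # lazy import: pip install pypinyin

    return join_syllables(lazy_pinyin(word, style=Style.TONE), capitalize)


# Usage:
#   to_pinyin("超市", capitalize=True)  # 'supermarket'
```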

Add audio transcript with Google Text-To-Speech

You want to add the pronunciation to each card to improve your speaking ability.

The gTTS library offers a solution for this.

gTTS (Google Text-to-Speech) is a Python library and CLI tool that interfaces with Google Translate’s text-to-speech API.

You can find more details and instructions on using it in the official documentation.
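A minimal sketch, assuming the gTTS package and internet access: `gTTS(text=..., lang="zh-CN")` requests Mandarin speech from Google Translate's TTS endpoint and `.save()` writes an mp3. The output folder, the `zh_` file-name prefix and the helper names `audio_filename` and `save_pronunciation` are hypothetical choices.

```python
from pathlib import Path


def audio_filename(word: str, prefix: str = "zh_") -> str:
    """Build the file name of a word's audio clip, e.g. '超市' -> 'zh_超市.mp3'."""
    return f"{prefix}{word}.mp3"


def save_pronunciation(word: str, out_dir: str = "audio", lang: str = "zh-CN") -> Path:
    """Synthesize the word with Google Translate's TTS and save it as an mp3 file."""
    from gtts import gTTS  # lazy import: pip install gTTS (requires internet access)

    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / audio_filename(word)
    gTTS(text=word, lang=lang, slow=False).save(str(path))
    return path


# Usage:
#   for w in ["你好", "超市"]:
#       print(save_pronunciation(w))
```

Setting `slow=True` produces a slower reading, which can help when practising the tones.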

Next Step: Boost your Learning Journey with GPT

Generative AI: Boost your Learning Experience with GPT

In November 2022, OpenAI released the first version of ChatGPT.

A diagram showing the relationship between Excel file input, data analysis, and communication processes. It depicts a workflow automated with a custom GPT where an Excel file leads to the automation of decision-making processes for supply chain optimization. The output involves a GPT called Supply Chain Analyst performing classification tasks, with the final result displayed in an understandable format for communication.
Example of GPT agent I designed for Supply Chain Analytics — (Image by Author)

Generative AI is an opportunity to bring more intelligence to the creation of flashcards and to the order in which they appear.

Let’s imagine a learner who interacts with a GPT agent:

  • User: I would like to improve my vocabulary skills for accounting.
  • Agent: Generates flashcards using the scripts designed in this article and follows the user's progress.
This diagram outlines how the custom GPTs for Supply Chain Analytics work. It starts with the user asking a question or requesting an analysis; the agent then retrieves data (from the provided dataset or sample), processes it with a core Python script, and returns output as charts or comments. The flow illustrates three key steps: initial prompt, data processing using the script, and final analysis outputs that are used by “The Supply Chain Analyst”.
Examples of GPT agent architecture — (Image by Author)

Instead of relying on the hard-coded logic of Anki, we can exploit the intelligence of LLMs to adapt the learning path to the student’s level.


Automate the Design of Nice Visuals

Use the Python library Pillow to automate the creation of graphs, visuals, or illustrations for your reports.

Animation of a hundred warehouse labels generated with Python Pillow, each with a large yellow upward arrow pointing toward the storage location. It contains a barcode, the SKU code 6090761375244, and icons indicating that the stored items are boxes and pants, and that they should be handled with care due to liquid content.
Example of Graphic Design — (Image by Author)

For example, the labels above have been generated automatically with a Python script.

This method can create illustrations of words to boost your memorization process.
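As an illustration of the idea, here is a minimal sketch that draws a word card with Pillow, assuming Pillow 8.0 or later (for `ImageDraw.textbbox`) and a locally installed CJK-capable font such as Noto Sans SC, since Pillow's default font cannot render hanzi. The font path, the layout and the helper names `centered_origin` and `make_word_card` are hypothetical.

```python
from typing import Tuple


def centered_origin(canvas: Tuple[int, int], bbox: Tuple[int, int, int, int]) -> Tuple[int, int]:
    """Drawing origin that centres a text bounding box (left, top, right, bottom) on a canvas."""
    width, height = bbox[2] - bbox[0], bbox[3] - bbox[1]
    # Subtract the bbox offset so the visible glyphs, not the origin, end up centred
    return ((canvas[0] - width) // 2 - bbox[0], (canvas[1] - height) // 2 - bbox[1])


def make_word_card(hanzi: str, pinyin: str, font_path: str, out_path: str,
                   size: Tuple[int, int] = (600, 400)) -> None:
    """Save a white card with the hanzi in the upper two thirds and the pinyin below."""
    from PIL import Image, ImageDraw, ImageFont  # lazy import: pip install Pillow

    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    big = ImageFont.truetype(font_path, 160)  # the font must include CJK glyphs
    small = ImageFont.truetype(font_path, 48)

    top_h = size[1] * 2 // 3
    x, y = centered_origin((size[0], top_h), draw.textbbox((0, 0), hanzi, font=big))
    draw.text((x, y), hanzi, fill="black", font=big)
    x, y = centered_origin((size[0], size[1] - top_h), draw.textbbox((0, 0), pinyin, font=small))
    draw.text((x, y + top_h), pinyin, fill="gray", font=small)
    img.save(out_path)


# Usage (the font path is hypothetical):
#   make_word_card("超市", "chāoshì", "NotoSansSC-Regular.otf", "chaoshi.png")
```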


Conclusion

Now you have a list of words or sentences with the English translation, the phonetic transcription, and a short mp3 audio clip with the pronunciation.
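One way to load these cards into Anki is its plain-text importer, which reads UTF-8 files with tab-separated fields. A minimal sketch, assuming one note per row with the hanzi on the front and the pinyin, translation and audio on the back: audio is referenced with Anki's `[sound:file.mp3]` tag, the mp3 files must be copied into the profile's `collection.media` folder, and the `<br>` line breaks render only if HTML is allowed in fields in the import dialog. The helper names are hypothetical.

```python
import csv
from typing import Iterable, List, Tuple


def sound_tag(filename: str) -> str:
    """Anki plays audio referenced in a field as [sound:filename]."""
    return f"[sound:{filename}]"


def anki_row(hanzi: str, pinyin: str, english: str, mp3: str) -> List[str]:
    """Front = hanzi; back = pinyin, English translation and audio, separated by HTML breaks."""
    return [hanzi, f"{pinyin}<br>{english}<br>{sound_tag(mp3)}"]


def write_anki_import(rows: Iterable[Tuple[str, str, str, str]], path: str) -> int:
    """Write a UTF-8, tab-separated file for Anki's text importer; return the number of rows."""
    count = 0
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for row in rows:
            writer.writerow(anki_row(*row))
            count += 1
    return count


# Usage (file names are hypothetical):
#   write_anki_import([("你好", "nǐ hǎo", "Hello!", "zh_你好.mp3")], "mandarin_deck.txt")
```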

These cards can be used to practise your…

  • Reading Comprehension using the translation
  • Pronunciation using the phonetic transcription
  • Oral Comprehension using the short audio
Language Learning Process with Anki Flash Cards — Samir Saci
My Learning Process — (Image by Author)

Apply the process presented in the visual above, and I promise you will see improvements in your language mastery with Python!

About Me

Let’s connect on LinkedIn and Twitter; I am a Supply Chain Engineer using data analytics to improve logistics operations and reduce costs.

For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting.

💌 New articles straight in your inbox for free: Newsletter
📘 Boost your Productivity with Data Analytics: Productivity Cheat Sheet


Written by Samir Saci

Top Supply Chain Analytics Writer — Case studies using Data Science for Supply Chain Sustainability 🌳 and Productivity: https://bit.ly/supply-chain-cheat
