How I learn vocabulary
Hi everyone!
At my previous company, I took English classes twice a week. Our teacher made us learn new words. She usually started each class by checking our vocabulary: she asked us to translate random words from topics we had covered before. It was hard, because we didn't have time to memorize them. So I bought a vocabulary notebook and started writing new words down in it. Over time I realized that I was forgetting the words from that notebook. It just wasn't convenient for me to carry the notebook everywhere and all the time, to look up a forgotten word, and to write new words down neatly.

A vocabulary notebook in 2019? Are you serious? Where is your device, dude? Exactly! I could have used my mobile phone, but I didn't want to install a new application for this purpose. That is probably why I like my profession: a developer can build whatever he needs for himself. I decided to create a Telegram bot.
The initial idea was simple: a dictionary application. It is a piece of cake for any developer, isn't it? In just a few minutes my bot was ready.

Whenever I wanted to practice my vocabulary, I sent the /ask command to the chat. Like a teacher, the bot started asking me random words from my dictionary, which I had sent to it before, and checked my translation, or vice versa. If I didn't know the right answer, I could send my typical face for such situations, ":(", and get the translation in the next message. I could also always ask about forgotten words or add new ones. The whole process reminded me of the funny Member Berries from a South Park episode. I named my bot after them, and it has never upset me since.

I added a few words from my vocabulary notebook and started using my bot on my way to work. Subway, traffic jams: on average 30 minutes twice a day, which is good practice. But over time I became tired of the same questions. My bot didn't see the difference between words I had already learned and words I couldn't recall. There were situations like this:
Do you member Circumstances?
Обстоятельства
Of course!
Do you member Circumstances?
Обстоятельства
Of course!
Do you member Circumstances?
Обстоятельства
Of course!
…
I've already learned this word. Please don't ask me about it again and again... but still remember it and keep checking it less often.
See /help for instructions.
A little idea popped into my head. The bot could choose the next word depending on the mistakes I had made. It would ask me the words I couldn't memorize or always got wrong. In other words, the probability that the bot asks me a word I don't know needed to be increased. For that, I collected statistics for each pair of words and kept them in my Firebase storage.

I decided to keep all statistics by session. A session is the time interval between the first asked word and the first word asked after a long pause (~20 min). I hoped this would let me find the ideal session length for learning new words effectively, because I suspected that if a session was too long, it would be quite difficult for the user to memorize more words.
When I deployed my application on Heroku, I was curious how to pass data from the Python backend to the JS frontend when only the backend knows about the data. I suspected it wasn't the proper way to do it, but it was possible. Just for fun, I created a simple React application with charts that show some general statistics and the progress in the last session. By default the site shows my data, but you can append a Telegram userId after the host name.
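
The session rule above can be sketched roughly like this. It is only an illustration under my own assumptions, not the bot's actual code: the function name split_into_sessions and the SESSION_GAP constant are hypothetical, and the 20 minute pause is the approximate value mentioned above.

```python
from datetime import datetime, timedelta

# Assumed pause length that closes a session (the post says ~20 min).
SESSION_GAP = timedelta(minutes=20)


def split_into_sessions(timestamps, gap=SESSION_GAP):
    """Group the times at which words were asked into sessions.

    A new session starts whenever the pause since the previously
    asked word is longer than `gap`.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # still inside the current session
        else:
            sessions.append([ts])     # long pause: open a new session
    return sessions


if __name__ == "__main__":
    start = datetime(2019, 1, 1, 8, 0)
    asked = [start + timedelta(minutes=m) for m in (0, 5, 10, 60, 65)]
    for number, session in enumerate(split_into_sessions(asked), 1):
        print(number, len(session), "words asked")
```

With sessions separated like this, per-session numbers such as length and mistake count can be compared later.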

Each time the bot picks the next word, it filters out the words that were asked within a recent time interval. Thanks to that, the bot knows which words it has already asked recently. Then I choose 7 (just a magic number) random words from the whole vocabulary so that the bot does not ask the same words twice. This also lets words I learned earlier get onto the list when the bot starts asking.
After that, I analyze the statistics for these words and calculate the probability of making a mistake. Here numpy helps me: I use the random.choice method, which accepts a p argument with the probabilities associated with each entry in my array. As a result, I choose the next 7 words based on the mistakes I've made.
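
Here is a minimal sketch of such a weighted draw. It makes several assumptions of mine: the statistics are a dict mapping each word to a (mistakes, attempts) pair, the weight is a smoothed error rate, and the helper names mistake_probabilities and pick_words are hypothetical. It uses numpy's Generator.choice, the newer counterpart of numpy.random.choice, with the same p argument.

```python
import numpy as np


def mistake_probabilities(stats):
    """Turn per-word statistics into sampling probabilities.

    `stats` maps word -> (mistakes, attempts). The smoothing (+1 / +2)
    is an assumption: it keeps every probability above zero, so learned
    words are still asked from time to time, just less often.
    """
    words = list(stats)
    weights = np.array(
        [(stats[w][0] + 1) / (stats[w][1] + 2) for w in words], dtype=float
    )
    return words, weights / weights.sum()


def pick_words(stats, k=7, seed=None):
    """Draw up to k distinct words, favouring the ones with more mistakes."""
    words, p = mistake_probabilities(stats)
    rng = np.random.default_rng(seed)
    k = min(k, len(words))
    # Sample indices without replacement, weighted by p.
    indices = rng.choice(len(words), size=k, replace=False, p=p)
    return [words[i] for i in indices]


if __name__ == "__main__":
    stats = {
        "circumstances": (0, 20),   # always answered correctly
        "indefatigable": (9, 10),   # almost always wrong
        "dilatory": (5, 5),         # always wrong
    }
    print(pick_words(stats, k=2))
```

Because the draw is random, a well-known word can still show up, only with a much lower chance than a word that keeps causing mistakes.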

Since then, the number of my sad faces ":(" in our chat has increased. Now the bot knows which words to ask so that I make a mistake: it asks the words I can't memorize or always get wrong. I'm sure that's the way to stimulate your brain to memorize a lot of information.
Do you member Indefatigable?
":("
Remember that: Неутомимый
Do you member Dilatory?
":("
Oooh, remember: Медлительный
Do you member Abnegation?
":("
Keep in mind that: Отречение
…
I know nothing. Bot, please ask me something that I know.
Circumstances, for example. Normally talked!
Ho-ho-ho!!! Only if you get lucky (my creator ranks your vocabulary by the magical number 7)
Since the first version of my chat bot, I've started reading a book in English. After a few pages I realized that looking the new words up in a dictionary and sending messages to my bot took a lot of time. I skipped about 10 new words per page because I couldn't afford the time. I started underlining each unknown word in the book to check its meaning later. But I was always too lazy to send all the underlined words to the chat. Instead, I kept thinking that a simpler way had to exist.

Being a developer is a kind of superpower. I'm serious. You can create everything you need, and it's amazing. The new challenge in front of me was called Optical Character Recognition. Fortunately, that task had already been solved, and I could use the pytesseract library. All I needed to do was crop my photo down to the underlined words.
My solution was to search for the underlines with the help of OpenCV. I filtered the photo by a range of blue colors to get a dark mask and then found Hough lines in the picture. I created a pandas data frame, grouped it by coordinates to split the page into text lines, and found the start and end coordinates of each word on a line. The font size and the horizontal distance between words were estimated approximately.
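
The grouping step can be sketched as follows. This is a simplified illustration under my own assumptions, not the code from the repository: plain Python stands in for the pandas grouping, the input is assumed to be a list of roughly horizontal segments (x1, y1, x2, y2) such as a Hough transform would return, and the names group_segments_into_rows, crop_boxes, row_tolerance and font_height are hypothetical.

```python
def group_segments_into_rows(segments, row_tolerance=10):
    """Group underline segments into text rows by their vertical position.

    `segments` is a list of (x1, y1, x2, y2) tuples. Segments whose mean y
    differs by at most `row_tolerance` pixels from the previous segment
    are treated as belonging to the same text row.
    """
    rows = []
    for x1, y1, x2, y2 in sorted(segments, key=lambda s: (s[1] + s[3]) / 2):
        y = (y1 + y2) / 2
        span = (min(x1, x2), max(x1, x2), y)
        if rows and abs(y - rows[-1][-1][2]) <= row_tolerance:
            rows[-1].append(span)
        else:
            rows.append([span])
    # Inside each row, order the underlines from left to right.
    return [sorted(row) for row in rows]


def crop_boxes(rows, font_height=30):
    """Build (left, top, right, bottom) boxes just above each underline.

    `font_height` is a rough guess of the text height in pixels.
    """
    return [
        (int(x0), int(y - font_height), int(x1), int(y))
        for row in rows
        for x0, x1, y in row
    ]


if __name__ == "__main__":
    found = [(200, 101, 260, 101), (40, 100, 120, 102), (60, 150, 130, 150)]
    rows = group_segments_into_rows(found)
    print(crop_boxes(rows))
```

Each resulting box can then be cut out of the photo and passed to the OCR step on its own.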

I colored the different words on purpose, for convenient checking. After cropping, I got a set of photos with the underlined words. All I needed to do was use pytesseract to get an array of words and call a translation API for each item in the array. The bot shows the recognized and translated word pairs. I can copy the pair I need and send it to the chat to keep it in my vocabulary.


I know it's not a brilliant result. I crop my photos by approximate coordinates. Photos can be taken from different books, and their quality will never be ideal. I can't recognize words wrapped across lines, and it's always hard to get full translations. Besides, pytesseract unfortunately sometimes can't recognize words on the page bend.
But this simple solution is enough for a start. For your own start, check the GitHub link at the end of the story. I believe that the distance between underlined unknown words will always be short. Besides, if a word can't be recognized, it can still be looked up in a dictionary.
I've come across a few problems that I must mention:
- The quality of photos gets worse when Telegram compresses them. In most cases, recognition worked better locally than in the chat. I discovered that the compressed files are much smaller, and therefore the quality is lower (3 MB -> 100 KB).
- Heroku doesn't allow relying on local storage, in line with the Twelve-Factor App principles. It means that you can't inspect temp files created during a request. To debug all the files, I was forced to use AWS S3 buckets.
In conclusion
It was a good experience. When I started, I didn't even think I would use Firebase, OpenCV, pytesseract, Flask, and AWS services and deploy all of that to Heroku. I like my bot, I like Python, Telegram and my profession. I highly recommend building whatever can help you in your daily life and sharing it.
You can take a look at my GitHub project:
https://github.com/Squirre1/words-book-bot
And start learning your vocabulary:
@words_book_bot
Thank you for your attention!
Good luck in your learning!
