Search automation in Google Translate: download translations with Selenium
The Portuguese translation of this article is available here.
In a previous article of mine (link here), I developed a program that automatically opens the Google Translate website and translates a specific text into another language. However, that code had a significant limitation: it offered no way to copy the translation result for later use.
We will add this feature using the Selenium library, one of the best available libraries for web scraping and automated testing. Before we start analyzing the code, an important note: to use this library, besides running the customary pip install selenium command, you need to download a browser driver file to your local machine. For Chrome, the driver is called chromedriver (geckodriver is the equivalent for Firefox). This file must match your Chrome browser version and your operating system (Linux, Windows, or macOS). For more details about these procedures, please consult the page below from the official documentation:
Another important detail for Windows users: you should not run the chromedriver.exe file (the name of the driver in the Chrome / Windows setup). Just save it on your machine; that is all. My driver file, for example, sits on the desktop of my Windows 10 partition. This mistake is relatively common, which is why I mention it here; I have made it myself. Please remember where you saved the driver file, since you will need its absolute path later in your code.
FIRST CODE MODIFICATIONS
Below I present the part of the program that is unchanged from my previous article, together with the extra import statements we now need. Note also that the old open_google_trans() function is called search_google_trans() from now on. If you have already read that article, feel free to skip this code and move on.
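For readers who have not seen the earlier article, here is a minimal sketch of how the link variable might be built. It is not the original gist: the helper name build_translate_url and its parameters are hypothetical, and the URL layout (the sl, tl, text and op query parameters) is an assumption based on how Google Translate addresses currently look.

```python
from urllib.parse import quote


def build_translate_url(text, source_lang="auto", target_lang="en"):
    """Return a Google Translate URL with the text pre-filled.

    Hypothetical helper; the query-string layout is an assumption.
    """
    # Percent-encode the text so spaces and punctuation survive in the URL.
    encoded_text = quote(text, safe="")
    return (
        "https://translate.google.com/"
        f"?sl={source_lang}&tl={target_lang}&text={encoded_text}&op=translate"
    )


link = build_translate_url("hello world", source_lang="en", target_lang="pt")
print(link)
```

Keeping the URL construction in its own small function makes it easy to reuse the same link for every chunk of a longer text.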
Now, we will use Selenium to open the formatted Google Translate URL, which is saved in the link variable. To do so, we create a Chrome browser instance and call the get() method on it. We also ask the program to pause for 15 seconds so that the page has time to load. The lines below do just that. Remember to set the executable_path argument to the path of the chromedriver file saved on your machine.
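A minimal sketch of this step, not the original gist. It assumes Selenium 4, where the driver path is passed through a Service object; older Selenium releases accept the executable_path argument directly on webdriver.Chrome, as described above. The function name open_translation_page and its parameters are hypothetical, and Selenium is imported inside the function only so that the definition loads even where the library is not installed.

```python
import time


def open_translation_page(link, driver_path, wait_seconds=15):
    """Open `link` in Chrome and pause so the page can finish loading.

    Hypothetical helper; `driver_path` is the absolute path to chromedriver.
    """
    # Imported here so this sketch can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Selenium 4 style: the driver location goes into a Service object.
    driver = webdriver.Chrome(service=Service(driver_path))
    driver.get(link)
    # Fixed pause, as in the article; gives the translation time to appear.
    time.sleep(wait_seconds)
    return driver
```

A call would look like open_translation_page(link, r"C:\path\to\chromedriver.exe"), where the path is a placeholder for wherever you saved the file. The function returns the browser instance so later steps can keep working on the same page.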
The program will open the Google Translate website and show both the original text and its translation into the chosen language.
Now, we will locate the copy translation button on the page and click it using Selenium so that the translation is copied to the clipboard (install the pyperclip library too: pip install pyperclip). Next, we will paste this translation into a variable and save its value to a .txt output file. All these steps are carried out by the following code:
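The steps above can be sketched as follows. This is an illustration under stated assumptions, not the original gist: the function name, its parameters, and especially the CSS selector for the copy button are hypothetical, so inspect the page and adjust the selector to whatever the button looks like in your browser. The clipboard reader is a parameter so the function can be tried out without a real browser; by default it falls back to pyperclip.paste.

```python
import time
from pathlib import Path

# Assumption: the copy button can be found by its aria-label.
# Check the page in your browser's developer tools and adjust if needed.
COPY_BUTTON_SELECTOR = 'button[aria-label="Copy translation"]'


def copy_translation_to_file(driver, output_path,
                             selector=COPY_BUTTON_SELECTOR,
                             read_clipboard=None, pause=1.0):
    """Click the copy button, read the clipboard, and save the text as .txt."""
    if read_clipboard is None:
        import pyperclip  # pip install pyperclip
        read_clipboard = pyperclip.paste

    # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
    driver.find_element("css selector", selector).click()
    # Short pause so the clipboard has been updated before we read it.
    time.sleep(pause)

    translation = read_clipboard()
    Path(output_path).write_text(translation, encoding="utf-8")
    return translation
```

With a live browser instance, copy_translation_to_file(driver, "translation.txt") would write the copied text to translation.txt in the current folder and also return it as a string.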
Although my complete program ran smoothly during tests, it is always a good idea to write error-handling code when working with Selenium, since problems occur fairly often during web scraping. I leave this improvement as a challenge to you, dear reader, if you consider it necessary.
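As one possible starting point for that challenge, here is a small retry wrapper in plain Python. It is only a sketch: the name retry and its parameters are hypothetical, and it is not part of the original program. The idea is to wrap a flaky step (for example, locating the copy button) and try it again after a short pause when it raises an exception.

```python
import time


def retry(action, attempts=3, delay_seconds=2.0, exceptions=(Exception,)):
    """Call `action()` until it succeeds or `attempts` is exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions as error:
            last_error = error
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

With Selenium, one could pass exceptions=(NoSuchElementException,) from selenium.common.exceptions so that only "element not found" errors are retried and anything else fails immediately.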
CREATION OF TWO AUXILIARY FUNCTIONS
It is possible to improve the search_google_trans() function by extracting some parts into independent functions. Two code sections stand out as candidates: the input type check, and the process of generating the output file and saving it in the correct folder.
Therefore, we will create the functions check_input_type() and save_output_as_txt() and call them within search_google_trans(). In addition, the order of the search_google_trans() parameters will change to make future features easier to implement, and some new parameters will be added for later use. These code changes are reproduced in the gist below:
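To give an idea of what such helpers could look like, here is a sketch using the two function names from the text. Their bodies and parameters are assumptions, not the original gist: in particular, this version of check_input_type() assumes the input is either the text itself or the path to a .txt file, and save_output_as_txt() assumes the output folder and file name are passed in by the caller.

```python
from pathlib import Path


def check_input_type(source):
    """Return the text to translate.

    Assumed behavior: `source` is either the text itself or the path
    to a .txt file whose contents should be translated.
    """
    if not isinstance(source, str):
        raise TypeError("input must be a string or a path to a .txt file")
    if source.lower().endswith(".txt"):
        return Path(source).read_text(encoding="utf-8")
    return source


def save_output_as_txt(text, output_folder, file_name="translation.txt"):
    """Write `text` to `output_folder/file_name` and return the file path."""
    folder = Path(output_folder)
    # Create the folder (and any missing parents) if it does not exist yet.
    folder.mkdir(parents=True, exist_ok=True)
    output_path = folder / file_name
    output_path.write_text(text, encoding="utf-8")
    return output_path
```

Splitting these concerns out keeps search_google_trans() focused on the browser work, and both helpers can be tested without opening Chrome.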
TRANSLATE TEXTS WITH MORE THAN 5K CHARACTERS
Now, we will move on to the code that automates the translation of a .txt file with more than 5 thousand characters, which might be a helpful resource for some users. To do that, we need a way to break the text into smaller units and then call the search_google_trans() function to translate each of these chunks. The function translate_any_text, presented below, does this:
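The chunking idea can be sketched as follows. This is not the original gist: split_into_chunks is a hypothetical helper, and this version of translate_any_text takes the translation step as a parameter (translate_chunk), which stands in for a call to search_google_trans(). The splitting is done on whitespace, so line breaks in the original text are collapsed into single spaces.

```python
def split_into_chunks(text, max_chars=5000):
    """Split `text` on whitespace into chunks of at most `max_chars` characters."""
    chunks = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size pieces.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The word does not fit: close this chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def translate_any_text(text, translate_chunk, max_chars=5000):
    """Translate each chunk with `translate_chunk` and join the results."""
    translated = [translate_chunk(chunk) for chunk in split_into_chunks(text, max_chars)]
    return " ".join(translated)


# Toy run with an upper-casing function in place of a real translation.
print(translate_any_text("aa bb cc", str.upper, max_chars=5))  # prints "AA BB CC"
```

Splitting on word boundaries avoids cutting a word in half, which would likely hurt the translation of the words at the chunk edges. Splitting on sentence boundaries would be gentler still, at the cost of a little more code.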
In the link below, you will find the complete web scraping code. I don’t know about you, but I spent a considerable time looking at the final Japanese and Korean translations, amazed by these languages’ beauty. Maybe someday I will have the courage to learn a little of each. Who knows!
Thank you so much for having honored my text with your reading.
P.S.: You will find more info about my work on LinkedIn, Medium, and GitHub: