Quick & worldwide: how we accelerated Doc&Loc releases and updates for 34 localizations

Published in Kaspersky, Feb 29, 2024 (13 min read)

Our consumer mobile products are unique in that they are distributed in over 100 countries in 34 languages — possibly a record in the Russian IT industry. Typically, companies tend to have just a few products translated into a dozen or so languages, but we have a whole range of flagship products translated into all 34 languages. And of course, if our documentation and localization (Doc&Loc) team localized products for each region individually from scratch, without any optimization, we probably wouldn’t have set any records.

My name is Nikita Avilov, and I’m a technical writer in the Doc&Loc Mac & Mobile group at Kaspersky. In this article, I’ll explain exactly how we’ve organized the work of our team, as well as cross-functional collaboration with other departments, to efficiently roll out our products to such a large number of locales.

Where does localization begin?

One of the key tasks of a technical writer is to create GUI (Graphical User Interface) text that is both human and persuasive. For many users, it’s not enough for an application to be functional and visually appealing: they need to understand why certain features are useful, what certain buttons do, and so on. Technical writers have to write text that builds users’ trust in the interface. They should therefore speak to users in clear, easy-to-understand language, avoiding technical terminology, complex syntax, and so on.

Our first trick is to use English, rather than Russian, as the source language. With double translation (Russian -> English -> local language), not only does the time and cost of translation increase, but there’s also a risk of the meaning being distorted. But English is more universal; when translating it word-for-word into other languages, the text’s meaning is preserved as much as possible. Moreover, we can then be more creative when it comes to localizing into Russian, because we don’t need to worry about subsequent translation into other languages.

Regarding interfaces: in collaboration with our UX (user experience) designers, we write text straight onto mockups in order to get an idea of the size of the text on the screen and decide on the formatting. The experienced eye of a technical writer or localization engineer also evaluates the mockup for localizability — how the text will look in another language. For example, German words are often very long, and some screen fields might have character limitations — so they need to assess whether the translation will fit.
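The "will the translation fit?" check can be sketched in a few lines of Python. This is only an illustration, not the team's actual tooling: the `UiString` structure, the resource keys, and the character limits are all hypothetical, and real layouts depend on rendered pixel width rather than character count.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UiString:
    key: str                # hypothetical resource identifier
    max_chars: int          # hypothetical character limit of the screen field
    texts: dict[str, str]   # locale -> text shown in that field


def find_overflows(strings: list[UiString]) -> list[tuple[str, str, int]]:
    """Return (key, locale, excess_chars) for every text longer than its limit."""
    overflows = []
    for s in strings:
        for locale, text in s.texts.items():
            excess = len(text) - s.max_chars
            if excess > 0:
                overflows.append((s.key, locale, excess))
    return overflows


# Illustrative data only.
strings = [
    UiString("scan_button", 12, {"en": "Scan now", "de": "Jetzt scannen"}),
    UiString("settings_title", 15, {"en": "Settings", "de": "Einstellungen"}),
]
overflows = find_overflows(strings)
print(overflows)  # [('scan_button', 'de', 1)]
```

Here the German button label is one character over its assumed limit, which is the kind of case a writer or localization engineer would flag on the mockup.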

After these procedures are done, the text is reviewed by a cross-functional team, usually comprising a product manager, UX designer, system analyst, and marketing manager. Finally, after all the wording is approved, the technical writer enters the text into the strings already set up by the developer in the repository and hands it over for localization.

In localization, CAT (Computer-Assisted Translation) systems are particularly helpful. They split the incoming English text from the repository into segments shown in the left column of the editor and, in the right column, offer ready-made preliminary translations from the TM (Translation Memory), which stores translations from past iterations. Naturally, this speeds up the translation process.
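To give a feel for how a translation memory suggests a match, here is a minimal Python sketch. It assumes a TM is just a mapping from English segments to stored translations and uses `difflib.SequenceMatcher` from the standard library as a stand-in for the similarity scoring; commercial CAT tools use their own matching algorithms, and the 0.75 threshold is an arbitrary assumption.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def tm_lookup(segment: str, memory: dict[str, str], threshold: float = 0.75):
    """Return (translation, score) for the closest stored segment, or None.

    `memory` maps English source segments to stored translations.
    The threshold is a hypothetical cut-off for a usable fuzzy match.
    """
    best_target = None
    best_score = 0.0
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_target, best_score = target, score
    if best_target is not None and best_score >= threshold:
        return best_target, best_score
    return None


# Illustrative memory content.
memory = {
    "Scan now": "Jetzt scannen",
    "Update databases": "Datenbanken aktualisieren",
}
exact = tm_lookup("Scan now", memory)
fuzzy = tm_lookup("Scan now!", memory)
nothing = tm_lookup("Completely unrelated sentence about licenses", memory)
print(exact, fuzzy, nothing)
```

An exact match scores 1.0, a near match is offered with a lower score for the linguist to adjust, and a segment with no close neighbor gets no suggestion.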

Previously, we used offline CAT tools like SDL Trados or Passolo: we would collect a package of strings in proprietary formats, import them into the CAT tool, perform the localization, export the package, unpack it, and finally generate and upload the translated strings to Git. But with the frequency of our releases, this approach was too slow and inconvenient, so we switched to the cloud solution Smartcat. Files are now loaded into Smartcat directly from Git, and we’ve automated this step with our internal Syncer service, which lets us flexibly configure how files are processed for translation and which localizations they go to.
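The core decision such a sync step has to make is which strings still need translating for which locale. The Python sketch below illustrates only that idea. It is not Syncer, whose configuration and interfaces are not described here; the function name, the dictionary layout, and the locale list are assumptions.

```python
from __future__ import annotations


def pending_translations(
    source: dict[str, str],
    translations: dict[str, dict[str, str]],
    locales: list[str],
) -> dict[str, list[str]]:
    """Map each configured locale to the string keys that lack a translation."""
    pending = {}
    for locale in locales:
        done = translations.get(locale, {})
        missing = sorted(key for key in source if not done.get(key))
        if missing:
            pending[locale] = missing
    return pending


# Illustrative data: English source strings and what each locale already has.
source = {"scan_button": "Scan now", "update_button": "Update"}
translations = {
    "de": {"scan_button": "Jetzt scannen"},
    "fr": {"scan_button": "Analyser", "update_button": "Mettre à jour"},
}
pending = pending_translations(source, translations, ["de", "fr", "ja"])
print(pending)
# {'de': ['update_button'], 'ja': ['scan_button', 'update_button']}
```

Fully translated locales drop out of the result, so only the outstanding work is sent on.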

What’s more, traditional CAT tools operate on the basis of one license per end-user. Each license is purchased separately, making it more expensive than a paid corporate account in Smartcat. Additionally, a localization engineer might be left without access to the CAT tool at a crucial moment, if all available licenses are being used by colleagues.

The CAT tool stores hundreds of project Translation Memories (TMs) and keeps them up to date. In certain situations, technical writers search these TMs for similar texts in related projects — this also saves time and expenses when localizing new texts. Moreover, the uniformity of our texts across different platforms improves the user experience.

To localize GUI texts, we outsource to a separate pool of professional, carefully selected native linguists. Their work is closely monitored by members of our department, most of whom are proficient in more than one language. Our Doc&Loc Vendor Management team is dedicated to the task of selecting, testing, and interacting with vendors (contractors), including negotiating and executing NDAs.

Online help — development, robots, humans

When it comes to the centralized development of documentation, we use the content management system Author-it to write online help guides for products. It allows us to flexibly customize text styles, leave comments, use different versions of articles, and export documentation to HTML, DOCX, and XML formats.

After we write an article in Author-it, the draft in English is given to a native English speaker for proofreading and to a system analyst for compliance checks. In some cases, a tester is also involved in approving the document. The English version is posted for review on Stage, an internal portal for user documentation, before publication on the production server. The checked help article is approved by an analyst and the product team as necessary, after which the English document is sent for localization according to the process described above.

So, the English version of the online help article is now ready and has gone through all the approval stages. You might think that the next step would be to send the texts to agencies for professional linguists to prepare high-quality localizations, considering the specifics of each language.

However, such traditional translation would require much more time and money than our current approach. We first run the English text through machine translation (MT) in the CAT tool. The localization engineer then sends this version to linguists for post-editing. By way of comparison, a linguist needs approximately a week to translate a new 1000-word online help text from scratch. With our approach, the edit is completed within 2–3 days, and then it’s back in the hands of the localization engineer.

Again, keep in mind that we’re not just talking about simple Google Translate here — we use its model trained on our texts. What’s more, MT allows for the high-quality pre-translation of new segments, since documentation typically uses standard terminology. Machine translation has another advantage, often overlooked: it eliminates typos, which can’t be guaranteed in traditional translation. I repeat, all of this saves significant time and money.

Hash coordinator, or how to make sure the buttons don’t get mixed up

Localizing technical documentation is quite labor-intensive, not because documentation text is longer than interface texts, but because the names of GUI elements in the help guides must exactly match the names in the application’s interface. If a user opens a help guide and doesn’t see the same elements as in the interface, they won’t find the answers they need and are likely to turn to technical support. The primary purpose of the online help guides is to assist with complex and atypical cases, reducing the burden on technical support by lowering the number of inquiries.

As I’ve already mentioned, manually processing 34 languages is out of the question, so we’ve also automated the process of checking terms in localizations using Hash ID. This tool assigns each interface string a unique identifier, which is inserted into the document and later replaced with the string’s value from the application’s resource file. To set this up, the technical writer first generates an Excel table with the GUI element names and their hashes in square brackets. Then, they insert the ID for each element into the help topics in Author-it. After this, the help article is exported to HTML format and sent to localizers.
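The table-generation step might look roughly like the Python sketch below. The real Hash ID tool's hashing scheme is not described in this article, so the use of SHA-1 over the resource key, the eight-character truncation, and the resource keys themselves are assumptions made purely for illustration. The sketch also prints tab-separated rows instead of writing an Excel file.

```python
from __future__ import annotations

import hashlib


def hash_id(resource_key: str) -> str:
    """Derive a short, stable placeholder such as '[1a2b3c4d]' from a key.

    Assumption: SHA-1 truncated to 8 hex digits; the actual tool may differ.
    """
    digest = hashlib.sha1(resource_key.encode("utf-8")).hexdigest()
    return f"[{digest[:8]}]"


def build_table(en_resources: dict[str, str]) -> list[tuple[str, str]]:
    """Return (GUI element name, hash placeholder) rows, ordered by key."""
    return [(text, hash_id(key)) for key, text in sorted(en_resources.items())]


# Hypothetical English resource file content: key -> GUI element name.
en_resources = {"settings_title": "Settings", "scan_button": "Scan now"}
table = build_table(en_resources)
for name, placeholder in table:
    print(f"{name}\t{placeholder}")
```

The writer would then paste a placeholder such as the one printed for "Scan now" into the help topic wherever that button is mentioned, instead of typing the button name by hand.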

Hash ID lets us use MT (machine translation on our trained engine) more effectively for pre-translating new segments. The two complement each other when writing technical documentation: whether a machine or a human translates the text around a button or section name, the name itself is taken from the interface resources, so incorrect translations are avoided. Hash ID also streamlines and speeds up the creation of help articles and removes the human factor, that is, the likelihood of errors or discrepancies in references to GUI elements.

The next step is the same as when translating GUI texts: the localization engineer sends the file, pre-translated from English, to our vendors of linguistic services.

Having received the edited document, the localization engineer downloads it, uses the utility to replace the Hash IDs in the localizations with the corresponding localized GUI strings, rebuilds the help guide through AIConverter, and uploads it to the help release folder in Stage. AIConverter is our utility that updates the help guide layout and configures correct text indexing in the search.
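The substitution step can be pictured with the following Python sketch. It is not the actual utility: the placeholder format (eight hex digits in square brackets), the `id_to_key` mapping, and the sample strings are assumptions. The sketch leaves unknown placeholders untouched and reports them, so a missing resource is noticed rather than silently dropped.

```python
from __future__ import annotations

import re

# Assumed placeholder format: eight lowercase hex digits in square brackets.
PLACEHOLDER = re.compile(r"\[([0-9a-f]{8})\]")


def resolve_hash_ids(
    html: str,
    id_to_key: dict[str, str],
    resources: dict[str, str],
) -> tuple[str, list[str]]:
    """Replace placeholders with localized GUI strings.

    Returns the resolved text and the list of placeholders left unresolved.
    """
    missing: list[str] = []

    def substitute(match: re.Match) -> str:
        key = id_to_key.get(match.group(1))
        if key is None or key not in resources:
            missing.append(match.group(0))
            return match.group(0)  # keep the placeholder visible for review
        return resources[key]

    return PLACEHOLDER.sub(substitute, html), missing


# Illustrative data for a German help topic.
id_to_key = {"a1b2c3d4": "scan_button"}
de_resources = {"scan_button": "Jetzt scannen"}
topic = "<p>Tippen Sie auf <b>[a1b2c3d4]</b> oder <b>[deadbeef]</b>.</p>"
resolved, missing = resolve_hash_ids(topic, id_to_key, de_resources)
print(resolved)
print(missing)  # ['[deadbeef]']
```

Because the button name comes straight from the localized resource file, it matches the interface by construction.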

Where would Doc&Loc be without documentation

Any digital product is always accompanied by various legal documents: user agreements, usage policies, and so on. And inevitably, these documents must be prepared right before the release, when everything is already in turmoil. However, we’ve found a way to optimize this process too.

For convenience, we’ve divided it into two stages: preliminary translation and document finalization.

At the first stage, lawyers draft the legal documentation, and we take care of proofreading, document verification in English and Russian, and translation into the required languages. One nice thing about this is that we’ve had all the ready-made templates for a long time, and it’s rare that a completely new document is composed. That’s why our tasks usually involve making changes to existing documents, so the process isn’t as lengthy as it could be. System analysts know about these changes in advance and create a preliminary translation task for the localization engineer.

Upon receiving the preliminary translation task, the localizer uploads the document with data to Smartcat and sends it for translation to agencies that work with legal documents. One peculiarity of translating legal text is that it’s done in two stages: translation and mandatory editing. The contractor must complete both stages to submit the work. It goes without saying that legal support is a very delicate matter that requires meticulous attention. We save the received translations in the corresponding TM, after which we close the task.

Next, an analyst begins to finalize the documents. For the Doc&Loc team, this is a priority task, not just because the deadline is so tight, but also because elements requiring special attention might arise. If new data appears that must be transferred, it needs to be sent in English for review by a native speaker. If amendments are made to the document templates, the analyst must be informed so that they can adjust the translation in Poseidon, a system that analysts use to work with data flows.

Next, a technical editor or writer proofreads the final documents in English and Russian. Then, all the automated tests are run, and if necessary, corrections are made. Section B, which contains lists of processed data, must be checked to ensure the correctness of the formatting and number of bullet points with transferred data in both languages (as during composition, some bullet points might accidentally merge). By the way, it’s very convenient to do this in Beyond Compare, a program for identifying differences in folders and text files.
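The bullet-count check on Section B lends itself to a small script alongside a visual diff. The Python sketch below is one possible way to do it, not the team's actual autotest: it assumes the documents are available as plain text per section and that list items start with `-`, `*`, or `•`.

```python
from __future__ import annotations

BULLET_MARKS = ("-", "*", "•")  # assumed list markers in the plain-text export


def count_bullets(text: str) -> int:
    """Count lines that start with a list marker, ignoring indentation."""
    return sum(
        1 for line in text.splitlines() if line.lstrip().startswith(BULLET_MARKS)
    )


def bullet_mismatches(
    en_sections: dict[str, str], ru_sections: dict[str, str]
) -> dict[str, tuple[int, int]]:
    """Return {section: (en_count, ru_count)} where the counts differ."""
    mismatches = {}
    for name, en_text in en_sections.items():
        en_count = count_bullets(en_text)
        ru_count = count_bullets(ru_sections.get(name, ""))
        if en_count != ru_count:
            mismatches[name] = (en_count, ru_count)
    return mismatches


# Illustrative data: two Russian items were accidentally merged into one.
en_sections = {"Section B": "• Device ID\n• App version\n• Locale"}
ru_sections = {
    "Section B": "• Идентификатор устройства\n• Версия приложения, языковой стандарт"
}
mismatches = bullet_mismatches(en_sections, ru_sections)
print(mismatches)  # {'Section B': (3, 2)}
```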

In the next step, the writer converts the agreements into HTML format, checks all the documents for formatting, working links, copyrights, and the absence of typos, and uploads the finished HTML files alongside the RTF ones.

Finally, the localization engineer, upon receiving the task from the technical writer, provides the translation of the final documents. Usually, translations for new strings are already in the TM from the preliminary translation stage, and the engineer just needs to assemble the final localization documents from the prepared segments.

After substituting the preliminary translation, the localizer exports the finished documents to the TFS folder and runs autotests on the localizations. If there are no errors, then the task is completed.

Auto-testing interfaces of all localizations

Screenshots have become our optimization booster, allowing us to promptly load accurate GUI texts in all languages into the final build. Once the localization engineer receives the finished translations from the linguist contractors, they download them from the CAT system into the working branch and take screenshots for the task in all languages using an auto-screenshot tool. The tool allows us to obtain application screenshots based on specified parameters: class of a screen, localization, screen settings.

[Figure: screen settings window of the auto-screenshot tool]
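The three parameters mentioned above (screen class, localization, screen settings) naturally form a job matrix. The Python sketch below only illustrates that combination. It does not use Kaspresso or the real auto-screenshot tool, and the screen class names and the `dark_theme` flag (standing in for "screen settings") are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ScreenshotJob:
    screen_class: str   # hypothetical screen class name
    locale: str         # localization to render
    dark_theme: bool    # stand-in for one "screen settings" option


def build_jobs(
    screen_classes: list[str],
    locales: list[str],
    themes: tuple[bool, ...] = (False, True),
) -> list[ScreenshotJob]:
    """Create one job per (screen, locale, theme) combination."""
    return [
        ScreenshotJob(screen, locale, theme)
        for screen, locale, theme in product(screen_classes, locales, themes)
    ]


jobs = build_jobs(["MainScreen", "ScanScreen"], ["en", "de", "ja"])
print(len(jobs))  # 12
print(jobs[0])
```

With 34 locales the matrix grows quickly, which is why taking these screenshots by hand is not realistic.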

Efficient UI tests of our applications became possible thanks to our autotest framework, Kaspresso.

We compile the screenshots into comparison packages, in a format used in the Screenshot Lab application — which is entirely our own creation. We developed it in order to swiftly compile screenshots, send them for verification, and log bugs. The program can also create screenshot pairs of localized and English versions. All our contractors are trained to use this tool and understand how it aids and accelerates localization. We review the texts of the English and Russian versions ourselves, ensuring they maintain a consistent style, adhere to the Tone of Voice, have the correct inflections, and so on.
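Pairing localized screenshots with their English counterparts is essentially a join on the screen name. The Python sketch below shows one way to do it under an assumed `<locale>/<screen>.png` layout; Screenshot Lab's real package format is not described here, so the paths and the function are illustrative only.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def pair_with_english(
    paths: list[str], source_locale: str = "en"
) -> list[tuple[str, str]]:
    """Return (english_path, localized_path) pairs matched by file name.

    Assumes each path looks like '<locale>/<screen>.png'.
    """
    by_locale: dict[str, dict[str, str]] = {}
    for raw in paths:
        path = PurePosixPath(raw)
        by_locale.setdefault(path.parent.name, {})[path.name] = raw

    english = by_locale.get(source_locale, {})
    pairs = []
    for locale in sorted(by_locale):
        if locale == source_locale:
            continue
        for name in sorted(by_locale[locale]):
            if name in english:  # skip screens with no English reference
                pairs.append((english[name], by_locale[locale][name]))
    return pairs


paths = ["en/main.png", "en/scan.png", "de/main.png", "de/scan.png", "ja/main.png"]
pairs = pair_with_english(paths)
for english_path, localized_path in pairs:
    print(english_path, "<->", localized_path)
```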

A few years ago, we used to manually collect screenshots in Screenshot Lab through the program’s interface, but recently we registered this utility in the Windows registry and created a shortcut that allows us to compile screenshot packages for testing with a single right-click! Not long ago, it used to take a whole workday to collect packages of screenshots for testing. Now, if we receive screenshots from the auto-screenshot tool in the morning, we can be sure that by the afternoon they are already being reviewed by the linguistic service vendor. By the way, we will soon publish an article about our tools, where you can learn more about how Screenshot Lab operates.

Next, we upload the localization screenshots to folders on a secure file-sharing platform and send them to linguists for verification. They check for the proper use of formal and informal forms of address, grammatical errors, cosmetic defects (like text overflowing designated areas), and so on. By the way, these files can be accessed via the Screenshot Lab kit: we provide each vendor with a unique key granting access to the screenshots and the utility on our file-sharing platform. This keeps the materials considerably more confidential.

Upon receiving the tested screenshots, we make the relevant adjustments in the CAT system to ensure the correct text is included in the TM and in the resources themselves. We then upload them into the version control system, from which we can obtain a build with texts ready for release.

A unified terminology base — to avoid confusion

To conclude, I’ll tell you about another one of our tools — Terminarium, the company’s unified terminology system. It’s an internal web portal where all the necessary terminology in all languages is stored: names of products and their components, specific product terms, common IT terms, and so on. Terminology on the portal is organized according to a “project-platform-version” system, making it easy to work with sets of terms relevant to a specific product, compare terminologies across products, and reuse common terms, among other things.
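The "project-platform-version" organization can be pictured as a keyed collection of term sets. The Python sketch below is a guess at the shape of such data, not Terminarium's actual schema: the `TermSetKey` fields follow the naming in the text, while the product name, terms, and translations are invented for illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class TermSetKey(NamedTuple):
    project: str
    platform: str
    version: str


# Each term set maps an English term to {locale: approved translation}.
def common_terms(a: dict, b: dict) -> list[str]:
    """Terms present in both sets, candidates for reuse."""
    return sorted(set(a) & set(b))


def conflicting_terms(a: dict, b: dict, locale: str) -> list[str]:
    """Shared terms whose translation for `locale` differs between sets."""
    return sorted(
        term for term in set(a) & set(b) if a[term].get(locale) != b[term].get(locale)
    )


# Illustrative data with a hypothetical product name.
term_sets = {
    TermSetKey("ProductA", "Android", "1.0"): {
        "Scan": {"de": "Untersuchung"},
        "License": {"de": "Lizenz"},
    },
    TermSetKey("ProductA", "iOS", "1.0"): {
        "Scan": {"de": "Scan"},
        "License": {"de": "Lizenz"},
        "Widget": {"de": "Widget"},
    },
}
android = term_sets[TermSetKey("ProductA", "Android", "1.0")]
ios = term_sets[TermSetKey("ProductA", "iOS", "1.0")]
print(common_terms(android, ios))             # ['License', 'Scan']
print(conflicting_terms(android, ios, "de"))  # ['Scan']
```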

You can connect a glossary to a project in Smartcat and include the corresponding TB (Term Base) in the project settings. This makes things easier for the translator — many words have several translations, but it’s necessary to maintain consistent terminology across the company’s products. Thanks to this shared database, the likelihood of faulty text making it into the released version of the product is significantly reduced.
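What a term base buys the translator can be illustrated with a naive consistency check in Python. This is a simplified sketch, not how Smartcat implements its glossary checks: it does plain case-insensitive substring matching, so it ignores inflected forms, and the sample terms and translations are illustrative.

```python
from __future__ import annotations


def check_terms(
    source: str, target: str, term_base: dict[str, str]
) -> list[tuple[str, str]]:
    """Return (term, approved_translation) pairs that look violated.

    A term counts as violated when it occurs in the source segment but its
    approved translation does not occur in the target segment.
    """
    issues = []
    for term, approved in term_base.items():
        if term.lower() in source.lower() and approved.lower() not in target.lower():
            issues.append((term, approved))
    return issues


# Illustrative English -> German term base.
term_base = {"firewall": "Firewall", "scan": "Untersuchung"}
bad = check_terms("Run a scan", "Scan ausführen", term_base)
good = check_terms("Run a scan", "Untersuchung ausführen", term_base)
print(bad)   # [('scan', 'Untersuchung')]
print(good)  # []
```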

If necessary, any employee can suggest adding a particular term to Terminarium. Terms are often contributed by representatives of regional offices, who have a better understanding of the marketing specifics of the localization region’s language. This facilitates the continuous improvement of the product and user experience.

For external users, glossaries can be exported into standard Excel files; for example, linguists verifying translations in screenshots can check specific terms and maintain consistency in product translations.

***

So, to sum up: previously, we used traditional offline CAT systems like Trados and Passolo. However, transitioning to the cloud-based Smartcat and Syncer allowed us to send resources for translation immediately after the English strings appear in the repository. Moreover, Smartcat lets us send texts for translation into all languages with just a couple of clicks, without the hassle of generating packages for translation and individually sending them to each translator. This also reduces the costs of storing localization resources on the server — we no longer need to store bulky translation packages and terminology databases in folders and wait for licenses to become available. Localization now takes place seamlessly between Git and the CAT system.

Developing and improving Screenshot Lab allowed us to reduce the time required to compile and send screenshots for testing from an entire workday to a maximum of one hour. It’s also become easier to process linguists’ comments on the screenshots.

User guide documentation contains the most text, so using machine translation, Hash IDs, and post-editing significantly cuts translation costs.

In conclusion, our multi-stage and partially-automated localization process saves us a lot of time and resources on translation and testing. Thanks to this, we can release product updates at least once a month, localized in all 34 languages and accompanied by online help and legal documentation.

If this article intrigued you and you also want to work with such a vast array of languages and witness how all 34 localizations are combined in a single product, come join us at Kaspersky’s Documentation and Localization department (Doc&Loc). We aim to make our applications and documentation accessible to all — and to do this quickly and easily :)