Comparing Different AIs Regarding Translation in an Android Application

Shoaib Amiri
arconsis
13 min read · Jan 15, 2024
Imagined by Midjourney

In the fast-paced world of AI and machine learning, the ability to compare and contrast different models has become a critical need. Developers, data scientists, and businesses often find themselves at a crossroads when choosing the right AI solution for their projects. During my practical semester at arconsis, where I delved into both app development and machine learning, I took on the exciting challenge of integrating four distinct AI models into an Android app.

This project served as a good introduction to the company and as exciting preparation for bigger projects.

This endeavor not only marked my first foray into ML and app development but also aimed at creating a user-friendly, streamlined setup that delivers results swiftly.

Outline of this Article:

  • Model Overview: I will test and discuss the AI models integrated into the app: DeepL, ChatGPT, Google’s ML Kit framework, and Translator from the Microsoft Azure Portal. Additionally, I’ll share insights from testing Meta’s Llama 2 outside the app, since it has no ready-to-use API. While testing, I found that Llama 2 is not suitable for translation, so I ultimately did not integrate it into the app.
  • Development Insights: I’ll delve into the software engineering and clean structure behind this app, showcasing essential aspects of its design, code, and functionality with screenshots.
  • Performance Evaluation: To provide a comprehensive analysis, I will take a German example sentence and showcase its translations into English, French, Persian, and Chinese within a comparative table. I’ll also compare average response times across the integrated AI models.

Join me as I explore this Android app’s journey, revealing the steps to build it and the key features that allow comparing different AI models. Let’s usher in a new era of informed decision-making in the world of artificial intelligence.

At the end of the article you can find the link to the GitHub repository where the source code is available.

DeepL:

When it comes to translation quality, DeepL stands out prominently among its AI counterparts.

DeepL supports only 31 languages but offers fairly accurate translations. It also provides language detection, so the source language can be recognized from the text to be translated.

It provides a free version that includes access to all functions, allowing you to translate up to 500,000 characters per month. For more extensive usage, DeepL offers a subscription plan at €4.99 per month, granting unlimited character translations for personal use. For businesses, there exist special commercial licenses.

When it comes to performance, DeepL shines with high availability and rapid response times. This ensures that translations are not only accurate but also delivered swiftly, making it a valuable tool for time-sensitive projects.

ChatGPT:

ChatGPT is renowned for providing reliable responses, although the quality of these responses can vary depending on the nature of the query and the length of the text to be translated. The longer the text, the higher the likelihood of encountering timeouts or other exceptions, which is something to keep in mind.

One of the standout features of ChatGPT is its language versatility. It can respond effectively in a wide range of languages, eliminating the need to worry about language support. It’s truly a powerful language model that caters to a global audience.

When it comes to pricing, ChatGPT offers an attractive incentive for new users. Upon opening an account, users are provided with an initial $5 credit that can be used for all of ChatGPT’s API services. It’s important to note that this $5 credit remains valid for only four months, after which users have the option to start a payment plan for continued usage. Furthermore, ChatGPT’s pricing is based on tokens, with 1000 tokens equating to approximately 750 words (see the table below for reference gpt-3.5-turbo).

In addition to pricing, ChatGPT has certain usage limits in place. On the free tier, users are limited to 3 requests per minute and 200 requests per day; for the gpt-3.5-turbo model, up to 40,000 tokens per minute can be processed. Higher tiers come with higher limits.
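As a back-of-the-envelope illustration of this token-based pricing, the ~750 words per 1,000 tokens rule of thumb can be turned into a tiny helper. Note that the price per 1,000 tokens below is a placeholder for illustration, not OpenAI’s actual rate:

```kotlin
// Rough token estimate based on the ~750 words per 1,000 tokens rule of thumb.
fun estimateTokens(wordCount: Int): Int =
    (wordCount * 1000) / 750

// The default price per 1,000 tokens is a placeholder; check OpenAI's
// current pricing page for the real value.
fun estimateCostUsd(wordCount: Int, pricePer1kTokens: Double = 0.002): Double =
    estimateTokens(wordCount) * pricePer1kTokens / 1000
```

For example, a 750-word text maps to roughly 1,000 tokens, which is then multiplied by the per-1,000-token price.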

Google’s ML Kit:

Google’s ML Kit offers a straightforward approach to translation that eliminates the need for manual API calls. Instead, developers can integrate translation capabilities by adding a simple dependency in the Gradle file:

implementation("com.google.mlkit:translate:17.0.1")

Once integrated, developers can create a Translator object, configuring it with the desired source and target languages.

Additionally, ML Kit provides language detection functionality, supporting over 100 languages. However, it’s important to note that the translation feature itself is limited to 58 languages. To initiate a translation, the only requirement is downloading the specific language model, approximately 30 MB in size. Developers have the flexibility to set the download preference to Wi-Fi-only or allow downloading over any network, offering convenience to users.
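The flow described above can be sketched roughly as follows. This is Android-only code and runs on a device; the language constants and the sample text are illustrative:

```kotlin
// Sketch of the ML Kit flow: configure source/target languages,
// download the model (Wi-Fi only here), then translate.
val options = TranslatorOptions.Builder()
    .setSourceLanguage(TranslateLanguage.GERMAN)
    .setTargetLanguage(TranslateLanguage.ENGLISH)
    .build()
val translator = Translation.getClient(options)

val conditions = DownloadConditions.Builder()
    .requireWifi()
    .build()

translator.downloadModelIfNeeded(conditions)
    .addOnSuccessListener {
        translator.translate("Guten Morgen")
            .addOnSuccessListener { translated -> /* show result */ }
            .addOnFailureListener { e -> /* handle error */ }
    }
```

Dropping the `requireWifi()` call allows the model download over any network.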

One of the notable advantages of Google’s ML Kit is its cost efficiency. There are no additional costs associated with translation services; users only need to download the required language model, making it a budget-friendly choice. Furthermore, the localized models allow users to download and use only the languages they frequently require, enabling offline usage.

In terms of accuracy, the translation quality is reasonably good, although it may vary based on the length and complexity of the text. While it may not match the precision of some dedicated translation services, it offers a solution for many everyday translation needs.

Google’s ML Kit, with its simplicity, cost-effectiveness, and localized model approach, provides a practical solution for basic translation needs. In my exploration, I found it to be a viable option for users seeking localized offline translation capabilities.

Microsoft’s AI Translation Service:

Microsoft’s AI translation service is a force to be reckoned with, supporting 129 languages. In addition to translation, it offers language detection.

One of the features of Microsoft’s service is its availability and rapid response times. Whether you’re translating text or utilizing the speech recognition function, Microsoft ensures high responsiveness, minimizing delays and keeping users in the flow of their tasks.

In terms of pricing, Microsoft offers flexibility to cater to various needs. The free version (F0) provides a generous allocation of 2 million characters per month. For more extensive usage, Microsoft offers a range of subscription plans with varying costs. The pricing spectrum can vary from $10 per month for 1 million characters (S1 Standard) to $45,000 per month for 10 billion characters (S4 Standard). There are additional plan options like C2, C3, C4, and D4, catering to different requirements.
It is important to note that a Microsoft Azure subscription is required to use these services, so they are not entirely free. However, a free trial period is offered during which you can test any of the services.

This cost structure offers users the flexibility to choose the most suitable plan based on their usage.

The accuracy of Microsoft’s translation service is commendable, delivering precise translations not only between commonly used languages but also for languages with intricate structures such as Persian and Arabic (based on my own experience).

Llama 2:

Meta’s Llama2 offers a distinctive approach to interacting with AI models, with a focus on accessibility and versatility. While traditional methods using RESTful web services are common, Llama2 operates under specific conditions and provides alternative solutions for making API calls.

On the wider Hugging Face platform, developers can conveniently make API calls to various AI models. Hugging Face hosts a vast array of models designed for a multitude of purposes, including text-to-image, image-to-text, text-to-speech, and more. This is made possible by using a base URL like “https://api-inference.huggingface.co/models/” and appending the desired model’s name (e.g. t5-base) to the end. Headers with authorization details and the input text in the body are all that’s needed for an API call.
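A minimal sketch of how such a request URL and authorization header can be assembled; the token value is a placeholder, and a real one would come from your Hugging Face account:

```kotlin
// Base URL of the Hugging Face Inference API, as used in the text above.
const val HF_BASE_URL = "https://api-inference.huggingface.co/models/"

// Appends the model name to the base URL, e.g. "t5-base".
fun inferenceUrl(modelName: String): String = HF_BASE_URL + modelName

// Builds the standard bearer-token authorization header.
fun authHeader(apiToken: String): Pair<String, String> =
    "Authorization" to "Bearer $apiToken"
```

These two values, plus the input text as the request body, are all an HTTP client needs to call a hosted model.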

However, Llama2 operates differently. It does not conform to the conventional RESTful web service model and is accessed under specific conditions, as it is a relatively new and evolving offering from Meta. To interact with Llama2, developers have alternative options, such as leveraging existing scripts like “Oobabooga,” which is based on Miniconda and, upon installation, provides a local TextGen Web UI for interacting with Llama2. There is also https://localai.io/, which offers a ChatGPT-compatible API through which you can also call Llama models. Since Llama2 runs locally and requires resources, adjusting GPU memory settings is essential.

Llama2 comes in three distinct versions: 7B, 13B, and 70B. Currently, there are no costs for using the Llama2 model itself (the model is free, but running it is your responsibility and therefore costs money). If users intend to create their own TextGen API, they may choose to rent GPU resources from services like RunPod, incurring costs accordingly.

When it comes to translation quality, Llama2 is just acceptable for a select few languages and simple texts. For more complex translations, it doesn’t perform as well, making it less suitable for translation tasks. Llama2 may find better application in other scenarios where its unique capabilities align with the task at hand.

Continuing our journey, we will now explore the overall software engineering and structure behind the app, as well as the fascinating insights and results obtained through our evaluation of these AI models.

Development Insights: Crafting an Agile and Versatile App

The primary aim of the app is to facilitate the comparison of various AI models’ translation capabilities, offering users insights into their performance.
The app caters to a diverse audience, encompassing individuals interested in AI and translation technologies.
The development of the Android app was a journey marked by the use of cutting-edge tools and adherence to fundamental principles of software engineering. This journey allowed me to create a well-structured, versatile, and user-friendly application for comparing various AI models.

Here’s a glimpse into the development process:

Choice of Tools: I leveraged Android Studio as the integrated development environment (IDE), employed Kotlin as programming language, and used Jetpack Compose for the app’s user interface (UI). These technologies collectively streamlined the development process and ensured a modern, responsive UI.

Software Engineering Principles: I followed the app architecture recommended by Google, which is based on Model-View-ViewModel (MVVM). A single source of truth was established for data management, a ViewModel was used to manage data and UI interaction logic, and a dedicated class was created for each translator.

User Interface Design: Adhering to the principle of “state hoisting,” I designed the UI so that events are sent upwards and state is propagated downwards as part of the unidirectional data flow, a principle of the recommended app architecture. This enhances the predictability and manageability of UI elements. Stateless composables were employed most of the time to create reusable UI components, improving the app’s maintainability.

Localization: To cater to users of different languages, text strings were stored in resource files (res -> values) for both German and English. This enabled the app to display text in the user’s preferred language based on their device settings.

Secure API Key Handling: API keys are read from the Gradle file, allowing them to be generated automatically during compilation into the BuildConfig file. Please note, while this ensures the API Keys are not stored in the repository, they will still be included in the app and thus be practically public. I do this for this simple test app to not increase its scope too much. In a real-world application, one would utilize a backend for frontend to handle the keys and proxy calls to the services.
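This setup might look roughly like the following in a module-level build.gradle.kts; the property name deeplApiKey is a hypothetical example:

```kotlin
// Module-level build.gradle.kts (sketch). Reads a key from a Gradle property,
// e.g. defined in an untracked gradle.properties, into the BuildConfig class.
android {
    buildFeatures {
        buildConfig = true
    }
    defaultConfig {
        buildConfigField(
            "String",
            "DEEPL_API_KEY",
            "\"${project.findProperty("deeplApiKey") ?: ""}\""
        )
    }
}
```

In app code, the key is then available as BuildConfig.DEEPL_API_KEY without ever being committed to the repository.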

Dynamic Model Integration: To keep the app generic, a JSON file (res -> raw -> translators_info.json) was created. It is loaded once at the app’s startup and contains the name of each translator and its corresponding implementation class. This minimalistic approach allows new translators to be integrated with a single line of configuration (the translator class itself still has to be implemented; that single line alone cannot do magic):

[
  {
    "name": "DeepL",
    "className": "com.example.aitranslators.services.DeepLTranslator"
  },
  {
    "name": "ChatGPT",
    "className": "com.example.aitranslators.services.ChatGPTTranslator"
  },
  {
    "name": "Google",
    "className": "com.example.aitranslators.services.GoogleTranslator"
  },
  {
    "name": "Microsoft",
    "className": "com.example.aitranslators.services.MicrosoftTranslator"
  }
]
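Conceptually, turning a className entry from this JSON into a live object boils down to one reflection call. The interface and class names below are simplified stand-ins for the app’s actual types:

```kotlin
// Simplified stand-in for the app's translator interface.
interface Translator { val name: String }

// Example implementation that would be referenced by its class name in JSON.
class DemoTranslator : Translator {
    override val name = "Demo"
}

// Instantiates a translator from its fully qualified class name,
// as read from the JSON file at startup.
fun loadTranslator(className: String): Translator =
    Class.forName(className)
        .getDeclaredConstructor()
        .newInstance() as Translator
```

In the real app the fully qualified names (e.g. com.example.aitranslators.services.DeepLTranslator) come straight from the JSON entries.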

TranslatorService Interface: A common interface, TranslatorService, was defined and implemented by each translator class to maintain consistency in function names and structure:

interface TranslatorService {
    suspend fun translateText(query: TranslationQuery): Result<TranslationResult>

    fun canDetectLanguage(): Boolean

    suspend fun detectSourceLanguage(text: String): DetectedLanguage
}

Database Integration: The app also features a database for users to store translations. A separate screen was implemented to display the list of saved translations, enhancing the app’s functionality.

Compose navigation graph: To keep all components within a single activity, a Compose navigation graph was introduced, enabling seamless screen navigation and transitions.

State Management: Each screen receives its data as a single piece of state from the ViewModel, describing the current state of the screen. This allows the screen to be created consistently from one source of truth. The state is provided as a flow, so the UI updates automatically every time the ViewModel updates the flow.

private val _translationData: MutableStateFlow<TranslationData> =
    MutableStateFlow(TranslationData())
val translationData: StateFlow<TranslationData> =
    _translationData.asStateFlow()

Dependency Injection (DI): To facilitate the management of class creation and dependencies, the app makes heavy use of DI in the form of Hilt, for example for ViewModels and repository classes.

HorizontalPager Integration: The app’s UI features a HorizontalPager, which updates the current index when users swipe between pages. When users click the “Translate” button, the ViewModel invokes the translateText() function, passing the current index to determine the appropriate translator class to call.

Language Selection: A fixed language list was chosen to facilitate easier comparison of different AI models, as not all models support all languages. Language names were mapped to language codes for API calls (within a mapper function).
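Such a mapper can be as simple as a lookup table. The sketch below covers only the languages compared in this article; the real app may map more or different entries:

```kotlin
// Maps display names from the fixed language list to the ISO 639-1
// codes that the translation APIs expect.
val languageCodes = mapOf(
    "German" to "de",
    "English" to "en",
    "French" to "fr",
    "Persian" to "fa",
    "Chinese" to "zh",
)

// Fails loudly for languages outside the fixed list, which keeps
// unsupported selections from silently reaching an API call.
fun codeFor(language: String): String =
    languageCodes[language]
        ?: error("Unsupported language: $language")
```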

Response Time Measurement: Response times were measured and displayed on the UI to enable users to compare the performance of various AI models.

With these development insights, we’ve strived to create an app in an agile way that is adaptable and insightful, providing users with a powerful tool for comparing AI models. Our commitment to clean software engineering principles ensures a robust and user-friendly experience.

Here are some screenshots of the small app, providing a visual perspective on its structure and functionality. The default display language of the app is English, if the device language is not supported. When developing the app, I have so far only taken English and German into account:

Performance Evaluation: Comparative Analysis of AI Models

To gauge the performance and effectiveness of different AI models, I conducted a simple evaluation by translating a German sentence into English, French, Persian, and Chinese using each of the integrated translation services. In cases where Google and Llama 2 produced incorrect translations, a follow-up test was conducted using English as the source language to assess improvements.
To measure the average response time, each translation was carried out 10 times.
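The averaging itself can be sketched with the standard library’s measureTimeMillis; the action parameter stands in for one translation call:

```kotlin
import kotlin.system.measureTimeMillis

// Runs `action` the given number of times and returns the average
// elapsed wall-clock time in milliseconds.
fun averageResponseTimeMs(repetitions: Int = 10, action: () -> Unit): Double {
    require(repetitions > 0)
    var total = 0L
    repeat(repetitions) {
        total += measureTimeMillis(action)
    }
    return total.toDouble() / repetitions
}
```

In the app, the measured action would be the suspend call to the respective translator, and the result is what gets displayed on the UI.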

Here are the tables showcasing the results of performance evaluation (The colors show the quality of the translation):

Photo by Tumisu on Pixabay

When translating from German to Chinese, DeepL and Microsoft weren’t bad, Google and Llama were OK, but ChatGPT was a bit strange. Therefore, I ran the translation again from English to Chinese:

This time the translations were better, but Google returned poor results.

As we can see, Google and Llama didn’t translate correctly (from German to English). That’s why I tested them again (from English to German). Here are the results:

Google:

What color is your brother’s bike? -> Welche Farbe hat das Fahrrad ihres Bruders?

Llama 2:

What color is your brother’s bike? -> Was ist die Farbe deines Bruders’ Bikes?

I also tested Llama 2 again specifically for Persian, hoping that English as the source language would produce better results.
Even as a Persian speaker, I don’t understand this:
چراغ کولر بیک خود آقا است؟ (Cheragh-e koolor beik khud ast?)

Conclusion:

In the grand scheme of AI translation services, the choice ultimately hinges on your specific requirements. Let’s sum up our evaluation with a concise conclusion:

If your priority is swift responses, Google’s ML Kit emerges as a solid choice. It provides speedy (thanks to its offline capability) but not high-quality translations.

For those seeking both rapid and high-quality translations, DeepL takes the lead.

If you desire quick responses, correctness and an extensive language spectrum, Microsoft’s AI translation service is the way to go. Microsoft excels in offering a broad array of languages and reliable translations.

If you value convenience and reasonably accurate translations, and don’t necessarily need lightning-fast responses, ChatGPT fits the bill. It’s a powerful language model for a variety of purposes.

As for my personal top three picks, they would be Microsoft, ChatGPT, and DeepL.

In your quest for the right AI translation model, consider your specific needs, whether it’s speed, precision, language support, or versatility.

It’s important to note that these assessments are current as of December 2023, and the rapidly evolving nature of AI models means that these standings may evolve over time. No guarantees can be made about the permanence of these models’ characteristics.

Congratulations on making it to the end of this article! Your dedication to reading deserves a reward, and here it is: the link to our GitHub repository, where the entire source code awaits your curious eyes.

GitHub Repository Link

Thank you for taking this journey with me through the development and performance assessment of our AI Translators app. I hope these insights provide valuable information for your AI model selection process. Happy translating!
