Our Journey to a Multilingual Review and Rating System

Published in Trendyol Tech, Mar 26, 2024

As the Review Rating Team at Trendyol, we oversee the entire life cycle of customer reviews for purchased products, from the moment a review is written, through its approval, to its display on the product details page. We manage each stage carefully to give our users a seamless and informative experience.

In this article, we share the insights and experiences we gained while transitioning our Review and Rating system to a multilingual architecture.

Why Transition to a Multilingual System?

As Trendyol expands into new markets, supporting local languages is essential. To fit into each market, our teams, particularly those managing reviews and ratings, must support multiple languages. That means storing, translating, and delivering content in each of them, and with our existing design every new language made the codebase significantly more complex. We needed a scalable, long-term solution.

Data Structure Transformation

We store our review data in Couchbase, so our first step was to restructure our data models for better multilingual support. The previous structure kept language-specific content (e.g., comments and tags) in separate per-language fields such as commentEn and deTags, which made the model cluttered and inefficient. The new structure consolidates each of these into a single field whose entries carry a language tag, which simplifies storage and manipulation across languages.

Our initial data structure looked like this:

{
  ...,
  "comment": "Harika kalın kumaşı var.",
  "commentEn": "It has great thick fabric.",
  "commentDe": "Es hat einen tollen dicken Stoff.",
  "culture": "tr-TR",
  "tags": [
    "kumaşı kalın"
  ],
  "deTags": [
    {
      "tag": "dicken Stoff",
      "sortingScore": 3
    }
  ],
  "score": 1.3,
  "deScore": 1,
  "originalComment": "Harika kalın kumaşı var.",
  ...
}

We transformed it into a more structured shape suited to multiple languages:

{
  ...,
  "originalComment": "Harika kalın kumaşı var.",
  "culture": "tr-TR",
  "comments": [
    {
      "language": "tr",
      "comment": "Harika kalın kumaşı var."
    },
    {
      "language": "en",
      "comment": "It has great thick fabric."
    },
    {
      "language": "de",
      "comment": "Es hat einen tollen dicken Stoff."
    }
  ],
  "tags": [
    {
      "value": "kumaşı kalın",
      "score": 4,
      "language": "tr"
    },
    {
      "value": "dicken Stoff",
      "score": 3,
      "language": "de"
    }
  ],
  "score": [
    {
      "value": 1.3,
      "language": "tr"
    },
    {
      "value": 1,
      "language": "de"
    }
  ],
  ...
}

This transformation solved our problems with storing and manipulating data in multiple languages.
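As a rough illustration of that transformation, the sketch below converts a legacy document into the language-tagged shape. It assumes that the untranslated `comment` is written in the language of `culture`, that `commentEn` and `commentDe` are the only legacy translation fields, and that legacy source-language tags, which carried no score, get `None`. The names `migrate_review` and `LEGACY_COMMENT_FIELDS` are hypothetical, not our production code.

```python
from typing import Any

# Assumption: legacy per-language comment fields and the language each one holds.
LEGACY_COMMENT_FIELDS = {"commentEn": "en", "commentDe": "de"}


def migrate_review(legacy: dict[str, Any]) -> dict[str, Any]:
    """Convert a legacy review document into the language-tagged shape."""
    # Assumption: the untranslated "comment" is written in the culture's language.
    source_lang = legacy["culture"].split("-")[0].lower()

    comments = [{"language": source_lang, "comment": legacy["comment"]}]
    for field, lang in LEGACY_COMMENT_FIELDS.items():
        if field in legacy:
            comments.append({"language": lang, "comment": legacy[field]})

    # Legacy source-language tags are plain strings without a score.
    tags = [{"value": tag, "score": None, "language": source_lang}
            for tag in legacy.get("tags", [])]
    # Legacy German tags live in a separate "deTags" field with a sortingScore.
    tags += [{"value": t["tag"], "score": t["sortingScore"], "language": "de"}
             for t in legacy.get("deTags", [])]

    scores = [{"value": legacy["score"], "language": source_lang}]
    if "deScore" in legacy:
        scores.append({"value": legacy["deScore"], "language": "de"})

    return {
        "originalComment": legacy["originalComment"],
        "culture": legacy["culture"],
        "comments": comments,
        "tags": tags,
        "score": scores,
    }


legacy_doc = {
    "comment": "Harika kalın kumaşı var.",
    "commentEn": "It has great thick fabric.",
    "commentDe": "Es hat einen tollen dicken Stoff.",
    "culture": "tr-TR",
    "tags": ["kumaşı kalın"],
    "deTags": [{"tag": "dicken Stoff", "sortingScore": 3}],
    "score": 1.3,
    "deScore": 1,
    "originalComment": "Harika kalın kumaşı var.",
}
migrated = migrate_review(legacy_doc)
print([c["language"] for c in migrated["comments"]])  # ['tr', 'en', 'de']
```

With this shape, a new language only adds entries to the `comments`, `tags`, and `score` arrays instead of new top-level fields.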

Enhancing Data Serving with Elasticsearch

The transition also required rethinking how we serve data, particularly through Elasticsearch.

By analyzing our query structures, we identified the need to differentiate content by language in our indices. We explored several solutions: storing one document per language, separating indices by language, and using dynamic mapping within a single index. Each approach had its own trade-offs in document size, index management, and query complexity.

Our initial Elasticsearch document looked like this:

{
  ...,
  "cmt": "Harika kalın kumaşı var.",
  "commentEn": "It has great thick fabric.",
  "commentDe": "Es hat einen tollen dicken Stoff.",
  "culture": "tr-TR",
  "tags": [
    "kumaşı kalın"
  ],
  "deTags": [
    {
      "tag": "dicken Stoff",
      "sortingScore": 3
    }
  ],
  "score": 1.3,
  "deScore": 1,
  "originalComment": "Harika kalın kumaşı var.",
  ...
}

Here are the options we considered for more efficient, better-structured queries with multilingual support.

Multiple Documents By Language

We could store each review and each of its translations as a separate document, with IDs such as reviewId_tr, reviewId_en, and reviewId_de.

Each document would look like this:

{
    ...,
    "_id": "312312_de",
    "originalComment": "It has great thick fabric.",
    "comment": "Es hat einen tollen dicken Stoff.",
    "language": "de",
    "culture": "en-EN",
    "tags": [
        {
            "score": 1.2,
            "value": "dicken Stoff"
        }
    ],
    "scores": 1.2,
    ...
}
{
    ...,
    "_id": "312312_en",
    "originalComment": "It has great thick fabric.",
    "comment": "It has great thick fabric.",
    "culture": "en-EN",
    "language": "en",
    "tags": [
        {
            "score": 3,
            "value": "thick fabric"
        }
    ],
    "scores": 1.2,
    ...
}

This solution offers several advantages:

Reduced document size.
Better data organization, with each translated review stored in its own document.
New languages can be added without code changes to data storage.

However, it also introduces some drawbacks:

Larger indexes, because non-language-specific fields are duplicated in every document.
A need to scale Elasticsearch clusters horizontally.
More complex query structures.
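To make the fan-out concrete, here is a minimal sketch that takes the language-tagged Couchbase document from the Data Structure Transformation section and produces one Elasticsearch document per language, keyed as `<reviewId>_<language>` and formatted as a Bulk API payload. The index name `reviews`, the choice of shared fields, and the helper names are assumptions for illustration.

```python
import json
from collections.abc import Iterator
from typing import Any

# Assumed language-independent fields; every per-language document carries
# its own copy, which is the duplication listed among the drawbacks.
SHARED_FIELDS = ("originalComment", "culture")


def per_language_docs(review_id: str,
                      review: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (document id, body) pairs, one per entry in review["comments"]."""
    for entry in review["comments"]:
        lang = entry["language"]
        body = {field: review[field] for field in SHARED_FIELDS}
        body["language"] = lang
        body["comment"] = entry["comment"]
        body["tags"] = [{"value": t["value"], "score": t["score"]}
                        for t in review.get("tags", []) if t["language"] == lang]
        body["scores"] = next((s["value"] for s in review.get("score", [])
                               if s["language"] == lang), None)
        yield f"{review_id}_{lang}", body


def to_bulk_ndjson(index: str, review_id: str, review: dict[str, Any]) -> str:
    """Bulk API payload: an action line followed by a source line per document."""
    lines = []
    for doc_id, body in per_language_docs(review_id, review):
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(body, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the Bulk API requires a trailing newline


review = {
    "originalComment": "Harika kalın kumaşı var.",
    "culture": "tr-TR",
    "comments": [
        {"language": "tr", "comment": "Harika kalın kumaşı var."},
        {"language": "de", "comment": "Es hat einen tollen dicken Stoff."},
    ],
    "tags": [{"value": "dicken Stoff", "score": 3, "language": "de"}],
    "score": [{"value": 1.3, "language": "tr"}, {"value": 1, "language": "de"}],
}
payload = to_bulk_ndjson("reviews", "312312", review)
```

Adding a language here only means another entry in `comments`; the cost is that `originalComment` and `culture` are written once per language.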

Separate Index By Language

We could store reviews in a separate index for each language, such as review_tr, review_en, and review_de.

The indexes would look like this:

review_en index:
{
    ...,
    "originalComment": "It has great thick fabric.",
    "comment": "It has great thick fabric.",
    "culture": "en-EN",
    "tags": [
        {
            "score": 1.2,
            "value": "thick fabric"
        }
    ],
    "scores": 1.0,
    ...
}
review_de index:
{
    ...,
    "originalComment": "It has great thick fabric.",
    "comment": "Es hat einen tollen dicken Stoff.",
    "culture": "en-EN",
    "tags": [
        {
            "score": 1.2,
            "value": "dicken Stoff"
        }
    ],
    "scores": 1.0,
    ...
}

This solution offers several advantages:

Compact document sizes.
Better data structuring, with each translated review stored in its own language index.
Flexibility to serve different language indexes from separate Elasticsearch clusters, although this may complicate the code that reads the data.
No code changes for each new language, since languages are managed through configuration. This also ensures that each new index is created with an explicit mapping rather than relying on automatic type inference.

However, it also introduces some drawbacks:

Harder maintenance of an index per language, leading to higher operational costs.
More complex code as new languages are introduced.
Greater consumption of Elasticsearch resources, including data size, storage, and memory.
More complicated workflows, because business requirements, such as deciding whether a review needs translation or showing the original review in response to a client’s action, may require querying multiple Elasticsearch indexes.
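A minimal sketch of the configuration-driven setup, assuming a hypothetical `LANGUAGE_CONFIG` that maps each language code to one of Elasticsearch’s built-in language analyzers (`turkish`, `english`, `german`). Each index gets an explicit mapping, and a read that spans languages targets several indexes in a single request through a comma-separated index path. The field types and helper names are assumptions.

```python
# Hypothetical configuration: language code -> built-in Elasticsearch analyzer.
# Adding a language means adding an entry here rather than changing code.
LANGUAGE_CONFIG = {"tr": "turkish", "en": "english", "de": "german"}


def index_name(lang: str) -> str:
    return f"review_{lang}"


def index_definition(lang: str) -> dict:
    """Explicit mapping for one language index instead of type inference."""
    analyzer = LANGUAGE_CONFIG.get(lang, "standard")  # fallback for unlisted languages
    return {
        "mappings": {
            "properties": {
                "originalComment": {"type": "text"},
                "comment": {"type": "text", "analyzer": analyzer},
                "culture": {"type": "keyword"},
                "tags": {
                    "properties": {
                        "value": {"type": "keyword"},
                        "score": {"type": "float"},
                    }
                },
                "scores": {"type": "float"},
            }
        }
    }


def all_index_definitions() -> dict[str, dict]:
    """One index definition per configured language, keyed by index name."""
    return {index_name(lang): index_definition(lang) for lang in LANGUAGE_CONFIG}


def search_path(langs: list[str]) -> str:
    """Search path that targets several language indexes in one request."""
    return "/" + ",".join(index_name(lang) for lang in langs) + "/_search"
```

For example, `search_path(["tr", "en"])` yields `/review_tr,review_en/_search`, which is the kind of cross-index read that makes translation and original-content workflows more involved in this approach.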

Dynamic Mapping on One Index

We could keep all languages in the same document model but, instead of adding a new field for each language, store the values inside the related field under language keys.

One of Elasticsearch’s most important features is that it tries to get out of your way and lets you start exploring your data as quickly as possible. You don’t have to create an index and define a mapping and its fields before indexing a document: you can simply index the document, and the index and its fields are created automatically. This automatic detection and addition of new fields is called dynamic mapping, and its rules can be customized with dynamic field mapping and dynamic templates.

The document would look like this:

{
  ...,
  "comments": {
    "tr": "Harika kalın kumaşı var.",
    "en": "It has great thick fabric."
  },
  "culture": "tr-TR",
  "tags": {
    "en": [
      {
        "value": "good",
        "score": 1.2
      },
      {
        "value": "quality",
        "score": 1.5
      }
    ]
  },
  "scores": {
    "tr": 1.2,
    "en": 1.2
  },
  ...
}
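Dynamic templates are what keep this language-keyed model predictable. The sketch below builds an index mapping, as a Python dict, in which every `comments.<lang>` key becomes a `text` field, every `scores.<lang>` key becomes a `float`, and tag values become `keyword`. The per-language analyzers, the explicit `culture` mapping, and the template names are assumptions. Because Elasticsearch applies the first template that matches a field, the language-specific templates come before the catch-all; `matching_template` is only a rough local approximation of that rule (it checks `path_match` alone), not an Elasticsearch API.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical overrides for languages with a built-in analyzer; any other
# language falls through to the generic "comments_any" template.
LANGUAGE_ANALYZERS = {"tr": "turkish", "en": "english", "de": "german"}


def build_review_mapping() -> dict:
    """Mapping whose dynamic templates type every language key consistently."""
    # The first matching template wins, so specific templates go first.
    templates = [
        {f"comments_{lang}": {"path_match": f"comments.{lang}",
                              "mapping": {"type": "text", "analyzer": analyzer}}}
        for lang, analyzer in LANGUAGE_ANALYZERS.items()
    ]
    templates += [
        {"comments_any": {"path_match": "comments.*", "mapping": {"type": "text"}}},
        {"scores_any": {"path_match": "scores.*", "mapping": {"type": "float"}}},
        {"tag_values": {"path_match": "tags.*.value", "mapping": {"type": "keyword"}}},
        {"tag_scores": {"path_match": "tags.*.score", "mapping": {"type": "float"}}},
    ]
    return {
        "mappings": {
            "dynamic_templates": templates,
            "properties": {"culture": {"type": "keyword"}},
        }
    }


def matching_template(mapping: dict, path: str) -> Optional[str]:
    """Rough local approximation of first-match-wins, using path_match only."""
    for template in mapping["mappings"]["dynamic_templates"]:
        name, spec = next(iter(template.items()))
        if fnmatchcase(path, spec["path_match"]):
            return name
    return None


mapping = build_review_mapping()
print(matching_template(mapping, "comments.de"))  # comments_de
print(matching_template(mapping, "comments.fr"))  # comments_any
```

Mapping `scores.*` explicitly matters: with default dynamic mapping, a first German score of `1` would make `scores.de` a `long` field.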

This solution offers several advantages:

No code changes or deployments are needed when a new language is added.
The event contract does not change for new languages, since events for integrated domains are dynamic, which makes the system more flexible.
Queries are better organized and can be generated dynamically for different languages, tags, and scores, which makes data retrieval more efficient.
Elasticsearch and Couchbase models become more systematically organized, which improves data consistency and accessibility.
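As an example of generating queries per language, here is a sketch of a search body for one product’s reviews in a given language, optionally filtered by tag and sorted by that language’s score. It assumes a hypothetical `productId` field and that tag values are mapped as `keyword`; the function is illustrative, not our production query.

```python
from typing import Optional


def build_review_query(product_id: str, lang: str,
                       tag: Optional[str] = None, size: int = 20) -> dict:
    """Search body for one product's reviews in `lang`, highest score first."""
    filters = [
        {"term": {"productId": product_id}},        # hypothetical field
        {"exists": {"field": f"comments.{lang}"}},   # only reviews in this language
    ]
    if tag is not None:
        # Exact match; assumes tags.<lang>.value is mapped as keyword.
        filters.append({"term": {f"tags.{lang}.value": tag}})
    return {
        "size": size,
        "query": {"bool": {"filter": filters}},
        "sort": [{
            f"scores.{lang}": {
                "order": "desc",
                "missing": "_last",
                # Avoids a sort error while no document has this language yet.
                "unmapped_type": "float",
            }
        }],
    }


query = build_review_query("123", "de", tag="dicken Stoff")
```

The same function serves every language, so supporting a new one requires no new query code.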

However, it also introduces some drawbacks:

Larger documents, which may affect storage and performance.
Larger indexes due to the nested structures, which may affect search performance and resource utilization.

Choosing the Optimal Solution

After a thorough evaluation, we chose dynamic mapping within a single index. It offers greater flexibility, easier maintenance, and scalability without requiring significant code changes or complex deployments. It also fits our goal of efficient multilingual support, keeping our system robust and adaptable to future language additions.

Conclusion

The transition to a multilingual review and rating system reflects our commitment to expansion and customer inclusivity. By working through the challenges of storing and serving multilingual data, we have built a foundation for sustained growth and better user engagement across languages. Beyond reducing query complexity, disk usage, and response time, the dynamic mapping solution has also increased our scalability and throughput capacity. Specifically, it led to a 10% reduction in index size, a 10% decrease in response times, and a 30% increase in throughput capacity, making our system more efficient and helping us serve our customers more effectively.

While this solution appears optimal for now, we may need to re-evaluate it in the future as circumstances change. We’ll keep an eye on the data.

Elasticsearch also supports _source filtering, which lets you fetch only the fields you need instead of each document’s entire source. This can lead to performance gains.
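For example, with the language-keyed model, a client that only shows English content can request just the English fields. This sketch adds a `_source` includes list to an existing search body; the set of shared fields and the helper name are assumptions.

```python
from typing import Any

# Hypothetical language-independent fields the client always needs.
SHARED_SOURCE_FIELDS = ["culture", "originalComment"]


def with_language_source(body: dict[str, Any], lang: str) -> dict[str, Any]:
    """Copy of a search body that returns only shared fields and `lang`'s fields."""
    includes = [*SHARED_SOURCE_FIELDS,
                f"comments.{lang}", f"scores.{lang}", f"tags.{lang}.*"]
    # Build a new dict so the caller's body is left untouched.
    return {**body, "_source": {"includes": includes}}


request = with_language_source({"size": 20, "query": {"match_all": {}}}, "en")
```

Other languages stored in the same document are then never transferred to the client.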

Join Us

We’re building a team of the brightest minds in our industry. Interested in joining us? Visit the pages below to learn more about our open positions.

Trendyol - Backend Developer

We were founded in 2010 with a dynamic and agile start-up spirit. The trust of around 30 million customers and 250,000…

jobs.lever.co

Written by İbrahim Tuğrul