Cross-Lingual Transfer for ABSA

Maria Obedkova
TrustYou Engineering
Nov 17, 2022

Sentiment Analysis is nowadays one of the most popular NLP tasks in the industry and is widely used to build products for various domains. Aspect-Based Sentiment Analysis (ABSA) is a fine-grained version of the Sentiment Analysis task that is also often found in industrial applications dealing with customer review analysis, where you need to know not only the sentiment but also the topic being discussed. The task can go even beyond detecting aspects and sentiments and also try to detect targets, namely the associated phrases: this is Target Aspect-Based Sentiment Analysis (TABSA). For example, the TrustYou sentiment analysis system highlights spans of text in reviews associated with some aspect and polarity to present to users.

Cross-lingual transfer for ABSA. Drawn by Maria Obedkova
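
To make the task concrete, here is a hypothetical example of what a (T)ABSA system is expected to produce for a single review sentence. The aspect names and the output format are illustrative only, not TrustYou's actual schema:

```python
# Hypothetical (T)ABSA output for one review sentence: each opinion links an
# aspect category, a polarity, and the target span that expresses it.
review = "The room was spotless but the breakfast felt overpriced."

opinions = [
    {"aspect": "cleanliness", "polarity": "positive", "target": "The room was spotless"},
    {"aspect": "value",       "polarity": "negative", "target": "the breakfast felt overpriced"},
]
```

Plain ABSA would stop at the (aspect, polarity) pairs; TABSA additionally requires the target spans.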

Problem scoping

One of the most popular solutions for the Sentiment Analysis task nowadays is to use deep learning models, since they show great generalization power compared to old-school rule-based approaches and better performance than other statistical solutions.

Deep learning solutions typically require a lot of training data, and this data usually needs to be annotated. In most cases, the data gets annotated manually since it is the only reliable source of task-specific knowledge. ABSA is even more demanding since it needs more information to be annotated, let alone TABSA. Thus, training data for Sentiment Analysis cannot be found in abundance, especially if you need the analysis for a particular domain.

Sometimes industry solutions are required to scale to a bunch of different languages, which makes it even more difficult to obtain the required data. Data is not only difficult and slow to get but in the majority of cases also expensive. The data might exist for some languages but not for others, and annotating sufficient data for several languages at once might be nearly impossible.

This may sound a bit dramatic: to develop a meaningful solution, you need a lot of data that you don't have. However, there are various ways to tackle the lack of training data, ranging from experimenting with models and architectures to augmenting the data itself.

Nonetheless, if you

  • deal with the model-based solution for Sentiment Analysis
  • work in a multilingual domain
  • don’t have enough annotated data for some languages
  • don’t need outstanding performance for languages where you lack the data

then you might be interested in trying transfer learning for your problem at hand.

Cross-lingual transfer learning

Transfer learning became quite popular in Machine Learning with the rise of big pre-trained models. The main idea behind transfer learning is that you no longer need to train a model for a specific task from scratch. Instead, you can reuse a general-purpose model trained on a large dataset in order to benefit from its knowledge and adapt this model to the task you need. This is quite popular in the Natural Language Processing and Computer Vision domains, where there exist pre-trained models that possess language-specific and image-specific knowledge, respectively. If you deal with a model-based solution, transfer learning might be the best option considering it is SOTA to date, nicely documented, and widely implemented.

Cross-lingual transfer is a variation of transfer learning. Here the main task is to train a domain- and/or task-specific model and transfer its ability to perform some task in one language to another language. This area of NLP emerged in an attempt to battle the problem of limited data for some languages and the problem of low-resource languages in general.

If we approach model-based cross-lingual transfer in the (T)ABSA context, there is quite a lot to think about. Even if your solution involves multitask learning, you are generally trying to solve several different tasks: assigning the correct aspects, choosing the correct polarity, and optionally, detecting reasonable spans. This leaves you with several problems in the multilingual context: how to transfer

  • aspect assigning abilities
  • polarity classification abilities
  • span detection abilities

from one language to another.

ABSA example. Drawn by Maria Obedkova

This roughly means that

  • contexts describing similar topics should map to the same aspects
  • similarly polarized contexts should get the same sentiment
  • similar, meaningful, and syntactically acceptable phrases should be detected for similar contexts given similar polarities

with no regard to a language.

What makes it possible to use cross-lingual transfer here? For cross-lingual transfer, it is important for a target task to share some major similarities with a source task. All three tasks of ABSA share linguistic knowledge across different languages, like semantic representations and structural similarities. Many languages, in general, have a lot of syntactic, lexical, and semantic similarities. So, it is quite intuitive to use cross-lingual transfer, especially for closely related languages.

Popular methods of cross-lingual transfer

There are a bunch of ways to transfer model capabilities from one language to another, and there exists quite an elaborate body of research on this topic (see [1] for a nice overview). Here I mention the three main approaches you can follow when performing cross-lingual transfer:

  • transferring through embedding alignment
  • transferring through a multilingual model
  • transferring across monolingual models

Let’s briefly discuss each of them.

Transferring through embedding alignment

This approach suggests that transfer can be done by aligning monolingual embeddings (see [2] for an example). Given two sets of embeddings for two languages, a mathematical transformation can be derived that maps one set of embeddings onto the other. After that, the embeddings are expected to align semantically and syntactically.

The steps to follow are (the alignment step is sketched in code below):

  • learn monolingual embeddings for each language in question
  • align embeddings by mapping them onto the shared multilingual embedding space using a transformation function
  • fine-tune the model for a specific task using aligned embeddings of a high-resource language
  • zero-shot or few-shot transfer on a low-resource language using aligned embeddings as input
Embedding alignment approach (simplified). Drawn by Maria Obedkova
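
To make the alignment step more tangible, here is a minimal sketch of supervised alignment via orthogonal Procrustes, assuming you already have two monolingual embedding matrices and a small seed dictionary of translation pairs. Dedicated tools like vecmap [2] implement far more robust, even unsupervised, variants; the random matrices below are toy stand-ins for real embeddings.

```python
import numpy as np

def learn_alignment(src_pairs: np.ndarray, tgt_pairs: np.ndarray) -> np.ndarray:
    """Find the orthogonal matrix W minimising ||src_pairs @ W - tgt_pairs||_F."""
    # Closed-form orthogonal Procrustes solution via SVD of the cross-covariance.
    u, _, vt = np.linalg.svd(src_pairs.T @ tgt_pairs)
    return u @ vt

# Toy stand-ins for the embeddings of seed-dictionary word pairs:
# row i of src_pairs is a source-language word, row i of tgt_pairs its translation.
rng = np.random.default_rng(0)
dim, n_pairs = 300, 5000
src_pairs = rng.normal(size=(n_pairs, dim))
tgt_pairs = rng.normal(size=(n_pairs, dim))

W = learn_alignment(src_pairs, tgt_pairs)

# Map the whole source vocabulary into the shared (target) space with `src_vocab @ W`,
# fine-tune the ABSA model on the high-resource language using the aligned
# embeddings, then run it zero-/few-shot on the low-resource language.
```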

The alignment method showed decent results but was eventually outperformed by multilingual models like mBERT.

Transferring through a multilingual model

In this approach, we rely on the generalization power of models pre-trained on several languages at once [3]. A multilingual model generalizes across the languages it was trained on, acquiring something like interlingual abilities.

The procedure for this approach is (a code sketch follows below):

  • if you want to train your own multilingual model, create a joint subword vocabulary for the languages in question and join training data for those languages together
  • pre-train a multilingual model or use whatever is already available open-source like mBERT [4]
  • fine-tune the model for a specific task using high-resource language data
  • zero-shot or few-shot transfer on a low-resource language
Multilingual model approach (simplified). Drawn by Maria Obedkova
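
As an illustration, here is a minimal sketch of the fine-tune-then-zero-shot part of this recipe, using the Hugging Face transformers and datasets libraries and a sentence-level polarity classifier for simplicity. The toy texts and labels are made up; aspect and span heads would follow the same pattern.

```python
# Minimal sketch: fine-tune mBERT on high-resource (English) data, then
# evaluate zero-shot on another language. The tiny datasets are toy stand-ins.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-multilingual-cased"  # mBERT [4]
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

english_train = Dataset.from_dict({
    "text": ["The room was spotless.", "Breakfast felt overpriced."],
    "label": [2, 0],  # toy scheme: 0 = negative, 1 = neutral, 2 = positive
})
german_test = Dataset.from_dict({
    "text": ["Das Zimmer war makellos."],
    "label": [2],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="absa-mbert", num_train_epochs=1),
    train_dataset=english_train.map(tokenize, batched=True),
    eval_dataset=german_test.map(tokenize, batched=True),  # zero-shot evaluation
)
trainer.train()
print(trainer.evaluate())
```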

It may seem that training with more languages should help the model generalize better, but in fact it has been shown to hurt performance (if interested, look up the “curse of multilinguality”). In case you need to adjust to a lot of languages, you might consider the MAD-X approach [5], which uses adapters [6] to overcome this problem and generally boosts the performance per language.

However, there is an approach that doesn’t require joint vocabularies that blow up the training space. It has been shown that language-to-language transfer with monolingual models can achieve comparable performance.

Transferring across monolingual models

This approach is quite similar to the embedding alignment approach, but the alignment is done not externally but internally, within a model. The idea behind this approach is to realign the embeddings of a low-resource language through a model trained for a high-resource language, without involving any additional languages or data [7].

Here is the outlined method (the key freezing step is sketched in code below):

  • pre-train a monolingual model using high-resource language data
  • freeze parameters of a pre-trained model
  • relearn token embeddings for the low-resource language by performing MLM (Masked Language Model) training using a pre-trained model with frozen parameters
  • fine-tune the model for a specific task using high-resource language data by keeping token embeddings of a high-resource language frozen
  • zero-shot or few-shot transfer on a low-resource language
Monolingual model approach (simplified). Drawn by Maria Obedkova
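
The key trick is the frozen-body MLM step. Below is a rough sketch of how that freezing could look with Hugging Face transformers, assuming an English BERT checkpoint as the high-resource model. Note that [7] also learns a new subword vocabulary for the target language, which this sketch glosses over.

```python
# Rough sketch of the "relearn token embeddings" step from [7]: freeze the
# whole pre-trained model except the (tied) token embeddings, then run
# standard MLM training on target-language text.
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")  # high-resource model

# Freeze every parameter of the pre-trained model ...
for param in model.parameters():
    param.requires_grad = False

# ... then unfreeze only the input token embeddings (tied to the MLM output
# embeddings in BERT), so MLM training can only move the new language's
# embeddings into the space the frozen transformer body already expects.
for param in model.get_input_embeddings().parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # only the embedding matrix

# From here: run MLM training on target-language text (e.g. with Trainer and
# DataCollatorForLanguageModeling), fine-tune the task model on the
# high-resource language with its embeddings frozen, and finally swap in the
# relearned target-language embeddings for zero-/few-shot inference.
```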

Even though this approach has quite some advantages, especially in the case of closely related languages, it requires some training capacity since you need to train new token embeddings.

What to expect?

At TrustYou, we have experimented with cross-lingual transfer for (T)ABSA, and here is what you can expect and what you might want to pay attention to.

Generally, cross-lingual transfer works quite nicely for all three tasks of (T)ABSA but you should keep in mind some limitations. Let’s discuss them task by task.

Sentiment classification

The basic Sentiment Analysis task is the easiest to transfer compared to the other two tasks. If you manage to get decent performance for the base language, you will most probably achieve decent performance for detecting polarities in the transferred languages as well, with only a small amount of target-language data. Of course, you will still hit some edge cases, most probably due to tricky syntactic structures which are harder to transfer. As far as we know to date, word order is the main contributor to poor cross-lingual transfer.

Span detection

In the case of transferring span detection abilities, you will be able to get reasonable spans across different languages; however, syntax plays quite a role here. You won’t be able to detect spans that differ significantly in syntax from the source language when you transfer from a high-resource language to a low-resource one using a small amount of data for fine-tuning. Even though the obtained spans may be quite sensible in terms of semantics, they might be syntactically imperfect, especially when transferring across languages that are not closely related.

Aspect classification

When it comes to aspects, a lot depends on how you framed your DL solution: the more complex the task, the more problematic it is to get a base model with decent performance and, thus, the trickier it becomes to transfer. Multilabel classification is way harder than plain multiclass classification, and a high-cardinality label set is way harder to transfer than a low-cardinality one. It also depends on how distinct your aspects are from each other: if the aspects are quite ambiguous, not only is the transfer tough, but your base-language model can also have trouble performing well. These issues with the base model only escalate when performing the transfer.
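
To illustrate the framing point, here is a toy PyTorch sketch contrasting a multiclass aspect head (exactly one aspect per snippet) with a multilabel head (any subset of aspects per snippet); the dimensions, labels, and encoder outputs are made up:

```python
# Toy comparison of aspect-head framings on top of hypothetical encoder outputs.
import torch
import torch.nn as nn

hidden_size, n_aspects, batch = 768, 25, 4
features = torch.randn(batch, hidden_size)  # stand-in for sentence encodings

# Multiclass framing: exactly one aspect per snippet (softmax + cross-entropy).
multiclass_head = nn.Linear(hidden_size, n_aspects)
multiclass_targets = torch.tensor([3, 0, 7, 3])  # one aspect id per snippet
multiclass_loss = nn.CrossEntropyLoss()(multiclass_head(features), multiclass_targets)

# Multilabel framing: any subset of aspects per snippet (per-aspect sigmoid + BCE).
# Every one of the 25 aspects becomes its own binary decision, which is a much
# harder objective to learn well and, in turn, to transfer across languages.
multilabel_head = nn.Linear(hidden_size, n_aspects)
multilabel_targets = torch.zeros(batch, n_aspects)
multilabel_targets[0, [3, 7]] = 1.0  # the first snippet mentions two aspects
multilabel_loss = nn.BCEWithLogitsLoss()(multilabel_head(features), multilabel_targets)

print(multiclass_loss.item(), multilabel_loss.item())
```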

Which approach to choose?

Generally, the choice of approach is highly guided by the scenarios in which your solution will be used. The common scenario is that you can spend enough effort on developing an elaborate ABSA system for one main language, which normally happens to be English due to the available data and the ease of finding annotators. However, there are normally not enough resources (time, money, human capital) to develop an equally elaborate ABSA model for every required language.

You might prefer the monolingual approach when you have the resources to develop a set of good ABSA models per, let’s say, language family. When you are not in a position to do that, you might go with the multilingual model approach. When you don’t have the capacity for intensive training and a language you need is not represented in multilingual models, you might want to consider the embedding alignment approach.

Many things can guide your decision: the set of languages you need the solution for, the success criteria you have defined, how you will compare results across the evaluated options, what kind of resources you have access to, etc. It is always good to keep an eye on your ultimate goal, your baseline, and the maximum you can achieve given your limitations.

Even though Sentiment Analysis is usually perceived as a straightforward DL task, everything changes when you enter the realm of ABSA. There are inevitably more objectives to train for and more data to annotate. In a multilingual context, ABSA becomes even more challenging.

Good luck with your ABSA experiments!

References

[1] https://ruder.io/unsupervised-cross-lingual-learning/

[2] https://github.com/artetxem/vecmap

[3] https://huggingface.co/docs/transformers/multilingual

[4] https://github.com/google-research/bert/blob/master/multilingual.md

[5] https://arxiv.org/abs/2005.00052

[6] https://medium.com/dair-ai/adapters-a-compact-and-extensible-transfer-learning-method-for-nlp-6d18c2399f62

[7] https://arxiv.org/abs/1910.11856
