Your customers speak different languages. Now your application can too.
Many of our users do not speak English as a first language, and language support is one of the most commonly overlooked aspects of user experience. The good news is that making your applications multilingual makes them more accessible and can help drive engagement.
There are usually many points of interaction between your applications and the public audience: mobile and web apps, emails, marketing messages, websites and SMS messages, just to name a few. One of the reasons why it can be difficult to integrate multiple language support is that there are so many places where natural language strings are used or embedded, and coordinating all the conversions is hard.
In this post we’ll see how a serverless approach can use Amazon Translate to bring machine translation and automation to bear on the problem. The service offers cost-effective, pay-as-you-go translation between 25 languages, including a monthly 2-million-character Free Tier allowance that will be more than enough for our testing (see full pricing information here).
This tutorial shows how to connect an Amazon S3 bucket to Amazon Translate, so every time a new text object is added to the bucket it will be automatically translated into a number of different languages, with the translations stored in the same bucket.
A major benefit of using a serverless approach is scalability. If hundreds or thousands of objects are added to the bucket, the Lambda function will scale automatically to accommodate the load. So whether you want to use this code for just the occasional object, or integrate the process into a complex media storage environment, it scales without any developer intervention needed.
Another important benefit is that your application does not need to be serverless itself to use this solution — you can take advantage of a serverless translation process for server-based content management systems, since we are only creating additional language content to support your application.
(1) First, let’s create a new bucket for this tutorial. Sign in to your AWS console and navigate to S3. Click “Create bucket”, enter a unique name and accept all the default settings, clicking “Next” through the subsequent pages:
(2) Next, navigate to AWS Lambda from the Services drop-down at the top of the screen. Click “Create Function” on the Functions card:
(3) Enter a function name (like “s3-serverless-translation”) and select the Node.js 10.x runtime. Leave “Author from scratch” selected, then click “Create function”:
(4) On the Designer card, click “Add trigger” on the left and select S3 from the list of triggers:
(5) When the Configure triggers card appears, select the bucket you created in step 1. Choose “PUT” as the Event type, use “.txt” for the Suffix, then choose “Add”.
You now have a Lambda function that will be invoked every time an object with the suffix “.txt” is added to your bucket. At the moment it contains only the default stub, which returns a “Hello from Lambda!” response, so we will set up the code that performs the translation next.
How it works
The code for this Lambda function is available in this gist. Let’s review the code and see how the process works.
The target languages array (line 29) contains the list of languages we will translate into, and each translation is stored as a separate object in S3. You can add or remove any of the supported language codes here.
The handler (lines 32–48) is the main entry point. Lambda passes it an event with details of the new S3 object that caused the invocation. This event can contain multiple records, so the handler iterates through the records to process each one, calling the doTranslation function for each language in the array.
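To make that control flow concrete, here is a minimal sketch of the record-by-language loop. It is not the gist's code: the language list, the handleS3Event name and the processOne callback (a stand-in for doTranslation) are assumptions made for illustration.

```javascript
// Hypothetical language list; the array in the gist may differ.
const targetLanguages = ['fr', 'es', 'de'];

// Starts one job per (record, language) pair and waits for all of them.
// `processOne` stands in for the post's doTranslation function.
async function handleS3Event(event, processOne) {
  const jobs = [];
  for (const record of event.Records || []) {
    for (const language of targetLanguages) {
      jobs.push(processOne(record, language));
    }
  }
  return Promise.all(jobs);
}

// In Lambda this could be wired up roughly as:
// exports.handler = (event) => handleS3Event(event, doTranslation);
```

Running the jobs through Promise.all lets the translations for one upload proceed concurrently, and any single failure rejects the whole invocation so it shows up in the logs.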
The doTranslation function (lines 51–73) calls translateText and stores the result in the same S3 bucket. The translateText function (lines 76–98) is simply a wrapper for the Amazon Translate service. Finally, the getS3object (lines 107–117) and putS3object (lines 131–141) functions are helpers for getting objects from and putting objects into S3, as the names suggest.
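As a rough illustration of the translate-then-store step, the sketch below takes the Translate and S3 clients as parameters (with the AWS SDK for JavaScript v2 these would typically be new AWS.Translate() and new AWS.S3()). The helper names, the assumption that the source text is English, and the translations/<language>/ key layout are all hypothetical and may not match the gist.

```javascript
// Hypothetical key layout: translations/<language>/<original key>.
function buildTargetKey(key, language) {
  return `translations/${language}/${key}`;
}

// Request parameters for Amazon Translate's TranslateText operation.
function buildTranslateParams(text, language) {
  return {
    SourceLanguageCode: 'en', // assumption: the uploaded text is English
    TargetLanguageCode: language,
    Text: text
  };
}

// Translates the text and writes the result back to the same bucket.
// `clients.translate` and `clients.s3` are injected SDK v2 style clients.
async function translateAndStore(clients, job) {
  const { bucket, key, text, language } = job;
  const result = await clients.translate
    .translateText(buildTranslateParams(text, language))
    .promise();
  const targetKey = buildTargetKey(key, language);
  await clients.s3.putObject({
    Bucket: bucket,
    Key: targetKey,
    Body: result.TranslatedText,
    ContentType: 'text/plain; charset=utf-8'
  }).promise();
  return targetKey;
}
```

Passing the clients in keeps the function easy to exercise with stubs before it is connected to real AWS services.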
When the S3 event invokes the Lambda function, it does not pass the contents of the object to the function, only the key and metadata describing the object. If we need to interact with the S3 object, we must fetch it within the Lambda function using a function like getS3object above.
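Here is one way that lookup might be written, as a sketch rather than the gist's getS3object. The function names are made up for this example, and the S3 client is passed in (AWS SDK for JavaScript v2 style). One detail worth knowing: S3 URL-encodes object keys in event notifications and sends spaces as "+", so the key should be decoded before it is used.

```javascript
// Extracts the bucket name and decoded object key from one S3 event record.
function locateObject(record) {
  return {
    bucket: record.s3.bucket.name,
    // Keys arrive URL-encoded, with spaces represented as '+'.
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
  };
}

// Fetches the object and returns its body as UTF-8 text.
// `s3Client` is an injected SDK v2 style client (e.g. new AWS.S3()).
async function readObjectText(s3Client, record) {
  const { bucket, key } = locateObject(record);
  const data = await s3Client.getObject({ Bucket: bucket, Key: key }).promise();
  return data.Body.toString('utf-8');
}
```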
Configuring the Lambda function
(1) Copy the code from the gist to the clipboard. Back on the Lambda function page, scroll down to the Function code card and replace the existing stub code with the code from the clipboard:
(2) Scroll down to the Execution role card and click the “View the <function name> role” link to open the role in IAM. We need to add permissions to the role to allow the function to read and create objects in the bucket and access the Amazon Translate service:
(3) Click “Add inline policy”, select the JSON tab and paste the policy below (replacing your-bucket-name with the bucket name you created earlier). Click “Review policy”, provide a descriptive name (e.g. “s3-automatic-translation-policy”), then click “Create policy”.
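The policy document itself is not reproduced in this text. As a stand-in, the sketch below builds a minimal policy as a JavaScript object and prints it as JSON ready for the JSON tab. It assumes the function only needs to read and write objects in the one bucket and call the TranslateText operation, so the original policy may grant different permissions.

```javascript
// Replace with the name of the bucket created earlier.
const bucketName = 'your-bucket-name';

// Assumed minimal permissions: read and write objects in the bucket,
// and call Amazon Translate's TranslateText operation.
const policy = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Action: ['s3:GetObject', 's3:PutObject'],
      Resource: `arn:aws:s3:::${bucketName}/*`
    },
    {
      Effect: 'Allow',
      Action: 'translate:TranslateText',
      Resource: '*'
    }
  ]
};

// Prints the JSON document to paste into the IAM console.
console.log(JSON.stringify(policy, null, 2));
```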
You will then see the new policy as part of the IAM role:
At this point, you have a Lambda function that is configured to run when the PUT event is fired in the S3 bucket, and it has permission to use Amazon Translate.
Testing the function
(1) Create a text file using your favorite text editor, add a couple of sentences of text, and save the file locally on your machine:
(2) Back in your S3 bucket, click “Upload” and select the text file. After a few seconds, click the refresh icon and you will see a new folder has appeared called ‘translations’. Click this folder to see all the new translated objects:
(3) In the Lambda function page, click the “Monitoring” tab and then “View logs in CloudWatch”. Select the most recent log entry, and you will see the translations that were generated by Amazon Translate:
This function is now live and will be invoked every time a new object with a “.txt” suffix is PUT into the S3 bucket. You can change the list of translated languages by modifying the targetLanguage array in the Lambda function. Or you can disable it entirely by disabling the S3 trigger from the function console.
Avoiding infinite loops
This Lambda function is automatically invoked whenever an object is stored in S3 since it is associated with the object’s put event. But what happens if a function writes new objects back into the same S3 bucket? It’s possible to create the cloud equivalent of a while(true) infinite loop, since each new file will trigger another invocation of the Lambda function, which then creates another file, and so on.
Fortunately there are a few simple precautions to help avoid this problem:
- Store the new objects in a separate bucket. This is the sure-fire method to guarantee the new objects cannot trigger Lambda invocations from the source bucket. This isn’t always the desired configuration since it requires two buckets, but it is fail-safe.
- Test your Lambda function using the mock events in the Lambda console during development, rather than using the S3 integration. This ensures that you are controlling the invocation process until your code is ready for production.
- If you store the new objects in the same bucket, use an object prefix or suffix when configuring the trigger (e.g. “original_” for a prefix, or “.txt” for a suffix), and ensure your new objects use a different naming pattern to avoid triggering the event.
- Check the event object in your Lambda handler to ensure the incoming object is the expected type for processing. In this example, the code will ignore any object key containing the word “translations”, and exit immediately. This prevents the translated objects from starting a loop.
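The last precaution, combined with the mock-event approach, can be sketched as follows. The shouldProcess name and the hand-built event are assumptions for illustration, and the mock contains only the fields this check reads, not a complete S3 notification.

```javascript
// Returns true only for keys the function should translate.
function shouldProcess(key) {
  // Skip anything the function wrote itself, which prevents a loop.
  if (key.includes('translations')) return false;
  // Mirror the ".txt" suffix filter configured on the trigger.
  return key.endsWith('.txt');
}

// Hand-built mock event with one new upload and one translated output.
const mockEvent = {
  Records: [
    { s3: { bucket: { name: 'example-bucket' }, object: { key: 'hello.txt' } } },
    { s3: { bucket: { name: 'example-bucket' }, object: { key: 'translations/fr/hello.txt' } } }
  ]
};

const accepted = mockEvent.Records
  .map((record) => record.s3.object.key)
  .filter(shouldProcess);

console.log(accepted); // prints [ 'hello.txt' ]
```

Note that the translated object also ends in “.txt”, so the suffix filter on its own would not stop the loop here. The check on the key is what breaks it.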
Use-cases for automatic translation
Amazon Translate is a near real-time service, which makes language translation practical for many different types of application:
- Static resources for websites: many content management systems can reference multilingual web assets. Using this method to produce content translations, you can then create a publicly-readable bucket (or use CloudFront to distribute the contents of a private bucket), to enable your CMS to access assets in different languages.
- Mobile app development: for example, Android allows developers to handle localization in the strings.xml file in the resource folder. The Lambda function could parse this file and submit all the text strings found to Amazon Translate, saving the localized content for each language you want to support.
- Translation for DynamoDB: DynamoDB is used by applications as a data store for shopping carts, customer support logs and many other uses where there are dynamic natural language strings stored in attributes within items. This function could be modified to support DynamoDB streams and writing the translation back to items in a table.
- Live chat applications: for customer support or peer-to-peer conversations where parties are using different languages, Lambda can use Amazon Translate to allow each party to see the entire conversation in their native language.
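As one example of adapting the function, the DynamoDB case could start from something like the sketch below, which pulls translatable strings out of a DynamoDB Streams event. The function name and attribute names are hypothetical, and it assumes the stream is configured to include new item images.

```javascript
// Collects string attributes from inserted or modified items in a
// DynamoDB Streams event. `attributeNames` lists the source attributes.
function extractStrings(event, attributeNames) {
  const found = [];
  for (const record of event.Records || []) {
    if (record.eventName === 'REMOVE') continue; // deleted items have nothing to translate
    const image = record.dynamodb && record.dynamodb.NewImage;
    if (!image) continue; // stream not configured to include new images
    for (const name of attributeNames) {
      // Stream images use DynamoDB's typed format, e.g. { S: 'some text' }.
      if (image[name] && typeof image[name].S === 'string') {
        found.push({ keys: record.dynamodb.Keys, attribute: name, text: image[name].S });
      }
    }
  }
  return found;
}
```

The same loop risk applies here: writing a translation back to the item produces another stream record, so the function should read only the source attributes and write to differently named ones (for example a hypothetical message_fr alongside message).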
This post showed how to automatically translate the contents of objects PUT into an S3 bucket, using the Amazon Translate service. We created a Lambda function that is invoked by the PUT event in our bucket, but it could just as easily be invoked by a DynamoDB stream or API Gateway endpoint.
With so many mobile and web application users speaking many different languages, this serverless approach provides an easy, cost-effective way to improve communication and make it easier for customers to use your software, without having to maintain any complex infrastructure.
If this blog post was helpful, don’t forget to applaud to let me know. Have an AWS Serverless question? Tweet me @jbesw. Thanks for reading!