How to Easily Handle Emoji Unicode in Java

Udy Dhansingh
Published in The Startup
2 min read · Sep 28, 2020

Emoji Challenge

Building Telegram bots is fun. Choosing Java to create a bot that uses emojis put me in a tricky situation.

Java strings are UTF-16, so an emoji code point above U+FFFF must be converted to a surrogate sequence for Java code to process it correctly; otherwise the character will not render properly.

Java needs a surrogate pair for each such code point, which is a bit daunting to start with, and it is even crazier to keep the code in sync with the growing list of emojis. This is best automated, so that when things change, the code can be adapted easily.
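To make the surrogate pair idea concrete, here is a minimal sketch that turns a single code point into a Java String. It is not taken from the project's repository: the class name SurrogatePairDemo and the method fromCodePoint are made up for illustration, while Character.toChars is the standard JDK call.

```java
// Illustrative sketch only; SurrogatePairDemo and fromCodePoint are hypothetical names.
class SurrogatePairDemo {

    // Converts one Unicode code point into the Java String that represents it.
    static String fromCodePoint(int codePoint) {
        // Character.toChars returns one char for code points up to U+FFFF and
        // two chars (a high and a low surrogate) for code points above it.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String grinning = fromCodePoint(0x1F600);

        // Two UTF-16 code units, but a single code point.
        System.out.println(grinning.length());                              // 2
        System.out.println(grinning.codePointCount(0, grinning.length())); // 1

        // The two code units are the surrogate pair for U+1F600.
        System.out.printf("%04X %04X%n",
                (int) grinning.charAt(0), (int) grinning.charAt(1));        // D83D DE00
    }
}
```

Writing the literal "\uD83D\uDE00" by hand gives the same string, which is exactly the tedious part worth automating.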

As of this writing (Sep 27, 2020), there are 1816 code points. New ones are added over time and tracked in the unicode.org emoji list.

This article demonstrates a solution that applies the ETL (extract-transform-load) design pattern to generate partial Java code from an HTML page!

1. Analysis

Understand document structure

The unicode.org full emoji list page is rendered as shown below. I’m illustrating the elements required to create a predictable connection between the human-readable name and the emoji code point.

The page consists of an HTML table. The table is split into multiple sections (with table header elements). The table rows hold the relevant code point information, and the columns contain the specific values of interest.

Model a class to store row entries

Let’s create a Java POJO, UnicodePointEntry, to extract the web page content into a structured format. This class provides a method, toEmoji(), that converts the entry's code points into a displayable emoji string (with surrogate pairs where needed).
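The original class listing is not reproduced here, so the following is only a sketch of what such a POJO could look like. It assumes an entry holds a human-readable name plus one or more code points; the field and accessor names are my guesses, and only UnicodePointEntry and toEmoji() come from the text above.

```java
// Hypothetical sketch of a row entry; field and accessor names are assumptions.
final class UnicodePointEntry {
    private final String name;      // human-readable name, e.g. "grinning face"
    private final int[] codePoints; // one or more Unicode code points

    UnicodePointEntry(String name, int... codePoints) {
        this.name = name;
        this.codePoints = codePoints.clone(); // defensive copy
    }

    String getName() {
        return name;
    }

    int[] getCodePoints() {
        return codePoints.clone();
    }

    // Builds the displayable emoji. appendCodePoint writes a surrogate pair
    // for every code point above U+FFFF and a single char otherwise.
    String toEmoji() {
        StringBuilder sb = new StringBuilder();
        for (int codePoint : codePoints) {
            sb.appendCodePoint(codePoint);
        }
        return sb.toString();
    }

    @Override
    public String toString() {
        return name + " " + toEmoji();
    }
}
```

For example, new UnicodePointEntry("grinning face", 0x1F600).toEmoji() yields a two-char string holding the surrogate pair for U+1F600.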

2. Extraction

Let’s create an HTML downloader that processes the unicode.org page listing and converts the codes and human-readable names into valid UnicodePointEntry objects.

Note: The EmojiUnicodePointAndValueMaker class uses the JSoup library to process HTML content.
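The JSoup part is left out here, but the transformation of a single row can be sketched with the JDK alone. This sketch assumes the code cell holds text such as "U+1F636 U+200D U+1F32B U+FE0F" and the name cell holds text such as "flag: England"; both formats, as well as the names EmojiRowTransformer, parseCodePoints and toIdentifier, are assumptions made for illustration and not the project's actual code.

```java
import java.util.Locale;

// Hypothetical helper; assumes cell formats like "U+1F600" and "grinning face".
class EmojiRowTransformer {

    // Parses "U+1F636 U+200D" into {0x1F636, 0x200D}.
    static int[] parseCodePoints(String codeCell) {
        String[] tokens = codeCell.trim().split("\\s+");
        int[] codePoints = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Strip the leading "U+" and read the rest as hexadecimal.
            String hex = tokens[i].replaceFirst("^[Uu]\\+", "");
            codePoints[i] = Integer.parseInt(hex, 16);
        }
        return codePoints;
    }

    // Turns "flag: England" into "flag_england".
    static String toIdentifier(String name) {
        String id = name.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "_") // every other run of characters becomes "_"
                .replaceAll("^_+|_+$", "");    // drop leading and trailing underscores
        // A Java identifier cannot start with a digit, so prefix those with "_".
        return (!id.isEmpty() && Character.isDigit(id.charAt(0))) ? "_" + id : id;
    }
}
```

Names that differ only in punctuation would collapse to the same identifier with this approach, so a real generator would need an extra rule for those cases.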

3. Loading

Create a placeholder Emoji enum class that will help us represent each extracted Unicode value as an enum constant.

Note: An emoji may have a sequence of code points.
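A placeholder enum along those lines might look like the sketch below. The varargs constructor is my assumption about how a sequence of code points could be accepted; the two constants are borrowed from the generated values shown later in this article.

```java
// Sketch of a placeholder enum; the constructor shape is an assumption.
enum Emoji {
    grinning_face(0x1F600),
    face_with_spiral_eyes(0x1F635, 0x200D, 0x1F4AB);

    private final String value;

    // Varargs lets one constant carry a single code point or a whole sequence.
    Emoji(int... codePoints) {
        // String(int[], int, int) encodes code points as UTF-16,
        // producing surrogate pairs where required.
        this.value = new String(codePoints, 0, codePoints.length);
    }

    @Override
    public String toString() {
        return value;
    }
}
```

With this in place, "Hello " + Emoji.grinning_face builds a string that ends with the emoji, without any hand-written surrogate escapes.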

4. Bringing the extract-load-process together!

Now that the HTML page extraction code is ready, it can be put to use in a test class.

Note: These tests are based on JUnit.

5. Code

Code is available on GitHub.

To print emojis to the display, run
mvn -DenumCompatibleSyntax=false
To print generated enum values to the display, run
mvn -DenumCompatibleSyntax=true
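How the project reads that flag is not shown in this article, so the following is only a guess at the idea: a system property switches between printing the emoji itself and printing an enum-compatible line. EmojiPrinter and format are hypothetical names; Boolean.getBoolean is the standard JDK way to read a boolean system property.

```java
import java.util.StringJoiner;

// Hypothetical sketch of switching output style on a system property.
class EmojiPrinter {

    static String format(String identifier, int[] codePoints, boolean enumCompatibleSyntax) {
        if (enumCompatibleSyntax) {
            // Produces a line such as: grinning_face(0x1F600),
            StringJoiner args = new StringJoiner(", ", identifier + "(", "),");
            for (int codePoint : codePoints) {
                args.add(String.format("0x%X", codePoint));
            }
            return args.toString();
        }
        // Otherwise print the name followed by the rendered emoji.
        return identifier + " " + new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // True only when the JVM was started with -DenumCompatibleSyntax=true.
        boolean enumSyntax = Boolean.getBoolean("enumCompatibleSyntax");
        System.out.println(format("grinning_face", new int[] {0x1F600}, enumSyntax));
    }
}
```

The enum-compatible branch produces lines in the same shape as the generated values listed later in this article.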

6. References

How unicode works?
List of emoji charts
Surrogate Pair Calculator

7. Loading generated values into Emoji class

Here is a sample of the generated enum values, truncated for brevity.

grinning_face(0x1F600),
grinning_face_with_big_eyes(0x1F603),
grinning_face_with_smiling_eyes(0x1F604),
beaming_face_with_smiling_eyes(0x1F601),
grinning_squinting_face(0x1F606),
grinning_face_with_sweat(0x1F605),
face_in_clouds(0x1F636, 0x200D, 0x1F32B, 0xFE0F),
face_with_spiral_eyes(0x1F635, 0x200D, 0x1F4AB),
flag_england(0x1F3F4, 0xE0067, 0xE0062, 0xE0065, 0xE006E, 0xE0067, 0xE007F),
flag_scotland(0x1F3F4, 0xE0067, 0xE0062, 0xE0073, 0xE0063, 0xE0074, 0xE007F),
flag_wales(0x1F3F4, 0xE0067, 0xE0062, 0xE0077, 0xE006C, 0xE0073, 0xE007F)

You can copy and paste the generated enum constants into the Emoji class that was created earlier.

8. Full listing of Emojis
