MySQL storage of UTF-8 characters

While reviewing a recent pull request, I realized I didn’t fully understand how MySQL stores UTF-8 characters. It is especially confusing due to inconsistencies with the family of text column types (TINYTEXT, TEXT…) that we use for aggregate storage (such as JSON blobs) versus typical string (VARCHAR) fields for most of our Rails models.

After some extensive research, below is an overview of how UTF-8 is stored in MySQL across the various data types.

UTF-8 Overview

Later, Unicode was extended beyond 16 bits (code point U+10000 and beyond) to make room for some ancient languages. UTF-8 was extended to 4 bytes max. This is when MySQL added support for UTF-8, but to optimize storage they only supported the 3-byte form. Later, more emoji were added to Unicode beyond the ancient languages, and MySQL added a new character enocding utf8mb4 to support these.

In MySQL 5.x, sticking with the 3-byte form improved performance, but with the downside of limited emoji support. MySQL 8.0 apparently has major speed improvements for utf8mb4 and actually deprecates utf8(mb3).

MySQL Types

CHAR(n), VARCHAR(n)

TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT

Hobo Fields Migrations for Text

MYSQL_BYTES_PER_UTF8_CHARACTER = 3MYSQL_TINY_TEXT_UTF8_LIMIT   = 0x0000_00FF / MYSQL_BYTES_PER_UTF8_CHARACTER #            85 characters
MYSQL_TEXT_UTF8_LIMIT        = 0x0000_FFFF / MYSQL_BYTES_PER_UTF8_CHARACTER #        21,845 characters
MYSQL_MEDIUM_TEXT_UTF8_LIMIT = 0x00FF_FFFF / MYSQL_BYTES_PER_UTF8_CHARACTER #     5,592,405 characters
MYSQL_LONG_TEXT_UTF8_LIMIT   = 0xFFFF_FFFF / MYSQL_BYTES_PER_UTF8_CHARACTER # 1,431,655,765 charactersFootnote: MySQL vs Ruby MethodsNote that Ruby and MySQL took opposite approaches to mapping the concept of "length" to characters vs bytes.  See length vs. LENGTH in the table below:RubyMySQLcharacter.size
.lengthCHAR_LENGTH()byte.bytesizeLENGTH()

Invoca Engineering Blog

Invoca is a SaaS company helping marketers optimize for the most important step in the customer journey: the phone call.

Engineering Team

Written by

Invoca Engineering Blog

Invoca is a SaaS company helping marketers optimize for the most important step in the customer journey: the phone call.