In the past weeks at work, we faced a problem in our application. A user tried to express himself with an emoji in the description field. We were not expecting it then you can imagine what happened. 💥 in production 😱.
We received an alert in the team’s chat.
Going through the error details, we can see that MySQL raised an error.
And looking at the request, we can see the emoji in the body attribute:
Why did this happen? Is not Emoji a unicode character supported by the UTF8?
Yes, it is. But some of them uses 4-bytes to store their data, and if we look at the UTF8 charset support at MySQL’s oficial doc, we can see that it can only accept 3-bytes.
- A maximum of three bytes per multibyte character.
It’s a different approach from Postgres’ UTF8. In Postgres charset table we see that UTF8 can store up to 4-bytes, meaning that Postgres already accepts Emojis by default.
How to store Emojis on MySQL database and avoid the Incorrect String value error?
I created a simple scaffold application for demonstration. Let’s use the model
Comment with 2 properties (body and name).
If we try to save the comment with an emoji on the body, it will raise an exception.
Let’s generate a migration to convert this table and the columns to utf8mb4
$ bin/rails g migration change_comments_to_utf8mb4
and add the following content to the migration:
lass ChangeCommentsToUtf8mb4 < ActiveRecord::Migration
execute "ALTER TABLE comments CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin"
execute "ALTER TABLE comments MODIFY name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin" execute "ALTER TABLE comments MODIFY body TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin"
execute "ALTER TABLE comments CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin"
execute "ALTER TABLE comments MODIFY name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin"
execute "ALTER TABLE comments MODIFY body TEXT CHARACTER SET utf8 COLLATE utf8_bin"
See that we set the column
VARCHAR(191). This is because MySQL's max key length for index on InnoDB engine is 767 bytes. With
utf8 (3-bytes), we can store VARCHAR with a maximum of 255 chars (255 chars * 3 bytes = 765 bytes), but with
utf8mb4 we can store the maximum of 191 chars (191 chars * 4 bytes = 764 bytes). If you want to store more bytes on the index, please look at InnoDB large prefix.
Run the migration:
$ bin/rake db:migrate
We need to change the
database.yml to set the encoding to utf8mb4. So, open the
config/database.yml and change the line with
Restart the server and now we can save emoji in our comment 😎.
If you are creating a new project, I highly recommend to start with
utf8mb4 to avoid these issues in the future and eliminate the necessity of a migration for all tables or simply use Postgres instead ♥️
Originally published at carlosribeiro.me on March 7, 2016.