Understanding Composition Browser Events

What’s an IME, and why do I care?

Our Square sellers need a consolidated view of their whole business — that’s where Dashboard comes in. Dashboard is a large frontend Ember application that enables sellers to run their business effectively through analytics, reporting, and various services. We’re constantly adding new features to Dashboard to help our sellers do even more with the Square platform.

Recently, my team has been working on one particular feature for searching across various products in Dashboard. As the user types in a search box, a brief list of search suggestions shows underneath the input, and if they hit the enter key, a new tab opens that shows all results. Seems simple enough, right?

Unfortunately, when I began testing this search box in languages other than English, I noticed some pretty undesirable behavior: as I was carefully crafting my search query, suddenly a new tab would open out of nowhere! I decided to investigate, and I ended up learning some really interesting details about browsers and text composition in other languages. But first, let’s back up and talk about what an IME is.

An input method editor, or IME, is an operating system-level program that converts characters from one language or character set to another language or character set. For example, let’s say that I wanted to type the character 桜 (Japanese for “cherry blossom”). I don’t have that character on my keyboard, but if I enable an IME on my OS, I can set my input source as Japanese and type the Romanization of the phrase: “sakura”. Let’s see what happens!

As I begin typing, an underline appears under the text to show that the IME is active. The English characters get automatically converted into various Japanese character sets as my OS tries to smartly decide what the best representation would be. (Many languages like Japanese have hundreds of homophones, and thus for one pronunciation, there may be many viable conversions.) At any point in this process, I can hit the spacebar or up/down arrow keys on my keyboard to open a dialog box which lets me pick which set of characters matches the meaning I’m trying to convey.

From this point, hitting enter will lightly confirm my selection, letting me use the left and right arrow keys to move to another section of my sentence to convert next. For my simple example, I don’t have any other sections to convert, so I can hit enter a second time to commit my decision. Now, the character 桜 is placed in the input, and the IME disappears. 桜 acts just like any typical text, so it can be deleted, copied/pasted, and so forth. Character conversion via an IME is one of the primary ways that users from various countries are able to type in their native languages with ease.

Okay, great! Now we know how an IME works. So what was the bug?

Notice how much keyboard input was involved in interacting with the IME: we can use the arrow keys, spacebar, enter, and more. At first, I naively believed that, since the IME is an OS program, any typing or editing would exist outside the browser, and thus the browser would be oblivious to the fact that I’m using an IME. But this isn’t the case! Both the OS and the browser respond to each keyup/keydown made while using an IME! So, when the user hit the enter key to confirm one of their IME selections, our code would immediately process a keydown event, opening a new tab even though the user may not have been done with their search query. Not good! With the code in this state, we would be making it very cumbersome for many Square sellers to use our search box in their native language.

(Important note: This bug isn’t limited to just the enter key! If you have an input listening in on any events that coincide with how an IME is controlled, you may have a similar bug in your code.)

So how can we fix it? It took me a bit of research, but I eventually learned about two browser events that I’d never encountered before: compositionStart and compositionEnd (the underlying DOM event names are all lowercase: compositionstart and compositionend). When a user starts typing with an IME, modern browsers fire the compositionStart event, and when the text is finally confirmed, compositionEnd will fire (with one exception; see below). This is exactly what we need! Now we can set some state in the application about whether or not the user is currently composing some text via an IME, and if they are, we won’t use any of our own keyboard event logic. The code looks like this:
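A minimal sketch of the idea in plain JavaScript, not the actual Ember component from Dashboard. The class name SearchInputHandlers and the onSubmit callback are assumptions made for illustration; only the isComposing flag and the handler names come from the description above.

```javascript
// Hypothetical, framework-free stand-in for the search input's event logic.
class SearchInputHandlers {
  constructor(onSubmit) {
    this.isComposing = false; // true while an IME composition is in progress
    this.onSubmit = onSubmit; // assumed callback, e.g. opens results in a new tab
  }

  // Meant to be wired to the input's `compositionstart` event.
  handleCompositionStart() {
    this.isComposing = true;
  }

  // Meant to be wired to the input's `compositionend` event.
  handleCompositionEnd() {
    this.isComposing = false;
  }

  // Meant to be wired to the input's `keydown` event.
  keyDown(event) {
    if (this.isComposing) {
      return; // this keystroke belongs to the IME, so skip our own logic
    }
    if (event.key === "Enter") {
      this.onSubmit(event.target.value);
    }
  }
}
```

In a browser, the three methods would be attached to the search input with addEventListener (or the framework's equivalent), so that an enter press only reaches onSubmit when no composition is in progress.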

Hooray! This works perfectly in the browsers we support here at Square, with one exception. The issue occurs when the user wants to finish their composition; in this state, isComposing is true. First, let’s take a look at the chain of events in the happy case.

  • User presses the enter key to end the composition.
  • Browsers fire keyDown, running our code in keyDown(). isComposing is true, so we exit early.
  • Browsers fire compositionEnd, running our code in handleCompositionEnd(). isComposing gets set to false.
  • User hits enter again.
  • Browsers fire keyDown once more, but since isComposing is now false, we run our custom code.

This is what the user should expect; only after finishing the composition should subsequent enter presses open their search query in a new tab. However, in my testing, Safari appears to fire the keyDown and compositionEnd events in the opposite order. Here’s the logic path:

  • User presses the enter key to end the composition.
  • Safari fires compositionEnd, running our code in handleCompositionEnd(). isComposing gets set to false.
  • Safari fires keyDown, running our code in keyDown(). isComposing is false, so we run our custom code.
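To make the difference between the two orderings concrete, here is a small simulation. It does not touch a real browser; it simply replays the event names from the two lists above against the same flag logic and counts how often the custom enter handling would run. The function name countSubmits is made up for this sketch.

```javascript
// Replays a sequence of event names for the single enter press that ends a
// composition, and counts how many times our custom code would run.
function countSubmits(eventOrder) {
  let isComposing = true; // the user is mid-composition when enter is pressed
  let submits = 0;
  for (const name of eventOrder) {
    if (name === "compositionend") {
      isComposing = false;
    } else if (name === "keydown" && !isComposing) {
      submits += 1; // custom code runs (a new tab would open)
    }
  }
  return submits;
}

const happyOrder = ["keydown", "compositionend"];
const safariOrder = ["compositionend", "keydown"];

console.log(countSubmits(happyOrder));  // 0
console.log(countSubmits(safariOrder)); // 1
```

The same enter press produces no submit in the first ordering and one unwanted submit in the second, which is the behavior described for Safari.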

In other words, the enter press that finished their composition also counted as a press to open their query in a new tab, which is a poor experience. To get around this, I wrapped the line in handleCompositionEnd() in an Ember queue, so it would always run after the keyDown event:
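A rough sketch of that deferral in plain JavaScript. The original Ember queue call is not reproduced here; as an assumption, a setTimeout of 0 stands in for it, and the scheduling function is injectable so the behavior can be checked without a browser. The class name DeferredCompositionState and the shouldHandleKey method are hypothetical.

```javascript
// Hypothetical sketch: the reset of isComposing is deferred so that a keydown
// fired immediately after compositionend still sees isComposing === true.
class DeferredCompositionState {
  // `defer` is an assumed scheduling hook. The default uses setTimeout as a
  // stand-in for the Ember queue mentioned above.
  constructor(defer = (fn) => setTimeout(fn, 0)) {
    this.isComposing = false;
    this.defer = defer;
  }

  handleCompositionStart() {
    this.isComposing = true;
  }

  handleCompositionEnd() {
    // Do not flip the flag right away; schedule it for later.
    this.defer(() => {
      this.isComposing = false;
    });
  }

  // keyDown handlers would bail out early whenever this returns false.
  shouldHandleKey() {
    return !this.isComposing;
  }
}
```

With this in place, both orderings behave the same way: the enter press that ends the composition is ignored, and only a later enter press reaches the custom code.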

At long last, our input successfully handles text input from multiple sources! This was a really cool bug that took me quite a while to track down, but once I understood it, there was a really elegant solution. Some personal takeaways:

  • It’s critical to think about users who are very different from you. This applies to both internationalization (different input sources, content shifting sizes when translated, right-to-left languages) and accessibility (screen readers for blind users, keyboard-only access). If you only test the way you would use your software, you’re probably missing a significant portion of your user base!
  • If you end up writing a bunch of really complicated logic that feels hacky, it might be best to take a step back and reevaluate your options. When I started this, I didn’t know about the composition events and tried some really ridiculous things to detect if a user was typing via an IME. That pain forced me to do more research, and I eventually landed on a clean solution.
  • There are some hidden gems in the MDN docs. 😉 Also, this site is invaluable for figuring out cross-browser oddities.

Thanks for reading! If this was interesting to you, take a look at some job openings here at Square — we’re always looking for talented engineers to join our team!
