Data Mining Typeracer (Part 3)

Jarry Xiao
8 min read · May 18, 2020


This is (likely) the final installment in my series on Typeracer. If this is your first time here, feel free to check out Part 1 and Part 2 to get up to speed.

My goal with this series of articles was to demonstrate the challenges and rewards associated with each stage of data analysis. In Part 1, I talked about discovering a promising dataset, and I described the complex process of parsing and collecting that data. In Part 2, I explained how I stored the data, and I walked through the analysis and visualizations I generated during my exploration. Here were some of my key takeaways:

  • Most typists improve primarily by making faster key transitions; going for increased accuracy appears to have marginal returns.
  • Character transitions that include the Shift key are substantially slower than transitions that do not.
  • The slowest character transitions are almost always made by digits on the same hand (and they usually involve the same digit). The candidates that don’t use the same digit are usually incredibly awkward movements that span multiple rows (e.g. c → t or e → x on a QWERTY layout).

In this installment, I will use the latency data from Typeracer to model the efficiency of different keyboard layouts. If a user has the same level of familiarity with both QWERTY and Dvorak, would he or she achieve a faster typing speed using Dvorak?

Recall the heat maps generated from character frequencies in relation to both QWERTY and Dvorak.

Figure 1: QWERTY heat map (left), Dvorak heat map (right)

The positioning of frequently occurring characters on the home row in the Dvorak layout suggests a more ergonomic and efficient typing experience (at least for English text). I will attempt to quantitatively analyze these differences.

Model

My idea for modeling the typing speed of a QWERTY user on Dvorak was relatively simple. For a specific user, we have the average latencies between keystrokes along with metadata describing the physical location of each character on the QWERTY layout (row, digit, and column information). In theory, we can simply remap those results to Dvorak (or any other layout).

A problem arises due to major differences in frequency between the character transitions corresponding to the same physical key positions on each layout. For example, on QWERTY, k → j is a very uncommon key transition to make (unless you’re a Vim user). To prevent confusion, I’ll refer to this transition as Q[k → j]. Q[k → j] corresponds to D[t → h] (Dvorak notation), which happens to be one of the most commonly occurring bigrams in the English language.

In the below diagrams, notice how the character bigrams highlighted on the Dvorak layouts occur much more frequently in the English language than those highlighted on QWERTY. Transitions are represented by blue → orange:

Figure 2: ; → s on QWERTY and s → o on Dvorak
Figure 3: k → j on QWERTY and t → h on Dvorak
Figure 4: , → d on QWERTY and w → e on Dvorak

This becomes problematic because we might not have enough data for certain key transitions from QWERTY to make an informed corresponding latency estimate in Dvorak.

To deal with this potential sparsity in the dataset, I made a couple assumptions about the data based on my perception of how most people type. It goes without saying that all of these assumptions should be carefully considered when interpreting the results of the analysis.

Recall that I previously set up indices for the keyboard rows:

Figure 5: Keyboard row indices

I also added in column information in order to precisely identify the location of each key on the keyboard:

Figure 6: Keyboard column indices

Using this information, each physical key can be given an equivalent representation by a tuple [row, column], giving us the ability to directly compare different keyboard layouts.
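As a rough sketch of this idea, the three letter rows of each layout can be encoded as char → (row, column) dictionaries, and matching physical positions then yields a character-level mapping between layouts. The indexing below (home row = 0, top letter row = 1, bottom row = -1, columns 1-based) is my assumption, chosen to match the [1, 8]-style coordinates used in the article; the figures define the authoritative indices.

```python
# Encode each layout as char -> (row, column); the row/column convention here
# is an assumption chosen to match the article's [row, column] examples.
QWERTY = {c: (1, i + 1) for i, c in enumerate("qwertyuiop")}
QWERTY.update({c: (0, i + 1) for i, c in enumerate("asdfghjkl;")})
QWERTY.update({c: (-1, i + 1) for i, c in enumerate("zxcvbnm,./")})

DVORAK = {c: (1, i + 1) for i, c in enumerate("',.pyfgcrl")}
DVORAK.update({c: (0, i + 1) for i, c in enumerate("aoeuidhtns")})
DVORAK.update({c: (-1, i + 1) for i, c in enumerate(";qjkxbmwvz")})

# Map each Dvorak character to the QWERTY character on the same physical key.
POS_TO_QWERTY = {pos: c for c, pos in QWERTY.items()}
dvorak_to_qwerty = {c: POS_TO_QWERTY[pos] for c, pos in DVORAK.items()}

# The Dvorak bigram t -> h sits on the same keys as QWERTY k -> j.
print(dvorak_to_qwerty["t"], dvorak_to_qwerty["h"])  # k j
```

This recovers the Q[k → j] ≡ D[t → h] correspondence described above, and the same position-matching trick works for any pair of layouts.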

Features

There are a lot of different ways to group the data, so the notation used below will vary quite a bit. For each QWERTY character transition, I assigned it a score based on the following features:

  • The mean latency observed for the exact transition, m, and the number of observed transitions, c.
  • The mean latency observed from the reversed transition, m_rev, and the number of those transitions, c_rev. For example, the transition Q[i → t] (or [1, 8] → [1, 5] in [row, column] notation) would remap to Q[t → i], or equivalently [1, 5] → [1, 8]. Due to symmetry, it’s reasonable to expect the reverse latency to be similar to the forward latency.
  • The mean latency observed from “neighboring” transitions, m_n, and the number of those transitions, c_n. For a transition from [r1, c1] → [r2, c2], I also include data from [r1, c1 + 1] → [r2, c2], [r1, c1 - 1] → [r2, c2], [r1, c1] → [r2, c2 + 1], and [r1, c1] → [r2, c2 - 1].
Figure 7: For the transition Q[d → o] ([0, 3] → [1, 9]), these are the transitions included in the “neighbor group”
  • The mean latency observed from transitions with the same “shape” m_sh and the corresponding counts c_sh. For a transition [h_prev (hand), r_prev, c_prev] → [h, r, c], I include all other transitions [h_prev, r1, c1] → [h, r2, c2] where r2 - r1 = r - r_prev, and c2 - c1 = c - c_prev. Here is a diagram to illustrate this feature:
Figure 8: Each pair of keys with the same color would have the same “shape”
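To make the reverse, neighbor, and shape features concrete, here is a small sketch (positions follow the [row, column] notation above; the hand bookkeeping for the shape feature is omitted for brevity, and the helper names are mine):

```python
# A transition is a pair of (row, column) key positions.

def reverse(t):
    # The reversed transition: swap the endpoints.
    a, b = t
    return (b, a)

def neighbors(t):
    # Shift one endpoint left/right by one column, keeping the other fixed.
    (r1, c1), (r2, c2) = t
    return [((r1, c1 + d), (r2, c2)) for d in (-1, 1)] + \
           [((r1, c1), (r2, c2 + d)) for d in (-1, 1)]

def same_shape(t, other):
    # Two transitions share a "shape" when their row and column
    # displacements match.
    (r1, c1), (r2, c2) = t
    (r3, c3), (r4, c4) = other
    return (r2 - r1, c2 - c1) == (r4 - r3, c4 - c3)

d_to_o = ((0, 3), (1, 9))                     # Q[d -> o]
print(reverse(d_to_o))                        # ((1, 9), (0, 3))
print(len(neighbors(d_to_o)))                 # 4
print(same_shape(d_to_o, ((0, 4), (1, 10))))  # True: same displacement
```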

I use the following equation for scoring each transition (each f corresponds to a feature defined above):
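The scoring equation appeared as an image in the original post and did not survive extraction. Based on the feature definitions above, a natural reconstruction (my assumption; the author may also have weighted the features unequally) is a count-weighted average over the features f:

```latex
\text{score} = \frac{\sum_{f} c_f \, m_f}{\sum_{f} c_f}
```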

All this really does is associate more data with each transition by making a few assumptions about typing habits, which mitigates the previously discussed problem of sparsity.

We can now write a fairly straightforward algorithm to compare the different keyboard layouts.

Algorithm

Given data from source layout S,

  1. Calculate m_f and c_f for each transition S[ch_prev → ch] and each feature f.
  2. Join these aggregated values by the row and column indices to individual character transitions in S, and compute the final score for each transition using the scoring equation above.

Repeat Step 2 for transitions in the target layout T and compare the scores between S and T.
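The steps above can be sketched in plain Python. This is a simplified stand-in for the full pipeline (the function, variable names, and toy data are all mine, and the per-feature pooling of Step 1 is collapsed into a single latency list per physical transition): it weights each transition's mean latency by how often the corresponding character bigram occurs under a given layout.

```python
from statistics import mean

def score_layout(bigram_counts, layout, latencies):
    """Frequency-weighted mean latency for a layout.

    bigram_counts: (char, char) -> occurrence count in the corpus
    layout:        char -> (row, column) physical key position
    latencies:     ((row, col), (row, col)) -> list of observed latencies (ms)
    """
    total, weight = 0.0, 0
    for (a, b), n in bigram_counts.items():
        if a not in layout or b not in layout:
            continue
        obs = latencies.get((layout[a], layout[b]))
        if obs:
            total += n * mean(obs)
            weight += n
    return total / weight if weight else float("nan")

# Toy data: the t -> h bigram under each layout's physical key positions.
qwerty = {"t": (1, 5), "h": (0, 6)}
dvorak = {"t": (0, 8), "h": (0, 7)}
latencies = {((0, 8), (0, 7)): [90, 110],  # observed on the k -> j keys
             ((1, 5), (0, 6)): [150]}      # observed on the t -> h keys
bigrams = {("t", "h"): 100}

print(score_layout(bigrams, dvorak, latencies))  # 100.0
print(score_layout(bigrams, qwerty, latencies))  # 150.0
```

A lower score means the layout places frequent bigrams on faster physical transitions; comparing the two numbers for the same user's latency data is the layout comparison described in the algorithm.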

Results

For all of the following results, I only looked at non-shifted character transitions. I also ignored all transitions that included the Space key.

score_qw and score_dv are the QWERTY and Dvorak scores respectively (computed by the above algorithm). mean_qw and mean_dv are the expected average latencies for each layout (see if you can guess which user I am). actual is the unadjusted total latency recorded for each user. Lastly, ratio_dv is score_dv / score_qw.

Figure 9: Result table for QWERTY users

All of the QWERTY users were able to achieve a lower score when remapped to Dvorak. If these scores perfectly corresponded to latency, this would result in an average latency reduction of 3.4%.

I then repeated this analysis for Dvorak users as I wanted to confirm that this algorithm showed similar results going from Dvorak → QWERTY.

The schema is basically the same as above.

Figure 10: Result table for Dvorak users

In this case it appears that the Dvorak users would be worse off or experience little to no benefit from switching over to the QWERTY layout. Their projected scores increased by 1.5% on average.

The last thing I wanted to look at before concluding was the estimated performance of the Colemak layout.

Figure 11: Colemak heat map

This layout has a lot of similarities to QWERTY, but like Dvorak, many of the most commonly used keys are found on the home row. I repeated the above analysis with this new layout and got the following results.

Figure 12: Results for QWERTY users (including Colemak)
Figure 13: Results for Dvorak users (including Colemak)

For QWERTY users, the improvement from switching over to Colemak was 1.2%. We also observed that the Dvorak score was lower than the Colemak score for every user.

For Dvorak users, the average score went up by 1.2%, indicating that users would likely not benefit from switching.

It’s difficult to reach definitive conclusions from this analysis, but the data points to the Dvorak layout being more efficient than QWERTY from a transition latency perspective. This is likely due to the positioning of the vowels and most frequently used consonants on the home row. A similar argument can be made in favor of Colemak.

An important detail to consider is that these results don’t account for the underlying bigram frequencies found in the corpus (Typeracer in our case). Common bigrams in the corpus will likely have faster average latencies in the source layout because users type those transitions more frequently and, as a result, build familiarity and muscle memory (factors that are extremely difficult to include in the model). When remapping layouts, this bias is not captured, so the frequency-weighted average latency will almost always favor the source layout over the target. The previously mentioned Q[k → j] vs. D[t → h] scenario is a prime example of how this bias might arise.
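The frequency-weighted average mentioned here also appeared as an equation image in the original post. A plausible reconstruction (mine, with notation not taken from the article) is a bigram-frequency-weighted mean latency:

```latex
\bar{L} = \sum_{b \in \text{bigrams}} p_b \, \ell_b
```

where p_b is the relative frequency of bigram b in the corpus and ℓ_b is its mean observed latency.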

My guess is that as a user becomes more accustomed to a particular layout, common bigrams will see faster latencies: D[t → h] ([0, 8] → [0, 7]) would be in the same ballpark as, if not faster than, Q[t → h] ([1, 5] → [0, 6]).

Conclusion

I’ve really enjoyed working on this series of posts. If I were to revisit this topic in the future, I would likely try to design a more “optimal” keyboard layout. This is a challenging problem that will require lots of planning and experimentation. I would also need to develop a more sophisticated toolkit for generating custom keyboard images and visualizations.

You can find all the code (scrapers, database schema, and analytics) at https://github.com/jarry-xiao/typescraper.


Jarry Xiao

data {engineer, scientist} | uc berkeley eecs class of 2018 | passionate about building systems to enable quantitative decision making