How to interpret reading level scores

Flesch-Kincaid and other reading level metrics are sometimes used to compare the arguments politicians make in their speeches, interviews, and writings. What are these metrics, and what do they actually tell us about these verbal performances?

Flesch-Kincaid looks at sentence length (words per sentence) and word length (syllables per word). Texts are considered “harder” when they have longer sentences and longer words, and “easier” when they have shorter sentences and shorter words. For decades, Flesch-Kincaid and other reading level metrics have been built into word processors. When a grammar checker warns you that the reading level of your article is too high, that warning is most likely based on word and sentence length.
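For reference, the widely published Flesch-Kincaid grade-level formula is 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) - 15.59. The Python sketch below implements it under one simplifying assumption: syllables are estimated by counting vowel groups. That heuristic (the helper `count_syllables`) is our own illustrative choice, not what any particular grammar checker does, so its scores may differ somewhat from those reported by real tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels (treating 'y' as a vowel) and drop
    a trailing silent 'e'. Real tools use dictionaries or better rules."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # e.g. "fire" -> 1 syllable, but "table" keeps 2
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: longer sentences and more syllables per
    word push the grade up; nothing here looks at meaning."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    for s in ("He was on fire.",
              "He was so angry that he felt as hot as a fire inside."):
        print(f"{flesch_kincaid_grade(s):6.2f}  {s}")
```

Run on the two sentences about anger discussed later in this article, the sketch gives roughly grade -2.2 for “He was on fire.” and roughly grade 3.1 for the longer paraphrase. In other words, it rates the short sentence as easier no matter how much context that sentence demands from the reader.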

Other reading level indicators, such as Lexile measures, factor in how common words are. Texts are considered easier when the words they contain are more common, and more difficult when the words they contain are less common.
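The idea behind frequency-based indicators can be illustrated with a small Python sketch. This is not the Lexile formula, which is proprietary; it is a hypothetical stand-in that scores a text by the average rarity of its words. The frequency table `TOY_FREQ_PER_MILLION` is invented for illustration and does not contain real corpus counts.

```python
import math
import re

# Invented counts per million words, for illustration only.
# These are NOT real corpus data and NOT any Lexile word-frequency table.
TOY_FREQ_PER_MILLION = {
    "giant": 40.0,
    "enormous": 20.0,
    "evolution": 30.0,
    "onerous": 1.0,
}


def rarity_score(text: str, freq: dict = TOY_FREQ_PER_MILLION,
                 unseen: float = 0.1) -> float:
    """Mean rarity of the words in `text`, scored as -log10(frequency / 1e6).
    Words missing from `freq` get the `unseen` frequency.
    Higher score = rarer words = 'harder' text, by this kind of metric."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    return sum(-math.log10(freq.get(w, unseen) / 1e6) for w in words) / len(words)


if __name__ == "__main__":
    for w in ("giant", "enormous", "evolution", "onerous"):
        print(f"{rarity_score(w):.2f}  {w}")
```

With these made-up counts, “onerous” scores as much rarer, and therefore “harder,” than “evolution.” That is exactly the kind of mismatch between commonness and difficulty the article discusses under the weaknesses of these indicators.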

Because reading-level metrics are built into most grammar checkers, writers are continually encouraged to write shorter sentences using shorter, more common words. News writers, advertisers, and politicians, all of whom care deeply about market share, work hard to produce texts that meet specific “grade level” targets. And judging by analyses of recent political speeches, this pressure has considerably “dumbed down” political messages.

Weaknesses of reading level indicators

Reading level indicators look only at easy-to-measure features like length and frequency. But length and frequency are only proxies for what these indicators purport to measure: how easy it is to understand the meaning the author intended.

Let’s start with word length. Words of the same length, or with the same number of syllables, can have meanings that are more or less difficult to understand. The word information has 4 syllables and 11 letters. The word validity has 4 syllables and 8 letters. Which concept, information or validity, do you think is easier to understand? (Hint: one concept can’t be understood without a fairly rich understanding of the other.)

How about sentence length? These two sentences express the same meaning: “He was on fire.” and “He was so angry that he felt as hot as a fire inside.” Here, the short sentence is actually the more difficult one, because the reader can only interpret it correctly in light of the context set by an earlier sentence: “She really knew how to push his buttons.”

Finally, what about commonness? Many words are less common than others but no more difficult to understand. Take “giant” and “enormous.” The word enormous doesn’t necessarily add meaning; it’s just used less often. It isn’t harder, just less popular. And some relatively common words are harder to understand than less common ones. For example, evolution is a common word with a complex meaning that’s quite difficult to grasp, while onerous is an uncommon word that’s relatively easy to understand.

I’m not arguing that shorter sentences, shorter words, and more common words don’t make prose easier to understand. But metrics that rely on these proxies don’t actually measure understandability, or at least they don’t measure it very well.

How reading level indicators relate to complexity level

When my colleagues and I analyze the complexity level of a text, we’re asking ourselves, “At what level does this person understand these concepts?” We’re looking for meaning, not word length or popularity. Level of complexity directly represents level of understanding.

Reading level indicators do correlate with complexity level. Correlations generally fall between .40 and .60, depending on the sample and the reading level indicator. Squaring a correlation gives the proportion of variance two measures share, so these correlations suggest that only about 16% to 36% of what reading-level indicators measure overlaps with what we measure. In other words, they are weak measures of meaning.[1] They are stronger measures of factors that affect readability but are not directly related to meaning: sentence length, word length, and word commonness.
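The arithmetic behind the 16% to 36% figures is just squaring the correlation coefficient (r²). A minimal Python sketch, where `shared_variance` is a hypothetical helper name chosen for illustration:

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, given their Pearson correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


if __name__ == "__main__":
    # The range of correlations reported between reading level and complexity level.
    for r in (0.40, 0.50, 0.60):
        print(f"r = {r:.2f} -> shared variance = {shared_variance(r):.0%}")
```

This prints 16%, 25%, and 36% for correlations of .40, .50, and .60, which leaves roughly two thirds or more of what each reading-level indicator measures unexplained by complexity level.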

Here’s an example of how all of this plays out in the real world. The New York Times is said to have an average Flesch-Kincaid reading level of grade 7, yet complexity analyses of its articles yield scores of 1100–1145. In other words, these articles express meanings that we don’t see in assessment responses until college and beyond. This may help explain why the New York Times audience tends to be college educated.

In our view, by keeping sentences and words short, New York Times writers avoid making complex ideas even harder to understand.

Summing up

Reading level indicators are flawed measures of understanding. They are also dinosaurs. When these tools were developed, we couldn’t do any better. But advances in technology, research methods, and the science of learning have taken us beyond proxies for understanding to direct measures of understanding. The next challenge is figuring out how to ensure that these new tools are used responsibly — for the good of all.


Benchmarks for complexity scores

  • Thinking complexity required to pass the Montreal Cognitive Assessment — early level 10 (1000–1025).
  • Most high school graduates perform somewhere in the middle of level 10.
  • The average complexity score of American adults is in the upper end of level 10, somewhere in the range of 1050–1080.
  • The average complexity score for senior leaders in large corporations or government institutions is in the upper end of level 11, in the range of 1150–1180.
  • The average complexity score (reported in our National Leaders Study) for the three U.S. presidents who preceded President Trump was 1137.
  • The average complexity score (reported in our National Leaders Study) for President Trump was 1053.
  • The difference between 1053 and 1137 generally represents a decade or more of sustained learning. (If you’re a new reader and don’t yet know what a complexity level is, check out the National Leaders Series introductory article.)