We Keep Talking About AI Writing — We Should Be More Excited About AI Reading
The biggest impact of AI on the written word may be more about reading than writing
I recently read an article called “AI’s Impact on the Written Word is Vastly Overstated,” by Bryan Hobart. I agree with a lot of it. It basically argues that we’ve had too much content to deal with for a very long time, and AI won’t change that. The creation of blogs and the internet, for example, led to an explosion of content that long ago outpaced our ability to sort through it all.
It inspired a thought, though. My first company was a data analytics company in publishing in 2014 — we were trying to teach computers how to read books in a way that was useful to how humans read books. Even back then, we used to talk about this problem. An analysis of Goodreads we did at the time found that virtually every book you or I have likely ever heard of comes from only the top 48,000 books that exist. There are millions and millions of books, but we simply don’t have enough reviews and user votes to meaningfully tell each other about most of them.
But while that’s a problem for someone, it’s not really a problem for the average reader. Unless you are unhappy with the books you’ve already read (the Harry Potters, the Jane Eyres, the Lord of the Rings, or Wheel of Times…), then the recommendation engine of our universe did a reasonably good job.
Where it fails is with that one really great book that you would have loved more than all the rest, but never discovered. It’s possible that the best book for me is in the 48k books we know about… but it’s more likely that it’s out there somewhere in the ocean of millions of books I’ll never hear of. I’ll never know, but I’ll likely never really suffer from not knowing.
The author of the book suffers, though. Current recommendation engines may be doing a reasonably good job for the average reader, even if they could be better.
It’s a disaster for the content creator, though.
Let’s Talk More About AI Reading
One of the lines from the article that triggered this post was that humans “can only consciously process 10 bits of information per second, which amounts to ~2GB in an average lifetime.”
That’s an amazing statistic. The issue we have with too much content coming onto the market, from a discovery standpoint, is that we’ve been horrible at building tools that can sort through the content as fast as it can be produced. Social recommendation engines have struggled valiantly with this: we aggregate a lot of user-generated content, and we’ve been able to successfully tag 48k books with enough info to be useful for discovery. That’s way more than we could have done before modern tools, so that’s a win.
But an AI’s ability to read and apply the preferences of an individual, as if it were mirroring the tastes of that individual, is as monumental for book readers as an AI that can write.
Imagine I have an AI model trained on my tastes and preferences in reading. A quick Google search finds that the average LLM on a moderately powerful computer can ingest content at 188,000 times the 10 bits per second of the human brain.
In other words, at that rate the 2GB of data a human can consume in their lifetime would take an LLM a little under two and a half hours to read for you.
And then make a recommendation.
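For the curious, the back-of-the-envelope math is easy to reproduce. This is a minimal sketch, not a benchmark: it takes the 10 bits per second and 188,000x figures quoted above at face value, assumes 2GB means 2 × 10^9 bytes, and uses function names that are my own invention.

```python
BITS_PER_BYTE = 8

# Figures quoted in the text, taken at face value.
HUMAN_BITS_PER_SECOND = 10.0
LLM_SPEEDUP = 188_000          # multiple of the human rate
LIFETIME_BYTES = 2e9           # assumption: "2GB" means 2 * 10^9 bytes


def human_seconds(total_bytes: float,
                  bits_per_second: float = HUMAN_BITS_PER_SECOND) -> float:
    """Seconds a human needs to consciously process `total_bytes`."""
    return total_bytes * BITS_PER_BYTE / bits_per_second


def machine_seconds(total_bytes: float, speedup: float,
                    bits_per_second: float = HUMAN_BITS_PER_SECOND) -> float:
    """Seconds a reader running `speedup` times faster than the human needs."""
    return human_seconds(total_bytes, bits_per_second) / speedup


human = human_seconds(LIFETIME_BYTES)
machine = machine_seconds(LIFETIME_BYTES, LLM_SPEEDUP)
print(f"human:   {human:,.0f} seconds")
print(f"machine: {machine:,.0f} seconds ({machine / 3600:.1f} hours)")
```

Swap in whatever speedup you believe. With any plausible multiplier, the machine’s reading time is a tiny fraction of the roughly 1.6 billion seconds the human needs.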
Creation of large volumes of content has been a thing for a long time. The ability to create metadata with computers has been a thing for a long time, as well — at least since we started doing it with BookLamp in 2007, and others before.
What is new is that maybe now people will actually start listening to it. We’ve never had a tool that people would use at scale that could keep up with the volume of content creation. We’ve tried. We’ve largely failed.
Every time I read an article about the pros and cons of AI in publishing, it tends to focus on the volume or quality of content creation, but I think we should be equally aware — and equally excited — about the reading side of it. After all, content creation only creates an issue if we don’t have the tools to match on the other side.
The rise of computer-based content analysis that people will actually listen to, metadata that can be generated about each book specific to the tastes of the individual user… that’s really interesting, and (ironically) about as unprecedented now as it was when we founded BookLamp in 2007.
Maybe there should be a public tool for metadata production and extraction targeted to AIs
A final thought. In 2014, we were trying very hard to convince the publishing industry that using computers to understand content in a form that was useful for humans was important. We made substantial progress on this before the company and its “Book Genome Project” were acquired by Apple, and the technology left the public view. The need for such tools still exists.
Technology tools that are author-, publisher-, and reader-friendly (instead of specifically retail-friendly) have always been hard to find. Or at least, they have tended to be brought into the publishing industry by people outside it.
With all the open source LLM projects out there, maybe there should be one for the production of “non-human but human-readable” metadata for recommendations. It could be as simple as an analysis of a book’s key attributes (language, writing style, thematic tags, etc.) packaged as a profile that could be downloaded and distributed by the author or publisher. In other words, a public system that an author or publisher could trust and apply to their book, online or locally, to extract a set of standardized metadata to contribute back to any system that supports it. This would allow large and small authors to easily participate in the upside of AI discovery without fully turning their book over to be read by an AI outside of their control.
A tool designed to do one thing, and one thing only — reliably generate deep metadata that could later be personalized to the tastes of each individual reader, and distributed to AI systems for recommendation without exposing the author to loss of control.
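To make the idea a little more concrete, here is a minimal sketch of what such a profile might look like. Everything in it is hypothetical: the `BookProfile` class, the field names, the schema version string, and the toy `match_score` function are illustrations of the shape of the thing, not an existing standard or anything BookLamp actually shipped.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict

SCHEMA_VERSION = "0.1-hypothetical"  # placeholder, not a real standard


@dataclass
class BookProfile:
    """A portable metadata profile; the book's text itself is never included."""
    title: str
    author: str
    language: str
    # Attribute name -> strength in [0, 1], as produced by some analysis step.
    writing_style: Dict[str, float] = field(default_factory=dict)
    thematic_tags: Dict[str, float] = field(default_factory=dict)
    schema_version: str = SCHEMA_VERSION

    def to_json(self) -> str:
        """Serialize to JSON so the author or publisher can distribute it."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "BookProfile":
        """Rebuild a profile from its JSON form."""
        return cls(**json.loads(text))


def match_score(profile: BookProfile, reader_tastes: Dict[str, float]) -> float:
    """Toy personalization: weighted overlap of a book's tags and a reader's tastes."""
    return sum(weight * reader_tastes.get(tag, 0.0)
               for tag, weight in profile.thematic_tags.items())


# Made-up example values, for illustration only.
profile = BookProfile(
    title="Example Novel",
    author="A. Author",
    language="en",
    writing_style={"dialogue_density": 0.6},
    thematic_tags={"space": 0.9, "coming-of-age": 0.7},
)
reader = {"space": 1.0, "romance": 0.5}

print(profile.to_json())
print(match_score(profile, reader))
```

The design choice that matters is the separation: the profile is generated once, under the author’s control, and the personalization step only ever sees the profile.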
Worth thinking about.
Final, Unrelated Note on Book Recommendation Engines (you can stop here, technically the article ended already):
I made a comment earlier in the article about how recommendation engines are reasonably successful for the average reader. I can’t help but add a very important caveat to that. If you are already an avid reader, then the current recommendation engines are doing a reasonably good job, yes. They’re finding you things that you enjoy reading, and the only question is whether there are things that would be even better.
BUT. Books are unique in our entertainment consumption in that they are both exclusive and sticky. What I mean by this is that it’s really hard to do anything other than read, while you are reading. You can’t do emails. You can’t watch TV, or hold a conversation. Most other types of media are non-exclusive. You can watch a movie and still do emails. You can listen to music and still surf on your phone. Audiobooks are as close as we have to a non-exclusive book — which is why you’re listening to them on the commute instead of bouncing off cars with your nose in a novel.
To read a book, you must decide to only read a book, and so the opportunity cost of other things you can’t do is high. I will often put on a movie and reply to emails on a lazy day — I’ll never reply to emails and read a book at the same time.
Books are also sticky. What I mean by this is that we feel a commitment to books that is probably unhealthy, and that we don’t feel to movies, songs, or most human relationships (kidding). If I don’t like a movie, and end up turning it off halfway through… I will just start another movie the next time I want to watch something. If I don’t like episode 1 of a TV show, I’ll switch TV shows.
But somehow with books we feel like we have to finish them.
Years ago I recommended Ender’s Game to a friend of mine who was not really much of a reader. He asked me for a recommendation to try to get into reading, and heck, I’d loved Ender’s Game since I was young... so, why not?
He’s still reading it. It’s probably been 10 years. If he were to decide to read a book right now, I’m pretty sure he’d go back and try to make progress on Ender’s Game. If you ask him what he’s reading, he’ll tell you he’s reading Ender’s Game. Clearly it was not a good recommendation for him, because he’s been stuck in it for years. But he also won’t give up on it and go find another book with a better chance to hold him. Ender’s Game was his shot, and it was a miss (even though he knows I use him in this exact example regularly, he’s still not given up… at this point I think he’s just mocking me).
The point is that book recommendations, unlike movie, music, or TV recommendations, are like a tar pit. They’re high investment, and they kill off casual readers if the book doesn’t catch on fire. Which I guess would then be a burning tar pit. So yes, current readers are generally being served by book recommendation engines, but the industry at large… there is so much more to do.
The consequence of poor book recommendations for our industry is not readers being lukewarm on the books they try; it’s the large segment of people who just never read at all.