When Nothing Ever Goes Out of Print: Maintaining Backlist Ebooks

A talk given at the ebookcraft conference, March 30, 2016

A lot of what we’ve talked about today, and a lot of what we talk about at digital publishing conferences generally, assumes that all we do is make brand new ebooks, usually to go along with brand new print books. And so we have sessions on how to handle fonts, on how to enhance our ebooks with JavaScript interactivity, and on what tools and workflows to use, if you’re starting fresh from the very beginning of that process to produce the most accurate, semantic, accessible, high-quality ebooks.

And when I started my current job, as a digital managing editor at Houghton Mifflin Harcourt, four years ago, that was primarily what I thought I would be doing — working with my group to make those new ebooks. And in the present, I think that’s what most people at my company still perceive us as doing.

What one year of our work looks like

But when I started actually counting the titles my group was working on, it turned out that working on these brand new ebooks was just a small part of what we were doing.

In 2015, my group sent 1400 ebook files for distribution. About 300 of those were new frontlist ebooks — almost all of them ebooks that went along with brand new print books (maybe a few digital-only projects).

About 150 more were titles that were appearing in ebook for the first time — new ebooks of old print books that for some reason didn’t have an ebook yet. This is us filling in the backlist.

The remaining 950 titles that we worked on in 2015 were updates or redos of ebooks that were already on sale.

So only about 20% of the files we worked on were the brand new ebook files that people think of as the majority of our work. And almost 70% were reworks of ebooks that already existed, ebooks that were already on sale. And so why these old ebooks keep coming back to us, why two-thirds or three-quarters of the work we do is on these old ebooks, is what I want to talk about today.
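For anyone who wants to check the arithmetic against their own list, here is a minimal sketch of the calculation. The counts are the 2015 figures quoted above; the function name and category labels are made up for illustration.

```python
# Counts are the 2015 figures quoted in the talk; labels are illustrative.
files_2015 = {
    "new frontlist ebooks": 300,
    "first-time ebooks of older print books": 150,
    "updates or redos of ebooks already on sale": 950,
}


def workload_shares(counts):
    """Return each category's share of the total as a percentage (one decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


shares = workload_shares(files_2015)
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

Swapping in your own counts is the quickest way to find out how much of your year is really frontlist.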

Our entire ebook list

That was one year of our work. And now here’s the total scope of our responsibility: all of the ebooks Houghton Mifflin Harcourt has in print or on sale. I think of our ebooks being something like this …

Iceberg image: Uwe Kils, Wiska Bodo CC BY-SA 3.0

Where we have the perhaps 300 or 400 new ebooks we’ll be making in a given year up above the surface of the water — that’s the part of our work that everyone sees and is aware of. That’s where all our colleagues, the editors and designers and the publicity and marketing team, are hanging out.

And then, down below the surface of the water, we have approximately 5000 ebooks — and we’ll come back to why that’s approximate — that are all the ebooks we’ve already made, and that we now have to take care of. And, like an iceberg, it’s this underwater part that fills those of us who are aware of it with terror — or at least significant professional concern.

And if I can diverge a little from actual facts about icebergs, let’s also mention that this underwater portion is growing every year. The 300 new ebooks we made in 2015 have now become part of this giant underwater mass that we need to maintain, and above water, we move on to the 300 new ebooks we’ll make in 2016.

Aren’t print books in the same situation?

This is not entirely different from the situation with print books, where there’s a large backlist supporting a smaller frontlist. And there are plenty of people at my company who work on the print backlist, making sure our print books are reprinted as needed, and that they stay in stock at the retailers and come back to the public’s attention when relevant.

But because this is a talk for a digital conference, what I want to talk about is the ways in which this is totally different from print books. And I’ll point out reasons this is true as we go, but here are a few major differences to start with.

1. Many of the ebooks we have already made are not that good

I think many publishers are in this boat, of having converted a lot of their backlist books to ebooks in an awful big hurry when it started looking like ebooks were going to be A Thing. So all of a sudden you have hundreds or thousands of backlist ebooks — made to old standards, with poor-quality OCR, in outdated formats, never proofread, with low-res art, built by processes incompatible with your current ebook workflow. And perhaps ironically, your most important backlist titles are the worst, because they were the first ones that somebody thought to convert.

So this is very unlike print books, where, I hope, anyway, you’re not looking at books you designed and typeset in 2010 and saying “oh, gosh, that’s so embarrassing; we just didn’t know anything about how to make books back then.” But that’s how we feel about a large share of our ebooks.

And so one of my group’s priorities has been to triage this 5000-title ebook backlist and to think about how we are going to fix the worst of them. What criteria do we use for determining which to spend time improving and which to stop selling altogether? How do we balance the time we spend on maintaining the backlist with getting the frontlist out and continuing to innovate in our processes and workflows? And because we’re not the only ones who have noticed this about our older ebooks: How do we respond to customer and retailer complaints about backlist titles?

2. Ebooks last forever

Another major difference is that we expect print books to go out of print.

When we make ebooks, by contrast, there seems to be this idea that we are making them to last forever — that ideally, every ebook we make will stay on sale for eternity. We hope our ebooks will become part of a perpetual money-making long tail for our company, but I think we also feel that by digitizing these books, we’re adding them to our culture’s eternal repository of all books ever published.

I love the idea that some of the work we do is about making sure that our books will always be available to anyone who’s ever interested in reading them. But that’s a lot of responsibility. And it’s not something I knew I was getting into when I took this job. And I’m not sure it’s a business that our companies think very much about being in.

Not all books are meant to stay in print forever.

With print books, on the other hand, I feel like there’s more of an understanding that not all books are meant to last forever. Books like Let’s Go: Europe 1999 or Kardashian Konfidential might be ones we could cease to publish when they cease to make money for us or when there’s more cost to keeping them up to date than we want to invest.

And because there are costs to printing physical books, and to storing them in a warehouse, and to shipping them to where they’re needed, there’s a much higher bar for when it makes sense to keep them in print. So the situation we find ourselves in is that many of our backlist ebooks no longer have a print equivalent — the print book is out of print or print on demand. And that means there’s no longer a physical reminder of the book in our office — no copies or cover proofs or reprint agendas — to keep that book in our colleagues’ consciousnesses.

3. Ebooks always look brand-new

All books have a life cycle — usually, first, the book is published, and for a while it’s new and interesting and relevant, and then it passes into being less interesting, less relevant, and less correct. And some books may move into other stages, where they become classics, or they become vintage kitsch, or interesting as historical records or the subject of textual study.

Where a book is in this cycle becomes relevant to us as we think about what kinds of changes we might make, what kinds of choices we make as we continue to maintain the ebook. If we’re trying to keep a book in the “relevant” bucket, we might correct mistakes in the ebook, update figures, add a new afterword, and the like. If we think a book is destined for “textual studies,” we might focus on making sure the ebook never deviates from exactly what appeared in the first printing.

But a big difference with print books is that as they go through this life cycle, they pick up markers that show their age. We can judge whether a print book is likely to be relevant or up-to-date or to contain outmoded language or incorrect facts, through its physical package, its cover design, its typography, and the wear on the actual physical book. There are a million clues that you don’t even consciously note that set your expectations for what you’re going to find inside that book.

Print books show their age. Image: www.thejoykitchen.com

Ebooks, on the other hand, don’t have the physical markers to show you the difference. Every ebook looks like a brand-new ebook. You’re not necessarily confronted with age-appropriate typographical and design cues; there’s no wear on the ebook, no yellowing, no old book smell. And you’re reading the book on an ereader, or an iPad, or a phone that is, at the outside, maybe two years old? So nothing about the interface or the reading experience suggests that the content you’re consuming might be five, or ten, or twenty, or even fifty years old.

(This is a bit of an exaggeration. I’m aware that it’s possible to put covers on ebooks.)

So what I think we lose with ebooks is the chance to set expectations for what they’ll contain. In the same way that we expect our iDevices to “just work,” we expect our ebooks to just be perfectly up-to-date, accurate, timely, and customized to our needs at the moment we’re reading them.

Maintaining the backlist

So if you’re with me that old ebooks are a bit of a different beast, we can talk about how you’re going to maintain your ebook backlist.

In his ebookcraft talk, Sanders Kleinfeld asked whether those of us who work on ebooks consider ourselves primarily coders. And I don’t dispute his point that ebooks are made of code, or that the work we do when we’re making them is software development. But my group’s work doesn’t end when we make our ebooks; we also have to attend to all the tasks necessary to keep them up to date and on sale. So we’re not just coders, though we work with code and we do technical work. We’re also editors; we work with content and we do editorial work. And that’s a really interesting place to be: at the intersection of technical and editorial work, looking at what happens at that place where code meets books.

When we talk about maintaining these backlist ebooks, what are we talking about? I’m going to zip through a handful of the issues here, without suggesting that those are all of them, or that all of these apply to your particular backlist, just to map out the range of possibilities.


One of the ways that ebooks are different from print books is that they get technically out of date. They start looking out of date faster. Compare what we thought looked like a reasonable ebook in 2010 versus 2015.

Life Mask (Emma Donoghue), 2010 and 2015 ebook editions.

My concept of backlist ebooks is that they’re all basically ticking time bombs: as soon as you make an ebook, the clock is running toward the day when that ebook is going to look out of date and embarrassing. And that is just not the case with print books, that your designers feel that books they made five years ago are now unacceptable to have on the market.

Some of these changes are purely cosmetic — I know some of you are thinking, just swap in the new CSS, no big deal. But some of the changes pose new problems — you might be able to see that we moved from using straight quotation marks to curly quotation marks between 2010 and 2015. And though you can use your tools to automatically determine which way 98% of apostrophes and quotation marks should be turned, you still have the problem of figuring out how to find and correct the remaining 2%. Imagine British dialogue, with its dropped consonants (‘This ’ere ’ouse,’ ’e said); or foot and inch marks that need to stay straight; or even the quotation marks in your HTML class and attribute names.
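To make the 98%/2% split concrete, here is a rough sketch of a quote-curling pass that converts the easy cases and sets the hard ones aside for a person. It is not our production tool. The function names, the flag wording, and the deliberately naive tag-matching regex are all assumptions made for illustration, and each text node is treated as if it started after a space.

```python
import re

# Naive tag matcher; assumes no ">" appears inside attribute values.
TAG = re.compile(r"(<[^>]*>)")


def curl_quotes(html):
    """Curl unambiguous quotes in text nodes; return (new_html, flags).

    flags is a list of (offset, snippet, reason) for marks left straight
    so that a human can decide what they should be.
    """
    out, flags = [], []
    offset = 0
    for chunk in TAG.split(html):
        if TAG.fullmatch(chunk):
            out.append(chunk)  # markup: attribute quotes must stay straight
        else:
            out.append(_curl_text(chunk, offset, flags))
        offset += len(chunk)
    return "".join(out), flags


def _curl_text(text, base, flags):
    chars = list(text)
    for i, ch in enumerate(chars):
        if ch not in "\"'":
            continue
        prev = text[i - 1] if i > 0 else " "
        nxt = text[i + 1] if i + 1 < len(text) else " "
        snippet = text[max(0, i - 10): i + 10]
        if prev.isdigit():
            # Could be a foot/inch mark or a closing quote after a number.
            flags.append((base + i, snippet, "possible foot or inch mark"))
        elif ch == '"':
            chars[i] = "\u201c" if prev.isspace() or prev in "([" else "\u201d"
        elif prev.isalpha():
            chars[i] = "\u2019"  # apostrophe or closing single quote
        elif nxt.isalpha():
            # Opening quote or dropped-letter apostrophe ('ere, 'ouse)?
            flags.append((base + i, snippet, "opening quote or leading apostrophe"))
        else:
            chars[i] = "\u2019"
    return "".join(chars)


sample = '<p class="x">"It\'s 6\' 2" tall," \'e said.</p>'
curled, to_review = curl_quotes(sample)
print(curled)
for offset, snippet, reason in to_review:
    print(offset, repr(snippet), reason)
```

The design choice worth copying is the second return value: anything the rules cannot settle is left straight and reported, so the remaining 2% becomes a review list rather than a silent guess.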

We actually need to add new semantic information to do this update. And this is a trivial example, but the idea also applies to updates meant to add accessibility, or epub:type semantics, or more sophisticated design or enhancements. To retrofit old ebooks up to current standards requires human beings to think about the content of each book and what it means. This is not information that’s already in the ebook or available in digital form; there’s not an automated way to write good alt text or to know whether a sidebar is part of the <main> content or an <aside>.
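What automation can do is find the places where a human decision is still owed. As a small sketch of that idea, the following uses Python's standard html.parser module to list images with no alt attribute or an empty one. The class and function names are invented for this example, and it makes no attempt to judge whether existing alt text is any good.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Collect <img> elements whose alt text a human still needs to look at."""

    def __init__(self):
        super().__init__()
        self.to_review = []  # (src, reason) pairs

    def handle_starttag(self, tag, attrs):
        # Self-closing <img .../> tags also arrive here via handle_startendtag.
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or "(no src)"
        if "alt" not in attributes:
            self.to_review.append((src, "no alt attribute"))
        elif not (attributes["alt"] or "").strip():
            # Empty alt is valid for decoration, but only a person can confirm that.
            self.to_review.append((src, "empty alt: confirm the image is decorative"))


def audit_alt_text(xhtml):
    """Return the (src, reason) pairs found in one content document."""
    parser = AltAudit()
    parser.feed(xhtml)
    parser.close()
    return parser.to_review


sample_page = (
    '<p><img src="map.jpg"/>'
    '<img src="ornament.png" alt=""/>'
    '<img src="cover.jpg" alt="Front cover"/></p>'
)
for src, reason in audit_alt_text(sample_page):
    print(src, "->", reason)
```

The output is a to-do list for an editor, which is the point: the script finds the gaps, and someone who has read the book fills them.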

Spain in Our Hearts (Adam Hochschild), 2015-style ebook index

Some of the need to update is us just getting better at what we do. In 2010, we thought inserting a picture of the print book index was a reasonable way to do ebook indexes. In 2015, we expected a text-based, fully linked, nicely designed, easily navigable, and actually functional index.

Sometimes we need to update ebooks to respond to technical specs updates — to make our epub2s into epub3s, for example — or because an e-reader software update unexpectedly breaks something.

Sometimes what the ereaders are able to do has changed. In 2010, there wasn’t a proper way to make a picture book ebook. (That didn’t stop us from trying.) In 2015, we have fixed-layout.

In any case, these are all reasons why you need to be consistently reviewing and updating your ebook backlist. And let’s take our hypothetical 5000 ebook backlist: how often does each of those titles need to come up for review and update? Based on the examples above, five years looks like too long. At the other extreme, sometimes my team will look at an ebook they made a year ago and say, this looks terrible, please let us redo this one.

So, maybe we say every three years? Can you reissue a third of your entire ebook backlist every year? For us, that would be 1600 backlist books getting reissued every year. Last year, we did 950. So we’re not there yet …
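The back-of-the-envelope version of that calculation looks like this. It assumes a round 5000-title backlist, which is itself approximate, so a three-year cycle comes out at about 1,667 titles a year, in the same ballpark as the figure above. The function names are illustrative.

```python
def reissues_per_year(backlist_size, cycle_years):
    """Titles that must be reissued each year to revisit every title once per cycle."""
    return backlist_size / cycle_years


def implied_cycle_years(backlist_size, reissued_per_year):
    """How long one full pass through the backlist takes at the current pace."""
    return backlist_size / reissued_per_year


backlist_size = 5000  # approximate, as discussed in the talk
for cycle in (1, 3, 5):
    needed = reissues_per_year(backlist_size, cycle)
    print(f"{cycle}-year cycle: about {needed:.0f} titles a year")

pace = implied_cycle_years(backlist_size, 950)
print(f"at 950 titles a year, one full pass takes about {pace:.1f} years")
```

Note that this ignores the 300 or so new titles joining the backlist every year, so the real cycle at a fixed pace is somewhat longer.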


Keeping those 5000 backlist ebooks on sale legally takes up a lot of our time. This is not a concern that’s different than for print books, except that we have more of these old books as ebooks, and less likelihood that they’ll go out of print, and fewer editors and authors invested in doing this complicated work for the relatively small returns they’re seeing on ebooks.

I think there’s no place where the ticking time bomb of backlist is more obvious than in the case of rights and permissions for your ebook, the legal right to publish and sell your backlist.

To start there’s the question of whether you have the rights to make an ebook at all. This can be complicated for a book that was published before the invention of ebooks — your contract may not clearly specify whether you have the right to sell an ebook or how much you pay in royalties if you do. That’s been a huge effort for our contracts department, just to clarify whether it was legally permissible to make each of those first 5000 ebooks. And this is ongoing — I get a report every single day of books our contracts department has just determined are OK to convert to ebook.

And once you have the right to make an ebook, that right may come with certain stipulations, like the illustrator gets to approve it, or you must disable text-to-speech, or you can make an ebook, but not one with audio or video enhancements. And that right doesn’t necessarily last forever. Another set of emails I get every week lists titles where the rights are reverting to the author, and if we have an ebook, we need to remove it from sale.

So easy enough, 5000 ebook contracts to keep track of to keep those 5000 ebooks on sale.

But the bigger piece is the rights or permissions for all the separate assets that also exist in a book. First, the cover, which often includes a photo or piece of artwork that we may have the right to use for a certain number of copies or for a certain amount of time. And again, if that cover was designed before the invention of ebooks, we probably did not get the right to use it on one, which is why you see generic covers on a lot of backlist ebooks.

And maybe your book interior contains art, or photographs, or a map, or quotations from poetry or song lyrics. And again, you may or may not have requested permission to use those in an ebook when you were publishing the print book, so they may need to each, individually, be recleared for ebook use. And again, each of those permissions may have its own terms or expiration, where it’s good for a certain number of copies or a certain amount of time, or include restrictions like printing at a certain size, or with a certain credit line, or only below a certain dpi or in an ebook that has some kind of copy protection.

Specific to the ebook is the right to embed any fonts you’ve decided to use, which is a separate thing from the right to use them in print or on the web. And my company has agreements with font foundries that allow embedding their fonts in a certain number of ebooks in total, so let’s also keep track of how many fonts from which foundries we’ve embedded across that entire backlist.

So now, for our approximately 5000 backlist ebooks, we have the rights for the book itself, the rights for the art on the cover, and the rights for, say, between 0 and 50 other assets contained within the book. So now we have 10,000 or 100,000 separate contracts and agreements, all with different terms and expiration dates, that we need to keep track of in order to legally keep our ebook backlist on sale.
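One way to picture what keeping track might mean is a record per grant, with a routine that surfaces anything about to lapse. The sketch below is a thought experiment, not a description of our contracts system. The Grant fields, the 90-day and 90% thresholds, and the ISBNs are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Grant:
    """One right or permission attached to one ebook (hypothetical schema)."""
    isbn: str
    asset: str                        # e.g. "book contract", "cover photo"
    expires: Optional[date] = None    # None means no stated end date
    max_copies: Optional[int] = None  # None means no copy limit
    copies_sold: int = 0


def needs_attention(grants, today, warn_days=90, warn_ratio=0.9):
    """Return grants that have lapsed, lapse within warn_days, or are near a copy limit."""
    horizon = today + timedelta(days=warn_days)
    flagged = []
    for g in grants:
        expiring = g.expires is not None and g.expires <= horizon
        near_cap = g.max_copies is not None and g.copies_sold >= warn_ratio * g.max_copies
        if expiring or near_cap:
            flagged.append(g)
    return flagged


# Made-up sample data; the ISBNs are placeholders, not real titles.
grants = [
    Grant("978-0-000-00000-1", "book contract"),
    Grant("978-0-000-00000-1", "cover photo", expires=date(2016, 5, 1)),
    Grant("978-0-000-00000-1", "song lyric, chapter 3", max_copies=5000, copies_sold=4800),
    Grant("978-0-000-00000-2", "map", expires=date(2020, 1, 1)),
]

for g in needs_attention(grants, today=date(2016, 3, 30)):
    print(g.isbn, g.asset)
```

Restrictions like credit lines, image size, or copy protection would need their own fields. The hard part is not the code but getting tens of thousands of agreements into a structured form in the first place.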


A type of maintenance that seems straightforward at first is the editorial error: the spelling error, the grammatical error, the factual error. These are the sort of thing that is easy to fix in an ebook, and so we do quite a lot of fixing.

Spelling and more

The most unambiguous case is when we’ve introduced an error — one that’s not in the print edition — especially for the ebook. The most common case of this is the OCR error, very frequent in our oldest ebooks, where the ebook was produced by scanning an old print book and was probably not fully proofread afterwards. And just for fun, here’s a recent favorite:

“She did not think of it driving on Sunset Boulevard, which was always awake, the billboard advertisements for new films bright as movie screens, the twenty-foot feces of famous people staring vacantly in her direction.”

The question here is not whether to fix the errors, but how quickly and efficiently we can get these fixed. If it takes, on average, 10 hours to proofread an ebook fully, for our 5000 ebooks, we’ll need 50,000 hours. If you have someone who can work 40 hours a week on only that, you’ll get through your backlist in something like 25 years.
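Here is that arithmetic as a small sketch you can rerun with your own numbers. The 10 hours per title and the 40-hour week come from the estimate above. The 52 working weeks a year and the function name are assumptions.

```python
def years_to_proofread(titles, hours_per_title, hours_per_week,
                       proofreaders=1, weeks_per_year=52):
    """Years needed to proofread every title once, assuming no holidays or other work."""
    total_hours = titles * hours_per_title
    hours_per_year = hours_per_week * weeks_per_year * proofreaders
    return total_hours / hours_per_year


one_person = years_to_proofread(5000, 10, 40)
five_people = years_to_proofread(5000, 10, 40, proofreaders=5)
print(f"one full-time proofreader: about {one_person:.0f} years")
print(f"five full-time proofreaders: about {five_people:.1f} years")
```

Even with a team of five, the backlist would take years, and that is before counting the new titles added along the way.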

And as those of you who have worked as proofreaders know, there’s really no substitute. You can use all kinds of technical means to get closer and closer to 100% OCR accuracy, but to really feel confident that you have not missed a single comma from the original source, you still need to do that full 10 hours of proofreading.

Now, ebooks can easily have automated spellchecking applied to them, and this does help a lot. My team now has every instance of the word “feces” in our ebooks flagged for them to manually review.
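A watch list like that is simple to sketch. The version below is an illustration, not our actual tooling: it looks for correctly spelled words that are often OCR misreadings of something else and returns each hit with some context for a person to review. The function name and the context width are assumptions.

```python
import re

# Real words that a spellchecker accepts but that are often OCR misreadings.
WATCH_LIST = {"feces"}


def find_watch_words(text, watch_list=WATCH_LIST, context=30):
    """Return (offset, word, surrounding_text) for each watch-list word found."""
    if not watch_list:
        return []
    alternatives = "|".join(re.escape(w) for w in sorted(watch_list))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        hits.append((match.start(), match.group(0), text[start:end]))
    return hits


passage = "the twenty-foot feces of famous people staring vacantly in her direction"
for offset, word, around in find_watch_words(passage):
    print(f"{offset}: {word!r} in ...{around}...")
```

The script only finds candidates. Whether a given hit is an error still takes a reader, because sometimes the book really is about feces.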

But our retailers also are interested in the quality of our ebooks, and also can run automated spellchecking on them, and so they helpfully flag things that they think are errors for us. Which means we spend a considerable amount of staff time processing retailer-reported “errors” that are actually dialect, quotations from nineteenth-century primary sources, deliberate neologisms, or other creative uses of language.

Flowers for Algernon (Daniel Keyes). These are not errors, no matter what your automated spellcheck thinks.

And so, you say, look at the print source. But unfortunately, my company does occasionally publish a print book with an error in it.

And then what do you do? Thinking again of the life cycle of the book — some books are classics where textual errors are important enough to study and we’re taking something away from the text by making the correction.

My company publishes The Hobbit and The Lord of the Rings, and I am petrified of making any changes to the ebook text, even when we think we spot something obviously in error, because the integrity of the text is something people care about. And then write scholarly papers about.

In one of these papers, Tolkien himself is quoted complaining about a printing of The Fellowship of the Ring:

“the impertinent compositors have taken it upon themselves to correct, as they suppose, my spelling and grammar: altering throughout dwarves to dwarfs, elvish to elfish, further to farther, and worst of all elven to elfin.”

Can we predict which books published in 2015 will be the Tolkiens of 50 years from now? Do we want to be this decade’s impertinent compositors?

And sometimes we’re not sure if what we’re seeing even is an error. The following passage, from Philip K. Dick, displays a lot of creative use of language, but the “zommed” in the third line — while matching the print book — seems a bit suspect. Should it be “zoomed”? How can we know?

Ubik (Philip K. Dick)

Factual errors

About Time (Bruce Koscielniak, 2004)

Sometimes a book is completely accurate at the time of publication, but becomes factually inaccurate over time, giving the wrong dates for the beginning of daylight saving time or an incorrect planetary status for Pluto. In a social science book, statistics grow out of date over time; in a travel guide, every fact becomes wrong over a long enough period.

And again, because every ebook looks like a brand-new ebook, and because you’re reading it on your brand-new eighth-generation Kindle Fire, these kinds of factual errors are more jarring than they would be in a print book that clues you in to its age.

Should we fix these factual errors? Do we need the author’s input? What if the author is dead, the agent is retired, and the editor has left the company? Should we fix them silently or with some kind of editorial note?

Offensive content

Then there’s the case where the content of a book is not incorrect, per se, but may have become outmoded or offensive.

Here’s an example of a diet book from the 1980s, which — in addition to the diet advice now being the exact opposite of what we would recommend today — is a bit behind the times in referring to Asian-Americans as “Orientals.” And probably just plain racist in going on to suggest that “Orientals” who live in the United States are somehow not Americans. What do you do here? Do you update the language that’s incidental to the content of the book? Does it matter who you think is buying this — whether it’s people who want the diet advice or people who are researching the historic participation of Asian Americans in diet programs?

The Fit or Fat Target Diet (Covert Bailey, 1984)

And here is a lovely classic children’s book — from 1960 — that’s about two little girls, and a witch, who has a baby, and a spelling bee, an actual bee that spells things. And there’s a minor part of the story where the girls are putting on their Halloween costumes, and one is a witch, and the other is “a little Chinese girl … and she had makeup on her face.”

The Witch Family (Eleanor Estes, 1960)

I think we agree at this point that a nationality is not a super-cool Halloween costume, but I’m not clear on whether Clarissa’s putting on yellowface or has just borrowed her mother’s lipstick. And so how do we handle this? This is not Huckleberry Finn — it’s not a book about race, where we talk about the history and the controversy. Should we be concerned with this type of incidental racism in an ebook that we’re selling today, one that looks just like the new, and hopefully more enlightened, children’s ebooks we’re publishing in 2015?

Ambiguity of source

Now if you’re a strict constructionist, you can dodge a lot of these questions by saying just follow the print book. But even if we believe for a moment that our only goal for ebooks is that they be a perfect replica of their print equivalents, we still run into questions because print books themselves contain some ambiguity.

I think this is something we don’t talk about a lot — that the physical print book doesn’t have all the answers (particularly if we don’t have digital source files). Print books contain some ambiguity as to their content.

Some of this ambiguity has to do with the original intention of the author or designer, which can’t be determined from the finished product. A common case of this is the print book index (or anywhere the print book gives a cross reference to a particular page). When your print book index says “ambiguity, 99,” we know that what the indexer wants us to see is somewhere between the first line of page 99 and the last. But if we’re working from a printed index, we don’t have any way to know the exact point in the text the reference was meant to take you to, even though technology would allow us to take you to the word-precise location in the ebook.

Another ambiguity is the location of a piece of art, which is often chosen to work with the layout of the print page. If the designer puts a photo at the bottom of the page, resulting in its being in the middle of a paragraph in the ebook, we want to move it. But which paragraph does it belong with, the one before or after? Below is a book we’ve resisted converting to ebook for a long time, because no one seems to know whether the position of the tarot cards that run through the book is highly significant — or merely decorative.

The Castle of Crossed Destinies (Italo Calvino)

As well, there are some conventions of print design that just don’t work correctly in an ebook. This is a print book; do you see an error here?

Beware, Princess Elizabeth (Carolyn Meyer) — the print book

But in the ebook, the missing open quotation mark before the dropcap is suddenly obvious. The design convention of omitting a quotation mark before a dropcap doesn’t make sense in the digitized text.

Beware, Princess Elizabeth (Carolyn Meyer) — the ebook

Similarly, a little text ornament in a space break is easy enough to replicate in an ebook. But then maybe we notice that the space break ornament appears only once in the ebook. Why? Because the designer’s only used the ornament when the space break occurs at the end of a print book page.

Mr. Splitfoot (Samantha Hunt). There’s only an ornament at the bottom of the left page because the space break falls between pages.

And here the print book has a hyphen that falls on a line break. There’s no way to tell if it was meant to be a hard hyphen (frame-story) or a soft hyphen (framestory).

CliffsNotes on Asimov’s Foundation Trilogy

Or what about a poem that runs several pages long — is there a stanza break at the page break or not?

Which is all just to say that as we’re cleaning up and improving our backlist ebooks, we find places where we, the ebook developers, need to impose editorial decisions and judgments — places where the source material doesn’t provide everything we need to create a completely accurate ebook.

And as I mentioned before, a major way that old print books and backlist ebooks are ambiguous is in their semantics — as we begin to use epub:type or HTML5 markup or try to add types of accessibility that are available in electronic texts, the information we need to do that may just not be present in the print source.


There are two kinds of metadata I worry about in relation to our ebook backlist, the customer-facing metadata (the kind of information you see on the book’s Amazon product page), and the internal metadata (our in-house record-keeping about the ebook).

Customer-facing metadata

The metadata describing an ebook grows out of date over time, of course, and needs to be maintained. This is true for the print book, as well, but again, the problem for us is exacerbated by the never-out-of-print ebook backlist, especially when the ebooks have become orphaned by their editors or authors.

Obviously, as new types of metadata come into use, like keywords or updated BISAC subject codes for juvenile and YA titles, you’ll need to go back and add these to each title in your 5000-title backlist that you plan to keep selling.

And then you’ll need to keep updating the existing metadata. One place we see a lot of this is in the author bio. The author bio may say something like “this is the novelist’s first book,” when she’s since published six more; or that she lives in California, when she’s moved to Paris; or suggest that the author is living when she no longer is. And that’s been a real question for us — should we leave the biography as it was written originally, true as of the time of publication of the book (“Will Shakespeare is a thirty-year-old actor from Stratford-upon-Avon. This is his first play.”)? Or should we edit it to be true as of the current moment?

Here’s the biography we have in our title database for Elizabeth Gilbert:

Elizabeth Gilbert is the author of the story collection Pilgrims, a finalist for the 1998 PEN/Hemingway Award. Currently a writer-at-large for GQ, Gilbert lives in New York’s Hudson Valley.

Which is great, except that it ignores the existence of a certain other book, one that she published with another company quite a while after this author bio was written.

Probably better known than Pilgrims.

But this gets tricky: if you are updating, whose responsibility is it to keep track of what’s going on in your authors’ and former authors’ lives? It gets personal: Who’s going to ask where they live, whether they’re married, how the two dogs are doing, what they’ve written since? If the twenty-year-old bio says “she lives with her son,” can we assume he’s since left the nest?

After my talk, an audience member told me that his table had discussed this problem and determined that the solution was to provide a link to the author’s Wikipedia page instead of an author bio. I thought that was pretty cool. Image: https://en.wikipedia.org/wiki/Elizabeth_Gilbert

The customer description of an ebook is another thing that can grow stale quickly. You can imagine the sort of breathless descriptive rhetoric — this is “new,” “revolutionary,” “cutting-edge,” the “most up-to-date,” the “best,” the “only” — that we use to sell new books. Two, or five, or twenty years later, that kind of language just seems silly. But revising that description is not just a matter of removing those telling words; to do it right, we would recontextualize what is relevant or important about the book today, and explain to the prospective reader why it’s still worth buying.

Internal metadata

I’ve been saying we have 5000 backlist books, but that comes with an asterisk, because we’re not really sure.

Internal metadata and record-keeping for ebooks is tricky because our in-house record-keeping systems were probably not designed to answer the kinds of questions we now have.

As an example, you may be making different ebook file formats for different retailers. The technically correct way to do that is to assign a separate ISBN to each format. But I don’t believe that anyone actually does that (and if you do, it creates its own problems in terms of the metadata that should be common between all of those files).

So somehow, under your single ebook ISBN, you need to be capturing information about things like

  • What formats of this ebook have been made?
  • How were they made? (In-house, by a freelancer, by a vendor, by a retailer, inherited from an acquisition, by the author, as an experiment?)
  • Which retailers are selling this ebook? And who’s selling which formats?
  • How many times has that ebook file been updated? What changes were made each time?
  • Which version of our ebook making tools did we use to make this book?
  • Can this ebook be updated by running it through our current standard tools?
  • If versions were removed from sale, why?
  • If changes were made from the print edition, what were they and why?

And maybe you’ve been capturing this data in a sort of freeform way, because you don’t have the right fields for it in your title database that’s designed for print books. But at some point it turns out that you do need that data to be searchable and reportable, because someday someone will ask you, How many ebooks do we have on sale? How many of those are epub2? How many ebooks do we have on sale at Apple that aren’t at Barnes & Noble? Why does this one book exist in an epub2 version and an epub3 version and another epub3 with audio added and a separate Kindle file?
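To make "searchable and reportable" concrete, here is a minimal sketch of what a structured record might look like. None of this is our actual title database: the class names, field names, and helper functions (FileVersion, EbookRecord, count_on_sale, at_retailer_but_not) are all hypothetical, chosen only to mirror the questions in the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class FileVersion:
    """One built ebook file living under a single ebook ISBN (hypothetical schema)."""
    file_format: str                  # e.g. "epub2", "epub3", "kindle"
    made_by: str                      # e.g. "in-house", "vendor", "freelancer"
    tool_version: str                 # version of the build tools used
    retailers: Set[str] = field(default_factory=set)
    update_notes: List[str] = field(default_factory=list)  # one note per update
    removed_reason: Optional[str] = None  # filled in when pulled from sale


@dataclass
class EbookRecord:
    isbn: str
    title: str
    versions: List[FileVersion] = field(default_factory=list)

    def on_sale(self) -> List[FileVersion]:
        """Versions that have not been removed and have at least one retailer."""
        return [v for v in self.versions if v.removed_reason is None and v.retailers]


def count_on_sale(records, file_format=None):
    """How many titles have a version on sale (optionally in one format)?"""
    return sum(
        1 for r in records
        if any(file_format is None or v.file_format == file_format
               for v in r.on_sale())
    )


def at_retailer_but_not(records, present, absent):
    """Titles on sale at `present` that are not on sale at `absent`."""
    result = []
    for r in records:
        sellers = set().union(*(v.retailers for v in r.on_sale()))
        if present in sellers and absent not in sellers:
            result.append(r.title)
    return result
```

With records shaped like this, "How many of those are epub2?" becomes count_on_sale(records, "epub2"), and the Apple versus Barnes & Noble question becomes at_retailer_but_not(records, "Apple", "Barnes & Noble"). The point is less the particular fields than that each answer lives in a field rather than in a freeform note.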


Many of our books have URLs in them. Particularly our adult nonfiction books, which often have endnotes with lots of URLs. And in ebooks, we make these URLs into hyperlinks.

And as we know, URLs can stop working.

The web community has gotten good about talking about this problem, and they call it link rot.

There’s scholarly research on the prevalence of link rot (when URLs stop working) and of reference rot (when the information at a given URL changes from what it was when the author cited it). One study found that more than half of the URLs cited in US Supreme Court decisions suffered from one or the other. Which is not a great thing for the history of American jurisprudence.

Other studies have looked at the half-life of a URL, suggesting that it might be about two years. (The concept of half-life here is just like that for radioactivity, the amount of time it takes for half of something to decay, or in this case, for half of the URLs in a set to stop working.)

What does this look like in an ebook? Powers of Two is a pretty typical nonfiction book from our backlist. It has 275 URLs in it, mostly in the endnotes. It was published in August 2014.

So if we use that half-life of two years model, this summer, in August 2016, we would expect that only 50% of those URLs would still be good.

By August 2018, only 25% of the URLs would still be good.

By August 2024, ten years after pub, this model would predict that only about 3% of the URLs would still be working. And unfortunately, I don’t plan to be retired yet by then. So this is a real problem for us.
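The arithmetic behind those numbers is plain exponential decay: after t years with a half-life of h years, the share of URLs still working is 0.5 raised to the power t/h. Ten years is five half-lives, so the share is 1/32, or about 3%. A small sketch of the model follows; the function name is mine, and the two-year half-life is the assumption from the studies mentioned above, not a measured property of our books.

```python
def fraction_still_working(years_since_pub: float, half_life_years: float = 2.0) -> float:
    """Share of URLs predicted to still work under a simple half-life model."""
    return 0.5 ** (years_since_pub / half_life_years)


TOTAL_URLS = 275  # URLs in Powers of Two, as counted in the talk

for years in (2, 4, 10):
    share = fraction_still_working(years)
    print(f"{years:>2} years after pub: {share:.1%} predicted working, "
          f"roughly {round(TOTAL_URLS * share)} of {TOTAL_URLS} URLs")
```

Changing half_life_years shows how sensitive the prediction is to that one assumption, which is worth keeping in mind before treating any of these percentages as a forecast.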

When I actually tested the URLs in Powers of Two with the W3C Link Checker, I discovered that about 47 (or 17%) of the URLs were not working.

To be totally fair, I also discovered that some of these were errors in the URLs that we introduced while making the print book or the ebook. Those URLs hadn’t ceased working; they had never worked.

On the bright side, 83% were still working. So we’re running a bit ahead of the curve there. (Or perhaps the websites our authors choose to cite are more reliable than the average website.)
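If you wanted to run this kind of check across a whole backlist rather than one book at a time, the idea can be sketched in a few lines. This is not the W3C Link Checker and not our production tooling. It is a rough illustration using only the Python standard library, and every function name here is made up for the example. It assumes the EPUB's content files end in .xhtml, .html, or .htm and are UTF-8 encoded.

```python
import urllib.error
import urllib.request
import zipfile
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag that points at an http(s) URL."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(("http://", "https://")):
                self.urls.append(href)


def extract_urls(epub_path):
    """Return (content file, URL) pairs for every external link in the EPUB."""
    found = []
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                collector = _LinkCollector()
                collector.feed(epub.read(name).decode("utf-8", errors="replace"))
                found.extend((name, url) for url in collector.urls)
    return found


def is_alive(url, timeout=10):
    """True if the server answers a HEAD request without an error status.

    Needs network access. Some servers reject HEAD, so a False here is a
    prompt for a human to look, not proof that the link is dead.
    """
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        return False


def dead_links(epub_path, check=is_alive):
    """External links in the EPUB for which `check` returns False."""
    return [(name, url) for name, url in extract_urls(epub_path) if not check(url)]
```

A check like this only catches link rot. Reference rot, where the page still loads but no longer says what the author cited, still needs a person to read the page.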

As usual, there’s a “do it right from the beginning” solution, which certain parts of the web and scholarly communities are embracing (using things like DOI to identify electronic documents or a service like Perma.cc that archives cited online content). I haven’t seen trade publishing take this problem on.

If only our authors could travel back in time to properly archive their links.

But because we’re talking about backlist, and we don’t have the advantage of going back in time to ask our authors to archive their links correctly, we’re talking about inheriting tens of thousands of old URLs that were not deliberately preserved in any way. (Not all our books have 275 URLs, and some have none, so what if we say 20 on average? Hypothetical 5000 book backlist at 20 URLs per book = 100,000 slowly decaying URLs.)

And this is a problem for print books as well as for the ebooks of course, but I think we’re more content to let the URLs in print books function essentially as decoration—as signs that there is scholarship underlying their claims. And we also assume that a very motivated reader who types out an entire URL from a print book to try to get to a source document will also have some basic web literacy around using Google to search for an alternative.

The URLs in ebooks, however, we transform into hyperlinks, which imply that the information is just a click away, and when it isn’t, it doesn’t just seem like the link is broken. It seems like your ebook is broken.

The hyperlinks in ebooks can also be checked in automated fashion, and then we get retailer emails flagging individual broken links.

So this is the scale at which we’re dealing with this problem right now — our backlist has a hundred thousand slowly decaying URLs, little time bombs embedded in our backlist — and the retailer response is to send me an individual email every time they notice one. Fortunately, they don’t seem to be looking very hard right now.

And then, the interesting part: what do we do to fix this? There are a few options, but they all have some authorial implications:

  • If the site’s just been reorganized, find the new correct URL.
  • If the URL is just meant to point to general further information, find a new site that contains that information.
  • Remove the URL altogether. We’ll do this if a citation given is complete enough that the reader can probably find the print version of the material cited, even though the online version seems to be gone.
  • Leave the URL, but remove the hyperlink. This looks like a mistake in the ebook to me, but sometimes you just don’t have any good options. This seems to satisfy retailer complaints (or at least evade their automated link checking).
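The last option, leaving the URL text but removing the hyperlink, is the easiest one to automate once you have a list of dead URLs. Here is a rough sketch, assuming the links are ordinary a elements with a double-quoted href attribute. The function name is hypothetical, and a regular expression is a shortcut: a real tool would more safely use an XML parser, and the href is compared literally, so an escaped ampersand in the markup has to be escaped in the list of dead URLs too.

```python
import re

# Matches <a ... href="..." ...>visible text</a>, capturing the href and the text.
_ANCHOR = re.compile(
    r'<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)</a>',
    re.DOTALL | re.IGNORECASE,
)


def unlink_dead_urls(xhtml, dead_urls):
    """Replace each link to a dead URL with its visible text, leaving other links alone."""
    dead = set(dead_urls)

    def replace(match):
        href, visible_text = match.group(1), match.group(2)
        return visible_text if href in dead else match.group(0)

    return _ANCHOR.sub(replace, xhtml)
```

The other three options all involve a judgment about what the author meant to point at, which is why they stay manual.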

Solutions and approaches

In conclusion, backlist ebooks are hard. What do you do?

Develop a process

In my group, we’ve developed a process to periodically review our backlist ebooks. We have two production reports, one for new ebooks, and one we call “updates,” and the updates have schedules and deadlines just like the new frontlist ebooks do. So every week we dredge up and work on at least a few of these older books. There are a few ways books get on the update list:

  • A retailer alerts us to a problem (it shows up on the Amazon rework list)
  • A new edition is coming out in print (the paperback is about to come out, with a new cover and some corrections)
  • The ebook is scheduled to be promoted
  • News or events lead to new interest in a particular book (the Pope mentions it in a speech)
  • We’re working through a list of bestselling ebooks
  • We’re working through a set of related books (reissuing an entire series)

This isn’t a systematic approach, in the sense that some of our backlist titles never make it on to the list and other titles cycle on to it over and over again (for example, a holiday title that is promoted every year). We’re working on finding and addressing the oldest of the old in our backlist, but we’re also comfortable that the titles that are selling the most and getting the most attention are getting re-reviewed the most frequently.

Take small bites

It might be ideal for us to update 5000 ebooks every year. We don’t have the capacity to do that. But if we just threw up our hands and didn’t do any, we wouldn’t have updated 950 last year. We don’t fully proofread all of our ebooks, but most backlist titles we reissue get an hour or two of review. Do what you can, and do it regularly.

Improve your tools

There are technological and workflow tools that can make this process easier, like scripts to search for common errors. One of my group’s projects last year was to replace our primary ebook making tool, reducing the time it took to build an ebook from three minutes to thirty seconds. That’s not a lot of time in a 40-hour week, but it substantially lowered the barriers to getting a lot of ebooks rebuilt. My advice to you is to make it easy to update an ebook, because it’s something you’ll want to be doing frequently.

Make it easy to make an update

Along the lines of making it easy to update an ebook, as you design your frontlist processes, I think a big piece of your brain should be devoted to thinking about how much you’re going to want to redo (and redo and redo) those processes as you maintain the ebooks. In my group’s standard process, we generate the ebook file and then optionally break it open to add customizations. And every time we do that, I want to be thinking about whether the value of the manual customization is worth the cost of having to maintain it … potentially forever.

Keep good notes

There’s a record-keeping piece, where we’re trying to keep good notes about the decisions we make for each ebook, so when we come back to remake each of them in two or five or twenty years, we know why that book had custom CSS, and what the rights are for the embedded fonts, and what kinds of workarounds we employed that will no longer be needed because we’ll be living in the perfect standards-compliant future. And what I’ve seen in my group is that the need to do this record-keeping suddenly begins to seem very important when you inherit someone else’s ebook to work on that doesn’t have that critical information attached.

Accept that it’s hard

But I think there’s also a part of this work that’s just hard. Maintaining backlist ebooks requires smart human beings to continually engage with the kinds of problems we’ve just discussed and to make case-by-case calls on what needs to be done to keep them available and representing your company the way you want them to. And I think that’s a process that shouldn’t just be delegated to a lone digital production intern or your ebook outsourcing vendor, but actually requires everyone at your company who cares about the content of your products to get involved.

What I hope I’ve expressed is that I find these issues interesting as well as challenging: figuring out how to tackle these kinds of novel problems, at scale and yet while respecting the different needs of each individual title, is what makes what we do intellectually engaging and rewarding work.

Ask for help

That’s why I’m here today. Convincing people to notice this backlist work, to acknowledge that it’s important work, and to be thoughtful about how your ebooks will gracefully become a part of your backlist is now an important part of making your frontlist books and ebooks. If you can get the rest of your company to care about this stuff — the content of their ebooks — you’ll also be getting them involved in the digital future of their words and ideas. And that makes the kind of work we do make a lot more sense and be a lot more meaningful.

Sketch by the inimitable @epubpupil.

Teresa Elsey is a digital managing editor at Houghton Mifflin Harcourt. Opinions are her own. She tweets at @teresaelsey.
