“The fundamental thing is the book”: A Q&A with Dave Cramer

Hederis Team
Published in Hederis App
Feb 24, 2021

When we decided to kick the year off on the Hederis blog with a focus on automation (what it is, how it’s implemented, and how book publishers can benefit from it), it wasn’t long into brainstorming ideas before Dave Cramer’s name came up. Dave is a Senior Digital Publishing Technology Specialist at Hachette Book Group who develops standards, workflows, and automated tools for both print and ebook production using web technologies. He’s a former member of the W3C’s CSS Working Group who helped edit the specs on paged media, and he’s a current co-chair of W3C’s EPUB 3 Working Group, working on the forthcoming EPUB 3.3. He’s been a speaker at BookExpo, Digital Book World, Tools of Change, eBookCraft, and Books in Browsers.

Dave Cramer at the W3C offices at MIT

We approached Dave to talk about his experience with implementing an automated publishing system based on CSS and HTML at Hachette, one that’s successfully brought over 1,000 book titles to market as print and ebooks. We also had a chance to get his in-depth thoughts on the current state of the art and future of publishing. We hope you enjoy!

How did you first get involved in development and publishing? You were working for a small typesetter doing XML work before you came to Hachette. Were you also doing work with HTML and CSS at that time?

My career in publishing was entirely and literally accidental. Almost thirty years ago, I had a bad accident while backcountry skiing in the Sierra Nevada mountains. I recovered at my father’s house in Vermont, and I was introduced to the Macintosh computer. My dad sent me home with his Mac Plus, which I eventually used to self-publish a jazz discography. The first version of the book was done in FileMaker Pro, but I wanted running heads, and so I bought PageMaker.

That random experience got me an entry-level job at a typesetting company in Brattleboro, Vermont. Stratford Publishing Services composed books for big publishers, using QuarkXPress. I helped make PostScript and PDF files from Quark, which were then sent to the book printers. More importantly, I tapped into this incredible tradition of typesetting, from people who had experience with Linotype machines and molten lead. I learned about repro and blues, widows and orphans, spreads and kerning. I was lucky: there are lots of books about typography and design, but still so little is written down about the nuts and bolts of composition.

My involvement in ebooks was also literally accidental. Around 2001, Random House asked all their typesetters to start producing OEB files. I was supposed to meet my boss at the office on a Saturday to figure it all out. He had a flat tire, and by the time he arrived I had a pretty good understanding of what we needed to do. After that, I was just in the right place at the right time a lot of times, and learned a lot about XML as well as ebooks.

When I was reading your piece for XML.com about your work at Hachette, I was struck by one line, when you are summing up the first several years of the HTML- and CSS-based automation pipeline: “We’ve sold more than fifty million print books, and untold numbers of ebooks.” In that same piece, you talk about the fact that Hachette did a lot of digital-only or digital-first titles in that time as well. Was it your intention with your publishing system to push digital first and digital only, or was it more a case of form following function, as a natural outgrowth of the system?

The fundamental thing is the book; words written by the author, edited by the publisher, designed and composed by my friends. That was the genius of our system — putting the content at the center, rather than a particular manifestation of the content. InDesign is a tool built around the idea of a spread, which is print-specific. Dante [the core of Hachette’s system; their in-house flavor of a platform created by Infogrid Pacific] is built around the content itself. That works especially well for trade publishing, where we may produce as many as seven editions of the same book: hardcover, ebook, large print, trade paperback, mass market, etc. It’s all the same HTML with different CSS. This also means that print is not “privileged”; we don’t need to have a print edition first to make an ebook. So what we call a “digital original” is easy enough to do.
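The “same HTML with different CSS” idea can be sketched in a few lines. This is only an illustration, not Dante’s actual code: the edition names come from the answer above, while the sample content, the `build_edition` helper, and every page size and font size are invented placeholders.

```python
# Hypothetical sketch: one HTML source, one stylesheet per edition.
# The page and font sizes below are made-up values, not Hachette's specs.

BOOK_HTML = """<section class="chapter">
  <h1>Chapter One</h1>
  <p>It was a dark and stormy night.</p>
</section>"""

EDITION_CSS = {
    "hardcover": "@page { size: 6in 9in; } body { font-size: 11pt; }",
    "large-print": "@page { size: 6in 9in; } body { font-size: 16pt; }",
    "mass-market": "@page { size: 4.25in 6.875in; } body { font-size: 9.5pt; }",
    # No page rule for the ebook: the reading system decides the page size.
    "ebook": "body { font-size: 1em; }",
}


def build_edition(content_html: str, css: str) -> str:
    """Wrap the shared content in an HTML shell carrying one edition's styles."""
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<style>{css}</style></head>\n<body>\n{content_html}\n</body></html>"
    )


# Every edition is built from the identical content; only the CSS differs.
editions = {name: build_edition(BOOK_HTML, css) for name, css in EDITION_CSS.items()}
```

Because no edition is derived from another, a “digital original” is simply the case where only the ebook entry gets built.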

How mainstream is automation in publishing now versus when you first started out with your system at Hachette? Is it primarily the Big Five publishers who are utilizing it, or is it gaining wider traction?

First, I want to rant about the idea of automation. Automation isn’t an end, it’s a means. What we really want is high-quality, beautiful, perfect, usable books. Our employers want to make money. We ourselves want to do interesting and satisfying work. Automation is a means of achieving some of those goals, but works against other goals.

I love computer typesetting. One great advantage is I’m much less likely to get lead poisoning! I’m glad computers do an excellent job of counting pages, so I don’t have to. But a computer needs to know something about grammar to decide how to hyphenate the word “record.” So our goal isn’t to eliminate the human touch. Our goal is to have humans apply their skills and judgment — their “eye” — to the book. A human being, with experience and judgment, is very good at making the trade-offs that typesetting requires. Will fixing this loose line cause a bad break somewhere else? How tightly can I set text to avoid two lines on the last page of the chapter?
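The “record” example is easy to make concrete. The noun breaks as rec-ord and the verb as re-cord, so a hyphenator that sees only the spelling cannot choose correctly. The sketch below assumes the part of speech is already known (the hard part in practice); the `HYPHENATION` table and the `hyphenate` function are hypothetical names for illustration, not part of any real typesetting system.

```python
# Hypothetical sketch: break points for a heteronym depend on its usage.
# The table is hand-written; a real system would need a part-of-speech tagger.

HYPHENATION = {
    ("record", "noun"): ["rec", "ord"],   # "a world rec-ord"
    ("record", "verb"): ["re", "cord"],   # "to re-cord an album"
}


def hyphenate(word: str, part_of_speech: str) -> str:
    """Return the word with its allowed break points marked by hyphens."""
    syllables = HYPHENATION.get((word.lower(), part_of_speech))
    if syllables is None:
        # Unknown word or usage: leave it unbroken rather than guess.
        return word
    return "-".join(syllables)
```

Calling `hyphenate("record", "noun")` and `hyphenate("record", "verb")` gives two different answers for the same string of letters, which is exactly the judgment a spelling-only algorithm cannot make.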

Sadly, I don’t see other big publishers doing what we did. The Religion of Adobe remains dominant. Doing what everyone else does feels easy, and safe, and fits into the dominant narrative of outsourcing. I worry about this for a lot of reasons. Publishers lose in-house expertise and become dependent on their suppliers. The suppliers are afraid to push back on the publishers. One side effect is that there is no one to help move the whole industry forward through standards. The publishers don’t know enough; the vendors don’t have the resources and don’t have a compelling reason to share their expertise.

You’ve said that automation helped save money for your company and bring work that had been outsourced back in-house. I was heartened to read that; I don’t think that’s typically what people envision when they think about automation. Can you talk about how automation and insourcing have affected the culture at your company? Do you think it’s something that can happen at other publishing houses?

First of all, I think the insourcing has been a much bigger deal than the technology. The lines of communication are shorter. Language and time-zone issues are reduced. At least two members of our composition team have been working with Dante for a decade now. We can outlast the editors! We’ve been able to turn around projects in incredibly short periods of time — a manuscript coming in at lunchtime; first pass pages done before dark. But production always seems to be at the bottom of the hierarchy — if the author is late, we’re the ones who have to make up the time.

It’s funny. The big publishing houses each have such different cultures. What’s possible at Hachette might be impossible at Random House. In some sense what we did was an accident, the result of a visionary COO not quite understanding what he set in motion.

Can you talk a little bit about your use of CSS for design specs? It’s so powerful in terms of how much polish it can add to an HTML document, and it has really done so much to beautify the Web. I’m always excited to hear about systems like yours and, of course, Hederis (plug, plug), that use CSS for books, and not just ebooks. Was there a lot of interest in using CSS for paged media when you first started out? How has that community grown or changed since that time?

CSS is indeed awesome. The creators of CSS, Håkon Wium Lie and Bert Bos, were always interested in print, and even published one of the first books typeset with CSS. But the web has evolved into an application platform, despite its roots in documents.

I sometimes see hopeful signs of increased interest in paged media. Browsers realize that people print web pages. Changes to the code inside browsers are making it more possible to implement book-friendly features. But implementing these features takes money and engineers. Perhaps publishers could band together to have companies like Igalia write code for the browsers. Perhaps low-level browser features like Houdini will make it easier to create fancy layouts using JavaScript.

What’s the current state of the art with CSS and EPUB? How much of CSS is available to EPUB developers these days? And what is the current conventional wisdom in terms of device support and how much developers should try and support with the EPUBs that are being produced today?

This is something I struggle with. We want to make beautiful books. We are likely interested in design and typography and what’s new. But I think ebooks have fundamentally changed the balance between the reader and the designer. Now the reader gets to pick the fonts, choose night mode, choose even whether the text is justified. In print, the compositor decides where every dot of ink on every page goes. That level of control was never going to survive the web, where you don’t even know how big the page will be.

Ebooks are, in some ways, now an accidental side effect of the web. What we can do is limited by what the browsers think is important. What they think is important is speed, speed, speed. Yes, we could have much better hyphenation and justification in ebooks, but the browsers don’t care about that because the code to do that would be slow. And there is no one who is both interested in that problem, and has the money to do something about it.

I do have moments of hope. If publishers could work together, there are ways of bringing more capabilities to the browser. There is a very interesting company called Igalia, in Spain. It’s owned by the workers, and they write code for all the browsers. They did the CSS grid implementation for most browsers, and they’ve been working on MathML for Chrome. If we came up with some money, we could get them to implement features that would help with printing, paged media, browser reader mode… I think there is a lot of common ground there.

How are technologies like AI and machine learning affecting the automated systems? If they’re not already, how do you envision them having an impact?

I have not yet been convinced that machines learn. I am wary of what we are trying to teach them. Code is written by humans, whether directly or indirectly, and humans are famously biased and short-sighted. Books are more complicated than we think, even when we know that books are more complicated than we think. We published a novel that included the Riemann zeta function. Is your automated typesetting system ready for text set vertically because the author is documenting a Scrabble game in progress?

We’ve done some experiments having “machines” analyze the text of our books, even for simple things like identifying proper names and geographic locations. We did this with the Michael Connelly novel The Lincoln Lawyer. It had trouble distinguishing between the president and the car. It had no idea that the character “Fernando Valenzuela” was a bail bondsman and not a major-league pitcher.
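The failure described here is easy to reproduce with the simplest kind of entity tagger, one that looks names up in a fixed list and ignores context. The sketch below is not the tool used in those experiments, which the interview does not name; the `GAZETTEER` table, the `tag_entities` function, and the sample sentence are all invented for illustration (the sentence is not a quotation from the novel).

```python
from typing import List, Tuple

# Hypothetical lookup table: each name maps to its best-known referent.
GAZETTEER = {
    "Lincoln": "PERSON (U.S. president)",
    "Fernando Valenzuela": "PERSON (baseball pitcher)",
}


def tag_entities(text: str) -> List[Tuple[str, str]]:
    """Label every listed name found in the text, with no regard for context."""
    found = []
    for name, label in GAZETTEER.items():
        if name in text:
            found.append((name, label))
    return found


# Invented sentence: here "Lincoln" is a car and Valenzuela a bail bondsman.
sentence = "Fernando Valenzuela posted the bond while I waited in the Lincoln."
tags = tag_entities(sentence)
```

Both labels in `tags` are wrong for this sentence, because nothing in a lookup table can tell the car from the president or the bondsman from the pitcher.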

You’ve done a lot of work with the W3C around CSS and EPUB and had an impact on the evolution of these standards. I know there are some different possible trajectories for where digital publishing is going; I’ve heard a lot of talk about Web publications and books in browsers. I was poking around the W3C GitHub and I noticed a repo for EPUB 4, though I’m not sure how far off that release is. We know that print is not dead and is very much still the driving force for a lot of big publishers. Is there anything you are seeing gathering momentum that you think of as the “next big thing” in publishing across these multiple formats? And how can automation help publishers prepare for whatever the next big thing is?

I’ve never been able to predict the future. We expect more change than we get. Ebooks haven’t changed much in a decade. Print books haven’t changed much in a century.

If I had to make a few non-predictions…

  1. There are already books in browsers. You don’t need a new standard — you just need a website. https://resilientwebdesign.com is a great example. But I don’t see how a web-based model would fit in with the economics of the existing book publishing industry.
  2. EPUB isn’t going anywhere. I hear a lot of frustration about how ebooks are too much like print books (I hate the phrase “print under glass”). But this makes the ebook business possible — they are enough like print books that the business works out, from the distribution model to royalties. Let’s not forget that, despite all our frustrations, EPUB has been a tremendous success. We really did create a universal ebook standard! The publishing industry has managed to get people to pay for digital content without advertising.
  3. I hope there will be an EPUB 4, but I don’t know if it will be possible. In my dreams, EPUB 4 looks a lot like EPUB 3, but without all the odd extensions and compromises. We would allow the HTML serialization of HTML5, get rid of the NCX and the idea of multiple renditions, and generally try to be closer to the mainstream of the web.
  4. One thing I’m sure of — the focus on accessible ebooks will only increase. And this is great!
  5. I really hope we get better typography on the web at large. But we might have to build it ourselves with script and Houdini.

While automation is regarded by some in the industry as “the beast still hiding in the shadows, ready to destroy jobs and ‘the way things have always been done,’” we here at Hederis are on a mission to dispel that myth and help publishers reconnect with their content and creativity through automated production. Our cloud-based publishing tools can help you quickly create custom book designs that can be exported to PDF, EPUB, and HTML. Visit the Hederis site today to learn more and start your free project!
