Automated Publishing Workflows, Explained

Nellie McKesson
Published in Hederis App
Feb 5, 2021
ASCII-art representation of a dragon spitting fire at a knight who is protecting himself with his shield.

If you’ve been to a handful of publishing conferences, you’ve almost certainly seen a presentation about automation on the schedule (very possibly with my name attached to it!). Automated book production and automated workflows have been a hot topic for years, certainly since before I got into the game, and have taken many forms. “XML/HTML-first” workflows, “single-source,” and the more recently popular “digital-first” are all variations on the same theme. These terms clearly signal a “high tech” approach to publishing, but for many people the actual technology and its day-to-day impact remain a mystery. As a result, the names tend to be spoken in one of two ways: with bravado by folks who have successfully implemented automation, as if it were a great beast being steadfastly battled; or with a grimace, the beast still hiding in the shadows, ready to destroy jobs and “the way things have always been done.”

In this post, I’m going to attempt to remove some of the mystery by giving an introduction to what people usually mean when they talk about automated workflows and listing some of the pros and cons.

(I’ll also do my best not to talk too much about my own work or what my company is doing, although since publishing automation could very well be called my “life’s work,” I’m not sure how well I’ll succeed — please forgive me in advance!)

What Are “Automated Publishing Workflows,” Really?

First, I should note that there are many things that can be automated as part of the publishing value chain — file management, metadata, distribution, and more — but for this post, I’m focusing on the book production process: editing and tagging a manuscript, laying out the book pages, and creating the files that will be sent to a printer or to a digital retailer like Amazon or Apple.

Automated production workflows are a response to a few pressures that publishers have faced in the digital age:

  • The demand to publish in more formats
  • The demand to publish faster and to create more books
  • The need to manage files more efficiently
  • And of course, the need to cut costs at the same time

My first experiences with automated workflows were at a publisher of math journals, and then after that, at a publisher of books about technology. Both of these publishers needed to be able to keep their content up to date, fixing errata and outdated information on a rolling basis so that the next edition would be more accurate. Indeed, one of the main tenets of almost all automated workflows is the “single source of truth”: a text file that is constantly kept up to date, so that you can churn out any formatted file that you need and know that the text contains all the latest changes.

For technical or educational publishers who are constantly updating their texts, the appeal of this type of workflow is obvious, but for trade publishers with well-established print-centric workflows, it’s a harder sell. Enter ebooks.

When publishers only had to worry about one final file — the print file — it was easy to overlook any inefficiencies in their workflows or file management. They could make text changes directly in the print layout files with few consequences. However, with the added step of creating a completely distinct ebook product with design constraints that don’t necessarily correspond to the needs of print, publishers have to make sure that the content and overall structure in both the ebook and the print file are the same while ensuring that both files meet industry standards — ideally without spending a massive amount of time or money to make that happen.

The accepted process that developed was to shoehorn the print text and design into an ebook file. This requires a fair amount of adjusting and moving text around, plus edits to the ebook file itself after it is created, to make sure it meets all design and technical requirements. Folks who had created print layout files in older software (or who didn’t have the original layout files at all) had to get creative about generating ebook versions of their backlists, either investing a fair amount of their own time and money in developing a process, or paying a vendor to do the dirty work for them.

Still, for some, that process was (and is) fine. But others began to wonder if maybe there was a better way — something that might help streamline their current processes, and also set them up for success in the future, should the publishing landscape change yet again.

A Very General Overview of How Automated Workflows Work

Automated workflows can take many forms, but ultimately they tend to work like this:

A flowchart describing an automated workflow, as described in the accompanying text
  1. The “automated workflow” exists as a collection of code or scripts, somewhere on a computer server or in the cloud.
  2. The final book manuscript exists as some sort of text file (Microsoft Word or similar).
  3. A person goes through the text and applies “styles” or “tags”, which classify each paragraph of the text, for example, as “chapter title” or “body text” or “extract text” and so on. This makes it easier for computers to know how to handle the text, and how to apply the design.
  4. A person submits the styled text file to the automation scripts via some sort of user interface (or directly through the command line, if you’re working with tech-savvy staff).
  5. The automation scripts turn this text into code as well (typically into a markup language like XML or HTML). This code version of the text is now even easier for computers to understand.
  6. The scripts send the code version of the text to a “PDF processor” — a type of software that can turn code like XML or HTML into a laid-out PDF.
  7. The scripts also send another code file to the PDF processor, containing design instructions that tell the processor things like how big to make the text of chapter titles or body text, how much space to add between paragraphs, what fonts to use, how big the page margins should be, and every other element involved in the page design. (This is usually a CSS file for HTML workflows, or XSL-FO for XML workflows.)
  8. At the same time, the scripts turn the code version of the text into an EPUB ebook file. This is generally much easier than the PDF, since EPUB files are, themselves, made up of XML and HTML files and have fewer nitpicky design requirements.
  9. The scripts take the final PDF and EPUB files, and deliver them back to the person who started the whole process.
  10. The person reviews the files, and if the files need adjustments, they make those in the text and then start the process over again. If not, they proceed to the next step of the publishing process (e.g., sending the files out to editors for review).
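
To make steps 3, 5, and 7 a little more concrete, here is a minimal Python sketch of the "styled text in, HTML out" part of the process. It is only an illustration, not how any particular tool works: the style names, the STYLE_TO_TAG mapping, the print.css filename, and the CSS values are all made up for this example, and the PDF processor itself is left out.

```python
from html import escape

# Hypothetical mapping from paragraph style names to HTML tags and classes.
# A real workflow would define its own style vocabulary.
STYLE_TO_TAG = {
    "chapter title": ("h1", "chapter-title"),
    "body text": ("p", "body-text"),
    "extract text": ("blockquote", "extract"),
}

# Hypothetical design instructions (step 7), written as CSS for a print layout.
PRINT_CSS = """\
@page { size: 5.5in 8.5in; margin: 0.75in; }
h1.chapter-title { font-size: 24pt; margin-bottom: 2em; }
p.body-text { font-size: 11pt; margin: 0; text-indent: 1.5em; }
blockquote.extract { font-size: 10pt; margin: 1em 2em; }
"""


def paragraphs_to_html(paragraphs, title):
    """Turn (style, text) pairs into one HTML document (step 5)."""
    body = []
    for style, text in paragraphs:
        if style not in STYLE_TO_TAG:
            # Fail loudly on unknown styles instead of guessing at the intent.
            raise ValueError(f"Unknown paragraph style: {style!r}")
        tag, css_class = STYLE_TO_TAG[style]
        body.append(f'<{tag} class="{css_class}">{escape(text)}</{tag}>')
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title>"
        '<link rel="stylesheet" href="print.css"></head>\n'
        "<body>\n" + "\n".join(body) + "\n</body></html>\n"
    )


# A tiny styled manuscript, as it might look after step 3.
manuscript = [
    ("chapter title", "Chapter 1"),
    ("body text", "The knight raised his shield & waited."),
    ("extract text", "Here be dragons."),
]
html = paragraphs_to_html(manuscript, "Sample Book")
```

In a full workflow, the resulting HTML and the CSS would be handed to the PDF processor (step 6), and the same HTML would be packaged into the EPUB (step 8). The point of the sketch is that once every paragraph carries a style, the conversion is mechanical.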

In summary: text files go in, scripts run, laid-out files come out. Now imagine that a new book format gets created, or that the EPUB standard evolves from the current 3.0 to a new 4.0 version that includes all new tech requirements; or imagine that you need to add more accessibility features to your EPUB files. In this type of workflow, you could add just one new script (or set of scripts) to re-transform all your books, instead of having to deal with each book on a case-by-case basis (most likely by paying a vendor to reconvert or update it).
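
As a rough illustration of the "add just one new script" idea, the Python sketch below keeps the workflow's output steps in a simple lookup table and runs all of them over every source file in a folder. Everything here is an assumption made for the example: the function names, the file suffixes, and the two toy transforms are stand-ins, not real conversion steps.

```python
from pathlib import Path


def to_plain_copy(html):
    """Stand-in for an existing output step; returns the source unchanged."""
    return html


def add_language_attribute(html, lang="en"):
    """Stand-in for a new accessibility step: declare the document language."""
    if "<html lang=" in html:
        return html
    return html.replace("<html>", f'<html lang="{lang}">', 1)


# Each entry is one "script" in the workflow: output suffix -> transform.
TRANSFORMS = {
    ".web.html": to_plain_copy,
}


def rebuild_catalog(source_dir, output_dir, transforms):
    """Run every transform over every source file; return the files written."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(source_dir.glob("*.html")):
        text = source.read_text(encoding="utf-8")
        for suffix, transform in transforms.items():
            target = output_dir / (source.stem + suffix)
            target.write_text(transform(text), encoding="utf-8")
            written.append(target)
    return written
```

Supporting a new requirement then means registering one more entry, for example TRANSFORMS[".a11y.html"] = add_language_attribute, and calling rebuild_catalog again for the whole backlist. A real transform would be far more involved, but the shape of the change is the same.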

The Pros and Cons of an Automated Workflow

After working with automated workflows at a math journal publisher and a tech publisher, building and deploying an automated workflow for a Big-5 trade publisher, helping to build and maintain automated workflow tools for some smaller publishers, and consulting with or just talking to many people about their experiences with and/or plans for automation, here are some of the benefits I’ve seen:

  • Multi-format publishing, from a single source. The bread and butter of automated workflows, and a real improvement to existing workflows. The workflow is typically built in such a way that it creates the print file and the ebook file simultaneously, reducing the need for an additional ebook production cycle after the print layout is finished.
  • A well-structured single source of truth. One of the crucial parts of automated workflows is having structured text, where each paragraph is classified as what kind of thing it is — a chapter title, body text, a heading, an extract, a line of verse — according to a set of classification tags or styles that are used consistently for every book. This means that over time, you build up a catalog of well-structured text files containing all the latest text changes, that are easy to understand and work with (and ultimately transform into any other format you might need).
  • MONEYYYYY. After the initial up-front investment in building or customizing the tools and workflow, automation — even using a pay-as-you-go tool — tends to be cheaper than the traditional layout process.
  • Get books to market faster. The layout process in automated workflows typically takes a few seconds to a few minutes, a big improvement over the multi-day or week-long waits for laid-out print files that are typical in the traditional publishing process. Additionally, since the ebook file is made at the same time as the print file, publishers can get ebooks to market as soon as they’d like.
  • A path for clean file management. Since the automation scripts are creating the files, you may also be able to program them to adhere to a file naming and storage scheme, and create a well-structured file system as a byproduct of their natural functionality.
  • Enables you to do more in-house, with your own staff. Implementing an automated workflow can give you a path to keeping more work in-house, while also building the skills of the day-to-day staff and bringing them closer to the books they handle every day.
  • Templated book designs. Automated workflows are often built around design templates — meaning you use the same design for a bunch of books — which can be great for folks who want a consistent look across a series of books, or even an entire imprint. It also speeds up the production process, since you don’t need to re-evaluate the design for every book.
  • A path for experimentation. A lot of publishers are feeling the pressure to try new formats or target new audiences. An automated workflow can provide a cost-effective way to test a new market without needing to build a full-fledged process around it. I wouldn’t necessarily recommend building an automated workflow solely to try out a new market, since developing these tools from scratch takes time and money. But if you already have one implemented for other book workflows, or if you’re using a cloud-based automation tool, it can be a great way to quickly try something new at a relatively low up-front cost.
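
To illustrate the file management point from the list above, here is a small Python sketch of a script-enforced naming scheme. The scheme itself (ISBN, then a title slug, then a two-digit version number) is invented for this example, and the ISBN shown is a placeholder rather than a real book.

```python
import re

# Hypothetical set of output formats this workflow produces.
ALLOWED_FORMATS = {"pdf", "epub"}


def output_filename(isbn, title, fmt, version):
    """Build a predictable filename such as 'ISBN_title-slug_v01.pdf'."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    # Lowercase the title and collapse anything that is not a letter or digit
    # into single hyphens, so the name is safe on any file system.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{isbn}_{slug}_v{version:02d}.{fmt}"


name = output_filename("9780000000002", "The Dragon & the Knight", "pdf", 3)
```

Because the scripts generate every filename through one function, nobody has to remember the convention, and a later change to the scheme only needs to happen in one place.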

That’s certainly an exciting list, right?! But, of course, it comes at a cost. Here are some of the biggest pain points I’ve seen:

  • Limitations with layout. Most automated workflows exist as a “black box,” meaning that there isn’t a window into how the pages are being laid out. Design instructions are created via code, then you feed those design instructions into the tool, and then cross your fingers that the pages come out looking OK, without too many widows or orphans or terrible line breaks. This has been a big issue for a lot of the trade publishers I’ve worked with, and it is part of the problem we’re attempting to solve at Hederis by combining automation with a visual layout and design tool.
  • Templated book designs. Wait… wasn’t this in the list of pros? Yes, but it can also be a con! Again, because the designs typically need to be created via code, this can create a barrier when it comes to customizing the design of books (as well as being potentially expensive to get a design created by a coder). This means that for a truly streamlined workflow, the design often must be templated. (Another problem we’re solving at Hederis!)
  • More tedious than InDesign, at times. Because of the “black box” nature of these kinds of workflows, it can be fairly frustrating to try and fix a bad break, or some other design flaw in the final laid-out print pages. It generally involves a repetitive process of making an adjustment to the text file and feeding it back into the tool to see the new layout, ad infinitum, until you get something decent. And if a change needs to be made to the design itself, this generally has to go through someone who writes code, which can cause delays. I spent many hours in this cycle back in my days as a production editor working in an automated workflow, and this was one of the top things on my list to improve with the tools we built at Hederis.
  • Potentially expensive to create. At this point, there are a lot of existing tools that you can use to piece together an automated workflow, or even use out of the box. Building your own workflow (either from a collection of existing pieces, or entirely from scratch) comes with a price tag that varies depending on how much you need to create from scratch, how intricate your needs are, and whether you’re hiring new technical staff to help maintain the tool. This is a big reason why you mainly see the larger publishers building their own automation workflows: they have the resources to fund the development (and also tend to have stricter requirements). However, there are a variety of cloud-based automation tools on the market these days (including my company’s) that give smaller publishers access to the benefits of automation, an area where larger publishers have historically had the advantage, without big up-front costs.
  • It’s a big change. I have been guilty in the past of taking change too lightly. The truth is that implementing a workflow like this will change job responsibilities and require a coordinated training effort. For day-to-day staff who might already be feeling overwhelmed with the number of projects on their plate, being asked to adopt a whole new workflow — in spite of it potentially giving them new skills and a closer connection to their books — can be truly overwhelming. In this regard, smaller publishers actually have an advantage: it’s much easier to train a handful of people than it is to train dozens, and to keep communication open between the folks who are finding themselves in new territory and re-learning how to do their work and to work together. A lot of our documentation is inspired by the questions and issues I’ve seen people struggle with when working with an automated workflow, and I encourage you to check it out.

If you’re interested in learning more about building your own workflow, my company put together a case study about implementing automated workflows that lists some of the costs and requirements. Of course, it focuses on our tool as a solution, but you can also get a sense of some of the other routes you could take and what goes into them. I’m also always happy to chat about automation, your experiences with it, and the options that are available. Just drop me a line!

Nellie McKesson is a publishing technologist and the founder of Hederis. You can find her on LinkedIn and (extremely sporadically) on Twitter, or you can email her at nellie@hederis.com
