I enjoy designing books, because typography and beautiful books are dear to my heart.
When I receive a new manuscript to turn into a print book and an ebook, I like to spend my time working on the book’s visual design rather than laboriously cleaning up text and structuring content. I want to enjoy the typographical playground rather than fix issues inherited from the manuscript’s editing history. I want to experiment with the visual design without worrying about the book’s content and semantic structure.
I want to click a button and enjoy my hot chocolate, while I watch my computer labor to get me to where I can have fun.
Building such an automated workflow that is both comprehensive and intelligent is a challenging journey, and the product keeps evolving.
Several years ago I built my first ebook by hand.
At first I tried several web services and tools, but I found their results unsatisfying: the mess of the original manuscript was simply carried over into the ebook, the ebook rendered inconsistently across devices and apps, the tools seldom checked the text for issues, and the final ebook failed to validate against the official standards. So I opened my favorite text editor and went to work myself.
One by one, I arduously transcribed the paragraphs from the original Word manuscript, and with tedious handiwork I cleaned up their formatting issues and typos. Sometimes the text styles left me guessing at the book’s structure, but leafing back and forth through the manuscript helped me understand the structure the author may have intended. It was easy to build the EPUB scaffolding around that transcription, and lo and behold, there was my first ebook! It used the e-reader’s default styles, it was clean and simple, it rendered beautifully on every device I tried, and, most importantly, it validated perfectly.
I worked through my second book manuscript the same way, but I was already beginning to feel bored by the repetitive process. I felt like I was wasting my time on work that my computer could do so much better, and I felt robbed of the joy of actually being creative and designing the book itself.
This was one of those moments when being a seasoned software engineer comes in tremendously handy.
And so I set out to write myself a simple program that would construct the EPUB scaffolding from the HTML file I had produced so laboriously. Before long I switched to XML, because HTML wasn’t expressive enough for my needs: it was too messy (hence the common term “HTML soup”), and industrial-strength tools worked much better with XML.
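For the curious, here is a minimal sketch of what “EPUB scaffolding” involves, written with Python’s standard `zipfile` module. It is not the program described here: the function and constant names are hypothetical, it assumes a single chapter whose body is already well-formed XHTML, and its output has not been run through EPUBCheck. It follows the EPUB 3 container layout: a `mimetype` entry stored first and uncompressed, a `META-INF/container.xml` that points to the package document, a package document with metadata, manifest, and spine, and a navigation document.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Hypothetical sketch of EPUB 3 scaffolding around one chapter of XHTML body
# markup; not the author's tool, and not checked with EPUBCheck.

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def xhtml_page(title, body):
    # Wrap already well-formed XHTML body markup in a minimal page.
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{escape(title)}</title></head>
<body>
{body}
</body>
</html>
"""


def package_opf(title, identifier, language, modified):
    # Package document: required metadata, the list of files, and the reading order.
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{escape(identifier)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>{escape(language)}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""


def build_epub(chapter_body, title, identifier, language="en",
               modified="2024-01-01T00:00:00Z"):
    """Return the bytes of a one-chapter EPUB file."""
    nav = ('<nav epub:type="toc"><ol>'
           f'<li><a href="chapter1.xhtml">{escape(title)}</a></li>'
           '</ol></nav>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must come first and be stored uncompressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        files = {
            "META-INF/container.xml": CONTAINER_XML,
            "OEBPS/content.opf": package_opf(title, identifier, language, modified),
            "OEBPS/nav.xhtml": xhtml_page("Contents", nav),
            "OEBPS/chapter1.xhtml": xhtml_page(title, chapter_body),
        }
        for name, text in files.items():
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Writing the returned bytes to a file with an `.epub` extension yields a book that an e-reader can open; a real book would add one manifest and spine entry per chapter.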
I soon noticed that little mistakes were creeping into my manual transcriptions. So I extended my XML processing tools with error checks that first find spelling and punctuation problems and typographical nits, and then either fix them automatically or at least flag them for me to inspect.
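A minimal sketch of what such checks might look like, using Python’s `re` module. The rule names, the patterns, and the split between auto-fixed and merely flagged rules are hypothetical illustrations, not the author’s actual rule set. Unambiguous problems are fixed automatically, while rules with legitimate exceptions (a deliberate “that that”, or a space before an ellipsis) are only flagged for a human to review.

```python
import re
from dataclasses import dataclass


@dataclass
class Issue:
    offset: int   # character position in the text
    rule: str     # name of the rule that matched
    snippet: str  # the matched text


# Hypothetical rule set: (name, compiled pattern).
RULES = [
    ("double-space", re.compile(r"(?<=\S)  +(?=\S)")),
    ("space-before-punctuation", re.compile(r"\s+(?=[,.;:!?])")),
    ("straight-double-quote", re.compile(r'"')),
    ("straight-apostrophe", re.compile(r"(?<=\w)'(?=\w)")),
    ("repeated-word", re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)),
]

# Only rules without legitimate exceptions are fixed automatically.
SAFE_FIXES = {
    "double-space": " ",
    "straight-apostrophe": "\u2019",  # typographic apostrophe
}


def check(text):
    """Flag every rule match, ordered by position in the text."""
    issues = [
        Issue(m.start(), name, m.group(0))
        for name, pattern in RULES
        for m in pattern.finditer(text)
    ]
    return sorted(issues, key=lambda issue: issue.offset)


def autofix(text):
    """Apply the safe fixes; everything else is left for check() to flag."""
    for name, pattern in RULES:
        if name in SAFE_FIXES:
            text = pattern.sub(SAFE_FIXES[name], text)
    return text
```

For example, `autofix("It  was John's book.")` collapses the double space and curls the apostrophe, while `check("the the end .")` only reports the repeated word and the stray space.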
Ebooks weren’t enough, though. Word and its clones don’t produce well-designed print books, and I often looked at the original documents wishing they looked less clumsy, less misshapen and crooked, less amateurish. When I then stumbled upon Prince XML, a tool that applies CSS styles to XML files and renders them as PDF, I excitedly began to design and create print-ready PDF files that looked beautiful and professional.
Because both ebook and print book came from the same source, I could now edit the original manuscript at will and then generate the final books automatically.
Now I understood why it matters so much to strictly separate a manuscript’s content and semantic structure from its visual design and presentation.
However, even with ebooks and print books generated automatically from XML, I still found myself spending long hours tediously transcribing more or less ill-designed book manuscripts to XML, stripping them of their visual mess, and cleaning them up by hand.
Once again, I rolled up my software engineering sleeves and set out to make my life easier. This time around, though, the task was much more challenging. I could automatically extract content and visual styling from a book’s manuscript, but could I write a blob of code that took the text and its styling information and automatically derived the text’s intended semantic structure from that styling?
As it turns out, this is a nontrivial problem that has kept me busy for the better part of two years now.
Think about it: an author writes and structures a book using chapters and sections, emphasizes passages with formatting, refers to other parts of the book or to external sources, elaborates on the text with footnotes and endnotes, and so forth. All these text elements are styled visually to set them apart from one another and to guide readers through the book without distracting them.
I wanted to automate the reversal of this creative design process: derive the author’s intended semantic structure of the book from how she visually styled the text elements!
And so I have been busy designing and implementing an intelligent structure and content classifier that uses AI/ML techniques (hooray for this week’s buzzwords) combined with other heuristics that feed into the classification. As it turns out, this is not only an engineering challenge but also an interesting academic research topic.
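To make the idea of reversing the styling concrete, here is a toy, rule-based sketch. It is not the AI/ML classifier described above; the `Paragraph` type, the role names, and every threshold are hypothetical. It assumes that the most common font size (weighted by text length) is the body size, and then guesses each paragraph’s role from its size relative to the body and from its boldness.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Paragraph:
    # Hypothetical input: one paragraph plus styling extracted from the manuscript.
    text: str
    font_size: float  # in points
    bold: bool = False


def classify(paragraphs):
    """Guess a semantic role for each paragraph from its visual styling alone."""
    if not paragraphs:
        return []
    # Assume the body text size is the size that carries the most characters.
    weights = Counter()
    for p in paragraphs:
        weights[p.font_size] += len(p.text)
    body_size = weights.most_common(1)[0][0]

    roles = []
    for p in paragraphs:
        ratio = p.font_size / body_size
        if ratio >= 1.6:
            roles.append("chapter-title")
        elif ratio >= 1.2:
            roles.append("section-heading")
        elif ratio <= 0.85:
            roles.append("footnote")
        elif p.bold and len(p.text) < 80 and not p.text.rstrip().endswith("."):
            # A short bold line at body size without a full stop reads like a heading.
            roles.append("section-heading")
        else:
            roles.append("body")
    return roles
```

Hand-tuned thresholds like these are brittle: manuscripts are styled inconsistently, and one author’s 14-point bold chapter title is another author’s section heading. That brittleness is one reason to learn the mapping from data and to use rules only as additional signals.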
Since I wrote my first simple script many years ago, my tools have grown and gotten much better. I’ve gotten where I wanted to be all along: today I can click a button and slurp my hot chocolate while I watch my laptop do the tedious work. Still, because no algorithmic solution will ever be perfect, I occasionally need to confirm or adjust a book’s structure classification or its proposed punctuation changes. But that’s just another mouse click, and I don’t have to put down my cup.
Finally, when I’m done with my hot chocolate, I can indulge in designing my book, assured that the rest of my workflow (content extraction, semantic structuring, text cleanup and fixes) has been taken care of while I sipped from my cup. And that’s what I really wanted all along.