My Summer at a Publishing Company
by Emily Lau
This summer I was in Manhattan, amidst the constant noise of cars stuck in traffic and the beautiful view of bustling buildings, working as an intern at the headquarters of Penguin Random House.
Penguin Random House is one of the world’s largest publishing companies, home to over 250 independent imprints. It was formed in 2013 from the merger of Random House and Penguin Group, two of the “Big 6”, a term for the six largest English-language trade publishing companies. You have almost certainly picked up or read a book published by an imprint of one of the (now) Big 5: Penguin Random House, Hachette, HarperCollins, Simon & Schuster, and Macmillan.
At Penguin Random House, I was an intern for the Ebooks Development team, a tiny subteam within the Ebooks Tech Team, which itself is part of the Ebook Development & Operations Group. Most days at my internship I worked alongside four other members of the Tech Team: two on the Dev team and two on the Research and Documentation (RAD) team. The rest of the Ebook DevOps group is made up of the various production teams, who work with the imprints as well as external vendors to convert print files into proper ebook files. The typical ebook file format is EPUB, a standard developed and published by the IDPF, now part of the W3C. As an intern on the Tech Team, I became quite familiar with EPUB files!
On my very first day, after a couple of hours of orientation activities, I took the elevator down to the 3rd floor and met my manager. She quickly introduced me to people in Ebooks on our way toward my cubicle. As we walked around the floor, I immediately noticed the stacks of books, cool posters, and climbing plants (so many plants!) that crowded many of the desks and shelves. Sadly, my desk had no such decoration. I dropped off my bag, grabbed the MacBook Pro on my desk, and clutched my orange folder from orientation as I joined my manager in a separate room to get everything set up.
Once I was properly set up on the Ebook team’s GitLab, Jira, and Slack, I received a crash course in the technical specifics of EPUB files as well as the workflow of the Ebooks Tech Team. EPUBs are essentially zip files full of XHTML, CSS, image, and other files. There are also a few critical files that something needs in order to be considered a “proper” EPUB:
- mimetype (really short and specific: things like extra trailing whitespace will cause this file to fail)
- the OPF, an XML file which usually has: manifest (a list of all of the files in the EPUB), spine (an ordered list of contents), landmarks (identifies important components in the EPUB, such as the start page and the Table of Contents page), and a bunch of metadata about the EPUB
- META-INF/container.xml (identifies the OPF)
- Some file that provides navigation, whether it be the NCX (for EPUB 2) or a navigational document (for EPUB 3; I’ve usually seen this file named nav.xhtml)
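To make that list a little more concrete, here is a minimal Python sketch that opens an EPUB as a zip file and looks for those critical pieces. This is not one of the team’s tools; the function name `inspect_epub` and the keys of the dictionary it returns are my own, made up for illustration. It uses only the standard library and assumes the file is a readable zip.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def inspect_epub(path):
    """Report which of the critical EPUB files are present (hypothetical helper)."""
    with zipfile.ZipFile(path) as z:
        names = set(z.namelist())

        # The mimetype file must contain exactly this string:
        # even a trailing newline makes the check fail.
        mimetype_ok = (
            "mimetype" in names
            and z.read("mimetype") == b"application/epub+zip"
        )

        # container.xml identifies the OPF through a rootfile element.
        opf_path = None
        if "META-INF/container.xml" in names:
            container = ET.fromstring(z.read("META-INF/container.xml"))
            rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
            if rootfile is not None:
                opf_path = rootfile.get("full-path")

    return {
        "mimetype_ok": mimetype_ok,
        "opf_path": opf_path,
        "opf_present": opf_path in names,
    }
```

Calling `inspect_epub("some_book.epub")` (a placeholder file name) returns a small dictionary that can be printed or checked. A real validator such as EPUBCheck tests far more than this.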
I asked my manager a lot of questions to try to understand the intricacies of EPUB files and how the team worked with them at Penguin Random House. While my brain was still processing EPUB files, I also learned that there were a number of different workflows for ebook production at Penguin Random House. Some ebooks were produced in-house with our own tools. Others were converted by third-party digital publishing companies and sent back to the ebook production teams to be revised using various tools. And these were just for the frontlist titles! We couldn’t forget about the tens of thousands of ebooks in the backlist, produced following specifications which needed to be handled differently.
The dev team is responsible for workflow and software development, testing, and documentation for ebook production tools as well as internal team tools. Once the production teams get the print book files (for example, coming out of Adobe InDesign), they need to convert them into ebook files following whatever specifications are needed from the imprint and other parties. Among the things one may need to consider: fonts, images, audio, links, tables, footnotes, endnotes, accessibility, and basically everything about how the ebook looks and functions. That’s where the tools that the dev team works on come in!
When I joined, the team was in the midst of finishing up one large phase of development and setting out their goals for the next phase. They were working on getting a significant project, a big rewrite of a tool that updates an EPUB 2 file to the specifications of an EPUB 3 file, out into beta testing (this tool finally went into beta testing in July). Some of my first tasks were to look through some of our common Python tools that could be called by other tools and write some much-needed comments in the code. I also helped with a lot of tool testing. For the first week, I spent time looking through the dev team’s extensive code base.
This was my first tech-related internship. I didn’t exactly know what to expect, but I was soon swept away in the fast-paced environment at Penguin Random House. I was included in the daily check-in meetings with the tech team, where each member talked about what they did yesterday and their plans for the day. My manager invited me to one of the meetings where the tech team planned the next phase of development using Google Sheets and a lot of back-and-forth discussion. I had lunch and looked for free books from the take-shelves with fellow interns.
In week 2, my manager and I discussed the first long-term project that I would work on: a rewrite of an existing tool. I would rewrite it in Python with consideration for new ebook file specifications. The tool would take an EPUB file and create a Table of Contents (TOC) page from the navigational document (navdoc). Creating a TOC page from a navdoc is at its core really simple: mostly a bunch of find-and-replaces. However, given my lack of experience with regular expressions (regex) and the overall unpredictability of the different EPUB files this tool could be run on, starting work on the tool was difficult.
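To give a rough idea of what “a bunch of find-and-replaces” can look like, here is a simplified Python sketch. It is not the tool I wrote: the function name `toc_page_from_navdoc`, the “Contents” heading, and the `toc-entry` class are invented for this example, and it assumes a tidy EPUB 3 navdoc whose `nav` element carries `epub:type="toc"` with double-quoted attributes. Real files are much less predictable, which is exactly what made the project hard.

```python
import re


def toc_page_from_navdoc(navdoc_xhtml):
    """Build the body of a simple TOC page from a navdoc (illustrative only)."""
    # Isolate the toc nav; other navs (landmarks, page list) are ignored.
    nav = re.search(
        r'<nav\b[^>]*epub:type="toc"[^>]*>(.*?)</nav>',
        navdoc_xhtml,
        flags=re.DOTALL,
    )
    if nav is None:
        raise ValueError("no toc nav found in the navdoc")

    # Collect every link inside the toc nav, in document order.
    links = re.findall(
        r'<a\b[^>]*href="([^"]*)"[^>]*>(.*?)</a>',
        nav.group(1),
        flags=re.DOTALL,
    )

    # Emit one paragraph per entry; the markup here is a made-up convention.
    lines = ["<h1>Contents</h1>"]
    for href, label in links:
        lines.append(
            f'<p class="toc-entry"><a href="{href}">{label.strip()}</a></p>'
        )
    return "\n".join(lines)
```

This flattens nested chapters into a single level and trusts the markup to be well behaved. A production version would need to handle nesting, unusual attribute quoting, and all the other surprises that turn up in real EPUBs.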
Luckily, my manager was incredibly helpful in addressing all of my concerns. The tech team also held a quick regex refresher meeting. Through the weeks, I wrote code, tested code, and reached out to the dev team for advice and help with debugging. I made an AppleScript wrapper for my tool with both drop-in and double-click functionality, created a basic graphical user interface, and did lots and lots of testing to find and address any issues that could come up when the tool was used in production.
Although I did spend a lot of time in front of computer monitors, there were opportunities to learn more about the work of the other departments at Penguin Random House. During my lunch breaks, I often took the elevators to the other floors and explored. I checked out displays and big posters of best-sellers, walked by desks overflowing with books and book-related decorations, and scoured take-shelves for reads to bring home. Every Thursday, the HR department held brown bag lunches, inviting the interns to hear employees from different departments talk about their experience at the company. I loved hearing employees talk about what they did, how they got to their current role, and their advice to us interns. At our last brown bag lunch, the U.S. CEO talked with us in a Q&A format.
In July I saw my tool pushed into beta. I wrote my first ever beta ticket and assigned it to someone from the production team to test my tool and provide feedback. It was an exciting day! That same week we also decided on my next project. Someone from the production team had requested a tool to sort EPUB files by version (2 or 3) and then by reflowable vs fixed-layout. This tool could potentially be run on batches of hundreds of files (and I later did use it to sort a folder with almost 2,000 EPUBs).
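For the curious, here is a small Python sketch of how that kind of sorting can work. It is not the tool itself: the names `classify_epub` and `group_epubs` are made up, and it rests on two assumptions. The version is read from the `version` attribute on the OPF’s `package` element, and a file counts as fixed-layout only if the OPF metadata contains a `rendition:layout` property set to `pre-paginated` (the EPUB 3 convention). Older fixed-layout files that signal their layout some other way would be labeled reflowable by this sketch.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def classify_epub(path):
    """Return (major_version, layout) for one EPUB (hypothetical helper)."""
    with zipfile.ZipFile(path) as z:
        # container.xml tells us where the OPF lives inside the zip.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        package = ET.fromstring(z.read(rootfile.get("full-path")))

    # The version attribute looks like "2.0" or "3.0"; keep the major part.
    major = package.get("version", "").split(".")[0]

    # Assumption: only the EPUB 3 rendition:layout property is checked.
    layout = "reflowable"
    for meta in package.iterfind("opf:metadata/opf:meta", OPF_NS):
        is_layout = meta.get("property") == "rendition:layout"
        if is_layout and (meta.text or "").strip() == "pre-paginated":
            layout = "fixed-layout"
    return major, layout


def group_epubs(paths):
    """Group EPUB paths by their (major_version, layout) classification."""
    groups = {}
    for path in paths:
        groups.setdefault(classify_epub(path), []).append(path)
    return groups
```

`group_epubs` only builds a dictionary of groups. Moving each file into a matching folder would be a separate step on top of it.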
Using some of the common tools and functionality of existing apps, I quickly wrote the Python code and AppleScript wrapper for this tool. Preliminary tests looked good; the dev team was surprised by how fast it ran on hundreds of files. After more testing, debugging, and adding user notifications to the tool, it was pushed into beta. The person who had requested the tool had talked with me a couple of times during its development, expressing how useful it would be for everyone in Ebooks. Their enthusiasm inspired me, and I was so happy that my work was appreciated and would have an impact!
In the second half of July, I helped the dev team with recording all of the expected logger messages coming from their EPUB 2 to EPUB 3 updater app (the tool I mentioned earlier that they had been working on for months). I also created test sets for various tools, including the two that I had written. I helped the RAD team with testing a bunch of fonts on various devices.
For the remaining two weeks of my internship, I worked with nearly 2,000 files from a recent acquisition. Many of these files did not meet the current EPUB specification. My manager and I determined a checklist of fixes for the files. I then whipped up a Python script, ran some tests, and set my poor laptop to copying the files from the server, running the script, and copying the revised files back to the server.
I loved my summer at Penguin Random House. My team members were dedicated to their work and always willing to explain ebook production concepts to me. Everyone on the team seemed to get along well and there was open communication. I enjoyed the opportunity to learn more about ebook production; I’d say most of what I read these days is in digital format, and I had always wondered how ebooks are created. Before starting my internship, I hadn’t known how much tech impacts the world of books and publishing. Working in a tech environment at a company where books are published combined some of my greatest passions (coding, reading, learning new things, collaborative work).
If you’ve ever wondered what it’s like to work in publishing, I hope you learned at least a little from my account of my experience at Penguin Random House this summer. I’m always open to discussing more (and if at any time you want to talk about books, I will be very happy to as well).