How to Train Your Microservices

DreamWorks Animation takes filmmaking to the cloud

Pivotal
Built to Adapt
11 min read · Jun 1, 2017


The entertainment industry is in a state of hyper-speed evolution. The financial demands of changing audience behaviors, multiplying viewing platforms, and disruptive business models mean movie studios are continually pushing the edge: more releases, shorter timelines, and what Doug Sherman calls “unbounded creative ambition”.

Doug is a Principal Engineer at DreamWorks Animation, home to such hits as the Kung Fu Panda franchise, How to Train Your Dragon, and the upcoming Captain Underpants: The First Epic Movie. The studio releases two or three films per year, each costing more than $100 million in production and distribution, and each upping the game in visual effects and character detail. DreamWorks Animation typically has seven concurrent films at different stages of production, and the technology requirements to keep this machine going (and working to build a globally distributable pipeline) are gargantuan. One 90-minute film features nearly 130,000 frames, each frame with hundreds of digital assets and control points for each character. It all adds up to more than 500 million digital files — that’s 75 million CPU hours, 10,000 cores, and 200+ terabytes of data.
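The frame count above is easy to sanity-check. The sketch below assumes the common theatrical rate of 24 frames per second, which the article does not state; the function name is purely illustrative.

```python
# Back-of-the-envelope check of the frame count quoted above.
# ASSUMPTION: 24 frames per second (a common theatrical rate);
# the article itself does not give a frame rate.
FRAMES_PER_SECOND = 24
RUNTIME_MINUTES = 90


def frame_count(minutes: int, fps: int = FRAMES_PER_SECOND) -> int:
    """Return the total number of frames for a given runtime in minutes."""
    return minutes * 60 * fps


total_frames = frame_count(RUNTIME_MINUTES)
print(total_frames)  # 129600, consistent with "nearly 130,000 frames"
```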

Doug Sherman, DreamWorks Animation

Doug is driving an evolution of sorts within DreamWorks, led by CTO Jeff Wike and touching thousands of employees worldwide. DreamWorks is rapidly adopting microservices and the cloud as part of its moviemaking process, first transitioning to the Spring Framework in 2010 and now leveraging Spring projects such as Spring Boot, Spring Data, and Spring Cloud.

A new architecture means many new challenges, according to Doug. One global, uniform application that tells employees where files are, which choices artists made, and the details and metadata related to each artist’s work seems like a no-brainer, but it’s much more complicated than that. Even positive transition is change. We sat down with Doug to talk social engineering, putting people in the cloud, and why artists don’t like change.

Built to Adapt: What is your role at DreamWorks?

Doug Sherman: Officially, I act as a Principal Engineer for DreamWorks Animation in a role that centers on architecting and designing solutions for a microservices-based platform aiming to take advantage of going cloud-native.


Unofficially? I’m a data traffic cop. Imagine millions and millions of files, and you have to make sense for everyone of who gets what and when. In some ways, that’s what I do. I have to figure out how to make an artist’s life easier to the point where they come in and they just do what they do best. And in order for that to happen, I have to engineer solutions that enable artists to efficiently access and share resources critical to achieving their day-to-day output. There’s a lot that goes on between each artist and the final product from layout and modeling to lighting and rendering, so someone has to organize all of what happens when that artist is done and it’s time for someone else to work on the next step.

Animation used to depend on hand-drawn cels. An artist would have a piece of paper and they’d sketch out a bit of animation on that, just a rough pencil sketch. And a production assistant would have to come in, take all those physical pieces of paper, carry them to someone else’s desk, and put them down, and they’d erase all the lines and clean them up. And then the next person would come in, pick it up, and hand it off to the next artist, who would ink it in, and on and on.

In the digital world, you’re doing the same thing. You have software that goes in and makes sense of how to identify the stuff that the next person needs, transport it digitally so it’s actually in a place they can access it, and then appropriately label it, because if something goes wrong, or someone wants to change their mind during any part of the process, you have to remember how to go back and find all those assets.
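The identify, transport, and label steps described above can be pictured as a small versioned handoff log. This is only an illustrative sketch, not DreamWorks’ actual system: the `AssetVersion` and `HandoffLog` names, their fields, and the example paths are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class AssetVersion:
    """One labeled handoff of an asset (all fields are hypothetical)."""
    asset_id: str   # what the next person needs
    version: int    # increases by one with every handoff
    stage: str      # pipeline stage that produced it, e.g. "modeling"
    artist: str     # who published it
    location: str   # where the next person can access it


@dataclass
class HandoffLog:
    """Keeps every published version so earlier work can be found again."""
    history: Dict[str, List[AssetVersion]] = field(default_factory=dict)

    def publish(self, asset_id: str, stage: str, artist: str,
                location: str) -> AssetVersion:
        """Label a new version of an asset and record it."""
        versions = self.history.setdefault(asset_id, [])
        record = AssetVersion(asset_id, len(versions) + 1, stage, artist, location)
        versions.append(record)
        return record

    def latest(self, asset_id: str) -> AssetVersion:
        """Return the version the next artist should pick up."""
        return self.history[asset_id][-1]

    def find(self, asset_id: str, version: int) -> AssetVersion:
        """Go back to an earlier version when someone changes their mind."""
        if version < 1:
            raise ValueError("versions start at 1")
        return self.history[asset_id][version - 1]


log = HandoffLog()
log.publish("dragon", "modeling", "artist_a", "/hypothetical/dragon/v1")
log.publish("dragon", "lighting", "artist_b", "/hypothetical/dragon/v2")
print(log.latest("dragon").stage)   # lighting
print(log.find("dragon", 1).stage)  # modeling
```

Keeping every version, rather than overwriting the latest one, is what makes it possible to go back and find the assets when a decision is reversed.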

BTA: How has the cloud affected these digital workflows?

DS: We’re about 50% of the way to having some amount of production coverage powered by microservices, which are deployable in cloud containers powered by technologies such as Spring and Spring Cloud. We’re moving all this infrastructure into the cloud, and we thought, not only can we push computing power to the cloud when we need to render things out, but we can also push storage, and not buy so much in advance…and people, too! We could push people out there, allowing them access inside. This is huge in this industry. With each film, we hire a massive army to get it done, and it’s very difficult to find the right balance of headcount to ensure the studio is getting maximum utilization out of their artists. It gets very expensive. What about pushing out to them? Push out to them as if you’ve written them as microservices. They become accessible now, become deployable out there. It’s a huge bet we’ve made, and we’re starting to get the payoff now.

BTA: So how did DreamWorks employees first react to microservices?

Microservices were a foreign concept, and people were very resistant. People were terrified of it in the beginning. As much as we just wanted to say “Oh, here’s this magical thing where we just identify everything and we just seamlessly hand it off,” the reality is there’s a team of people called TDs — technical directors. And there’s many of them that are employed here.

A short explainer for those unfamiliar with microservices.

TDs are technologists — a lot of them come from computer science backgrounds — that get plugged in, and they literally sit in a particular department at the studio. They might work with the animators, or they might work with the modelers or the layout artists. And they’re very skilled at understanding how those artists particularly work or want to work. They are way beyond normal tech support. They’re so deeply placed in. So, to get films done, they would hire mass loads of these people to come in and ensure artists can stay focused on art — not tech — hand-hold the artists through all the stages.

What we’re trying to do, and it’ll take a while, is trying to automate a lot more of what those people have to do day-to-day, and alleviate the need for an artist to have to have a team physically lead them through from step to step. Services aren’t something that these TDs would typically write. These TDs typically stay with quick shell scripts and Python scripts because they’ve got very thin timelines, and there’s no process in place half the time. So, they have to be really agile in just generating throwaway code to get artists from Point A to Point B.


So, the idea that this infrastructure team would come in and replace one of the things they do with what appears to be heavier-weighted solutions, in a different language space…that’s intimidating. Really scary. They felt like they have all this domain knowledge, so how can this external thing running on a server possibly know how to do what they do in a local scope? It’s understandable.

BTA: How do you convince them?

It’s been a lot of convincing around (1) how we can architect a solution that’ll relieve them of having to do all of these sort of handwritten scripts day to day, and (2) how we’ll better utilize them, because we definitely don’t want to get rid of them. There’s a lot of really good knowledge they have.

But it’s very, very intimidating. You’re going from someone who’s just writing a quick Python script to educating them on, “This is what it means to run something server-side, and oh, by the way, this is the whole cloud platform that you have to deploy these solutions to.”

Even for some of the seasoned developers here, they’re used to running on bare metal. So, when we talk about deploying to the cloud, they have to learn what Docker is and what Kubernetes is, and “I have to learn how to write this kind of file.” Even our DBAs are having to re-skill in the deployment and maintenance of databases that run in a cloud environment.

The latest from DreamWorks Animation opens this weekend.

BTA: And there’s the added stress of being in the filmmaking space where your release date cannot slip.

Yeah, if what you’re proposing is not routine, there’s the panic that if they’re the first one to try this out, they’re in jeopardy of not making that date or having to push their artists to the weekends to compensate for things that were unexpected.

It has to be a convincing argument. Microservices came down from the previous CTO, Lincoln Wallen, as a mandate; he was ahead of his time on this: “We will do this. We will take those risks. We will force production onto this, and the reward will be that we’ll get better and better. That will become routine, and the comfort level will get there.” He believed in the technology.


The whole “microservices” word was a feared thing, but now groups that once completely pushed away are embracing it and wanting to learn it. I’ve recently taught classes in how to write microservices. The interest in joining our group has grown times ten, because people want to learn how to do these sorts of things. They’re recognizing why it’s good to do it this way instead of scripting everything and doing brute-force solutions.

Nerves came from a couple of things. One is language domain. There’s a lot of Python experts here in the studio, and they didn’t want to forfeit their language expertise. They didn’t want to have to transition to Java or something else in order to participate in this. So, one of the confidence-builders was to say, “Well, there’s nothing in web services that says it has to be built in any one particular language.” Some languages offer better tooling, or an easier way to understand how to deploy services written in them. But I was able to demonstrate in Python that you can write a web service just the same.


The other fear factor they had with microservices was this: “Even in Python, how much code do you have to write to make something run on a cloud?” That feels really complicated compared to writing a couple lines in a script that executes locally. And I was able to show that, with certain frameworks out there, it can be as simple as a three-line script. Again, a lot of it is how it stands up as a web server and how it translates JSON, say, into objects and back, and how simple that could be even in Python, in only minimal lines of code. Once I was showing how that could be stood up and how they could manipulate the ins and outs, that sort of fear of this big, giant, mysterious thing got reduced down.
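To give a feel for how little code is involved, here is a minimal sketch of a JSON-in, JSON-out web service. It is not the framework Doug demonstrated, which the interview does not name. It uses only Python’s standard library, so it runs longer than three lines, and the `EchoHandler` and `make_server` names, the port, and the reply format are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: parses a JSON request and replies with JSON."""

    def do_POST(self):
        # Read the request body and translate JSON into Python objects.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")

        # Translate Python objects back into JSON for the reply.
        body = json.dumps({"received": payload, "status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def make_server(port: int = 8080) -> HTTPServer:
    """Build a server bound to localhost; port 8080 is an arbitrary choice."""
    return HTTPServer(("127.0.0.1", port), EchoHandler)
```

Calling `make_server().serve_forever()` starts the service, after which a POST of `{"shot": "sq100"}` to `http://127.0.0.1:8080/` comes back wrapped as `{"received": {"shot": "sq100"}, "status": "ok"}`. Web frameworks hide the header and encoding boilerplate shown here, which is how the line count can shrink to a handful.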

Knocking down all the buzzwords and all the scariness of what it is — that’s what swayed a lot of people over. They saw that it wasn’t this big, difficult thing. The language base does matter to some degree, depending on how mature that language is in pursuing middleware. So, Python and Java (Java probably more) we spend a lot of time making tooling. This is where I love Spring.

BTA: So much of your job is clearly social engineering. What’s your biggest lesson learned when it comes to helping teams accept and thrive with change?

We’re in the business of finding the best algorithm to get something done, but if you don’t have social engineering skills, you’re in trouble. Half my week is often getting teams together, understanding what we’re doing, building things in their direction so they will evangelize to their teams.


You have to understand what people want to do in their domain. In the past, I’ve gotten it wrong. I’ll come up with an idea I think is sound — I think it’s the coolest thing ever — and I’ll work six months in isolation with my team, and then we’ll do this big reveal. And every time we’ve done that, it’s gone horribly wrong, because 1) people feel like we’re lecturing to them, like we know better than them. And then 2) we would typically have over-engineered it! It would be like the 747 cockpit, you know? There would be this overwhelming amount of knobs and bits and pieces that I think are great to have, but from their viewpoint, they only need to do a few things, and that’s an overwhelming amount of stuff to have to sign up to be able to do.

So now, I’ve gotten into a habit: before I even write a single line of code, I interview everybody that potentially will use the solution that I’m going to write, and I keep them in lockstep with me and my team just about every week. We keep them engaged, helping to influence the direction. I’m basically trying to echo out in code all of what they want. It’s gone so much better, because they feel invested. They don’t feel like in six months I’m revealing this big, mysterious thing. They feel like this is just something they’ve seen through iterations. And what’s empowering about that, too, is if you can get the spiritual leaders of the different departments that you’re trying to encourage to use your solution, they’ll help sell it for you.

Watch Doug’s talk from SpringOne Platform 2016:

Change is the only constant, so individuals, institutions, and businesses must be Built to Adapt. At Pivotal, we believe change should be expected, embraced, and incorporated continuously through development and innovation, because good software is never finished.
