Cloud Bursting for VFX Rendering
Headquartered in Toronto, Canada, with offices in Los Angeles and Atlanta, SPINVFX is a creative and technically dedicated visual effects studio producing captivating imagery for feature film and television. Established in 1987, SPINVFX has evolved over its 30 years of business into an internationally recognized studio working with respected directors, VFX supervisors and producers. Its film credits include Suicide Squad (2016), John Wick and John Wick 2 (2014 and 2017), and the Academy Award-winning biopic Spotlight (2015). Its television credits include the Emmy Award-winning series Game of Thrones (HBO), the thrice-nominated show The Borgias (Showtime) and the upcoming Netflix original series The Umbrella Academy.
On the heels of years of success on the awards circuit, and with a growing audience for premium television content, SPINVFX’s infrastructure requirements had outpaced the capacity of its existing hardware investments. Three pressures drove this: a growing queue of scheduled render jobs waiting for processing, client production schedules that kept existing physical servers consistently occupied, and on-demand reshoot and change requests that compounded the need for additional resources. Rather than procure additional on-premise server infrastructure, the SPINVFX management team decided to evaluate cloud infrastructure as burstable render capacity to supplement SPIN’s existing resources. Beanfield Metroconnect, Amazon Web Services (AWS), and Curious Orbit worked with SPINVFX to create a framework for Proof of Concept (POC) testing of the suitability of Amazon EC2 virtual machines for remote mount burstable rendering using Pixar RenderMan, SideFX Mantra and Tractor Render Management. The POC goals included:
- Defining the minimum bandwidth and latency performance thresholds for render bursting
- Testing multiple AWS regions for the viability of render bursting
- Benchmarking EC2 render workload performance for comparison with physical server assets
Working in partnership with Beanfield, Megaport, and AWS, SPINVFX connected its internal network via AWS Direct Connect circuits to Amazon Virtual Private Clouds (VPCs) and EC2 resources located in several AWS regions: US-East-1, US-East-2, US-West-2, and Canada Central.
The POC benchmarked AWS resources in each of those regions to capture the impact of latency and bandwidth on VFX rendering workload performance. This POC was unique because the AWS EC2 servers accessed media assets stored on SAN infrastructure at the SPINVFX office, instead of using cloud-based, cloud-adjacent, or cloud-caching storage services.
The SPINVFX POC delivered three results. It proved the viability of remote mount burstable rendering without large data synchronization transfers to cloud storage. It defined the bandwidth requirements and latency limitations for AWS Direct Connect services delivered over long fat networks (LFNs). Finally, it highlighted the need for edge cache filing solutions to overcome latency-related performance limitations above 8 milliseconds for render workloads.
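To make the long fat network point concrete, the bandwidth-delay product (link speed multiplied by round-trip time) gives the amount of data that must be in flight to keep a circuit full. The sketch below applies that standard formula at the 8 millisecond ceiling reported by the POC. The link speeds are illustrative assumptions, not figures from the POC, and the function name is hypothetical.

```python
def bandwidth_delay_product_bytes(link_gbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a link of the given speed full."""
    bits_in_flight = link_gbps * 1e9 * (rtt_ms / 1000.0)
    return bits_in_flight / 8


RTT_CEILING_MS = 8.0  # latency ceiling reported by the POC

# Link speeds below are assumed for illustration only.
for link_gbps in (1.0, 10.0):
    bdp = bandwidth_delay_product_bytes(link_gbps, RTT_CEILING_MS)
    print(f"{link_gbps:.0f} Gbit/s at {RTT_CEILING_MS} ms RTT: "
          f"about {bdp / 1e6:.0f} MB in flight")
```

The takeaway is that the faster the circuit, the more data each round trip has to carry before the link is fully used, which is one reason latency and bandwidth have to be assessed together for remote mount rendering.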
Following testing and benchmark analysis across all four of the previously defined regions, Canada Central was selected as the AWS Region for a large-scale production VFX render workload test. This AWS region met the minimum latency and performance thresholds that allowed EC2 C5 and C4 VM instances to successfully mount NFS shares located on in-house Dell EMC Isilon equipment for VFX frame processing. As part of the production test, a record-setting number of Amazon EC2 C5 and C4 virtual machines in that region were procured by SPIN through On-Demand billing.
“Historically, the service models available in the Canadian market for VFX render workload outsourcing don’t provide VFX studios with very much flexibility. Projects need to be scheduled, and that server capacity often requires a term commitment. Rescheduling or increasing capacity can be even more difficult. Production schedules, however, evolve rapidly. When our render demand spikes, ideally SPINVFX can quickly and easily double or triple our available VFX render capacity. Bursting to EC2 HPC resources gives us that flexibility. For VFX studios to burst to cloud resources at scale, we need access to AWS Spot Markets for cost-efficient EC2 pricing. Accessing the US EC2 Spot Markets will allow us to match that flexibility with cost efficiencies we won’t find in our home market or by purchasing equipment.”
- Over the 60-day POC period, SPINVFX leveraged AWS Direct Connect to move over 330TB of data (ingress) into their AWS test VPCs and extracted over 80TB of data (egress) back to on-premise storage.
- Over 17,000 vCores and 32,000 GB of RAM using Amazon EC2 C5 and C4 virtual machines in Canada Central were deployed during a single workload render test.
- SPINVFX successfully defined the connection latency upper limit for performance viability of remote mount burstable rendering over AWS Direct Connect at 8 milliseconds (round-trip delay, AWS VPC to customer).
- SPINVFX deployed over 500 high-performance Amazon EC2 C5 and C4 virtual machines during a single render workload test proving scalability of high-performance computing applications over AWS Direct Connect.
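A little arithmetic puts the figures in the list above in perspective. The sketch below converts the 60-day transfer totals into average sustained rates and divides the compute totals by the instance count. It assumes decimal terabytes, and it assumes the vCore, RAM, and instance figures describe the same render test. Because every source figure is stated as "over", the results are rough lower bounds, and peak rates during active rendering would be higher than a 60-day average.

```python
SECONDS_PER_DAY = 86_400


def average_gbps(terabytes: float, days: float) -> float:
    """Average rate in Gbit/s for moving `terabytes` (decimal TB) over `days`."""
    bits = terabytes * 1e12 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e9


ingress = average_gbps(330, 60)    # data moved into the AWS test VPCs
egress = average_gbps(80, 60)      # data returned to on-premise storage
vcores_per_instance = 17_000 / 500
ram_gb_per_instance = 32_000 / 500

print(f"average ingress: {ingress:.2f} Gbit/s")
print(f"average egress:  {egress:.2f} Gbit/s")
print(f"per instance:    {vcores_per_instance:.0f} vCores, "
      f"{ram_gb_per_instance:.0f} GB RAM")
```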
The following lessons and limitations have been identified through the POC:
- Compared with On-Demand EC2 pricing, the discounts available through Reserved Instance or Spot Market pricing make those billing models better suited to cloud bursting for VFX rendering, in particular for companies looking to displace on-premise infrastructure.
- Availability of EC2 Spot instances in Canada Central was limited during the POC period. Connecting to multiple AWS regions with Spot Markets will provide VFX studios with the capacity and scale required for cost-efficient pricing of EC2 compute optimized VM types needed for VFX rendering workloads.
- Canadian customers will require long-haul connectivity to US AWS regions closest to their traffic’s point of origin meeting the bandwidth and latency performance requirements defined in this POC.
- Testing is required to determine if “Edge Cache Filing Technology” such as Avere vFXT can extend the usability of AWS Direct Connect circuits beyond the current upper limit of 8 milliseconds.
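As a rough way to frame the edge cache question in the last lesson, a simple weighted-average model estimates the expected per-request latency when some fraction of file requests is served from a cache near the render nodes. This is only a sketch under stated assumptions: the function names are hypothetical, the cache and origin latencies in the usage lines are made-up placeholders rather than POC measurements, and the model ignores the fact that every cache miss still pays the full round trip. It does not substitute for the testing the POC calls for.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, origin_ms: float) -> float:
    """Expected per-request latency when `hit_ratio` of requests hit the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * origin_ms


def min_hit_ratio(target_ms: float, cache_ms: float, origin_ms: float) -> float:
    """Smallest hit ratio that keeps expected latency at or below `target_ms`."""
    if origin_ms <= target_ms:
        return 0.0  # the origin already meets the target without a cache
    if cache_ms > target_ms:
        raise ValueError("cache latency alone exceeds the target")
    return (origin_ms - target_ms) / (origin_ms - cache_ms)


TARGET_MS = 8.0  # latency ceiling reported by the POC

# Placeholder values for illustration only, not measured in the POC.
assumed_cache_ms = 1.0
assumed_origin_ms = 20.0

needed = min_hit_ratio(TARGET_MS, assumed_cache_ms, assumed_origin_ms)
print(f"hit ratio needed: {needed:.1%}")
```

Under this model, the further a region sits beyond the 8 millisecond ceiling, the higher the share of requests the cache must absorb, which is why the working set and access pattern of a render job would matter in any such test.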
About Beanfield Metroconnect & Megaport Canada
In partnership, Megaport Canada and Beanfield Metroconnect provide your business with the comprehensive service needed to connect your enterprise to the cloud directly. Available now from both the Beanfield and Megaport Cloud Exchange services, Hyper-Route for AWS VXCs are ultra low-latency private point-to-point services connecting our customers to strategic AWS Direct Connect locations across the United States & Canada. In January 2019, Beanfield Metroconnect was proud to announce with Amazon Web Services Canada that Beanfield was the first telecommunications company in Canada to be awarded an AWS Network Competency.
About Curious Orbit
Curious Orbit is an Advanced AWS consulting and training partner located in the Greater Toronto Area (Oakville). Their team has a demonstrated expertise and focus on providing AWS training, consulting, professional services, and managed services to mid-sized organizations embarking on digital transformation and cloud terraforming journeys.
About the Author:
Daniel Simmons is the Director of Cloud Strategy at Beanfield Metroconnect, a cloud product manager, cloud evangelist, and a solution architect for cloud and data centre services. Daniel has spent the past year overseeing the launch of the Beanfield and Megaport partnership with the goal of bringing multi-cloud connectivity to every office building in the city. He lives in Toronto with his partner and dog.