Running DaVinci Resolve on Linux and the meaning of stability
Configuring the ideal Linux workstation that’s connected to the Internet
Recently Blackmagic Design announced that version 11.0 of its DeckLink driver would work on CentOS 7.6. I inquired as to whether this meant that DaVinci Resolve itself had been tested by the Blackmagic Design team on CentOS 7.6, but received no such confirmation.
That shrug about CentOS 7.6 resurfaced a long-standing question in my mind about how Blackmagic Design develops Resolve on Linux, what versions of CentOS are actually advisable to use, and overall how Linux should best be deployed in a post-production facility as a platform for DaVinci Resolve.
Despite forum threads full of eccentrics porting DaVinci Resolve to all sorts of other Linux distributions, the official word from Blackmagic Design is that they only test and support their very specific customized build of CentOS 7.3, a.k.a. CentOS-7.3-1611. The “1611” in that naming convention refers to the build of CentOS compiled from the upstream open-source code from the release of RHEL 7.3, from November, 2016. Blackmagic Design provides a link to download their specially tweaked build of CentOS 7.3 in their installation instructions alongside the copy of DaVinci Resolve for Linux, which is available for download on their website.
Troubleshooting in the guts of CentOS 7.4
Last year, back in the spring of 2018, with some new HP Z8 workstations, I nearly drove myself crazy troubleshooting what, in the end, turned out to be just a bug internal to Resolve. Fortunately, Blackmagic Design eventually did fix the bug.
After months and months of suffering from GPU issues on a MacPro6,1, first with dual D500s, and then, after a bizarre repair by Apple in which they installed one D500 and one D700, I finally switched to the HP Z8. The terrible GPU issues on the MacPro6,1 had been due to an intrinsically flawed hardware design by Apple — high-end professional software like DaVinci Resolve really does try to use as much as it can from whatever GPUs it can access, and it seems that Apple didn’t seriously anticipate actual professional workloads when designing the MacPro6,1.
I sourced several HP Z8s, each with two M.2 NVMe SSDs. I had HP load one with Windows 10 Pro so that I could run Adobe CC apps. The other M.2 NVMe SSD came blank, ready for me to install Linux for Resolve.
Initially, just referencing the system requirements in the release notes for Resolve, I saw that a minimum of CentOS 6.8 was required. Seemed simple enough, right? Wrong.
I installed CentOS 7.4, figured out how to properly install the NVIDIA driver, installed DaVinci Resolve, and was then horrified to suffer the “Timeout: Waiting for Frame” error.
In the course of troubleshooting that error, I frantically called not just the Blackmagic Design office in the United States, but also the Singapore office and the Japan office. At each office, the very first question from each support rep was the same: was I using the particular build of CentOS 7.3 provided by Blackmagic Design? I wasn’t, although, in the end, that turned out to be irrelevant to my issue: the error was just an internal bug, and Blackmagic Design fixed it in 15.1.1.
Blackmagic Design’s specially tweaked build of CentOS 7.3
In the course of troubleshooting, why was Blackmagic Design so focused on making sure that I was running their one specially tweaked build of CentOS 7.3? Their ISO of CentOS 7.3 is modified in a few important ways so that Resolve can run nearly out-of-the-box:
- It has an NVIDIA driver pre-installed, with the
- For the workstation to act as a PostgreSQL server for other Resolve workstations to connect, for “collaborative workflow” and “remote rendering,” it has: (1) PostgreSQL pre-installed and configured; and (2) the firewall disabled.
Blackmagic Design’s build of CentOS 7.3 has version 390.42 of the NVIDIA driver pre-installed. This is provided because, for the less experienced, installing that driver is a bit of a pain.
Back in 2012, Linus Torvalds, the creator and maintainer of the Linux kernel, crudely castigated NVIDIA for their development process, in a now-infamous moment. [Warning: NSFW language and gesture]
Torvalds’ preference would be for NVIDIA to contribute open-source code into the Linux kernel natively, but NVIDIA prefers to keep their driver code proprietary.
The CentOS Project doesn’t ship NVIDIA’s driver code in any of their official repositories, so one way to get the proprietary NVIDIA driver onto CentOS is to grab a driver’s .run file from NVIDIA’s site and then install it manually. That involves:
- Downloading and installing the kernel development headers;
- Downloading and installing the exact version of gcc that was used to compile the exact version of the kernel installed on the workstation; or, if a newer version of gcc is installed, setting an environment variable so that the NVIDIA installer treats the newer gcc as if it were the older version that was actually used to compile the installed kernel;
- Switching over to one of Linux’s virtual terminals;
- Permanently blacklisting the included
- Loading the driver as a kernel module via DKMS.
This is quite an involved process!
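As a rough illustration of the compiler-matching step above, here is a small Python sketch that pulls the gcc version out of the running kernel's banner so that it can be compared with whatever gcc is installed. It assumes the banner in /proc/version contains the text "gcc version X.Y.Z", as it does on CentOS 7 kernels. The function names and the sample banner are illustrative only, and none of this comes from NVIDIA or Blackmagic Design.

```python
import re
import subprocess


def kernel_gcc_version(banner):
    """Return the gcc version string embedded in a kernel banner, or None."""
    match = re.search(r"gcc version (\d+(?:\.\d+)+)", banner)
    return match.group(1) if match else None


def installed_gcc_version():
    """Ask the gcc on PATH for its version; return None if gcc is missing."""
    try:
        out = subprocess.check_output(["gcc", "-dumpversion"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.decode().strip()


# Illustrative banner in the style of /proc/version on a CentOS 7 kernel.
SAMPLE_BANNER = (
    "Linux version 3.10.0-957.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) "
    "(gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP"
)

if __name__ == "__main__":
    try:
        with open("/proc/version") as handle:
            banner = handle.read()
    except OSError:
        # Fall back to the sample when not running on Linux.
        banner = SAMPLE_BANNER
    print("kernel built with gcc:", kernel_gcc_version(banner))
    print("installed gcc:", installed_gcc_version())
```

If the two values differ, that is the situation in which the installer needs to be pointed at a different compiler. Note that newer gcc releases may print only the major version for -dumpversion, so an exact string comparison is only a first approximation.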
On the “Additional Information” tab on NVIDIA’s page for their driver download, they admit:
Note that many Linux distributions provide their own packages of the NVIDIA Linux Graphics Driver in the distribution’s native package management format. This may interact better with the rest of your distribution’s framework, and you may want to use this rather than NVIDIA’s official package.
On CentOS, one third-party repository that indeed provides NVIDIA driver packages is ELRepo.
The Blackmagic Design build of CentOS 7.3 also includes PostgreSQL pre-installed so that the particular workstation can be used as a PostgreSQL server. Strictly speaking, Resolve by itself doesn’t need any kind of PostgreSQL tools installed to the local boot drive — any Resolve workstation could just access a remote PostgreSQL server running PostgreSQL 9.5. I myself run an Intel NUC as a PostgreSQL server in just this way.
Typically a dedicated PostgreSQL server is on a single LAN in a facility, but Jan Klier has reported that he’s successfully connected Resolve to a PostgreSQL database out on the Internet, hosted on AWS.
Originally, Resolve on Linux required PostgreSQL, either accessing a database local to the boot drive, or a remote server. Nowadays, Resolve on Linux can also use disk databases, which by default are set to a directory on the boot drive, but disk databases can also be put on any persistent local storage volume.
Blackmagic Design’s build also has the firewall disabled entirely. CentOS is typically deployed as a server, often on the open Internet, so the generic upstream build from the CentOS Project has the firewall enabled by default. For a workstation on a LAN protected by a typical firewall, another local firewall probably isn’t necessary. macOS, for example, has a built-in local firewall that’s disabled by default, and that’s fine for many millions of users. For a Resolve workstation on a LAN using “collaborative workflow” features, CentOS’s local firewall that’s enabled by default interferes with the east-west traffic from one PostgreSQL client to another.
Back when I was scratching my head about the “Waiting for Frame” error on the upstream build of CentOS 7.4, before I had properly disabled the workstation’s local firewall, I found that some GUI elements wouldn’t quite work with “collaborative workflow” enabled across different workstations: on the Media page, I wouldn’t see other users in the other bins; on the Color page, I wouldn’t see other users on the other clips; and the chat window wouldn’t really work at all. Later, on CentOS 7.6, I noticed that if I tried to use “collaborative workflow” without disabling each workstation’s local firewall, I’d see a pop-up warning me to check network settings and disable any VPNs.
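When symptoms like these appear, a quick first check is whether one workstation can open a TCP connection to the PostgreSQL server at all. The following Python sketch does only that. It assumes the server listens on PostgreSQL's default port, 5432, and the function name is made up for illustration. It says nothing about any other traffic the collaborative features may rely on, so a passing check doesn't prove that the local firewall is out of the way.

```python
import socket
import sys


def port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Usage: python check_port.py <host> [port]
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 5432
    state = "reachable" if port_reachable(host, port) else "not reachable"
    print("{}:{} is {}".format(host, port, state))
```

Running it from each workstation against the server's address gives a rough picture of where the connection is being blocked.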
A brief history lesson
To understand why Blackmagic Design recommends this one configuration of CentOS 7.3, and why they don’t test or vouch for later versions of CentOS, we need to understand a little history.
Back in August of 2018, Ramy Katrib of DigitalFilm Tree gave a great presentation to LACPUG that partially focused on Resolve’s history, going all the way back, before da Vinci Systems’ bankruptcy and its subsequent fire sale to Blackmagic Design. It’s worth watching the whole presentation to get a sense of where Resolve came from, and where it’s going.
Take a look at the official hardware and configuration guide for Resolve 15. Blackmagic Design writes:
We recommend Mac or Windows systems for their ease of configuration and use, their simple files systems (sic) and off the shelf components that generally plug and play. In comparison, Linux systems have a very rudimentary operator desktop, have the most restrictive hardware options, require exacting configuration and also a high performance file system.
The Linux OS does however offer an extremely reliable and powerful platform for facilities that consistently need high performance with high-resolution files from various codecs and is therefore the system of choice for the largest facilities in Hollywood and beyond.
From seeing Katrib’s presentation and reading the language in the hardware and configuration guide, you can start to sense how Blackmagic Design treats software development for Resolve on the Linux platform.
Resolve used to be exclusively deployed on CentOS or RHEL in a big iron configuration, in suites costing several hundreds of thousands of dollars. Resolve was initially developed to replace color timing suites and film laboratories. It was a groundbreaking way to switch from the analog film world of yesteryear to the digital, file-based workflows of today. Software is eating the world, and the post-production industry is no exception.
Blackmagic Design’s strategy for Resolve to disrupt the post-production industry has been composed of several interdependent ingredients:
- Switching from selling entire turnkey systems to just selling software that users install onto their own hardware;
- Heavy reliance on GPUs for image processing;
- Cutting the price from $600,000 down to a $999 version and a free version, and then later, even further down to a $299 version and a free version; and
- Porting the software from RHEL/CentOS to Mac and Windows.
Each of these ingredients allowed Resolve to skyrocket to more than a million users by version 12.5 in 2016.
Despite the explosive growth of the number of users on the Mac and Windows builds, Blackmagic Design didn’t abandon development on Linux. For a few years, they just limited it to owners of the $30,000 Advanced Control Panel. Given that properly configuring Linux usually requires an IT professional, and that Linux is the OS capable of running a supercomputer with up to eight GPUs, it just wasn’t a priority to open it up to anyone else. Also, my guess is that Blackmagic Design probably didn’t want to expend support resources on tedious questions from inexperienced Linux users.
How Blackmagic Design approaches Resolve for Linux
Blackmagic Design seems to expect that the typical Linux workstation for Resolve is in a kind of high-end, Hollywood-caliber post-production facility, with its own IT support staff, and even an air gap. They expect that these workstations are purpose-built for DaVinci Resolve.
What are the implications of such an environment? It’s completely understandable as to why they don’t vouch for the latest NVIDIA drivers. It’s also understandable as to why they continue to provide dongles, even though they began providing license keys. License keys allow users to activate “Studio” builds of DaVinci Resolve through the Internet, but that won’t suffice for post-production facilities with an air gap. DaVinci Resolve Product Manager Peter Chamberlain explained,
But some systems cannot be connected to the internet ever, even when initially registering, because these systems are doing high end feature films and other secure type of work. So to handle that, we will continue to offer the dongle version of DaVinci Resolve Studio. This version is called DaVinci Resolve Studio Dongle so it can be identified and it’s also available for order if you need that version.
Limitations of Blackmagic Design’s build of CentOS 7.3
For purpose-built workstations in a facility with an air gap, Blackmagic Design’s build of CentOS 7.3 is fine. However, I want my Linux workstations connected to the Internet, so I’ve found Blackmagic Design’s build somewhat limiting, in a variety of ways:
- The ISO for the build is missing the isohybrid feature, so the ISO can only be used to create bootable DVDs.
- On the HP Z8, it can’t use UEFI. It can only use legacy BIOS.
- Any kind of OS update immediately breaks the system. I don’t know why, but I think it has something to do with the NVIDIA driver. I made this mistake once, and I couldn’t get the GUI to start at boot. I don’t know if it was X or the NVIDIA driver or something else. I didn’t bother troubleshooting where in the stack it went wrong. I ended up just reinstalling the whole OS, with my tail between my legs. That was a lesson painfully learned. Blackmagic Design explicitly warns against updating their build, and they even configure it so that it won’t ask for updates, although there’s a little checkbox in the window when rebooting or shutting down that does tantalizingly offer to install system updates.
- Because the system can’t be updated, there are serious security problems with connecting it to the Internet. Take a quick glance at Red Hat’s list of security advisories for RHEL. Red Hat releases tons of security patches and bug fixes constantly. For a workstation actually connected to the Internet, even behind a firewall protecting the whole LAN, it just won’t do to ignore all those security patches.
- Google Chrome can be installed manually whenever there’s a new version, but the stable release channel is updated fairly frequently, and it’s a bit annoying to constantly have to go to the Google Chrome website and download and install a new .rpm for each release. Sometimes there are urgent fixes for catastrophic CVEs, and you want to reduce as much friction as possible to make sure that users update. It’s much easier in the long run to let the system’s RPM Package Manager ping the Google repository and automatically download and install the latest versions, but if you can’t actually update the system, you won’t be able to make use of the Google repository.
- I couldn’t successfully get hfsplus-tools from the Nux Desktop repository to work. I also couldn’t successfully install Paragon’s tools for NTFS and HFS Plus. I think this had something to do with the software requiring newer kernels and/or other newer pieces of software in the distribution. Not being able to mount HFS Plus is a headache in the world of post-production. Many different production teams still rely heavily on Macs and HFS Plus, and I need to be able to mount production drives and ingest media with DaVinci Resolve’s Clone Tool or rsync, without having to reboot and switch back to Windows.
- It comes with Python 2.7.5 installed, but I couldn’t successfully install Python 3.6. I asked in the Blackmagic Design Fusion forum, the We Suck Less forum, and on Twitter, but never figured it out. Python 2.7 is EOL on January 1st, 2020, so I don’t want to invest too much time tinkering with scripts on 2.7.
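On the first limitation in the list above: one rough way to tell whether an ISO has been processed by isohybrid is to look for an MBR boot signature at the end of its first 512-byte sector. ISO 9660 leaves the first 32 KiB of an image unused, and a hybrid image places a boot record there. The Python sketch below is a heuristic under that assumption, not a definitive test. The function name is made up, and the path passed on the command line is whatever image you want to inspect.

```python
import sys


def looks_isohybrid(path):
    """Heuristic: True if the first sector ends with the MBR signature 0x55AA."""
    with open(path, "rb") as image:
        first_sector = image.read(512)
    # Bytes 510 and 511 of a valid MBR are 0x55 and 0xAA.
    return len(first_sector) == 512 and first_sector[510:512] == b"\x55\xaa"


if __name__ == "__main__":
    # Usage: python check_iso.py /path/to/image.iso
    if len(sys.argv) > 1:
        verdict = "may be" if looks_isohybrid(sys.argv[1]) else "does not look"
        print("{} {} a hybrid image".format(sys.argv[1], verdict))
```

An image that fails this check can generally only be burned to optical media, which matches the behavior described in that bullet.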
The allure of upstream CentOS
For about a year, I’ve been wondering if I could alleviate some or all of the frustrations I’ve had with Blackmagic Design’s build of CentOS 7.3 by switching to upstream CentOS. I knew that if I were to try it, Blackmagic Design might shrug were I to encounter any strange unforeseen issues. Still, I figured that the fact that some users had been clamoring for a DeckLink driver for CentOS 7.6 must have been some kind of indication that folks really have deployed Resolve on upstream CentOS out in the wild.
Maybe, even though Blackmagic Design wouldn’t vouch for upstream CentOS, it would still be stable, predictable, and ready for a production environment anyway.
I went ahead and installed upstream CentOS 7.6-1810, and installed DaVinci Resolve on it. Happily, for the most part, it all works better than ever. I keep notes on how to do this over on GitHub Pages, because software is constantly changing, and it’s great to be able to merge pull requests from anyone who might notice any out-of-date or incorrect information.
The meaning of stability
I thought it would be worth risking a deployment of upstream CentOS, because of how I understand the explicit development philosophy around the Linux kernel, RHEL, and CentOS.
Linus Torvalds has stated that the “first rule of kernel maintenance” is to “never break [user space].” The Linux kernel maintainers always aim to ensure, however they modify the kernel, that applications will continue to work.
This approach is generally what software developers mean by “stability,” as codified by ISO/IEC 9126. In this context, stability is “the capability of the software product to avoid unexpected effects from modifications of the software.” What would the Platonic ideal of kernel stability look like? It would mean that every time a new kernel gets released, and every time a Linux distribution pushes that new kernel out to users, all applications already installed on the Linux system in user space would continue to function flawlessly. In practice, it’s never perfect, but it is a principal goal.
Down in the kernel, the maintainers strive for stability, but what about up in user space, in RHEL/CentOS? Red Hat officially espouses stability:
One of the core goals of the Red Hat Enterprise Linux family of products is to provide a stable, consistent runtime environment for third-party applications. To support this goal, Red Hat seeks to preserve application binary compatibility, configuration file compatibility, and data file compatibility for all package updates issued within a major release. For example, a package update from Red Hat Enterprise Linux 6.1 to Red Hat Enterprise Linux 6.2, or a package update that fixes an identified security vulnerability, should not break the functionality of deployed applications as long as they adhere to standard Application Binary Interfaces (ABIs).
Within the RHEL ecosystem, you can even fine-tune yum so as to only install security updates, if you’re particularly conservative and want to minimize changes to the system while still responsibly patching to squash dangerous CVEs.
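As a minimal sketch of what that might look like in a maintenance script, the Python helper below only builds the command line for a security-only update, so that it can be reviewed before anyone runs it. It assumes yum's --security option is available and that the configured repositories publish the advisory metadata that option depends on, which is worth verifying on any given system. The function name is hypothetical.

```python
def security_update_command(assume_yes=False):
    """Build the argv for a security-only yum update."""
    argv = ["yum", "--security", "update"]
    if assume_yes:
        # -y answers yes to prompts, for unattended runs.
        argv.insert(1, "-y")
    return argv


if __name__ == "__main__":
    # Print rather than execute, so the command can be inspected first.
    print(" ".join(security_update_command()))
```

Keeping the command construction separate from its execution makes it easy to log or dry-run the exact invocation on a production workstation.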
The fundamental question for a sysadmin of a post-production facility considering deploying DaVinci Resolve on upstream CentOS is: to what degree does Blackmagic Design actually adhere to Red Hat’s formal guidelines for stability for their different software and hardware products? Red Hat does indeed have formal designations called “compatibility levels” that communicate how stable any particular third-party package or library really is, but Blackmagic Design just seems stuck on their one build of CentOS 7.3. After all, Desktop Video 11.0 was only released on March 5, 2019, but CentOS 7.6 was released back on December 3, 2018. I also recently ran into some frustrating challenges using the Mini Panel on CentOS 7.6.
I have no formal relationship with Blackmagic Design. I’m just an especially enthusiastic user and outside observer, but I am puzzled and frustrated at how they’re willing to test DaVinci Resolve on Mac and Windows, and provide support for those millions of Mac and Windows configurations, but won’t consistently test and support CentOS as it gets released. Perhaps Blackmagic Design will eventually put more resources toward Linux support, as Resolve grows in popularity and as Apple continues to abandon professional hardware. DaVinci Resolve is an amazing product, and my only wish for it is to improve even more. I have high hopes for whatever Blackmagic Design announces at NAB 2019.