Reverse Engineering the GoPro Cineform codec
Following the fine traditions of great codec reverse engineers it has now become the norm to write up the trials and tribulations of the process.
Basically, for many years nobody in the open-source world really cared about the Cineform codec (aka CFHD, Cineform HD etc.). It’s a niche wavelet codec that fails to decode if uploaded to YouTube, Facebook or Vimeo, so it clearly isn’t important enough for these sites to go and license a proprietary decoder. But recently GoPro bought Cineform and made it the default output in GoPro Studio, their editor, and Adobe Premiere added it as an output option. It appears the main goal of using CFHD is to let people edit 4K files easily on underpowered hardware and/or systems with poorly performing proprietary (no alliteration intended) H264 decoders (seriously, just use libavcodec, it’s far better than anything anyone will license to you).
Worse still, it has what appears to be a very convoluted binary decoder when looked at in a disassembler, making the process possibly very painful, and so the community worked on supporting lower-hanging fruit instead.
This all changed in 2014 with the following headline: “GoPro® CineForm Codec Standardized by SMPTE® as the VC-5 Standard”. Unfortunately the open-source world doesn’t like paying SMPTE for standards developed by a secret club (more on that another day). But nonetheless it looked like these files could be decoded one day by someone who felt like stumping up the cash. Unfortunately, about 1.5 years later, for $dayjob reasons I had to become a SMPTE personal member to get hold of some other standards and so I now had access to all of them.
The contents of SMPTE ST 2073 looked sane: there was a description of the bitstream format based around 4-byte tag/value combinations, entropy coding, dequantisation and the transform, some of which would have taken ages to figure out from the binary. The document was in many ways similar to the BBC’s Dirac codec and could easily have been implemented in a weekend. However, real-world files didn’t seem to match a lot of the document, which was odd. All of this was buried in the small-print of course: “the core technology behind the GoPro CineForm Codec has been standardized [sic]”. So in reality existing files were not decodable by a VC-5 decoder.
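To make the tag/value layout concrete, here is a minimal Python sketch of walking such a stream. It assumes each 4-byte unit is a big-endian signed 16-bit tag followed by an unsigned 16-bit value; that split and the byte order are assumptions for illustration, and the tag numbers in the sample are made up rather than taken from the spec or from real files.

```python
import struct

def iter_tag_values(buf):
    """Yield (tag, value) pairs from consecutive 4-byte units.

    Assumption: big-endian, signed 16-bit tag then unsigned 16-bit value.
    """
    usable = len(buf) - len(buf) % 4  # ignore a trailing partial unit
    for off in range(0, usable, 4):
        yield struct.unpack_from(">hH", buf, off)

# Made-up tag numbers, purely for illustration.
sample = bytes([
    0x00, 0x01, 0x07, 0x80,  # tag 1, value 1920
    0x00, 0x02, 0x04, 0x38,  # tag 2, value 1080
    0xFF, 0xBB, 0x00, 0x01,  # tag -69 (a signed read allows negatives), value 1
])
pairs = list(iter_tag_values(sample))
```

A loop like this is also a handy way to dump every tag in a sample file and see which ones the document never mentions.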
In practice what would happen is you’d be forced to use a specific vendor’s decoder so that your legacy CFHD files would play along with your new Standardised® files. It’s a game many vendors play with SMPTE, intentionally or otherwise: they produce a document that only explains part of how to implement things in the real world, but the creator of the technology can go and claim their product is standards-based. You can only rely on vendor-specific implementations to actually have things work as intended. AVC-Intra is a good example of this; I did huge amounts of reverse-engineering to add this to x264.
Actual Reverse Engineering
So the main reverse engineering started during my spare time on trains. A couple of the documented tags were present, but a few of them didn’t behave as described in the spec and there were dozens of undocumented ones. The binary did explain what a few of them were though:
The spec mentioned that lowpass coefficients (in this case a 1/64th sized version of the frame) were written without quantisation. By luck one of the samples I had was largely a single colour and so it was easy to locate those in a hex editor:
From there the actual tags to demarcate coefficients could be found (0x00 0x04 0xf 0xf) and the first picture could be decoded (that’s what a landline phone used to look like for those under the age of 16):
The spec said coefficients were Run-length (RL) Variable-length-coded (VLC) and thankfully libavcodec already has support for doing fast decoding. It was pretty simple to borrow existing code from the DV decoder to generate signed RL-VLC tables. Interestingly, the decoder had never supported codewords as long as CFHD’s before (up to 26 bits) and so I had to send a patch for that.
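For anyone who hasn’t met RL-VLC before, here is a toy Python sketch of the idea. The codebook, the sign-bit convention (one bit after every non-zero magnitude) and the escape pattern are all invented for illustration and are not the CFHD tables; a real decoder such as libavcodec uses table lookups rather than matching one bit at a time.

```python
# Toy codebook: bit pattern -> (run, magnitude). NOT the CFHD tables.
TOY_CODEBOOK = {
    "0":    (1, 0),  # a single zero
    "10":   (1, 1),  # magnitude 1, sign bit follows
    "110":  (4, 0),  # run of four zeros
    "1110": (1, 2),  # magnitude 2, sign bit follows
}
ESCAPE = "1111"  # hypothetical end-of-band codeword

def decode_rl_vlc(bits):
    """Decode a string of '0'/'1' characters into a list of coefficients."""
    coeffs = []
    code = ""
    pos = 0
    while pos < len(bits):
        code += bits[pos]
        pos += 1
        if code == ESCAPE:
            return coeffs
        if code in TOY_CODEBOOK:
            run, mag = TOY_CODEBOOK[code]
            if mag:
                if pos >= len(bits):
                    raise ValueError("missing sign bit")
                # Assumed convention: sign bit 1 means negative.
                if bits[pos] == "1":
                    mag = -mag
                pos += 1
            coeffs.extend([mag] * run)
            code = ""
    raise ValueError("ran out of bits before the escape codeword")

# +1, four zeros, -2, one zero, escape
decoded = decode_rl_vlc("100" "110" "11101" "0" "1111")
```

The run entries are where the compression comes from: highpass bands are mostly zeros, so one short codeword can stand in for many coefficients.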
Some files didn’t seem to have the expected escape codeword and it was clear they had a different codebook. Thankfully the binary decoder showed us exactly what the codewords were:
Strangely, they start at table 9 (what happened to 0–8?), table 17 seems to be the same as 18, which is the one in the spec. Also table 9 seemed to have 3 escape codewords, or perhaps special codewords — not really sure what’s going on there.
Once all the coefficients were decoded, it was a case of implementing the dequantisation and decompanding and then the separable 2/6 wavelet, both as described in the spec. In this part the spec was pretty good (except for the alignment discussed below). The only thing was that the output was very dark. The spec talked about doing a pre-scale shift of one of the transform outputs. This seemed to improve the picture, but the suggested level appeared to be the wrong one: samples looked better when you shifted at a different level. EDIT: turns out the spec is fine about this part, though it merely suggests you do the shift when it’s actually mandatory.
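To give a feel for what a 2/6 wavelet is, here is a one-dimensional Python sketch: the lowpass uses 2 input samples and the highpass depends on 6. This is a generic reversible lifting version written for illustration; the rounding, the sign of the prediction term and the edge handling (clamping to the nearest neighbour) are my assumptions and are not the exact filters from ST 2073 or FFmpeg. “Separable” just means a 1D pass like this is run along the rows and along the columns.

```python
def _predict(low, i):
    # Correction from the neighbouring lowpass samples.
    # Edge handling by clamping is an assumption; real codecs tend to
    # use dedicated border filters instead.
    prev = low[max(i - 1, 0)]
    nxt = low[min(i + 1, len(low) - 1)]
    return (prev - nxt + 4) >> 3

def forward_26(x):
    """Split an even-length list of ints into lowpass and highpass halves."""
    assert len(x) % 2 == 0
    half = len(x) // 2
    low = [x[2 * i] + x[2 * i + 1] for i in range(half)]
    high = [x[2 * i] - x[2 * i + 1] - _predict(low, i) for i in range(half)]
    return low, high

def inverse_26(low, high):
    """Rebuild the original samples from the two halves."""
    out = []
    for i in range(len(low)):
        diff = high[i] + _predict(low, i)  # recovers x[2i] - x[2i+1]
        out.append((low[i] + diff) >> 1)   # sum and diff share parity, so exact
        out.append((low[i] - diff) >> 1)
    return out

samples = [10, 12, 13, 9, 200, 202, 5, 0]
low, high = forward_26(samples)
restored = inverse_26(low, high)
```

Note that the lowpass here is a sum rather than an average, so values grow at every level unless something scales them back, which is the kind of thing a shift between transform levels takes care of.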
And there is the final picture playing nicely in FFplay:
The rest is pretty much tidying things up before mailing list review. It is quite tedious and boring so I wrote this blog post instead.
Interesting things not in the spec
- An interesting feature where coefficients were pre-aligned to mod-8 widths to presumably allow for SIMD optimisations. It was a bit confusing at first to see these extra coefficients but aligning the width to mod-8 made the calculations always give the correct number of coefficients. This saves the trouble of having to extract all your coefficients and then add the padding back or track where the padding should be during the extraction process; shame it was dropped. The cost of coding that padding is negligible.
- How to identify pixel formats (see below)
- Any of the metadata in dozens of tags
- A bunch of tags have duplicated or unnecessary meanings, not entirely sure why
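The mod-8 alignment from the first point above is easy to show in Python. This is a small sketch with hypothetical function names; it assumes only the width is padded, which is what the samples suggested.

```python
def align8(n):
    """Round n up to the next multiple of 8."""
    return (n + 7) & ~7

def padded_coeff_count(width, height):
    # Assumption: each row of a subband is padded to a mod-8 width,
    # so the coded coefficient count uses the aligned width.
    return align8(width) * height

counts = [align8(90), align8(96), padded_coeff_count(90, 4)]
```

With a 90-wide band, each row carries 6 padding coefficients, and the decoder can read straight into an aligned buffer without tracking where the padding starts.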
Interesting things in the spec
- One of the goals of wavelet codecs is to let decoders choose to decode only certain subbands in order to produce a lower-resolution picture much faster. Unfortunately the CFHD bitstream appears to have no way of telling you how large the highpass coefficient data is going to be, which means you have to do RL-VLC decoding and then discard all the data. VC-5 adds a length which makes that much easier. In principle you could look for the end of coefficients tag but that might well occur in the coefficient data, not sure if it’s escaped or guaranteed not to happen à la MPEG-2.
- There’s also a general fixup of issues in the bitstream format
- There is at least one older CFHD file with more than the 10 mandatory subbands that all other samples seem to have. Not sure exactly what’s going on with that one, the luma looks ok but the chroma is totally broken. It also has a tag to mark a frame as a repeat which suggests they were trying to make a higher compression variant. Has a different transform-type flagged too.
- It’s not really clear how to distinguish pixel formats (RGB, YUV, Bayer etc.). There’s a tag at the beginning of the file but I only have a few samples to compare. At the moment only YUV422 samples work, which is the majority of the ones I have. EDIT: YUV422 and RGB work now, RGBA dependent on a new FFmpeg pixel format. Bayer at some point.
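The 1/64th-sized lowpass mentioned earlier and the 10 subbands are both what you would get from a three-level 2D decomposition, with three highpass bands per level plus one final lowpass. That is my inference from the numbers rather than something quoted from the spec, and the band names below are hypothetical. A quick Python sketch of the arithmetic:

```python
def subband_layout(width, height, levels=3):
    """List (name, width, height) for each subband of a 2D wavelet pyramid.

    Assumption: every level halves both dimensions and emits three
    highpass bands; band names are made up for illustration.
    """
    bands = []
    w, h = width, height
    for level in range(1, levels + 1):
        w, h = w // 2, h // 2
        for name in ("HL", "LH", "HH"):
            bands.append((f"L{level}-{name}", w, h))
    bands.append(("lowpass", w, h))
    return bands

layout = subband_layout(1920, 1080)
lowpass = layout[-1]
```

For a 1920x1080 plane the lowpass comes out at 240x135, which is exactly 1/64th of the pixels, and a decoder that stops early gets a thumbnail essentially for free.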
Thanks to Kostya Shishkov for his advice and to Steinar Gunderson for providing some sample files. 2016 is the year of CFHD on the Linux desktop!