Road to the VideoCoin Testnet — Part 1

Published in VideoCoin, Oct 18, 2018

I am very excited to provide a detailed update on VideoCoin Network development and our roadmap to launching a testnet.

TL;DR: We’ve made a lot of progress on our testnet, and we are on track to launch it in Private Alpha at the end of January 2019, as previously noted on our Roadmap. We are also working towards a preview of the testnet that will be available earlier for a sneak peek.

Since there is a lot of material to cover, I will write this as a five-part series, split as follows:

Proof of Transcoding: Research and Development Update
Distributed Encoder/Live Video Streaming Platform Development Update
Wallet and toolchain development update
Architecture overview of the VideoCoin testnet
Beyond the testnet, path to Beta and Mainnet

And we will conclude by looking at hiring, opportunities, challenges and our plans to mitigate these.

We are also planning a meetup here in San Jose on November 9th (more details coming soon), where we plan to demo our live streaming platform (NOT our testnet).

Proof of Transcoding (PoT): Development and Testing

As a recap from our white paper, proof of transcoding is a way to establish whether a transcode operation — a fundamental operation in video infrastructure — was properly completed or not.

When we first set out to implement PoT earlier this year, we implemented a model based on bitstream analysis. The original architecture was as below.

This bitstream method worked, but it had several serious limitations:

Both source and destination encoders needed to know exact frame numbers, without which an exact bitstream match is not possible.
Encoders were forced to use CQP settings, as CRF or VRF would make bitstreams vary based on the present encoder state. The only way to work around this problem was to recreate the encoder state. More on this later.
The implementation was codec-dependent, meaning that every codec needed its own version of the header parsing methods.

Sample diff operation from a bitstream compare method. Both streams are the same.
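To illustrate why an exact bitstream match is so strict, here is a minimal sketch of a byte-exact compare using chunked digests. It is not the VideoCoin implementation: the function names are hypothetical, and it compares whole byte streams rather than parsing codec headers.

```python
import hashlib
import io


def stream_digest(stream, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def bitstreams_match(reference, candidate):
    """True only if both streams contain exactly the same bytes."""
    return stream_digest(reference) == stream_digest(candidate)


# In-memory stand-ins for two encoded chunks (hypothetical data).
same = bitstreams_match(io.BytesIO(b"\x00\x00\x01\x65abc"),
                        io.BytesIO(b"\x00\x00\x01\x65abc"))
different = bitstreams_match(io.BytesIO(b"\x00\x00\x01\x65abc"),
                             io.BytesIO(b"\x00\x00\x01\x65abd"))
```

A single differing byte makes the compare fail, which is why any encoder setting that depends on encoder state breaks this approach.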

To overcome these limitations, we have rewritten our PoT code using an additional perceptual hash check. We briefly covered perceptual hashing in our white paper as a means to establish proof of retrievability, but this excellent research thesis by Christoph Zauner gives a great in-depth explanation of pHash and its applications.

We rewrote PoT to use the pHash implementation by Evan Klinger & David Starkweather.

The pHash implementation itself is thoroughly tested and documented by Evan and David; the documentation can be found here. It includes a DCT Variable Length Video Hash, which extracts key frames from a video and forms a similarity index.

We wanted to implement a faster and less compute-intensive version of the video hash function, and also to evaluate how the pHash function behaves in one of the most common use cases: bitrate and resolution reduction.
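For readers unfamiliar with DCT-based perceptual hashing, here is a minimal pure-Python sketch of the general idea for a single grayscale frame: keep the low-frequency DCT coefficients and threshold them at their median. It is an illustration only, not the pHash library's code or API; the function names, the 32x32 frame size, and the 8x8 coefficient block are assumptions.

```python
import math

N = 32      # assumed frame size after downscaling to N x N grayscale
BLOCK = 8   # assumed size of the low-frequency coefficient block


def dct_low_block(pixels):
    """Unnormalized 2D DCT-II, computing only the BLOCK x BLOCK low frequencies."""
    cos = [[math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
           for u in range(BLOCK)]
    # Row pass: 1D DCT of every row, keeping BLOCK horizontal frequencies.
    rows = [[sum(pixels[y][x] * cos[v][x] for x in range(N)) for v in range(BLOCK)]
            for y in range(N)]
    # Column pass: 1D DCT down each kept column.
    return [[sum(rows[y][v] * cos[u][y] for y in range(N)) for v in range(BLOCK)]
            for u in range(BLOCK)]


def frame_hash(pixels):
    """Return a 63-bit integer hash: one bit per non-DC coefficient."""
    coeffs = [c for row in dct_low_block(pixels) for c in row][1:]  # drop DC term
    median = sorted(coeffs)[len(coeffs) // 2]
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Because the DC term is dropped and bits are relative to the median, a uniform brightness shift barely changes the hash, while a structurally different frame flips many bits.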

Optimized Video pHash for Proof of Transcoding

We started with these two videos of Crater Lake, shot on our Live Planet Camera in 4K Stereoscopic 3D.

Still from the source video 2. H264, 4096x2160, ~50Mbps

We then integrated pHash into our PoT implementation (see architecture below) for some pretty impressive results.

Hash distance over bitrate change

Input Video: H264, 4096x2160; Mining Parameters: -vf scale=1280:720

Hash distance over resolution change

Input Video: H264, 4096x2160; Mining Parameters: -vf scale={1920:1080,1280:720,640:360}
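The mining parameters above map directly onto ffmpeg's -vf scale filter. The sketch below only builds the argument lists for the three target resolutions and does not run ffmpeg; the file names and the helper name are hypothetical.

```python
def build_scale_command(source, width, height, output):
    """Build an ffmpeg argument list that rescales `source` to width x height."""
    return [
        "ffmpeg",
        "-i", source,
        "-vf", f"scale={width}:{height}",
        output,
    ]


# Resolutions taken from the mining parameters in the test above.
LADDER = [(1920, 1080), (1280, 720), (640, 360)]

commands = [
    build_scale_command("source_4k.mp4", w, h, f"out_{h}p.mp4")
    for w, h in LADDER
]
```

On a machine with ffmpeg installed, each list could be passed to subprocess.run(cmd, check=True) to produce the downscaled outputs.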

Hash distance over codec change

Analysis

We ran these tests over hundreds of variations of source and destination formats and established the following:

Hash distance increases sharply at very low bitrates. This is not really a problem, because we took a source video at 4096x2160 resolution and 50Mbps and cut it down to 50, which never happens in the real world.
The anomalous behavior at higher bitrates, where the hash distance changes by 2, can be explained by the fact that when ffmpeg extracts a frame from the destination stream, it sometimes seeks ahead to align with a frame boundary. This may cause some similarity drift, which increases the hash distance.
A pHash threshold of ≤2 seems to be a good target for a successful compare operation.
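A threshold check of this kind can be sketched in a few lines. This assumes the hash distance is a Hamming distance between integer hashes, which the post does not state explicitly; the function and constant names are hypothetical.

```python
MAX_HASH_DISTANCE = 2  # threshold suggested by the analysis above


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


def frames_match(source_hash, output_hash, max_distance=MAX_HASH_DISTANCE):
    """Accept the compare if the hashes differ in at most max_distance bits."""
    return hamming_distance(source_hash, output_hash) <= max_distance


# Toy 8-bit hashes for illustration only.
source = 0b10110000
close = 0b10110011   # differs from source in 2 bits
far = 0b01000011     # differs from source in 6 bits
```

With these toy values, frames_match(source, close) passes and frames_match(source, far) fails.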

Performance

Both the bitstream analysis method and the pHash method took approximately 80 milliseconds to complete on a 4-core, 7th-generation Core i5 CPU. That is about 12.5 transactions per second (1000 ms / 80 ms) on a low-end CPU!

Attack vectors

Although using pHash looks great on paper, it does not come free of cost. It opens up many attack vectors for PoT itself, and we needed new strategies to come up with proper defenses.

No-transcode attack: The hash distances at various bitrates are very close. This means an attacker can simply forward the source file without transcoding it, and PoT would still report that the mining operation was successful. Solution: although this looks like a daunting challenge at first, the easiest defense is for a verifier node validating PoT to estimate the expected chunk size from the mining parameters and compare it with the destination file size. Additionally, the verifier can occasionally replicate the encoder state and check for an exact bitstream match.
Repeat-frame attack: An attacker can pick a random frame and repeat it throughout the video in the hope of getting a close enough hash distance. This is especially likely to work on video-surveillance-type footage, where frames do not change much.
Crop, cut, watermark, and other advanced encoding operations such as captions and overlays: If any such advanced features are added to the mining parameters, the pHash distance will most likely be very large and PoT will trigger a false negative. The solution is again to replicate the encoder state and do a bitstream match.
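The size check proposed for the no-transcode attack can be sketched as follows. This is a rough illustration under stated assumptions, not the verifier's actual logic: it assumes the mining parameters include a target bitrate and the chunk duration is known, and the 25% tolerance and all names are hypothetical.

```python
def expected_chunk_bytes(target_bitrate_bps, duration_seconds):
    """Estimate the output size in bytes from target bitrate and duration."""
    return target_bitrate_bps * duration_seconds / 8


def size_is_plausible(actual_bytes, target_bitrate_bps, duration_seconds,
                      tolerance=0.25):
    """True if the actual size is within `tolerance` of the estimate.

    The 25% default is an arbitrary assumption for illustration.
    """
    expected = expected_chunk_bytes(target_bitrate_bps, duration_seconds)
    return abs(actual_bytes - expected) <= tolerance * expected


# A 10-second chunk with a 2 Mbps target should be about 2,500,000 bytes.
honest_ok = size_is_plausible(2_600_000, 2_000_000, 10)
# Forwarding the untouched 50 Mbps source gives about 62,500,000 bytes.
forwarded_ok = size_is_plausible(62_500_000, 2_000_000, 10)
```

Here the honest output passes and the forwarded source fails, even though their perceptual hashes would be nearly identical.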

Conclusion

Our re-implementation of PoT has resulted in massive performance gains, and perceptual hash calculation opens up a whole new arena of possibilities, including precomputing hashes for client-free verification.

What to expect in the Next Part

We will discuss in depth our implementation of our distributed encoder. I plan to make it a presentation/screencast so we can witness the massively parallel distributed encoder in action.