Media Streaming with Distributed Storage

There are several Distributed Storage Solutions offering many benefits over Cloud offerings. I have previously written an article about some benefits of distributed storage. I also wrote an article comparing the efficiency and features of various distributed platforms.

One potential disadvantage compared to Cloud, however, comes from the additional complexity of ‘Sharding’ or Erasure Coding (EC) data, which some solutions use. Because of it, sharing public data, and media files in particular, is no longer so easy: a middle layer is needed to process and serve the files.

This becomes particularly apparent when serving large media files. In most cases, the entire file needs to be downloaded and reconstructed from the parts held on their respective storage hosts before the file can begin to be ‘served’ to the user.

The nature of (streaming) media files

During playback, the data expected to play in the next few seconds or even minutes is buffered in advance, so the user experiences smooth, uninterrupted playback.

While certain protocols such as HLS effectively split a video stream into small separate segments (files), this requires a CPU-intensive conversion process and therefore adds another layer of complexity.

While this remains an option in some situations, provided the file uses a modern media format, my main focus is to serve the original file as efficiently as possible without such processing.

HTML5 Browser Media Streaming

The browser's initial request (for bytes 0-) has multiple purposes:-

  1. The response headers should specify the full file size, which may influence subsequent data requests.
  2. It would also usually return an initial chunk of data.
  3. The method and speed of the response to the initial chunk of data *may* also influence subsequent data requests.
  4. This initial chunk of data may contain metadata, bit-rate etc., which may also influence subsequent data requests.
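To make the server side of that first request concrete, here is a minimal Python sketch that parses a single `Range` header and builds the headers of a `206 Partial Content` response. The function names are hypothetical, and only simple `bytes=start-end` ranges are handled (no suffix or multi-range requests); it illustrates the HTTP mechanics rather than the author's actual server code.

```python
import re
from typing import Dict, Tuple

# Matches a single byte range such as "bytes=0-", "bytes=0-1" or "bytes=946888-1234566".
_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d*)$")


def parse_range(header: str, file_size: int) -> Tuple[int, int]:
    """Return the requested (start, end) byte offsets, end inclusive.

    An open-ended range ("bytes=0-") runs to the last byte of the file.
    Suffix ranges ("bytes=-500") and multi-range requests are not handled.
    """
    match = _RANGE_RE.match(header.strip())
    if not match:
        raise ValueError(f"unsupported Range header: {header!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else file_size - 1
    if start >= file_size or end < start:
        raise ValueError("range not satisfiable")
    return start, min(end, file_size - 1)


def partial_headers(start: int, end: int, file_size: int, mime: str) -> Dict[str, str]:
    """Headers for a 206 Partial Content response carrying bytes start..end."""
    return {
        "Content-Type": mime,
        "Accept-Ranges": "bytes",
        # The total after the slash is how the browser learns the full file size.
        "Content-Range": f"bytes {start}-{end}/{file_size}",
        "Content-Length": str(end - start + 1),
    }
```

For example, `partial_headers(0, 131071, 1234567, "video/mp4")` yields `Content-Range: bytes 0-131071/1234567`, which is how the browser learns the total size of 1234567 bytes.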

So a typical scenario (responses aligned with 128kB chunks):-

[Browser] Requests bytes 0-

[Server] Response (partial) bytes 0–131071 file size 1234567

[Browser] Requests bytes 131072–1234566

(So at this point, particularly if the file is small, it will request the entire remainder of the file, but it is happy with whatever chunk size we give.)

[Server] Response (partial) bytes 131072–262143

[Browser] Requests bytes 262144–1234566

(As you can see, it just keeps requesting the next chunk after the last byte given. Of course, if someone skips ahead in the track, the request will start from an arbitrary point, e.g.)

[Browser] Requests bytes 946888–1234566

(I might respond to this with another chunk-end aligned response)

[Server] Response (partial) bytes 946888–1048575
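The chunk-end alignment in this exchange can be written as a small function. This is a sketch assuming the 128 kB (131072-byte) chunks used in the example; the name `aligned_response_range` is hypothetical. Capping the response at the requested end means a tiny probe request such as `bytes=0-1` is answered exactly.

```python
from typing import Optional, Tuple


def aligned_response_range(
    start: int,
    requested_end: Optional[int],
    file_size: int,
    chunk_size: int = 131072,
) -> Tuple[int, int]:
    """Return the (start, end) bytes to send, end inclusive.

    The response never crosses the end of the chunk containing `start`,
    never passes the requested end, and never passes the last byte of the file.
    """
    # Last byte of the chunk that contains `start`.
    chunk_end = (start // chunk_size + 1) * chunk_size - 1
    end = file_size - 1 if requested_end is None else requested_end
    return start, min(end, chunk_end, file_size - 1)
```

For instance, `aligned_response_range(946888, 1234566, 1234567)` returns `(946888, 1048575)`, matching the server response shown above.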

So understanding this forms the basis of serving file chunks. I had thought I would need to serve awkward byte ranges, but in all my testing, apart from Apple initially requesting just the first 2 bytes, browsers seem happy with the ranges I provided!

The Problem using Distributed Storage for Media Streaming

Storj’s approach is that you either serve the files in their entirety yourself, or you share them via its (centralised) website.

Arweave, Filecoin and some others don’t shard or erasure-code their files in the first place, so they rely on pure replication. I have been unable to test any of these.

In the case of media files, Sia Skynet also takes this approach, dispensing with the storage efficiency of Sharding data and instead resorting to Replication of data.

All the above platforms require the file in its entirety before it can be served, so you will likely be paying for the full file even if you only need a small part of it. Of course, once the entire file is retrieved, standard server software (Apache, NGINX) will handle the partial file serving for you.

But since many videos are not watched in full, the larger the video, the more inefficient this becomes.

0Chain, however, aims to offer the best of both worlds. While maintaining its EC efficiency, it allows files to be downloaded partially, a chunk at a time. I have used this functionality to implement a media server that downloads the required chunks of a video from a provided authticket.

How 0Chain can provide a solution

The method is simple. The served chunk sizes are aligned to the chunk sizes of the storage allocation the files are returned from. Unlike on most other platforms, the EC ratio is fully configurable on 0Chain, and in this example my EC ratio is 6+2 (6 data + 2 parity). So each reconstructed chunk is built from 6 x 64kB Blobber chunks, giving a servable chunk size of 384kB. (I have also made it possible to use larger chunk sizes by requesting multiple chunks at a time. This is done by allowing a minimum number of blocks per chunk to be set, but the result is always rounded up to a multiple of the servable chunk size.)
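The chunk-size arithmetic can be sketched in a few lines of Python. This assumes 64kB means 64 × 1024 bytes and expresses the minimum as a byte count rather than the block count mentioned above; both function names are hypothetical.

```python
def servable_chunk_size(data_shards: int, blobber_chunk: int = 64 * 1024) -> int:
    """Bytes of the original file rebuilt from one chunk on each data shard.

    Parity shards add redundancy, not file data, so they are not counted.
    """
    return data_shards * blobber_chunk


def effective_chunk_size(data_shards: int, min_bytes: int, blobber_chunk: int = 64 * 1024) -> int:
    """Round a requested minimum chunk size up to a whole number of servable chunks."""
    unit = servable_chunk_size(data_shards, blobber_chunk)
    multiples = max(1, -(-min_bytes // unit))  # ceiling division, at least one chunk
    return multiples * unit


# 6+2 erasure coding: 6 data shards x 64 KiB = 384 KiB per servable chunk.
print(servable_chunk_size(6))            # 393216
print(effective_chunk_size(6, 500_000))  # 786432
```

Rounding up rather than down keeps every response aligned with whole reconstructed chunks, so no chunk is fetched from the Blobbers only to be partly discarded.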

As you can see in the demo, I have added a status monitor so you can watch how chunks are served in advance to your browser. The browser typically requests chunks until it has a minute or so in its buffer; then, whenever the buffer drops below around 30 seconds, it requests a few more chunks to maintain it.
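The buffering itself is decided by the browser, not the server, but the pattern seen in the status monitor behaves like a high/low-water-mark (hysteresis) rule. The Python sketch below models it with the roughly 60 s and 30 s thresholds mentioned above; the class and thresholds are illustrative assumptions, not how any particular browser is implemented.

```python
from dataclasses import dataclass


@dataclass
class BufferPolicy:
    """Hysteresis model of the buffering pattern seen in the status monitor.

    Thresholds are illustrative; real browsers choose their own.
    """
    high_water: float = 60.0  # stop requesting once about a minute is buffered
    low_water: float = 30.0   # resume once the buffer drops below about 30 s
    fetching: bool = True     # playback starts by filling the buffer

    def should_fetch(self, buffered_seconds: float) -> bool:
        """Return True while more chunks should be requested."""
        if self.fetching and buffered_seconds >= self.high_water:
            self.fetching = False
        elif not self.fetching and buffered_seconds < self.low_water:
            self.fetching = True
        return self.fetching
```

The gap between the two thresholds is what produces the bursts of requests in the monitor: nothing is fetched while the buffer drains from 60 s to 30 s, and then several chunks are requested in quick succession.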

Of course, if you skip ahead, there will be a slight delay while the appropriate chunks are requested and fetched.

I have also built a basic cache facility into this: each file chunk is retained on the server for a few minutes before being deleted. You can see in the status monitor which chunks are already cached on my server (from a previous visitor) and which are being freshly downloaded in real time.
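A cache with a fixed retention time can be sketched as below. This is an in-memory stand-in (the article does not say how the server stores chunks); keying entries by file hash and chunk index, and the 300-second default, are assumptions. The clock is injectable so expiry can be tested without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ChunkCache:
    """In-memory chunk cache where each entry expires `ttl` seconds after it is stored."""

    def __init__(self, ttl: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        # (file hash, chunk index) -> (time stored, chunk bytes)
        self._entries: Dict[Tuple[str, int], Tuple[float, bytes]] = {}

    def get(self, file_hash: str, index: int) -> Optional[bytes]:
        """Return a cached chunk, or None if it is missing or has expired."""
        entry = self._entries.get((file_hash, index))
        if entry is None:
            return None
        stored_at, data = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[(file_hash, index)]
            return None
        return data

    def put(self, file_hash: str, index: int, data: bytes) -> None:
        self._entries[(file_hash, index)] = (self.clock(), data)

    def purge_expired(self) -> int:
        """Delete every expired chunk and return how many were removed."""
        now = self.clock()
        expired = [k for k, (t, _) in self._entries.items() if now - t > self.ttl]
        for k in expired:
            del self._entries[k]
        return len(expired)
```

Calling `purge_expired()` periodically plays the role of the deletion step; lazy expiry in `get()` keeps a stale chunk from being served between purges.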

About the 0Chain Media Server

It first decodes the provided authticket. This gives the file and allocation information, plus the hash of the original file. It caches this information, keyed by a hash of the authticket itself.

If required, it then requests the file's metadata from the network. This includes the original file size, MIME type and file hash. It caches the metadata keyed by the file hash, avoiding the need for subsequent requests.

If required, it also requests the allocation data for the file, including the EC settings, so that the file chunk size can be determined. Again, this data is cached under a hash key, so it only needs to be obtained once.
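All three lookups follow the same pattern: fetch once, then store the result under a hash key. The Python sketch below captures that pattern using SHA-256 from the standard `hashlib` module. The choice of SHA-256 and the `LookupCache` class are assumptions; the actual 0Chain decoding and network calls are not shown.

```python
import hashlib
from typing import Any, Callable, Dict


class LookupCache:
    """Memoise expensive lookups under a SHA-256 key so each is fetched only once."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.misses = 0  # number of times `fetch` actually had to run

    @staticmethod
    def key(raw: str) -> str:
        """Hash the raw key (e.g. an authticket string) into a fixed-length key."""
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_fetch(self, raw_key: str, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for `raw_key`, calling `fetch` only on a miss."""
        k = self.key(raw_key)
        if k not in self._store:
            self.misses += 1
            self._store[k] = fetch()
        return self._store[k]
```

One instance per lookup type would mirror the steps above, e.g. `tickets.get_or_fetch(authticket, lambda: decode_authticket(authticket))`, where `decode_authticket` is a hypothetical helper standing in for the real decoding step. The metadata and allocation caches would be keyed by the file hash and allocation details in the same way.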

Then it just calculates the chunk that the initial byte of the requested range falls within. If it has not already got that chunk cached, it retrieves it from the Blobbers (Storage Nodes). It then serves the chunk with appropriate (partial) headers.
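Locating the chunk and serving from it can be sketched as follows. `get_chunk` is a hypothetical callable standing in for "return the cached chunk, or fetch and reconstruct it from the Blobbers"; the function names are also assumptions. The response runs from the requested start byte to the end of its chunk, or to the requested end if that comes first.

```python
from typing import Callable, Dict, Optional, Tuple


def locate_chunk(start: int, chunk_size: int, file_size: int) -> Tuple[int, int, int]:
    """Return (index, first byte, last byte) of the chunk containing `start`."""
    if not 0 <= start < file_size:
        raise ValueError("start byte lies outside the file")
    index = start // chunk_size
    first = index * chunk_size
    last = min(first + chunk_size, file_size) - 1  # the final chunk may be short
    return index, first, last


def serve_from_chunk(
    start: int,
    chunk_size: int,
    file_size: int,
    get_chunk: Callable[[int], bytes],
    requested_end: Optional[int] = None,
) -> Tuple[bytes, Dict[str, str]]:
    """Build a partial response from `start` to the end of its chunk.

    `get_chunk(index)` is a hypothetical callable that returns the reconstructed
    bytes of one chunk, from the local cache or freshly from the Blobbers.
    """
    index, first, last = locate_chunk(start, chunk_size, file_size)
    if requested_end is not None:
        if requested_end < start:
            raise ValueError("range not satisfiable")
        last = min(last, requested_end)
    data = get_chunk(index)
    body = data[start - first : last - first + 1]
    headers = {
        "Content-Range": f"bytes {start}-{last}/{file_size}",
        "Content-Length": str(len(body)),
    }
    return body, headers
```

With the 384kB (393216-byte) chunks from the 6+2 example, a request starting at byte 946888 falls in chunk 2, which covers bytes 786432 to 1179647.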

Of course, this is just a Proof of Concept project, but it could easily act as the basis for a fully-fledged Content Delivery Network. By intelligently adjusting the cache duration, you could tune it to serve a huge number of media files with very limited storage. Regional CDN instances could also be set up for maximum distribution.

Portability of this method

For platforms without partial downloads, a potential workaround would be to split the media files into chunks beforehand. However, that introduces a processing layer, and you can’t just play the files as normal, although there could be applications where this is appropriate.



NOTE: The player with stats, the chunk cache deletion facility etc. featured in the demo are not part of this repo.