How Meesho built an adaptive, on-demand video streaming solution

Paresh Goel
Meesho Tech
Aug 29, 2019

Guest post by Paresh Goel, VP Engineering

At Meesho, we are always on the lookout for innovative ways to help our resellers (customers) and buyers make better buying decisions. We figured product videos would be one such way to help our resellers and buyers fully understand the products they intend to buy.

In this post, I will be outlining the technical architecture we built at Meesho towards creating an adaptive on-demand video streaming solution on the app.

Our goals

  1. Build an end-to-end on-demand video streaming service.
  2. Low-latency first playback — the video should start playing within the first few seconds of the user tapping play.
  3. Videos should play back on a wide range of devices, OSes and browsers — including Android, iOS and desktop browsers.
  4. Adaptive playback — the video quality should change automatically based on user device bandwidth and screen size.

Technical challenges

  1. Varied codec support: The codec landscape in video is highly fragmented. Popular codecs include H.264, H.265 and VP9. We chose H.264 for our stack as it is supported by most modern OSes and browsers.
  2. Device and network fragmentation: Our user base in India is on devices and networks of widely varying capability. From an iPhone on a 4G connection to a sub-INR 2,000 phone on a 2G connection — we had to support both. This, coupled with intermittent connectivity, meant that video quality had to switch dynamically based on prevailing client capability.
  3. Automation, to ensure faster turnaround from video upload to delivery: Once a video is created and uploaded to our AWS S3 bucket, we had to quickly process it for final delivery. A lot of automation was built in to support this.

System architecture

We zeroed in on HTTP Live Streaming (HLS) as our streaming protocol.
The basic idea behind HLS is to break the content into multiple packets, each of which can be referenced via HTTP. This is better than custom protocol layers like Flash, since HTTP stacks are widely available. It’s easy to plug in any CDN behind your content if that content is accessible via HTTP.

Adaptive streaming is supported by HLS via Manifest files (aka metadata files). Each manifest file is a playlist for a specific bit-rate. So if you want to stream at 5 different bit-rates, there will be 5 different manifest files — each referring to their own list of video packets.

Following is an example of a 1,500 kbps manifest file for a video:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ALLOW-CACHE:YES
#EXT-X-TARGETDURATION:11
#EXT-X-KEY:METHOD=AES-128,URI="hls1500k.key",IV=0x13242333333333322bb5fca
#EXTINF:10.880000,
hls1500k00000.ts
#EXTINF:10.800000,
hls1500k00001.ts
#EXTINF:10.800000,
hls1500k00002.ts
#EXTINF:10.800000,
hls1500k00003.ts
#EXTINF:7.200000,
hls1500k00004.ts
#EXTINF:10.800000,
hls1500k00005.ts
#EXTINF:10.800000,
hls1500k00006.ts
#EXTINF:10.800000,
hls1500k00007.ts
#EXTINF:7.200000,
hls1500k00008.ts
#EXTINF:10.800000,
hls1500k00009.ts
#EXTINF:10.800000,
hls1500k00010.ts
#EXT-X-ENDLIST
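A media playlist like the one above is simple to work with programmatically. The following sketch (not Meesho's actual code, just a minimal illustration) pairs each `#EXTINF` duration with the segment URI that follows it, yielding the segment list and total runtime:

```python
def parse_media_playlist(text):
    """Parse an HLS media playlist; return (segment URIs, total duration in seconds)."""
    segments, total = [], 0.0
    pending = None  # duration announced by the most recent #EXTINF tag
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:10.800000," -> 10.8 (an optional title may follow the comma)
            pending = float(line[len("#EXTINF:"):].rstrip(",").split(",")[0])
        elif line and not line.startswith("#") and pending is not None:
            segments.append(line)   # non-tag line after #EXTINF is the segment URI
            total += pending
            pending = None
    return segments, total
```

Run against the 1,500 kbps playlist above, this returns the eleven `hls1500k*.ts` segment names and a total duration just under two minutes.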

The bit-rate-specific playlists are wrapped in a master manifest file.
An HLS-compliant player first requests the master manifest, then picks a bit-rate-specific manifest based on client screen size, network bandwidth and other factors such as CPU power. It may request multiple bit-rate-specific manifests during playback if conditions change.

Following is an example of a Master manifest file:

#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=469000,RESOLUTION=400x224,CODECS="avc1.42001e,mp4a.40.2"
hls0400k.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=687000,RESOLUTION=480x270,CODECS="avc1.42001e,mp4a.40.2"
hls0600k.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1198000,RESOLUTION=640x360,CODECS="avc1.4d001f,mp4a.40.2"
hls1000k.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1792000,RESOLUTION=960x540,CODECS="avc1.4d001f,mp4a.40.2"
hls1500k.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2451000,RESOLUTION=1024x576,CODECS="avc1.4d001f,mp4a.40.2"
hls2000k.m3u8
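The variant-selection logic the player applies can be sketched in a few lines. This is a simplification of what a real HLS player does (which also weighs buffer health and screen size), shown here only to make the `BANDWIDTH` attribute concrete:

```python
import re

def pick_variant(master_text, measured_bps):
    """Pick the highest-bandwidth variant whose declared BANDWIDTH fits the
    measured throughput; fall back to the lowest variant otherwise."""
    variants = []  # (declared BANDWIDTH, playlist URI)
    pending_bw = None
    for line in master_text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            m = re.search(r"BANDWIDTH=(\d+)", line)
            pending_bw = int(m.group(1)) if m else None
        elif line and not line.startswith("#") and pending_bw is not None:
            variants.append((pending_bw, line))  # URI follows its STREAM-INF tag
            pending_bw = None
    variants.sort()
    fitting = [v for v in variants if v[0] <= measured_bps]
    return (fitting[-1] if fitting else variants[0])[1]
```

Against the master manifest above, a client measuring ~2 Mbps would land on `hls1500k.m3u8`, while a constrained 2G client would fall back to `hls0400k.m3u8`.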

Faster First-Chunk Playback

One thing we were optimising from the start was first-playback time, i.e. how long it takes to start playing the first chunk after the user has tapped the play button.
This was critical, as our earlier experiments with embedded YouTube videos had shown that this lag negatively impacts the user experience.

We built several optimisations on top of the native player to achieve this.
For example, we eagerly loaded manifest files for videos the user could watch in the near future. We also parsed these manifest files upfront to figure out which bit-rate-specific manifest the player would eventually request, and cached those as well.
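The eager-loading idea can be sketched as a small prefetching cache. This is an illustrative outline, not our production code; `fetch` stands in for whatever HTTP client retrieves a manifest by URL:

```python
from concurrent.futures import ThreadPoolExecutor

class ManifestCache:
    """Eagerly fetch and cache manifests for videos the user may watch next.

    `fetch` is a hypothetical callable (url -> manifest text), e.g. an HTTP GET.
    """
    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}  # url -> Future[str]
        self._pool = ThreadPoolExecutor(max_workers=4)

    def prefetch(self, urls):
        # Kick off background fetches for manifests we expect to need soon.
        for url in urls:
            if url not in self._cache:
                self._cache[url] = self._pool.submit(self._fetch, url)

    def get(self, url):
        # Serve from cache if prefetched; fall back to a blocking fetch on a miss.
        if url not in self._cache:
            self._cache[url] = self._pool.submit(self._fetch, url)
        return self._cache[url].result()
```

By the time the user taps play, the manifest (and the chosen bit-rate-specific manifest) is often already local, so the player can request the first segment immediately.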

Transcoding

Most likely you will receive videos from your content team as .mp4 files encoded in H.264. These have to be transcoded into the HLS format.

One way to do this is to spin up your own instance in the cloud and install ffmpeg on it. You will have to write some scripts to fetch files from storage (like S3), transcode them via ffmpeg, and put the output back into your destination storage (like S3).
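A minimal sketch of such a script follows. The rendition table and flag choices are assumptions that mirror the master manifest shown earlier, not our exact pipeline; in practice you would tune bitrates, keyframe intervals and audio settings per rendition. The function only builds the ffmpeg argument list, which you would run via `subprocess.run` after pulling the source from S3:

```python
# Renditions mirroring the master manifest above: (name, video bitrate, resolution)
RENDITIONS = [
    ("hls0400k", "400k", "400x224"),
    ("hls0600k", "600k", "480x270"),
    ("hls1000k", "1000k", "640x360"),
    ("hls1500k", "1500k", "960x540"),
    ("hls2000k", "2000k", "1024x576"),
]

def hls_command(src, name, bitrate, resolution):
    """Build one ffmpeg invocation that transcodes `src` into H.264/AAC HLS
    segments plus a bit-rate-specific media playlist."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-b:v", bitrate, "-s", resolution,
        "-c:a", "aac",
        "-hls_time", "10",                      # ~10 s segments, as in the playlists above
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{name}%05d.ts",
        f"{name}.m3u8",
    ]
```

Running one such command per rendition produces the five media playlists, after which the master manifest can be written out to tie them together.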

You can also use AWS MediaConvert to build your processing pipeline.

Building an automated pipeline that takes a video (produced in-house as well as shot by our resellers) in MP4 format and delivers it to the app in a digestible format (even on 2G and 3G networks) comes with several technical challenges. But the tech team built this service in-house, without a dedicated team of video engineers, in a month's time.

Want to join the Meesho tech team and help create 20 million entrepreneurs by 2020? We are hiring! Apply here.
