Iris Fu
Aug 13

Written by Maxime Nay, Lead Data Engineer, on February 23, 2018

Maxime Nay, Lead Data Engineer at GumGum, gave a talk explaining GumGum's data architecture and its associated challenges on February 15th, 2018 at the South Bay Java User's Group. GumGum produces over 50 TB of new raw data every day, which amounts to more than 100 billion events per day. These events are processed using a typical lambda architecture: for a given use case, we have a batch pipeline and a real-time pipeline, and the data produced by both pipelines is then merged to give a complete view of the data. GumGum has more than 70 such pipelines, some of which do not have a real-time component. Processing data at this scale involves maneuvering through many challenges. Maxime talks about those challenges and the steps taken to solve some of the problems we faced.
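To illustrate the merge step of a lambda architecture in the abstract, here is a minimal Python sketch. It assumes a simple per-key event-count use case: the batch layer holds complete counts recomputed up to its last run, and the real-time layer holds counts for events seen since then. The function name and the sample data are hypothetical, not GumGum's actual implementation.

```python
from collections import Counter

def merge_views(batch_view, realtime_view):
    """Combine per-key event counts from the batch and real-time layers.

    Hypothetical sketch: the batch layer covers everything up to its last
    run; the real-time layer covers events since then. Summing the two
    per key yields the complete view served to queries.
    """
    merged = Counter(batch_view)
    merged.update(realtime_view)  # adds counts for overlapping keys
    return dict(merged)

# Hypothetical per-hour impression counts for one pipeline.
batch_view = {"2018-02-15T10": 120_000, "2018-02-15T11": 98_500}
realtime_view = {"2018-02-15T11": 1_500, "2018-02-15T12": 42_000}

print(merge_views(batch_view, realtime_view))
# {'2018-02-15T10': 120000, '2018-02-15T11': 100000, '2018-02-15T12': 42000}
```

In practice the merge logic and the "complete view" cutover are use-case specific, but the principle is the same: the batch view is authoritative for older data, and the real-time view fills the gap until the next batch run.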

Processing 100 Billion Events a Day from GumGum on Vimeo.

The slides used in this presentation can be viewed at http://bit.ly/100-billion-events


We’re always looking for new talent! View jobs.

Follow us: Facebook | Twitter | LinkedIn | Instagram

gumgum-tech

Thoughts from the GumGum tech team
