Solving complex problems leveraging our blameless culture

Sandeep Singh Hooda
Published in DBS Tech Blog, Nov 11, 2021

It was a most gratifying experience to take part in USENIX Association SRECon, a tech community that brings together innovative thinkers, leaders, and speakers from around the world and reaches a diverse audience.

Koonseng Lim and I proudly shared our experience through the tale “Of Mice & Elephants”.

Just as in the well-known myth in which the mighty elephant cowers before the tiny mouse, the presence of small files in the Hadoop Distributed File System can bring the elephant of big data to its knees!

In this talk we chronicle our journey of managing small files after an unfortunate incident rendered our multi-petabyte cluster inactive for close to 5 days. The aftermath of the incident spawned a flurry of collaborative work between our infrastructure SRE, enterprise SRE, platform SRE and application teams. We discuss the lessons learnt and the experience gained from experimenting with and implementing mitigating measures across people, process and technology to combat the scourge of small files, a perennial problem in Hadoop.

We show that with the proper mitigating controls and technical capabilities afforded by newer distributed file systems, mice and elephants can coexist happily, just as in real life.
