Imhotep: Large Scale Analytics and Machine Learning at Indeed
This talk was held on Wednesday, March 26, 2014.
To scale decision tree construction over large volumes of Indeed job search data, we built a system called Imhotep. Besides being a crucial tool for building these machine learning models, Imhotep has proven useful for a wide range of analytics problems. At its core, Imhotep is a distributed system that manages the parallel execution of queries across a set of time-sharded inverted indices.
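The following Python sketch illustrates the general idea of running one query in parallel over time-sharded inverted indices and merging the partial results. It is not Imhotep's code or API. `TimeShard` and `distributed_count` are hypothetical names. Each shard is an in-memory inverted index for one time range, and a thread pool stands in for the separate machines a real deployment would fan out to. To keep counts exact, a shard is queried only if its whole time range falls inside the query window.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date

# Hypothetical sketch of time-sharded inverted indices; not Imhotep's actual API.


class TimeShard:
    """An inverted index over the documents whose timestamps fall in [start, end)."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        # Maps (field, term) -> set of document ids containing that term.
        self.postings = defaultdict(set)

    def add(self, doc_id, doc):
        # Index every field/term pair of the document.
        for fld, term in doc.items():
            self.postings[(fld, term)].add(doc_id)

    def count(self, fld, term):
        # .get avoids inserting empty postings lists for unseen terms.
        return len(self.postings.get((fld, term), ()))


def distributed_count(shards, fld, term, start, end, workers=4):
    """Count docs matching field=term across shards lying within [start, end)."""
    # Select only shards whose time range is contained in the query window.
    selected = [s for s in shards if start <= s.start and s.end <= end]
    # Each shard is queried independently; the partial counts are summed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda s: s.count(fld, term), selected))


# Usage: two daily shards, queried over both days and over the second day only.
day1 = TimeShard(date(2014, 3, 24), date(2014, 3, 25))
day2 = TimeShard(date(2014, 3, 25), date(2014, 3, 26))
day1.add(1, {"q": "java", "country": "US"})
day1.add(2, {"q": "nurse", "country": "US"})
day2.add(3, {"q": "java", "country": "GB"})

print(distributed_count([day1, day2], "q", "java", date(2014, 3, 24), date(2014, 3, 26)))  # 2
print(distributed_count([day1, day2], "q", "java", date(2014, 3, 25), date(2014, 3, 26)))  # 1
```

Because each shard answers on its own and the merge step is a plain sum of partial counts, adding more shards spreads the work without changing how results are combined.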
This talk covers Imhotep’s primitive operations that allow us to build decision trees, drill into data, build graphs, and even execute SQL-like queries in IQL (Imhotep Query Language). We discuss what makes Imhotep fast, highly available, and fault tolerant.
Audio Description
The following video includes a descriptive audio track for this talk.
Transcripts
- Basic transcript (includes audio information only)
- Descriptive transcript (includes audio and visual information)
Speaker
Jeff Plaisance is a senior software engineer at Indeed.
Originally published on the Indeed Engineering Blog.