By Jayden Soni

As a software engineering intern at Civis this summer, I worked on a project supporting our Identity Resolution product. Under the hood, it relies heavily on Apache Spark, a distributed processing framework, to run our proprietary person-matching algorithm. While we maintain a Kubernetes cluster to handle the jobs, scripts, and notebooks run in Civis Platform, our Spark applications currently run directly on Amazon EMR. Spark 2.3, however, made it possible to run Spark applications on Kubernetes instead. …


Civis Analytics

Building a Data-Driven World
