Audible’s Jiun Kim talks scaling big data for customer intelligence

As a member of Audible’s data science team, Kim designs solutions for customer acquisition, retention, and engagement.

Jiun Kim is a data scientist at Audible Inc., an Amazon company. Day to day, she develops predictive models for abuse and fraud detection and for customer intelligence. Born in Korea, she began her career as a statistician for the United Nations International Labour Organization and the United Nations Economic and Social Commission for Asia and the Pacific in Bangkok, Thailand. Kim saw the potential of big data early on, and in 2013 she moved to New York City to pursue a master’s in statistics at Columbia University. In a conversation with NYC Media Lab’s Amy Chen, Kim shares her path to data science.

How did you get started in data science?

I see data science as an interdisciplinary field. It’s a combination of statistics, computer science, mathematics, and the social sciences. I had a background in statistics and a good understanding of the social sciences. At Columbia, I focused on improving my coding skills by taking computer science-related classes. I also began participating in data science challenges like Kaggle competitions. Whenever I competed, I picked teammates who had strong coding skills so I could learn from them. Python is the most widely used programming language in the data science industry, especially in tech companies. Taking part in data science challenges helped me build confidence in coding in Python, which ultimately advanced my career in tech.

Now that you’re working in the tech industry, how exactly are you working with data in your role?

I spend big chunks of time on three things: data collection, data processing, and modeling. Amazon has more data than I have ever seen; figuring out which data to use, and then collecting, stitching, and processing it, isn’t a trivial job. I practiced this process in school to some degree, but not at this scale. Optimizing the prototyping process is crucial, and it’s where we invest much of our time and effort. Because data science models are typically built to answer business questions, I also need to translate my work for the non-technical parts of the organization.

At Audible, how does the company organize around data science?

Broadly speaking, Audible’s data science team focuses on the customer space and the content space. Data scientists in the customer space mostly work on Customer Relationship Management (CRM), including acquisition, retention, engagement, and habituation: essentially, anything that has to do with the customer’s life cycle. Folks in the content space work on content-related business questions, such as recommendation engines and content acquisition, as we work heavily with publishers. I’ve been working in the customer space on abuse models and revenue prediction models.

What’s a recent project you’ve worked on?

One project that I worked on involved identifying customers who abuse our return policy. An algorithm was developed to detect customers exhibiting abusive behavior. The results were then handed to the business to take appropriate next steps. This research helped reduce unnecessary costs for the company.
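
The interview does not describe how the algorithm works. As a rough illustration of the general idea only, and not Audible’s method, one simple rule-based approach flags customers whose share of returned orders is unusually high. The order records, function name, and thresholds below are all hypothetical:

```python
from collections import defaultdict

# Hypothetical order records: (customer_id, was_returned). Illustrative data only.
orders = [
    ("a", False), ("a", False), ("a", True), ("a", False),
    ("b", True), ("b", True), ("b", True), ("b", False),
    ("c", True),
]


def flag_high_return_customers(orders, min_orders=3, max_return_rate=0.5):
    """Return the sorted IDs of customers whose return rate exceeds max_return_rate.

    Customers with fewer than min_orders purchases are skipped, since a
    rate computed from one or two orders says very little.
    Both thresholds are assumed values chosen for this example.
    """
    totals = defaultdict(int)
    returns = defaultdict(int)
    for customer_id, was_returned in orders:
        totals[customer_id] += 1
        if was_returned:
            returns[customer_id] += 1
    return sorted(
        cid for cid, n in totals.items()
        if n >= min_orders and returns[cid] / n > max_return_rate
    )


flagged = flag_high_return_customers(orders)
print(flagged)
```

A production model would likely rely on many more signals and a learned classifier rather than fixed thresholds; the sketch only shows the shape of the problem, where per-customer behavior is summarized and then handed off for review.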

How have you seen data science evolve in the two years you’ve been at Audible?

The data science team was created at Audible about two years ago. Since then, we have made a significant impact on the business by helping it understand customers better and predict their behavior. Data science models have empowered the business to take more customized, targeted, and proactive actions.

How are you exploring machine learning and AI in your work?

I use machine learning methods day to day, so I guess I explore machine learning every day. I also try to participate in external meetups and hackathons. In a previous hackathon at Audible (where my team was one of the winners!), I developed a prototype tool for narrator self-assessment. The idea is that narrators record themselves using the system, and it analyzes how they can improve in pitch, range, and other techniques. Going even further, narrators can receive recommendations on which genres to narrate based on their voices. This can be seen as an application of speech recognition.

What advice would you give to aspiring data scientists?

My advice for students: while you are in school, be sure to work on as many data science projects as possible. That will push you to consider questions such as how to collect data, what kinds of features to use, which models to build, and how best to present the results to non-technical people. Also, become familiar with large-scale data handling processes and machine learning techniques. Develop a methodology that can be applied to solve different types of problems.
