Exploratory Data Analysis (EDA) solution to the Kaggle Caravan Insurance Challenge in R

Kieran Tan Kah Wang
Published in Analytics Vidhya
15 min read · Sep 17, 2020


Photo by Kevin Schmid on Unsplash

The Caravan Insurance Challenge was posted on Kaggle with the aim of helping the insurance company's marketing team develop a more effective marketing strategy. The dataset consists of 5822 customer records collected by the insurance company, each described by 85 socio-demographic and product-ownership features.

Machine learning (ML) algorithms are the obvious solution for identifying the customers most likely to purchase a caravan policy, especially since we are given 5822 training records and 238 test records. The purpose of this article, however, is to tackle the problem using only Exploratory Data Analysis (EDA), without complicated ML models.

Overall data statistics

After obtaining the datasets from the Kaggle link, we can read the training data, ticdata2000.txt, into R. I did mine in Jupyter Notebook with the R kernel installed, as I find Jupyter Notebook more user-friendly and easier for debugging errors. Instructions for installing the R kernel for Jupyter Notebook can be found at this link.
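As a rough illustration of the reading step, here is a minimal sketch in base R. It assumes that ticdata2000.txt sits in the working directory, is tab-delimited and has no header row. The helper name read_caravan and the variable train are made up for this example and are not taken from the original notebook.

```r
# Hypothetical helper: read the training file into a data frame.
# Assumptions: tab-delimited text with no header row, so R assigns
# the default column names V1, V2, ...
read_caravan <- function(path) {
  read.table(path, sep = "\t", header = FALSE)
}

# Assumed location of the file; adjust to wherever the download was saved.
path <- "ticdata2000.txt"

if (file.exists(path)) {
  train <- read_caravan(path)

  # Number of rows and columns, to compare against the 5822 records
  # mentioned above.
  print(dim(train))

  # Quick overall statistics for the first few columns.
  print(summary(train[, 1:5]))
}
```

If the file uses a different delimiter or includes a header row, the sep and header arguments would need to change accordingly.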
