Predicting Breast Cancer Using Logistic Regression
Learn how to perform Exploratory Data Analysis, apply mean imputation, build a classification algorithm, and interpret the results.
Background
Breast cancer is the second most common cancer among women in the United States and one of the leading causes of cancer death. It occurs as a result of abnormal growth of cells in the breast tissue, commonly referred to as a tumor. A tumor does not necessarily mean cancer: it can be benign (not cancerous) or malignant (cancerous). Tests such as MRI, mammogram, ultrasound, and biopsy are commonly used to diagnose breast cancer.
In this tutorial, we will build a model that predicts whether a patient has a positive breast cancer diagnosis based on the tumor's characteristics.
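Before working with real data, it may help to see the core idea behind logistic regression: a weighted sum of the features is passed through the sigmoid function to produce a probability between 0 and 1. The sketch below is only an illustration. The weights, bias, and tumor measurements are hypothetical values chosen for the example, not coefficients fitted to any data, and the helper name predict_proba is our own.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Return the modeled probability of a malignant diagnosis.

    features and weights are dicts keyed by feature name.
    """
    z = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)


# Hypothetical coefficients for illustration only (NOT learned from data).
weights = {"radius": 0.9, "texture": 0.2}
bias = -16.0

# Hypothetical measurements for a single tumor.
tumor = {"radius": 14.0, "texture": 19.0}

p = predict_proba(tumor, weights, bias)
# A common convention is to classify as positive when p >= 0.5.
label = "malignant" if p >= 0.5 else "benign"
print(f"P(malignant) = {p:.3f} -> {label}")
```

In practice the weights and bias are estimated from labeled examples rather than set by hand, and the 0.5 threshold can be adjusted depending on how costly a missed diagnosis is compared with a false alarm.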
The dataset used in this tutorial contains the following features:
- id (patientid)
- name
- radius (the distance from the center to the circumference of the tumor)
- texture (standard deviation of gray-scale values)
- perimeter (circumference of the tumor, approx. 2 * 3.14 * radius)
- area
- smoothness (local variation…