Predicting Breast Cancer Using Logistic Regression

Learn how to perform Exploratory Data Analysis, apply mean imputation, build a classification algorithm, and interpret the results.

Mo Kaiser
The Startup

--

Source: DataCamp

Background

Breast cancer is the second most common cancer and has the highest cancer death rate among women in the United States. Breast cancer occurs as a result of abnormal growth of cells in the breast tissue, commonly referred to as a tumor. A tumor does not mean cancer — can be benign (no breast cancer) or malignant (breast cancer). Tests such as an MRI, mammogram, ultrasound, and biopsy are commonly used to diagnose breast cancer.

In this tutorial, we are going to create a model that will predict whether or not a patient has a positive breast cancer diagnosis based off of the tumor characteristics.

This dataset contains the following features:

  • id (patientid)
  • name
  • radius (the distance from the center to the circumference of the tumor)
  • texture (standard deviation of gray-scale values)
  • perimeter (circumference of the tumor, approx. 2*3.14 *radius)
  • area
  • smoothness (local variation…

--

--

Mo Kaiser
The Startup

Data enthusiast, lifelong student, avid reader