# Logistic Regression: Predict The Malignancy of Breast Cancer

1. Pravangasta Suihangya Balqis W.

# INTRODUCTION

Supervised learning is also known as supervised machine learning, it is defined by the use of labeled data sets to train algorithms that classify data or predict outcomes accurately. Guided learning can be separated into two types of problems when mining data, namely classification and regression. Classification method is usually used in business problems such as churn analysis and risk management, classification will go through the data process in order to get more accurate data. In Classification there is a target category variable in the classification. The regression is actually almost the same as the classification but the regression cannot find a structure that is classified into classes. The Regression method looks for a pattern and assigns a numeric value to it. Regression techniques can be used to predict the future. The relationship between one or more independent variables and the dependent variable can be modeled using regression analysis.

Import library:

# Exploratory Data Analysis

Exploratory Data Analysis covers the critical process of initial investigative testing on data to identify patterns, find anomalies, test hypotheses, and check assumptions through summary statistics and graphical (visual) representations. EDA can help detect errors, identify outliers in data sets, understand relationships between data, explore key factors, find patterns in data, and provide new insights. EDA is very useful for statistical analysis. In this EDA we divide into two parts, namely Univariate and Multivariate

# Machine Learning Theory

In the machine learning phase, we must first separate the dependent variable from the independent variable. For independent variables, because there are quite a lot of columns in this dataframe, we will take it according to the EDA phase, which is 10 columns with the highest correlation to the independent variable. we will divide into x and y then we will do train test split data.