Published in Analytics Vidhya

Deep Dive in Machine Learning with Python

Part — XIV: Initial Data Analysis (IDA) with example


Import required Python libraries

Python packages

Load the ARFF (Attribute-Relation File Format) dataset file

Loaded ASD dataset
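
One way to load an ARFF file into a DataFrame is sketched below. It assumes `scipy.io.arff.loadarff` is the loader; the tiny in-memory file and its attribute names (`age`, `gender`, `result`) are made up for illustration and are not the real ASD file.

```python
import io

import pandas as pd
from scipy.io import arff

# Hypothetical miniature ARFF content; the real dataset would be read from disk.
ARFF_TEXT = """@relation asd_demo
@attribute age numeric
@attribute gender {m,f}
@attribute result numeric
@data
26,f,6
24,m,5
?,f,8
"""

# loadarff accepts a file path or an open text file and returns (records, metadata).
data, meta = arff.loadarff(io.StringIO(ARFF_TEXT))
df = pd.DataFrame(data)

print(df.shape)
print(df.dtypes)
```

With this loader, nominal attributes arrive as byte strings and a `?` in a numeric attribute arrives as NaN, which is why the next steps deal with encoding and missing values.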

Step-1: Change the character encoding

Character encoding
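
A minimal sketch of the encoding change, assuming the text columns hold byte strings (as they do after loading ARFF with SciPy) and that UTF-8 is the right encoding. The two-row frame is a stand-in for the real data.

```python
import pandas as pd

# Stand-in data: GENDER holds bytes, AGE is already numeric.
df = pd.DataFrame({"GENDER": [b"m", b"f"], "AGE": [26.0, 24.0]})

# Decode only the object columns whose values are all bytes objects.
for col in df.select_dtypes(include="object").columns:
    if df[col].map(lambda v: isinstance(v, bytes)).all():
        df[col] = df[col].str.decode("utf-8")

print(df["GENDER"].tolist())
```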

Browse the dataset

Step-2: Datatype Handling

Features datatypes
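
Before converting anything, it helps to list which features are not yet numeric. The snippet below is only a sketch on a made-up three-column frame.

```python
import pandas as pd

# Stand-in frame with one text column and two float columns.
df = pd.DataFrame({
    "AGE": [26.0, 24.0],
    "GENDER": ["f", "m"],
    "SCREENING_SCORE": [6.0, 5.0],
})

print(df.dtypes)

# Columns that still need encoding or conversion to a numeric dtype.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```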

Step-2.1: Converting ‘AGE’ to dtype ‘INT’

Filled the NULL values in ‘AGE’
AGE dtype converted to INT
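
The two operations above could look like the sketch below. The fill value is an assumption: the median is used here purely for illustration, since the captions do not say which value filled the NULLs.

```python
import numpy as np
import pandas as pd

# Stand-in data with one missing age.
df = pd.DataFrame({"AGE": [26.0, np.nan, 35.0, 18.0]})

# Assumption: fill missing ages with the median of the observed ages.
fill_value = df["AGE"].median()
df["AGE"] = df["AGE"].fillna(fill_value).astype(int)

print(df["AGE"].tolist())
print(df["AGE"].dtype)
```

The fill has to happen before the cast, because a float column that still contains NaN cannot be converted to a plain integer dtype.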

Step-2.2: Encoding ‘GENDER’ as dtype ‘INT’ (1 represents ‘m’, i.e. male, and 0 represents ‘f’, i.e. female)

COUNT of MALES and FEMALES
GENDER encoded to 0 and 1
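
A small sketch of the count and the encoding, assuming the column contains only the values ‘m’ and ‘f’. The five rows are invented.

```python
import pandas as pd

df = pd.DataFrame({"GENDER": ["m", "f", "f", "m", "m"]})

# Count males and females before encoding.
counts = df["GENDER"].value_counts()
print(counts)

# Encode m -> 1 and f -> 0; any other value would become NaN and make astype(int) fail.
df["GENDER"] = df["GENDER"].map({"m": 1, "f": 0}).astype(int)
print(df["GENDER"].tolist())
```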

Step-2.3: Encoding ‘BORN_WITH_JAUNDICE’ as dtype ‘INT’ (1 corresponds to ‘yes’ and 0 to ‘no’)

Before Labelling
After Labelling

Step-2.4: Encoding ‘FAMILY_MEMBER_WITH_PDD’ as dtype ‘INT’ (1 corresponds to ‘yes’ and 0 to ‘no’)

Before Labelling
After Labelling

Step-2.5: Encoding ‘USED_SCREENING_APP_BEFORE’ as dtype ‘INT’ (1 corresponds to ‘yes’ and 0 to ‘no’)

Before Labelling
After Labelling
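
Since the three yes/no features in Steps 2.3 to 2.5 follow the same rule, they can be handled in one loop. This is a sketch on invented rows; it assumes the raw values are exactly the lowercase strings ‘yes’ and ‘no’.

```python
import pandas as pd

YES_NO = {"yes": 1, "no": 0}
yes_no_cols = [
    "BORN_WITH_JAUNDICE",
    "FAMILY_MEMBER_WITH_PDD",
    "USED_SCREENING_APP_BEFORE",
]

# Stand-in data for the three yes/no features.
df = pd.DataFrame({
    "BORN_WITH_JAUNDICE": ["no", "yes", "no"],
    "FAMILY_MEMBER_WITH_PDD": ["yes", "no", "no"],
    "USED_SCREENING_APP_BEFORE": ["no", "no", "no"],
})

for col in yes_no_cols:
    before = df[col].value_counts().to_dict()  # counts before encoding
    df[col] = df[col].map(YES_NO).astype(int)
    after = df[col].value_counts().to_dict()   # counts after encoding
    print(col, before, "->", after)
```

Comparing the counts before and after is a cheap check that no rows were lost or mislabelled during the encoding.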

Step-2.6: Converting the data types of ‘Screening Questions’ variables to ‘INT’
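
A sketch of a bulk conversion for the screening-question features. The column names `Q1` to `Q3` are hypothetical placeholders, and the answers are assumed to be stored as the strings ‘0’ and ‘1’.

```python
import pandas as pd

# Stand-in data: three hypothetical question columns holding '0'/'1' strings.
df = pd.DataFrame({
    "Q1": ["1", "0", "1"],
    "Q2": ["0", "0", "1"],
    "Q3": ["1", "1", "1"],
})

question_cols = [c for c in df.columns if c.startswith("Q")]

# Parse each column as numbers, then cast the whole block to int.
df[question_cols] = df[question_cols].apply(pd.to_numeric).astype(int)

print(df.dtypes)
```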

Step-2.7: Converting ‘SCREENING_SCORE’ to dtype ‘INT’

Before dtype conversion
After dtype conversion
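
The conversion might be done as below, assuming the score was loaded as a float. The whole-number check is an added safeguard, not necessarily part of the original notebook, because casting to int silently truncates any fractional part.

```python
import pandas as pd

# Stand-in data: scores loaded as floats.
df = pd.DataFrame({"SCREENING_SCORE": [6.0, 2.0, 10.0]})

# Refuse to cast if any score has a fractional part.
if not (df["SCREENING_SCORE"] % 1 == 0).all():
    raise ValueError("SCREENING_SCORE contains non-integer values")

df["SCREENING_SCORE"] = df["SCREENING_SCORE"].astype(int)
print(df["SCREENING_SCORE"].dtype)
```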

Step-2.8: Encoding ‘ASD_Label’ as dtype ‘INT’ (1 corresponds to ‘yes’ and 0 to ‘no’)

Before Labelling
After Labelling
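
For the target label, a slightly more defensive sketch is shown here. It assumes the labels may differ in case or carry stray spaces, which may or may not be true of the real data, and it stops if any value cannot be mapped.

```python
import pandas as pd

# Stand-in labels with mixed case and stray whitespace.
df = pd.DataFrame({"ASD_Label": ["NO", "YES", "no", " Yes "]})

# Normalise before mapping so 'YES', 'yes' and ' Yes ' are treated alike.
normalized = df["ASD_Label"].str.strip().str.lower()
encoded = normalized.map({"yes": 1, "no": 0})

# Surface unexpected labels instead of letting them turn into NaN.
unmapped = df.loc[encoded.isna(), "ASD_Label"].unique()
if len(unmapped) > 0:
    raise ValueError(f"Unexpected ASD_Label values: {list(unmapped)}")

df["ASD_Label"] = encoded.astype(int)
print(df["ASD_Label"].tolist())
```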

Step-2.9: Standardizing data of ‘WHOS_COMPLETING_TEST’

Before Standardizing
After Standardizing
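
What standardizing means here is not spelled out in the captions, so the following is only one plausible sketch. It assumes the category values differ in case and spacing and that a ‘?’ marks a missing answer; the sample values and the ‘Unknown’ replacement are illustrative choices.

```python
import pandas as pd

# Stand-in values with inconsistent case, spacing and a '?' placeholder.
df = pd.DataFrame({
    "WHOS_COMPLETING_TEST": ["Self", "self", " Parent", "?", "Health care professional"],
})

print(df["WHOS_COMPLETING_TEST"].unique())  # before standardizing

df["WHOS_COMPLETING_TEST"] = (
    df["WHOS_COMPLETING_TEST"]
    .str.strip()                  # drop leading/trailing spaces
    .str.capitalize()             # 'self' and 'Self' become the same category
    .replace({"?": "Unknown"})    # assumption: '?' means the answer is missing
)

print(df["WHOS_COMPLETING_TEST"].unique())  # after standardizing
```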

DataFrame after the first pass of cleaning

Courtesy WWE and The New Day

Rajesh Sharma

It can be messy, it can be unstructured but it always speaks, we only need to understand its language!!