

Python Machine Learning

After 10 months of inching forward, Jupyter notebook by notebook and lesson by lesson through 149 lessons, I completed the Udemy course Python for Data Science and Machine Learning by Jose Portilla. It was a great follow-on to Andrew Ng’s Intro to Machine Learning course, which I finished in January 2018. Midway through the course, at the beginning of this year, I took a break to do exploratory data analysis (EDA) on the CDC population health dataset. Jose’s course is well put together, and I give it a solid recommendation for anyone looking for broad Python-based learning.

More good things to come soon …


Detailed analysis of population health indicators, social determinants of health, and effects on gender and race

Photo by CMDR Shane on Unsplash

In part 2 of our series on the CDC Chronic Disease Indicators dataset, our analysis revealed several areas with highly correlated indicators: cardiovascular disease, chronic kidney disease, diabetes, and select indicators in the “social determinants” group of the overarching conditions category. While there are also highly correlated relationships in other areas such as cancer and COPD, we’ll be focusing primarily on the former set in this final blog post.

Figure 1 from the previous post showed the relationships among all the indicators, while Table 5 listed the top correlation pairs. By looking at which indicators recur within specific topics, we can narrow the scope to the topics of interest. …
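As a rough illustration of what ranking "top correlation pairs" can look like in pandas, here is a minimal sketch. The column names and numbers are made up for the example and are not taken from the CDC dataset or from the original analysis.

```python
import pandas as pd

# Hypothetical wide-format data: one column per indicator, one row per observation.
# All names and values here are invented for illustration only.
wide = pd.DataFrame({
    "diabetes": [1.0, 2.0, 3.0, 4.0, 5.0],
    "kidney": [2.0, 4.1, 5.9, 8.2, 10.0],
    "cancer": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Pairwise Pearson correlation between all indicator columns.
corr = wide.corr()

# Keep each unordered pair once (skip the diagonal and mirrored entries).
pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
]

# Rank pairs by the absolute strength of the correlation, strongest first.
top_pairs = sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

for a, b, r in top_pairs:
    print(f"{a} vs {b}: r = {r:.3f}")
```

Sorting on the absolute value keeps strong negative relationships near the top alongside strong positive ones.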


The ties among diabetes, chronic kidney disease, and cardiovascular disease

Photo by v2osk on Unsplash

In the last story, we started looking into a 15-year chronic disease dataset from the U.S. Centers for Disease Control and Prevention, or CDC. The exploratory data analysis began with understanding the columns and rows of data and deciding what was relevant for further analysis.

In this post, we are going to dig deeper into these 400K rows and 17 topic categories, which requires a bit of data wrangling to reshape the dataframe into a format suited to pivot table summaries and visualization. …
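To give a feel for the kind of reshaping involved, here is a minimal sketch of a pandas pivot table built from long-format rows. The column names (`year`, `state`, `topic`, `value`) and all the numbers are assumptions chosen for illustration; they are not the actual CDC schema or data.

```python
import pandas as pd

# Hypothetical long-format records: one row per (year, state, topic) measurement.
# Column names and values are invented for illustration only.
records = [
    {"year": 2014, "state": "CA", "topic": "Diabetes", "value": 9.9},
    {"year": 2014, "state": "CA", "topic": "Cardiovascular Disease", "value": 5.1},
    {"year": 2014, "state": "TX", "topic": "Diabetes", "value": 11.0},
    {"year": 2014, "state": "TX", "topic": "Cardiovascular Disease", "value": 6.2},
    {"year": 2015, "state": "CA", "topic": "Diabetes", "value": 10.1},
    {"year": 2015, "state": "TX", "topic": "Diabetes", "value": 11.4},
]
df = pd.DataFrame(records)

# Reshape to one row per (year, state) and one column per topic,
# averaging values when several rows fall into the same cell.
summary = df.pivot_table(
    index=["year", "state"],
    columns="topic",
    values="value",
    aggfunc="mean",
)

print(summary)
```

Cells with no matching rows come out as NaN, which is worth checking before plotting or computing correlations on the summary.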

About

Daniel Wu

Digital Health, Data Science, Analytics, Product Management, and Innovation
