Doug Steen in Towards Data Science · Seminal Papers in Data Science: A Relational Model for Large Shared Data Banks · 50 years later, a review of some main concepts from E.F. Codd’s 1970 paper that laid the groundwork for relational databases and SQL · 6 min read · Oct 17, 2020
Doug Steen · Beyond the F-1 score: A look at the F-beta score · Tailoring the F-beta score for specific binary classification problems · 4 min read · Oct 11, 2020
Doug Steen · How to code your first simple game using Python · All you need is a .py file and the command line! · 5 min read · Oct 4, 2020
Doug Steen in Towards Data Science · Progress bars for Python with tqdm · Track the execution of Python iterations with a smart progress bar · 4 min read · Sep 26, 2020
Doug Steen · Precision-Recall Curves · Sometimes a curve is worth a thousand words - how to calculate and interpret precision-recall curves in Python. · 7 min read · Sep 20, 2020
Doug Steen in Towards Data Science · Understanding the ROC Curve and AUC · These binary classification performance measures go hand-in-hand. Let’s explore. · 7 min read · Sep 13, 2020
Doug Steen in Towards Data Science · How to build KNN from scratch in Python · … well, at least without sklearn’s KNeighborsClassifier. · 7 min read · Sep 5, 2020
Doug Steen in Towards Data Science · A Gentle Introduction to Self-Training and Semi-Supervised Learning · Coding an example of self-training in Python to utilize unlabeled data for classification · 9 min read · Aug 30, 2020
Doug Steen in Analytics Vidhya · Obtaining sports data from an API using Python requests · As a data scientist, the ability to obtain data through an API is a critical skill. In this post, I provide a brief tutorial on obtaining… · 4 min read · Aug 23, 2020
Doug Steen in Analytics Vidhya · Implementing PCA in Python with sklearn · Principal Component Analysis (PCA) is a commonly used dimensionality reduction technique for data sets with a large number of variables… · 6 min read · Aug 16, 2020