I worked with a dataset obtained from the Medicare Part D website. The dataset contained about 1 million instances, which I reduced to about 500,000. It required additional cleaning, feature engineering, feature removal, and exploratory data analysis, but once that was all done, my data was ready to be plugged into Tableau.
Fortunately, Tableau’s interface is very user-friendly, so once you get the hang of it, it’s plug and chug, with a sprinkle of creativity.
I helped co-found a healthcare startup in my senior year of undergrad that worked to tackle the opioid crisis, so this dashboard…
SQL is a querying language used throughout almost every industry. It’s a must-know language if you are going to remain professionally competitive. I was catching up with a friend who works in project management down in DC, and she found herself on numerous occasions having to pick up SQL for her clients. Her background was not in data, but it was a skill she had to pick up on the job.
As I continue networking, I’m finding that many professionals from non-data backgrounds have had to pick up SQL on their own. Startups…
As a Data Scientist, you must familiarize yourself with a variety of algorithms. One of my favorites, and one that continues to fascinate me, is Logistic Regression. Logistic Regression is a supervised learning classifier. Although Logistic Regression is used in traditional statistics, its applications in machine learning classification remain fascinating.
Before diving right into Logistic Regression it’s important to have some basic understanding of Linear Regression.
Linear Regression can be really useful when you are trying to predict a continuous output value from a linear relationship. Logistic Regression’s output values, by contrast, lie between 0 and 1; a probability…
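To make that contrast concrete, here is a minimal sketch (using NumPy; the values are made up for illustration) of how the sigmoid function squashes a linear model’s unbounded output into a probability between 0 and 1:

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued input into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

# A linear model can output any real number...
linear_output = np.array([-3.0, 0.0, 2.5])

# ...but passing it through the sigmoid yields probabilities,
probabilities = sigmoid(linear_output)

# which we can threshold at 0.5 to get class predictions.
predictions = (probabilities >= 0.5).astype(int)
```

The thresholding step is what turns a regression-style output into a classifier.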
If you’re like me, you’ve built upon your existing technical skills and are looking for new opportunities. A little about myself: I’m a Mechanical Engineer turned Data Scientist, actively on the market for a new opportunity after completing Flatiron School’s Data Science program.
Remember, job searching is a process; the average search takes three to five months. It gets draining, and it’s easy to lose confidence in yourself and be overcome by imposter syndrome. …
In 2019 I decided to consider industries different from the one I was in at the time. I had graduated with a Bachelor’s Degree in Mechanical Engineering from New York University’s Tandon School of Engineering in 2017. When I chose mechanical engineering back in my freshman year of undergrad, I felt it had a broad enough application that the skills from my studies could carry over to a variety of industries. That is true, but it certainly helps when you know exactly what you want to do and can focus your studies accordingly…
Ensemble learning is the method of combining several different machine learning models to improve stability and predictive power. The end result is one strong, optimal predictor. The main principle driving ensemble learning is that several weak learners are brought together to form one strong learner, in other words, an increase in accuracy.
Generally, when performing a machine learning task, the primary causes of low predictive performance are noise, variance, and bias. Ensemble methods help reduce the impact of variance and bias within your model; noise should be tackled during exploratory data analysis.
Ensemble Learning types are separated into three categories:
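Whatever the category, the core idea of combining many weak learners into one stronger predictor can be sketched with a bagging ensemble. This is a minimal illustration, assuming scikit-learn and a synthetic toy dataset (neither is from the original post):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; in practice this would be your cleaned dataset.
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fifty weak learners (decision trees, the default base estimator),
# each trained on a bootstrap sample, vote to form one strong learner.
ensemble = BaggingClassifier(n_estimators=50, random_state=42)
ensemble.fit(X_train, y_train)

score = ensemble.score(X_test, y_test)
```

Because each tree sees a different bootstrap sample, averaging their votes reduces the variance any single tree would have on its own.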
Do people still use Access databases? Well, even if they don’t, I recently tried one, and it was surprisingly easy, though at times convoluted and frustrating. Access databases are Microsoft’s version of simple desktop data infrastructure, something you can use on the fly for (hopefully) non-complex database setups.
There is rarely a 1-to-1 correspondence between Access SQL and other dialects such as Oracle SQL, as different databases usually speak slightly different languages. So my first step was determining what worked in Access. …
One of the most telling indicators of a poorly performing machine learning model is a comparison of accuracy on the training and testing data. Such a test will indicate whether your model is overfit, underfit, or balanced. The reason we use a train-test split is so that we can measure and adjust the performance of our models; otherwise we would be blindly training our models to predict without any insight into their performance.
“Your model is underfitting the training data when the model performs poorly on the training data.”
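A quick sketch of that diagnosis, assuming scikit-learn and a synthetic dataset (both illustrative, not from the post): an unconstrained decision tree will typically memorize the training set, so a high training accuracy paired with a lower testing accuracy is the classic overfitting signature, while low scores on both would suggest underfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree: free to grow until it memorizes the training data.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # near-perfect: memorized
test_acc = model.score(X_test, y_test)     # lower: the overfitting gap

# A large train/test gap suggests overfitting;
# poor scores on both suggest underfitting.
```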
A common issue when drawing data from a slow API is that you may be operating on a deadline and find that it will take several days to pull your entire dataset from the API.
Initially, you may find yourself just running a function to pull the data, and once you have it in your Jupyter Notebook, you pickle it. But another common issue is an error popping up in the middle of your data extraction, discovered only when you come back to work the next morning. That error will, as you might expect, cause you to lose any…
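One way to guard against that is to pickle intermediate results after every request, so a mid-extraction error costs you at most one page of data instead of the whole night’s pull. This is a sketch only: `fetch_page` is a hypothetical stand-in for your actual API call.

```python
import os
import pickle

def fetch_page(page):
    """Hypothetical stand-in for a slow API request."""
    return {"page": page, "rows": list(range(3))}

CHECKPOINT = "results.pkl"

# Resume from an earlier run if a checkpoint file exists.
results = []
if os.path.exists(CHECKPOINT):
    with open(CHECKPOINT, "rb") as f:
        results = pickle.load(f)

for page in range(len(results), 10):
    try:
        results.append(fetch_page(page))
    except Exception as exc:
        # Stop cleanly; everything fetched so far is already on disk.
        print(f"Stopped at page {page}: {exc}")
        break
    # Checkpoint after every page so a crash loses at most one page.
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(results, f)
```

Restarting the notebook then picks up exactly where the last successful page left off, rather than from scratch.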
As data scientists, we are responsible for deriving insights from datasets and building machine learning models to predict desirable outputs. Every data scientist needs to start with the ability to code in Python and familiarity with some important packages.
These packages are important for data scientists as they are the tools that data scientists use to clean, extract and wrangle datasets into a desirable format.
Aspiring data scientists should familiarize themselves with these packages to prepare for their future careers in the field.
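As one illustrative sketch (pandas is my example choice here, not a package named in the post, and the data is invented), this is the kind of cleaning and wrangling such packages make routine:

```python
import pandas as pd

# Invented messy data: a missing label and a non-numeric cost value.
raw = pd.DataFrame({
    "drug": ["A", "B", None, "C"],
    "cost": ["10.5", "3.2", "7.0", "bad"],
})

# Typical wrangling steps: drop rows with missing labels,
# coerce the cost column to numeric, then drop unparseable rows.
clean = raw.dropna(subset=["drug"]).copy()
clean["cost"] = pd.to_numeric(clean["cost"], errors="coerce")
clean = clean.dropna(subset=["cost"])
```

A few lines like these replace what would otherwise be tedious manual spreadsheet work.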
Setting Up a Data Science Environment
Data Scientist with a background in Mechanical Engineering from NYU. Interests include sports, mental health, humanitarian support, and tech news.