I created a simple web application with the Spotify API, Python Dash, and Flask. Spotify users can access the app by granting it permission to use their data. A lot of cool statistics are displayed!
You need a Spotify account to access it. Allow up to 20 seconds for it to load.
I am a Data Scientist with an academic background in Electrical and Computer Engineering. After completing university in 2017, I immediately started a Ph.D. Through the Ph.D. journey, I discovered Data Science. Machine Learning and Data Science books, YouTube videos, online courses, podcasts, and Kaggle combined to make me a self-taught Data Science aspirant. So, after I completed my military service, I found a Data Science job (while still doing the Ph.D.). …
Outlier detection is also known as anomaly detection, noise detection, deviation detection, or exception mining. There is no universally accepted definition. An early definition (Grubbs, 1969) is: "An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs." A more recent definition (Barnett and Lewis, 1994) is:
An observation which appears to be inconsistent with the remainder of that set of data.
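To make "deviates markedly from other members of the sample" concrete, here is a minimal sketch that flags values with the interquartile range (IQR) rule. The 1.5 multiplier is a common rule of thumb and the sample values are made up for illustration; neither comes from the definitions above, and other detection methods may fit a given dataset better.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is a conventional default, assumed here for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up sample: six similar values and one that stands apart.
sample = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(sample))  # [95]
```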
Straight from this excellent article, the most common causes of outliers are:
Normalization and standardization are similar: both rescale the features. They are used in data analysis to understand the data, and in machine learning to train certain algorithms more effectively.
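As a rough illustration of the difference, the sketch below applies min-max normalization (rescaling to the range 0 to 1) and z-score standardization (rescaling to zero mean and unit standard deviation) to one feature. The function names and the price values are made up for this example, and both functions assume the feature is not constant.

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Made-up prices, used only to show the effect of each rescaling.
prices = [100_000, 150_000, 200_000, 250_000, 300_000]
normalized = min_max_normalize(prices)
standardized = standardize(prices)
print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardized)
```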
This article includes:
Using the House Prices dataset from Kaggle.
In about 20 minutes from now, you will have a playlist in Spotify that automatically receives songs from your favorite subreddits.
You only have to set this up once, and you can enjoy it forever. It is very easy and requires no coding knowledge.
Ok, but why do I want to create such a playlist?
If you are like me, then you love to discover new songs. Apart from the great recommendation systems that Spotify and other services provide, which are generated by machine learning, I have found that human recommendations are more diverse and more interesting.
One of the best places on the internet for human music recommendations is Reddit. There are numerous subreddits for every music taste. A list of the most popular ones can be seen here. These subreddits mostly contain songs posted in the format: ‘Artist - Song’s…
In this article, we will:
We have a dataset. It is split into two parts: one is called training and the other testing. They have the same columns, except for one: training also has the target.
Our task is to fit a model on the training data and predict the unknown target on the testing data.
We can’t just fit on the whole training data and expect things to go well on testing. We need to validate that our model captures the hidden patterns in the training data, is stable, does not overfit, and generalizes well on unknown data. …
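One common way to run this kind of validation is K-fold cross-validation: the training data is cut into k folds, and each fold takes a turn as the held-out validation set while the model is fitted on the rest. The sketch below only generates the index splits. K-fold is an assumption here, since the excerpt does not say which validation scheme is used, and `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds.

    Hypothetical helper for illustration; rows are not shuffled.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, valid_idx in folds:
    # Fit on train_idx, score on valid_idx, then average the k scores.
    print(len(train_idx), valid_idx)
```

Averaging the validation score over the folds gives a steadier estimate of how the model behaves on unseen data than a single split does.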
Twitter is the most successful microblogging service with 150 million daily users. 6,000 tweets are written every second. People tweet about everything that comes to mind and use hashtags to associate the tweet with a topic.
We can build a machine learning classifier to rate tweets based on their Sentiment. A tweet can express a positive, negative, or neutral Sentiment.
I will create a simple model to classify such tweets in real time and create a graph of the overall sentiment about COVID. What is people’s sentiment about the virus?
First, I need a dataset of tweets that are already classified into one of the three categories to train my model. SemEval provides a relatively big dataset of 65,854 already labeled tweets. As there is no COVID-specific Twitter dataset, I will use a general Twitter dataset. …
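To give a feel for what such a three-class classifier does, here is a minimal sketch of a multinomial Naive Bayes model with add-one smoothing. Naive Bayes is an assumption chosen for brevity, since the excerpt does not say which model the article trains, and the six labeled sentences are made up; they are not SemEval data.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_texts):
    """Count words per class from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in labeled_texts:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up toy examples, two per class.
toy = [
    ("i love this great day", "positive"),
    ("what a wonderful happy moment", "positive"),
    ("i hate this awful day", "negative"),
    ("what a terrible sad moment", "negative"),
    ("the meeting is at noon", "neutral"),
    ("the report is on the desk", "neutral"),
]
model = train_naive_bayes(toy)
print(predict(model, "such a wonderful day"))  # positive
```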
We will use data from the Kaggle competition M5 Forecasting - Accuracy.
The task is to forecast, as precisely as possible, the unit sales (demand) of various products sold in the USA by Walmart.
More precisely, we have to forecast daily sales for the next 28 days. The data covers stores in three US states (California, Texas, and Wisconsin) and includes item level, department, product categories, and store details.
The dataset is enormous, so for this demonstration I will use a subset of it: a single product with a lot of sales.
Our goal is to compare classical time series analysis techniques with machine learning algorithms. …
Time Series Forecasting is the process where we try to do the impossible: predict the future.
If anyone claims to have constructed the perfect time series forecasting model, we should be cautious. Sure, some models are better than others and the error can be quite small for some observations, but overall the future is unpredictable. Something might happen in the future that never occurred in the past, so even this “perfect” model will fail.
In this article, we will go step-by-step through the time series forecasting procedure using three relatively simple forecasting methods and predict the unknown future using the Triple Exponential Smoothing model. …
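For orientation, the following is a minimal sketch of additive Triple Exponential Smoothing (Holt-Winters), which tracks a level, a trend, and one seasonal offset per position in the season. It is not the article's implementation: the initialisation is one simple choice among several, and the function name, parameters, and toy series are assumptions made for illustration.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing.

    alpha, beta, gamma smooth the level, trend, and seasonal terms.
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons to initialise")
    # Initial level: mean of the first season.
    level = sum(series[:season_len]) / season_len
    # Initial trend: average per-step change between the first two seasons.
    second_mean = sum(series[season_len:2 * season_len]) / season_len
    trend = (second_mean - level) / season_len
    # Initial seasonal offsets: deviation of the first season from its mean.
    seasonals = [series[i] - level for i in range(season_len)]

    # Smoothing pass over the whole series, first season included.
    for t, value in enumerate(series):
        s = seasonals[t % season_len]
        prev_level = level
        level = alpha * (value - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % season_len] = gamma * (value - level) + (1 - gamma) * s

    n = len(series)
    return [level + (h + 1) * trend + seasonals[(n + h) % season_len]
            for h in range(horizon)]

# Made-up series: a repeating pattern of length 3 with no trend.
history = [10, 20, 30] * 3
forecast = holt_winters_additive(history, season_len=3,
                                 alpha=0.5, beta=0.5, gamma=0.5, horizon=3)
print(forecast)
```

On this perfectly periodic toy series the forecast simply repeats the pattern; real data needs the smoothing parameters tuned, for example by minimising the error on a held-out period.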
In Part 1 we looked at:
Everything was accompanied by theory and code.
In Part 2 we will continue our journey with:
Let’s remember our dataset with a glimpse of its first rows.
A time series is a sequence of observations recorded at regular time intervals.
Depending on the frequency of observations, a time series may typically be hourly, daily, weekly, monthly, quarterly, or annual. Sometimes you might have second-wise or minute-wise time series as well, such as the number of clicks and user visits every minute.
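The frequency of a series can also be changed after the fact. As a small sketch, the snippet below sums made-up minute-level click counts into hourly totals. `resample_hourly` is a hypothetical helper written for this example; in practice a library such as pandas would usually handle the resampling.

```python
from datetime import datetime, timedelta

def resample_hourly(observations):
    """Sum (timestamp, value) pairs into hourly buckets, sorted by time."""
    totals = {}
    for ts, value in observations:
        # Truncate each timestamp to the start of its hour.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        totals[bucket] = totals.get(bucket, 0) + value
    return sorted(totals.items())

# Made-up data: one click per minute for 90 minutes, starting at 09:00.
start = datetime(2021, 1, 1, 9, 0)
clicks = [(start + timedelta(minutes=m), 1) for m in range(90)]
hourly = resample_hourly(clicks)
for bucket, total in hourly:
    print(bucket, total)
```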
Many real-world problems involve time series data. Anything that is observed sequentially over time is a time series.
Examples of time series data include:
Time series analysis involves understanding various aspects of the inherent nature of the series so that you are better informed to create meaningful and accurate forecasts. …