You can visit the final app on rdok.net. The code is open-source on GitHub.
This is mostly a procedural article. The purpose is to describe the full process of creating an app without bombarding the reader with code snippets.
What’s the app about? The app is called Random Dose of Knowledge.
Most of the content of this article comes from my recent paper, “An Evaluation of Feature Selection Methods for Environmental Data”, available here for anyone interested.
There are two ways to reduce the number of features, otherwise known as dimensionality reduction.
The first way is called feature extraction and it aims to transform the features and create entirely new ones based on combinations of the raw/given ones.
The most popular approaches are Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Multidimensional Scaling. However, the new feature space can hardly provide us with useful information about the original features.
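To make that last point concrete, here is a minimal NumPy sketch of PCA on synthetic data. The data, the shapes, and all variable names are made up for illustration. Each extracted feature is a weighted mix of every original feature, which is why the new space says little about any single original one.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 200 samples, 5 original features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# PCA via SVD: center the data, then decompose it.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep the first 2 principal directions (rows of Vt).
n_components = 2
components = Vt[:n_components]        # shape (2, 5): one weight per original feature
X_new = X_centered @ components.T     # shape (200, 2): the extracted features

# Share of the total variance captured by each kept component.
explained_variance_ratio = (S**2 / np.sum(S**2))[:n_components]

print(components)                     # every row mixes all 5 original features
print(explained_variance_ratio)
```

Each row of `components` has a non-zero weight for every original column, so you cannot read a new feature as "this is feature 3".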
Reading and writing files using Pandas and NumPy is an everyday task for Data Scientists and Engineers.
Let’s compare the most common functions that these libraries provide to write/read tabular data.
We can make our code much faster in these I/O operations, save time, and make our boss and ourselves happy.
We can also save serious amounts of disk space by choosing the appropriate save function.
First, let's create a DataFrame of 10,000,000 rows and 2 columns.
The most common approach to save a Pandas DataFrame.
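Here is a rough sketch of that setup. It assumes the approach meant here is `DataFrame.to_csv` (the text above does not name it), uses hypothetical column names `a` and `b`, and uses fewer rows than the 10,000,000 mentioned so that it runs quickly.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# The article uses 10,000,000 rows; a smaller number keeps this sketch fast.
n_rows = 100_000
rng = np.random.default_rng(0)

# Two columns of random floats; the names "a" and "b" are hypothetical.
df = pd.DataFrame({"a": rng.random(n_rows), "b": rng.random(n_rows)})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")

    # Write without the index so the file holds only the two data columns.
    df.to_csv(path, index=False)
    csv_size_mb = os.path.getsize(path) / 1e6

    # Read it back to check the round trip.
    df_back = pd.read_csv(path)

print(f"CSV size: {csv_size_mb:.2f} MB, shape read back: {df_back.shape}")
```

You can wrap the `to_csv` and `read_csv` calls with `time.perf_counter()` to measure how long each one takes on your machine.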
Podcasts are on the rise. They can be an alternative way for Data Scientists to learn and keep up with the latest news in the field. Immerse yourself in the industry and stay at the top of your field.
A podcast is a passive form of learning, so you can do other things at the same time. You can listen to podcasts while taking a walk, exercising, cleaning the house, or relaxing.
I will recommend 5 active podcasts that post new episodes every week with durations ranging from 20 minutes to about an hour.
I rate all the movies I watch on IMDb, and the website allows you to download a nice .csv with all your ratings. This .csv contains basic information about the movies. In order to perform topic modeling, I need the plots and/or summaries of the movies. I will grab this information from Wikipedia and use it to enrich the IMDb dataset. Then I will perform Latent Dirichlet Allocation (LDA) for topic modeling on the plots and summaries of the movies to find 6 topics.
I will keep the article clean of code. The code is available here.
The purpose of this article is to:
I created a simple Web Application with the Spotify API, Python Dash, and Flask. Spotify users can access the app by giving it permission to use their data. A lot of cool statistics are displayed!
You need a Spotify Account to access it. Allow up to 20 seconds to load.
I am a Data Scientist with an academic background in Electrical and Computer Engineering. After completing university in 2017, I immediately started a Ph.D. Through the Ph.D. journey, I discovered Data Science. Machine Learning and Data Science books, YouTube videos, online courses, podcasts, and Kaggle, all combined, made me a self-taught…
Outlier Detection is also known as anomaly detection, noise detection, deviation detection, or exception mining. There is no universally accepted definition. An early definition by Grubbs (1969) is: “An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs.” A more recent definition by Barnett and Lewis (1994) is: “An observation which appears to be inconsistent with the remainder of that set of data.”
Straight from this excellent article, the most common causes of outliers are:
Normalization and standardization are similar: both rescale the features. They are used in data analysis to understand the data, and in machine learning to help certain algorithms train better.
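As a quick illustration, here is a minimal NumPy sketch on made-up values. It assumes the common convention where normalization means min-max scaling to the range [0, 1] and standardization means subtracting the mean and dividing by the standard deviation; other definitions exist.

```python
import numpy as np

# Made-up feature values, with one much larger than the rest.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# Normalization (min-max scaling): squeeze the values into [0, 1].
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): shift to mean 0 and scale to standard deviation 1.
x_std = (x - x.mean()) / x.std()

print(x_norm)
print(x_std)
```

Both keep the order of the values and only change the scale. The difference is in what they fix: the range for normalization, the mean and spread for standardization.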
This article includes:
Using the House Prices dataset from Kaggle.
In about 20 minutes from now, you will have a playlist on Spotify that automatically receives songs from your favorite subreddits.
You only have to set this up once and can enjoy it forever. It is very easy and requires no coding knowledge.
Ok, but why do I want to create such a playlist?
If you are like me, then you love to discover new songs. Spotify and other services provide great recommendation systems powered by machine learning, but I have found that human recommendations are more diverse and more interesting.
In this article, we will:
We have a dataset. It is split into two parts: one is called training and the other testing. They have the same columns, except for one: training also has the target.
Our task is to fit a model on the training data and predict the unknown target on the testing data.
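The task can be sketched in a few lines. This is only an illustration under stated assumptions: the data is synthetic, the model is a plain least-squares linear regression chosen for simplicity, and `add_intercept` is a hypothetical helper, not something from a library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: 3 feature columns plus a known target.
X_train = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_train = X_train @ true_w + 1.0 + rng.normal(scale=0.01, size=100)

# Synthetic testing set: same feature columns, but no target.
X_test = rng.normal(size=(20, 3))


def add_intercept(X):
    """Hypothetical helper: prepend a column of ones for the intercept term."""
    return np.column_stack([np.ones(len(X)), X])


# Fit on the training data (ordinary least squares).
coef, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

# Predict the unknown target on the testing data.
y_pred = add_intercept(X_test) @ coef

print(coef)
print(y_pred[:5])
```

The key point is that the model only ever sees the target from the training part; the testing part contributes features and nothing else.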