We have a name for the scientific exploration of data to find valuable information and solve business problems: Data Science (DS). This statement is not a definition, just a basic description.
DS has been around for a while now. I have covered my thoughts on the field extensively in the past (check the bottom for resources), but I want to talk about something different in this article.
DS did not come out of nowhere. …
When you are working in data science, one of the hardest parts is discovering which data to use to solve a business problem.
Remember that before trying to get data to solve a problem, you need to understand the context of the business and the project. By context I mean all the specifics of how a company runs its projects, how the company is established, its competitors, how many departments exist, the different objectives and goals they have, and how they measure success or failure.
When you have all of that, you can start thinking about getting the data required to solve the business problem. In this article I won’t talk much about data collection; instead, I want to discuss and show you the process of enriching the data you already have with new data. …
I love Apache Spark. It was one of the first frameworks I used for machine learning and data science. It has been growing steadily in the past years, and we are close to its 3rd version. The changes we are expecting are mostly in the optimization of queries and processes, so the API won’t change that much. This means that what you will learn in this article will work for a while (we don’t know for how long because life is tricky).
The article is divided into three parts:
One of the hardest things when you are working with a new dataset is discovering the most important features for predicting your target, and where you can find new sources of information that can improve your understanding of the data and your models.
In this article, I’m going to show you how to do that without any programming skills. Yes, that can sound weird right now but bear with me. In future articles, I’ll explore other programming libraries that can help you do this and see which approach gives better results.
We are going to do this with an example dataset: the House Sales in King County, Seattle, USA dataset. You can find all the information about the data…
This guide is not “the definitive” one because it’s my take on the subject, and also I hate it when people claim that what they say is the definitive thing.
Hi all! This is a reproduction of my latest newsletter for Data Science Now, a show I have with my company Closter about the latest trends in data science, machine learning, and AI, but I also cover educational material like this.
You can hear the podcast version here:
And if you prefer, you can watch the video recording here (sadly we had issues with the video, so it’s just a YouTube video with the…
If you write a lot of articles, blog posts or reports to your boss, then you want to make sure people understand what you say. In this article, I’ll show how to use a Python library called textstat to determine readability, complexity, grade level and more about your text.
You can easily install textstat via the Python Package Index (PyPI):
pip install textstat
You can also install the latest version from GitHub:
git clone https://github.com/shivam5992/textstat.git
cd textstat
pip install .
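To give a feel for what a readability score measures, here is a minimal sketch of the classic Flesch Reading Ease formula in plain Python. It is not textstat's implementation: the vowel-group syllable counter and the helper names `count_syllables` and `reading_ease` are assumptions made for illustration, so the numbers will likely differ from what the library reports.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not textstat's method):
    count each run of consecutive vowels as one syllable."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0  # nothing to score
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


simple = "The cat sat on the mat. It was a good day."
dense = "Organizational stakeholders necessitate comprehensive readability evaluations."

# Short words and short sentences should score higher (easier to read).
print(round(reading_ease(simple), 1))
print(round(reading_ease(dense), 1))
```

With the library installed, the equivalent call would be `textstat.flesch_reading_ease(text)`, which uses a more careful syllable count than this sketch.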
There are a lot of useful functions in the library to:
If you have ever taken an elementary physics class, you’ve heard something like this:
To every Action there is always an equal Reaction.
That’s an extract from Newton’s third law, which states:
To every Action there is always an equal Reaction: or the mutual actions of two bodies upon each other are always equal, and directed to contrary parts.
Here’s a picture from the 1729 translation, The Mathematical Principles of Natural Philosophy:
Most of you probably knew that, and also, I’m pretty sure you’ve heard people applying this to EVERYTHING in life. I mean everything.
Let me show you some…
Welcome to a new series on data science. Here I’ll introduce some concepts and definitions that will guide our study. To understand this article, I recommend that you read these other articles I’ve written in the past:
I will try to define a new beginning for our field that I’m calling:
Knowledge Data Science
I’m borrowing this from the field of Knowledge Engineering, but only the name; the definition will be a little different. …
If you are a DJ, a music sampler or something like that, you normally spend hours creating tracks from original music, and may even spend thousands of dollars on software that can help you do that. One step in that process is separating vocals from music, and sometimes vocals, bass, drums and more.
The amazing people at Deezer just released a package that will help you do that quickly, easily and for FREE!!
In the paper “Singing Voice Separation: A Study on Training Data” the people at Deezer explain the process to separate the tracks using deep neural nets. …