How it all began.

Illustration by Héizel Vázquez

We have a name for the scientific exploration of data to find valuable information and solve business problems: Data Science (DS). This statement is not a definition, just a basic description.

DS has been around for a while now. I have covered my thoughts on the field extensively in the past (check the bottom for resources), but in this article I want to talk about something different.

DS did not come out of nowhere. …

Comparing different NLP techniques and methods with Python and other tools to detect fake news.

Illustration by Héizel Vázquez

Are these tweets real or not?


Data discovery using Explorium and Python in a few simple steps.

Illustration by Héizel Vázquez

When you are working in data science, one of the hardest parts is discovering which data to use to solve a business problem.

Remember that before trying to get data to solve a problem, you need to understand the context of the business and the project. By context, I mean all the specifics of how a company runs its projects, how the company is organized, its competitors, how many departments it has, its different objectives and goals, and how it measures success or failure.

When you have all of that, you can start thinking about getting the data required to solve the business problem. In this article, I won’t talk much about data collection; instead, I want to discuss and show you the process of enriching the data you already have with new data. …
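Explorium automates the enrichment step, but the core idea behind "enriching the data you already have" is a join: match each of your records to an external source on a shared key and pull in the new columns. Below is a minimal plain-Python sketch of that idea. The `enrich` helper, the `zipcode` key, and the `zip_stats` values are all hypothetical, made up for illustration; this is not Explorium's API.

```python
from typing import Any, Dict, List


def enrich(
    rows: List[Dict[str, Any]],
    external: Dict[Any, Dict[str, Any]],
    key: str,
) -> List[Dict[str, Any]]:
    """Left-join every row with the external record that shares the same key.

    Rows without a match are kept, with the new columns set to None.
    The input rows are not modified; enriched copies are returned.
    External columns with the same name as an existing column overwrite it.
    """
    # Collect every column the external source can contribute.
    new_columns = sorted({col for record in external.values() for col in record})
    enriched = []
    for row in rows:
        match = external.get(row.get(key), {})
        merged = dict(row)  # copy so the caller's data stays untouched
        for col in new_columns:
            merged[col] = match.get(col)
        enriched.append(merged)
    return enriched


# Hypothetical data: your own records plus an external table keyed by zip code.
houses = [
    {"id": 1, "zipcode": "zip_a", "price": 300000},
    {"id": 2, "zipcode": "zip_b", "price": 450000},
]
zip_stats = {"zip_a": {"median_income": 70000, "schools_nearby": 4}}

for house in enrich(houses, zip_stats, key="zipcode"):
    print(house)
```

The left join is the usual choice for enrichment because it keeps every original row even when the external source has no match. With pandas, the equivalent would be `DataFrame.merge(..., how="left")`.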

How to build a basic machine learning model with Apache Spark and Python.

Illustration by Héizel Vázquez

I love Apache Spark. It was one of the first frameworks I used for machine learning and data science. It has been growing steadily over the past few years, and we are close to its third version. The changes we expect are mostly in the optimization of queries and processes, so the API won’t change that much. This means that what you learn in this article will keep working for a while (we don’t know for how long, because life is tricky).

The article is divided into three parts:

  • Basics of Apache Spark
  • Installing and using Apache Spark
  • Creating your first Apache Spark machine learning…

For more, follow me on Twitter: https://twitter.com/faviovaz

Or how to enrich your datasets and create new features automatically.

Illustration by Héizel Vázquez

When you are working with a new dataset, one of the hardest things is discovering the most important features for predicting your target, and also finding new sources of information that can improve your understanding of the data and your models.

In this article, I’m going to show you how to do that without any programming skills. Yes, that may sound weird right now, but bear with me. In future articles, I’ll explore programming libraries that can help you do this and see which approach gives better results.

We are going to do this with an example dataset: the House Sales in King County, Seattle, USA dataset. You can find all the information about the data…

38 free resources to learn calculus, algebra, and statistics.

https://open.spotify.com/show/5nrspdHxUxzc9TkEibpxD5?si=l69w-RnHR965oBAASVd7sQ

Disclaimer:

This guide is not “the definitive” one, because it’s my take on the subject, and I also hate it when people claim that what they say is the definitive word.

Hi all! This is a reproduction of my latest newsletter for Data Science Now, a show I host with my company Closter about the latest trends in data science, machine learning, and AI; I also cover educational material like this.

You can hear the podcast version here:

And if you prefer, you can watch the video recording here (sadly, we had issues with the video, so it’s just a YouTube video with the…

Using the power of programming to write better and more readable text.

Illustration by Héizel Vázquez

If you often write articles, blog posts, or reports for your boss, you want to make sure people understand what you say. In this article, I’ll show you how to use a Python library called textstat to determine the readability, complexity, grade level, and more of your text.

Installation

You can easily install textstat via the Python Package Index (PyPI):

pip install textstat

You can also install the latest version from GitHub:

git clone https://github.com/shivam5992/textstat.git
cd textstat
pip install .

Usage

There are a lot of useful functions in the library to:

  • Count syllables
  • Perform lexicon counts
  • Count sentences
  • Compute the Flesch Reading Ease score to assess how easy a document is to read. …
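The Flesch Reading Ease score in the last bullet is a simple formula over three counts: 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), where higher scores mean easier text. In textstat it is available as `textstat.flesch_reading_ease(text)`. To show roughly what happens under the hood, here is a stdlib-only sketch. The word, sentence, and syllable counting below are my own crude regex heuristics (an assumption, not textstat's implementation), so the numbers will not match textstat exactly.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts; higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def rough_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels, ignoring a trailing silent 'e'.

    This is only an approximation, not textstat's syllable counter.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # e.g. "make" has one syllable, not two
    return max(count, 1)


def rough_flesch(text: str) -> float:
    """Estimate the score for a piece of text using the heuristics above."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words:
        raise ValueError("text contains no words")
    syllables = sum(rough_syllables(w) for w in words)
    return flesch_reading_ease(len(words), max(len(sentences), 1), syllables)


print(round(rough_flesch("The cat sat. The dog ran."), 2))
```

Short sentences made of one-syllable words push the score up, which is why very simple text can land above 100.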

Let’s stop making stuff up and be better scientists.

Illustration by Héizel Vázquez

If you have ever taken even a basic physics course, you’ve probably heard something like this:

To every Action there is always an equal Reaction.

That’s an excerpt from Newton’s third law, which states:

To every Action there is always an equal Reaction: or the mutual actions of two bodies upon each other are always equal, and directed to contrary parts.
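In modern notation, the law quoted above says that if body A exerts a force on body B, then B exerts a force of equal magnitude and opposite direction on A:

```latex
\mathbf{F}_{A \to B} = -\,\mathbf{F}_{B \to A}
```

Both forces belong to the same interaction between the two bodies, and each one acts on a different body.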

Here’s a picture from the 1729 translation of The Mathematical Principles of Natural Philosophy:

https://archive.org/details/bub_gb_Tm0FAAAAQAAJ/page/n63

Most of you probably knew that, and I’m pretty sure you’ve also heard people apply this to EVERYTHING in life. I mean everything.

Let me show you some…

An introduction to the (possible) future of data science.

Illustration by Héizel Vázquez

Welcome to a new series on data science. Here I’ll start by introducing some concepts and definitions that will guide our study. To understand this article, I recommend that you read these other articles I’ve written in the past:

I will try to define a new beginning for our field that I’m calling:

Knowledge Data Science

I’m taking only the name from the field of Knowledge Engineering; the definition will be a little different. …

How to use Deezer’s spleeter library to start creating your own karaoke tracks.


If you are a DJ, a music sampler, or something like that, you probably spend hours creating tracks from original music, sometimes even spending thousands of dollars on software that helps you do it. One step in that process is separating the vocals from the music, and sometimes separating the vocals, bass, drums, and more.

The amazing people at Deezer just released a package that will help you do that quickly, easily, and for FREE!!

How It Works

https://sigsep.github.io/

In the paper “Singing Voice Separation: A Study on Training Data”, the people at Deezer explain how to separate the tracks using deep neural networks. …

About

Favio Vázquez

Data scientist, physicist and computer engineer. Love sharing ideas, thoughts and contributing to Open Source in Machine Learning and Deep Learning ;).
