Intuition — Example

Tatiana owns a vegetarian store and has many loyal customers. She would like to know how valuable each customer is in monetary terms and how much each one is likely to spend in the future. This information will help her plan promotions, send gift bonuses, and retain profitable customers.


There are several definitions for the sector and project but one I liked is:

Sometimes data scientists need to expose their machine learning models so that other members of the team (e.g., back-end developers) or end users can consume them directly. In these cases, implementing an API is useful.

This post explains the steps to train a model, store the classifier on Google Cloud Storage, and use Cloud Machine Learning to create an API that performs online predictions. The classifier will be trained on the iris flower data set, which consists of 3 different types of irises (Setosa, Versicolour, and Virginica). The rows are the samples and the columns are the features: sepal length, sepal…

Cloud Dataflow is an excellent solution for moving data around, and several articles have been dedicated to using Dataflow to ELT data into BigQuery. This post goes in the opposite direction: I have a table in BigQuery that I need to move to MySQL.

This post builds on the previous Dataflow post, How to Create A Cloud Dataflow Pipeline Using Java and Apache Maven, and can be seen as an extension of it.

Goal: Transfer some columns from a BigQuery table to a MySQL table.

Disclaimer: I am a newbie on…

Cloud Dataflow is a managed service for executing a wide variety of data processing patterns.

This post explains how to create a simple Maven project with the Apache Beam SDK in order to run a pipeline on the Google Cloud Dataflow service. One advantage of using Maven is that it lets you manage external dependencies for the Java project, making it ideal for automation processes.

This project runs a very simple example in which two strings, “Hello” and “World”, are the inputs and are transformed to upper case on GCP Dataflow; the output is shown in the console log.

Disclaimer: Purpose…

Cluster analysis is a technique whose purpose is to divide a collection of objects into groups (clusters) in such a way that:

  1. The objects of the same group are as similar as possible (internal cohesion of the group).

  2. The objects of different groups are as different as possible.



  • Recognize the differences or similarities between groups of objects and describe them in graphic or algebraic form to achieve a better understanding of a given domain.


  • Associated subjectivity and difficulty in validation.


  • Useful to find interesting insights in data and visualize information.


  • Marketing: customer segmentation.
  • Social networks: identification of communities

Algorithms examples

  • K-Means clustering
  • Hierarchical clustering

K-Means Clustering
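
To make the idea concrete, here is a minimal sketch of K-Means in plain Python, using only the standard library. It is not the code from the original post: the function names (`k_means`, `squared_distance`, `mean_point`) and the `initial_centroids` parameter are my own choices for illustration. The algorithm alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, until nothing changes.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0, initial_centroids=None):
    """Group `points` into `k` clusters; returns (centroids, clusters).

    `initial_centroids` is a hypothetical convenience parameter: when it is
    omitted, k distinct points are sampled at random as starting centroids.
    """
    if initial_centroids is None:
        centroids = random.Random(seed).sample(points, k)
    else:
        centroids = list(initial_centroids)

    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Converged: assignments can no longer change.
        centroids = new_centroids

    return centroids, clusters
```

Calling `k_means(points, 2)` on a list of 2-D tuples returns the two centroids and the points assigned to each. Keep in mind that the result depends on the starting centroids, so in practice the algorithm is often run several times with different seeds.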

This post collects some slides I made to teach Python to co-workers, colleagues, and friends. I use Python mostly for data science and machine learning, and although I use it every day, I consider myself an intermediate Python user. These slides help me deepen my knowledge and teach others on the same path.

  • Class 1

Python Advantages, Anaconda, Spyder, Jupyter, Python Data Types, Variables, Math operators, Logic operators, Input/Output operators, Comments.

  • Class 2

If/else, for loops, while loops, functions, FizzBuzz.

  • Class 3

Tuples, Lists, Sets, Dictionaries

  • Class 4

Error handling, Read/Write files.

  • Class 5

NumPy, Arrays, Subarrays.
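
As a taste of the Class 2 topics (if/else, for loops, functions), here is one common way to write FizzBuzz. This is a sketch of the classic exercise, not necessarily the version shown in the slides, and the function names are my own.

```python
def fizzbuzz(n):
    """Return the FizzBuzz word for a single positive integer."""
    if n % 15 == 0:  # Divisible by both 3 and 5.
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


def fizzbuzz_sequence(limit):
    """Collect the FizzBuzz words for 1..limit using a for loop."""
    results = []
    for n in range(1, limit + 1):
        results.append(fizzbuzz(n))
    return results
```

Printing each item of `fizzbuzz_sequence(15)` on its own line gives the usual output of the exercise.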

This tutorial describes how to install both Python versions (2.7 and 3.6) on Windows 10. It also covers how to add Python to the PATH in Windows 10.

“PATH is an environment variable on Unix-like operating systems, DOS, OS/2, and Microsoft Windows, specifying a set of directories where executable programs are located”

source :
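
Once Python is installed, a quick way to see what PATH contains is to ask Python itself. The snippet below is a small sketch using only the standard library; the helper names `path_directories` and `find_python` are made up for this example.

```python
import os
import shutil


def path_directories(path_value=None):
    """Split a PATH string into its directories.

    When no value is given, the PATH of the current process is used.
    os.pathsep is ';' on Windows and ':' on Unix-like systems.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [directory for directory in path_value.split(os.pathsep) if directory]


def find_python(name="python"):
    """Return the full path of the first `name` executable found on PATH, or None."""
    return shutil.which(name)
```

If `find_python()` returns `None`, no directory in PATH contains a `python` executable, which usually means the Python install folder still has to be added to PATH.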

1. Download python 2.7

Go to and click on “Download Python 2.7.14”.

Wait until the installation package download is complete.

These are some slides I made a year ago, in which I compiled different sources from across the internet while trying to understand the most basic theory behind how Support Vector Machines work. At the end, I build a pipeline that uses the SVM algorithm from the scikit-learn library to classify gender from male and female voices.

I hope to write a related blog post in the future.

P.S.: Unfortunately, I no longer own that slides account.

Translated from Brandon Rohrer’s Blog by Jose Miguel Arrieta R.

Bayesian inference is a way to get better predictions from your data. It is particularly useful when you don’t have as much data as you would like and want to squeeze every last bit of predictive strength out of it.

Although it is sometimes described with reverence, Bayesian inference isn’t magic or mystical. And even though the math can get dense, the concepts behind it are completely accessible. …

Edit May 2019

I made a new version using Google Cloud Platform.

This tutorial will help you build a classifier as a service. The classifier will be trained on the iris flower data set, which consists of 3 different types of irises (Setosa, Versicolour, and Virginica). The rows are the samples and the columns are the features: sepal length, sepal width, petal length, and petal width. The scikit-learn library will be used for the machine learning algorithms. The classifier will be stored in an S3 bucket and a Lambda function will be used to make classifications; finally, an Amazon API Gateway will be used to trigger the…
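
The overall flow (train, store, load, classify) can be sketched without any cloud services. The example below is only an illustration under several assumptions: it swaps scikit-learn for a tiny hand-written nearest-centroid classifier, trains on six hand-typed rows of the iris data set instead of the full data, and uses `pickle` bytes as a stand-in for the object that would be stored in the bucket. The names `NearestCentroidClassifier` and `classify` are hypothetical.

```python
import pickle

# Six rows of the iris data set, typed by hand for illustration:
# (sepal length, sepal width, petal length, petal width) in cm, plus the label.
TRAINING_ROWS = [
    ((5.1, 3.5, 1.4, 0.2), "Setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Versicolour"),
    ((6.4, 3.2, 4.5, 1.5), "Versicolour"),
    ((6.3, 3.3, 6.0, 2.5), "Virginica"),
    ((5.8, 2.7, 5.1, 1.9), "Virginica"),
]


class NearestCentroidClassifier:
    """Stand-in model: predicts the label whose mean feature vector is closest."""

    def fit(self, rows):
        grouped = {}
        for features, label in rows:
            grouped.setdefault(label, []).append(features)
        # One centroid (column-wise mean) per label.
        self.centroids = {
            label: tuple(sum(column) / len(column) for column in zip(*samples))
            for label, samples in grouped.items()
        }
        return self

    def predict(self, features):
        def squared_distance(centroid):
            return sum((x - y) ** 2 for x, y in zip(features, centroid))

        return min(self.centroids, key=lambda label: squared_distance(self.centroids[label]))


def classify(model_bytes, features):
    """Load a serialized model and classify one sample.

    This plays the role of the function behind the API: it receives the
    stored model and the four measurements, and returns the predicted label.
    """
    model = pickle.loads(model_bytes)
    return model.predict(features)


# "Training" and "storing": the bytes stand in for the file kept in the bucket.
MODEL_BYTES = pickle.dumps(NearestCentroidClassifier().fit(TRAINING_ROWS))
```

Calling `classify(MODEL_BYTES, (5.0, 3.4, 1.5, 0.2))` then returns a label for a new flower. In the real service, the model would be a scikit-learn estimator, and the bytes would be downloaded from storage rather than kept in a variable.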

Jose Miguel Arrieta
