
PM @Doctolib after 5 years as data scientist. Worked for DataRobot, Capgemini & Accenture

Delivering on time with 100% predictability.

In this article I am going to write about two things: why managing project timelines is a must for Product Managers and how to leverage tools to achieve it with high confidence.

An example project timeline I made in Asana for my Product Team

Why Product Management is so cool… but can be highly stressful

As I was starting a new career as a Product Manager in a promising tech company, something quickly struck me: how complex the role was. It was unlike anything I’d seen before. A mix of many jobs rolled into one. Call it entrepreneur, consultant, manager, sales… A PM is all over the place, doing many things at once.

And with that you come to realise a harsh truth…


Built by Microsoft, Yandex, Intel or by smaller organisations or individuals, these packages automate lots of redundant tasks in machine learning.

Introduction

Having some free time these past weeks, I spent it exploring resources that were in my backlog. And I was astonished. So many cool, and not necessarily well-known, machine learning packages that didn’t make it to the top of the charts. They looked niche in popularity even though they tackle fundamental challenges in machine learning.

So here they are. 9 really useful packages in ML to learn and implement at work: interpretML, dtreeviz, CatBoost, plaidML, mlflow, kedro, sklearn_pandas, Streamlit, pandas_profiling.

They are presented within the following topics:

  • Model interpretation (interpretML, dtreeviz)
  • Model building (CatBoost, plaidML)
  • Model…


CatBoost is changing the game of Machine Learning forever, for the better.

Introduction

As I was designing the content for a training on Machine Learning, I ended up digging through the documentation of CatBoost. And there I was, baffled by this immensely capable framework. Not only does it build one of the most accurate models on whatever dataset you feed it, with minimal data prep required, CatBoost also gives by far the best open source interpretation tools available today AND a way to productionize your model fast.

That’s why CatBoost is revolutionising the game of Machine Learning, forever. And that’s why learning to use it is a fantastic opportunity to up-skill…


Image taken from Pinterest from evocarehealth.com

Exercises

These are exercises that explore the possibilities of the pandas package. If you need to go back to the lesson, here it is:

And here are the corrections:

Creating DataFrames

Create DataFrames that look like the following images. Oh, and find a clever way to create them, don’t just hard-code the values in a dictionary:

1)


Image taken from Pinterest from evocarehealth.com

These corrections come from the following exercises:

Exercises

Creating DataFrames

1)

pd.DataFrame({
'n': range(3,11),
'n_squared': [n*n for n in range(3,11)]
})

2)

pd.DataFrame({'number_'+str(n):[n] for n in range(10)})

Don’t forget the [] next to the n.
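
A quick way to see why the brackets matter, as a minimal sketch: with bare scalars pandas cannot infer how many rows to build and raises a ValueError, while one-element lists give a single-row table. The variable names `one_row` and `scalar_error` are only for illustration.

```python
import pandas as pd

# With one-element lists, each key becomes a column holding a single row.
one_row = pd.DataFrame({'number_' + str(n): [n] for n in range(10)})
print(one_row.shape)

# With bare scalars and no index, pandas raises a ValueError.
try:
    pd.DataFrame({'number_' + str(n): n for n in range(10)})
    scalar_error = None
except ValueError as err:
    scalar_error = err
print(scalar_error)
```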

3)

pd.DataFrame({'column_'+str(n):[n*m for m in range(10)] for n in range(10)})

Two list comprehensions intertwined.
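
To make the nesting easier to read, here is the same expression spread over several lines, as a sketch: the outer (dictionary) comprehension produces one column per n, and the inner (list) comprehension produces one row per m. The name `table` is only for illustration.

```python
import pandas as pd

# Outer comprehension: one column per n.
# Inner comprehension: one row per m, holding n * m.
table = pd.DataFrame({
    'column_' + str(n): [n * m for m in range(10)]
    for n in range(10)
})

# Row 3 of column_4 holds 4 * 3.
print(table.loc[3, 'column_4'])
```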

Combining DataFrames

1)

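pd.concat([customers1, customers2], ignore_index=True)

Note that ignore_index=True isn’t strictly necessary, but it gives you a clean new index. (On pandas versions before 2.0, customers1.append(customers2, ignore_index=True) does the same thing.)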

2)

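import os

transactions = pd.concat([
# os.listdir returns bare file names, so join each one with the folder
pd.read_csv(os.path.join('cust_transactions', f))
for f in os.listdir('cust_transactions')
if f.endswith('.csv')
])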

The ‘import os’ is necessary to call ‘os.listdir(‘cust_transactions’)’ which lists all the files in the folder ‘cust_transactions’. I added the ‘if f.endswith(‘.csv’)’ …


pandas and DataFrames

Since you’re using pandas, the first line you will have to write in (almost) all your notebooks is

import pandas as pd

In pandas, a DataFrame is a table. You can do many things on a DataFrame: sum over a column, plot a histogram, do a pivot table…

As a good start, simply write ‘pd.’ in an empty cell and use the keyboard shortcut Tab to see all the available functions within pandas. You’ll the

Creating DataFrames

To create a DataFrame, use:

pd.DataFrame({
'age':[10, 45, 13]
})

The function ‘pd.DataFrame()’ uses a dictionary:

{
'age':[10, 45, 13]
}

where ‘age’ is the…


Pandas is the go-to library for data science. These are the shortcuts I use to do repetitive data science tasks faster and simpler.

1. Analysing samples of dataframes with df.groupby().__iter__()

It’s usually hard to explore a dataset row by row or group by group within a Jupyter Notebook compared to what you can do with Excel. One useful trick is to use a generator and use Ctrl + Enter instead of Shift + Enter in order to iteratively look at different samples within the same cell, without creating a mess in your notebook.

First create a cell with the generator with .groupby() (or .iterrows()) and add the .__iter__():

Then, run the following cell as many times as you wish to observe the data that matters most to you, using…
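
The two cells described above might look like the following sketch. The DataFrame and its column names (`customer`, `amount`) are hypothetical and only there to make the example self-contained; the technique is simply calling next() on the iterator each time the second cell is re-run.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({
    'customer': ['a', 'a', 'b', 'b', 'c'],
    'amount': [10, 20, 5, 15, 30],
})

# Cell 1: run once to create the iterator over (key, sub-DataFrame) pairs.
groups = df.groupby('customer').__iter__()

# Cell 2: re-run with Ctrl + Enter; each run shows the next group.
key, sample = next(groups)
print(key)
print(sample)
```

Once every group has been shown, next() raises StopIteration, and re-running the first cell starts over from the first group.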


Selenium is a powerful tool for advanced interactions with websites: login, clicks… Let’s use it for web scraping

Alright let’s do something ‘simple’ here: collect all the artists available on Spotify.

That’s a robot scrolling through Spotify’s catalog of artists

⚠️Obviously, I need to put a disclaimer here ⚠️
Don’t use this method to resell the data you collect: it is privately owned by companies. In particular, don’t resell or do anything illegal with Spotify data.

2nd remark: since Spotify has an API, it’s a bit stupid to get the data from the website. However, this is a good exercise, and web scraping usually gets you more data (though more slowly) than a restricted API does (not sure that’s the case for Spotify though).

If you have…


In this article we’ll create our first graph database with neo4j.
After installing neo4j, we’ll explore 2 ways to create the database:

  • manually typing CYPHER queries
  • generating CYPHER queries using an Excel formula. This is very useful if you start from an Excel file

Before diving into the code, let’s first understand what neo4j is and why it’s important to know this technology.

What’s neo4j?

Screenshot of neo4j browser (source: neo4j website)

To put it simply, neo4j is the MySQL of graph databases. It provides a graph database management system, a language to query the database (a.k.a. CYPHER), and a visual interface, the neo4j browser.

Neo4j is an open source project, maintained by a private company. It comes with a free version that we will use…

Félix Revert
