Published in MLearning.ai

5 functions to make your job easier as a Data Scientist

Functions I wrote that will save you tons of hours.

⚡️ Reading data faster

Take advantage of datatable's reading speed without giving up the convenience of Pandas!

I have nothing personal against the datatable package, but after so many years I'm used to Pandas. So I wrote this simple function that reads files with the fread function from the datatable package and then converts the result into a pandas DataFrame, which is easier for me to work with.

For example, a read that takes 16.62 seconds with Pandas takes only 6.55 seconds with datatable, which makes datatable roughly 2.5 times faster.
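
A minimal sketch of such a reader, assuming the datatable package is installed (datatable.fread parses the file and Frame.to_pandas converts the result). The name read_file is hypothetical, and the fallback to pd.read_csv when datatable cannot be imported is an addition of this sketch, so the snippet still runs on machines without datatable.

```python
import os
import tempfile

import pandas as pd

try:
    # Optional dependency: datatable's fread is a fast, multi-threaded CSV reader
    import datatable as dt
except ImportError:
    dt = None  # fall back to plain pandas below


def read_file(path: str) -> pd.DataFrame:
    """Read a delimited text file, using datatable's fread when available,
    and always return a pandas DataFrame (hypothetical helper)."""
    if dt is not None:
        frame = dt.fread(path)     # parse with datatable
        return frame.to_pandas()   # convert for the usual pandas workflow
    return pd.read_csv(path)       # slower path, same return type


# Usage: write a tiny CSV to a temporary folder and read it back
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]}).to_csv(sample_path, index=False)
    df = read_file(sample_path)

print(df.shape)  # (3, 2)
```

Returning a pandas DataFrame in both branches means the rest of your code never has to know which reader was used.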


🗓 Generating main date formats

This function takes a single column of dates and creates several new columns with useful date formats: the quarter, the year, the year with the month, the year with the month name, etc.

If you choose the expanded version, you also get more granular formats such as the day of the month, the day of the week, the week number, etc.

And of course, it keeps your original column in case you need it later!
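
A minimal sketch of this idea using pandas' .dt accessor. The function name add_date_formats, the expanded flag and the new column names are hypothetical choices for illustration; ISO week numbers come from Series.dt.isocalendar().

```python
import pandas as pd


def add_date_formats(df: pd.DataFrame, col: str, expanded: bool = False) -> pd.DataFrame:
    """Return a copy of df with extra date-format columns derived from df[col].

    The original column is left exactly as it was (hypothetical helper).
    """
    out = df.copy()
    dates = pd.to_datetime(out[col])  # parsed copy; out[col] itself is untouched

    # Main formats
    out[f"{col}_quarter"] = dates.dt.quarter
    out[f"{col}_year"] = dates.dt.year
    out[f"{col}_year_month"] = dates.dt.strftime("%Y-%m")        # e.g. 2023-01
    out[f"{col}_year_month_name"] = dates.dt.strftime("%Y-%B")   # e.g. 2023-January

    # More granular formats, only when requested
    if expanded:
        out[f"{col}_day"] = dates.dt.day
        out[f"{col}_weekday"] = dates.dt.day_name()
        out[f"{col}_week"] = dates.dt.isocalendar().week.astype(int)  # ISO week number
    return out


sales = pd.DataFrame({"date": ["2023-01-15", "2023-05-03"], "amount": [10, 20]})
result = add_date_formats(sales, "date", expanded=True)
print(result)
```

Working on a copy keeps the function side-effect free, so calling it twice on the same DataFrame never stacks duplicate columns.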


📊 Write Pandas DataFrames to Google Sheets

Another thing I do on a daily basis as a Data Scientist is write tables that I have transformed with Python into Google Sheets. Sometimes Google Sheets is the best tool for building reports, and being able to fill it from Python is awesome.

To do that, you need a handy package called gspread. Once you have installed it and set up access to your spreadsheet, a single function call can write the whole dataset wherever you want.


One advantage of this approach is that writing the whole dataset in one go, instead of cell by cell, saves requests to the Google Sheets API, so you run into far fewer problems with API rate limits!
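
A minimal sketch of the single-request approach, assuming gspread with service-account authentication (gspread.service_account) and a spreadsheet already shared with the service account's email address. The helper names dataframe_to_rows and write_df_to_worksheet are hypothetical. The whole table, header included, goes out in one Worksheet.update call; keyword arguments are used because the positional order of update changed between gspread 5 and 6.

```python
import pandas as pd


def dataframe_to_rows(df: pd.DataFrame) -> list:
    """Convert a DataFrame into a list of rows (header first) for the Sheets API.

    NaN is not valid JSON, so missing values become empty strings.
    Datetime columns should be converted to strings beforehand.
    """
    clean = df.astype(object).where(pd.notna(df), "")
    return [list(map(str, df.columns))] + clean.values.tolist()


def write_df_to_worksheet(worksheet, df: pd.DataFrame, start_cell: str = "A1") -> None:
    """Write df into a gspread Worksheet using a single update request."""
    worksheet.update(range_name=start_cell, values=dataframe_to_rows(df))


# Usage (requires `pip install gspread` and a service-account JSON key;
# the file name, spreadsheet key and sheet name are placeholders):
# import gspread
# gc = gspread.service_account(filename="service_account.json")
# ws = gc.open_by_key("YOUR_SPREADSHEET_KEY").worksheet("Sheet1")
# write_df_to_worksheet(ws, report_df)

report_df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, None]})
rows = dataframe_to_rows(report_df)
print(rows)
```

Keeping the conversion in its own function makes it easy to check what will be sent before spending any API quota.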

💡 Super quick-stats

Take a quick look at your dataset with this simple function! It shows you:

  • Shape of the dataset
  • Duplicate rows
  • % of non-NaN values, along with cardinality (number of unique values) per column
  • First 5 rows of the dataset
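
A minimal sketch of such a helper, using only standard pandas methods. The name quick_stats is hypothetical, and "cardinality" is taken here to mean the number of distinct non-null values per column (DataFrame.nunique).

```python
import pandas as pd


def quick_stats(df: pd.DataFrame, n_rows: int = 5) -> pd.DataFrame:
    """Print a quick overview of df and return the per-column summary table."""
    print(f"Shape: {df.shape[0]} rows x {df.shape[1]} columns")
    print(f"Duplicate rows: {df.duplicated().sum()}")

    # One row per column: dtype, % of non-NaN values and number of unique values
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "pct_not_nan": (df.notna().mean() * 100).round(2),
        "cardinality": df.nunique(),
    })
    print(summary)
    print(df.head(n_rows))
    return summary


demo = pd.DataFrame({"city": ["Madrid", "Madrid", "Paris", None], "visits": [3, 3, 5, 1]})
summary = quick_stats(demo)
```

Returning the summary as well as printing it lets you filter it afterwards, for example to list columns with many missing values.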

🔌 Connect to Notion with Python

Who doesn't use Notion nowadays? It is one of my favourite tools, and being able to work with it from Python is critical in my daily work.
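
One possible approach, sketched with only the Python standard library against Notion's public REST API, and not necessarily the method used in this post: query a database and follow Notion's cursor-based pagination (has_more and next_cursor). It assumes an integration token whose integration has been given access to the database. The function names are hypothetical, and the pinned Notion-Version value is an assumption you should match to the API version your integration targets.

```python
import json
import urllib.request
from typing import Optional

NOTION_API_URL = "https://api.notion.com/v1"
NOTION_VERSION = "2022-06-28"  # assumption: pin the API version your integration uses


def build_database_query(database_id: str, token: str,
                         start_cursor: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request for Notion's 'query a database' endpoint."""
    body = {"start_cursor": start_cursor} if start_cursor else {}
    return urllib.request.Request(
        url=f"{NOTION_API_URL}/databases/{database_id}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_database(database_id: str, token: str) -> list:
    """Return every page in the database, following pagination cursors."""
    pages, cursor = [], None
    while True:
        request = build_database_query(database_id, token, cursor)
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        pages.extend(payload["results"])
        if not payload.get("has_more"):
            return pages
        cursor = payload["next_cursor"]
```

Usage: pages = query_database("YOUR_DATABASE_ID", os.environ["NOTION_TOKEN"]), after which each page's fields live under page["properties"]. Splitting request construction from sending keeps the part that needs a network connection small.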


Contact me!

Juan

Lead Data Scientist 📊 at Bravo Studio. Start-up growth advisor 🚀 He/Him — 🏳️‍⚧️🏳️‍🌈 ally. Opinions are my own