5 functions for making your job easier as a Data Scientist.
Functions I wrote that will save you tons of hours.
⚡️ Reading data faster
Take advantage of datatable’s reading speed without giving up the convenience of Pandas!
I don’t have anything personal against the datatable package, but after so many years I got used to Pandas… so I wrote this simple function that reads files with the fread function from the datatable package and then converts the result into a pandas DataFrame, which is easier for me to handle.
Where a read takes 16.62 seconds with Pandas, datatable needs only 6.55 seconds, so datatable is more than twice as fast (about 2.5x).
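Here is a minimal sketch of the reader described above, assuming datatable's fread and Frame.to_pandas(). The name read_file and the fallback to pandas.read_csv when datatable isn't installed are my own additions for illustration, not the original function.

```python
import pandas as pd

try:
    import datatable as dt  # optional: fast multi-threaded CSV reader
except ImportError:
    dt = None  # assumption: fall back to plain pandas when datatable is missing


def read_file(path):
    """Read a delimited file with datatable's fread and return a pandas DataFrame.

    Hypothetical helper: uses pandas.read_csv if datatable is not installed.
    """
    if dt is not None:
        frame = dt.fread(path)     # parse the file with datatable
        return frame.to_pandas()   # convert the datatable Frame to a pandas DataFrame
    return pd.read_csv(path)
```

Usage is just df = read_file("my_data.csv"). Everything after that line is regular Pandas, so the speed-up only touches the parsing step.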
🗓 Generating main date formats
This function takes a single date column and creates several new columns from it in useful date formats: the quarter, the year, the year with the month, the year with the month name, etc.
If you turn on the expanded option, you also get more granular formats such as the day of the month, the day of the week, the week number, etc.
And of course, it keeps your original column in case you want to use it later!
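A minimal sketch of this idea using the pandas .dt accessor. The function name generate_date_formats, the expanded flag and the new column names are hypothetical; isocalendar() assumes pandas 1.1 or later.

```python
import pandas as pd


def generate_date_formats(df, column, expanded=False):
    """Add common date-format columns derived from `column`.

    Hypothetical re-creation; the original column is left untouched.
    """
    out = df.copy()
    dates = pd.to_datetime(out[column])  # parsed copy, original column kept as-is

    out[f"{column}_quarter"] = dates.dt.quarter
    out[f"{column}_year"] = dates.dt.year
    out[f"{column}_year_month"] = dates.dt.strftime("%Y-%m")
    out[f"{column}_year_month_name"] = dates.dt.strftime("%Y") + "-" + dates.dt.month_name()

    if expanded:
        # More granular formats
        out[f"{column}_day"] = dates.dt.day
        out[f"{column}_day_name"] = dates.dt.day_name()
        out[f"{column}_week"] = dates.dt.isocalendar().week  # ISO week number
    return out
```

For example, generate_date_formats(df, "date", expanded=True) turns "2021-01-15" into quarter 1, "2021-01", "2021-January", day 15, "Friday" and ISO week 2.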
📊 Write Pandas DataFrames to Google Spreadsheets
Another thing I need on a daily basis as a Data Scientist is writing tables I have transformed in Python to a Google Spreadsheet. Sometimes Google Spreadsheets are the best tool for building reports, and being able to do that from Python is awesome.
To do that, you will need a cool package called gspread. Once you have installed the package and configured your spreadsheet, you can use a function that writes the whole dataset wherever you want.
One advantage of this approach is that writing the whole dataset at once, instead of cell by cell, saves requests to the Google Sheets API, so you will run into far fewer problems with API rate limits!
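A minimal sketch of writing a whole DataFrame in one request with gspread, assuming a service-account JSON file and an existing worksheet shared with that account. The names write_df_to_sheet, dataframe_to_values, spreadsheet_key, worksheet_name and service_account.json are hypothetical placeholders.

```python
import pandas as pd


def dataframe_to_values(df):
    """Convert a DataFrame into a header row plus data rows (list of lists).

    astype(object) turns NumPy scalars into plain Python values and NaN/None
    become empty strings, so the payload is JSON-serialisable.
    """
    rows = [
        ["" if pd.isna(v) else v for v in row]
        for row in df.astype(object).itertuples(index=False, name=None)
    ]
    return [[str(c) for c in df.columns]] + rows


def write_df_to_sheet(df, spreadsheet_key, worksheet_name, start_cell="A1",
                      credentials_file="service_account.json"):
    """Write the whole DataFrame with a single update() call (one API request)."""
    import gspread  # imported lazily so dataframe_to_values works without gspread

    client = gspread.service_account(filename=credentials_file)
    worksheet = client.open_by_key(spreadsheet_key).worksheet(worksheet_name)
    # Keyword arguments, because gspread 5.x and 6.x order update()'s
    # positional parameters differently.
    worksheet.update(range_name=start_cell, values=dataframe_to_values(df))
```

Datetime columns are not JSON-serialisable as-is, so convert them to strings (for example with .dt.strftime) before calling write_df_to_sheet.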
🔎 Quick overview of your dataset
Take a quick glance at your dataset with this simple function! It shows you:
- Shape of the dataset
- Duplicate rows
- % of non-NaN values, along with the cardinality of each column
- First 5 rows of the dataset
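A minimal sketch of such an overview in plain pandas. The function name quick_overview and the returned dictionary are my own choices, added so the numbers can be reused after printing.

```python
import pandas as pd


def quick_overview(df, n_rows=5):
    """Print shape, duplicate rows, % non-NaN and cardinality per column, and the first rows.

    Hypothetical helper; also returns the computed numbers.
    """
    n_duplicates = int(df.duplicated().sum())  # rows identical to an earlier row
    summary = pd.DataFrame({
        "pct_not_nan": (df.notna().mean() * 100).round(2),  # share of non-missing values
        "cardinality": df.nunique(),                         # distinct values, NaN excluded
    })
    print(f"Shape: {df.shape[0]} rows x {df.shape[1]} columns")
    print(f"Duplicate rows: {n_duplicates}")
    print(summary)
    print(df.head(n_rows))
    return {"shape": df.shape, "duplicates": n_duplicates, "summary": summary}
```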
🔌 Connect to Notion with Python
Who doesn’t use Notion nowadays? It is one of my favourite tools, so being able to use it from Python is critical in my daily work.
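As one possible example (a hypothetical sketch, not the original function), here is how you could pull a Notion database into a pandas DataFrame through Notion's public REST API. It assumes an integration token that has been shared with the database. The 2022-06-28 version header pins the classic databases/query endpoint, and only a few common property types are flattened.

```python
import json
import urllib.request

import pandas as pd

NOTION_VERSION = "2022-06-28"  # pinned API version for the databases/query endpoint


def query_database(token, database_id):
    """Fetch every page of a Notion database, following the API's cursor pagination."""
    url = f"https://api.notion.com/v1/databases/{database_id}/query"
    headers = {
        "Authorization": f"Bearer {token}",
        "Notion-Version": NOTION_VERSION,
        "Content-Type": "application/json",
    }
    pages, payload = [], {}
    while True:
        request = urllib.request.Request(
            url, data=json.dumps(payload).encode(), headers=headers, method="POST")
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        pages.extend(data["results"])
        if not data.get("has_more"):
            return pages
        payload = {"start_cursor": data["next_cursor"]}  # ask for the next batch


def property_to_value(prop):
    """Flatten one Notion property object into a plain Python value (common types only)."""
    kind = prop.get("type")
    value = prop.get(kind)
    if kind in ("title", "rich_text"):
        return "".join(part.get("plain_text", "") for part in value)
    if kind in ("number", "checkbox"):
        return value
    if kind == "select":
        return value["name"] if value else None
    if kind == "date":
        return value["start"] if value else None
    return None  # other property types are not handled in this sketch


def pages_to_dataframe(pages):
    """One row per page, one column per database property."""
    return pd.DataFrame([
        {name: property_to_value(prop) for name, prop in page["properties"].items()}
        for page in pages
    ])
```

Usage would be pages_to_dataframe(query_database(token, database_id)), where token and database_id are your own integration secret and database ID.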
- 📱 LinkedIn: Juan Antonio Cabeza Sousa
- 📬 Email: firstname.lastname@example.org
- 🖥️ Twitter: @juaancabsou