A Complete Date-Time Guide for Data Scientists in Python
A complete guide to handling and processing date & time data in Python using different Python packages
Everybody needs to deal with date & time data at some point in data science work. Date & time data help answer questions such as which day of the week your website gets the most hits, in which hour the most transactions are made, or in which hour the most crypto or stocks are traded.
In this article, I will demonstrate the usefulness of a few very important Python packages for working with date & time data. Let’s start without any further ado.
1. pandas
pandas is, along with numpy, one of the most used Python packages in data science. It can be used to clean and process date & time data.
1.1 String to Timestamp
When we read a CSV file with pd.read_csv('data.csv'), pandas reads the date column as an object (string). pd.to_datetime() converts those strings to Timestamp objects.
First import pandas: import pandas as pd
I will use Rugby World Cup 2019 data for this article.
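As a minimal sketch of the conversion described above, the snippet below reads a tiny in-memory CSV and converts its date column. The column names and rows are hypothetical stand-ins, not the actual Rugby World Cup 2019 file used in the article.

```python
import io

import pandas as pd

# Hypothetical two-row CSV standing in for 'data.csv'
csv_text = "date,winner\n2019-09-20,Japan\n2019-11-02,South Africa\n"
df = pd.read_csv(io.StringIO(csv_text))

# Right after reading, the date column still holds plain strings
was_string = not pd.api.types.is_datetime64_any_dtype(df["date"])

# Convert the strings to Timestamps; an explicit format avoids guessing
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

first_match = df["date"].iloc[0]  # now a pandas Timestamp
print(df.dtypes)
print(first_match)
```

Passing format is optional, but it makes the parsing explicit when every row follows the same pattern.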
Bonus 1: We can convert seconds or nanoseconds since the Unix epoch to a Timestamp using pd.to_datetime() with the unit argument.
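A short sketch of that conversion, assuming the numbers count time elapsed since the Unix epoch (1970-01-01 00:00:00 UTC). The epoch value is chosen for illustration only.

```python
import pandas as pd

epoch_seconds = 1568937600  # illustrative value: seconds since the Unix epoch

# unit tells pandas how to interpret the number
ts_from_seconds = pd.to_datetime(epoch_seconds, unit="s")
ts_from_nanos = pd.to_datetime(epoch_seconds * 10**9, unit="ns")

print(ts_from_seconds)  # 2019-09-20 00:00:00
print(ts_from_nanos)    # 2019-09-20 00:00:00
```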
1.2 Processing Timestamp
Once we have date and time as Timestamp, we can get year, month, day of month, day of week, hour, minute, seconds from it.
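The components listed above can be pulled out of a whole column through the .dt accessor. This is a sketch on a made-up kickoff column; the column name and times are illustrative assumptions.

```python
import pandas as pd

# Hypothetical kickoff times, already converted to Timestamps
matches = pd.DataFrame(
    {"kickoff": pd.to_datetime(["2019-09-20 19:45:00", "2019-11-02 18:00:00"])}
)

matches["year"] = matches["kickoff"].dt.year
matches["month"] = matches["kickoff"].dt.month
matches["day"] = matches["kickoff"].dt.day
matches["day_of_week"] = matches["kickoff"].dt.dayofweek  # Monday=0 ... Sunday=6
matches["weekday_name"] = matches["kickoff"].dt.day_name()
matches["hour"] = matches["kickoff"].dt.hour
matches["minute"] = matches["kickoff"].dt.minute
matches["second"] = matches["kickoff"].dt.second

print(matches)
```

A single Timestamp exposes the same information directly, for example pd.Timestamp("2019-11-02").year.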
Bonus: How to change the Timestamp format? strftime("desired_format") is an option. See the examples below.
Note: strftime() returns a string. To get a Timestamp back, convert the string with pd.to_datetime().
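Here is a small sketch of formatting with strftime() and converting the result back to a Timestamp. The format strings are only examples; any valid strftime directives work.

```python
import pandas as pd

ts = pd.Timestamp("2019-11-02 18:00:00")

# strftime returns a plain string in the requested layout
as_text = ts.strftime("%d/%m/%Y %H:%M")
print(as_text)  # 02/11/2019 18:00

# Convert the string back to a Timestamp, using the same format
round_trip = pd.to_datetime(as_text, format="%d/%m/%Y %H:%M")

# For a whole column, go through the .dt accessor
dates = pd.Series(pd.to_datetime(["2019-09-20", "2019-11-02"]))
labels = dates.dt.strftime("%Y/%m/%d")
print(labels.tolist())  # ['2019/09/20', '2019/11/02']
```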
2. datetime
The datetime module from the Python standard library is another important tool for processing date & time data.
Import the classes we need from datetime as below:
from datetime import datetime, timedelta, timezone
The datetime class is used to create and process datetime objects. Let’s see a few important usages of datetime.
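datetime.now() → Current date and time in the local time zone
datetime.utcnow() → Current UTC time as a naive datetime (deprecated since Python 3.12; datetime.now(timezone.utc) gives a timezone-aware equivalent)
datetime.today() → Current date and time in the local time zone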
datetime(year, month, day, hour, minutes, seconds) → create a datetime object
We can also get the year, month, day of month, day of the week, hour, minute and second values from a datetime object, just as with a Timestamp object.
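The calls listed above can be tried out as follows. This is a sketch; the specific date is chosen only for illustration.

```python
from datetime import datetime, timezone

now_local = datetime.now()            # naive datetime in the local time zone
now_utc = datetime.now(timezone.utc)  # timezone-aware current UTC time

# datetime(year, month, day, hour, minute, second)
final = datetime(2019, 11, 2, 18, 0, 0)

print(final.year, final.month, final.day)      # 2019 11 2
print(final.hour, final.minute, final.second)  # 18 0 0
print(final.weekday())     # 5 (Monday=0, so 5 is Saturday)
print(final.isoweekday())  # 6 (Monday=1, so 6 is Saturday)
```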
timedelta is mainly used to add time to or subtract time from a datetime object.
Example:
datetime ± timedelta(weeks=0, days=0, hours=0, minutes=0, seconds=0), where datetime is a datetime object.
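A concrete sketch of the pattern above, with illustrative values:

```python
from datetime import datetime, timedelta

kickoff = datetime(2019, 11, 2, 18, 0, 0)

# Subtract a fixed duration
one_week_earlier = kickoff - timedelta(weeks=1)
print(one_week_earlier)  # 2019-10-26 18:00:00

# Add a mix of units
later = kickoff + timedelta(days=1, hours=2, minutes=30)
print(later)  # 2019-11-03 20:30:00

# Subtracting two datetimes gives a timedelta back
elapsed = later - kickoff
print(elapsed.total_seconds())  # 95400.0
```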
3. pytz
This library allows accurate, cross-platform time zone calculations using Python 2.4 or higher.
Install pytz via pip: pip install pytz
pytz can be used to get the current date and time in a specific time zone, and also to create a datetime object in a specific time zone. See the examples below.
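Both uses can be sketched as below. The time zone names and the date are illustrative choices, and the snippet assumes pytz is installed.

```python
from datetime import datetime

import pytz

tokyo = pytz.timezone("Asia/Tokyo")
london = pytz.timezone("Europe/London")

# Current date and time in a specific time zone
now_in_tokyo = datetime.now(tokyo)

# Attach a time zone to a naive datetime with localize();
# with pytz this is safer than passing tzinfo= to the datetime constructor
kickoff_tokyo = tokyo.localize(datetime(2019, 11, 2, 18, 0, 0))

# Convert the same instant to another time zone
kickoff_london = kickoff_tokyo.astimezone(london)

print(kickoff_tokyo)   # 2019-11-02 18:00:00+09:00
print(kickoff_london)  # 2019-11-02 09:00:00+00:00
```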
4. dateutil
I found two important applications of the dateutil package: I. Parsing and II. Getting relative date and time (as a datetime object).
I. Parsing: Parsing is the process of converting a string to a datetime.
II. Getting relative date and time: we just learned time subtraction and addition with timedelta. timedelta adds or subtracts an exact, fixed duration, but what if we want a relative time, for example exactly one month before or after today? In this case dateutil's relativedelta is very useful. See the examples below.
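Both applications can be sketched as follows, assuming dateutil is installed (pip install python-dateutil). The input strings and dates are illustrative.

```python
from datetime import datetime, timedelta

from dateutil import parser
from dateutil.relativedelta import relativedelta

# I. Parsing: the layout of the string is inferred
parsed = parser.parse("2 November 2019 18:00")
print(parsed)  # 2019-11-02 18:00:00

# Ambiguous numeric dates can be steered with dayfirst
parsed_dayfirst = parser.parse("02/11/2019", dayfirst=True)
print(parsed_dayfirst)  # 2019-11-02 00:00:00

# II. Relative date and time: shift by calendar months
start = datetime(2019, 3, 31)
one_month_back = start - relativedelta(months=1)
one_month_later = start + relativedelta(months=1)
print(one_month_back)   # 2019-02-28 00:00:00 (clamped to the month's last day)
print(one_month_later)  # 2019-04-30 00:00:00

# timedelta only knows fixed durations, so 30 days is not "one month"
thirty_days_back = start - timedelta(days=30)
print(thirty_days_back)  # 2019-03-01 00:00:00
```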
Pro tips:
- Use relativedelta to shift a date by calendar units such as months, because the number of days differs from month to month, which makes timedelta hard to use.
- If you know the time difference between two time zones, then use timedelta instead of pytz.
- Use pandas to convert strings into date & time (Timestamp) values while working with a pandas DataFrame.
Find the Jupyter Notebook on GitHub to view the code.
Reach out to me on LinkedIn or Twitter if you have any queries.
Thank you for reading this article.