DateTime in pandas read_csv(): Everything you have to know

Published in

Nerd For Tech

5 min readJun 2, 2021

To begin with, you will need data and they come from different sources in different formats. Some of the Data Formats I have used and well supported by the Pandas library are Comma-separated values(CSVs), XLSX, ZIP, Plain Text(txt), JSON, XML, HTML.

When you build a pandas data frame from one of these data formats, chances are at least one of the columns will have dates.

As strings are represented as string objects, integers as Integer objects, floats as floating point objects in python, dates should be of type Datetime.

import datetimetype(5)
>>> <class 'int'>type('Nepal')
>>> <class 'str'>type(datetime.date(2051,7,14))
>>> <class 'datetime.date'>

Datetime objects are unique. For python to recognise an object as string, the value must be inside single/double quotes, and,

for python to treat them as Datetime objects, they must be created using datetime.datetime() method. Here is the link to the python docs:

datetime - Basic date and time types - Python 3.9.5 documentation

Only one concrete class, the class, is supplied by the module. The class can represent simple timezones with fixed…

docs.python.org

But, working with dates and parsing them is one of the biggest challenges for programmers.

Let me show you what I mean.

First things first, let’s import a dataset. I will be using the Avocado Prices dataset for now. You can download the data from the link below.

Avocado Prices

Historical data on avocado prices and sales volume in multiple US markets

www.kaggle.com

Let’s import the libraries.

import pandas as pd
import numpy as np
import datetime

The data is in CSV format. So, we will be using read_csv() method

df = pd.read_csv('../DataSets/avocado.csv')
df.head(3)

O/P:

Let’s check the data types of each of the columns using dataframe.info()

Alas, the data type of the ‘Date’ column is not ‘DateTime’, but Object.

Object is the pandas equivalent of python’s String and is interchangeable with it in most cases.

For python to treat an object as datetime object instead of str,int, float; the object should be created or parsed using datetime.datetime() or datetime parser.

By default, they are read as strings by the pandas read_csv() method and not as datetime objects.

So, how to force pandas to read columns with dates(strings) as the datetime objects instead?

Parsing the dates as datetime at the time of reading the data.

Set parse_dateparameter of read_csv()to label/index of the column you want to parse(convert string date into datetime object).

Only booleans, lists, and dictionaries are accepted for the 'parse_dates' parameter

parse_date = [‘Date’] or parse_date = [1]

Now, let’s check the data type of the columns.

And, now the column ‘Date’ is of type datetime

And, if we try to check the data type of an element of the ‘Date’ column.

TimeStamp is the pandas equivalent of python’s Datetime and is interchangeable with it in most cases.

There are lots of benefits of using a datetime object instead of String objects. For example, with datetime objects dealing with time zones, daylight saving time, and different written date formats will be easier and consistent.

Convert String Dates to datetime objects after they have been read by pandas

Pandas has a built-in function called to_datetime() that can be used to convert strings to datetime object.

df['Date] = pd.to_datetime(df['Date'])

Custom Format:

Sometimes, dates might be in a custom format, for example, YYYY-DD-MM HH:MM:SS. Pandas to_datetime() has a parameter called format that allows you to pass a custom format as the argument:

dfx = pd.DataFrame({'Date':['1994-10-31 12:30:00', '1995-11-23 15:30:00', '2004-1-1 00:30:00'], 'name':['psg','aw','prw']})
dfx

dfx['Date'] = pd.to_datetime(dfx['Date'], format='%Y-%m-%d  %H:%M:%S')

What if year, month and day are in separate columns? Can we assemble them into a single column as datetime objects?

Set the parameterparse_date to a dictionary in read_csv() method.

They key in the dictionay should be the name of the column and the value should be a list with the index of the columns you want to combine.

This will collapse the existing columns, if you want to keep the existing columns then, Set keep_date_col = True

From the docs:

parse_dates: bool or list of int or names or list of lists or dict, default False

The behaviour is as follows:

boolean. If True -> try parsing the index.
list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’

Assemble the columns to one after they have already been read using pd.to_datetime()

As usual, you can still use the pd.to_datetime() to assemble them to a single column and convert them to datetime object.

By default, the existing columns will not be collapsed and will remain there.

OK, that’s it for now.

Thank you for reading.

I the next part I will include:

Manipulating the datetime objects in pandas
Using a custom callable for converting a sequence of string columns to an array of datetime instances.
Performing aggregation and groupby in dates