Dealing with datetimes like a pro in Pandas

Irina Truong
Published in j-bennet codes, Dec 28, 2017

In my previous article (https://codeburst.io/dealing-with-datetimes-like-a-pro-in-python-fb3ac0feb94b), I wrote about challenges related to the datetime type in Python. My recommended approach for solving them was to use the Pendulum library (https://github.com/sdispater/pendulum).

But guess what?

Pandas can solve those problems just as well!

What is Pandas?

Pandas is an open-source Python library designed for data analysis. If you haven’t heard about it before, check out the comprehensive documentation here: http://pandas.pydata.org/.

Challenge #1: Parsing datetimes

Let’s see how Pandas would help with your Google Analytics-like application. In that application, you were parsing log lines that looked like this:

Here is how you’d do that with Pandas:

This code:

  • reads the log lines
  • splits each line into parts, preserving only the relevant fields, and
  • converts the resulting list of tuples into a Pandas DataFrame.
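As a rough sketch of these steps, the snippet below parses a few made-up log lines. The sample lines (Apache common log format), the column names (date, ip, request, status) and the helper parse_lines are all assumptions for illustration, and a regular expression is used here in place of plain string splitting.

```python
import re

import pandas as pd

# Hypothetical sample lines in Apache common log format (an assumption;
# the application's real log lines may look different).
LOG_LINES = [
    '10.0.0.1 - - [28/Dec/2017:14:05:10 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [28/Dec/2017:14:47:31 +0000] "GET /about HTTP/1.1" 200 128',
    '10.0.0.1 - - [28/Dec/2017:15:02:44 +0000] "GET /home HTTP/1.1" 404 0',
]

# Capture the client IP, the timestamp (without its offset), the request
# path and the status code.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\] ]+) [^\]]+\] '
    r'"\S+ (?P<request>\S+) [^"]*" (?P<status>\d{3})'
)


def parse_lines(lines):
    """Turn raw log lines into a DataFrame of string columns."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # skip lines that do not fit the assumed format
            rows.append(match.group("date", "ip", "request", "status"))
    return pd.DataFrame(rows, columns=["date", "ip", "request", "status"])


df = parse_lines(LOG_LINES)
print(df)
```

Note that this sketch drops the offset from each timestamp, which is only safe if every line is logged in UTC.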

Think of the DataFrame object as a table-like structure. It has 4 columns and contains the following data:

At this point, every field is still a string (or, to be exact, stored with NumPy's object dtype). Now you get to the datetime parsing part:

The code above:

  • provides the format string, because the log file uses a non-standard date format (date and time parts are separated by a colon “:” instead of a space “ ”)
  • provides utc=True, to tell Pandas that your dates and times should not be naive, but UTC.
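A minimal sketch of such a call uses pd.to_datetime with an explicit format and utc=True. The sample strings and the exact format string are assumptions, based on a log timestamp where the date and time are joined by a colon.

```python
import pandas as pd

# Hypothetical raw values: date and time joined by ":" instead of a space.
df = pd.DataFrame({"date": ["28/Dec/2017:14:05:10", "28/Dec/2017:15:02:44"]})

# format describes the non-standard layout; utc=True makes the result
# timezone-aware (UTC) instead of naive.
df["date"] = pd.to_datetime(df["date"], format="%d/%b/%Y:%H:%M:%S", utc=True)

print(df["date"])
```

The %b directive matches abbreviated month names in the current locale, so this assumes an English locale.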

That’s all it takes.

Challenge #2: Displaying datetimes with timezones

First, let’s use your date field as the dataframe’s index. This will give you a DatetimeIndex with lots of useful methods:
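One way to do this is DataFrame.set_index. The small frame below is made-up sample data, with date and request as assumed column names.

```python
import pandas as pd

# Hypothetical frame with an already parsed, UTC-aware date column.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-12-28 14:05:10", "2017-12-28 15:02:44"], utc=True
    ),
    "request": ["/home", "/about"],
})

# Move the date column into the index; the index becomes a DatetimeIndex.
df = df.set_index("date")

print(type(df.index).__name__)
```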

Now, you can convert datetimes to the user’s timezone:

And get a localized dataframe:
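The conversion could look roughly like the sketch below, which uses DatetimeIndex.tz_convert. The sample data and the choice of "America/Los_Angeles" as the user's timezone are assumptions.

```python
import pandas as pd

# Hypothetical frame indexed by UTC-aware datetimes.
df = pd.DataFrame(
    {"request": ["/home", "/about"]},
    index=pd.to_datetime(
        ["2017-12-28 14:05:10", "2017-12-28 15:02:44"], utc=True
    ),
)

# tz_convert keeps the same instants but shows them in another timezone.
local = df.copy()
local.index = df.index.tz_convert("America/Los_Angeles")

print(local)
```

The underlying moments in time are unchanged; only their wall-clock representation differs.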

Challenge #3: Rounding (truncating) datetimes

To aggregate things on an hourly frequency, you have to round datetimes down to an hour. DatetimeIndex has a method for that:
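A small sketch of rounding down with DatetimeIndex.floor, using made-up timestamps. The "h" frequency alias is assumed here; some older code uses "H" for the same thing.

```python
import pandas as pd

# Hypothetical UTC-aware timestamps.
idx = pd.to_datetime(
    ["2017-12-28 14:05:10", "2017-12-28 14:47:31", "2017-12-28 15:02:44"],
    utc=True,
)

# floor("h") truncates every value to the start of its hour.
floored = idx.floor("h")

print(floored)
```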

If you want to round up to the hour instead, there's a corresponding method, ceil.

Now, to count things in this dataframe, group by date and request:

Here is your aggregate:
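The grouping and the resulting counts might look like the sketch below. The data is made up, date and request are assumed column names, and size() is just one of several ways to count rows per group.

```python
import pandas as pd

# Hypothetical requests with UTC-aware timestamps.
dates = pd.to_datetime(
    [
        "2017-12-28 14:05:10",
        "2017-12-28 14:47:31",
        "2017-12-28 14:50:02",
        "2017-12-28 15:02:44",
    ],
    utc=True,
)
df = pd.DataFrame({
    "date": dates.floor("h"),  # truncate to the hour before grouping
    "request": ["/home", "/home", "/about", "/home"],
})

# Count rows for every (hour, request) combination.
counts = df.groupby(["date", "request"]).size()

print(counts)
```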

Challenge #4: Finding edges of an interval

Here is how you can calculate the start of a week:

And the start of next week:
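One possible way to compute both edges is sketched below; it is not necessarily the approach the article had in mind. The helpers start_of_week and start_of_next_week are hypothetical names, weeks are assumed to start on Monday, and the timestamp is in UTC so that daylight saving transitions do not come into play.

```python
import pandas as pd


def start_of_week(ts):
    """Midnight on the Monday of the week that contains ts."""
    # normalize() resets the time to midnight; weekday() is 0 for Monday.
    return ts.normalize() - pd.Timedelta(days=ts.weekday())


def start_of_next_week(ts):
    """Midnight on the Monday after the week that contains ts."""
    return start_of_week(ts) + pd.Timedelta(weeks=1)


now = pd.Timestamp("2017-12-28 14:05:10", tz="UTC")  # a Thursday

print(start_of_week(now))
print(start_of_next_week(now))
```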

Challenge #5: Creating ranges

Creating a range of dates is extremely easy. You can define the number of points you need:
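For instance, pd.date_range accepts a start and a number of periods. The start date, the daily frequency and the UTC timezone below are arbitrary choices for illustration.

```python
import pandas as pd

# Five consecutive daily points, starting at the given date.
points = pd.date_range(start="2017-12-28", periods=5, freq="D", tz="UTC")

print(points)
```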

Or provide a start and end date, and generate every point in between:
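A sketch of the start-and-end form of pd.date_range, with made-up boundaries and an assumed hourly frequency. Both endpoints are included by default.

```python
import pandas as pd

# Every hour from midnight to 06:00, both endpoints included.
hours = pd.date_range(
    start="2017-12-28 00:00",
    end="2017-12-28 06:00",
    freq="h",
    tz="UTC",
)

print(hours)
```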

I would not necessarily recommend installing Pandas just for its datetime functionality — it’s a pretty heavy library, and you may run into installation issues on some systems (*cough* Windows). But if you already use Pandas to process data, there’s no need for any additional libraries to deal with datetimes. You have this great tool right there, in Pandas’ toolbox.
