A beginner’s guide to Time series / date functionality

Nesrine Ammar
Apr 11, 2020

Pandas contains extensive capabilities and features for working with time series data for all domains.

Timestamps and Periods

Timestamp: refers to a particular moment in time. Timestamped data is the most basic type of time series data; it associates values with points in time, and for pandas objects this means indexing by those points in time.

Period: represents a time span (an interval). It is the natural choice when values, such as change variables, belong to a span of time rather than to a single instant.

Timestamp and Period can serve as an index. Lists of Timestamp and Period are automatically coerced to DatetimeIndex and PeriodIndex respectively.

The index of a series of timestamps is a DatetimeIndex. Let’s look at a quick example. First, let’s create our example series df1 using the Timestamps of January 1st, 2nd and 3rd of 2020. When we look at the series, each Timestamp is an index entry with a value associated with it.
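The df1 example described above might look like the sketch below. The values a, b and c are an assumption (the text only says that the next example uses d, e and f).

```python
import pandas as pd

# Three Timestamps used as the index; the values 'a', 'b', 'c' are assumed
df1 = pd.Series(
    list("abc"),
    index=[
        pd.Timestamp("2020-01-01"),
        pd.Timestamp("2020-01-02"),
        pd.Timestamp("2020-01-03"),
    ],
)

print(df1)
# The list of Timestamps has been coerced to a DatetimeIndex
print(type(df1.index))
```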

Similarly, the index of a series of periods is a PeriodIndex. Let’s create another example series df2. This time, we’ll use the values d, e, and f and match them with the periods January, February and March 2020.
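A minimal sketch of df2, assuming monthly periods written as year-month strings:

```python
import pandas as pd

# Three monthly Periods used as the index, matched with the values d, e, f
df2 = pd.Series(
    list("def"),
    index=[pd.Period("2020-01"), pd.Period("2020-02"), pd.Period("2020-03")],
)

print(df2)
# The list of Periods has been coerced to a PeriodIndex
print(type(df2.index))
```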

Converting to Datetime

Now, let’s look into how to convert to Datetime. Suppose we have a list of dates as strings. If we create a DataFrame using these dates as the index and fill it with some random data, we get the DataFrame df3. Looking at the index, we can see that it’s pretty messy and the dates are all in different formats.

Using the pandas to_datetime function, pandas will try to convert these strings to datetimes and put them in a standard format as Timestamp objects.

If you pass a single string to to_datetime, it returns a single Timestamp.
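Here is a rough sketch of that conversion. The date strings, the column names a and b, and the random values are made up for illustration. Each string is parsed on its own, because recent pandas versions expect one consistent format when given a whole list at once.

```python
import numpy as np
import pandas as pd

# Hypothetical date strings, each written in a different format
dates = ["2 June 2013", "Aug 29, 2014", "2015-06-26", "7/12/2016"]

# Random data indexed by the raw strings (column names are assumptions)
rng = np.random.default_rng(0)
df3 = pd.DataFrame(rng.integers(10, 100, size=(4, 2)), index=dates, columns=["a", "b"])
print(df3)

# Parse every string separately so each format is inferred independently
df3.index = pd.DatetimeIndex([pd.to_datetime(d) for d in dates])
print(df3)

# A single string gives back a single Timestamp
single = pd.to_datetime("2015-06-26")
print(type(single))
```

Note that "7/12/2016" is read month first by default, so it becomes July 12th, 2016.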

We can also use the DatetimeIndex constructor directly. Example:

import pandas as pd

df4 = pd.DataFrame({
    'name': ['john', 'mary', 'peter', 'jeff', 'bill'],
    'date_of_birth': ['2000-01-01', '1999-12-20', '2000-11-01', '1995-02-25', '1992-06-30'],
})
# Build a DatetimeIndex from the date strings and use it as the row index
datetime_index = pd.DatetimeIndex(df4['date_of_birth'].values)
df4 = df4.set_index(datetime_index)
# The original string column is now redundant
df4 = df4.drop('date_of_birth', axis=1)

Timedeltas

Timedelta: an absolute time duration, similar to datetime.timedelta from the standard library.

For example, taking the difference between January 3rd and January 1st gives a Timedelta of two days. We can also add a Timedelta to a Timestamp, for instance to find the date and time 12 days after January 1st at 8:00 AM, or 12 days and four seconds after it.
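These operations can be sketched as follows, assuming the year 2020 for the dates:

```python
import pandas as pd

# Subtracting two Timestamps gives a Timedelta
delta = pd.Timestamp("2020-01-03") - pd.Timestamp("2020-01-01")
print(delta)

# Adding a Timedelta to a Timestamp gives a new Timestamp
start = pd.Timestamp("2020-01-01 08:00")
twelve_days_later = start + pd.Timedelta(days=12)
twelve_days_four_seconds_later = start + pd.Timedelta(days=12, seconds=4)
print(twelve_days_later)
print(twelve_days_four_seconds_later)
```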

Working with dates in a DataFrame

Let’s create a fixed frequency DatetimeIndex of 9 dates with freq = M. The date_range function has four parameters: start, end, periods, and freq. Exactly three of them must be specified. If freq is omitted, the resulting DatetimeIndex will have periods linearly spaced elements between start and end.
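A possible sketch of this step is shown below. The January 2018 start date, the column name count, and its values are assumptions, and the three-month spacing follows the '3M' frequency that the asfreq discussion refers to. The frequency is passed as an offset object because the 'M' alias was renamed to 'ME' in newer pandas releases.

```python
import pandas as pd

# 9 month-end dates, three months apart (comparable to the '3M' alias)
dates = pd.date_range(start="2018-01-31", periods=9, freq=pd.offsets.MonthEnd(3))

# Hypothetical data: one 'count' value per date
df5 = pd.DataFrame({"count": [10, 12, 15, 11, 14, 18, 20, 17, 21]}, index=dates)
print(df5)

# Without freq, start/end/periods give linearly spaced dates
spaced = pd.date_range(start="2018-01-01", end="2018-01-05", periods=3)
print(spaced)
```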

We can use diff() to find the difference between each date’s value and the value of the previous date.
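As an illustration, here is diff() on a small stand-in for df5 (the dates and the count values are assumptions, not the article's data):

```python
import pandas as pd

# Hypothetical stand-in for df5: 9 month-end dates, three months apart
dates = pd.date_range(start="2018-01-31", periods=9, freq=pd.offsets.MonthEnd(3))
df5 = pd.DataFrame({"count": [10, 12, 15, 11, 14, 18, 20, 17, 21]}, index=dates)

# Each row minus the previous row; the first row has no predecessor, so it is NaN
changes = df5.diff()
print(changes)
```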

Suppose we wanted to know what the mean count is for each year in our DataFrame. We can do this using resample.
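A sketch of the yearly mean with resample, again on made-up data. A YearEnd offset object is used instead of a string alias, since the yearly alias differs between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for df5: 9 month-end dates, three months apart
dates = pd.date_range(start="2018-01-31", periods=9, freq=pd.offsets.MonthEnd(3))
df5 = pd.DataFrame({"count": [10, 12, 15, 11, 14, 18, 20, 17, 21]}, index=dates)

# Group the rows into calendar years and average each group
yearly_mean = df5.resample(pd.offsets.YearEnd()).mean()
print(yearly_mean)
```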

We can use partial string indexing to find values from a particular year, such as ‘2018’, or from a particular month, such as ‘January 2020’. We can even slice on a range of dates, for example from ‘January 2019’ onwards.
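The three lookups could be written like this with .loc, using the same hypothetical data as a stand-in for df5:

```python
import pandas as pd

# Hypothetical stand-in for df5: 9 month-end dates, three months apart
dates = pd.date_range(start="2018-01-31", periods=9, freq=pd.offsets.MonthEnd(3))
df5 = pd.DataFrame({"count": [10, 12, 15, 11, 14, 18, 20, 17, 21]}, index=dates)

# All rows from the year 2018
rows_2018 = df5.loc["2018"]
# All rows from January 2020
rows_jan_2020 = df5.loc["2020-01"]
# Everything from January 2019 onwards
rows_from_2019 = df5.loc["2019-01":]

print(rows_2018)
print(rows_jan_2020)
print(rows_from_2019)
```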

Let’s change the frequency of the dates in our DataFrame using asfreq. If we use this to change the frequency from ‘3M’ to ‘2M’, we’ll end up with missing values for the new dates that have no matching observation. So let’s use the forward fill method on those missing values, which propagates the last valid observation forward to the next valid one.
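A sketch of the frequency change on made-up data. MonthEnd(3) and MonthEnd(2) stand in for the '3M' and '2M' aliases (spelled '3ME' and '2ME' in newer pandas).

```python
import pandas as pd

# Hypothetical stand-in for df5: 9 month-end dates, three months apart
dates = pd.date_range(start="2018-01-31", periods=9, freq=pd.offsets.MonthEnd(3))
df5 = pd.DataFrame({"count": [10, 12, 15, 11, 14, 18, 20, 17, 21]}, index=dates)

# Re-index every two months; dates without an observation become NaN
every_two_months = df5.asfreq(pd.offsets.MonthEnd(2))
print(every_two_months)

# Same re-indexing, but fill each gap with the last valid observation
filled = df5.asfreq(pd.offsets.MonthEnd(2), method="ffill")
print(filled)
```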

One last thing I wanted to briefly touch upon is plotting time series. Importing matplotlib.pyplot and using the IPython magic %matplotlib inline will allow you to visualize the time series of df5 in the notebook.
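A minimal plotting sketch, assuming matplotlib is installed and using made-up data in place of df5. The magic command is left as a comment because it only works inside a notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd

# In a Jupyter notebook, run this magic first:
# %matplotlib inline

# Hypothetical stand-in for df5: 9 month-end dates, three months apart
dates = pd.date_range(start="2018-01-31", periods=9, freq=pd.offsets.MonthEnd(3))
df5 = pd.DataFrame({"count": [10, 12, 15, 11, 14, 18, 20, 17, 21]}, index=dates)

# DataFrame.plot draws one line per column, with the dates on the x-axis
ax = df5.plot()
ax.set_ylabel("count")
```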

References:

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.asfreq.html
https://docs.wradlib.org/en/stable/notebooks/python/mplintro.html

Thanks for reading :)
