In the last post, I talked about working with indexes in time series. As explained in the previous lessons, values measured or observed over time form a time series. In this post, I will talk about how to work with time series.
Time series are studied frequently in fields such as finance and economics.
I will explain the following topics in this post.
- What is to_datetime and how is it used?
- What are frequency offsets and how are they used?
- What are Period and PeriodIndex and how are they used?
Before we start: our Medium page includes posts on data science, artificial intelligence, machine learning, and deep learning. Please don’t forget to follow us on Medium 🌱 to see these and our latest posts.
Let’s get started.
First, let’s import pandas as pd and NumPy as np.
We can convert a date into a timestamp with the to_datetime function. For example, let’s turn a date string into a timestamp.
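A minimal setup using the conventional aliases. Each later snippet re-imports what it needs so that it can run on its own.

```python
# Conventional aliases used throughout the post
import numpy as np
import pandas as pd

print(pd.__version__, np.__version__)
```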
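Here is a small sketch of converting a single date string. The date itself is a made-up example.

```python
import pandas as pd

# Convert a single date string into a pandas Timestamp
ts = pd.to_datetime("2020-01-15")
print(ts)                             # 2020-01-15 00:00:00
print(isinstance(ts, pd.Timestamp))   # True
```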
Using dates as the index of a time series makes the analysis easier. Dates used as an index form a DatetimeIndex object. In practice, dates often come in various formats, and the to_datetime() function is again used to convert them into a DatetimeIndex object. Now let’s take some dates written in different formats.
Let’s convert these dates to a DatetimeIndex object.
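The sketch below uses a hypothetical list of dates written in different styles. Each string is parsed separately and the results are collected into a DatetimeIndex. This approach works across pandas versions. In pandas 2.0 and later, calling to_datetime on the whole list with mixed styles needs format="mixed".

```python
import pandas as pd

# Hypothetical dates written in different styles
dates = ["2020-01-05", "Jul 6, 2020", "10/08/2020", "2020/12/10"]

# Parse each string on its own, then build a DatetimeIndex from the results.
# (pandas >= 2.0 alternative: pd.to_datetime(dates, format="mixed"))
idx = pd.DatetimeIndex([pd.to_datetime(d) for d in dates])
print(idx)
```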
Notice that the DatetimeIndex displays every date as year, then month, then day, no matter how it was written. When a date such as 03/05/2020 is ambiguous, pandas reads it month-first by default, because in America dates are written in mm/dd/yyyy format. In Europe, dates are written in dd/mm/yyyy format. For example, let’s convert the 5th day of the 3rd month into a timestamp.
To parse a date written in European order, pass the dayfirst=True option.
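Here is a short sketch of the default, month-first reading of an ambiguous date string.

```python
import pandas as pd

# Default dayfirst=False: "03/05/2020" is read as mm/dd/yyyy
us = pd.to_datetime("03/05/2020")
print(us)  # 2020-03-05 00:00:00
```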
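The same day can be written in European order and parsed with dayfirst=True. The input string is an illustrative example.

```python
import pandas as pd

# dayfirst=True: "05/03/2020" is read as dd/mm/yyyy
eu = pd.to_datetime("05/03/2020", dayfirst=True)
print(eu)  # 2020-03-05 00:00:00
```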
The format argument tells to_datetime exactly how a date string is laid out. For example, let’s parse a date that uses asterisks instead of slashes as separators.
Any other separator works the same way, as long as the format string matches it.
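This sketch parses a date that uses asterisks as separators. The format string uses standard strftime-style directives, and the literal `*` characters must match the input exactly.

```python
import pandas as pd

# Year, month, and day separated by asterisks
ts = pd.to_datetime("2020*03*05", format="%Y*%m*%d")
print(ts)  # 2020-03-05 00:00:00
```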
An index may contain strings that do not represent a date. By default, pandas cannot parse them and raises an error. For example:
Suppose one of the values is “xyz”, which is not a date. Let’s try to convert these values to datetime.
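The sketch below uses a hypothetical two-element list whose second value is not a date. With the default errors="raise", pandas raises a ValueError. The exact message text differs between pandas versions.

```python
import pandas as pd

# Hypothetical data: the second value is not a date
dates = ["2020-07-06", "xyz"]

try:
    pd.to_datetime(dates)  # errors="raise" is the default
except ValueError as exc:
    message = str(exc)
    print("ValueError:", message)
```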
We get an error message such as “Unknown string format: xyz” (the exact wording depends on the pandas version). The errors argument controls this behavior and takes one of three values: raise, ignore, or coerce. The default is raise, which raises the error. With ignore, the input is returned unchanged instead of being converted to a DatetimeIndex object (this option is deprecated as of pandas 2.2). With coerce, the values are converted to a DatetimeIndex object, and any string that cannot be parsed is represented by NaT, which stands for “Not a Time”.
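Here is a sketch of errors="coerce" on the same hypothetical list. The valid date is kept and "xyz" becomes NaT.

```python
import pandas as pd

dates = ["2020-07-06", "xyz"]

# errors="coerce": strings that cannot be parsed become NaT ("Not a Time")
result = pd.to_datetime(dates, errors="coerce")
print(result)
```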
to_datetime also converts epoch values to timestamps. Unix (epoch) time counts the seconds that have elapsed since the Unix epoch, 00:00:00 UTC on January 1, 1970. Let’s take a time value first.
Let’s convert this number to a date. By default, to_datetime interprets a number as nanoseconds since the epoch, so to read it as seconds we pass the unit="s" argument.
A value of one billion seconds corresponds to 2001-09-09 (01:46:40 UTC).
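This sketch shows why the unit matters. The same number gives very different timestamps when it is read as nanoseconds (the default) and when it is read as seconds.

```python
import pandas as pd

epoch_value = 1_000_000_000

# Default unit is nanoseconds: 1e9 ns is only one second after the epoch
as_ns = pd.to_datetime(epoch_value)
# unit="s" reads the number as seconds since 1970-01-01 00:00:00 UTC
as_s = pd.to_datetime(epoch_value, unit="s")

print(as_ns)  # 1970-01-01 00:00:01
print(as_s)   # 2001-09-09 01:46:40
```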
Frequency and Date Offsets
In pandas, a frequency is made up of a base frequency and a multiplier. Base frequencies are written as string aliases for date offsets, such as M for month end or D for calendar day. (In pandas 2.2 and later, date ranges use ME instead of M for month end.)
These aliases are passed to the freq argument. The full list of offset aliases is in the pandas documentation; for example, B stands for business days and W for weekly. We can also put a multiplier in front of an alias, as in 3D for every three days.
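Here is a sketch of a few aliases passed to date_range. The start date and the number of periods are arbitrary examples. The aliases used here (D, B, W) behave the same way in older and newer pandas releases.

```python
import pandas as pd

# "3D": calendar days with a multiplier of 3
every_3_days = pd.date_range("2020-01-01", periods=4, freq="3D")
# "B": business days (Monday to Friday; public holidays are not excluded)
business_days = pd.date_range("2020-01-01", periods=5, freq="B")
# "W": weekly, anchored on Sundays by default (same as "W-SUN")
weekly = pd.date_range("2020-01-01", periods=3, freq="W")

print(every_3_days)
print(business_days)
print(weekly)
```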
Another useful frequency family is week of month, whose aliases start with WOM. For example, WOM-3FRI means the third Friday of each month. Let’s print the Sundays that fall in the last week of each month.
Period and PeriodIndex
Periods represent spans of time, such as a day, a month, or a year. For example, let’s create a variable p of type Period.
We can list the attributes and methods available on this variable with the dir() function.
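Here is a minimal sketch of an annual period. The year 2020 is an example, and "Y" is used as the annual frequency alias.

```python
import pandas as pd

# An annual period covering the whole of 2020
p = pd.Period("2020", freq="Y")
print(p)  # 2020
```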
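dir() also lists many private names, so this sketch keeps only the public ones.

```python
import pandas as pd

p = pd.Period("2020", freq="Y")

# Keep only public attributes and methods
public = [name for name in dir(p) if not name.startswith("_")]
print(public)
```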
Let’s see the start time of this variable p.
Let’s see its end time.
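Here is a sketch of the start and end of the example annual period. The end time is the last nanosecond before the next period begins.

```python
import pandas as pd

p = pd.Period("2020", freq="Y")

start = p.start_time  # first instant of the period
end = p.end_time      # last instant: one nanosecond before 2021 starts
print(start)  # 2020-01-01 00:00:00
print(end)    # 2020-12-31 23:59:59.999999999
```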
Periods support adding and subtracting integers, which shifts them by whole units of their frequency. Let’s create a monthly period.
If we add 5 to this period a, it moves 5 months forward.
If we subtract 3, it moves back 3 months.
If two periods have the same frequency, we can subtract one from the other to find the difference between them.
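Here is a sketch of period arithmetic with a monthly period. The dates are examples. In recent pandas versions, subtracting two periods returns a date offset whose n attribute holds the number of steps.

```python
import pandas as pd

a = pd.Period("2020-01", freq="M")

print(a + 5)  # 2020-06
print(a - 3)  # 2019-10

# Both periods are monthly, so they can be subtracted
b = pd.Period("2020-06", freq="M")
diff = b - a
print(diff.n)  # 5
```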
Regular ranges of periods can be generated with the period_range function.
Notice that the result is a PeriodIndex object, which can be used as the index of a pandas data structure such as a Series or DataFrame.
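Here is a sketch that builds six monthly periods and uses them as the index of a Series. The values are just 0 to 5 so the output stays reproducible.

```python
import numpy as np
import pandas as pd

# Six monthly periods from January to June 2020
months = pd.period_range("2020-01", "2020-06", freq="M")

# Use the PeriodIndex as the index of a Series
s = pd.Series(np.arange(len(months)), index=months)
print(months)
print(s)
```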
Period and PeriodIndex objects can be converted to another frequency with the asfreq method. For example, let’s convert an annual period to a monthly period.
First, let’s take the first month of this annual period as a monthly period.
Then let’s take the last month.
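Here is a sketch using the example annual period. The how argument chooses whether the start or the end of the period is kept.

```python
import pandas as pd

p = pd.Period("2020", freq="Y")

first_month = p.asfreq("M", how="start")  # January 2020
last_month = p.asfreq("M", how="end")     # December 2020
print(first_month, last_month)  # 2020-01 2020-12
```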
Quarterly data are standard in fields such as finance, and quarters are usually counted relative to the end of the fiscal year. The fiscal year usually ends in the last month of the calendar year, but it can also end in a different month. For example, the Q-DEC frequency indicates that the fiscal year, and therefore its 4th quarter, ends in December.
Similarly, the Q-FEB frequency indicates that the 4th quarter of the fiscal year ends in February.
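Here is a sketch of the 4th quarter of 2020 under Q-DEC, where fiscal quarters line up with calendar quarters.

```python
import pandas as pd

# Fiscal year ending in December: Q4 is October to December
q = pd.Period("2020Q4", freq="Q-DEC")
print(q.start_time)         # 2020-10-01 00:00:00
print(q.end_time.date())    # 2020-12-31
```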
Let’s check it out.
Let’s convert this period to daily frequency.
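Here is a sketch with a fiscal year ending in February. Fiscal 2020 Q4 then covers December 2019 through February 2020, and 2020 is a leap year.

```python
import pandas as pd

q = pd.Period("2020Q4", freq="Q-FEB")

first_day = q.asfreq("D", how="start")
last_day = q.asfreq("D", how="end")
print(first_day, last_day)  # 2019-12-01 2020-02-29
```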
Quarterly ranges can also be generated with period_range.
Let’s create a time series indexed by these quarters.
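Here is a sketch with four example quarters under Q-DEC, used as the index of a Series with placeholder values 0 to 3.

```python
import numpy as np
import pandas as pd

# Four quarters: 2019Q3, 2019Q4, 2020Q1, 2020Q2
quarters = pd.period_range("2019Q3", "2020Q2", freq="Q-DEC")
ts = pd.Series(np.arange(len(quarters)), index=quarters)
print(ts)
```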
We can convert Series and DataFrame objects indexed by timestamps to periods with the to_period method. To illustrate this, let’s first generate a date range.
Let’s create a time series with this date range.
Now let’s convert this time series to period type.
Let’s check the index of this data.
We see that it is a PeriodIndex.
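Here is a sketch of the whole workflow: month-start timestamps, a Series on top of them, and a conversion to monthly periods. The target frequency is passed explicitly ("M") so the result does not depend on how a given pandas version maps "MS" to a period frequency.

```python
import numpy as np
import pandas as pd

# Six month-start timestamps beginning in January 2020
rng = pd.date_range("2020-01-01", periods=6, freq="MS")
ts = pd.Series(np.arange(6), index=rng)

# Convert the DatetimeIndex to a monthly PeriodIndex
pts = ts.to_period("M")
print(pts.index)
```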
That’s it. I hope you enjoyed this post. You can access the notebook I used for this post on our GitHub page. 🚩
If you haven’t read them yet, I strongly recommend the following articles about time series. 👇👇👇