Changepoint Analysis of Time Series Data
No matter what area you work in, you probably observe or receive time series data in your daily life. Wikipedia defines a time series as “a series of data points indexed (or listed or graphed) in time order”. Examples include the daily closing price of a stock, hourly readings of internet traffic, the daily peak temperature in a city, and a person’s pulse rate. Time series are especially useful wherever we need to monitor processes or track metrics, as in industrial control, medical care, and economic analysis.
There are many perspectives and methods for analyzing time series data, and one of the most useful is changepoint analysis (CPA). The purpose of CPA is to determine whether a change has taken place in a time series and, if so, when. There are many reasons to do this kind of analysis, for example: a) to detect that a change has occurred so that you can respond to it; b) to pinpoint when a change occurred so that you can try to identify its cause; c) to help predict future changes. Today we illustrate how to apply CPA in practice through two use cases: identifying changes in human activity and segmenting audio signals.
In April 2017, Uru released StoryBreak, an API powered by a range of state-of-the-art techniques, including machine learning, computer vision, deep learning, statistical analysis, and time series analysis; one of these is changepoint analysis of audio signals. Sounds interesting? Try it out!
Identify human activity changes
In this use case, we aim to find the changepoints in a person’s activities. For instance, a person is sitting somewhere and, at some time stamp, stands up and starts walking. This use case has many real-world applications, including healthcare monitoring and security checks.
We collected data from the Human Activity Recognition database, which was built from recordings of people performing activities of daily living while carrying a waist-mounted smartphone with embedded inertial sensors. The raw data can be converted into a long list of positions in 3-D space, indexed by time stamp:
$$\text{Data} = X_1, X_2, \ldots, X_n, \qquad X_i = (x_i, y_i, z_i, t_i)$$
where X_i is one record, x_i, y_i and z_i are the position coordinates, and t_i is the time stamp. From these records we can derive further quantities such as velocity and acceleration, both of which are also time series. Let us take one window of data as an illustrative example. This window contains two consecutive activities: 1) the person is sitting from time stamp 1 to 190; 2) he stands up at time stamp 191. We extract the velocity information and plot the data.
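The post does not say exactly how velocity is derived from the position records. A minimal Python sketch, assuming velocity is approximated by finite differences between consecutive records (the `Record` class and `speeds` function are hypothetical names, not part of any library):

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    """One observation X_i = (x_i, y_i, z_i, t_i)."""
    x: float
    y: float
    z: float
    t: float


def speeds(records):
    """Approximate speed between consecutive records by finite differences.

    Element i of the result is the distance travelled from record i to
    record i + 1 divided by the elapsed time, so the output has one fewer
    element than the input.
    """
    result = []
    for prev, cur in zip(records, records[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        distance = math.dist((prev.x, prev.y, prev.z), (cur.x, cur.y, cur.z))
        result.append(distance / dt)
    return result


if __name__ == "__main__":
    track = [Record(0, 0, 0, 0), Record(3, 4, 0, 1), Record(3, 4, 0, 3)]
    print(speeds(track))  # [5.0, 0.0]
```

Differencing the speed series once more in the same way gives a rough acceleration series.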
Now we apply a changepoint detection technique to locate the “stand-up” time stamp by calling the “cpt.mean” function in the R “changepoint” package, which looks for a change in the mean. Below is the plot of the results.
This picture shows the two segments as red lines and indicates that the changepoint is at 190, which exactly matches the ground truth. Although “cpt.mean” works perfectly in this case, there are scenarios where it does not. For another activity change, from “walking upstairs” to “walking downstairs”, the mean velocity is very similar before and after the change, so we cannot find the changepoint by looking for a change in mean. Fortunately there are other options, such as looking for a change in variance; in this case the “cpt.var” function works.
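To see why a mean-based search fails when only the spread changes, here is a from-scratch Python sketch (not the changepoint package's code) of a single-changepoint search with two interchangeable cost functions: squared error around each segment's own mean, and a Normal variance cost around a shared, known mean. The synthetic series, the 1e-12 floor, and all function names are illustrative assumptions.

```python
import math


def best_single_split(data, cost, minseglen=2):
    """Return tau, the number of points before the change, minimising cost(left) + cost(right)."""
    best_tau, best_total = None, math.inf
    for tau in range(minseglen, len(data) - minseglen + 1):
        total = cost(data[:tau]) + cost(data[tau:])
        if total < best_total:
            best_tau, best_total = tau, total
    return best_tau


def mean_cost(seg):
    """Squared error around the segment's own mean (change-in-mean model)."""
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def make_var_cost(mu):
    """Change-in-variance cost around a shared, known mean mu.

    m * log(sigma_hat^2) is twice the maximised Normal negative
    log-likelihood of a segment, dropping terms that do not affect the split.
    """
    def var_cost(seg):
        m = len(seg)
        sigma2 = sum((v - mu) ** 2 for v in seg) / m
        return m * math.log(max(sigma2, 1e-12))  # floor keeps log finite
    return var_cost


if __name__ == "__main__":
    # Same mean (0) before and after the change; only the spread grows.
    series = [-1.0, 1.0] * 50 + [-3.0, 3.0] * 50
    mu = sum(series) / len(series)  # estimate the shared mean from all data
    print("mean cost split:", best_single_split(series, mean_cost))
    print("variance cost split:", best_single_split(series, make_var_cost(mu)))  # 100
```

On this series the variance cost splits at 100, where the spread actually changes, while the mean cost can only latch onto a small chance imbalance in segment means near the end of the series.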
There is also a third option, “cpt.meanvar”, which looks for a change in mean and variance jointly. It outperforms either single measure in many cases; for example, when the activity changes from “sitting” to “laying”, this combined measure works perfectly.
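A sketch of the joint mean-and-variance idea, again in Python and not the package's implementation: each segment is scored by its Normal likelihood with its own mean and variance, so a change in either parameter lowers the cost of the correct split. The cost function, the 1e-12 floor, and the synthetic series are assumptions made for illustration.

```python
import math


def meanvar_cost(seg):
    """Twice the maximised Normal negative log-likelihood of a segment with its
    own mean and variance, dropping terms that do not affect where splits go."""
    m = len(seg)
    mean = sum(seg) / m
    var = sum((v - mean) ** 2 for v in seg) / m
    return m * math.log(max(var, 1e-12))  # floor keeps log finite


def single_meanvar_changepoint(data, minseglen=2):
    """Return tau (points before the change) for one joint change in mean and variance."""
    best_tau, best_total = None, math.inf
    for tau in range(minseglen, len(data) - minseglen + 1):
        total = meanvar_cost(data[:tau]) + meanvar_cost(data[tau:])
        if total < best_total:
            best_tau, best_total = tau, total
    return best_tau


if __name__ == "__main__":
    # Both the level and the spread change after 100 points.
    series = [0.0, 1.0] * 50 + [4.0, 8.0] * 50
    print(single_meanvar_changepoint(series))  # 100
```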
Segment audio signals
Audio plays an important role in multimedia content, especially in videos. One of the major applications of changepoint analysis here is finding the natural breaks in a video, so that advertisements can be inserted at the right time stamps without disrupting the user experience.
In the audio analysis scenario, we have batches of audio features indexed by time. We extracted a particular feature from the audio of a horror movie, applied “cpt.meanvar” from the previous section to segment this time series, and got the result shown in the figure below.
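The post does not name the audio feature it used. Purely as a hypothetical stand-in, short-time RMS energy is one common way to turn raw samples into a feature series indexed by time, with one value per frame; `frame_rms`, `frame_len` and `hop` are illustrative names, not part of any library.

```python
import math


def frame_rms(samples, frame_len, hop):
    """Short-time RMS energy: one feature value per frame of audio samples.

    Frame k covers samples[k * hop : k * hop + frame_len]; trailing samples
    that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        features.append(math.sqrt(sum(v * v for v in frame) / frame_len))
    return features


if __name__ == "__main__":
    # A silent stretch followed by a louder one; frames of 4 samples, no overlap.
    signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5]
    print(frame_rms(signal, frame_len=4, hop=4))  # [0.0, 0.5]
```

With a real recording, `frame_len` and `hop` would be chosen from the sample rate (for example, tens of milliseconds per frame), and the resulting series is what a changepoint search would then segment.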
So far so good. However, this movie is roughly two hours long, so we may want to find multiple breaks. How can we do that? One solution is to apply the function recursively to each sub-series. Alternatively, “cpt.meanvar” takes a parameter “Q”, the maximum number of changepoints to search for when a multiple-changepoint method such as binary segmentation (method = “BinSeg”) is used. Say we want 10 breaks: calling the function with Q = 10 gives the result in the following graph.
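The recursive idea can be sketched in Python as follows (our illustration, not the package's implementation): find the best single split, keep it only if it lowers the cost by more than a penalty, then search each half the same way. Scoring segments with a Normal mean-and-variance cost and using a penalty of 3·log(n) (a BIC-style choice for one new location, mean, and variance per changepoint) are our assumptions.

```python
import math


def meanvar_cost(seg):
    """Twice the maximised Normal negative log-likelihood of a segment with its
    own mean and variance, dropping terms that do not affect where splits go."""
    m = len(seg)
    mean = sum(seg) / m
    var = sum((v - mean) ** 2 for v in seg) / m
    return m * math.log(max(var, 1e-12))  # floor keeps log finite


def recursive_segment(data, penalty, minseglen=2, offset=0):
    """Recursively split a series at its best single changepoint.

    A split is kept only if it lowers the total cost by more than `penalty`;
    each half is then searched in the same way. Returns changepoints in
    increasing order, each given as the number of points before the change,
    counted from the start of the full series.
    """
    n = len(data)
    if n < 2 * minseglen:
        return []  # too short to split into two valid segments
    best_tau, best_total = None, math.inf
    for tau in range(minseglen, n - minseglen + 1):
        total = meanvar_cost(data[:tau]) + meanvar_cost(data[tau:])
        if total < best_total:
            best_tau, best_total = tau, total
    if meanvar_cost(data) - best_total <= penalty:
        return []  # the best split is not worth its penalty
    left = recursive_segment(data[:best_tau], penalty, minseglen, offset)
    right = recursive_segment(data[best_tau:], penalty, minseglen, offset + best_tau)
    return left + [offset + best_tau] + right


if __name__ == "__main__":
    # Quiet, loud, quiet: two true changes, after 100 and 200 points.
    series = [0.0, 1.0] * 50 + [10.0, 12.0] * 50 + [0.0, 1.0] * 50
    penalty = 3 * math.log(len(series))  # assumed BIC-style penalty
    print(recursive_segment(series, penalty))  # [100, 200]
```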
Although the results look convincing, one issue remains: some segments are too short. In some use cases we want to enforce a minimum segment length. To meet this requirement, we set the “minseglen” parameter to 1/20 of the length of the video, and the new result is as follows.
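Capping the number of changepoints and enforcing a minimum segment length can be combined in a greedy binary segmentation. A Python sketch, assuming a Normal mean-and-variance cost and a 3·log(n) penalty (both our choices, not necessarily the package's defaults); `max_changepoints` plays the role of Q, and “1/20 of the length” is written as `len(series) // 20`:

```python
import math


def meanvar_cost(seg):
    """Twice the maximised Normal negative log-likelihood of a segment with its
    own mean and variance, dropping terms that do not affect where splits go."""
    m = len(seg)
    mean = sum(seg) / m
    var = sum((v - mean) ** 2 for v in seg) / m
    return m * math.log(max(var, 1e-12))  # floor keeps log finite


def best_split(data, start, end, minseglen):
    """Best split of data[start:end] as (cost reduction, tau), or None if too short."""
    whole = meanvar_cost(data[start:end])
    best = None
    for tau in range(start + minseglen, end - minseglen + 1):
        gain = whole - meanvar_cost(data[start:tau]) - meanvar_cost(data[tau:end])
        if best is None or gain > best[0]:
            best = (gain, tau)
    return best


def binseg(data, max_changepoints, minseglen=2, penalty=None):
    """Greedy binary segmentation with a changepoint cap and a minimum segment length.

    Each round splits whichever current segment gains the most, stopping once
    max_changepoints is reached or no gain exceeds the penalty. Every split
    leaves both sides at least minseglen points long.
    """
    if penalty is None:
        penalty = 3 * math.log(len(data))  # assumed BIC-style penalty
    segments = [(0, len(data))]
    changepoints = []
    while len(changepoints) < max_changepoints:
        candidates = []
        for start, end in segments:
            found = best_split(data, start, end, minseglen)
            if found is not None:
                candidates.append((found[0], found[1], start, end))
        if not candidates:
            break  # no segment is long enough to split
        gain, tau, start, end = max(candidates)
        if gain <= penalty:
            break  # remaining splits are not worth their penalty
        segments.remove((start, end))
        segments.extend([(start, tau), (tau, end)])
        changepoints.append(tau)
    return sorted(changepoints)


if __name__ == "__main__":
    # A 6-point burst between two long quiet stretches.
    series = [0.0, 1.0] * 60 + [10.0, 12.0] * 3 + [0.0, 1.0] * 60
    print(binseg(series, max_changepoints=10))  # [120, 126]: the burst is its own segment
    min_len = len(series) // 20  # minimum segment length: 1/20 of the series
    print(binseg(series, max_changepoints=10, minseglen=min_len))
```

Recomputing every segment's best split each round keeps the sketch short; a real implementation would cache splits for segments that did not change.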
We verified the results by watching the movie, and the accuracy is fairly good, roughly 70%. Keep in mind that the experiments above assume the data follow a normal distribution. In practice, other distributions can be modeled, other search methods can be chosen, and different penalty functions can be applied. You can try these yourself by following the package documentation.
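Among the other search methods, the changepoint package offers PELT, an exact search that speeds up the classic optimal-partitioning dynamic program by pruning. The unpruned program is short enough to sketch in Python; the Normal mean-and-variance cost, the penalty value, and all names are our assumptions, and swapping in a different segment cost is how another distributional assumption would enter.

```python
import math


def optimal_partition(data, penalty, minseglen=2):
    """Exact penalised segmentation by dynamic programming (optimal partitioning).

    Minimises the sum of segment costs plus `penalty` per changepoint, where a
    segment's cost is m * log(sigma_hat^2) under a Normal mean-and-variance
    model. Uses O(n^2) cost evaluations, each O(1) via prefix sums.
    """
    n = len(data)
    s, s2 = [0.0], [0.0]
    for v in data:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(a, b):
        """Cost of data[a:b] from prefix sums."""
        m = b - a
        mean = (s[b] - s[a]) / m
        var = (s2[b] - s2[a]) / m - mean * mean
        return m * math.log(max(var, 1e-12))  # floor keeps log finite

    # best[t]: minimal penalised cost of data[:t]; prev[t]: start of its last segment.
    best = [math.inf] * (n + 1)
    best[0] = -penalty  # so k changepoints (k + 1 segments) pay exactly k penalties
    prev = [0] * (n + 1)
    for t in range(minseglen, n + 1):
        for start in range(0, t - minseglen + 1):
            if best[start] == math.inf:
                continue  # data[:start] cannot be split into valid segments
            candidate = best[start] + cost(start, t) + penalty
            if candidate < best[t]:
                best[t], prev[t] = candidate, start
    # Walk back from the end to recover the changepoints.
    changepoints = []
    t = n
    while t > 0:
        t = prev[t]
        if t > 0:
            changepoints.append(t)
    return sorted(changepoints)


if __name__ == "__main__":
    series = [0.0, 1.0] * 50 + [10.0, 12.0] * 50 + [0.0, 1.0] * 50
    print(optimal_partition(series, penalty=3 * math.log(len(series))))  # [100, 200]
```

A larger penalty trades sensitivity for fewer, more confident changepoints; this is the knob the penalty options in the package documentation control.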
In this post, we have illustrated how to apply CPA in practice. If you want to dive deeper and learn more about the mathematical background of CPA, here are two good papers to start with:
Killick, Rebecca, and Idris Eckley. “changepoint: An R package for changepoint analysis.” Journal of Statistical Software 58.3 (2014): 1–19.
Reeves, Jaxk, et al. “A review and comparison of changepoint detection techniques for climate data.” Journal of Applied Meteorology and Climatology 46.6 (2007): 900–915.
Besides StoryBreak, Uru offers other APIs for understanding and analyzing videos. Please check out our website for products and demos :)
Xiaozhen Xue currently works as a machine learning engineer and software engineer at Uru. Before that, he received a Ph.D. in Computer Science from Texas Tech University and worked at Amazon for a while. His research interests include machine learning applications, statistical analysis, software engineering, and distributed systems.