Discovering hidden patterns in high dimensional time series
Habits can change. Sometimes they change abruptly in response to a resolution or a change in the environment. But often they change in regular cycles. For example, I’m a pretty predictable fair weather bike commuter; I ride in the warmer months and retreat to the subway for the long winter. Services that anticipate patterns like these can better serve users through each day, week, and year.
In this post I describe a method for finding cyclical patterns in high dimensional data. This type of data is common in transaction records when each item can be described as a vector or embedding. For example, if we have song vectors learned from user-item Matrix Factorization, then the sequence of song embeddings played over the course of a day on a radio station may contain useful patterns; we could find that different types of tracks play in the morning versus the evening, and these differences could explain how audiences change over the course of a day.
To illustrate this idea I use embeddings learned from Citi Bike trip data in New York City. I’m in interested in discovering how trips change over time, and in particular, how trip destinations evolve for each departure station. A typical daily pattern is shown above for trips leaving Pershing Square North near Grand Central. Here, riders that start a trip in the morning tend to ride towards lower Manhattan. At all other times, however, riders are more likely to travel into upper Manhattan. This type of analysis over every station has the ability to reveal which stations have the strongest periodic behavior, and which frequencies are the most common.
Periodic signals in high dimensional data
The classic approach to exposing patterns in periodic time signals is Fourier analysis. Waveforms in the time domain are decomposed into coefficients in the frequency domain, and the original signal is represented as a sum of sinusoids at different frequencies and phases. In this case, the time series of interest are the sequences of destination stations for each departure station. For example, users that depart from Washington Square Place may head to West Village during the day and the East Village at night.
But how do we treat sequences of station destinations as an input signal to a Fourier analysis? In typical scenarios, time series are real-valued functions that are sampled uniformly over time. However, in this case the series is constructed from the non-uniform timestamps of each bike being removed from a starting station, and the ‘value’ is the identity of the trip’s ending station. This last difference is the most striking; Fourier analysis is not designed to work with categorical entities. Luckily, there are many well known methods for describing items as embeddings, or real-valued vectors, including common practices in recommendation systems.
Matrix Factorization powers many production recommendation systems by factoring a user-item interaction matrix into embeddings for both users and items. These factors are then used to generate recommendations by finding the closest item embeddings to each user embedding. Here, I build an interaction matrix between pairs of bike stations where the rows are trip start stations, the columns are trip end stations, and the value is the number of trips between them in that direction (note that trips can also be a loop, ending back at the starting station). This yields a 25-dimensional, real-valued vector space in which each station is embedded twice; once for when it acts as the start of a trip and once for when it acts as the end of a trip. In this analysis I want to group all trips together by their starting station; that is, I’ll treat each start station as its own data set and look for patterns in where those trips end. This means that for each data set the starting station is the same, and therefore we only need to consider the end station embeddings for each trip.
Each station is now described by a vector of 25 numbers, and this embedding represents how the station behaves as a destination for trips. Stations that tend to be the destination for trips with similar starting points get embedded in similar regions of the vector space (Figure 2; hypothetical data). This does not necessarily need to be related to a station’s location in latitude and longitude! Looking for periodic signals in this vector space can give us an idea about which subspaces are associated with trip patterns, like time of day or day of week.
Next, I create a high dimensional time series for each starting station by stacking together the end-station embeddings as they appear in time. That is, the time series for a starting station is just the trip log of rides that start from that station, and each record is the embedding of the end station indexed by the timestamp. Each end station can appear many times in the log over time because many users can travel between the same two stations. This is illustrated in Figure 3 for two embedding dimensions.
Finally, I need to find a one-dimensional projection that captures oscillatory behavior for a particular frequency of interest. For example, if I’m looking for daily patterns, then there exists some one-dimensional subspace, w, such that projection into that subspace contains a best-fit sine wave with a period of 24 hours that has a larger amplitude than any other one-dimensional subspace (Figure 4). Once stations are projected into this subspace, the stations with the highest values are likely to occur at one part of the day and the stations with the lowest values are likely to occur 12 hours later. I cast the process of finding the subspace as an optimization problem over the projection vector.
Specifically, I try to maximize the Fourier amplitude for a particular frequency of interest over a one-dimensional projection vector. For a projection onto the vector w, the non-uniform discrete Fourier amplitude of signal x at frequency f is
where t_n is the timestamp of the nᵗʰ data point with amplitude x_n . To get x_n the embedded track vector d is projected onto w. In matrix form, where the embeddings are stacked together as rows in a matrix D,
We then seek to find the unit vector that maximize the Fourier power for this frequency,
The following python code performs these steps to find the best projection vector for an embeddings matrix and a frequency of interest. The function 𝚙𝚛𝚘𝚓𝚎𝚌𝚝𝚒𝚘𝚗_𝚏𝚘𝚞𝚛𝚒𝚎𝚛_𝚊𝚖𝚙𝚕𝚒𝚝𝚞𝚍𝚎 projects the data onto w, and the function 𝚘𝚙𝚝𝚒𝚖𝚒𝚣𝚎_𝚙𝚛𝚘𝚓𝚎𝚌𝚝𝚒𝚘𝚗 finds the vector w that maximizes the amplitude at the supplied time period (i.e. inverse frequency). The input matrix has a row for every station as it appears in the time series, and the columns are the dimensions of their embeddings.
Citi Bike example
Running this code on each station’s embedded time series across an array of common frequencies yields an amplitude profile. We can determine both which stations have the strongest periodic resonance, and what frequency patterns are the most common. The data set for this analysis is all New York City Citi Bike trips in September, 2019. Figure 5 shows that the stations with the strongest cyclical behavior have both daily and weekly patterns.
Some stations have both daily and weekly patterns. The station in Astor Place shows that morning trips tend to end in the Financial District and Flatiron, as compared to the evenings when trips are more likely to end in Alphabet City and the Lower East Side. Similarly, on weekdays trips tend to end in the Lower East Side and South Street Seaport, and on the weekends bikers head to Upper Manhattan and Brooklyn (Figure 6).
Although I use this approach to describe Citi Bike trips in this post, I have found that the method is useful for other high-dimensional time series tasks as well. These types of data sets are common in recommendation systems that include a dimension of time. For example, applying the above strategies to users on an e-commerce website would expose seasonal or weekly patterns in buying for items or users. If you try it out, let me know what you find!
Thank you to Tom Drapeau and Alex Companioni for helpful thoughts and comments!