Lag-Llama

Hadar Sharvit
2 min read · Feb 21, 2024


A new wave of time series foundation models is emerging, with dominant players like Google and Salesforce claiming zero-shot SOTA results! Today, the team and I reviewed a new paper titled Lag-Llama, which proposes a novel TS foundation model developed through a collaboration between Morgan Stanley, ServiceNow, and Université de Montréal. Here's a quick overview, along with some reflections:

  • The main “trick”, if you will, is the unique incorporation of lags and covariates: any particular time value x_t is mapped to a set of |L| lagged values, for lags like L={1,4,7, …, l}, and F date-time covariates like sec(t), min(t), …, month(t).
Figure: the main trick (multiple lags and covariates)
  • Univariate time series only. Kind of disappointing, as in many cases multivariate is a must. Think of predicting heat in an enclosed space without being given the boundary conditions, which may themselves change over time (infinite solutions!).
  • The architecture is based on Llama's masked transformer decoder layer(s). Honestly, this section is missing some information, mostly for those who are not quite familiar with all the newest LLM tricks.
Figure: architecture diagram
  • The distribution head is a Student's t-distribution, which is essentially a normal distribution where you can choose how “fat” the tails are (via the degrees of freedom). I suppose that this was chosen after some empirical experimentation.
  • Since we're dealing with multiple datasets with different underlying statistics, each window is normalized on its own, instead of using normalization statistics computed over the training set.
    Specifically, they chose a method that resembles a standard scaler, but uses the median of the entire window instead of the mean, and the difference between the medians of the window's lower and upper halves (the interquartile range, as far as I can tell) instead of the standard deviation.
  • The datasets in the corpus are weighted by their total number of series (inverse relation, I believe).
  • Around 2.5 million parameters, which is extremely small compared to LLMs.
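
To make the lags-and-covariates trick from the first bullet concrete, here is a minimal sketch of how a single input vector could be assembled for time step t. It is not the paper's code: the function names, the lag set [1, 4, 7], and the choice of five calendar features are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def lag_features(series, lags, t):
    """Values at t - lag for each lag; None when the lag reaches before the series start."""
    return [series[t - lag] if t - lag >= 0 else None for lag in lags]


def date_covariates(ts):
    """Calendar covariates from second up to month (assumed feature set)."""
    return [ts.second, ts.minute, ts.hour, ts.day, ts.month]


def build_token(series, timestamps, lags, t):
    """Concatenate |L| lagged values and F date covariates into one vector."""
    return lag_features(series, lags, t) + date_covariates(timestamps[t])


# Toy hourly series: x_t = t^2
start = datetime(2024, 2, 21)
timestamps = [start + timedelta(hours=i) for i in range(10)]
series = [float(i * i) for i in range(10)]
lags = [1, 4, 7]  # hypothetical lag set

token = build_token(series, timestamps, lags, t=8)
```

With these toy inputs, `token` has |L| + F = 3 + 5 = 8 entries: the three lagged values 49.0, 16.0, 1.0 followed by the covariates 0, 0, 8, 21, 2.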
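
The Student's t head can be illustrated with its log-density. The sketch below assumes the head outputs three numbers per step (degrees of freedom `nu`, a location `mu`, and a scale `sigma`); those names and that parameterization are my assumptions, not something taken from the paper. It only shows how `nu` controls the tails.

```python
import math


def student_t_logpdf(x, nu, mu=0.0, sigma=1.0):
    """Log-density of a location-scale Student's t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )


def normal_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a normal distribution, for comparison."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


# Small nu: heavy tails, so a point 5 scales away is far less "surprising".
heavy_tail = student_t_logpdf(5.0, nu=3.0)
gaussian_tail = normal_logpdf(5.0)

# Large nu: the t-distribution approaches the normal distribution.
near_normal = student_t_logpdf(0.0, nu=1e4)
```

The practical upshot is that an outlier costs much less likelihood under a small `nu` than under a Gaussian, which seems like a reasonable fit for noisy real-world series.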
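
The per-window scaling can be sketched as follows. I am reading "the medians of the two halves" as the medians of the lower and upper halves of the sorted values, which makes the scale an interquartile range; that reading, the `eps` guard, and the function names are assumptions, so treat this as an illustration rather than the authors' implementation.

```python
from statistics import median, quantiles


def robust_scale(window, eps=1e-6):
    """Center a window by its median and scale it by its IQR (needs >= 2 points)."""
    center = median(window)
    q1, _, q3 = quantiles(window, n=4, method="inclusive")
    scale = (q3 - q1) + eps  # eps avoids division by zero on constant windows
    scaled = [(x - center) / scale for x in window]
    return scaled, center, scale


def robust_unscale(scaled, center, scale):
    """Map scaled values (e.g. model outputs) back to the original units."""
    return [x * scale + center for x in scaled]


# One outlier barely moves the median or the IQR, unlike the mean and std.
window = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled, center, scale = robust_scale(window)
restored = robust_unscale(scaled, center, scale)
```

Here the median is 3 and the IQR is 2, so the outlier at 100 does not distort how the other four points are scaled.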

    We'll probably test this one (or one of its alternatives) soon enough to evaluate its performance on real-world data (like, real-real).


Hadar Sharvit

ML Team Lead at a small startup, focusing on Time Series. MSc in Computer Science, BSc in Computer Science & Physics