Interpreting Time

Alec Delany
Salesloft Engineering
8 min read · Feb 22, 2022

“Only time (whatever that may be) will tell”

-Stephen Hawking

How could time possibly be so difficult to understand? We structure our days, our weeks, our lives around it (and use it to define days, weeks, and lifespans!) yet when asked to define time, how would you answer? More importantly, if you were asked to explain the concept of time to someone, what would you say to them? You might start with how we measure time, but even that can get tricky. Yes, there are formal definitions of seconds, hours, days, and years, but why does a three-hour road trip to the beach always feel so much longer than a three-hour road trip coming back from the beach? Both road trips are the same amount of time, but how we experience those three hours, how we personally perceive time during those road trips, feels wildly different based on which direction we are going in. And speaking of direction, why do we have an "arrow of time" always moving in one direction? Why can't it be a figure eight, or even a circle?

The best formal definition of time that I've been able to make sense of goes back to the second law of thermodynamics. This law states that in a closed system, entropy can only increase or hold constant; it can never decrease. This is a physicist's way of saying that I can pour my cereal from its box into a bowl and back, but if I scramble an egg I can never uncook it. Entropy can be understood as the amount of chaos or disorder in a system. Prior to cooking, an egg has a defined yolk and whites and, while they are joined, they are also separate and distinct from each other. But once I scramble the egg, the lines blur, and the yolk and the whites are no longer distinct and separate entities. The egg is now a single homogeneous liquid, with the yolk proteins and white proteins randomly jumbled together with very little design or pattern. The amount of entropy, or disorder, has increased and cannot be undone.

Going back to my cereal: the cereal pieces are randomly ordered and contained in their box, and when I pour those same pieces into my bowl, they are still just as randomly ordered and contained. I can pour the cereal back into the box, and again there is the same amount of disorder present, so entropy has remained constant. When entropy increases, and we have moved from one state to another with no possibility of ever going back to the previous state, we have progressed. We have moved forward from a past to a present, giving a definition to the arrow of time.
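As a toy illustration of the counting behind this (not part of the article's physics), Boltzmann's formulation measures entropy as the logarithm of the number of microstates a system can be in. The microstate counts below are invented purely for illustration:

```python
import math

def boltzmann_entropy(microstates: int) -> float:
    """Entropy (in units of k_B) as the log of the number of microstates."""
    return math.log(microstates)

# Toy numbers: a separated yolk and white can be arranged in far fewer ways
# than a fully mixed scramble (the counts here are illustrative only).
separated = boltzmann_entropy(10)      # few arrangements: yolk here, whites there
scrambled = boltzmann_entropy(10**6)   # many arrangements: proteins anywhere

# Mixing can only increase the number of possible arrangements,
# so the entropy of the scrambled state is strictly larger.
assert scrambled > separated
```

More arrangements means more disorder, which is exactly why the scramble can't be undone by any amount of stirring.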

Since those of us who are not physicists generally don’t go around measuring the amount of entropy in various systems to ensure its increase, we instead agree that time exists, is inevitable, and simply aim to utilize it. We quantify time and use it to measure everything from when to eat to the processing speed of a computer. We use these (relatively) constant measures of time to create music, to create the machines and systems that run our world, and to quantify personal milestones. So, with all of these precisely quantified units of time, how is it that ten minutes on a treadmill feels like forty?

The piece of our brain responsible for processing time is called the medial entorhinal cortex, consisting of grid cells, which act as our neural clock. These grid cells are literal neural networks that are incredibly flexible, allowing the circuitry to encode time based on physical circumstances and experiences, which isn’t always objective. From Dr. Albert Tsao, now a postdoctoral researcher at Stanford and the researcher who originally discovered these time-encoding cells [1]:

“The neural clock operates by organizing the flow of our experiences into an orderly sequence of events. This activity gives rise to the brain’s clock for subjective time. Experience, and the succession of events within experience, are thus the substance of which subjective time is generated and measured by the brain”

An illustration from NeuroscienceNews shows a skier's perception of time during a four-hour ski trip.

So, given the difficulties an organic neural network has with understanding and experiencing time, is it even possible to train an artificial neural network to understand time? More specifically, does a language model, which should have a general understanding of language including words and phrases associated with time, really understand what time is and how it works? It might sound unfair to go immediately to language since even humans can have a hard time understanding references to time (does “next week” mean the upcoming week or the week after?). But since we can only claim to understand time as well as we can communicate that claim, it makes sense that we would start with communications or language.

There's also an advantage to looking specifically at language: from reading between the lines to find a timing objection, to scheduling a demo call, to deciding on next steps, language carries time-related information that is crucial for sales. As researchers at Salesloft, it is our job to develop models that can understand these communications and provide an action or recommendation based on what was said. These features can be incredibly valuable to our users, but only if we as researchers are able to ensure that our models make these suggestions based on a comprehensive understanding of how time works.

Recurrent neural networks are naturally suited to recognizing sequences, and training these models on a timestamped dataset can imitate a sense of time, but ultimately this only sidesteps a fundamental understanding of time [2]. Given the amount of uncertainty in our own understanding of time, it makes sense to try a more probabilistic approach. A probabilistic framework allows a model to propagate the uncertainty associated with certain words or phrases until it has ingested enough context to make a decision, such as the interpretation of the word "last" in "last week" versus "last week of November". These frameworks are successful at recognizing language referring to time, including ranges, sequences, and even durations [3]. However, this still dodges our core question: does a language model understand time? Inferring characteristics of time brings us to Natural Language Inference (NLI), which prompts a model to classify a hypothesis as true, false, or neutral given a premise. Table 1 shows an example of each label.

Table 1: Examples of the data used for Natural Language Inference (NLI).
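The ambiguity of "last" mentioned above can be made concrete with a little calendar arithmetic. A minimal Python sketch, assuming one conventional reading of each phrase (Monday-to-Sunday weeks, and "last week of a month" as the seven days ending on the month's final day):

```python
from datetime import date, timedelta

def last_week(ref: date) -> tuple[date, date]:
    """'last week': the Monday-to-Sunday week immediately before ref."""
    monday_this_week = ref - timedelta(days=ref.weekday())
    start = monday_this_week - timedelta(days=7)
    return start, start + timedelta(days=6)

def last_week_of_month(year: int, month: int) -> tuple[date, date]:
    """'last week of <month>': the seven days ending on the month's final day."""
    next_month = date(year + month // 12, month % 12 + 1, 1)
    end = next_month - timedelta(days=1)
    return end - timedelta(days=6), end

# The same word "last" resolves differently once enough context arrives.
start, end = last_week(date(2022, 2, 22))          # Feb 14 to Feb 20, 2022
nov_start, nov_end = last_week_of_month(2022, 11)  # Nov 24 to Nov 30, 2022
```

A probabilistic parser holds both readings (and others) in play until the surrounding words make one interpretation far more likely than the rest.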

This framework gives us insight into how well the model is able to infer the concepts conveyed in the premise and hypothesis (in this case, time) rather than simply identifying words and phrases associated with those concepts [4]. In the paper "Probing Language Models for Understanding of Temporal Expressions" [5], researchers evaluate how a transformer model trained on a standard NLI dataset (Multi-Genre Natural Language Inference, or MNLI) generalizes to three time-related challenges: Temporal Order, Temporal Duration, and Cross-Unit Duration. Temporal Order tests the model's understanding of the order of time: Monday before Friday, February after January, and so on. Temporal Duration looks at how well the model can reason about timespans, and Cross-Unit Duration is similar but uses multiple granularities of time to test whether the model understands that 86,400 seconds is the same as 1,440 minutes is the same as 24 hours. Base MNLI-trained models do quite poorly across all three challenges, which means that general natural language understanding and inference do not include a robust understanding of how time works. However, when the models are trained specifically on these temporal understanding tasks, the researchers saw accuracies above 90% on all three. We see very similar results in the paper "Temporal Reasoning in NLI" [6]: once again, the base inference models perform quite poorly, but when trained on the temporal language data, they exhibit dramatic boosts in performance.
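The Cross-Unit Duration check amounts to normalizing every duration to a common base unit before comparing. A minimal sketch of that arithmetic (my own illustration, not the paper's code):

```python
# Seconds per unit, used to normalize every duration to a common base.
SECONDS_PER = {"seconds": 1, "minutes": 60, "hours": 3_600, "days": 86_400}

def same_duration(a: tuple[float, str], b: tuple[float, str]) -> bool:
    """Cross-unit check: do two (value, unit) durations denote the same span?"""
    return a[0] * SECONDS_PER[a[1]] == b[0] * SECONDS_PER[b[1]]

# The cross-unit equivalences the probe tests a model on:
assert same_duration((86_400, "seconds"), (1_440, "minutes"))
assert same_duration((1_440, "minutes"), (24, "hours"))
assert not same_duration((24, "hours"), (2, "days"))
```

For a person this is a single multiplication; the probing papers show that a language model trained only on general inference data has not internalized it.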

“A Study of Temporal Commonsense Understanding” [7] changes the methodology a bit by working outside the usual NLI framework with a fantastically named dataset, MCTACO (Multiple Choice TemporAl COmmon-sense), which, as the name suggests, prompts the model with a time-related question and 2–4 multiple-choice answers, where more than one answer can be correct. This allows the researchers to be more granular in which characteristics of time they test the model on. Whereas the first two papers we discussed posed three challenges for the language models being evaluated, this paper looks at five: duration, ordering, typical time (when something normally occurs), frequency (how often something occurs), and stationarity (does a state hold indefinitely?). Finally, the researchers went a step further by comparing model performance on this dataset to human performance on the same dataset to give an idea of the relative performance gap. In this setting, the state-of-the-art language models did perform much better than random guessing but still did quite poorly compared to human baselines.
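A sketch of what an MCTACO-style item and a strict exact-match metric look like. The question and candidate answers below are invented for illustration, not drawn from the actual dataset:

```python
# A toy MCTACO-style item: one question, several candidates, and more than
# one candidate can be labeled correct. (Content is illustrative only.)
item = {
    "question": "How long does a demo call usually last?",
    "candidates": {"30 minutes": True, "45 minutes": True,
                   "3 seconds": False, "2 weeks": False},
}

def exact_match(predictions: dict[str, bool], gold: dict[str, bool]) -> bool:
    """Strict exact match: the model must judge every candidate correctly."""
    return all(predictions[c] == gold[c] for c in gold)

# A perfect set of judgments passes; flipping even one candidate fails.
perfect = dict(item["candidates"])
assert exact_match(perfect, item["candidates"])

one_wrong = dict(perfect, **{"2 weeks": True})
assert not exact_match(one_wrong, item["candidates"])
```

Scoring per question rather than per candidate is what makes the human-model gap so stark: a model that is merely "usually right" about plausible durations still fails most items.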

Investigating certain concepts within language models is by no means exclusive to time; similar studies have shown that even basic language models are able to understand the concept of numbers [8]. But why do we investigate these things? A user might never know the difference if the model they are interfacing with understands time or not, and to be honest, they probably don't really care. But as machine learning researchers, it is our responsibility to understand the inner workings of our models as best we can. Without evaluating how well a language model can encode a concept as common as time, we cannot be certain what patterns or biases our models are actually picking up on. Our goal at Salesloft is to build machine learning features that enable users to be better sellers by better serving their buyers, and time is an absolutely crucial component of that process. So, if our models cannot understand time, how well can they actually enable our users? Service is rooted in timing, and selling is a service, so enabling our models within Salesloft to understand time unlocks a wide array of forecasting, planning, and sentiment features to help sellers better serve their buyers.

References:

  1. https://www.ntnu.edu/how-your-brain-experiences-time
  2. Faghihi, HR. 2021. Time-Stamped Language Model: Teaching Language Models to Understand the Flow of Events. https://aclanthology.org/2021.naacl-main.362.pdf
  3. Angeli, G. 2012. Parsing Time: Learning to Interpret Time Expressions. https://nlp.stanford.edu/pubs/2012-naacl-temporal.pdf
  4. Poliak, A. 2018. Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation. https://aclanthology.org/D18-1007.pdf
  5. Thukral, S. 2021. Probing Language Models for Understanding of Temporal Expressions. https://aclanthology.org/2021.blackboxnlp-1.31.pdf
  6. Vashishtha, S. 2020. Temporal Reasoning in NLI. https://aclanthology.org/2020.findings-emnlp.363.pdf
  7. Zhou, B. 2019. “Going on a vacation” takes longer than “Going for a walk”: A Study of Temporal Commonsense Understanding. https://aclanthology.org/D19-1332.pdf
  8. Wallace, E. 2019. Do NLP Models Know Numbers? Probing Numeracy in Embeddings. https://arxiv.org/pdf/1909.07940.pdf
