Liquid Neural Nets (LNNs)

Jake Hession
7 min read · Feb 6, 2024


Liquid neural nets (LNNs) are an exciting, relatively new direction in AI/ML research that promises more compact and dynamic neural nets for time series prediction. LNNs offer a new approach to tasks like weather prediction, speech recognition, and autonomous driving. The primary benefit LNNs offer is that they continue adapting to new stimuli after training. Additionally, LNNs are robust in noisy conditions and are smaller and more interpretable than their conventional counterparts.

LNNs and similar concepts have been around for a while, but the 2020 paper Liquid Time-Constant Networks catapulted them to the forefront of the AI/ML space. Since then, they’ve cemented themselves as a fascinating direction for time series predictors, one aimed at increasing individual neurons’ representational capability instead of deriving capability through scale.

Ramin Hasani, the 2020 paper’s lead author, was inspired directly by biological neurons in the nematode C. elegans, a microscopic roundworm. He notes that “It [C. elegans] only has 302 neurons in its nervous system, yet it can generate unexpectedly complex dynamics.” Hasani’s mission was to “have fewer but richer nodes.” The result was LNNs.

The ‘liquid’ portion of the acronym comes from the network’s use of a liquid time constant (LTC), an input-dependent term that alters the strength of the network’s connections to adapt to new inputs. The LTC is why LNNs can keep adapting to new inputs after training. Additionally, both the liquid time constant and the nodes’ weights are bounded, which means LNNs are not susceptible to the gradient explosion that plagues traditional RNNs and other continuous-time recurrent architectures.

LNNs have several advantages over traditional time series prediction frameworks. First and foremost is their adaptability. LNNs can alter themselves to model new distributions after training because of their ‘liquid’ nature. This adaptability makes them very capable in tasks with shifting data distributions and excessive noise. LNNs’ robustness also makes them transferable across tasks.

Each neuron’s increased informational density means small LNNs can model complex behavior that may take tens or hundreds of thousands of conventional nodes to model. For example, in a TED talk demonstrating the technology, Hasani was able to guide a vehicle using only 19 LNN nodes.

This size reduction allows LNNs to be more transparent than conventional neural nets. Large neural nets are black boxes because their many weights and nodes make individual relationships hard to analyze. By scaling down instead of up, nodes and weights can be interpreted in context, so you can not only accomplish your goal but also understand why the network behaves the way it does.

Now, a true dive into how LNNs actually work. Liquid neural nets are an evolution of neural ODEs, which model system dynamics using a series of first-order ordinary differential equations (ODEs) coordinated via nonlinear interlinked gates. This differs from normal neural nets, which represent systems via a series of implicit nonlinearities (activation functions). ODEs can model much more complex behavior than typical activation functions, giving each node more expressive power at the cost of complexity. The derivative of a typical neural ODE’s hidden state can be expressed with the following equation:

dx(t)/dt = f(x(t), I(t), t, θ) (Hasani, Lechner, et al., 2020)

Here, f is the output of the neural net with parameters θ, x(t) is the current hidden state, and I(t) is the input at time t. Solving this differential equation yields the next hidden state of the network. The key point is that the neural net’s output determines the derivative of the hidden state. This setup has many benefits, such as easier reasoning about causality, reduced memory cost, and the ability to handle data arriving at irregular intervals. However, neural ODEs are also susceptible to instability and gradient explosion, which can derail training and render the networks useless. Neural ODEs and other continuous-time recurrent architectures have been used widely in the past, and LNNs aim to build on their successes and address their failures.
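
To make this concrete, here is a minimal sketch (not the paper’s code) of advancing a neural ODE’s hidden state with a simple explicit Euler integrator. The tiny tanh layer standing in for f, and all names and sizes, are illustrative assumptions.

```python
import numpy as np

def f(x, I, theta):
    # Toy stand-in for the neural net f(x(t), I(t), t; theta): one tanh layer.
    z = np.concatenate([x, I])
    return np.tanh(theta["W"] @ z + theta["b"])

def neural_ode_step(x, I, theta, dt=0.01):
    # Explicit Euler step of dx/dt = f(x, I, t; theta).
    return x + dt * f(x, I, theta)

# Illustrative usage with random parameters and a toy input signal.
rng = np.random.default_rng(0)
hidden, inputs = 4, 2
theta = {"W": 0.1 * rng.normal(size=(hidden, hidden + inputs)),
         "b": np.zeros(hidden)}
x = np.zeros(hidden)
for t in range(100):
    I = np.array([np.sin(0.1 * t), np.cos(0.1 * t)])
    x = neural_ode_step(x, I, theta)
```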

LNNs package the linear ODE a little differently, introducing the liquid time constant (τ) and a new bias parameter (A):

dx(t)/dt = -[1/τ + f(x(t), I(t), t, θ)] · x(t) + f(x(t), I(t), t, θ) · A (Hasani, Lechner, et al., 2020)

The influence of the vanilla neural ODE is apparent: both primary components of the state derivative equation are still based on the neural network itself. The term containing the time constant and the term containing the bias counterbalance each other, helping create the stable behavior characteristic of LNNs. Also note how the time constant term acts directly on the hidden state, affecting and bounding the strength of the weights between nodes.
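
As a rough sketch of that formulation (again with a toy tanh layer standing in for f, and parameter names that are my own, not the paper’s), the LTC state derivative can be written as:

```python
import numpy as np

def ltc_dxdt(x, I, theta, tau, A):
    # dx/dt = -(1/tau + f(x, I; theta)) * x + f(x, I; theta) * A
    # tau: per-neuron time constants, A: per-neuron bias vector (illustrative).
    fx = np.tanh(theta["W"] @ np.concatenate([x, I]) + theta["b"])  # toy f
    return -(1.0 / tau + fx) * x + fx * A

# Example call: 4 neurons, 2 inputs (all values illustrative).
rng = np.random.default_rng(0)
theta = {"W": 0.1 * rng.normal(size=(4, 6)), "b": np.zeros(4)}
dxdt = ltc_dxdt(np.zeros(4), np.ones(2), theta, tau=np.ones(4), A=rng.normal(size=4))
```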

The liquid time constant is an interesting term, defined in the 2020 paper as the following:

τ_sys = τ / (1 + τ · f(x(t), I(t), t, θ)) (Hasani, Lechner, et al., 2020)

The updated time constant is the original time constant divided by 1 plus the original time constant multiplied by the neural net’s output at that time step. The LTC “characterizes the speed and coupling sensitivity of an ODE”; essentially, it determines how strong the connections between nodes are and how sharp the gradients within each ODE node are. Because the LTC changes with the input, LNNs can form new dynamics to adapt to shifts in the input over time.
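
In code, the input-dependent system time constant described above reduces to a one-liner (a sketch; f_out stands for the network’s output at the current step):

```python
def system_time_constant(tau, f_out):
    # tau_sys = tau / (1 + tau * f_out): the effective time constant shrinks
    # as the network's response to the current input grows.
    return tau / (1.0 + tau * f_out)
```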

Forward passes over the LNN can be done by any ODE solver, but in the paper the authors develop their own ‘fused’ ODE solver, which I will not explore in detail. The paper describes the forward pass in pseudocode; the gist is as follows.

For a forward pass over the LNN with the fused ODE solver, the solver discretizes the continuous temporal interval and calculates transitional states, so its stepping is not continuous but a set number of discrete steps from time step 0 to n. The actual hidden-state update occurs in this equation:

x(t + Δt) = [x(t) + Δt · f(x(t), I(t), t, θ) · A] / [1 + Δt · (1/τ + f(x(t), I(t), t, θ))] (Hasani, Lechner, et al., 2020)

Here, the next hidden state is the current hidden state plus the neural net’s output multiplied by the time step and the bias term, all divided by 1 plus the time step multiplied by the sum of 1 over the LTC and the neural net’s output. It is effectively the equation defining the LNN, discretized by the time step, and in the pseudocode it is calculated for each of the N nodes in the network at each of the T time steps.
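
Putting that update into a loop gives a rough picture of the forward pass. This is a sketch of the discretized update described above, not the paper’s fused solver implementation; the toy tanh network for f, the parameter names, and the number of unfolding steps are all assumptions.

```python
import numpy as np

def fused_step(x, I, theta, tau, A, dt):
    # x(t+dt) = (x + dt * f * A) / (1 + dt * (1/tau + f))
    fx = np.tanh(theta["W"] @ np.concatenate([x, I]) + theta["b"])  # toy f
    return (x + dt * fx * A) / (1.0 + dt * (1.0 / tau + fx))

def forward(inputs, theta, tau, A, dt=0.1, unfold=6):
    # Run the fused update over a sequence; `unfold` plays the role of the
    # L discretization steps taken between consecutive observations.
    x = np.zeros_like(A)
    states = []
    for I in inputs:
        for _ in range(unfold):
            x = fused_step(x, I, theta, tau, A, dt / unfold)
        states.append(x.copy())
    return np.stack(states)

# Illustrative usage: 4 neurons, 2 input features, 20 time steps.
rng = np.random.default_rng(0)
H, D = 4, 2
theta = {"W": 0.1 * rng.normal(size=(H, H + D)), "b": np.zeros(H)}
tau, A = np.ones(H), rng.normal(size=H)
hidden_states = forward(rng.normal(size=(20, D)), theta, tau, A)
```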

According to the original paper, when utilizing the fused ODE solver developed for LNNs, the time complexity on a sequence of length T is O(L × T), where L is the number of discretization steps taken. This means that an LSTM with N cells and an LNN with N neurons have the same time complexity for forward passes.

Training LNNs is performed via Backpropagation Through Time (BPTT), a training procedure used for some recurrent neural net architectures. It works by unrolling the network over a sequence of time steps into a stack of feedforward networks (essentially one very long feedforward network), then aggregating the error across all the passes and using it to update the weights at each time step. In the context of LNNs, this means that a series of ODE solver outputs (the hidden states of the unfolded network) is packaged as a feedforward network, and the BPTT process is then performed to train the system.
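
A toy sketch of that idea, using PyTorch autograd to backpropagate through every unrolled solver step; the one-layer f, the sizes, and the regression target are illustrative assumptions, not the paper’s training setup.

```python
import torch

def fused_step(x, I, W, b, tau, A, dt=0.1):
    # Same semi-implicit LTC update as before, with a toy one-layer f.
    fx = torch.tanh(W @ torch.cat([x, I]) + b)
    return (x + dt * fx * A) / (1.0 + dt * (1.0 / tau + fx))

H, D, T = 4, 2, 25
W, b, tau, A = (torch.nn.Parameter(p) for p in
                (0.1 * torch.randn(H, H + D), torch.zeros(H),
                 torch.ones(H), torch.randn(H)))
opt = torch.optim.Adam([W, b, tau, A], lr=1e-2)
seq, target = torch.randn(T, D), torch.zeros(H)  # toy data

for _ in range(200):
    x = torch.zeros(H)
    for I in seq:                 # unroll: gradients flow through all T steps
        x = fused_step(x, I, W, b, tau, A)
    loss = ((x - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```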

One of LNNs’ main draws over other continuous-time architectures is their bounded nature, which the paper summarizes with a theorem on state stability:

min(0, A_i^min) ≤ x_i(t) ≤ max(0, A_i^max) (Hasani, Lechner, et al., 2020)

This theorem dictates that the state of each node, defined by its ODE, is never smaller than the minimum of 0 and the smallest quantity in the bias vector, and never larger than the maximum of 0 and the largest quantity in the bias vector. Thus, the state of the node is bounded and robust to inputs of unbounded magnitude.
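
A quick numerical illustration of that bound, under the same toy setup as the earlier sketches but with a sigmoidal (non-negative) f as an assumption of this sketch: even when driven with very large random inputs, each state stays inside [min(0, A_i), max(0, A_i)].

```python
import numpy as np

rng = np.random.default_rng(1)
H, D = 4, 2
W, b = 0.1 * rng.normal(size=(H, H + D)), np.zeros(H)
tau, A = np.ones(H), rng.normal(size=H)
x, dt = np.zeros(H), 0.05

for _ in range(5000):
    I = 100.0 * rng.normal(size=D)  # inputs of very large magnitude
    # numerically stable sigmoid: sigma(z) = 0.5 * (1 + tanh(z / 2))
    fx = 0.5 * (1.0 + np.tanh(0.5 * (W @ np.concatenate([x, I]) + b)))
    x = (x + dt * fx * A) / (1.0 + dt * (1.0 / tau + fx))

print("state:       ", x)
print("lower bound: ", np.minimum(0, A))
print("upper bound: ", np.maximum(0, A))
```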

Further elaboration on all of these topics can be found in the original Liquid Time Constant Networks paper.

LNNs have many advantages over traditional recurrent networks and over other time-continuous networks, but they also have flaws. While LNNs are immune to gradient explosion, they are still susceptible to vanishing gradients on long-term dependencies, so for those applications an architecture that explicitly stores long-term dependencies (such as an LSTM) may be more effective. LNNs’ accuracy and efficiency are also affected by the choice of ODE solver, another point of variance in their implementation and efficacy. Additionally, compared to other time-continuous networks, they are slower to run and train. However, their expressive power, memory savings, and transparency make LNNs an exciting field of research with innumerable applications.

Citations

Abdulkader Helwan. (September 28, 2023). https://abdulkaderhelwan.medium.com/liquid-neural-networks-37ccaaee469a

Bhaumik Tyagi. (July 13, 2023). Liquid Neural Networks: Revolutionizing AI with Dynamic Information Flow. https://tyagi-bhaumik.medium.com/liquid-neural-networks-revolutionizing-ai-with-dynamic-information-flow-30e27f1cc912

Brian Heater. (August 17, 2023). What is a liquid neural network, really? https://techcrunch.com/2023/08/17/what-is-a-liquid-neural-network-really/

Daniel Ackerman. (January 28, 2021). Liquid machine learning system adapts to changing conditions. https://news.mit.edu/2021/machine-learning-adapts-0128

Ramin Hasani, Mathias Lechner, et al. (June 8, 2020). Liquid Time-constant Networks. https://arxiv.org/abs/2006.04439

Ramin Hasani. (January 19, 2023). Liquid Neural Networks. https://www.youtube.com/watch?v=RI35E5ewBuI

Tim Keary. (September 25, 2023). Liquid Neural Network (LNN). https://www.techopedia.com/definition/liquid-neural-network
