Paper Explained — Discovering Symbolic Models from Deep Learning with Inductive Biases
Original Paper
Introduction
This paper improves the state of the art in Symbolic Regression, i.e., deriving closed-form mathematical equations that describe numerical phenomena.
For the actual symbolic regression it uses a common approach, Evolutionary Algorithms, whose main limitation is that they do not scale well. At the end of page 1 the authors write:
> However, typically one uses genetic algorithms — essentially a brute force procedure as in [2] — which scale exponentially with the number of input variables and operators. Many machine learning problems are thus intractable for traditional symbolic regression.
So, to sum up, the main limitation of Evolutionary Algorithms is that they scale exponentially with:
- the dimensionality of the input space
- the size of the symbol (operator) space
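To get a feel for this exponential scaling, here is a toy illustration (not from the paper): counting the full binary expression trees a brute-force search would have to consider, for a hypothetical library of variables and binary operators.

```python
def count_expressions(n_vars, n_binary_ops, depth):
    """Count full binary expression trees of a given depth (toy model)."""
    if depth == 0:
        return n_vars                      # a leaf is one of the input variables
    sub = count_expressions(n_vars, n_binary_ops, depth - 1)
    return n_binary_ops * sub * sub        # an operator applied to two subtrees

for d in range(4):
    print(d, count_expressions(n_vars=5, n_binary_ops=4, depth=d))
# 5, 100, 40000, 6400000000 — the search space explodes with depth
```

Adding even one more variable or operator multiplies the count at every level, which is exactly why high-dimensional inputs make traditional symbolic regression intractable.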
So the main contribution of this paper is to make the Evolutionary-Algorithm approach to Symbolic Regression more scalable in practical applications. A GNN is used as an intermediate representation of the dataset, which reduces the dimensionality of the input space; the symbol space and the symbols themselves are not learned, they are hyperparameters.
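The recipe can be sketched in miniature (a hypothetical toy, not the paper's code): instead of running the symbolic search on the raw dataset, one samples input/output pairs from a trained network component, which is low-dimensional, and searches over a small symbol library on those pairs.

```python
import math

def learned_message(r):
    # stand-in for a trained edge/message MLP that happens to have
    # learned an inverse-square law from the data
    return 1.0 / (r * r)

# sample the black box, as one would sample the trained network
xs = [0.5 + 0.1 * i for i in range(20)]
ys = [learned_message(x) for x in xs]

# tiny brute-force search over a fixed (hyperparameter) symbol library
candidates = {
    "x":       lambda x: x,
    "x^2":     lambda x: x * x,
    "1/x":     lambda x: 1.0 / x,
    "1/x^2":   lambda x: 1.0 / (x * x),
    "exp(-x)": lambda x: math.exp(-x),
}

def mse(f):
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

best = min(candidates, key=lambda name: mse(candidates[name]))
print(best)  # -> 1/x^2
```

The paper uses genetic search (eureqa-style) over a richer operator set rather than this enumeration, but the key point is the same: the search runs on a single low-dimensional learned function, not on the full dataset.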
Key Contributions
Applications
Explained in Fig.1 of the Paper
How it works
Explained in Fig.2 in the Paper
The black boxes represent the internal NNs of the GNN, which are learned from the dataset and afterward distilled with symbolic regression
The symbolic model can then replace the corresponding NN, and the whole model can be refined
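Once the symbolic form replaces the network component, the refinement step amounts to re-fitting the formula's free constants against the data. A minimal sketch, assuming the recovered form is `c / r^2` with an unknown constant `c`, fitted by plain gradient descent on the squared error:

```python
# toy data whose true constant is c = 2 (hypothetical, for illustration)
xs = [0.5 + 0.1 * i for i in range(20)]
ys = [2.0 / (x * x) for x in xs]

c, lr = 1.0, 0.01
for _ in range(2000):
    # d/dc of the mean squared error between c/x^2 and y
    grad = sum(2 * (c / (x * x) - y) / (x * x) for x, y in zip(xs, ys)) / len(xs)
    c -= lr * grad

print(round(c, 3))  # converges toward the true constant 2.0
```

In practice the constants live inside a larger model, so the refinement is done end-to-end with the usual training loss, but the idea is the same.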
The key advantages of using a GNN as an intermediate representation of the dataset, instead of applying symbolic regression directly to the dataset, are explained in the Key Contributions section above
Key Aspect — Latent Space Regularization
Explained in Fig.3 in the Paper
To make the problem tractable from the symbolic regression perspective, it is key to keep the latent space dimensionality as low as possible, since the search cost scales exponentially with it
It could be fixed as a hyperparameter, but this degrades learning performance and makes the method less generic
It has been observed that better performance is achieved when the latent space dimensionality is effectively learned, by adding a suitable regularization term: an L1 penalty, which encourages sparsity, or a KL divergence to a Gaussian prior
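The L1 variant can be sketched in a few lines (a hedged toy, not the paper's code): the penalty on the message vectors is simply added to the prediction loss, pushing unneeded message components toward zero so that the effective latent dimensionality is learned rather than fixed.

```python
def training_loss(prediction_error, messages, l1_weight=0.01):
    """messages: list of per-edge message vectors (lists of floats).

    Adds an L1 penalty on all message components to the base loss,
    encouraging most components to shrink to zero (sparsity).
    """
    l1 = sum(abs(m) for msg in messages for m in msg)
    return prediction_error + l1_weight * l1

# usage: two edges, 3-dimensional messages; the zeros cost nothing
msgs = [[1.0, -2.0, 0.0], [0.5, 0.0, 0.0]]
print(training_loss(0.1, msgs))  # 0.1 + 0.01 * 3.5 = 0.135
```

After training, the components that remain significantly nonzero indicate the true latent dimensionality, and only those need to be passed to the symbolic regression step.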
Conclusion
In this paper, a method is presented to make Symbolic Regression, implemented with Evolutionary Algorithms, more scalable by relying on a GNN as an intermediate representation.
The approach looks interesting and promising, but its ability to generalize to more complex datasets still needs to be understood.