WeightWatcher: a Diagnostic Tool for Deep Neural Networks

LightOn
3 min readJun 4, 2021

It is about the one year mark since we started our (virtual) LightOn AI Meetups, and to mark the anniversary 🥳, we had Charles Martin as a guest. Charles is the Chief Scientist at Calculation Consulting and he presented his work on WeightWatcher: a Diagnostic Tool for Deep Neural Networks, a Python package built around a series of papers.

The 📺 recording of the meetup is on LightOn’s Youtube channel. Subscribe to the channel and subscribe to our Meetup to get notified of the next videos and events!

Weightwatcher is a Python package dedicated to analyze trained models and inspect models that are difficult to train🏋️. It can be used to gauge improvements in model performance and predict test accuracies across different models 🔮(without ever looking at the data!). It can also detect potential problems when compressing or fine-tuning pre-trained models 🗜️.

It is based on ideas from Random Matrix Theory, Statistical Mechanics, and Strongly Correlated Systems. The main idea is to fit a power law to the tail of the empirical spectral density (ESD) of the layer weights. The power-law exponent α is what helps us detect potential problems.

Fitting a power law in log-log to the tail of ESD needs to be done carefully!

Poorly trained models tend to have large layer α, as can be seen for example comparing GPT and GPT-2: the same model trained on dirty versus well-curated data.

GPT is trained on dirtier data than GPT-2, and it shows in the unusually large α values for some of the layers.

In particular, a weighted α can predict the test accuracy for models in the same architecture series across varying depths and other architectures and regularization parameters 📉.

The correlation between test accuracy and weighted alpha is remarkable.

Finally, there is some early research to extend this idea on when to perform optimal early stopping 🛑, or per-layer learning rate settings 🎛️, or detect over-fitting 🔍. Quite a program! We look forward to even more insightful empirical metrics in CharlesWeightWatcher in the future. The video of the meetup is here.

About Us

LightOn is a hardware company that develops new optical processors that considerably speed up Machine Learning computation. LightOn’s processors open new horizons in computing and engineering fields that are facing computational limits. Interested in speeding your computations up? Try out our solution on LightOn Cloud! If you want to try out your latest idea using our technology, you can register to the LightOn Cloud for a Free Trial or apply to the LightOn Cloud for Research Program! 🌈

Follow us on Twitter at @LightOnIO, subscribe to our newsletter, and/or register for our workshop series. We live stream, so you can join from anywhere. 🌍

The author

Iacopo Poli, Lead Machine Learning Engineer at LightOn AI Research

--

--

LightOn

We are a technology company developing Optical Computing for Machine Learning. Our tech harvests Computation from Nature, We are at lighton.ai