VeLO: The Intelligent Neural Network Optimizer Revolutionizing Deep Learning and Automating Optimization

Tattooed Geek · Published in Nerd For Tech · 9 min read · Jul 27, 2023
In the proposed method, a new optimizer network takes the target network’s gradients, weights, and current training step and outputs its weight updates — no hyperparameters are needed. | Designed in Canva by Anish Singh Walia

VeLO is a system designed to act as a fully tuned optimizer. It uses a neural network to compute the target network’s updates: a neural network that trains and auto-tunes another neural network.

NOTE: To add more value (and as a USP of my blog), I have designed and attached a cheat sheet/carousel of the topic discussed here at the end of this blog post for you to use and share.

I’d love to share the high-quality PDF version of my blog’s cheatsheets with you — for free! Since Medium doesn’t allow uploading PDFs, I post them daily on my LinkedIn page. Let’s connect and stay updated @ https://www.linkedin.com/in/anish-singh-walia-924529103/

Also, I’ll share this month’s bonus tip: the best productivity tools that are cheap, effective, and game-changers, which I personally use, prefer, and insist you all try. So do check them out and use them.

Here is the Bonus tip for you all:

1) NOTION:

Bonus Tip 1: One great AI all-in-one Productivity/Task management tool I recently started using is Notion. Over the past few months, Notion has become famous and my absolute favorite.

If you’re like me, juggling work, daily tasks, notes, and projects is tough. Multiple tabs for email, Slack, and Google Docs make it overwhelming. I personally use Notion AI, which streamlines everything in one place. It’s a game-changer, and you won’t regret using it.

I’ve been using its PRO version for a while now, and I must say, it’s been a complete game-changer for me. With almost every online co-working tool integration you can think of, it makes my daily work routine a breeze.

Plus, the pricing is unbeatable for the tonnes of features it provides compared to all other all-in-one AI productivity tools I have used. I have taken up the annual subscription for a mere $8/month. Another awesome tool that is literally dirt cheap.

I love its Web Clipper and use its Mac app, and I also use Notion on my phone. You can download the desktop app from here.

Do check out this cool post about Notion to know more about this brilliant platform.

Best all-in-one AI Productivity tool for this month

2) QUILLBOT:

Bonus Tip 2: One great AI productivity writing tool I recently started using for day-to-day writing and tasks such as plagiarism checking, grammar checking, QuillBot Flow, paraphrasing, summarising, and translation is QuillBot.

I wanted to try something similar and cheaper than Grammarly.

I took up its yearly premium for around $4/month (58% off). The price is literally dirt cheap compared to other writing and productivity AI tools I have used in the past.

Personally, its UI and UX are very simple and easy to use, so I just wanted to share this awesome, productive tool with you all. Do check it out and use it in your day-to-day writing tasks.

https://try.quillbot.com/

Best Productivity Writing tool for this month

I really encourage you to try the above tools. Trust me, you won’t regret using them and will thank me later.

INDEX

  1. Introduction
  2. The Struggle of Hyperparameter Tuning
  3. Critical Insight: Simplifying Hyperparameter Tuning
  4. How VeLO Works: The Optimizer Network, an Evolutionary Approach
  5. Diverse Training Scenarios for VeLO
  6. Implementing VeLO in Python
  7. Results: Faster Training, Lower Loss
  8. The Limitations and the Road Ahead
  9. Why VeLO Matters: Simplifying Model Development
  10. The Future of Optimizers: VeLO-Like Algorithms
  11. FAQs — Answering Your VeLO Queries
  12. References

1. Introduction:

Training a neural network is a complex process in which an optimizer repeatedly updates the network’s weights. However, finding the best hyperparameters for these optimizers often requires trial and error, making the process time-consuming and cumbersome. But fear not! Google’s VeLO (Versatile Learned Optimizer) is here to revolutionize the game.

VeLO acts as a fully tuned optimizer, using a neural network to compute updates for the target network, and the best part is it eliminates the need for hyperparameter tuning!

Let’s dive into how VeLO works and why it’s a game-changer.

Training a neural network is like tuning an instrument to produce the perfect melody. Optimizers play a vital role in fine-tuning the network’s weights in AI and deep learning models. What if we could eliminate the need for manually selecting hyperparameters?

That’s where VeLO (Versatile Learned Optimizer) steps in. Developed by Luke Metz, James Harrison, and their team at Google, VeLO acts as a fully tuned optimizer, using a neural network to compute the updates for the target network. Let’s dive into this groundbreaking system and understand how it redefines deep learning optimization.

An in-depth dive into conventional optimization algorithms like Adam, Gradient Descent, and AdaDelta can be found in my previous blog post below. I urge readers to go through these conventional optimization techniques to understand how significantly VeLO differs from other optimizers for deep neural networks.

2. The Struggle of Hyperparameter Tuning:

For machine learning engineers, hyperparameter tuning can be frustrating and time-consuming. Finding the right combination of optimizer hyperparameters, such as the learning rate, schedule, and weight decay, often involves trial-and-error experiments. Traditional optimizers require human expertise to hand-pick these values, and the process is cumbersome because it means repeatedly training the target network with different hyperparameter settings.

But VeLO is here to redefine optimization, making it hassle-free. VeLO proposes a unique solution that eliminates the need for such manual tuning.

3. Critical Insight: Simplifying Hyperparameter Tuning

The genius behind VeLO lies in its ability to automate the optimization process without relying on hyperparameter tuning. Instead of using traditional optimization techniques like SGD or Adam, VeLO introduces a different neural network, the optimizer network, to compute weight updates for the target network. This novel approach revolutionizes the way we optimize neural networks.

In short, VeLO sidesteps the tedium of searching for learning rates, schedules, and weight-decay values: a separate neural network computes the weight updates directly, so there is nothing left to tune by hand.
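To make the contrast concrete, here is a minimal sketch of the two workflows. The grid-search side uses the real optax library; the `velo_optimizer(...)` line is a hypothetical placeholder for a learned optimizer, not VeLO’s actual API (that is covered in Section 6).

```python
# Conceptual contrast: a hand-tuned optimizer vs. a learned, hyperparameter-free one.
import optax

# Traditional route: every value below is a hyperparameter a human must pick,
# usually by running many full training jobs and comparing validation curves.
candidate_optimizers = []
for lr in [1e-4, 3e-4, 1e-3]:
    for wd in [0.0, 1e-4, 1e-2]:
        candidate_optimizers.append(optax.adamw(learning_rate=lr, weight_decay=wd))
# ...train the target network once per candidate and keep the best (lr, wd) pair.

# Learned-optimizer route: the update rule itself is a trained neural network,
# so there is no learning rate, schedule, or weight decay left to search over.
# opt = velo_optimizer(total_training_steps)  # hypothetical placeholder; see Section 6
```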

4. How VeLO Works: The Optimizer Network, an Evolutionary Approach

At each training step of the target network, VeLO employs an LSTM (Long Short-Term Memory) to generate weights for what we’ll call the optimizer network — a vanilla neural network responsible for updating the target network. The clever part lies in how the LSTM learns to generate the optimizer network’s weights.

High level architecture of VeLO | Designed in Canva by Anish Singh Walia

A vanilla neural network is a traditional feed-forward model with an input layer, one or more hidden layers, and an output layer.

Vanilla Neural Network | Created in Canva by Anish Singh Walia

Instead of using backpropagation, the LSTM evolves through an iterative process: multiple LSTM copies with small random differences are generated, the best-performing ones are averaged, new LSTMs are sampled close to that average, and the cycle repeats. In effect, the LSTM learns from its own “experience” to improve the weights it generates for the optimizer network, rather than learning them through the typical backpropagation route.
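The following toy sketch illustrates that perturb-score-average loop in plain NumPy. It is only a conceptual stand-in for the evolution-style search described above, not VeLO’s actual training code: the `fitness` function here is a dummy, whereas for VeLO it would measure how well the LSTM-generated optimizer trains the target networks.

```python
# Toy sketch of an evolution-style search over a vector of "LSTM weights".
import numpy as np

rng = np.random.default_rng(0)

def fitness(lstm_weights):
    # Dummy objective with a known optimum at all-ones, standing in for the
    # (expensive) evaluation of the optimizer generated by this LSTM.
    return -np.sum((lstm_weights - 1.0) ** 2)

weights = np.zeros(16)            # current LSTM weight vector
population, elite, sigma = 64, 8, 0.1

for generation in range(200):
    # 1. Generate many copies of the current LSTM with small random differences.
    candidates = weights + sigma * rng.standard_normal((population, weights.size))
    # 2. Score each copy (for VeLO: how well its optimizer trains target networks).
    scores = np.array([fitness(c) for c in candidates])
    # 3. Average the best-scoring copies; the result becomes the next LSTM.
    weights = candidates[np.argsort(scores)[-elite:]].mean(axis=0)
```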

5. Diverse Training Scenarios for VeLO

To make VeLO versatile, the authors randomly generated around 100,000 target neural networks with various architectures — vanilla, convolutional, recurrent, transformers, etc. These networks were trained on tasks ranging from image classification to text generation.

Each LSTM, starting with random weights, was copied and randomly modified for each target network. The modified LSTMs generated weights for the optimizer network based on various statistics of the target network. These statistics included mean and variance of weights, moving averages of gradients during training, fraction of completed training steps, and training loss values.
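As an illustration of the kind of inputs being described, here is a small sketch of how such per-step training statistics could be assembled. The function name and feature set are illustrative, not VeLO’s actual feature pipeline.

```python
# Illustrative only: the kind of per-step statistics fed to the optimizer network.
import numpy as np

def training_features(weights, grads, grad_ema, step, total_steps, loss, beta=0.9):
    # Exponential moving average of the gradients, updated every step.
    grad_ema = beta * grad_ema + (1.0 - beta) * grads
    features = {
        "weight_mean": weights.mean(),     # mean of the target network's weights
        "weight_var": weights.var(),       # variance of the weights
        "grad_ema_mean": grad_ema.mean(),  # summary of the gradient moving average
        "progress": step / total_steps,    # fraction of training steps completed
        "loss": loss,                      # current training loss value
    }
    return features, grad_ema
```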

6. Implementing VeLO in Python:

You can explore the learned_optimization Python SDK developed by the researchers to use VeLO, and try out the full implementation in Google Colab using the link below:

VeLO Python Code Implementation

By following this example, you can seamlessly integrate VeLO as the optimizer for your neural network, eliminating the hassle of hyperparameter tuning.
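Here is a minimal training-loop sketch based on the demo Colab referenced above. The import path `learned_optimization.research.general_lopt.prefab`, the `optax_lopt` helper, and the `extra_args={"loss": ...}` keyword reflect the public repo at the time of writing and may change between versions, so treat this as a template rather than verbatim API.

```python
# Minimal sketch: training a toy linear model with VeLO via learned_optimization.
import jax
import jax.numpy as jnp
import optax
from learned_optimization.research.general_lopt import prefab  # may move between versions

def loss_fn(params, batch):
    x, y = batch
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

total_steps = 1_000
opt = prefab.optax_lopt(total_steps)  # VeLO exposed as an optax-style optimizer

params = {"w": jnp.zeros((8, 1)), "b": jnp.zeros((1,))}
opt_state = opt.init(params)

@jax.jit
def train_step(params, opt_state, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    # VeLO also consumes the current loss value, passed through extra_args.
    updates, opt_state = opt.update(grads, opt_state, params=params,
                                    extra_args={"loss": loss})
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss

# One step on a synthetic batch, just to show the call pattern.
x, y = jnp.ones((32, 8)), jnp.zeros((32, 1))
params, opt_state, loss = train_step(params, opt_state, (x, y))
```

Notice that there is no learning rate or schedule anywhere in this loop; the only training-related value requested up front is the planned number of steps.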

7. Results: Faster Training, Lower Loss

VeLO’s performance was evaluated on a suite of 83 tasks, each scaled so that training completes on a single GPU in about one hour. The results were impressive: VeLO trained networks faster than Adam, a popular optimizer, even when Adam was tuned to find the best learning rate. On half of the tasks, VeLO was four times faster than Adam. Additionally, VeLO achieved a lower loss than Adam on five out of six MLCommons tasks, spanning image classification, speech recognition, text translation, and graph classification.

8. The Limitations and the Road Ahead

Though VeLO proved exceptional for most tasks, it underperformed with large models (over 500 million parameters) and long training runs (more than 200,000 steps). The authors believe that training VeLO on larger networks and for more steps, at the scales just mentioned, could overcome this limitation.

9. Why VeLO Matters: Simplifying Model Development

VeLO accelerates model development by eliminating hyperparameter testing and speeding up optimization. It takes advantage of various statistics from the target network’s training, allowing it to compute updates that move models closer to better solutions.

The field of optimizers is on the verge of a revolution. As VeLO paves the way for more intelligent optimization, we eagerly anticipate more advanced variants that can handle larger architectures and extended training durations.

10. The Future of Optimizers: VeLO-Like Algorithms:

VeLO represents a significant leap in deep learning optimization. While it may have room to grow with larger architectures, it has already proven its worth by automating the optimization process and outperforming traditional optimizers in various tasks. As the AI community looks forward to more advanced variants of VeLO, we can confidently say that optimizers are becoming more intelligent, freeing up valuable time and effort for machine learning engineers.

11. FAQs — Answering Your VeLO Queries:

  1. What sets VeLO apart from traditional optimizers? VeLO eliminates the need for hyperparameter tuning, automatically adapting to tasks without manual intervention.
  2. Is VeLO compatible with popular deep-learning frameworks? Yes, VeLO can be implemented using frameworks like TensorFlow and PyTorch.
  3. What are the key benefits of using VeLO? VeLO accelerates model development by automating hyperparameter tuning and optimizing neural networks efficiently.
  4. Can VeLO handle large models and extended training durations? Not yet reliably: VeLO’s performance is currently limited for very large models and very long training runs, but further research can address this.

12. References:

  1. Research paper: “VeLO: Training Versatile Learned Optimizers by Scaling Up” by Luke Metz, James Harrison, C. Daniel Freeman, Amil Merchant, Lucas Beyer, James Bradbury, Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, and Jascha Sohl-Dickstein
  2. GitHub repo: learned_optimization
  3. Demo_for_training_a_model_with_a_learned_optimizer Colab notebook (part of the learned_optimization Python package)

Save this post and try the example in the Google Colab notebook on various deep learning tasks and projects. Do share your results and findings in the comments here.

Share this post with your fellow AI and deep learning enthusiasts, and experiment and play around with VeLO.

Connect with Me on Social Media:

Please subscribe and follow to get free access to my newsletter and keep yourself updated on the latest AI and ChatGPT trends and technologies to make your life easier and more productive, save money, and be effective at whatever you do.

Your support motivates me to keep researching, designing cheatsheets, and writing about such topics.

LinkedIn: https://www.linkedin.com/in/anish-singh-walia-924529103/

GitHub: https://github.com/anishsingh20

Let’s build the future of AI together. Check out my other blog posts on developing AI applications, ChatGPT, OpenAI, and much more below:
