Fast Model Editing at Scale via Model Editor Networks with Gradient Decomposition (MEND)

Amir Hossein Karami
3 min read · Nov 12, 2021


Reference: Fast Model Editing at Scale (Mitchell et al., 2021)

One of the main problems with Transformer-based models in Natural Language Processing (NLP) is that the facts they encode go stale over time: the world changes, and some of what the model learned is no longer valid. For example, suppose we trained a language model (LM) on a dataset in which "Lionel Messi" still played for Barcelona, but we now know that Messi plays for Paris Saint-Germain. To update the model's parameters, two straightforward ideas usually come to mind. The first is to fine-tune the model on the new data; this tends to bias the model toward the new examples and degrade what it already knew, so it is not ideal. The second is to retrain the model from scratch on an updated dataset, which usually imposes a huge computational cost. So the question becomes: what is an efficient way to update the model's parameters easily and quickly over time? 😉
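To make the first idea concrete, here is a minimal sketch of naively fine-tuning a model on a single corrected fact. It assumes a Hugging Face causal LM; the model name ("gpt2"), the learning rate, and the example sentence are illustrative assumptions, not something taken from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Naive "idea 1": fine-tune the whole model on the corrected statement.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

text = "Lionel Messi plays for Paris Saint-Germain."
batch = tok(text, return_tensors="pt")

model.train()
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()

# The risk: repeating such steps on a narrow set of "new facts" can overwrite
# what the model already knew, which is why plain fine-tuning is not ideal here.
```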

To answer this question, a very interesting paper was recently published by Stanford researchers: "Fast Model Editing at Scale", which presents a method called Model Editor Networks with Gradient Decomposition (MEND). MEND can edit even very large models with more than 10 billion parameters using just one GPU in less than a day. The idea is to apply a targeted edit so that the outdated, incorrect fact is replaced while the model's other, still-valid knowledge stays intact, leaving the model fully up to date. I will not go further into the details here; the paper is linked in the reference above and its code is publicly available, so you can dig into the details yourself if you like.
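For intuition about the "gradient decomposition" part of the name, here is a rough sketch of the core idea (not the authors' implementation): the fine-tuning gradient of a linear layer factorizes as an outer product of the layer's input and the gradient at its output, so a small editor network only needs to transform those two vectors rather than a full weight-sized gradient. The dimensions and the editor architecture below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy setup: a single linear layer whose weight we want to edit.
d_in, d_out = 64, 64
layer = nn.Linear(d_in, d_out, bias=False)

class GradientEditor(nn.Module):
    """Maps the rank-1 gradient factors (x, delta) to an edited pair."""
    def __init__(self, d_in, d_out, hidden=128):
        super().__init__()
        self.d_in = d_in
        self.net = nn.Sequential(
            nn.Linear(d_in + d_out, hidden), nn.ReLU(),
            nn.Linear(hidden, d_in + d_out),
        )
        self.step = nn.Parameter(torch.tensor(1e-3))  # learned edit step size

    def forward(self, x, delta):
        z = self.net(torch.cat([x, delta], dim=-1))
        return z[..., :self.d_in], z[..., self.d_in:], self.step

editor = GradientEditor(d_in, d_out)

# One "edit": the raw gradient of a linear layer is delta^T @ x, so the
# editor only ever sees two small vectors, not a d_out x d_in matrix.
x = torch.randn(1, d_in)        # layer input for the correcting example
target = torch.randn(1, d_out)  # desired (corrected) output
out = layer(x)
delta = torch.autograd.grad(((out - target) ** 2).mean(), out)[0]

x_t, d_t, step = editor(x, delta)
with torch.no_grad():
    layer.weight -= step * (d_t.T @ x_t)  # rank-1 update of the weight matrix
```

In the actual method, the editor network is itself trained so that applying such edits changes the targeted prediction while leaving the model's behavior on unrelated inputs unaffected.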

Language models are central to downstream NLP tasks, so the paper above and the method it presents can be very valuable in practice. In the coming days, I will share more posts about the importance of Transformer-based language models, methods for training them efficiently, and related topics.

About the Author:

I am a Senior Deep Learning Engineer at Sotoon (a company affiliated with Cafe Bazaar and the Hezardastan group that focuses on providing services based on AI and cloud computing). I am experienced in Machine Learning, Deep Learning, Computer Vision, and Speech Processing, and I have delivered many industry projects in these domains.
You can follow me on:
(my LinkedIn profile)
(my GitHub profile)
(my Google Scholar profile)
(my Medium blog)
