PyTorch Model Weight Initialization and Copying

李謦伊
謦伊的閱讀筆記
Dec 29, 2022

When defining a model, you can set the initial values of its weights yourself, or you can extract the weight parameters from another pre-trained model and copy them into your own. Weight initialization can be defined either inside or outside the model: the former means setting the initial weight values while defining the model itself, while the latter means initializing the weights after the model has been built.

This article introduces several methods for initializing weights and for copying weight parameters; all of the code can be found at the bottom of the article.

First, import the required libraries:

import torch 
import torch.nn as nn
import torchvision.models as models
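
One practical note before we start: the initial weights printed in the outputs below are random, so your numbers will differ from run to run. If you want reproducible values, the RNG can be seeded first, e.g.:

# Optional: seed the RNG so the randomly initialized weights are reproducible.
torch.manual_seed(0)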

Weight Initialization Defined Inside the Model

Let's first demonstrate how to assign initial weight values inside a custom model, using resnet18 as the example.

🔖 torch.nn.Parameter

First, define the model architecture; for details on building a convolutional neural network, see my previous article.

Then use torch.nn.Parameter to define the values to assign to the weights and bias. Suppose we want to set all the weights to 0.9 and the bias to 0.

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.net = models.resnet18()
        print("weights: ", self.net.fc.weight[0][:10])
        print("bias: ", self.net.fc.bias[:10])
        print("=====================")

        self.net.fc.weight = torch.nn.Parameter(torch.ones(self.net.fc.weight.shape)*0.9, requires_grad=True)
        self.net.fc.bias = torch.nn.Parameter(torch.zeros(self.net.fc.bias.shape), requires_grad=True)
        print("weights: ", self.net.fc.weight[0][:10])
        print("bias: ", self.net.fc.bias[:10])

    def forward(self, x):
        output = self.net(x)
        return output

With the architecture and weight values defined, instantiate the model. You can see the original weights have been changed to 0.9 and the bias to 0.

model = MyModel()

# === output ===
weights: tensor([ 0.0155, 0.0363, 0.0081, -0.0339, -0.0418, -0.0295, 0.0294, 0.0241,
-0.0211, 0.0341], grad_fn=<SliceBackward0>)
bias: tensor([ 0.0384, -0.0155, -0.0196, 0.0125, 0.0270, -0.0023, 0.0139, 0.0296,
0.0080, 0.0436], grad_fn=<SliceBackward0>)
=====================
weights: tensor([0.9000, 0.9000, 0.9000, 0.9000, 0.9000, 0.9000, 0.9000, 0.9000, 0.9000,
0.9000], grad_fn=<SliceBackward0>)
bias: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], grad_fn=<SliceBackward0>)
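
As a side note, torch.ones(shape) * 0.9 can equivalently be built in one step with torch.full_like, which fills a new tensor of the same shape with a constant; a small equivalent sketch:

# Equivalent one-step construction of the 0.9-filled weight tensor.
model.net.fc.weight = torch.nn.Parameter(
    torch.full_like(model.net.fc.weight, 0.9), requires_grad=True)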

🔖 data.normal_, data.fill_

This part assigns values via data.normal_ and data.zero_: the former draws values at random from a normal distribution with the given mean and standard deviation, while the latter sets the values to 0 (a special case of data.fill_, which writes any constant).

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.net = models.resnet18()
        print("weights: ", self.net.fc.weight[0][:10])
        print("bias: ", self.net.fc.bias[:10])
        print("=====================")

        self.net.fc.weight.data.normal_(mean=0.0, std=1.0)
        self.net.fc.bias.data.zero_()

        print("weights: ", self.net.fc.weight[0][:10])
        print("bias: ", self.net.fc.bias[:10])

    def forward(self, x):
        output = self.net(x)
        return output

After instantiating the model, you can see the weights are drawn from a normal distribution with mean 0 and standard deviation 1, while the bias is 0.

model = MyModel()

# === output ===
weights: tensor([-0.0244, -0.0371, -0.0281, 0.0006, 0.0299, -0.0292, -0.0032, -0.0422,
0.0313, -0.0164], grad_fn=<SliceBackward0>)
bias: tensor([ 0.0148, -0.0264, -0.0145, 0.0291, -0.0015, 0.0304, -0.0170, 0.0353,
-0.0319, 0.0003], grad_fn=<SliceBackward0>)
=====================
weights: tensor([ 0.2429, -0.5703, -1.5922, 0.3605, 0.5135, 0.3904, -0.4094, -0.3470,
0.2323, 0.1666], grad_fn=<SliceBackward0>)
bias: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], grad_fn=<SliceBackward0>)
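
For completeness, here is the data.fill_ from the section title in action: it writes a constant in place, and data.zero_() is just shorthand for data.fill_(0.). The 0.5 below is an arbitrary illustrative constant:

# data.fill_ writes a constant in place; 0.5 is arbitrary, for illustration.
model.net.fc.bias.data.fill_(0.5)
print(model.net.fc.bias[:10])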

🔖 nn.init.normal_, nn.init.zeros_

Another way to draw values from a normal distribution is nn.init.normal_, while nn.init.zeros_ assigns 0.

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.net = models.resnet18()
        print("weights: ", self.net.fc.weight[0][:10])
        print("bias: ", self.net.fc.bias[:10])
        print("=====================")

        nn.init.normal_(self.net.fc.weight.data, mean=0.0, std=1.0)
        nn.init.zeros_(self.net.fc.bias.data)

        print("weights: ", self.net.fc.weight[0][:10])
        print("bias: ", self.net.fc.bias[:10])

    def forward(self, x):
        output = self.net(x)
        return output

Instantiate the model and inspect the weight parameters.

model = MyModel()

# === output ===
weights: tensor([ 0.0296, 0.0150, 0.0162, 0.0311, 0.0227, 0.0142, 0.0257, 0.0052,
0.0267, -0.0273], grad_fn=<SliceBackward0>)
bias: tensor([-0.0106, -0.0324, 0.0441, -0.0311, -0.0144, -0.0150, 0.0284, 0.0063,
0.0424, -0.0204], grad_fn=<SliceBackward0>)
=====================
weights: tensor([ 0.1584, 0.3998, 0.8412, -0.6938, -2.1568, -1.2821, -0.7416, -1.8385,
1.6660, -0.3524], grad_fn=<SliceBackward0>)
bias: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], grad_fn=<SliceBackward0>)
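
Besides normal_ and zeros_, nn.init also provides the standard schemes such as Kaiming (He) and Xavier (Glorot) initialization, which are common defaults for deep networks. A minimal sketch on the same fc layer (the parameter choices here are illustrative):

# Kaiming init is the usual choice before ReLU; Xavier suits tanh/sigmoid.
nn.init.kaiming_normal_(model.net.fc.weight, mode='fan_out', nonlinearity='relu')
# or: nn.init.xavier_uniform_(model.net.fc.weight)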

🔖 nn.init.constant_

Next, use nn.init.constant_ to set the weight parameters to a constant.

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.net = models.resnet18()
        print("weights: ", self.net.fc.weight[0][:10])
        print("bias: ", self.net.fc.bias[:10])
        print("=====================")

        nn.init.constant_(self.net.fc.weight, 1)
        nn.init.constant_(self.net.fc.bias, 0)

        print("weights: ", self.net.fc.weight[0][:10])
        print("bias: ", self.net.fc.bias[:10])

    def forward(self, x):
        output = self.net(x)
        return output

You can see the weights have been changed to 1 and the bias to 0.

model = MyModel()

# === output ===
weights: tensor([ 0.0248, 0.0262, 0.0107, 0.0300, -0.0369, 0.0325, 0.0136, -0.0440,
0.0023, 0.0258], grad_fn=<SliceBackward0>)
bias: tensor([ 9.7215e-05, -4.1210e-03, -3.4068e-02, 2.0162e-02, 3.1296e-02,
-3.7656e-03, 3.4689e-02, -2.4909e-02, 3.8025e-02, 1.4642e-02],
grad_fn=<SliceBackward0>)
=====================
weights: tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], grad_fn=<SliceBackward0>)
bias: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], grad_fn=<SliceBackward0>)

🔖 apply

When the model architecture is more complex, the weight-initialization logic can be factored out into a separate function and invoked with apply.

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.net = models.resnet18()
        print("weights: ", self.net.fc.weight[0][:10])
        print("bias: ", self.net.fc.bias[:10])
        print("=====================")

        self.apply(self.init_weights)
        print("weights: ", self.net.fc.weight[0][:10])
        print("bias: ", self.net.fc.bias[:10])

    def init_weights(self, module):
        if isinstance(module, nn.Linear):
            nn.init.normal_(module.weight.data, mean=0.0, std=1.0)

            if module.bias is not None:
                nn.init.zeros_(module.bias.data)

    def forward(self, x):
        output = self.net(x)
        return output

Inspecting the weights confirms they were changed to the values we set.

model = MyModel()

# === output ===
weights: tensor([ 0.0124, -0.0300, 0.0275, -0.0354, -0.0005, 0.0229, 0.0324, -0.0056,
0.0028, -0.0215], grad_fn=<SliceBackward0>)
bias: tensor([-0.0225, -0.0149, 0.0258, 0.0410, 0.0232, -0.0307, -0.0196, 0.0391,
-0.0055, -0.0414], grad_fn=<SliceBackward0>)
=====================
weights: tensor([ 0.3663, 1.0643, 0.0746, 0.0075, 0.8304, -0.5193, -0.0839, 0.1247,
0.7318, -1.8451], grad_fn=<SliceBackward0>)
bias: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], grad_fn=<SliceBackward0>)
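
Worth knowing: apply(fn) calls fn recursively on every submodule (and on the module itself), so init_weights sees each nn.Linear no matter how deeply it is nested. A quick way to see exactly what apply visits:

# apply traverses every submodule recursively; printing the class names
# shows which modules the init function gets called on.
model.apply(lambda m: print(type(m).__name__))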

🔖 Initializing several different layer types

To initialize the weights of several layer types at once, simply pass all the layer types you want to handle to isinstance(); a per-type variation is sketched after this block.

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.net = models.resnet18()
        print("weights: ", self.net.conv1.weight[0][:10])
        print("=====================")

        self.apply(self.init_weights)
        print("weights: ", self.net.conv1.weight[0][:10])

    def init_weights(self, module):
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            nn.init.normal_(module.weight.data, mean=0.0, std=1.0)

            if module.bias is not None:
                nn.init.zeros_(module.bias.data)

    def forward(self, x):
        output = self.net(x)
        return output
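
If different layer types should get different schemes, the same pattern extends with one branch per type. A hedged sketch (the choice of initializers here is illustrative, not from the original article):

def init_weights(self, module):
    # Illustrative per-type scheme: Kaiming for conv layers feeding ReLU,
    # constant 1/0 for BatchNorm, small normal for the classifier head.
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, mode='fan_out', nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        nn.init.zeros_(module.bias)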

Weight Initialization Defined Outside the Model

Next, we perform the weight initialization outside the custom model, again using resnet18 to demonstrate.

🔖 Direct assignment

First, define the model:

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.net = models.resnet18()

    def forward(self, x):
        output = self.net(x)
        return output

model = MyModel()

Then use data.normal_() and data.zero_() to initialize the weights. The results below show the weights were changed to the configured values.

print("weights: ", model.net.fc.weight[0][:10])
print("bias: ", model.net.fc.bias[:10])
print("=====================")

model.net.fc.weight.data.normal_(mean=0.0, std=1.0)
model.net.fc.bias.data.zero_()
print("weights: ", model.net.fc.weight[0][:10])
print("bias: ", model.net.fc.bias[:10])

# === output ===
weights: tensor([-0.0177, -0.0383, 0.0188, 0.0390, 0.0256, -0.0224, -0.0312, 0.0102,
0.0342, 0.0323], grad_fn=<SliceBackward0>)
bias: tensor([ 0.0340, 0.0321, 0.0181, 0.0400, 0.0119, 0.0009, -0.0185, -0.0041,
0.0240, 0.0040], grad_fn=<SliceBackward0>)
=====================
weights: tensor([-0.2647, -0.7330, 0.1211, -0.1159, -0.1106, 0.3854, 0.4741, 1.1223,
-0.5903, -0.0706], grad_fn=<SliceBackward0>)
bias: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], grad_fn=<SliceBackward0>)

🔖 apply

A weight-initialization function can likewise be defined outside the model and then invoked with apply. The model here reuses the one defined above.

model = MyModel()

def init_weights(module):
    if isinstance(module, nn.Linear):
        module.weight = torch.nn.Parameter(torch.ones(module.weight.shape)*0.9, requires_grad=True)

        if module.bias is not None:
            module.bias = torch.nn.Parameter(torch.zeros(module.bias.shape), requires_grad=True)

Use apply to invoke the weight-initialization function, then inspect the result.

print("weights: ", model.net.fc.weight[0][:10])
print("bias: ", model.net.fc.bias[:10])
print("=====================")

model.apply(init_weights)
print("weights: ", model.net.fc.weight[0][:10])
print("bias: ", model.net.fc.bias[:10])

Copying Model Weights

Weights can also be initialized by extracting the parameters of another pre-trained model and copying them into your own. Beyond that, weight copying is useful during training: for instance, once one branch of a model has been trained to a certain point, its weights can be copied to another branch, similar to the weight-sharing idea of a Siamese Network.

Two kinds of weight copying are demonstrated here: copying one model's weights to another model, and copying only certain layers' weights to other layers.

🔖 Copying one model's weights to another model

Now let's try copying a pre-trained model's weights into our own model!

First, load the pre-trained model:

checkpoint = torch.load('resnet_weights.pth')
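
The keys in this checkpoint (shown below, e.g. resnet.conv1.weight) imply it was saved as the state_dict of a wrapper model whose backbone attribute is named resnet. A hypothetical sketch of how such a file could have been produced:

# Hypothetical sketch: a wrapper whose ResNet lives under an attribute named
# `resnet` yields state_dict keys like 'resnet.conv1.weight'.
class PretrainedWrapper(nn.Module):
    def __init__(self):
        super().__init__()
        self.resnet = models.resnet18()

# After training, its weights would be saved with:
# torch.save(trained_wrapper.state_dict(), 'resnet_weights.pth')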

Define the custom model:

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.resnet = models.resnet18()
        self.resnet.fc = nn.Linear(512, 256)
        self.linear = nn.Linear(256, 10)

    def forward(self, x):
        # the backbone attribute is named `resnet`, so use it here
        x = self.resnet(x)
        output = self.linear(x)

        return output

model = MyModel()

Let's look at the custom model's current weight values:

print("weights: ", model.resnet.conv1.weight[0][0])

# === output ===
weights: tensor([[-0.0006, -0.0150, 0.0463, -0.0079, -0.0152, 0.0115, 0.0249],
[-0.0154, 0.0046, 0.0252, 0.0479, 0.0018, -0.0133, -0.0048],
[ 0.0047, 0.0077, 0.0154, 0.0016, -0.0147, -0.0068, -0.0362],
[ 0.0425, -0.0370, 0.0107, 0.0094, -0.0246, 0.0161, -0.0203],
[ 0.0256, -0.0016, -0.0147, 0.0002, 0.0113, 0.0191, 0.0252],
[ 0.0187, -0.0308, 0.0402, 0.0105, -0.0355, -0.0450, 0.0135],
[ 0.0285, 0.0155, -0.0073, -0.0131, -0.0070, 0.0490, 0.0098]],
grad_fn=<SelectBackward0>)

Extract the pre-trained model's parameters for the weight copy that follows. Since resnet.fc was modified in the custom model, those entries have to be filtered out first.

pretrained_dict = {k: v for k, v in checkpoint.items() if k not in ['resnet.fc.weight', 'resnet.fc.bias']}

# inspect the weight values
pretrained_dict['resnet.conv1.weight'][0][0]

# === output ===
tensor([[-0.0322, -0.0509, -0.0117, -0.0062, 0.0003, -0.0347, 0.0073],
[-0.0072, -0.0488, -0.0295, -0.0035, -0.0362, -0.0497, -0.0226],
[ 0.0087, 0.0136, 0.0176, 0.0150, -0.0127, 0.0358, 0.0585],
[-0.0243, 0.0452, 0.0083, 0.0163, -0.0355, 0.0162, -0.0159],
[-0.0291, 0.0263, 0.0014, 0.0211, -0.0300, 0.0307, 0.0133],
[ 0.0156, -0.0002, 0.0679, 0.0492, -0.0200, -0.0276, 0.0333],
[-0.0059, -0.0139, 0.0266, -0.0367, -0.0117, 0.0113, -0.0111]])

Copy the pre-trained model's weights:

model_state = model.state_dict()
model_state.update(pretrained_dict)
model.load_state_dict(model_state, strict=False)
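
Here strict=False tells load_state_dict to tolerate keys that don't match the model instead of raising an error; it also returns the mismatches, which is worth printing as a sanity check:

# load_state_dict returns a named tuple of the keys it could not match;
# printing them helps catch silent naming or shape mismatches.
result = model.load_state_dict(model_state, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)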

Observe that the custom model's weights are now identical to the pre-trained model's:

print("weights: ", model.resnet.conv1.weight[0][0])

# === output ===
weights: tensor([[-0.0322, -0.0509, -0.0117, -0.0062, 0.0003, -0.0347, 0.0073],
[-0.0072, -0.0488, -0.0295, -0.0035, -0.0362, -0.0497, -0.0226],
[ 0.0087, 0.0136, 0.0176, 0.0150, -0.0127, 0.0358, 0.0585],
[-0.0243, 0.0452, 0.0083, 0.0163, -0.0355, 0.0162, -0.0159],
[-0.0291, 0.0263, 0.0014, 0.0211, -0.0300, 0.0307, 0.0133],
[ 0.0156, -0.0002, 0.0679, 0.0492, -0.0200, -0.0276, 0.0333],
[-0.0059, -0.0139, 0.0266, -0.0367, -0.0117, 0.0113, -0.0111]],
grad_fn=<SelectBackward0>)

🔖 Copying certain layers' weights to other layers

In this part, we copy the weights of one branch of a model to the other branch.

Below is the definition of a model architecture with two branches:

from collections import OrderedDict

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.net1 = nn.Sequential(OrderedDict([
            ('conv1', nn.Conv2d(256, 128, 3)),
            ('relu1', nn.ReLU()),
            ('conv2', nn.Conv2d(128, 10, 3)),
            ('relu2', nn.ReLU()),
        ]))

        self.net2 = nn.Sequential(OrderedDict([
            ('conv1', nn.Conv2d(256, 128, 3)),
            ('relu1', nn.ReLU()),
            ('conv2', nn.Conv2d(128, 10, 3)),
            ('relu2', nn.ReLU()),
        ]))

    def forward(self, x):
        x1 = self.net1(x)
        x2 = self.net2(x)

        return x1, x2

model_1 = MyModel()

Check the model's parameter names:

for name, param in model_1.named_parameters():
    print("name: ", name)

# === output ===
name: net1.conv1.weight
name: net1.conv1.bias
name: net1.conv2.weight
name: net1.conv2.bias
name: net2.conv1.weight
name: net2.conv1.bias
name: net2.conv2.weight
name: net2.conv2.bias

Next, iterate over the first branch net1 and use copy_() to copy its weight parameters to the corresponding layers of the second branch net2. Note that this must be wrapped in with torch.no_grad(): copy_() modifies parameters that require gradients in place, which PyTorch forbids while gradient tracking is active.

with torch.no_grad():
    for i in range(len(model_1.net1)):
        if isinstance(model_1.net1[i], nn.Conv2d):
            print("========= {} =========".format(i))
            print("org:")
            print("net1 weights:", model_1.net1[i].weight[0][0])
            print("net2 weights:", model_1.net2[i].weight[0][0])

            model_1.net2[i].weight.copy_(model_1.net1[i].weight)

            print("=====================")
            print("new:")
            print("net1 weights:", model_1.net1[i].weight[0][0])
            print("net2 weights:", model_1.net2[i].weight[0][0])

Below are the original and updated weight values:

# === output ===
========= 0 =========
org:
net1 weights: tensor([[-0.0167, 0.0080, -0.0142],
[-0.0078, -0.0056, -0.0023],
[ 0.0091, -0.0146, -0.0120]], requires_grad=True)
net2 weights: tensor([[-0.0045, 0.0046, 0.0058],
[-0.0091, 0.0138, -0.0193],
[ 0.0091, 0.0119, -0.0138]], requires_grad=True)
=====================
new:
net1 weights: tensor([[-0.0167, 0.0080, -0.0142],
[-0.0078, -0.0056, -0.0023],
[ 0.0091, -0.0146, -0.0120]], requires_grad=True)
net2 weights: tensor([[-0.0167, 0.0080, -0.0142],
[-0.0078, -0.0056, -0.0023],
[ 0.0091, -0.0146, -0.0120]], requires_grad=True)
========= 2 =========
org:
net1 weights: tensor([[-0.0119, 0.0113, -0.0167],
[-0.0195, 0.0121, -0.0249],
[-0.0240, 0.0281, -0.0243]], requires_grad=True)
net2 weights: tensor([[ 0.0269, -0.0043, 0.0169],
[ 0.0139, 0.0185, -0.0151],
[ 0.0261, -0.0009, -0.0192]], requires_grad=True)
=====================
new:
net1 weights: tensor([[-0.0119, 0.0113, -0.0167],
[-0.0195, 0.0121, -0.0249],
[-0.0240, 0.0281, -0.0243]], requires_grad=True)
net2 weights: tensor([[-0.0119, 0.0113, -0.0167],
[-0.0195, 0.0121, -0.0249],
[-0.0240, 0.0281, -0.0243]], requires_grad=True)
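
When two branches have identical structure, as they do here, the per-layer loop can also be replaced by copying the entire submodule's state_dict in one call, which transfers every parameter and buffer (weights and biases alike):

# One-call alternative: load net1's full state_dict into net2,
# assuming both branches share the same structure.
model_1.net2.load_state_dict(model_1.net1.state_dict())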
