(ML) Fixing RuntimeError: { NSLocalizedDescription = "The size of the output layer 'xxxx' in the neural network does not match the number of classes in the classifier."; }

Jun 11, 2023

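I followed an article to learn how to use PyTorch Lightning and CoreML to accelerate computation on the Apple Neural Engine. However, I ran into several errors while working through it.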

Using PyTorch Lightning and CoreML to Accelerate Computation on the Apple Neural Engine

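Since Apple's success with its in-house processors, most media coverage has focused on their outstanding media-production performance and on benchmark results in various scenarios. It has largely overlooked the special 16-core Neural Engine in Apple processors. For AI…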

edge.aif.tw

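The article's goal is to train a 400-species bird classifier with PyTorch Lightning and run inference on a MacBook and an iPhone. The whole training process also runs on a MacBook Air M1. However, after setting up the environment and training the model as described, I hit the following errors on the way to the final inference step.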

Error message 1

CUDA error (environment setup error)

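If you follow the article and install pytorch first and then pytorch_lightning, installing pytorch_lightning overwrites the pytorch, CUDA, and cuDNN versions you already installed, so they no longer match. The fix is to install pytorch_lightning first, then install the specific pytorch, CUDA, and cuDNN versions you need.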
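After reinstalling in that order, it helps to confirm which versions actually ended up installed. Below is a minimal sketch that uses only the standard library (importlib.metadata). The package list is my assumption about what the tutorial's environment contains, so adjust it to yours. It only reports Python package versions. For the CUDA and cuDNN builds PyTorch was compiled against, check torch.version.cuda and torch.backends.cudnn.version() from inside Python instead.

```python
from importlib import metadata

# Assumed list of packages whose versions the install order can disturb;
# these are the distribution names as they appear in site-packages.
PACKAGES = ("torch", "torchvision", "pytorch_lightning", "torchmetrics")


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=PACKAGES):
    """Map each package name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name:>18}: {version or 'not installed'}")
```

Run it once after each install step. If the torch version changes after pytorch_lightning is installed, you are seeing the problem described above.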

PyTorch

We'd prefer you install the latest version, but old binaries and installation instructions are provided below for your…

pytorch.org

Error message 2

KeyError: 'class index'

In the downloaded bird_data folder, rename the class id column in birds.csv to class index, and the error goes away.
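The rename can also be scripted instead of edited by hand. This is a minimal sketch using the csv module. The helper names are hypothetical, and the default column names are the two given above. Only the header row changes; data rows are copied as-is.

```python
import csv
import io


def rename_csv_column(src, dst, old="class id", new="class index"):
    """Copy CSV text from file object `src` to `dst`, renaming header `old` to `new`."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    if old not in header:
        raise KeyError(f"column {old!r} not found in header {header}")
    writer.writerow([new if col == old else col for col in header])
    writer.writerows(reader)  # data rows are copied unchanged


def rename_in_place(path, old="class id", new="class index"):
    """Rewrite the CSV at `path` with the renamed header (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as f:
        src = io.StringIO(f.read())
    out = io.StringIO()
    rename_csv_column(src, out, old, new)
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(out.getvalue())
```

Usage: rename_in_place("data/bird_data/birds.csv"). If pandas is already installed, df.rename(columns={"class id": "class index"}) followed by df.to_csv(path, index=False) does the same job.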

Error message 3

TypeError: Accuracy.__new__() missing 1 required positional argument: 'task'

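Open py_utils/module.py and change Accuracy() to Accuracy(num_classes=class_num, task='multiclass'), since newer torchmetrics releases require the task argument. Then close all open tabs and restart Jupyter Notebook so the edited module is reloaded.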

        # Before: self.train_acc = Accuracy()
        self.train_acc = Accuracy(num_classes=class_num, task='multiclass')

        self.val_loss = nn.CrossEntropyLoss()
        # Before: self.val_acc = Accuracy()
        self.val_acc = Accuracy(num_classes=class_num, task='multiclass')

Error message 4 (the focus of this post)

RuntimeError: { NSLocalizedDescription = "The size of the output layer 'var_324' in the neural network does not match the number of classes in the classifier."; }

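Inference in PyTorch works fine. However, after converting the model to CoreML and running inference on the Mac, I got an error saying that the size of the output layer 'var_324' in the neural network does not match the number of classes in the classifier. So I started digging for the cause.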

Testing the model

First, add the model to any Xcode project.

Click Preview, then drag any photo into it.

An "Unexpected Error" appears.

Normally, you should see a result like the following:

Visualizing the model

Visualize the model with Netron

Netron


netron.app

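Back to the error message: the size of the output layer 'var_324' in the neural network does not match the number of classes in the classifier. My reaction was that the model's output size was 400 when I trained it in PyTorch, so why was there still a class-count mismatch?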

Visualizing an official model

For comparison, download any official image-classification model and visualize it, e.g. Resnet50, MobileNetV2, or SqueezeNet.

Models - Machine Learning - Apple Developer

Build intelligence into your apps using machine learning models from the research community designed for Core ML.

developer.apple.com

Resnet50.mlmodel

Trying to change the model's input and output names

Open pt2ct.py and modify it:

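import torch
import coremltools as ct
from py_utils.module import Model

if __name__ == '__main__':
    model = Model.load_from_checkpoint(
        "best_model/birds-epoch=00-val_loss=0.64.ckpt.ckpt")
    model.eval()  # inference mode (disables dropout, freezes batch-norm stats)
    X = torch.rand(1, 3, 112, 112)  # example input used for tracing

    # Name the input "image", as in Apple's official models; scale pixels to [0, 1]
    image_input = ct.ImageType(name="image",
                               shape=X.shape,
                               scale=1/255.0)

    # to_torchscript() traces the model and saves it; torch.jit.load reads it back.
    # (Calling torch.jit.trace on an already-traced ScriptModule is a no-op.)
    model.to_torchscript(file_path="best_model/model_trace.pt", method='trace',
                         example_inputs=X)
    traced_model = torch.jit.load('best_model/model_trace.pt')

    # Name the output "classLabelProbs", again matching the official models
    output = ct.TensorType(name="classLabelProbs")

    mlmodel = ct.convert(
        traced_model,
        inputs=[image_input],
        outputs=[output],
        classifier_config=ct.ClassifierConfig('data/bird_data/labelname.txt'),  # one label per line
        compute_units=ct.ComputeUnit.ALL,
        # A .mlmodel file needs the neuralnetwork format; coremltools 7+ defaults to mlprogram
        convert_to="neuralnetwork",
    )

    mlmodel.save("best_model/bird3.mlmodel")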

Visualizing the model again

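The input and output names now match the official model, but the same error still appears. The only change is that the layer named in the error is now classLabelProbs. The one remaining difference from the official model is that mine has no softmax layer.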

Solving the problem

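I finally found the problem in Xcode, under Class Labels on the General tab of bird3.mlmodel. Class Labels shows the labels built into the model (here, the ones read from labelname.txt through ClassifierConfig). It listed 515 classes, whereas the tutorial says there are 400.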

I then counted the classes in the dataset downloaded from Kaggle and found that it really does contain 515 bird species:

array = []

with open('data/bird_data/labelname.txt', 'r') as file:
    for line in file:
        line = line.strip()  # strip the trailing newline
        array.append(line)

print(len(array)) # 515
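The count can also be cross-checked against the CSV. After the rename from Error message 2, every row of birds.csv carries a class index, so the number of distinct values should match the number of lines in labelname.txt, provided both files come from the same dataset version. This is a minimal sketch with the csv module. The function name is hypothetical, and it assumes the column is named 'class index'.

```python
import csv
import io


def count_classes_in_csv(f, column="class index"):
    """Return the number of distinct values in `column` of a CSV file object.

    Assumes the header was already renamed from "class id" to "class index".
    Rows from every split (train/valid/test) are counted together.
    """
    reader = csv.DictReader(f)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not in {reader.fieldnames}")
    return len({row[column] for row in reader})
```

For example, open data/bird_data/birds.csv with newline='' and pass the file object in. The result can then be compared with the len(array) printed above.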

In the end, I changed 400 to 515 in trainer_lightning.py:

if __name__ == '__main__':

    data_paths = ['best_model', 'data']

    for path in data_paths:
        if not os.path.exists(path):
            os.mkdir(path)

    data_df = pd.read_csv('data/bird_data/birds.csv')

    data = DataModule(128, data_df)
    # change 400 to 515 (the number of classes)
    model = Model(515)

I also added a softmax to forward() in module.py before training:

    def forward(self, x):
        out = self.model(x)
        # add softmax
        out = F.softmax(out, dim=1)
        return out
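One thing worth checking about this change: module.py uses nn.CrossEntropyLoss (see val_loss in the Error message 3 snippet), and that loss already applies log-softmax to its input. If training_step computes the loss on forward()'s output (typical in Lightning, though module.py's training_step is not shown here), softmax is applied twice. The plain-Python sketch below reproduces the per-sample formula of nn.CrossEntropyLoss without PyTorch to show the effect. The function names are my own, for illustration. On already-softmaxed outputs the loss is larger, and with 515 classes it can never drop below log(1 + 514/e) ≈ 5.25.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Per-sample nn.CrossEntropyLoss math: -log(softmax(logits)[target])."""
    return -math.log(softmax(logits)[target])


def double_softmax_floor(num_classes):
    """Lowest loss reachable when the loss receives softmax outputs:
    the best case is probability 1 on the target and 0 everywhere else."""
    best = [1.0] + [0.0] * (num_classes - 1)
    return cross_entropy(best, 0)


logits = [4.0, 1.0, 0.0]
print(cross_entropy(logits, 0))           # loss on raw logits
print(cross_entropy(softmax(logits), 0))  # loss when forward() already applied softmax
print(double_softmax_floor(515))          # floor with 515 classes
```

A common alternative is to keep forward() returning logits for training and add the softmax only in the module that gets traced for Core ML conversion.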

Converting the trained model to a CoreML model and visualizing it

The innerProduct layer's dimensions are now 515 x 512.
There is one small catch, though: after conversion, PyTorch's softmax becomes a softmaxND layer.

Testing in Xcode

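That finally solved the problem. The root cause was that the number of classes in the training data did not match the model's final output dimension. When converting to CoreML, you must also supply a label for every class for the conversion to succeed.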
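To catch this mismatch at conversion time instead of in Xcode, the label count can be compared with the network's output size before calling ct.convert. This is a minimal sketch with hypothetical helper names: parse_labels reads lines in the labelname.txt format (one label per line), and check_label_count raises if the two counts differ.

```python
def parse_labels(lines):
    """Return one class label per non-blank line (e.g. from labelname.txt)."""
    return [line.strip() for line in lines if line.strip()]


def check_label_count(labels, num_outputs):
    """Raise if the classifier labels and the network's output size disagree.

    This is the mismatch Core ML reports at prediction time as
    "The size of the output layer ... does not match the number of classes".
    """
    if len(labels) != num_outputs:
        raise ValueError(
            f"{len(labels)} class labels but the final layer outputs "
            f"{num_outputs} values; retrain with Model({len(labels)}) "
            f"or fix the label file before converting"
        )
```

In pt2ct.py, one way to get the output size without relying on layer names is to run the traced model once: num_outputs = traced_model(X).shape[-1]. Then pass that to check_label_count along with the labels read from labelname.txt. ct.ClassifierConfig also accepts the label list directly instead of a file path, so the same list can be reused for the conversion.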

Similar Error Message

(PyTorch) A similar problem when converting YOLOv5 to CoreML

CoreML model error: The size of the output layer '740' in the neural network does not match the…

I've exported the yolov5s.pt model to CoreML creating yolov5s.mlmodel. I've followed the instructions reported in #251…

github.com

(PyTorch) A similar problem when converting YOLOv7 to CoreML

model_name.mlmodel not running on ios. · Issue #1359 · WongKinYiu/yolov7

I am trying to create an object detection model using Yolov7. I followed the steps and converted my model to .mlmodel…