(ML) Convert YOLOv5 to Core ML and add the Non-Maximum Suppression (NMS) layer

YEN HUNG CHENG
32 min read · Aug 20, 2023


Photo by Rahul Mishra on Unsplash

Goal: learn how to take a YOLOv5 model converted to Core ML format and add a decoder and a Non-Maximum Suppression (NMS) layer to it.

YOLOv5 has been around for a while, but converting it to Core ML by following the tutorials in many articles, or the official GitHub instructions, can still run into quite a few problems. Notably, the official guide does not cover the post-processing step at all. In this article we go through these problems one by one and provide solutions, so that you can get through the whole conversion successfully.

YOLOv5

First, follow the official tutorial to install the repo and run the conversion:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt # install

Since our main task is to convert the official pre-trained model to Core ML, before running pip install -r requirements.txt you need to uncomment a few lines in requirements.txt so that all the packages required for export get installed:

# Export ----------------------------------------------------------------------
coremltools # CoreML export
onnx>=1.10.0 # ONNX export
onnx-simplifier>=0.4.1 # ONNX simplifier
# nvidia-pyindex # TensorRT export
# nvidia-tensorrt # TensorRT export
scikit-learn<=1.1.2 # CoreML quantization
# tensorflow>=2.4.0 # TF exports (-cpu, -aarch64, -macos)
# tensorflowjs>=3.9.0 # TF.js export
# openvino-dev>=2023.0 # OpenVINO export
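Alternatively, instead of editing requirements.txt, you can install just the export dependencies directly (same versions as in the file above):

pip install coremltools "onnx>=1.10.0" "onnx-simplifier>=0.4.1" "scikit-learn<=1.1.2"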

Next, run detect.py; afterwards you will find the latest official yolov5s.pt pre-trained model in your directory:

python detect.py --source data/images/zidane.jpg
# results saved to runs/detect/exp

Export the pre-trained model:

python export.py --weights yolov5s.pt --include coreml

After the conversion succeeds you will find yolov5s.mlmodel in the directory. Load yolov5s.mlmodel into Netron to visualize it.

YOLOv5s

Compare it with the YOLOv3-Tiny model from the official Core ML Models page:

YOLOv3-Tiny

Notice that the final stages of YOLOv3-Tiny include both the post-processing (decode) and the NMS layer.
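The difference also shows up programmatically: a model with NMS attached is a Core ML pipeline, while the plain yolov5s export is a single neural network. A quick check with coremltools (a sketch assuming both .mlmodel files sit in the working directory; the YOLOv3Tiny.mlmodel filename is from Apple's model download):

import coremltools as ct

for name in ["yolov5s.mlmodel", "YOLOv3Tiny.mlmodel"]:
    spec = ct.models.MLModel(name).get_spec()
    # "pipeline" => decode + NMS stages included; "neuralNetwork" => raw network only
    print(name, spec.WhichOneof("Type"))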

Load the models into Xcode to inspect them:

YOLOv5s
YOLOv3-Tiny

YOLOv3-Tiny clearly has the additional Class Labels and Preview tabs.

Preview

Next, we will add the post-processing and the NMS layer to our own model.

Convert Yolov5 to CoreML. Also add a decode layer.

The following steps are based on the article above.

# If you use a custom trained model, change the class labels to your own classes. You can specify as many classes as you like.
classLabels = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]
numberOfClassLabels = len(classLabels)
outputSize = numberOfClassLabels + 5

Define the Decode function

# Just run to define the decode function
import torch

# classLabels = [f"label{i}" for i in range(80)]
numberOfClassLabels = len(classLabels)
outputSize = numberOfClassLabels + 5

# Attention: Some models are reversed!
reverseModel = False

strides = [8, 16, 32]
if reverseModel:
    strides.reverse()
featureMapDimensions = [640 // stride for stride in strides]

# Take these from the <model>.yaml in yolov5
anchors = ([10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119],
           [116, 90, 156, 198, 373, 326])
if reverseModel:
    anchors = anchors[::-1]

anchorGrid = torch.tensor(anchors).float().view(3, -1, 1, 1, 2)

def make_grid(nx, ny):
    yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
    return torch.stack((xv, yv), 2).view((ny, nx, 2)).float()

def addExportLayerToCoreml(builder):
    '''
    Adds the yolov5 export layer to the coreml model
    '''
    outputNames = [output.name for output in builder.spec.description.output]

    for i, outputName in enumerate(outputNames):
        # formulas: https://github.com/ultralytics/yolov5/issues/471
        builder.add_activation(name=f"sigmoid_{outputName}", non_linearity="SIGMOID",
                               input_name=outputName, output_name=f"{outputName}_sigmoid")

        ### Coordinates calculation ###
        # input (1, 3, nC, nC, 85), output (1, 3, nC, nC, 2) -> nC = 640 / strides[i]
        builder.add_slice(name=f"slice_coordinates_xy_{outputName}", input_name=f"{outputName}_sigmoid",
                          output_name=f"{outputName}_sliced_coordinates_xy", axis="width", start_index=0, end_index=2)
        # x,y * 2
        builder.add_elementwise(name=f"multiply_xy_by_two_{outputName}", input_names=[
            f"{outputName}_sliced_coordinates_xy"], output_name=f"{outputName}_multiplied_xy_by_two", mode="MULTIPLY", alpha=2)
        # x,y * 2 - 0.5
        builder.add_elementwise(name=f"subtract_0_5_from_xy_{outputName}", input_names=[
            f"{outputName}_multiplied_xy_by_two"], output_name=f"{outputName}_subtracted_0_5_from_xy", mode="ADD", alpha=-0.5)
        grid = make_grid(
            featureMapDimensions[i], featureMapDimensions[i]).numpy()
        # x,y * 2 - 0.5 + grid[i]
        builder.add_bias(name=f"add_grid_from_xy_{outputName}", input_name=f"{outputName}_subtracted_0_5_from_xy",
                         output_name=f"{outputName}_added_grid_xy", b=grid, shape_bias=grid.shape)
        # (x,y * 2 - 0.5 + grid[i]) * stride[i]
        builder.add_elementwise(name=f"multiply_xy_by_stride_{outputName}", input_names=[
            f"{outputName}_added_grid_xy"], output_name=f"{outputName}_calculated_xy", mode="MULTIPLY", alpha=strides[i])

        # input (1, 3, nC, nC, 85), output (1, 3, nC, nC, 2)
        builder.add_slice(name=f"slice_coordinates_wh_{outputName}", input_name=f"{outputName}_sigmoid",
                          output_name=f"{outputName}_sliced_coordinates_wh", axis="width", start_index=2, end_index=4)
        # w,h * 2
        builder.add_elementwise(name=f"multiply_wh_by_two_{outputName}", input_names=[
            f"{outputName}_sliced_coordinates_wh"], output_name=f"{outputName}_multiplied_wh_by_two", mode="MULTIPLY", alpha=2)
        # (w,h * 2) ** 2
        builder.add_unary(name=f"power_wh_{outputName}", input_name=f"{outputName}_multiplied_wh_by_two",
                          output_name=f"{outputName}_power_wh", mode="power", alpha=2)
        # (w,h * 2) ** 2 * anchor_grid[i]
        anchor = anchorGrid[i].expand(-1, featureMapDimensions[i],
                                      featureMapDimensions[i], -1).numpy()
        builder.add_load_constant_nd(
            name=f"anchors_{outputName}", output_name=f"{outputName}_anchors", constant_value=anchor, shape=anchor.shape)
        builder.add_elementwise(name=f"multiply_wh_with_achors_{outputName}", input_names=[
            f"{outputName}_power_wh", f"{outputName}_anchors"], output_name=f"{outputName}_calculated_wh", mode="MULTIPLY")

        builder.add_concat_nd(name=f"concat_coordinates_{outputName}", input_names=[
            f"{outputName}_calculated_xy", f"{outputName}_calculated_wh"], output_name=f"{outputName}_raw_coordinates", axis=-1)
        builder.add_scale(name=f"normalize_coordinates_{outputName}", input_name=f"{outputName}_raw_coordinates",
                          output_name=f"{outputName}_raw_normalized_coordinates", W=torch.tensor([1 / 640]).numpy(), b=0, has_bias=False)

        ### Confidence calculation ###
        builder.add_slice(name=f"slice_object_confidence_{outputName}", input_name=f"{outputName}_sigmoid",
                          output_name=f"{outputName}_object_confidence", axis="width", start_index=4, end_index=5)
        builder.add_slice(name=f"slice_label_confidence_{outputName}", input_name=f"{outputName}_sigmoid",
                          output_name=f"{outputName}_label_confidence", axis="width", start_index=5, end_index=0)
        # confidence = object_confidence * label_confidence
        builder.add_multiply_broadcastable(name=f"multiply_object_label_confidence_{outputName}", input_names=[
            f"{outputName}_label_confidence", f"{outputName}_object_confidence"], output_name=f"{outputName}_raw_confidence")

        # input: (1, 3, nC, nC, 85), output: (3 * nC^2, 85)
        builder.add_flatten_to_2d(
            name=f"flatten_confidence_{outputName}", input_name=f"{outputName}_raw_confidence", output_name=f"{outputName}_flatten_raw_confidence", axis=-1)
        builder.add_flatten_to_2d(
            name=f"flatten_coordinates_{outputName}", input_name=f"{outputName}_raw_normalized_coordinates", output_name=f"{outputName}_flatten_raw_coordinates", axis=-1)

    builder.add_concat_nd(name="concat_confidence", input_names=[
        f"{outputName}_flatten_raw_confidence" for outputName in outputNames], output_name="raw_confidence", axis=-2)
    builder.add_concat_nd(name="concat_coordinates", input_names=[
        f"{outputName}_flatten_raw_coordinates" for outputName in outputNames], output_name="raw_coordinates", axis=-2)

    builder.set_output(output_names=["raw_confidence", "raw_coordinates"], output_dims=[
        (25200, numberOfClassLabels), (25200, 4)])
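The layers above simply reproduce the YOLOv5 decode formulas (see the GitHub issue linked in the code) with Core ML ops. As a sanity check, here is the same math written directly in PyTorch; this is a minimal reference sketch, where raw stands for one raw head output of shape (1, 3, nC, nC, 85):

import torch

def decode_reference(raw, grid, anchor_grid, stride):
    # raw:         (1, 3, nC, nC, 85) raw head output
    # grid:        (nC, nC, 2) cell offsets from make_grid
    # anchor_grid: (3, 1, 1, 2) anchors for this head, i.e. anchorGrid[i]
    # stride:      8, 16 or 32
    y = raw.sigmoid()
    xy = (y[..., 0:2] * 2 - 0.5 + grid) * stride  # box center in pixels
    wh = (y[..., 2:4] * 2) ** 2 * anchor_grid     # box size in pixels
    confidence = y[..., 5:] * y[..., 4:5]         # class scores * objectness
    return xy / 640, wh / 640, confidence         # normalized, like add_scale above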

Define the NMS function

# Just run to define the NMS function

def createNmsModelSpec(nnSpec):
    '''
    Create a coreml model with nms to filter the results of the model
    '''
    nmsSpec = ct.proto.Model_pb2.Model()
    nmsSpec.specificationVersion = 4

    # Define input and outputs of the model
    for i in range(2):
        nnOutput = nnSpec.description.output[i].SerializeToString()

        nmsSpec.description.input.add()
        nmsSpec.description.input[i].ParseFromString(nnOutput)

        nmsSpec.description.output.add()
        nmsSpec.description.output[i].ParseFromString(nnOutput)

    nmsSpec.description.output[0].name = "confidence"
    nmsSpec.description.output[1].name = "coordinates"

    # Define output shape of the model
    outputSizes = [numberOfClassLabels, 4]
    for i in range(len(outputSizes)):
        maType = nmsSpec.description.output[i].type.multiArrayType
        # First dimension of both outputs is the number of boxes, which should be flexible
        maType.shapeRange.sizeRanges.add()
        maType.shapeRange.sizeRanges[0].lowerBound = 0
        maType.shapeRange.sizeRanges[0].upperBound = -1
        # Second dimension is fixed: for "confidence" it is the number of classes, for "coordinates" it is position (x, y) and size (w, h)
        maType.shapeRange.sizeRanges.add()
        maType.shapeRange.sizeRanges[1].lowerBound = outputSizes[i]
        maType.shapeRange.sizeRanges[1].upperBound = outputSizes[i]
        del maType.shape[:]

    # Define the model type non maximum suppression
    nms = nmsSpec.nonMaximumSuppression
    nms.confidenceInputFeatureName = "raw_confidence"
    nms.coordinatesInputFeatureName = "raw_coordinates"
    nms.confidenceOutputFeatureName = "confidence"
    nms.coordinatesOutputFeatureName = "coordinates"
    nms.iouThresholdInputFeatureName = "iouThreshold"
    nms.confidenceThresholdInputFeatureName = "confidenceThreshold"
    # Some good default values for the two additional inputs, can be overwritten when using the model
    nms.iouThreshold = 0.4
    nms.confidenceThreshold = 0.25
    nms.stringClassLabels.vector.extend(classLabels)

    return nmsSpec
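Since the NMS model declares iouThreshold and confidenceThreshold as extra inputs, the defaults above (0.4 and 0.25) can be overridden per prediction. A minimal usage sketch for the final pipeline model (prediction with coremltools only works on macOS; the file and image names are placeholders):

import coremltools as ct
from PIL import Image

model = ct.models.MLModel("yolo5s.mlmodel")
image = Image.open("zidane.jpg").resize((640, 640))

results = model.predict({
    "image": image,
    "iouThreshold": 0.45,        # override the 0.4 default
    "confidenceThreshold": 0.5,  # override the 0.25 default
})
print(results["coordinates"])  # boxes as [x, y, width, height], relative to image size
print(results["confidence"])   # per-class confidence for each surviving box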

Combine the decode layers with the NMS model

# Just run to combine the model with the decode layers and the NMS model.
def combineModelsAndExport(builderSpec, nmsSpec, fileName, quantize=False):
    '''
    Combines the coreml model with export logic and the nms to one final model.
    Optionally saves quantized variants (16, 8 bit); quantization works only on macOS.
    '''
    try:
        print('Combine CoreML model with NMS and export model')
        # Combine models to a single one.
        # The image shape here is only a placeholder; it is overwritten below
        # with the real image input parsed from builderSpec.
        pipeline = ct.models.pipeline.Pipeline(
            input_features=[("image", ct.models.datatypes.Array(3, 640, 640)),
                            ("iouThreshold", ct.models.datatypes.Double()),
                            ("confidenceThreshold", ct.models.datatypes.Double())],
            output_features=["confidence", "coordinates"])

        # Required version (>= ios13) in order for nms to work
        pipeline.spec.specificationVersion = 4

        pipeline.add_model(builderSpec)
        pipeline.add_model(nmsSpec)

        pipeline.spec.description.input[0].ParseFromString(
            builderSpec.description.input[0].SerializeToString())
        pipeline.spec.description.output[0].ParseFromString(
            nmsSpec.description.output[0].SerializeToString())
        pipeline.spec.description.output[1].ParseFromString(
            nmsSpec.description.output[1].SerializeToString())

        # Metadata for the model
        pipeline.spec.description.input[1].shortDescription = "(optional) IOU Threshold override (Default: 0.4)"
        pipeline.spec.description.input[2].shortDescription = "(optional) Confidence Threshold override (Default: 0.25)"
        pipeline.spec.description.output[0].shortDescription = u"Boxes \xd7 Class confidence"
        pipeline.spec.description.output[1].shortDescription = u"Boxes \xd7 [x, y, width, height] (relative to image size)"
        pipeline.spec.description.metadata.versionString = "1.0"
        pipeline.spec.description.metadata.shortDescription = "yolov5"
        pipeline.spec.description.metadata.author = "Leon De Andrade"
        pipeline.spec.description.metadata.license = ""

        model = ct.models.MLModel(pipeline.spec)
        model.save(fileName)

        if quantize:
            fileName16 = fileName.replace(".mlmodel", "_16.mlmodel")
            modelFp16 = ct.models.neural_network.quantization_utils.quantize_weights(
                model, nbits=16)
            modelFp16.save(fileName16)

            fileName8 = fileName.replace(".mlmodel", "_8.mlmodel")
            modelFp8 = ct.models.neural_network.quantization_utils.quantize_weights(
                model, nbits=8)
            modelFp8.save(fileName8)

        print(f'CoreML export success, saved as {fileName}')
    except Exception as e:
        print(f'CoreML export failure: {e}')

Load the CoreML model

# Specify the path to your converted model, saved in the same folder as your weight file.
import coremltools as ct
mlmodel = ct.models.MLModel("yolov5s.mlmodel")
# Just run to get the mlmodel spec.
spec = mlmodel.get_spec()
builder = ct.models.neural_network.NeuralNetworkBuilder(spec=spec)
spec.description  # in a notebook, this displays the model's inputs and outputs
# Run the functions to add the decode layers and NMS to the model.
addExportLayerToCoreml(builder)
nmsSpec = createNmsModelSpec(builder.spec)
combineModelsAndExport(builder.spec, nmsSpec, "yolo5s.mlmodel")  # The model will be saved to this path.

Error Message

---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-56-87c275ff5e94> in <cell line: 2>()
1 # run the functions to add decode layer and NMS to the model.
----> 2 addExportLayerToCoreml(builder)
3 nmsSpec = createNmsModelSpec(builder.spec)
4 combineModelsAndExport(builder.spec, nmsSpec, f"yolo5s.mlmodel") # The model will be saved in this path.

1 frames
/usr/local/lib/python3.10/dist-packages/coremltools/models/neural_network/builder.py in set_output(self, output_names, output_dims)
522 spec = self.spec
523 for idx, dim in enumerate(output_dims):
--> 524 spec.description.output[idx].type.multiArrayType.ClearField("shape")
525 spec.description.output[idx].type.multiArrayType.shape.extend(dim)
526 spec.description.output[

IndexError: list index (1) out of range

Several readers of that article hit the same problem in the comments; the author has not responded yet. The cause is that the current official export.py already applies the Detect head during export, so the exported model has a single, already-decoded (1, 25200, 85) output instead of the three raw feature-map outputs that addExportLayerToCoreml expects. set_output then fails with the IndexError because it tries to address a second output that does not exist.
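You can confirm this by inspecting the outputs of the exported spec (a quick check, assuming yolov5s.mlmodel is the file exported above):

import coremltools as ct

spec = ct.models.MLModel("yolov5s.mlmodel").get_spec()
for output in spec.description.output:
    # The default export prints a single decoded output rather than
    # the three raw feature-map outputs the decode function expects.
    print(output.name, output.type.multiArrayType.shape)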

Next, download the different versions of yolov5s.pt (v4.0, v5.0, v6.0) from the yolov5 releases page.

Then try the conversion again using the repository below:

git clone https://github.com/pytholic/Yolov5Export.git
cd Yolov5Export/yolov5_export/yolov5

python export.py --weights yolov5s.pt --train --include "coreml"

The --train flag exports the model in training mode, skipping the final Detect decoding, so the three raw feature-map outputs are preserved. Now run the conversion code again:


# Specify the path to your converted model, saved in the same folder as your weight file.
import coremltools as ct
mlmodel = ct.models.MLModel("yolov5s.mlmodel")
# Just run to get the mlmodel spec.
spec = mlmodel.get_spec()
builder = ct.models.neural_network.NeuralNetworkBuilder(spec=spec)
spec.description  # in a notebook, this displays the model's inputs and outputs
# Run the functions to add the decode layers and NMS to the model.
addExportLayerToCoreml(builder)
nmsSpec = createNmsModelSpec(builder.spec)
combineModelsAndExport(builder.spec, nmsSpec, "yolo5s.mlmodel")  # The model will be saved to this path.

This time you will find that the model has been converted successfully.

Load the model into Netron to visualize it.

Put the mlmodel into Xcode again.

At this point the model includes the post-processing and the NMS layer.

Next, let's try modifying the original export.py.

https://github.com/ultralytics/yolov5/files/10276955/export.py.txt

Download the txt file above, overwrite the original export.py with it, then run the export again:

python export.py --weights yolov5s.pt --include "coreml"

Load the model into Netron to visualize it.

Put the mlmodel into Xcode again.

So far I have shared two different ways to add the post-processing and the NMS layer to the model. Their visualized architectures differ slightly, but both run inference correctly in the Xcode Preview. I tried other approaches as well; these two are the ones I managed to convert successfully and can share here. This step is essential if you later want to convert models trained on your own data.

Finally, here is the visualization of the Core ML model provided officially by YOLOv5; its architecture differs from both of the converted models above.
