YOLOv9: 1 Channel Training
Train YOLO on One Channel
This step-by-step guide applies to the YOLO models shipped with Ultralytics, including YOLOv8 and YOLOv9. Let’s get started!
If you're working with grayscale images, the model doesn't need 3 input channels. By default, YOLO doesn’t support 1-channel input for training, so let’s update the code. We’ll be working directly in the Ultralytics source, so make sure to clone the GitHub repository first:
git clone https://github.com/ultralytics/ultralytics.git
One Channel Training
First, modify the load_image() function in ultralytics/data/base.py:
def load_image(self, i, rect_mode=True):
    """Loads 1 image from dataset index 'i', returns (im, resized hw)."""
    im, f, fn = self.ims[i], self.im_files[i], self.npy_files[i]
    if im is None:  # not cached in RAM
        if fn.exists():  # load npy
            try:
                im = np.load(fn)
            except Exception as e:
                LOGGER.warning(f"{self.prefix}WARNING ⚠️ Removing corrupt *.npy image file {fn} due to: {e}")
                Path(fn).unlink(missing_ok=True)
                # im = cv2.imread(f)  # Replace with the line below
                im = cv2.imread(f, cv2.IMREAD_GRAYSCALE)
        else:
            # im = cv2.imread(f)  # Replace with the line below
            im = cv2.imread(f, cv2.IMREAD_GRAYSCALE)
        if im is None:
            raise FileNotFoundError(f"Image Not Found {f}")
        ...
Continue by modifying the code in ultralytics/data/dataset.py, starting with the class ClassificationDataset. Add self.ch to __init__() and modify the __getitem__() function:
def __init__(self, root, args, augment=False, prefix=""):
    super().__init__(root=root)
    self.ch = 1  # Add this line of code
    ...

def __getitem__(self, i):
    """Returns subset of data and targets corresponding to given indices."""
    f, j, fn, im = self.samples[i]
    if self.cache_ram and im is None:
        # im = self.samples[i][3] = cv2.imread(f)  # Replace with the line below
        im = self.samples[i][3] = cv2.imread(f, cv2.IMREAD_GRAYSCALE)
    elif self.cache_disk:
        if not fn.exists():
            # Cache the grayscale image so the .npy file matches the other branches
            np.save(fn.as_posix(), cv2.imread(f, cv2.IMREAD_GRAYSCALE), allow_pickle=False)
        im = np.load(fn)
    else:
        # im = cv2.imread(f)  # Replace with the line below
        im = cv2.imread(f, cv2.IMREAD_GRAYSCALE)
    # The image is already single-channel; cv2.COLOR_BGR2RGB would raise an error on a 2-D array
    im = Image.fromarray(im)
    sample = self.torch_transforms(im)
    return {"img": sample, "cls": j}
Still in ultralytics/data/dataset.py, go to the class YOLODataset and its build_transforms() function. Add one line of code:
def build_transforms(self, hyp=None):
    """Builds and appends transforms to the list."""
    self.augment = False  # Add this line of code (turns off the training augmentations)
    ...
Go to ultralytics/data/augment.py and the class Format. Update the _format_img() function:
def _format_img(self, img):
    """Format the image for YOLO from Numpy array to PyTorch tensor."""
    # Update the lines in this if-statement
    if len(img.shape) < 3:
        img = img.reshape([1, *img.shape])
        img = np.ascontiguousarray(img)
        img = torch.from_numpy(img)
        return img
    img = img.transpose(2, 0, 1)
    img = np.ascontiguousarray(img[::-1] if random.uniform(0, 1) > self.bgr else img)
    img = torch.from_numpy(img)
    return img
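To see what the new branch does to the array shape, here is a minimal sketch that uses only NumPy. The helper name to_chw is hypothetical and not part of Ultralytics; it leaves out the tensor conversion and the random BGR flip and keeps just the shape handling.

```python
import numpy as np


def to_chw(img: np.ndarray) -> np.ndarray:
    """Return a contiguous channels-first array (hypothetical helper, not Ultralytics code)."""
    if img.ndim < 3:
        # Grayscale image of shape (H, W): add a leading channel axis -> (1, H, W)
        img = img.reshape([1, *img.shape])
    else:
        # Color image of shape (H, W, C): move the channel axis first -> (C, H, W)
        img = img.transpose(2, 0, 1)
    return np.ascontiguousarray(img)


gray = np.zeros((480, 640), dtype=np.uint8)      # shape produced by cv2.IMREAD_GRAYSCALE
color = np.zeros((480, 640, 3), dtype=np.uint8)  # shape produced by the default cv2.imread

print(to_chw(gray).shape)   # (1, 480, 640)
print(to_chw(color).shape)  # (3, 480, 640)
```

Without the extra branch, transpose(2, 0, 1) would fail on the 2-D grayscale array because it has no third axis.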
Update YAML
Both the training and model configuration files must include ch: 1.
Training configuration:
# train.yaml
ch: 1 # Add ch: 1
path: /path/to/data
train: train
val: val
names:
  0: hand
Model configuration:
# yolov9c.yaml
nc: 80
ch: 1
backbone:
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]]  # 2
  - [-1, 1, ADown, [256]]  # 3-P3/8
  - [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]]  # 4
  - [-1, 1, ADown, [512]]  # 5-P4/16
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]]  # 6
  - [-1, 1, ADown, [512]]  # 7-P5/32
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]]  # 8
  - [-1, 1, SPPELAN, [512, 256]]  # 9
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]]  # 12
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 1, RepNCSPELAN4, [256, 256, 128, 1]]  # 15 (P3/8-small)
  - [-1, 1, ADown, [256]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P4
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]]  # 18 (P4/16-medium)
  - [-1, 1, ADown, [512]]
  - [[-1, 9], 1, Concat, [1]]  # cat head P5
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]]  # 21 (P5/32-large)
  - [[15, 18, 21], 1, Detect, [nc]]  # DDetect(P3, P4, P5)
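Forgetting ch: 1 in one of the two files is an easy mistake, so a quick check before training may help. The sketch below is standard-library only and deliberately naive: the helper top_level_ch is hypothetical, it looks only for a top-level ch: key rather than parsing YAML properly, and the model text is abbreviated for illustration.

```python
import re


def top_level_ch(yaml_text: str):
    """Return the integer value of a top-level `ch:` key, or None if it is absent."""
    # ^ with re.MULTILINE only matches at the start of a line, so indented keys are ignored
    match = re.search(r"^ch:\s*(\d+)", yaml_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


train_yaml = """\
ch: 1
path: /path/to/data
train: train
val: val
names:
  0: hand
"""

# Abbreviated model configuration, just enough to show the check
model_yaml = """\
nc: 80
ch: 1
backbone:
  - [-1, 1, Conv, [64, 3, 2]]
"""

for name, text in {"train.yaml": train_yaml, "yolov9c.yaml": model_yaml}.items():
    print(name, "ch =", top_level_ch(text))
```

In practice you would read the text from the actual files; a result of None means the key is missing.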
Running Training
Now start the training procedure:
from ultralytics import YOLO
model = YOLO("yolov9c.yaml")
model.train(data="train.yaml", epochs=3)
If training starts without errors, the model summary should show one input channel in the first layer:
from  n  params  module                            arguments
  -1  1     704  ultralytics.nn.modules.conv.Conv  [1, 64, 3, 2]
  -1  1   73984  ultralytics.nn.modules.conv.Conv  [64, 128, 3, 2]
If the arguments of the first Conv layer start with 1, as in [1, x, y, z], the model was built with a single input channel.
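The parameter counts in the summary offer a second sanity check. The sketch below assumes that the Conv module is a bias-free k x k convolution followed by batch normalization with one scale and one shift per output channel; this assumption reproduces the numbers in the log. The function name conv_block_params is hypothetical.

```python
def conv_block_params(c_in: int, c_out: int, k: int) -> int:
    """Parameters of a bias-free k x k convolution followed by batch normalization."""
    conv_weights = c_in * c_out * k * k  # one k x k kernel per (input, output) channel pair
    bn_params = 2 * c_out                # batch-norm scale and shift per output channel
    return conv_weights + bn_params


print(conv_block_params(1, 64, 3))    # 704   (first layer with 1 input channel)
print(conv_block_params(3, 64, 3))    # 1856  (same layer with 3 input channels)
print(conv_block_params(64, 128, 3))  # 73984 (second layer, unaffected by ch)
```

A first-layer count of 1856 instead of 704 would therefore suggest the model is still being built with 3 input channels.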