Enhancing Quality Control for Manufacturing: The Cutting Edge of Computer Vision in Snowflake


When delving into the evolution of technology, it can be astonishing, almost hard to believe, to realize that some of the technologies we use today have a history stretching back several decades. The term “artificial intelligence” was coined at the Dartmouth Conference in 1956. The first monolithic integrated circuit chip was developed by Robert Noyce in 1959. Early concepts of computer vision emerged in the 1960s, and the 1966 MIT Summer Vision Project aimed to develop a system capable of interpreting scenes from images.

Technologies have evolved rapidly since then, and the field of computer vision in particular is marked by significant milestones and breakthroughs, with key stages driven largely by algorithmic advances and the availability of large datasets. As early as 1974 we saw the introduction of optical character recognition (OCR) technology that could recognize text printed in any font or typeface. Geometric and photometric techniques followed for edge detection, corner detection, and texture analysis, and SIFT enabled the identification of key points in images irrespective of scale, rotation, or illumination changes. In 2012, the performance of AlexNet established deep convolutional neural networks (CNNs) as the dominant approach and paved the way for models such as GoogLeNet, EfficientNet, Faster R-CNN, and their peers. Now, in the generative AI era, we have Large Vision Models (LVMs) trained with billions of parameters, pushing the limits of scalability and providing breakthroughs in applications including image recognition, object detection, segmentation, and generative tasks.

Computer Vision for Intelligent Quality Control

In early 2023, Snowflake launched the Manufacturing Data Cloud to help customers simplify their data operations and management and unleash the power of AI to improve supply chain performance, power smart manufacturing, and implement quality control for connected products, building on Industry 4.0 technologies such as IIoT, machine learning, and advanced data analytics. One major challenge in manufacturing operations is delivering the highest quality at every stage of the production or assembly process. There is a strong desire to improve efficiency, accuracy, and consistency in identifying any defects in the production process. By automating quality control, organizations aim to reduce human error, increase the speed of inspections, and ensure that products meet the desired standards.

Thanks to Machine Learning (ML) and Computer Vision (CV) technologies, manufacturers can carry out automated visual inspection, fault detection, segmentation, classification, and prediction at scale in near real time. When tuned and calibrated optimally across parameters such as controlled lighting and camera viewpoints, remarkable productivity improvements can be achieved, with defect detection rates of up to 90% compared to traditional human inspection methods. One architectural choice is to build multimodal models, which fuse and analyze data from multiple modalities such as text, images, video, audio, and sensor data. This provides a holistic understanding of the data, leading to improved accuracy and efficiency in analysis.

In Part 1 of this Computer Vision for Quality Control series, we will see how to apply the power of these technologies in Snowflake and build an end-to-end vision quality control pipeline. The overall system requires careful selection and calibration of multiple components, some of which are listed below:

Industry: Camera vendor strategy, Lighting strategy, Maintenance and Calibration Procedures, PLC Programming needs

Technical: Image stitching, Model building, Image Augmentation, Model retraining, Drift Detection, Inferencing, Automated Cloud to edge model deployment workflow

Solution Overview

For this blog, we will consider a fictional manufacturing company called PCB Dynamics. We will explore how to use training images to train a deep learning framework for defect detection and classification, leveraging Snowpark Container Services (SPCS).

The end-to-end architecture for the Computer Vision based Quality control system is illustrated below in Figure 1.

Figure 1: Quality Control in Snowflake — Reference Architecture

Step 1 — On the far left there are a couple of sources from which IT/OT data is ingested, most commonly through an IoT bridge. An IoT camera captures product images from the assembly line and stores them on the edge device, where they are routed for inferencing; the detection outputs are then sent to Snowflake. Sometimes the images themselves need to be stored in the cloud, in which case they can be ingested into Snowflake, predominantly for exploration, retraining, or other needs.

Snowflake supports unstructured data ingestion, storage, and processing. Hence we can securely access data files located in cloud storage (a Snowflake stage), create file URLs, view the metadata catalog through directory tables, and process the images.
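As a quick illustration, the landing zone for these images could be an internal stage with a directory table enabled. The sketch below (the stage name is hypothetical) lists the ingested files and builds scoped file URLs for downstream processing:

-- Internal stage with a directory table for incoming product images
CREATE STAGE pcb_images
  DIRECTORY = (ENABLE = TRUE)
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

-- Browse the metadata catalog and build scoped file URLs
SELECT RELATIVE_PATH,
       SIZE,
       LAST_MODIFIED,
       BUILD_SCOPED_FILE_URL(@pcb_images, RELATIVE_PATH) AS file_url
FROM DIRECTORY(@pcb_images);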

Step 2 — PCB Dynamics has ingested its data into Snowflake from an edge IoT camera to carry out data exploration. For training the quality control system, PCB Dynamics has chosen a supervised approach for defect detection and classification, and has therefore annotated the defects in the training images using an annotation tool.

In this post, we leverage PCB data along with annotations from this git repository for training and inference: https://github.com/Charmve/Surface-Defect-Detection/tree/master/DeepPCB

Figure 2: A defect-free product (left) and a defective product (right) with annotated classes

Model Choice

Step 3 — Computer vision is an umbrella term covering several different tasks, such as segmentation, classification, object detection, localization, denoising, reconstruction, and estimation. We have chosen to approach the given problem of detecting and localizing defects as an object detection task. One of the most popular object detection models is YOLO; there are other choices one can adopt depending on the desired accuracy.

YOLO — You Only Look Once

One reason for choosing YOLO is that it frames object detection as a regression problem rather than a classification problem: a single neural network predicts bounding boxes and class probabilities in a single evaluation. This unified architecture makes it remarkably fast. There are several versions of YOLO; we chose YOLOv5s, which can be downloaded from the PyTorch Hub. YOLO's streamlined design also makes it suitable for various applications and easily adaptable to different hardware platforms, from edge devices to cloud APIs. We recommend reviewing the Enterprise Licensing information on the Ultralytics website before commercial use.

Instance segmentation goes beyond object detection by identifying and separating individual objects in an image. The result of an instance segmentation model is a collection of masks or contours that outline each object in the image, along with class labels and confidence scores for each object. Instance segmentation is valuable when you require not only the location of objects in an image but also their precise shape.

I will refrain from delving into the architecture, as it has already been extensively discussed in various papers, and recommend reading those if you are interested. YOLO and its iterations are typically regarded as less accurate than two-stage detectors in terms of raw precision. Where YOLO truly excels, however, is in its remarkable speed, surpassing its counterparts by a significant margin. This speed advantage makes YOLO and its variants highly appealing for real-time applications or situations where speed is of utmost importance.

Metrics for Classification and Detection

The metrics used in visual defect detection and classification often originate from the field of machine learning, namely precision, recall, and F-score. Other measures can be derived from a confusion matrix but are seldom used for this task. Some of the criteria used for benchmarking are:

  • Accuracy
  • Processing speed (frames per second in a video)
  • Handling of multiple classes
  • Elimination of duplicate detections
  • IoU (overlap of ground truth vs. prediction)
  • Confidence of predictions
  • Localization quality (especially for small objects)
  • Mean Average Precision (mAP)
  • F-score
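Several of these criteria rest on IoU; mAP, for instance, is computed at one or more IoU thresholds. Below is a minimal sketch of how IoU is calculated for two axis-aligned boxes in (x1, y1, x2, y2) format:

def iou(box_a, box_b):
    """Intersection over Union for two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection is commonly counted as a true positive when its IoU with
# the ground-truth box meets a threshold such as 0.5.
print(iou((100, 100, 200, 200), (120, 110, 210, 190)))  # ~0.59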

Data Preparation

Raw dataset analysis

First, the original data is analyzed: the images are grayscale. They are cropped into many 640 x 640 sub-images and aligned through template matching techniques, and a threshold is carefully selected for binarization.
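A minimal sketch of this preprocessing with OpenCV (the file names and threshold value are illustrative, not from the original pipeline):

import cv2

# Align a captured image to its defect-free template via template matching
template = cv2.imread("template.jpg", cv2.IMREAD_GRAYSCALE)
test = cv2.imread("test.jpg", cv2.IMREAD_GRAYSCALE)

res = cv2.matchTemplate(test, template, cv2.TM_CCOEFF_NORMED)
_, _, _, top_left = cv2.minMaxLoc(res)  # best-match corner for TM_CCOEFF_NORMED
h, w = template.shape
aligned = test[top_left[1]:top_left[1] + h, top_left[0]:top_left[0] + w]

# Binarize with a carefully selected threshold
_, binary = cv2.threshold(aligned, 127, 255, cv2.THRESH_BINARY)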

Image Annotation and Labeling

For each image captured from the IoT camera, an annotation tool is used to generate a text file per image, with one row per defect instance containing the following details:

  1. Class of the defect
  2. x coordinate of the box center
  3. y coordinate of the box center
  4. width
  5. height

There are several annotation tools on the market, or one can even choose to build an in-house utility for this purpose.
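For illustration, a hypothetical label file (image1.txt) for an image with two defects might look like this in YOLO's normalized format, with made-up values and all coordinates relative to the image dimensions:

2 0.4812 0.3375 0.0450 0.0620
1 0.7125 0.5540 0.0380 0.0410

Here the first row describes a mousebite (class 2) and the second a short (class 1), per the class mapping introduced later.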

Image Augmentation

By exposing the model to a broader range of synthetic variations, augmentation contributes to the model's ability to generalize to unseen data and improves its performance in real-world scenarios. The data augmentation techniques one needs to employ depend on the camera positioning and other parameters. YOLO provides augmentation straight out of the box: by calibrating a few hyperparameters you can enable techniques such as Mosaic, MixUp, random adjustments to hue, saturation, and value (HSV), flips, Albumentations transforms, and formatting.
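As a sketch of that calibration (assuming a local clone of the YOLOv5 repository), one can start from a bundled hyperparameter file, adjust the augmentation-related keys, and pass the edited copy to train.py via its --hyp flag:

import yaml

# train.py expects the full hyperparameter set, so start from a bundled file
with open("yolov5/data/hyps/hyp.scratch-low.yaml") as f:
    hyp = yaml.safe_load(f)

hyp.update({
    "hsv_h": 0.015,  # hue jitter
    "hsv_s": 0.7,    # saturation jitter
    "hsv_v": 0.4,    # value jitter
    "fliplr": 0.5,   # horizontal flip probability
    "mosaic": 1.0,   # mosaic augmentation probability
    "mixup": 0.1,    # mixup augmentation probability
})

with open("hyp.pcb.yaml", "w") as f:
    yaml.safe_dump(hyp, f, sort_keys=False)
# Then: python train.py ... --hyp hyp.pcb.yaml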

Model Training in Snowpark Container Services

For model training, we will be leveraging Snowpark Container Services*, a fully managed container offering from Snowflake designed to facilitate the deployment, management, and scaling of containerized applications within the Snowflake ecosystem. GPUs significantly accelerate computer vision workloads and are required when training a deep-learning neural network on a large number of images.

The steps needed to train a model inside Snowpark Container Services are briefly outlined below:

  1. Load YOLOv5 (yolov5s.pt) from PyTorch Hub

We chose to download YOLOv5s, the smallest and fastest model, from the PyTorch Hub.
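A one-liner is enough to fetch the checkpoint; the resulting yolov5s.pt is what the Dockerfile in step 4 copies into the image:

import torch

# Pull the small pre-trained YOLOv5 checkpoint from PyTorch Hub
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
print(model.names)  # the 80 COCO class names the checkpoint ships with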

2. Modify the number of classes

The default number of classes in coco.yaml is 80, corresponding to the COCO dataset. This needs to be changed to 6 for the PCB dataset, with the classes mapped as follows:

class_mapping = {"open": 0, "short": 1, "mousebite": 2, "spur": 3, "copper": 4, "pin-hole": 5}

3. Data folder Preparation

YOLO expects the input dataset to be in a particular folder structure. For a complete tutorial on how to train YOLO on custom data, follow the steps here: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data/

The folder structure for the input dataset is below:

- data
  - images
    - train
      image1.jpg
      image2.jpg
    - val
      image1000.jpg
      image1001.jpg
  - labels
    - train
      image1.txt
      image2.txt
    - val
      image1000.txt
      image1001.txt
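The dataset config passed to training later (pcbyolo.yaml) ties this layout to the six PCB classes. A hedged sketch, with paths assumed from the Dockerfile below:

# pcbyolo.yaml (illustrative)
path: /yolov5_custom/yolov5/data
train: images/train
val: images/val
nc: 6
names: ["open", "short", "mousebite", "spur", "copper", "pin-hole"]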

4. Build a Docker image for Snowpark Container Services, and upload the image to an image repository.

The Dockerfile below contains the instructions and configuration to download the YOLOv5 source code from Ultralytics and install the necessary prerequisites before aligning the various sub-directories. We use the pre-trained YOLOv5s model from the PyTorch Hub and then custom-train it with the PCB data.

The ENTRYPOINT is set to a Python utility called training.py that contains the core training logic and saves the custom-trained model weights. The YOLO repository is cloned, and the training functionality is invoked from within that directory.

# ARG BASE_IMAGE=continuumio/miniconda3:4.12.0
# FROM $BASE_IMAGE
FROM pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime

# System libraries required by OpenCV and YOLOv5
RUN apt-get update && apt-get install -y libgl1-mesa-glx
RUN apt-get update && \
    apt-get -y --no-install-recommends install libgomp1
RUN apt-get update && apt-get install -y libglib2.0-0

# Clone the YOLOv5 source into /yolov5_custom
RUN apt-get install -y git curl && \
    git clone https://github.com/ultralytics/yolov5 /yolov5_custom

# Pre-trained weights, dataset/model configs, and the training data
COPY ./yolov5s.pt /yolov5_custom/
COPY ./pcbyolo.yaml /yolov5_custom/
COPY ./yolov5s.yaml /yolov5_custom/yolov5/models/
COPY ./data/images /yolov5_custom/yolov5/data/images
COPY ./data/labels /yolov5_custom/yolov5/data/labels

# Python dependencies, including the Snowpark client
COPY requirements.txt ./
RUN pip3 install -r requirements.txt
RUN conda install snowflake-snowpark-python
RUN pip install psutil

COPY training.py ./yolov5_custom/
ENTRYPOINT ["python3", "yolov5_custom/training.py"]

5. The training.py routine

The routine training.py contains the logic to connect to Snowflake from inside the Snowpark Container, train the YOLOv5s model, and then store the model weights in a Snowflake stage.

import argparse
import logging
import os
import subprocess
import sys

import numpy as np
import pandas as pd
import psutil
import joblib
import torch
from snowflake.snowpark import Session
from snowflake.snowpark.exceptions import *

sys.path.append('/yolov5_custom/')
from models.yolo import Model

# Environment variables below will be automatically populated by Snowflake.
SNOWFLAKE_ACCOUNT = os.getenv("SNOWFLAKE_ACCOUNT")
SNOWFLAKE_HOST = os.getenv("SNOWFLAKE_HOST")
SNOWFLAKE_DATABASE = os.getenv("SNOWFLAKE_DATABASE")
SNOWFLAKE_SCHEMA = os.getenv("SNOWFLAKE_SCHEMA")

# Custom environment variables
SNOWFLAKE_USER = os.getenv("SNOWFLAKE_USER")
SNOWFLAKE_PASSWORD = os.getenv("SNOWFLAKE_PASSWORD")
SNOWFLAKE_ROLE = os.getenv("SNOWFLAKE_ROLE")
SNOWFLAKE_WAREHOUSE = os.getenv("SNOWFLAKE_WAREHOUSE")


def get_arg_parser():
    """Input argument list."""
    parser = argparse.ArgumentParser()
    # parser.add_argument("--datayaml", required=True, help="location of yolo")
    # parser.add_argument("--cfg", required=True, help="model yaml path")
    parser.add_argument("--trainingjobname", required=True, help="name of the training job")
    parser.add_argument("--modelweights", required=True, help="name of the trained model")
    return parser


def get_logger():
    """Get a logger for local logging."""
    logger = logging.getLogger("job-cvdetection")
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(name)s - %(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger


def get_login_token():
    """Read the OAuth token that Snowflake mounts into the container."""
    with open("/snowflake/session/token", "r") as f:
        return f.read()


def get_connection_params():
    """Construct Snowflake connection params from environment variables."""
    if os.path.exists("/snowflake/session/token"):
        return {
            "account": SNOWFLAKE_ACCOUNT,
            "host": SNOWFLAKE_HOST,
            "authenticator": "oauth",
            "token": get_login_token(),
            "warehouse": SNOWFLAKE_WAREHOUSE,
            "database": SNOWFLAKE_DATABASE,
            "schema": SNOWFLAKE_SCHEMA
        }
    else:
        return {
            "account": SNOWFLAKE_ACCOUNT,
            "host": SNOWFLAKE_HOST,
            "user": SNOWFLAKE_USER,
            "password": SNOWFLAKE_PASSWORD,
            "role": SNOWFLAKE_ROLE,
            "warehouse": SNOWFLAKE_WAREHOUSE,
            "database": SNOWFLAKE_DATABASE,
            "schema": SNOWFLAKE_SCHEMA
        }


def run_job():
    logger = get_logger()
    logger.info("Job started")

    # Create a Snowflake session to access training data
    with Session.builder.configs(get_connection_params()).create() as session:
        database = session.get_current_database()
        schema = session.get_current_schema()
        warehouse = session.get_current_warehouse()
        role = session.get_current_role()
        logger.info(
            f"Connection succeeded. Current session context: database={database}, "
            f"schema={schema}, warehouse={warehouse}, role={role}"
        )

6. Run YOLO’s train.py for model training

The train.py routine from YOLO is invoked as a sub-process within the container. The number of epochs, batch size, and other parameters can be determined after a few experimentation trials. Additionally, one can choose to freeze certain layers, modify the mask ratio, or fuse each 2D convolution layer with its 2D batch-normalization layer to optimize computation, among other options.

img = 640
batch = 8
epochs = 300
datafolder = "pcbyolo.yaml"
cfg_path = "/yolov5_custom/yolov5/models/yolov5s.yaml"
weights = "yolov5s.pt"
jobname = "Custom_YOLOV5S"

# Define the command to run train.py
command = [
    "python",
    "/yolov5_custom/train.py",
    "--img", str(img),
    "--batch", str(batch),
    "--epochs", str(epochs),
    "--data", datafolder,
    "--cfg", cfg_path,
    "--weights", weights,
    "--name", jobname
]

result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

7. Save the custom-trained model weights in Snowflake Stage

While the default model format is PyTorch, one can export the model weights to other supported formats like TorchScript, ONNX, or TensorFlow Lite using the export.py routine. The post-training model weights are uploaded to the Snowflake stage; optionally, you can also register the model in the Snowflake Model Registry.

from joblib import dump

run_directory = "/yolov5_custom/runs/train"

if os.path.exists(run_directory):
    logger.info(f"Directory {run_directory} exists.")
    run_folder = os.listdir(run_directory)
    if len(run_folder) == 1 and os.path.isdir(os.path.join(run_directory, run_folder[0])):
        run_foldername = run_folder[0]

        # Construct the path to best.pt and upload it to the stage
        best_weight_path = os.path.join(run_directory, run_foldername, "weights")
        best_pt_path = os.path.join(best_weight_path, "best.pt")
        session.file.put(best_pt_path, '@model_stg', auto_compress=False, overwrite=True)
else:
    logger.warning(f"Directory {run_directory} does not exist.")

script_directory = "/yolov5_custom"
os.chdir(script_directory)

# python export.py --weights yolov5s.pt --img-size 640 --include torchscript
img = 640
model_format = "torchscript"
command = [
    "python",
    "/yolov5_custom/export.py",
    "--img", str(img),
    "--weights", best_pt_path,
    "--include", model_format
]

result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

# Check if the command was successful (return code 0)
if result.returncode == 0:
    print("Export.py execution successful.")
    print("Command output:")
    print(result.stdout)
else:
    print("Error running Export.py.")
    print("Command error output:")
    print(result.stderr)

torch_saved_model = os.path.join(run_directory, run_foldername, "weights")
torchscript_path = os.path.join(torch_saved_model, "best.torchscript")
print("Path to best.torchscript:", torchscript_path)
# session.file.put(torchscript_path, '@model_stg', auto_compress=False, overwrite=True)

8. Stage the specification file, which gives Snowflake the container configuration information.

A sample spec would look like the one below:

spec:
  containers:
  - name: cvspec
    image: <image_registry>/database/schema/image_repo/image_name
    env:
      SNOWFLAKE_WAREHOUSE: <warehouse_name>
    args:
    - "--trainingjobname=YOLOV5S_TRAIN"
    - "--modelweights=YOLOV5S_WEIGHT"
    volumeMounts:
    - name: dshm
      mountPath: /dev/shm
    resources:
      requests:
        memory: 40G
        nvidia.com/gpu: 4
      limits:
        memory: 40G
        nvidia.com/gpu: 4
  volumes:
  - name: dshm
    source: memory
    size: 1Gi

9. Create a compute pool (access CPUs or GPUs based on the requirement)

A compute pool is a collection of one or more virtual machine (VM) nodes on which Snowflake runs your Snowpark Container Services jobs and services. Refer to the Snowpark Container Services documentation for the different compute pool instance families and the consumption table. I chose GPU_NV_M, which comes with 4 NVIDIA A10G GPUs and enough memory to handle the model training for 300 epochs.
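The DDL for such a pool is a one-off statement; the pool name below matches the EXECUTE SERVICE command in the next step, and the node counts are assumptions:

CREATE COMPUTE POOL CVQC_TRAIN_GPU_NV_M
  MIN_NODES = 1
  MAX_NODES = 1
  INSTANCE_FAMILY = GPU_NV_M;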

10. Create a job to invoke the Model training

To deploy, manage, and scale this containerized application, we create a job using the EXECUTE SERVICE command. Since the training job takes between 30 minutes and an hour with 5,000 images, the status can be monitored using the job UUID or the event table.

EXECUTE SERVICE IN COMPUTE POOL CVQC_TRAIN_GPU_NV_M
FROM @CVQC_STAGE
SPECIFICATION_FILE='CVQC_TRAIN_SPEC.yaml';

11. Validate against a sample set of data

For a quick validation, I used the custom-trained model weights and validated a sample image.
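A minimal sketch of such a spot check, assuming the custom weights (best.pt) have been downloaded locally and using an illustrative image name:

import torch

# Load the custom-trained weights through the YOLOv5 hub entry point
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

results = model("sample_pcb.jpg")
results.print()  # defect classes and confidence scores
results.save()   # writes the annotated image to runs/detect/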

Figure 3: Defect Detection with bounding boxes, defect classes and confidence scores

Conclusion

Proactive quality measures empower manufacturers to tackle quality concerns head-on, preventing the circulation of defective products and minimizing upfront costs. This not only shields brand reputation but also cultivates enduring customer trust and loyalty. Snowflake offers the ability to deploy cutting-edge defect detection technologies that go beyond operational efficiency, letting manufacturers swiftly identify faults, reduce downtime, optimize production workflows, and elevate overall productivity. Robust defect detection is a strategic investment that reaches far beyond immediate cost savings, positioning manufacturers for sustained success in a competitive landscape.

Our YOLOv5s model achieved 94.6% mAP and a 92.2% F-score at 82 FPS on an edge device. If you are interested in learning how to create a complete pipeline for configuring an NVIDIA Jetson Nano device to use this custom-trained model for inference, as well as automated model deployment, retraining, and drift detection in Snowflake, follow Part 2 of this blog series.

*Snowpark Container Services is in Preview in some regions at the time of writing this.

Keep innovating!
