Swift/Metal Image Processing

Yuya Horita

4 min readMar 13, 2019

Introduction

This is a basic section. Others here.

1. Edge Detection, Prewitt, Sobel and Laplace

2. Smoothing, Gaussian and Bilateral

Basic

Image Processing is a method to perform some operations on an image.

getting an enhanced image
extracting some information, like edges

Metal render advanced 3D graphics and perform data-parallel computations using graphics processors (GPUs).

High performance image processing can be performed in iOS/macOS, thanks to Metal.

In computers, images are represented as a matrix. Each element of the matrix represents the color of each pixel.

Simply, let’s think about the following case.

This is a grayscale image which just has four regions.

Black (top-left)
Dark-Gray (top-right)
Middle-Gray (bottom-left)
Light-Gray (bottom-right)

Gray scale color has its corresponding value. Here, assuming that each color’s gray scale value is

The number is not precise.

Then, the above image can be represented as

This is the matrix of it.

This is basic. Even if an image is large and complicated, it is just a cluster of color(or pixel). Of course, such image’s matrix has a large number of elements.

Image Processing will be performed on each pixel of an image.

For example, if you want to get a gray scale image from colorful image, change each pixel’s color to gray scale.

There are some methods for converting to gray scale.

Average: V= (R + G + B) / 3
Common: V= R * 0.3 + G * 0.59 + B * 0.11
BT.709: V= R * 0.2126 + G * 0.7152 + B * 0.0722
BT.601: V= R * 0.299 + G * 0.587 + B * 0.114

where V is grayscale value.

Let’s try.

I prepared simple project for using metal easily. Clone this repository and open SwiftImageProcessor.xcodeproj .

In SwiftImageProcessor/Shader/Color/gray.metal , the above converting shading functions are defined. One of them.

Each data has the following meaning.

texture2d: input/output image files(jpg, png) are processed after converted to texture in Metal.
[[ thread_position_in_grid ]]: Metal attributes representing pixel’s index. This ref is helpful.
halfN: N component vector data, like rgb = (0.5, 0.5, 0.5), rgba = (0.5, 0.5, 0.5, 1.0).

This kernel function can be interpreted as

Getting each pixel’s RGB color from input ( inTexture.read(gid)), calculating the average, paint the pixel of output image average gray color ( outTexture.write).

In CommandLineProcessor.swift ,

inputFileName and kernel are defined. Changing these values, you can try applying your custom filter easily.

Open the execute file directory, copying Resources/landscape.jpg there. (I’m sorry this is a terrible approach. Will be improved in the future.)

Run Xcode. landscape_gray_average.jpg is created in the same directly.

You could get a grayscale image of input image!!

Try other grayscale filters.

top-left: average, top-right: common, bottom-left: BT601, bottom-right: BT709

Spatial Filtering

In the previous section, output pixel color are calculated from one pixel. Obviously, read input pixel color of gid, grid index and wrote output pixel color of same gid.

Spatial filtering is a technique for changing intensities of pixel according to the intensities of neighboring pixels.

Let I(i, j), I’(i, j) be an input image and an output image. I’(i, j) is calculated from I(i, j) and the neighboring pixels.

For example, when the current target pixel is I(2, 2), or gid is (2, 2), the neighboring pixels are I(1, 1), I(1, 2), I(1, 3), I(2, 1), I(2, 3), I(3, 1), I(3, 2), I(3, 3).