Swift/Metal Image Processing

Yuya Horita
4 min readMar 13, 2019

--

Introduction

This is a basic section. Others here.

1. Edge Detection, Prewitt, Sobel and Laplace

2. Smoothing, Gaussian and Bilateral

Basic

Image Processing is a method to perform some operations on an image.

  • getting an enhanced image
  • extracting some information, like edges

Metal render advanced 3D graphics and perform data-parallel computations using graphics processors (GPUs).

High performance image processing can be performed in iOS/macOS, thanks to Metal.

In computers, images are represented as a matrix. Each element of the matrix represents the color of each pixel.

Image matrix

Simply, let’s think about the following case.

Grayscale image

This is a grayscale image which just has four regions.

  1. Black (top-left)
  2. Dark-Gray (top-right)
  3. Middle-Gray (bottom-left)
  4. Light-Gray (bottom-right)

Gray scale color has its corresponding value. Here, assuming that each color’s gray scale value is

  1. 10
  2. 60
  3. 100
  4. 160

The number is not precise.

Then, the above image can be represented as

This is the matrix of it.

This is basic. Even if an image is large and complicated, it is just a cluster of color(or pixel). Of course, such image’s matrix has a large number of elements.

Image Processing will be performed on each pixel of an image.

For example, if you want to get a gray scale image from colorful image, change each pixel’s color to gray scale.

There are some methods for converting to gray scale.

  • Average: V= (R + G + B) / 3
  • Common: V= R * 0.3 + G * 0.59 + B * 0.11
  • BT.709: V= R * 0.2126 + G * 0.7152 + B * 0.0722
  • BT.601: V= R * 0.299 + G * 0.587 + B * 0.114

where V is grayscale value.

Let’s try.

I prepared simple project for using metal easily. Clone this repository and open SwiftImageProcessor.xcodeproj .

In SwiftImageProcessor/Shader/Color/gray.metal , the above converting shading functions are defined. One of them.

average converting

Each data has the following meaning.

  • texture2d: input/output image files(jpg, png) are processed after converted to texture in Metal.
  • [[ thread_position_in_grid ]]: Metal attributes representing pixel’s index. This ref is helpful.
  • halfN: N component vector data, like rgb = (0.5, 0.5, 0.5), rgba = (0.5, 0.5, 0.5, 1.0).

This kernel function can be interpreted as

Getting each pixel’s RGB color from input ( inTexture.read(gid)), calculating the average, paint the pixel of output image average gray color ( outTexture.write).

In CommandLineProcessor.swift ,

inputFileName and kernel are defined. Changing these values, you can try applying your custom filter easily.

Open the execute file directory, copying Resources/landscape.jpg there. (I’m sorry this is a terrible approach. Will be improved in the future.)

Resources/landscape.jpg

Run Xcode. landscape_gray_average.jpg is created in the same directly.

landscape_gray_average.jpg (resized)

You could get a grayscale image of input image!!

Try other grayscale filters.

top-left: average, top-right: common, bottom-left: BT601, bottom-right: BT709

Spatial Filtering

In the previous section, output pixel color are calculated from one pixel. Obviously, read input pixel color of gid, grid index and wrote output pixel color of same gid.

Spatial filtering is a technique for changing intensities of pixel according to the intensities of neighboring pixels.

Let I(i, j), I’(i, j) be an input image and an output image. I’(i, j) is calculated from I(i, j) and the neighboring pixels.

input image matrix

For example, when the current target pixel is I(2, 2), or gid is (2, 2), the neighboring pixels are I(1, 1), I(1, 2), I(1, 3), I(2, 1), I(2, 3), I(3, 1), I(3, 2), I(3, 3).

in the case for 3x3 matrix.

Using these pixels, I’(2, 2) is calculated just like the following. The calculation is called convolution.

3x3 partial filtering

Weighting each pixel is a key of this process. The weights, or called Kernel, is also represented by matrix.

kernel matrix

Then, the convolution is

convolution

where k is determined by kernel size. If the kernel is 3x3 matrix, k = 1.

For image border calculation, there are some choice.

  1. Ignoring the pixels at the border
  2. Adding artificial pixels
  3. Changing the size of the kernel

Kernel determines spatial filtering behavior.

Other Articles

--

--

Yuya Horita

Master of Nuclear Physics, CyberAgent, Inc. FRESH LIVE. M3. Software Engineer. Twitter: https://twitter.com/horita_yuya ,GitHub: https://github.com/horita-yuya