Swift/Metal Image Processing
Introduction
This section covers the basics. Later sections will cover:
1. Edge Detection: Prewitt, Sobel, and Laplacian
2. Smoothing: Gaussian and Bilateral
Basics
Image processing is a method of performing operations on an image, for example:
- getting an enhanced image
- extracting information, such as edges
Metal renders advanced 3D graphics and performs data-parallel computations using graphics processing units (GPUs). Thanks to Metal, high-performance image processing can be performed on iOS/macOS.
In computers, an image is represented as a matrix; each element of the matrix represents the color of one pixel.
As a simple example, consider the following grayscale image, which has just four regions:
- Black (top-left)
- Dark-Gray (top-right)
- Middle-Gray (bottom-left)
- Light-Gray (bottom-right)
Each grayscale color corresponds to a numeric value. Assume each region's grayscale value is:
- Black: 10
- Dark-Gray: 60
- Middle-Gray: 100
- Light-Gray: 160
(The exact numbers are not important.)
Then the above image can be represented as the following matrix.
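The matrix is not reproduced here; as a sketch, assuming a small 4x4 image split into the four quadrants above, it could be written as a Swift array:

```swift
// A 4x4 grayscale "image" with the four quadrant values assumed above
// (10 = black, 60 = dark gray, 100 = middle gray, 160 = light gray).
let image: [[Int]] = [
    [ 10,  10,  60,  60],
    [ 10,  10,  60,  60],
    [100, 100, 160, 160],
    [100, 100, 160, 160],
]
// Each element is the intensity of one pixel.
print(image[0][0], image[3][3]) // top-left is black (10), bottom-right is light gray (160)
```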
This is the basic idea. Even if an image is large and complicated, it is still just a grid of pixels; its matrix simply has many more elements.
Image processing is performed on each pixel of an image.
For example, to get a grayscale image from a color image, convert each pixel's color to a gray value.
There are several methods for converting to grayscale:
- Average: V = (R + G + B) / 3
- Common: V = R * 0.3 + G * 0.59 + B * 0.11
- BT.709: V = R * 0.2126 + G * 0.7152 + B * 0.0722
- BT.601: V = R * 0.299 + G * 0.587 + B * 0.114
where V is the grayscale value.
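As a quick sanity check, these formulas can be written as plain Swift functions (function names are mine; pixel components are assumed to be in [0, 1]):

```swift
// Grayscale conversion formulas from the list above, for one RGB pixel.
func grayAverage(_ r: Double, _ g: Double, _ b: Double) -> Double { (r + g + b) / 3 }
func grayCommon(_ r: Double, _ g: Double, _ b: Double) -> Double { r * 0.3 + g * 0.59 + b * 0.11 }
func grayBT709(_ r: Double, _ g: Double, _ b: Double) -> Double { r * 0.2126 + g * 0.7152 + b * 0.0722 }
func grayBT601(_ r: Double, _ g: Double, _ b: Double) -> Double { r * 0.299 + g * 0.587 + b * 0.114 }

// A pure green pixel shows how the weights differ: green dominates in BT.709/BT.601.
print(grayAverage(0, 1, 0)) // ≈ 0.333
print(grayBT709(0, 1, 0))   // 0.7152
```

Note that the weighted formulas sum to 1, so a white pixel (1, 1, 1) stays white.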
Let's try it. I prepared a simple project for using Metal easily: clone this repository and open SwiftImageProcessor.xcodeproj.
In SwiftImageProcessor/Shader/Color/gray.metal, the shader functions for the conversions above are defined. One of them is the average-grayscale kernel.
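That kernel is not reproduced here, but a minimal sketch of an average-grayscale compute kernel in the Metal shading language might look like the following (the function name and argument indices are assumptions, not the repository's actual code):

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical sketch of an average-grayscale kernel.
kernel void grayAverage(texture2d<half, access::read>  inTexture  [[ texture(0) ]],
                        texture2d<half, access::write> outTexture [[ texture(1) ]],
                        uint2 gid [[ thread_position_in_grid ]]) {
    half4 color = inTexture.read(gid);              // read RGBA of this pixel
    half v = (color.r + color.g + color.b) / 3.0h;  // average of R, G, B
    outTexture.write(half4(v, v, v, color.a), gid); // write the gray pixel
}
```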
Each piece of data has the following meaning:
- texture2d: input/output image files (jpg, png) are converted to Metal textures before being processed.
- [[ thread_position_in_grid ]]: a Metal attribute representing the pixel's index. This reference is helpful.
- halfN: an N-component vector of half-precision values, e.g. rgb = (0.5, 0.5, 0.5), rgba = (0.5, 0.5, 0.5, 1.0).
This kernel function reads each pixel's RGB color from the input (inTexture.read(gid)), calculates the average, and writes that average gray color to the corresponding pixel of the output image (outTexture.write).
In CommandLineProcessor.swift, inputFileName and kernel are defined. By changing these values, you can easily try applying your own custom filter.
Open the executable's directory and copy Resources/landscape.jpg there. (I'm sorry, this is a clumsy approach; it will be improved in the future.)
Run the project from Xcode. landscape_gray_average.jpg is created in the same directory.
You now have a grayscale version of the input image!
Try the other grayscale filters as well.
Spatial Filtering
In the previous section, each output pixel's color was calculated from a single input pixel: the kernel read the input color at gid, the grid index, and wrote the output color at the same gid.
Spatial filtering is a technique for changing a pixel's intensity according to the intensities of its neighboring pixels.
Let I(i, j) and I′(i, j) be the input and output images. I′(i, j) is calculated from I(i, j) and its neighboring pixels.
For example, when the current target pixel is I(2, 2), i.e. gid is (2, 2), the neighboring pixels for a 3x3 neighborhood are I(1, 1), I(1, 2), I(1, 3), I(2, 1), I(2, 3), I(3, 1), I(3, 2), and I(3, 3).
Using these pixels, I′(2, 2) is calculated as a weighted sum of the neighborhood; this calculation is called convolution.
Weighting each pixel is the key to this process. The weights, called the kernel K, are also represented by a matrix.
Then, the convolution is

I′(i, j) = Σ K(m, n) · I(i + m, j + n), summed over m = -k, …, k and n = -k, …, k

where k is determined by the kernel size. If the kernel is a 3x3 matrix, k = 1.
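The formula above can be sketched in plain Swift as a CPU reference (not the Metal version; the function name is mine, and border pixels are simply left unchanged):

```swift
// A minimal 3x3 (or any odd-sized) convolution on a 2D intensity array.
// Border pixels that lack a full neighborhood are left at their input values.
func convolve(_ image: [[Double]], kernel: [[Double]]) -> [[Double]] {
    let h = image.count, w = image[0].count
    let k = kernel.count / 2 // k = 1 for a 3x3 kernel
    var out = image
    for i in k..<(h - k) {
        for j in k..<(w - k) {
            var sum = 0.0
            for m in -k...k {
                for n in -k...k {
                    // K(m, n) * I(i + m, j + n)
                    sum += kernel[m + k][n + k] * image[i + m][j + n]
                }
            }
            out[i][j] = sum
        }
    }
    return out
}

// A box-blur kernel: every weight is 1/9, so each output pixel is the
// average of its 3x3 neighborhood.
let box = [[Double]](repeating: [Double](repeating: 1.0 / 9.0, count: 3), count: 3)
let flat = [[Double]](repeating: [Double](repeating: 5.0, count: 4), count: 4)
let blurred = convolve(flat, kernel: box)
print(blurred[1][1]) // ≈ 5.0: averaging a flat image leaves it flat
```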
For pixels at the image border, there are several choices:
- Ignoring the pixels at the border
- Adding artificial (padding) pixels
- Changing the size of the kernel
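The second option, adding artificial pixels, can be sketched as zero-padding (a hypothetical helper, not part of the repository):

```swift
// Zero-padding sketch: add a k-pixel border of artificial (zero) pixels
// so a (2k+1)x(2k+1) convolution can be evaluated at every original pixel.
func zeroPad(_ image: [[Double]], by k: Int) -> [[Double]] {
    let w = image[0].count
    let blank = [Double](repeating: 0, count: w + 2 * k)
    var out = [[Double]](repeating: blank, count: k) // top border rows
    for row in image {
        // left border + original row + right border
        out.append([Double](repeating: 0, count: k) + row + [Double](repeating: 0, count: k))
    }
    out += [[Double]](repeating: blank, count: k)    // bottom border rows
    return out
}

let padded = zeroPad([[1, 2], [3, 4]], by: 1)
print(padded.count, padded[0].count) // 4 4: a 2x2 image grows to 4x4
print(padded[1][1]) // 1.0: the original top-left pixel, shifted by the padding
```

Padding with zeros darkens the border after filtering; replicating the nearest edge pixel instead is a common alternative.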
The choice of kernel determines the behavior of the spatial filter.