Kamil Tustanowski
Jul 20, 2023

In my previous articles, I focused on detecting people in images: starting from how the body and limbs are positioned in space and finishing with trying to understand a person’s mood based on face landmarks. All this information can tell us a lot about people and the situations they are in. But this is not all Vision can do.

Barcode detection may not sound as interesting as the other Vision features, but it has a special place in my heart. A few years ago I was working on an application where scanning various codes, barcodes, QR codes, and others, was one of the key features. There was no first-party support back then, and my life would have been much simpler if Vision had been available.

This week I want to introduce VNDetectBarcodesRequest. This request is versatile and can detect QR, Aztec, UPC-E, and more. As a result, it provides a rectangle containing the code and the decoded payload, which is all we need.

We can either create the request and make it look for anything it can recognize:

let barcodesRequest = VNDetectBarcodesRequest()

Or, and this is more likely, work with a subset of codes:

let barcodesRequest = VNDetectBarcodesRequest()
barcodesRequest.symbologies = [.QR]

In this case, the request will look for QR codes and nothing more. We’ll drop this restriction for now because we want to see everything the request can detect.
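If you are curious which symbologies are available, the request can tell you. This is a minimal sketch assuming the iOS 15+ supportedSymbologies() API (older systems expose a deprecated supportedSymbologies property instead):

// Prints every symbology this request can detect on the current system.
// Assumes the throwing supportedSymbologies() API introduced in iOS 15.
let inspectionRequest = VNDetectBarcodesRequest()
if let symbologies = try? inspectionRequest.supportedSymbologies() {
    symbologies.forEach { print($0.rawValue) }
}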

The next step is to create a request handler and ask it to perform our request:

let requestHandler = VNImageRequestHandler(cgImage: cgImage,
                                           orientation: .init(image.imageOrientation),
                                           options: [:])

visionQueue.async { [weak self] in
    do {
        try requestHandler.perform([barcodesRequest])
    } catch {
        print("Can't make the request due to \(error)")
    }

This is explained in the Detecting body pose using Vision framework article.

Once the request has been performed, it’s time to get the results. We need to cast them to [VNBarcodeObservation] before they are ready to use:

guard let results = barcodesRequest.results as? [VNBarcodeObservation] else { return }

In VNBarcodeObservation we are interested in the following (see the sketch after this list):

  • The boundingBox, which returns the CGRect representing the area in the image where the code is located.
  • The payloadStringValue, which, if present, contains a string with the decoded information.
  • The confidence, which contains a value from 0.0 to 1.0 describing how certain Vision is about this observation.
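Here is a quick sketch that simply logs all three values for every observation we got from the cast above:

results.forEach { observation in
    // boundingBox is a normalized CGRect, payloadStringValue is optional,
    // and confidence ranges from 0.0 to 1.0.
    print("box: \(observation.boundingBox)")
    print("payload: \(observation.payloadStringValue ?? "n/a")")
    print("confidence: \(observation.confidence)")
}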

We have the observations and now need a way to present them. Each observation provides two important pieces of information, and we want to display both.

Let’s start with preparing the data:

let boxesAndPayload = results
    .map { (box: $0.boundingBox.rectangle(in: image),
            payload: $0.payloadStringValue ?? "n/a") }

The payload is optional, which can be problematic while iterating through the results. To make it simpler, I map each observation to a (box, payload) tuple holding the boundingBox and the payload string ("n/a" if it’s missing).

It’s worth mentioning that the CGRect in boundingBox uses a normalized coordinate space. To use it, we need to project it into image coordinates first. As with CGPoint, there is a dedicated function for that:

extension CGRect {
    func rectangle(in image: UIImage) -> CGRect {
        VNImageRectForNormalizedRect(self,
                                     Int(image.size.width),
                                     Int(image.size.height))
    }
}

With all this in place, we end up with an array of tuples containing CGRects and corresponding payload strings or "n/a" if the payload is missing.

Let’s finish preparing the CGRects for our rendering code. There is one more step needed after handling the normalized values: we need to translate the coordinates from the Core Image coordinate space to UIKit’s. There is more info in the Detecting body pose using Vision framework article.
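The translation helper itself comes from that article; for completeness, here is a minimal sketch of what it presumably looks like. Vision and Core Image use a bottom-left origin while UIKit uses a top-left one, so we flip the Y value:

extension CGPoint {
    // Presumed shape of the helper from the body pose article:
    // flips the Y value around the given height to move from the
    // bottom-left-origin (Core Image) space to UIKit's top-left origin.
    func translateFromCoreImageToUIKitCoordinateSpace(using height: CGFloat) -> CGPoint {
        CGPoint(x: x, y: height - y)
    }
}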

We need to translate the origin of the rectangle:

let rectangles = boxesAndPayload.map { $0.box }
    .map { CGRect(origin: $0.origin.translateFromCoreImageToUIKitCoordinateSpace(using: image.size.height),
                  size: $0.size) }

Now we can add our rendering code:

extension UIImage {
    func draw(rectangles: [CGRect],
              strokeColor: UIColor = .primary,
              lineWidth: CGFloat = 2) -> UIImage? {
        let renderer = UIGraphicsImageRenderer(size: size)
        return renderer.image { context in
            draw(in: CGRect(origin: .zero, size: size))
            context.cgContext.setStrokeColor(strokeColor.cgColor)
            context.cgContext.setLineWidth(lineWidth)
            rectangles.forEach { context.cgContext.addRect($0) }
            context.cgContext.drawPath(using: .stroke)
        }
    }
}

This time, unlike in previous articles, we will draw the image in a more Swifty way with the help of UIGraphicsImageRenderer, but apart from that it’s business as usual. First, we draw the image as a "background" and then draw the rectangles over it.

If we run the application this is what we get:

There are at least a few things wrong with this image. One barcode looks like it was detected multiple times, and the QR and Aztec detection rectangles are drawn below the codes.

Let’s think about the rectangles drawn below the codes first. Previously we were working with CGPoints and translating raw coordinates. Here we are dealing with CGRect, and there is a subtle difference:

In the default Core Graphics coordinate space, the origin is located in the lower-left corner of the rectangle and the rectangle extends towards the upper-right corner. If the context has a flipped-coordinate space — often the case on iOS — the origin is in the upper-left corner and the rectangle extends towards the lower-right corner.

From CGRect documentation

TL;DR: not only is the origin flipped, but the way rectangles are drawn is too.

Good thing there is an easy fix. We need to subtract the height of the rectangle from the Y-axis value of the origin:

let rectangles = boxesAndPayload.map { $0.box }
    .map { CGRect(origin: $0.origin.translateFromCoreImageToUIKitCoordinateSpace(using: image.size.height - $0.size.height),
                  size: $0.size) }
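To see why this works, take a made-up example: in a 1000-point-tall image, a 50-point-tall rectangle whose Core Image origin is y = 100 should end up with a UIKit origin of y = 1000 - 100 - 50 = 850. Passing image.size.height - $0.size.height to the helper gives exactly that.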

One problem solved:

Now let’s see what we can do about the duplicated detections. The fix is straightforward but relies on the payload:

extension VNDetectBarcodesRequest: ResultPointsProviding {
    var uniqueObservations: [VNBarcodeObservation] {
        guard let results = results as? [VNBarcodeObservation] else { return [] }
        let payloads = results.compactMap { $0.payloadStringValue }
        let uniquePayloads = Set(payloads)

        return uniquePayloads.compactMap { payload in
            results.filter { $0.payloadStringValue == payload }
                .sorted(by: { observationOne, observationTwo in
                    observationOne.boundingBox.area > observationTwo.boundingBox.area
                }).first
        }
    }
}

extension CGRect {
    var area: CGFloat {
        height * width
    }
}

The general idea is to get all the unique payloads. This means we display the detected and properly recognized codes, and nothing more, once per payload (code). First, we get all non-nil payloads and put them into a Set, which guarantees the payloads are unique. Then we map each payload to the observation with the largest boundingBox. It doesn’t show in this example, but not every detected barcode is represented by a thin, almost line-like rectangle; sometimes the boxes are larger, and the larger ones are more interesting.

We replace:

guard let results = barcodesRequest.results as? [VNBarcodeObservation] else { return }

With:

let results = barcodesRequest.uniqueObservations

The result:

The last thing to do is to display the payload.

We need to provide a CGRect and a String for the drawing function. This struct will make this easier:

struct DisplayableText {
    let frame: CGRect
    let text: String
}

Let’s prepare the data:

let displayableTexts = zip(rectangles,
                           boxesAndPayload.map { $0.payload })
    .map { DisplayableText(frame: $0.0,
                           text: $0.1) }

We could use boxesAndPayload for both, but its boxes are not yet translated into the UIKit coordinate space. Since we already have the rectangles array containing ready-to-use CGRects, it would be a shame not to use it. To join our rectangles and payloads together we use the zip function, which:

Creates a sequence of pairs built out of two underlying sequences.

From zip documentation
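As a quick illustration with made-up values, zip pairs elements by position and drops anything without a counterpart:

let pairs = zip(["a", "b", "c"], [1, 2])
pairs.forEach { print($0) } // ("a", 1), then ("b", 2); "c" has no counterpart and is dropped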

Now we can add a texts parameter to the drawing function and write simple text-drawing code:

let textAttributes = [NSAttributedString.Key.font: UIFont.systemFont(ofSize: 20, weight: .bold),
                      NSAttributedString.Key.foregroundColor: strokeColor,
                      NSAttributedString.Key.backgroundColor: UIColor.black]

displayableTexts.forEach { displayableText in
    displayableText.text.draw(with: displayableText.frame,
                              options: [],
                              attributes: textAttributes,
                              context: nil)
}

And when we pass the texts:

self?.imageView.image = image.draw(rectangles: rectangles,
                                   displayableTexts: displayableTexts)

Our work is complete:

This is the full request and rendering code:

extension ImageProcessingViewController {
    func process(_ image: UIImage) {
        guard let cgImage = image.cgImage else { return }
        let barcodesRequest = VNDetectBarcodesRequest()

        let requestHandler = VNImageRequestHandler(cgImage: cgImage,
                                                   orientation: .init(image.imageOrientation),
                                                   options: [:])

        saveImageButton.isHidden = false
        visionQueue.async { [weak self] in
            do {
                try requestHandler.perform([barcodesRequest])
            } catch {
                print("Can't make the request due to \(error)")
            }

            let results = barcodesRequest.uniqueObservations

            let boxesAndPayload = results
                .map { (box: $0.boundingBox.rectangle(in: image),
                        payload: $0.payloadStringValue ?? "n/a") }

            let rectangles = boxesAndPayload.map { $0.box }
                .map { CGRect(origin: $0.origin.translateFromCoreImageToUIKitCoordinateSpace(using: image.size.height - $0.size.height),
                              size: $0.size) }

            let displayableTexts = zip(rectangles,
                                       boxesAndPayload.map { $0.payload })
                .map { DisplayableText(frame: $0.0,
                                       text: $0.1) }

            DispatchQueue.main.async {
                self?.imageView.image = image.draw(rectangles: rectangles,
                                                   displayableTexts: displayableTexts)
            }
        }
    }
}

extension UIImage {
    func draw(rectangles: [CGRect],
              displayableTexts: [DisplayableText],
              strokeColor: UIColor = .primary,
              lineWidth: CGFloat = 2) -> UIImage? {
        let renderer = UIGraphicsImageRenderer(size: size)
        return renderer.image { context in
            draw(in: CGRect(origin: .zero, size: size))
            context.cgContext.setStrokeColor(strokeColor.cgColor)
            context.cgContext.setLineWidth(lineWidth)
            rectangles.forEach { context.cgContext.addRect($0) }
            context.cgContext.drawPath(using: .stroke)

            let textAttributes = [NSAttributedString.Key.font: UIFont.systemFont(ofSize: 20, weight: .bold),
                                  NSAttributedString.Key.foregroundColor: strokeColor,
                                  NSAttributedString.Key.backgroundColor: UIColor.black]

            displayableTexts.forEach { displayableText in
                displayableText.text.draw(with: displayableText.frame,
                                          options: [],
                                          attributes: textAttributes,
                                          context: nil)
            }
        }
    }
}

If you want to play with Vision and see it for yourself, you can check the latest version of my vision demo application here. If you want to check the code used in this article, check version 0.3.0. The code is located in this file.

If you have any feedback, or just want to say hi, you are more than welcome to write me an [e-mail](mailto:kamil.tustanowski@gmail.com) or tweet to @tustanowskik

If you want to be up to date and always be the first to know what I’m working on, tap follow @tustanowskik on Twitter.

Thank you for reading!

[This was first published on my blog]