Automated COVID Vaccination Card Verification — Verification Checks

Published in UNC Blue Sky Innovations

Jun 28, 2021

COVID Vaccination Card Verification Tutorial Series: Part 1: Image Alignment | Part 2: Verification Checks

Tutorial Part 2: Verification Checks

After alignment is performed, we need to check whether the image actually shows a COVID vaccination card. We do this by verifying that three conditions hold: there is a sufficient number of RANSAC inliers, the title contains the expected text, and the CDC logo is present in the top right.

Check #1: Minimum RANSAC-inlier requirement

In Part 1, we discussed how RANSAC is used to estimate the homography that aligns the input image, and how that estimate is supported by a subset of the keypoint matches (inliers) while the rest are discarded (outliers). One way to verify that the alignment succeeded is to require a minimum number of inliers. If the image contains a vaccination card, many keypoint matches should agree with a single homography; if it does not, most matches are spurious and few of them will be consistent with any one homography.
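In OpenCV, `cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, thresh)` returns the homography together with a mask that marks the inliers, so the inlier count is simply the number of nonzero entries in that mask. To make the inlier criterion concrete, here is a minimal NumPy sketch (not the tutorial's code) that recomputes it from a given homography: a match counts as an inlier when its source point, projected through H, lands within `reproj_thresh` pixels of its matched destination point. The function name `count_homography_inliers` and the 5-pixel threshold are illustrative assumptions.

```python
import numpy as np

def count_homography_inliers(H, src_pts, dst_pts, reproj_thresh=5.0):
    """Count matches whose reprojection error under H is at most reproj_thresh pixels.

    Hypothetical helper for illustration; cv2.findHomography already returns
    an inlier mask, so in practice you would just sum that mask.
    """
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    # Convert source points to homogeneous coordinates and project them with H
    ones = np.ones((src.shape[0], 1))
    projected = np.hstack([src, ones]) @ H.T
    projected = projected[:, :2] / projected[:, 2:3]
    # Euclidean distance between each projected point and its matched point
    errors = np.linalg.norm(projected - dst, axis=1)
    return int(np.count_nonzero(errors <= reproj_thresh))
```

With OpenCV, the equivalent count comes straight from the mask: `H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)` followed by `num_inliers = int(mask.sum())`.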

An example of an invalid input, which yields 7 inliers (green) and 93 outliers (red) during RANSAC homography estimation.

In our testing, we found that images containing vaccination cards tend to produce more than 15 inliers, while images not containing vaccination cards tend to produce fewer than 10. We therefore place the threshold between the two and perform the following check:

if RANSAC_inliers < 11:
    print("Failed RANSAC inlier verification!")

However, we also found that there are exceptions to this rule, so we perform two additional checks.

Check #2: Verify the title is correct

As an additional verification measure, we check that the title of the card is correct by performing optical character recognition (OCR). OCR is another well-studied computer vision task in which images of text are converted to machine-encoded text. For this we use pytesseract, a popular open-source Python wrapper for the Tesseract OCR engine.

First, we must extract the portion of our aligned image that contains the title. Because the scan has been aligned to the template, we can predefine the title region once using the template image; we obtained its bounding box coordinates manually with an image editing tool such as GIMP.

The predefined title region overlaid on the template image (left) and the aligned scan (right).

The extracted title region of the aligned scan.

The extracted region is then passed to pytesseract's image_to_string() method. The code of our read_title() method is:

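import cv2
import pytesseract

def read_title(aligned, fileTag=None, debug=False) -> str:
    # get title ROI: (x, y, w, h) bounding box measured on the template image
    (x, y, w, h) = (0, 0, 1487, 160)
    roi = aligned[y:y + h, x:x + w]
    # ...
    # OCR the ROI using Tesseract (OpenCV images are BGR; convert to RGB first)
    rgb = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
    text = pytesseract.image_to_string(rgb)
    return text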

Now that we have the extracted title, we check whether it contains the expected text, “COVID-19 Vaccination Record Card”, by searching it with a regular expression. We found it necessary to allow variable amounts of whitespace between the keywords, and even within “COVID-19”, since OCR output can insert or drop spaces. The check fails if the expected title is not found.

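import re

# \s* tolerates missing or extra whitespace (including newlines) from OCR
expectedTitleRegex = re.compile(r'(COVID\s*-\s*1\s*9\s*Vaccination\s*Record\s*Card)', re.DOTALL)
match = expectedTitleRegex.search(title)  # title is the string returned by read_title()
if match is None:
    print('Failed title verification!')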
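To see what the whitespace tolerance buys, here is a small self-contained sketch that wraps the same regular expression in a helper (`title_is_valid` is a hypothetical name) and runs it on made-up OCR strings. Note that the pattern is case-sensitive, so an all-lowercase OCR result would fail.

```python
import re

# Same pattern as the title check: \s* allows zero or more whitespace
# characters (spaces or newlines) between the pieces of the title.
EXPECTED_TITLE = re.compile(r'(COVID\s*-\s*1\s*9\s*Vaccination\s*Record\s*Card)', re.DOTALL)

def title_is_valid(ocr_text: str) -> bool:
    """Return True if the OCR'd text contains the expected card title."""
    return EXPECTED_TITLE.search(ocr_text) is not None

# Invented OCR outputs, paired with whether they should pass
samples = {
    "COVID-19 Vaccination Record Card\n": True,
    "COVID -1 9 Vaccination\nRecord  Card": True,   # stray spaces and a line break
    "CDC\nCOVID-19Vaccination Record Card": True,   # missing space, extra text before
    "COVID-19 Vaccine Record Card": False,          # wrong word
    "covid-19 vaccination record card": False,      # case differs
}
for text, expected in samples.items():
    print(repr(text), title_is_valid(text), "expected", expected)
```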

Check #3: Verify the logos are present

As the final verification check, we test whether the CDC and Department of Health and Human Services logos can be detected in the top right of the image, using a technique known as template matching.

Card scan (left) and logo template (right)

Again, we begin by converting both the aligned scan and the logo template to grayscale.

aligned_gray = cv2.cvtColor(aligned, cv2.COLOR_BGR2GRAY)
logo_template_gray = cv2.cvtColor(template_img, cv2.COLOR_BGR2GRAY)

Next, we use OpenCV’s matchTemplate() method to perform template matching. This method places the template over a region of the image and computes a similarity score, then slides the template one pixel at a time and recomputes the score at every position where the template fits entirely inside the image. It returns a 2D array of these scores: for an image of size W × H and a template of size w × h, the result has size (W − w + 1) × (H − h + 1), with each entry corresponding to the top-left corner of one template placement.

tm_result = cv2.matchTemplate(aligned_gray, logo_template_gray, cv2.TM_CCOEFF_NORMED)
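To make the sliding-window description concrete, the following NumPy sketch recomputes a TM_CCOEFF_NORMED-style score map with explicit loops. It follows the same zero-mean normalized cross-correlation formula but is for illustration only: OpenCV's implementation is far faster and may treat flat (zero-variance) patches differently. The function name `zncc_match_template` is hypothetical.

```python
import numpy as np

def zncc_match_template(image, template):
    """Naive sliding-window zero-mean normalized cross-correlation.

    Illustrative re-implementation of the idea behind
    cv2.matchTemplate(..., cv2.TM_CCOEFF_NORMED); not OpenCV's code.
    """
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    H, W = image.shape
    h, w = template.shape
    t = template - template.mean()          # zero-mean template
    t_norm = np.sqrt((t ** 2).sum())
    result = np.zeros((H - h + 1, W - w + 1))
    # Slide the template one pixel at a time; (x, y) is its top-left corner
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            patch = image[y:y + h, x:x + w]
            p = patch - patch.mean()        # zero-mean image patch
            denom = t_norm * np.sqrt((p ** 2).sum())
            # In this sketch, flat patches (zero variance) get a score of 0
            result[y, x] = (p * t).sum() / denom if denom > 0 else 0.0
    return result
```

Cropping a patch from a random image and using it as the template yields a result of shape (H − h + 1, W − w + 1) whose maximum, 1.0, sits at the crop's top-left corner. Scaling and offsetting the template (for example 3·T + 7) leaves the scores unchanged, which is the brightness and contrast insensitivity described below.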

The matchTemplate() method supports several similarity measures: squared difference (TM_SQDIFF), cross-correlation (TM_CCORR), and zero-mean cross-correlation (TM_CCOEFF, also referred to as the correlation coefficient in statistics literature). Each also has a normalized version, indicated by the _NORMED suffix.

For most template matching applications, zero-mean normalized cross-correlation (TM_CCOEFF_NORMED in OpenCV) is the most appropriate choice. Cross-correlation is directly related to the squared difference: expanding the squared difference gives the template's energy, minus twice the cross-correlation, plus the image patch's energy. The template term is constant, so if the patch energy is roughly constant across positions, maximizing cross-correlation approximates minimizing the squared difference while requiring fewer terms to compute. Subtracting the mean from both the template and the image patch makes the measure insensitive to uniform changes in brightness, and normalizing makes it insensitive to uniform changes in contrast while constraining the output to [-1, 1]. If you’re interested in learning more about template matching similarity measures, I recommend Carlo Tomasi’s course notes from Duke and Lisa Brown’s 1992 ACM Computing Surveys (CSUR) paper, “A survey of image registration techniques.”
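The relationship between the measures can be written out explicitly. Using notation along the lines of OpenCV's documentation (template T, image I, with the sums running over all template coordinates (x′, y′) for a placement whose top-left corner is at (x, y)), the squared difference expands into three terms, and the zero-mean normalized score subtracts the means before normalizing:

```latex
\begin{aligned}
\sum_{x',y'} \bigl(T(x',y') - I(x+x',y+y')\bigr)^2
  &= \sum_{x',y'} T(x',y')^2
   - 2 \sum_{x',y'} T(x',y')\, I(x+x',y+y')
   + \sum_{x',y'} I(x+x',y+y')^2 \\[6pt]
R(x,y)
  &= \frac{\sum_{x',y'} T'(x',y')\, I'(x+x',y+y')}
          {\sqrt{\sum_{x',y'} T'(x',y')^2 \; \sum_{x',y'} I'(x+x',y+y')^2}},
  \qquad T' = T - \bar{T}, \quad I' = I - \bar{I}_{x,y}
\end{aligned}
```

Here \bar{T} is the template mean and \bar{I}_{x,y} is the mean of the image patch currently under the template. The middle term of the first line is the cross-correlation. Replacing the patch by aI + b with a > 0 leaves R unchanged, since subtracting the mean cancels b and the normalization cancels a, and by the Cauchy–Schwarz inequality |R| ≤ 1.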

Because we use the correlation coefficient as our similarity measure, the highest value in the output corresponds to the best match. We use OpenCV’s minMaxLoc() method, which returns the minimum value, the maximum value, and their locations as (x, y) points, to get the maximum value and its location.

_, max_zncc_val, _, max_zncc_loc = cv2.minMaxLoc(tm_result)
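One detail worth keeping in mind for the location check that follows: minMaxLoc() reports locations as (x, y) points, the reverse of NumPy's (row, column) indexing, and each location is the top-left corner of a template placement. The NumPy-only sketch below finds the peak of a made-up score map with `np.argmax` and shows how the two conventions relate; it does not call OpenCV.

```python
import numpy as np

# Toy score map with 3 rows (y) and 5 columns (x); the peak is at row 1, column 3
scores = np.zeros((3, 5), dtype=np.float32)
scores[1, 3] = 0.9

# NumPy indexes arrays as (row, col), i.e. (y, x)
row, col = np.unravel_index(np.argmax(scores), scores.shape)

# cv2.minMaxLoc(scores) would report this peak's location as the (x, y) point (3, 1),
# so max_zncc_loc[0] is the x coordinate and max_zncc_loc[1] is the y coordinate
max_loc_xy = (int(col), int(row))
print((int(row), int(col)), max_loc_xy)  # prints (1, 3) (3, 1)
```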

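Given this output, we verify that the logo is in the expected location in the top right corner. The location returned by minMaxLoc() is the top-left corner of the best-matching region, given as (x, y); we require x to be at least 1470 and y to be at most 10 pixels.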

if max_zncc_loc[0] < 1470 or max_zncc_loc[1] > 10:
    print("Failed CDC logo check. Location should be in top right corner!")

We also check that the maximum correlation coefficient is at least 0.4. We found that fraudulent images tend to produce maximum similarity scores in the range 0.15–0.25, while images of valid vaccination cards produce maximum values above 0.5.

elif max_zncc_val < 0.4:  # continuation of the location check above
    print("Failed CDC logo check. Similarity score too low!")

If both above checks pass, then the logo has been detected successfully!

If all three verification checks pass, then the program indicates that a valid COVID vaccination card has been detected!
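Putting the three checks together, here is a minimal sketch that collects which checks failed instead of printing. It is not the tutorial's code: the function name `verification_failures` is hypothetical, and it assumes the inlier count, the title regex result, and the matchTemplate peak have already been computed as in the earlier snippets. The thresholds are the ones quoted above; the logo-position bounds only make sense for the tutorial's aligned image size.

```python
from typing import List, Tuple

# Thresholds taken from the text above
MIN_RANSAC_INLIERS = 11
LOGO_MIN_X = 1470      # best-match x (top-left corner) must be at least this
LOGO_MAX_Y = 10        # best-match y must be at most this
MIN_LOGO_SCORE = 0.4

def verification_failures(ransac_inliers: int, title_found: bool,
                          logo_score: float, logo_loc: Tuple[int, int]) -> List[str]:
    """Return the names of the checks that failed; an empty list means valid."""
    failures = []
    if ransac_inliers < MIN_RANSAC_INLIERS:
        failures.append("ransac_inliers")
    if not title_found:
        failures.append("title")
    x, y = logo_loc  # (x, y) as returned by cv2.minMaxLoc
    if x < LOGO_MIN_X or y > LOGO_MAX_Y:
        failures.append("logo_location")
    elif logo_score < MIN_LOGO_SCORE:
        failures.append("logo_score")
    return failures

# Made-up inputs for illustration
print(verification_failures(23, True, 0.62, (1480, 4)))   # prints []
print(verification_failures(7, False, 0.2, (300, 250)))   # prints ['ransac_inliers', 'title', 'logo_location']
```

Returning a list rather than printing makes it easy to report every failed check at once, while keeping the same if/elif structure as the logo check above.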

