Syncing Post-Its to Miro with YOLOv5 and AWS Rekognition

Pieterjan Criel @pjcr
Product & Engineering at Showpad
6 min read · Sep 12, 2022

Introduction

At my desk I have my laptop in front of me, two big monitors, a pen, and a pack of Post-Its. I love using Post-Its! They’re fantastic for brainstorming, planning, and reminders of random pieces of information.

These Post-Its somehow always end up on my monitors, making them quite expensive surfaces to use for sticky pieces of paper.

We recently started using Miro. I love it! It’s great for brainstorming, planning, and retros.

Can I use it as an alternative to my Post-Its-on-monitors situation? Let’s create a service that detects Post-Its, performs OCR, and syncs that information to Miro; it’s only logical. So let’s dive in.

Labeling and training a detection model

Before we can train a detection model, we need to create a dataset. I took about 40 pictures of my Post-Its and labeled them with Label Studio.

Labeling via Label Studio

I used the Label Studio Docker image.

docker run -it -p 8080:8080 -v `pwd`/mydata:/label-studio/data heartexlabs/label-studio:latest

From Label Studio, you can export a YOLO dataset (an images folder containing all the images and a labels folder with all the labels). The only thing left is to split this dataset into a train and a test set. I chose an 80/20 split: 80% of the data for training and 20% for testing. A quick way to do that split is sketched below.
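The split itself isn’t shown in this post; here’s a minimal sketch of how it could be done, assuming the Label Studio export lives at the dataset root used in the config further down (a test split would work the same way).

import random
import shutil
from pathlib import Path

# Sketch of an 80/20 split over a Label Studio YOLO export.
# The root path matches the dataset root used in postit.yaml below.
root = Path('mydata/media/upload/1')
images = sorted((root / 'images').glob('*.jpg'))
random.seed(42)
random.shuffle(images)

n_train = int(0.8 * len(images))
for split, files in (('train', images[:n_train]), ('validate', images[n_train:])):
    for img in files:
        label = root / 'labels' / f'{img.stem}.txt'
        for src, sub in ((img, 'images'), (label, 'labels')):
            dst = root / sub / split / src.name
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))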

YOLOv5 makes training a custom detection model super easy. All you need to do is create a config file that tells YOLOv5 where your data is (you can configure much more than that, of course). But as I’m already over-engineering this rather than just manually creating stickies in Miro, I’ll stick with the default model parameters.

git clone https://github.com/ultralytics/yolov5.git

My config file (postit.yaml) looks like:

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: mydata/media/upload/1 # dataset root dir
train: images/train # train images (relative to 'path')
val: images/validate # val images (relative to 'path')
test: images/test # test images (optional)
# Classes
nc: 1 # number of classes
names: ['PostIt'] # class names

We’ll start from a pre-trained YOLOv5 checkpoint (yolov5l6.pt), which you can download from the releases page of the YOLOv5 repository.

We’re ready to train. I used a batch size of 4, an image width of 1000px, and 50 epochs. Training ran on an AWS ml.g4dn.xlarge instance and took about 35 minutes to complete the 50 epochs.

python3 yolov5/train.py --batch-size 4 --img 1000 --epochs 50 --data yolov5-config/postit.yaml --weights yolov5l6.pt --project=runs/train --name training --exist-ok
Training results. I could have trained longer or tried variations, but for my purpose this was good enough (x-axis: the 50 epochs)

The training results are saved, and the best model ends up at runs/train/training/weights/best.pt

Yeah! Good enough on the validation set!

Running the prediction and performing OCR

First, the trained YOLOv5 model needs to be loaded so it can be used.

import sys
sys.path.append('yolov5')

from models.common import DetectMultiBackend
from utils.datasets import LoadImages
from utils.general import (check_img_size, non_max_suppression,
                           scale_coords, xyxy2xywh)
from utils.torch_utils import select_device

# getting the weights
post_it_model = 'runs/train/training/weights/best.pt'

# loading the model (using helpers provided in the yolov5 repository)
model = DetectMultiBackend(post_it_model, dnn=False)

I had a YoloDetector class lying around that I was planning to use for something in the future, so I decided to use it here and make the code a little easier to read.

detector = YoloDetector('runs/train/training/weights/best.pt') # nice
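The class itself never made it into this post, so here’s a minimal sketch of what such a wrapper might look like, built on the YOLOv5 helpers imported above. The default thresholds and the dict output format are my own assumptions, not the original implementation.

import sys
sys.path.append('yolov5')

import torch
from models.common import DetectMultiBackend
from utils.datasets import LoadImages
from utils.general import check_img_size, non_max_suppression, scale_coords
from utils.torch_utils import select_device

class YoloDetector:
    """Thin wrapper around the YOLOv5 inference helpers (a sketch, not the original class)."""

    def __init__(self, weights, imgsz=1000, conf_thres=0.4, iou_thres=0.45):
        self.device = select_device('')  # picks CUDA if available, else CPU
        self.model = DetectMultiBackend(weights, device=self.device, dnn=False)
        self.imgsz = check_img_size(imgsz, s=self.model.stride)
        self.conf_thres = conf_thres
        self.iou_thres = iou_thres

    def detect(self, source):
        """Return detections as dicts with x, y (top-left), w, h in original image pixels."""
        dataset = LoadImages(source, img_size=self.imgsz, stride=self.model.stride, auto=self.model.pt)
        detections = []
        for path, im, im0, vid_cap, s in dataset:
            im = torch.from_numpy(im).to(self.device).float() / 255.0  # uint8 -> float in [0, 1]
            if im.ndimension() == 3:
                im = im[None]  # add a batch dimension
            pred = non_max_suppression(self.model(im), self.conf_thres, self.iou_thres)
            for det in pred:
                if len(det):
                    # rescale boxes from the letterboxed tensor back to the original image
                    det[:, :4] = scale_coords(im.shape[2:], det[:, :4], im0.shape).round()
                    for *xyxy, conf, cls in det:
                        x1, y1, x2, y2 = (int(v) for v in xyxy)
                        detections.append({'x': x1, 'y': y1, 'w': x2 - x1, 'h': y2 - y1, 'conf': float(conf)})
        return detections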

Great! Now that we have a detector, let’s crop out the individual detections to run them through AWS Rekognition for OCR. For each detection we get the x, y coordinates and the width and height, relative to the input image.
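The cropping isn’t shown in the post either; with the sketch above it could be as simple as this (the filename is a placeholder).

from PIL import Image

source = 'desk.jpg'  # placeholder filename for the photo of my monitor
image = Image.open(source)
for i, d in enumerate(detector.detect(source)):
    # crop takes (left, upper, right, lower) in pixels
    crop = image.crop((d['x'], d['y'], d['x'] + d['w'], d['y'] + d['h']))
    crop.save(f'postit_{i}.png')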

Looking good!

To use AWS Rekognition with an S3Object reference, the file first needs to be uploaded to S3.

import logging
import os

import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')
rekognition_client = boto3.client('rekognition')

def upload_file(file_name, bucket, object_name=None):
    # default the object name to the local file name
    if object_name is None:
        object_name = os.path.basename(file_name)
    try:
        s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True

def detect_text(image, bucket):
    response = rekognition_client.detect_text(
        Image={'S3Object': {'Bucket': bucket, 'Name': image}})
    return response['TextDetections']

Great! If we now run detect_text on the image above, we get:

[{'DetectedText': 'MODELS',
  'Type': 'LINE',
  'Id': 0,
  'Confidence': 98.6706314086914,
  'Geometry': {'BoundingBox': {'Width': 0.6352776885032654,
                               'Height': 0.3025865852832794,
                               'Left': 0.17596083879470825,
                               'Top': 0.3509500324726105},
               'Polygon': [{'X': 0.17596083879470825, 'Y': 0.4877518117427826},
                           {'X': 0.7732686996459961, 'Y': 0.3509500324726105},
                           {'X': 0.8112385272979736, 'Y': 0.5167348384857178},
                           {'X': 0.2139306217432022, 'Y': 0.6535366177558899}]}},
 {'DetectedText': 'MODELS',
  'Type': 'WORD',
  'Id': 1,
  'ParentId': 0,
  'Confidence': 98.6706314086914,
  'Geometry': {'BoundingBox': {'Width': 0.6352776885032654,
                               'Height': 0.3004652261734009,
                               'Left': 0.17596083879470825,
                               'Top': 0.35201069712638855},
               'Polygon': [{'X': 0.17596083879470825, 'Y': 0.4877518117427826},
                           {'X': 0.7733631134033203, 'Y': 0.35201069712638855},
                           {'X': 0.8112385272979736, 'Y': 0.5167348384857178},
                           {'X': 0.2138361930847168, 'Y': 0.6524759531021118}]}}]

Rekognition returns text detections per line and per word, each with a confidence score and a location. For this little project the detected lines were good enough (I didn’t do much parsing with regard to individual words and their relative confidence scores).
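In practice that can be as simple as keeping the LINE entries and joining them into the sticky’s content. The 90% confidence cut-off below is my own assumption, not something tuned in the original pipeline.

def lines_from_detections(text_detections, min_confidence=90.0):
    # Keep only LINE-level detections with a reasonable confidence and join them.
    lines = [d['DetectedText'] for d in text_detections
             if d['Type'] == 'LINE' and d['Confidence'] >= min_confidence]
    return '\n'.join(lines)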

Post the results to Miro

Now that we have the location of the Post-Its and their content, it’s time to POST these to Miro. To do so, you need to create a developer account (see settings) and generate a token. The Miro Developer pages actually have a nice guide on how to do that.

import requests

def create_payload(content, x, y):
    payload = {
        "data": {
            "content": content,
            "shape": "square"
        },
        "style": {
            "fillColor": "light_pink",
            "textAlign": "center",
            "textAlignVertical": "top"
        },
        "position": {
            "x": x,
            "y": y,
            "origin": "center"
        }
    }
    return payload

def create_sticky(url, content, x, y, token):
    headers = {"Content-Type": "application/json; charset=utf-8",
               "Authorization": f"Bearer {token}"}
    data = create_payload(content, x, y)
    response = requests.post(url, headers=headers, json=data)
    print("Creating Sticky - Status Code", response.status_code)

I have created two methods: create_payload, which builds the payload for the Post-It (x, y and the content), and create_sticky, which POSTs the payload to the board (url).

board_id = "xxxxxxxx" # pick a board by id
url = f"https://api.miro.com/v2/boards/{board_id}/sticky_notes"

This loop goes through all detections, runs the OCR, and creates the Post-Its. The result looks like this:

Side by side with the original. The relative positioning is retained quite well.

Closing thoughts

I’m quite happy with the results. All in all, I only spent about three hours on this little project, start to finish. Lots of improvements are possible (which I’ll probably never get to beyond listing them here):

  • Train on many more backgrounds / sizes / styles of Post-Its;
  • Detection of the color;
  • Detection of the rotation;
  • A service or app where you can choose a Miro board and just take a picture with your phone. YOLOv5 can export to CoreML, so it should be possible to port a YOLOv5 model to an iPhone without a lot of effort. On an iPhone, the OCR component could probably be replaced with Apple’s APIs.
  • Better parsing of the Rekognition output to take alternatives into account, and/or more tests with more challenging handwriting.
