How I built my first game filter…

Nikita
May 23, 2022


While playing Instagram game filters, I reached a point of diminishing returns, and a thought popped up: why not create my own small game filter using the computer vision knowledge I have? In this article, I would like to share how I created my own game filter, what I learned while building it, and the journey of how I came to explore and implement it.

MediaPipe hand recognition used for this game filter: here we can clean the screen and get a clear image

Here is the end product of the filter that I created: it cleans the dirt off a glass screen. For this I used MediaPipe, an open-source, cross-platform framework for processing perceptual data of different modalities, such as audio and video [1].

MediaPipe hand perception

This approach provides high-fidelity, real-time hand and finger tracking by inferring 21 key points of a hand in three dimensions. The solution uses an ML pipeline consisting of the following models working together [1]:

Real-time, high-fidelity hand and finger tracking [2]
  1. Palm detector model: the input to this model is the full image, and it returns an oriented hand bounding box. A single-shot detector called BlazePalm is used for this implementation; it can detect hands of various sizes, even when they are partially occluded. The task is very challenging because, unlike the face with its high-contrast patterns around the eyes and mouth, the hand lacks such distinctive features and is highly dexterous. The model achieves an average precision of 95.7% in palm detection.
  2. Hand landmark model: this model operates on the cropped hand region returned by the palm detector model and outputs high-fidelity 3D hand keypoints. To train this model, ~30K real-world images were manually annotated with 21 3D coordinates.
21 key points recognized by the hand landmark model for each hand [2]

  3. Gesture recognizer: classifies the previously computed keypoint configuration into a discrete set of gestures. This classification model will not be used for this use case.

High-level block diagram of the hand perception pipeline [1]
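
To get a feel for the raw API before we wrap it in a class later, here is a minimal sketch that runs the Hands solution in static-image mode and prints the 21 normalized keypoints of the first detected hand. The file name hand.jpg is a placeholder of mine, not part of the original project:

import cv2
import mediapipe as mp

# run the hand landmark pipeline on a single still image
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)
image = cv2.imread("hand.jpg")  # placeholder path; any photo of a hand works
results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
if results.multi_hand_landmarks:
    for id, lm in enumerate(results.multi_hand_landmarks[0].landmark):
        print(id, lm.x, lm.y, lm.z)  # x, y, z are normalized coordinates
hands.close()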

I will share more intricate details of the model's end-to-end implementation, along with a fun project, in another detailed post :).

Now that we have a high-level idea of the model, let us look at the implementation of this game filter.

Implementation

For this implementation, I used the MediaPipe and OpenCV packages with Python. Please install them using the following commands; sticking to these versions makes debugging the code easier.

$ pip install mediapipe==0.8.9.1
$ pip install opencv-python==4.5.5.64
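
To make sure the installed versions match, a quick check (assuming the installs succeeded) is to print them:

import cv2
import mediapipe

print(mediapipe.__version__)  # expect 0.8.9.1
print(cv2.__version__)        # expect 4.5.5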

In the code we first import these packages using:

#basic imports
import cv2
import mediapipe as mp

We will now define a class that sets up the basic parameters required to detect a hand and its corresponding key points [3].

class handTracker():
    def __init__(self, mode=False, maxHands=2, detectionCon=0.6, modelComplexity=1, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.modelComplex = modelComplexity
        self.trackCon = trackCon
        self.mpHands = mp.solutions.hands
        # in mediapipe 0.8.9.1, model_complexity is the third positional argument of Hands()
        self.hands = self.mpHands.Hands(self.mode, self.maxHands, self.modelComplex,
                                        self.detectionCon, self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils

    def handsFinder(self, image, draw=True):
        # MediaPipe expects RGB input, while OpenCV captures frames in BGR
        imageRGB = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imageRGB)
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(image, handLms, self.mpHands.HAND_CONNECTIONS)
        return image

    def positionFinder(self, image, handNo=0):
        positionlist = []
        if self.results.multi_hand_landmarks:
            Hand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(Hand.landmark):
                # landmarks are normalized to [0, 1]; scale them to pixel coordinates
                h, w, c = image.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                positionlist.append([id, cx, cy])
        return positionlist

The handsFinder function tracks the hands in the input frames; if draw is True, the hand landmarks are also drawn on the image. The positionFinder function returns the pixel x and y coordinates of all 21 key points of the hand, which I will be using in this project.
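
To sanity-check the class before building the full filter, here is a minimal sketch of mine (not part of the original filter) that runs the tracker on a single webcam frame and prints the position of the index fingertip, which is landmark 8:

# quick sanity check: detect a hand in one webcam frame
cap = cv2.VideoCapture(0)
success, frame = cap.read()
cap.release()
if success:
    tracker = handTracker()
    frame = tracker.handsFinder(frame)
    positions = tracker.positionFinder(frame)
    if positions:
        print("index fingertip (x, y):", positions[8][1:])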

Now let us create the main function, which uses OpenCV to capture the video. We overlay a mask containing the dirt on top of the original frame image. As we read the camera feed frame by frame, we find the hands using the handTracker class and get the pixel coordinates of the hand landmarks. We then choose two diagonal key points of the hand so that they form a rectangle, analogous to a cloth in our hand that wipes the dirt off the image mask; setting the values inside that rectangle to (255, 255, 255) mimics cleaning the image. Finally, I used OpenCV's weighted blend (cv2.addWeighted) to create the output masked image (i.e., the mask weighted onto the frame with alpha = 0.5) for each input frame.

def main():
    cap = cv2.VideoCapture(0)
    tracker = handTracker()
    # the frame size must match the mask size, otherwise cv2.addWeighted fails
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
    # dirt overlay that we will progressively "clean"
    mask = cv2.imread("image2.jpg", cv2.IMREAD_UNCHANGED)
    mask = cv2.resize(mask, (1280, 720), interpolation=cv2.INTER_AREA)
    while True:
        success, image = cap.read()
        if success:
            image = tracker.handsFinder(image)
            positionList = tracker.positionFinder(image)
            if len(positionList) != 0:
                # rectangle spanned by the index fingertip (8) and the wrist (0)
                # acts as the cloth; painting it white wipes the dirt off the mask
                mask = cv2.rectangle(mask,
                                     (positionList[8][1], positionList[8][2]),
                                     (positionList[0][1], positionList[0][2]),
                                     (255, 255, 255), -1)
            # blend the frame and the dirt mask with equal weights
            masked_image = cv2.addWeighted(image, 0.5, mask, 0.5, 0)
            cv2.imshow("Video", cv2.flip(masked_image, 1))
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

Now, just call the main function :)

if __name__ == "__main__":
    main()

While implementing this, I first tried drawing circles (instead of rectangles) without an image mask, which was inefficient and lowered the performance. I also tried using lines, which produced some artifacts. Hence I settled on a mask combined with the OpenCV rectangle.
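
For completeness, here is roughly what a circle-based "cloth" would look like if combined with the mask approach; the fingertip landmark and the 40-pixel radius are my own arbitrary choices for illustration:

# hypothetical variant: wipe a filled disc around the index fingertip
if len(positionList) != 0:
    center = (positionList[8][1], positionList[8][2])
    mask = cv2.circle(mask, center, 40, (255, 255, 255), -1)  # radius is arbitrary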

I think this implementation is easy for any beginner trying to get started with these frameworks. Once we get the hang of it, we can dive into something a little more challenging; these packages offer a lot of scope for very interesting projects. Do you have any nice ideas in mind? We can work on them :)
