Gesture Control of your FireTV with Python

Tom Clarke
Published in Analytics Vidhya
6 min read · Aug 13, 2021

This post is a continuation of my article on Python control of the FireTV. Gesture control is an interesting, remote-free way to activate the fire-stick's buttons! So in this article I will cover how to set up a basic hand detector using OpenCV and mediapipe, and then how to incorporate it into the fire-stick controller class that was shown in the first article.

To get started, we want to initialise a basic hand detector in a new class. To do this we need to import the mediapipe, OpenCV and math libraries.

import mediapipe as mp
import cv2
import math

We need to define a hand detection object that we can call, and initialise the mediapipe parameters for hand detection in an image/video stream. The hand detection in mediapipe takes a few parameters, and the ones we will focus on are: mode, maximum number of hands to find, detection confidence and tracking confidence. So the starting code will look something like:

class handDetector():
    def __init__(self, mode=False, maxHands=2,
                 detectCon=0.8, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.detectCon = detectCon
        self.trackCon = trackCon
        self.mpHands = mp.solutions.hands
        # keyword arguments are safer here, as newer mediapipe versions
        # insert a model_complexity parameter between these ones
        self.hands = self.mpHands.Hands(
            static_image_mode=self.mode,
            max_num_hands=self.maxHands,
            min_detection_confidence=self.detectCon,
            min_tracking_confidence=self.trackCon)
        # draws mediapipe's hand landmarks on the image
        self.mpDraw = mp.solutions.drawing_utils

Next we need to add a method that takes in an image, processes it and detects the hands. This is very simple to do, but there are a couple of key points to remember. Mediapipe needs the image colour data in RGB format, whereas the input from the OpenCV camera is in BGR format, so the new method, which we will call ‘handProcessor’, will start like this:

def handProcessor(self, img):
    imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

So now we have the RGB version of our image. The next step is to use mediapipe’s processing function and draw the detected landmarks of the hands onto our image, which means we will add the following lines to our method:

    self.results = self.hands.process(imgRGB)
    if self.results.multi_hand_landmarks:
        for self.landmarks in self.results.multi_hand_landmarks:
            self.mpDraw.draw_landmarks(img, self.landmarks,
                                       self.mpHands.HAND_CONNECTIONS)
    return img

At the moment the landmarks we have found in the image can’t be printed, as they are objects of their own, so if we want numerical coordinate values, these objects need to be unpacked. To do this we will add another method called ‘findCoords’, which will return a list of coordinates for the landmarks, plus the coordinates of a bounding box around the hand, which may come in handy later. This method in full looks like:

def findCoords(self, img, handNo=0):
    # handNo defaults to 0 so the method can be called with just an image
    self.lMarkList = []
    xList = []
    yList = []
    bbox = []
    if self.results.multi_hand_landmarks:
        myHand = self.results.multi_hand_landmarks[handNo]
        for ID, LM in enumerate(myHand.landmark):
            h, w, c = img.shape
            # landmark coordinates are normalised, so scale them to pixels
            cx, cy = int(LM.x*w), int(LM.y*h)
            self.lMarkList.append([ID, cx, cy])
            xList.append(cx)
            yList.append(cy)
            cv2.circle(img, (cx, cy), 0, (255, 0, 255), cv2.FILLED)

        xmin, xmax = min(xList), max(xList)
        ymin, ymax = min(yList), max(yList)
        bbox = xmin, ymin, xmax, ymax
        cv2.rectangle(img, (bbox[0], bbox[1]), (bbox[2], bbox[3]),
                      (0, 255, 0), 2)
    return self.lMarkList, bbox

When this method is called, the outputs are a list of x and y coordinates of the hand landmarks, and then the coordinates for the corners of a bounding box around the hand. The bounding box is useful for scaling coordinates to normalize distances between points, particularly when the hand distance from the camera varies. This is not something that I’m going to cover in this post though.
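As a quick sketch of what that scaling could look like (the helper name and shape are my own, not part of the detector class), the pixel distance between two landmarks can be divided by the bounding box diagonal so the result stays roughly constant as the hand moves toward or away from the camera:

```python
import math

def normalisedDistance(p1, p2, bbox):
    # p1 and p2 are [ID, cx, cy] entries from the landmark list;
    # bbox is the (xmin, ymin, xmax, ymax) tuple from findCoords
    xmin, ymin, xmax, ymax = bbox
    diag = math.hypot(xmax - xmin, ymax - ymin)  # hand size in pixels
    dist = math.hypot(p2[1] - p1[1], p2[2] - p1[2])
    return dist / diag
```

A ratio like this can then be compared against a fixed threshold regardless of how far the hand is from the camera.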

So the last useful method that we will add to our hand detector class is a ‘digitStatus’ method. All this will do is return a list of ones or zeros depending on whether each finger/thumb is up or down, and this is very easy. The main thing to note is that the landmark IDs for the tips of each finger are [4,8,12,16,20].

To determine if the finger is up or down, we will take the position of the tip of each finger, and if the y coordinate of the tip is greater than the coordinate of the associated knuckle landmark, then we will state that the finger is down. We are checking if the y coordinate is greater because the top of the image is y=0, and the bottom is the higher y coordinate (max size of image). The thumb is done using a different landmark because of its angle, but is done almost identically, so have a look at the method here:

def digitStatus(self):
    fingers = []
    self.fingerTips = [4, 8, 12, 16, 20]
    # thumb: compare the tip (4) with the landmark just below it (3)
    if self.lMarkList[self.fingerTips[0]][2] < self.lMarkList[self.fingerTips[0]-1][2]:
        fingers.append(1)
    else:
        fingers.append(0)

    # fingers: compare each tip with the joint two landmarks below it
    for ID in range(1, 5):
        if self.lMarkList[self.fingerTips[ID]][2] < self.lMarkList[self.fingerTips[ID]-2][2]:
            fingers.append(1)
        else:
            fingers.append(0)
    return fingers
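To see the tip-versus-knuckle comparison in action without a camera, here is the same logic as a standalone function run on made-up landmark coordinates (purely for illustration, not part of the class):

```python
fingerTips = [4, 8, 12, 16, 20]

def digit_status(lMarkList):
    # standalone version of the same comparison used in digitStatus;
    # lMarkList entries are [ID, cx, cy]
    fingers = []
    # thumb: tip (4) vs the landmark just below it (3)
    if lMarkList[fingerTips[0]][2] < lMarkList[fingerTips[0] - 1][2]:
        fingers.append(1)
    else:
        fingers.append(0)
    # other fingers: tip vs the joint two landmarks below
    for ID in range(1, 5):
        if lMarkList[fingerTips[ID]][2] < lMarkList[fingerTips[ID] - 2][2]:
            fingers.append(1)
        else:
            fingers.append(0)
    return fingers

# fake hand: 21 landmarks all at y=200, then raise only the index finger
fake = [[i, 100, 200] for i in range(21)]
fake[8][2] = 100   # index tip higher in the image (smaller y) than landmark 6
print(digit_status(fake))  # [0, 1, 0, 0, 0]
```

Remember that a smaller y means higher in the image, which is why the raised index finger comes out as a 1.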

So that finalises our hand detection class! The next part is to open up a video stream from the webcam and apply our hand detector, and link it with the fire-stick controller we made in the last post.

If you have written the code for the fire-stick controller class, you can paste it into this document or import the file that contains it, whatever works best for you. Now we want to write our ‘main’ function that holds the main process, so to start we need to open our webcam feed and make an instance of our hand detector and fire-stick controller.

def main():
    cap = cv2.VideoCapture(0)
    detector = handDetector()
    fireStick = FireStickController()
    fireStick.addDevice(fireStickIP)  # define your fire-stick IP here

Next we need a continuous loop that is checking the webcam feed and processing the image. So we can do this with a while loop, either using ‘while True’ or ‘while cap.isOpened()’. The bare essentials that need to go in this loop are:

    while True:
        success, img = cap.read()
        # flipCode=1 mirrors the image horizontally (a flip around the
        # y-axis); skip this if you don't want a mirrored view
        img = cv2.flip(img, flipCode=1)
        img = detector.handProcessor(img)
        lMarkList, bbox = detector.findCoords(img)

That is all it takes to get a continuous feed with constant hand processing. To be able to view the feed, all you need to add to the end of your while loop is:

        cv2.imshow('Feed', img)
        cv2.waitKey(1)

And then just outside the while loop add:

    cap.release()
    cv2.destroyAllWindows()

So this is the part where you can add all the fire-stick control with gestures. In the middle of your while loop, you will need an IF statement that checks whether the landmark list is empty or not. If it is NOT empty, then you know a hand is being detected, so it will look like this:

    while True:
        success, img = cap.read()
        # flipCode=1 mirrors the image horizontally; optional
        img = cv2.flip(img, flipCode=1)
        img = detector.handProcessor(img)
        lMarkList, bbox = detector.findCoords(img)
        if len(lMarkList) != 0:
            ###CONTROLLING CODE###
            pass

        cv2.imshow('Feed', img)
        # press 'q' to quit; otherwise the loop never reaches the cleanup below
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()

For our fire-stick control code, it’s completely up to you which gesture controls which remote button. I will cover a very simple example and let you experiment with the rest!

So I want to pause/play whatever I’m watching when I tap my index finger on my palm, and all this takes is two lines of code. All we need to add to the controlling code section is this:

            if detector.digitStatus() == [1, 0, 1, 1, 1]:
                fireStick.playPause()

Where the playPause method is one that I added to the fire-stick controller class we made in the last article. All this IF statement does is check whether only the index finger is down and, if so, send the play/pause command to the fire-stick over the ADB server.
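If you don’t have the previous article’s class to hand, the gist of that command is an ADB keyevent. The sketch below is my own minimal stand-in, not the original class: it assumes the adb binary is on your PATH and that ADB debugging is enabled on the fire-stick, and it only mirrors the two method names used above:

```python
import subprocess

class FireStickController:
    def __init__(self):
        self.deviceIP = None

    def addDevice(self, ip):
        # connect to the fire-stick over the network
        # (network ADB usually listens on port 5555)
        self.deviceIP = ip
        subprocess.run(['adb', 'connect', ip], check=True)

    def playPause(self):
        # Android keyevent 85 is KEYCODE_MEDIA_PLAY_PAUSE
        subprocess.run(['adb', '-s', self.deviceIP, 'shell',
                        'input', 'keyevent', '85'], check=True)
```

The real class from the last article has more buttons, but they all boil down to sending different keyevent codes in the same way.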

The final step is just to add the commonly used bit of code right at the bottom:

if __name__ == '__main__':
    main()

And that’s it! You can add all kinds of different gestures to control the different functions, it’s totally open to your imagination. I hope you can have some fun with this code and change it to suit your needs or just tinker for the sake of it!

Here is a link to the previous article on the basic control of the fire-stick with python in case you missed it: https://medium.com/@tomaclarke16/controlling-you-fire-tv-with-python-d5e102669066
