Creating a Gesture-Controlled Slide Show

Adrian Krywiak
3 min read · Nov 5, 2023

Let's be honest: everyone loves a little twist, especially when it's you nailing that killer presentation. Why settle for pressing buttons to switch slides when you can rule the room with a wave of your hand? That was the motive behind my gesture-controlled presentation system, designed to sprinkle in a little Silicon Valley magic. Here is a quick demo of what I built:

Demo

It all ties back to Python, the language of choice for machine learning projects, with OpenCV handling the video feed so the system can detect movement and tell a 'next slide, please' apart from a 'just scratching my nose'. The math behind that is pretty wicked. The hand tracking itself is the job of MediaPipe, an open-source library that maps out the landmarks of your hand from each camera frame and feeds them to the rest of the program.

Mediapipe Example
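
To give a concrete idea of what that hand mapping looks like in practice, here is a minimal sketch of reading a webcam with OpenCV and running MediaPipe's hand tracker on each frame. It is not my exact code, just the standard pattern the rest of the system builds on.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.7) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                # Draw the 21 hand landmarks and their connections
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("Hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```

Each detected hand comes back as 21 landmarks with normalized coordinates, and those coordinates are what all the gesture logic below works on.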

The System's Inner Workings

On the surface, the concept is relatively simple: using a webcam, the system tracks your hand movements, which in turn control the presentation slides. But the simplicity of the user experience belies the complex math and programming running silently backstage.

The most immediate challenge I faced was the relative scale of the hand. It changes with distance from the camera: a gesture made close to the lens appears larger than the same gesture made further away. To normalize the measurements, I used the apparent size of the hand itself, taking the distance between the pinky and the thumb as a reference length. This technique relies on the Euclidean distance formula, which gives the straight-line distance between two points in space.

Euclidean Distance Formula
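
In code, that normalization can be as small as dividing every distance by the current thumb-to-pinky span. The snippet below is a sketch under that assumption; it uses MediaPipe's fixed landmark indices and its normalized coordinates, and the helper names are mine, not the exact functions from my project.

```python
import math

# MediaPipe Hands landmark indices (fixed by the library)
THUMB_TIP, INDEX_TIP, PINKY_TIP = 4, 8, 20

def euclidean(a, b):
    """Straight-line (Euclidean) distance between two landmarks."""
    return math.hypot(a.x - b.x, a.y - b.y)

def hand_scale(landmarks):
    """Thumb-to-pinky span, used as the reference length for this frame."""
    return euclidean(landmarks[THUMB_TIP], landmarks[PINKY_TIP])

def normalized_distance(landmarks, i, j):
    """Distance between two landmarks expressed in 'hand widths', so the
    value stays roughly constant as the hand moves toward or away from
    the camera."""
    return euclidean(landmarks[i], landmarks[j]) / hand_scale(landmarks)
```

Here `landmarks` is the list you get from `results.multi_hand_landmarks[0].landmark` in the tracking loop above.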

Adjusting for scale was just one part of the process; figuring out what each hand movement meant was the real trick. To do this, I needed to look at angles and positioning. I made sure the system could tell different hand positions apart by measuring how the fingers were arranged relative to one another. For instance, the system knew to flip to the next slide when it saw two fingers held close together at the right angle.
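
As an illustration of that kind of check, the sketch below tests whether the index and middle fingertips are close together and roughly upright. The landmark indices are MediaPipe's; the distance and angle thresholds are placeholder values, not the ones I actually tuned.

```python
import math

THUMB_TIP, INDEX_MCP, INDEX_TIP, MIDDLE_TIP, PINKY_TIP = 4, 5, 8, 12, 20

def dist(a, b):
    """Euclidean distance between two landmarks."""
    return math.hypot(a.x - b.x, a.y - b.y)

def index_angle_degrees(lm):
    """Angle of the index finger against the horizontal, in degrees
    (image y grows downward, so it is flipped to make 'up' positive)."""
    base, tip = lm[INDEX_MCP], lm[INDEX_TIP]
    return math.degrees(math.atan2(base.y - tip.y, tip.x - base.x))

def is_next_slide_gesture(lm):
    """Two fingertips held close together and pointing roughly upward.
    Thresholds here are illustrative guesses."""
    scale = dist(lm[THUMB_TIP], lm[PINKY_TIP])  # hand size in this frame
    fingers_together = dist(lm[INDEX_TIP], lm[MIDDLE_TIP]) / scale < 0.3
    roughly_upright = 60 <= index_angle_degrees(lm) <= 120
    return fingers_together and roughly_upright
```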

The system also had to interpret these gestures in a space where depth matters. It did this by looking at how fast the hand was moving: quick movements were taken as signals to go to the next slide, while slower ones were ignored to avoid accidental slide jumps.
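
A bare-bones version of that speed check might look like the class below: it tracks the wrist's horizontal position from frame to frame, estimates its speed, and only fires when the movement is fast enough and a short cooldown has passed. The thresholds are placeholders, not my tuned values.

```python
import time

class SwipeDetector:
    """Flags a fast horizontal hand movement as a slide change and
    ignores slow drifts. Thresholds are illustrative."""

    def __init__(self, speed_threshold=1.5, cooldown=1.0):
        self.speed_threshold = speed_threshold  # normalized units per second
        self.cooldown = cooldown                # seconds between triggers
        self.prev_x = None
        self.prev_t = None
        self.last_trigger = 0.0

    def update(self, wrist_x):
        """Feed the wrist's x coordinate each frame; returns 'next',
        'previous', or None."""
        now = time.time()
        triggered = None
        if self.prev_x is not None:
            speed = (wrist_x - self.prev_x) / max(now - self.prev_t, 1e-6)
            fast_enough = abs(speed) > self.speed_threshold
            cooled_down = now - self.last_trigger > self.cooldown
            if fast_enough and cooled_down:
                triggered = "next" if speed > 0 else "previous"
                self.last_trigger = now
        self.prev_x, self.prev_t = wrist_x, now
        return triggered
```

Whatever the detector returns can then be mapped to the actual slide change, for example a simulated arrow-key press sent to the presentation software.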

To tell the left hand from the right, I kept it simple. If the system could see your face, it checked which thumb was closer to the edge of the frame. If no face was visible, the presenter was assumed to be turned away from the camera, and the hand labels were flipped.
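
One plausible reading of that check is sketched below. It assumes MediaPipe's face detector decides whether the presenter is facing the camera, uses normalized x coordinates from 0 to 1, and treats the image as mirrored; the exact edge test I used may differ.

```python
import mediapipe as mp

mp_face = mp.solutions.face_detection

def is_facing_camera(rgb_frame, detector):
    """True when MediaPipe's face detector sees at least one face."""
    return bool(detector.process(rgb_frame).detections)

def label_hand(thumb_x, facing_camera):
    """Label a detected hand from its thumb's horizontal position.
    With the presenter facing a mirrored camera image, a thumb on the
    left half of the frame is treated as the right hand; when no face
    is visible, the labels flip. Purely illustrative."""
    on_left_half = thumb_x < 0.5
    if facing_camera:
        return "right" if on_left_half else "left"
    return "left" if on_left_half else "right"

# Usage sketch:
# detector = mp_face.FaceDetection(min_detection_confidence=0.6)
# facing = is_facing_camera(rgb_frame, detector)
# side = label_hand(hand_landmarks.landmark[4].x, facing)
```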

By integrating these principles, the system became a genuinely useful tool, giving the user a seamless, intuitive experience and allowing a more dynamic and engaging presentation style. It was also crucial that the system could run in real time with minimal latency. Optimization became a mantra: every millisecond shaved off the recognition time translated into a smoother experience. That required meticulous code management and leaning on the most efficient algorithms available.
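
For the latency work, the first step is simply measuring where the milliseconds go. Below is a generic per-frame profiling pattern rather than my actual optimization code; the stage names are just examples.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print how long the wrapped block took, in milliseconds."""
    start = time.perf_counter()
    yield
    print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")

# Per-frame usage:
# with timed("hand tracking"):
#     results = hands.process(rgb_frame)
# with timed("gesture classification"):
#     gesture = is_next_slide_gesture(landmarks)
```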

Hey there! I’m Adrian. When I’m not busy convincing my friends that AI won’t take over the world (or will it?), I’m deep-diving into the mesmerizing world of computer vision. I mean, who wouldn’t want computers to recognize a cat from a croissant, right? Some call it a passion, others an obsession, but it’s just Tuesday for me. Are you curious about my other ‘weird’ tech hobbies, or do you want a virtual coffee chat? Hit me up on LinkedIn! If you’re into AI or robots, check out my website.
