How AI Sees The World: A Glimpse into Computer Vision and Its Revolutionary Applications

8 min read · Oct 23, 2023

Making Computers See and Understand Like Us

Have you ever wondered how computers suddenly got this 'see-all-know-all' vibe? I did, and I'm here to explain it to you all! Before I bombard your mind with the complexity of computer vision, let's give some credit to us humans. We can spot a friend in a massive crowd or a favourite book on a cluttered shelf. Our minds are truly incredible at what they can accomplish through our eyes. Imagine if machines could do the same. Spoiler: increasingly, they can, and they keep improving. Consider this: Every advancement in machine perception trickles down into applications you use daily, from your smartphone's camera to augmented reality games. As machines learn to see, your virtual and real worlds merge in ways previously confined to science fiction.

What is Computer Vision?

Computer vision seeks to recreate our own eyes in machines. But when combined with artificial intelligence's raw computational strength and pattern recognition abilities, it transforms into a supercharged version of 'sight.' I have used computer vision and AI to make a presentation clicker using only my hands and nothing else. Imagine the multitude of physical tasks in your life being simplified or enhanced through gestures alone, from controlling your home's lights to navigating a virtual environment.

Example of Computer Vision and AI working together

From Neurons to Neural Networks

Our human visual process is spectacular. Light enters our eyes, gets processed through intricate neural pathways, and BOOM: we recognize our surroundings. For machines, the process is different; the task starts with the pixels of an image. Enter neural networks, specifically Convolutional Neural Networks (CNNs), which have taken computer vision tasks by storm. If we need to process videos or live camera feeds, we run each captured video frame through a model. Doing that many times per second takes a lot of computing power, which is why self-driving cars carry dedicated hardware. As we break down this process, you'll begin to appreciate the complex intelligence behind the convenience of innovations soon to surround you. Whether shopping with augmented reality or hopping into an autonomous taxi, this neural wizardry will be at its core.
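To make the "one model call per frame" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `toy_model` is a hypothetical stand-in for a real CNN (it only looks at average brightness), and the "video" is three tiny hand-written grayscale frames rather than a camera feed.

```python
from typing import Callable, Iterable, List

Frame = List[List[int]]  # grayscale image: rows of pixel values from 0 to 255


def mean_brightness(frame: Frame) -> float:
    """Average pixel value of one frame."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


def toy_model(frame: Frame) -> str:
    """Hypothetical stand-in for a real CNN: labels a frame by brightness."""
    return "bright" if mean_brightness(frame) > 127 else "dark"


def run_on_stream(frames: Iterable[Frame], model: Callable[[Frame], str]) -> List[str]:
    """Run the model once per frame, in order, as a live camera loop would."""
    return [model(frame) for frame in frames]


# Three tiny 2x2 "frames" standing in for a video feed.
video = [
    [[0, 10], [20, 30]],
    [[200, 210], [220, 230]],
    [[120, 130], [140, 150]],
]
labels = run_on_stream(video, toy_model)
print(labels)
```

The model runs once for every frame, so a 30-frames-per-second feed means 30 model calls every second. That is where the demand for computing power comes from.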

Understanding the Core Components of Convolutional Neural Networks (CNNs)

1. Convolutional Layers: Remember the last time you put on those fancy glasses, and the world seemed sharper? That's what convolutional layers are for computers. Convolutional layers use small filters that look at a patch of neighbouring pixels at a time to extract distinct features from the image. As the filters slide across the picture, these layers can identify edges, textures, and other patterns.

2. Pooling Layers: These layers play a crucial role in downsizing the spatial dimensions of the processed image. Picture them as the study notes for your upcoming exam. You don't want to waste your energy rereading the whole textbook, so you create study notes to make studying more efficient. That's exactly what pooling layers do: they retain only the most salient features, making the network more efficient and reducing the computational burden.

3. Fully Connected Layers: As the name suggests, in these layers, every neuron is connected to every neuron in the subsequent layer. Their primary role is to interpret the features extracted by the convolutional and pooling layers and make informed decisions or classifications based on them.
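The three layer types above can be sketched in a few lines of plain Python. This is a toy illustration, not how a real framework implements them: the 5x5 image, the edge-detecting filter, and the dense-layer weights are all hand-picked assumptions, whereas a real CNN learns its filters and weights during training.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    output = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Keep the largest value in each non-overlapping size x size block."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled


def dense(inputs: List[float], weights: Matrix, biases: List[float]) -> List[float]:
    """Fully connected layer: every input feeds every output neuron."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


# A 5x5 image: dark (0) on the left, light (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked filter that responds where brightness rises from left to right.
vertical_edge = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, vertical_edge)   # 4x4 map, strongest at the edge
pooled = max_pool(feature_map)                   # 2x2 summary of the map
flat = [value for row in pooled for value in row]
# Two hypothetical output neurons: "edge on the left half" and "edge on the right half".
scores = dense(flat, [[0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5]], [0.0, 0.0])
print(feature_map[0], pooled, scores)
```

The convolution turns a 5x5 image into a 4x4 feature map that lights up only at the edge, pooling shrinks that map to 2x2 while keeping the strongest responses, and the dense layer combines those four numbers into two scores.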

How Neural Networks Learn

Remember how we study for exams and learn from our mistakes? Neural networks go through a similar process! They continually update their 'knowledge' by tweaking tiny settings called weights, trying desperately to match their output with the correct answer. Using the dynamic duo of backpropagation and gradient descent, they slowly become masters of tasks like spotting a cat in a picture or dreaming up an entirely new image. It's like their personal growth journey, minus the teenage angst. Just as we grow and learn from our experiences, these networks evolve, driving technological growth. This iterative learning process shapes how we receive recommendations on streaming platforms, get personalized advertisements, and even receive healthcare advice.
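Here is a minimal sketch of that learning loop, shrunk to a "network" with a single weight. The data, the learning rate, and the number of steps are assumptions chosen for illustration. With only one weight, the gradient can be written out by hand; backpropagation is the same chain-rule calculation carried backwards through many layers.

```python
from typing import List

# Toy data following y = 2 * x, so the "correct answer" for the weight is 2.
xs: List[float] = [1.0, 2.0, 3.0]
ys: List[float] = [2.0, 4.0, 6.0]


def loss(w: float) -> float:
    """Mean squared error between predictions w * x and the targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w: float) -> float:
    """Derivative of the loss with respect to w, from the chain rule."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.0               # start with a bad guess
learning_rate = 0.05  # assumed step size
initial_loss = loss(w)

for step in range(100):
    # Gradient descent: nudge the weight in the direction that lowers the loss.
    w -= learning_rate * gradient(w)

final_loss = loss(w)
print(f"learned weight: {w:.4f}, loss went from {initial_loss:.2f} to {final_loss:.6f}")
```

Each pass measures how wrong the prediction is, works out which direction to nudge the weight, and takes a small step. A real network does the same thing for millions of weights at once.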

The Touchless Slide Switcher

While doing my first presentation at TKS (The Knowledge Society), I had to present a slide show and use one of those clickers. The clicker made the presentation feel unnatural, and sometimes it didn't work at all. So, I decided to build an application using computer vision and AI to click the slides for me.

Gesture Recognition: I used MediaPipe, a lightweight hand-tracking system that uses AI to place landmarks on your hand. Then, using some nerdy math, I calculate the relative distances between fingers and the angle of the hand.
You can choose whether your gestures move both back and forth through a presentation or only forward. You can also choose which gesture to use.
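The "nerdy math" might look something like the sketch below. This is an illustration under stated assumptions, not the exact code behind the Touchless Slide Switcher. The landmark numbers follow MediaPipe's 21-point hand model (0 is the wrist, 4 the thumb tip, 8 the index fingertip, 9 the base of the middle finger). The pinch gesture, the 0.3 threshold, and the hard-coded coordinates are hypothetical stand-ins for real tracker output.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downward

# Landmark indices from MediaPipe's 21-point hand model.
WRIST, THUMB_TIP, INDEX_TIP, MIDDLE_MCP = 0, 4, 8, 9


def distance(a: Point, b: Point) -> float:
    """Straight-line distance between two landmarks."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def pinch_ratio(hand: Dict[int, Point]) -> float:
    """Thumb-to-index distance relative to hand size, so it works near or far."""
    hand_size = distance(hand[WRIST], hand[MIDDLE_MCP])
    return distance(hand[THUMB_TIP], hand[INDEX_TIP]) / hand_size


def is_pinch(hand: Dict[int, Point], threshold: float = 0.3) -> bool:
    """Hypothetical gesture rule: fingertips close together count as a pinch."""
    return pinch_ratio(hand) < threshold


def hand_angle_degrees(hand: Dict[int, Point]) -> float:
    """Tilt of the hand: 0 means pointing straight up, 90 means pointing right."""
    dx = hand[MIDDLE_MCP][0] - hand[WRIST][0]
    dy = hand[MIDDLE_MCP][1] - hand[WRIST][1]
    return math.degrees(math.atan2(dx, -dy))


# Hard-coded example landmarks standing in for real tracker output.
pinching_hand = {WRIST: (0.5, 0.9), THUMB_TIP: (0.48, 0.5),
                 INDEX_TIP: (0.5, 0.48), MIDDLE_MCP: (0.5, 0.6)}
open_hand = {WRIST: (0.5, 0.9), THUMB_TIP: (0.2, 0.6),
             INDEX_TIP: (0.5, 0.3), MIDDLE_MCP: (0.5, 0.6)}

print(is_pinch(pinching_hand), is_pinch(open_hand), hand_angle_degrees(pinching_hand))
```

Dividing by the hand size is the key design choice: the same gesture produces the same ratio whether your hand is close to the camera or at the back of the room.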

As workplaces and schools become more tech-integrated, imagine a world where touchless interfaces become the norm, enhancing cleanliness, efficiency, and engagement. Your presentations, lessons, or conferences could soon have a touch of magic.

Demo:

What do AI and computer vision do today?

1. DeepFakes

What It Does: DeepFakes refers to using artificial intelligence to create hyper-realistic but entirely fake content. Initially, this technology was mainly applied to video, especially manipulating video footage of people to make them say or do things they never did. However, the technology has since expanded to audio, images, and more. Concerns arise when this technology is used maliciously, leading to misinformation, fraud, or personal attacks.
Most Prominent Players: DeepDream by Google and FaceApp.
Who to Follow: Alec Radford (OpenAI), Ilya Sutskever (OpenAI), and the 'Deepfake Detection Challenge' led by Facebook for insights into the ongoing efforts to identify and combat deepfakes.
Why Should You Care? In today's digital age, distinguishing between reality and fabrication is more crucial than ever. Imagine seeing a video of a close family member saying something they never uttered or an image of a friend in a place they've never been. Not only can DeepFakes have personal implications, but they also affect public trust and the integrity of information online. Being informed means you can discern, question, and protect yourself from misinformation.

2. Medical Imagery

What It Does: AI-assisted medical imaging uses artificial intelligence to analyze medical images and flag signs of disease or other medical problems. By drawing professionals' attention to areas of concern worth exploring, it assists in early detection and diagnosis.
Most Prominent Players: Aidoc and Google's DeepMind Health.
Who to Follow: Regina Barzilay (MIT professor specializing in AI and its applications in oncology), Pranav Rajpurkar (Stanford ML Group and known for his work on medical imaging datasets).
Why Should You Care? Think about your last medical check-up. Now imagine a scenario where subtle changes, invisible to the human eye, could be detected years before they develop into a severe condition. With the help of AI in medical imagery, early detection can become a reality, offering you, your family, and friends a better chance at early and more effective treatments. It's about adding more healthy years to your life and the lives of your loved ones.

3. Self-driving Cars

What It Does: Artificial intelligence drives the decision-making of autonomous, or self-driving, cars, enabling them to navigate, respond to their surroundings, and travel safely without human oversight. Making split-second decisions to avoid collisions and navigate highways requires interpreting data from cameras, radar, and other sensors.
Most Prominent Players: Waymo (a subsidiary of Alphabet, Google's parent company), Tesla's Autopilot, and NVIDIA's Drive platform.
Who to Follow: Andrej Karpathy (Former Director of AI at Tesla), Sebastian Thrun (Founder of Google's self-driving car team), and Mobileye (an Intel company) for the latest in assisted driving technology.
Why Should You Care? Have you ever thought of the time you could save during commutes, the books you could read, or the extra sleep you could catch up on if you didn't have to focus on driving? Beyond convenience, consider the times you felt too tired to drive but had no choice. Self-driving cars have the potential to reduce road accidents, making travel safer for you and your family. Imagine a world where your elderly parents or young children can be more independent, not constrained by their ability (or inability) to drive.

Where are we headed?

Looking ahead, AI won't just be content watching. Oh no! It aims to predict. Imagine systems that, based on body language, might alert you to crimes that are about to be committed. It's like having your personal psychic, but in binary! But just like any technological advancement, we must be responsible in how we create and use it.

Computer vision combined with AI allows machines to "see" and interpret their surroundings like humans. This guide dives into the components of Convolutional Neural Networks (CNNs) and the applications of computer vision, from detecting hand gestures for a touchless presentation clicker to medical imaging and predicting potential crimes based on body language. As AI and computer vision evolve, the applications stretch far and wide, potentially influencing everything from entertainment choices to safety precautions. Staying informed ensures you're prepared and proactive in this changing landscape.

Citations:

Mandal, Manav. “Introduction to Convolutional Neural Networks (CNN).” Analytics Vidhya, 30 June 2023, www.analyticsvidhya.com/blog/2021/05/convolutional-neural-networks-cnn/.

Page, Aubrey. “What Marketers Need to Know about Video Deepfakes.” Vimeo Blog, 19 June 2023, www.vimeo.com/blog/post/video-deepfakes/.

Faggella, Daniel. “The Future of Medical Machine Vision — Possibilities for Diagnostics and More.” Emerj Artificial Intelligence Research, Emerj, 10 Feb. 2019, emerj.com/ai-podcast-interviews/the-future-of-medical-machine-vision-possibilities-for-diagnostics-and-more/.

Guizzo, Erico. “How Google’s Self-Driving Car Works.” IEEE Spectrum, IEEE Spectrum, 18 Aug. 2022, spectrum.ieee.org/how-google-self-driving-car-works.

“What Is Computer Vision?” IBM, www.ibm.com/topics/computer-vision. Accessed 14 Oct. 2023.

He, Kaiming, et al. “Deep Residual Learning for Image Recognition.” arXiv, arxiv.org/pdf/1512.03385.pdf. Accessed 19 Oct. 2023.

Bojarski, Mariusz, et al. “End to End Learning for Self-Driving Cars.” arXiv, arxiv.org/pdf/1604.07316v1.pdf. Accessed 19 Oct. 2023.

Hey there! I'm Adrian. When I'm not busy convincing my friends that AI won't take over the world (or will it?), I'm deep-diving into the mesmerizing world of computer vision. I mean, who wouldn't want computers to recognize a cat from a croissant, right? Some call it a passion, others an obsession, but it's just Tuesday for me. Are you curious about my other 'weird' tech hobbies, or do you just want a virtual coffee chat? Hit me up on LinkedIn! If you're into AI or robots, check out my website.