Published in Augmented Startups

OpenCV AI Kit — An Introduction to OAK-1 and OAK-D


Okay, so I just got my OpenCV AI Kit and I am really excited to play with it. But even though I’ve seen a bunch of features on Kickstarter, I don’t quite understand the true capabilities of this device and why I should even care. Well, in this video we’re going to take a deep dive into this device and find out exactly what it’s capable of, and at the end of the video I will share my thoughts on whether this device is worth all the hype, as well as how it matches up to the competition.

What is the OpenCV AI Kit

So before we get into that, let’s start with the first question: what exactly is the OpenCV AI Kit, or OAK? According to the creators, it is a tiny, powerful, open source Spatial AI system. You can think of it as if an embedded 4K camera and a neural compute stick had a baby. Make that two babies: OAK-1 and OAK-D.

Difference between the OAK-1 and OAK-D

Now, what sets these two devices apart is that the OAK-1 has automatic motion-based lossless digital zooming, which works because the sensor has a higher resolution than the final display resolution of the image, so it can crop in without upscaling. The OAK-D, on the other hand, adds stereo depth cameras, which allow for 3D object localization and object tracking in 3D space, which is really cool. In particular, they enable Spatial AI.
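To make the lossless zoom idea concrete, here is a minimal sketch of the arithmetic behind it. The 4056x3040 sensor frame and the 1920x1080 output are my own assumptions for illustration (the Kickstarter page only says 12MP), and the function names are made up, not part of any OAK API.

```python
# ASSUMPTIONS (not from the article): a 4056x3040 full frame for the 12MP
# sensor and a 1920x1080 output stream.
SENSOR_W, SENSOR_H = 4056, 3040
OUTPUT_W, OUTPUT_H = 1920, 1080


def max_lossless_zoom(sensor_w, sensor_h, out_w, out_h):
    """Largest zoom factor that still leaves one sensor pixel per output pixel."""
    return min(sensor_w / out_w, sensor_h / out_h)


def crop_window(center_x, center_y, sensor_w, sensor_h, out_w, out_h):
    """Top-left corner of an out_w x out_h crop centred on a point, clamped to the sensor."""
    left = min(max(center_x - out_w // 2, 0), sensor_w - out_w)
    top = min(max(center_y - out_h // 2, 0), sensor_h - out_h)
    return left, top


zoom = max_lossless_zoom(SENSOR_W, SENSOR_H, OUTPUT_W, OUTPUT_H)
print(f"Max lossless zoom: {zoom:.2f}x")
# Follow some motion detected near the top-right corner of the sensor.
print("Crop origin:", crop_window(3900, 200, SENSOR_W, SENSOR_H, OUTPUT_W, OUTPUT_H))
```

Under those assumptions you get a bit over 2x of zoom before the camera would have to start inventing pixels. The "motion-based" part is then just a matter of moving the crop window to wherever the action is.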

Spatial AI, Ritz, what is that?

Well, my friend, Spatial AI is the capability of an AI system to reason based not just on what it is looking at but also on how far away things are. So the OpenCV AI Kits, specifically the Depth version (OAK-D), allow for real-time Spatial AI by using the RGB camera for deep neural inference and the stereo cameras for depth estimation.
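If you want to picture what "what it is" plus "how far it is" looks like in code, here is a rough sketch. It is not the OAK API: the function name, the toy depth map and the camera intrinsics (fx, fy, cx, cy) are all hypothetical values I picked for the demo. It takes a detection box, grabs the median depth inside it, and back-projects the box centre into 3D with the pinhole camera model.

```python
from statistics import median


def spatial_location(bbox, depth_mm, fx, fy, cx, cy):
    """Return (X, Y, Z) in millimetres for the centre of a detection.

    bbox is (x_min, y_min, x_max, y_max) in pixels; depth_mm is a 2D list
    of depth values in millimetres where 0 means "no depth measured".
    """
    x_min, y_min, x_max, y_max = bbox
    samples = [depth_mm[y][x]
               for y in range(y_min, y_max)
               for x in range(x_min, x_max)
               if depth_mm[y][x] > 0]
    if not samples:
        return None  # nothing in the box had a valid depth reading
    z = median(samples)  # median is robust to stray background pixels
    u = (x_min + x_max) / 2
    v = (y_min + y_max) / 2
    # Pinhole camera model: back-project the pixel (u, v) to 3D at depth z.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z


# Toy 4x6 depth map: background at 3000 mm, an object at 1000 mm.
depth = [[3000] * 6 for _ in range(4)]
for row in (1, 2):
    for col in (3, 4):
        depth[row][col] = 1000

# Hypothetical detector output and intrinsics, chosen only for the demo.
location = spatial_location((3, 1, 5, 3), depth, fx=4.0, fy=4.0, cx=3.0, cy=2.0)
print(location)
```

The detector tells you which pixels belong to the object, the depth map tells you how far away those pixels are, and a little geometry turns that into a position in space.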


Speaking of capabilities, let’s see what this device is capable of. First up, the device specifications for the camera.

Both OAK devices have:

  • The Sony IMX378 image sensor, which allows a max frame rate of 60FPS
  • A resolution of 12MP, which is higher than 4K (roughly 8.3MP)
  • A diagonal field of view, or DFOV, of 81 degrees
  • Autofocus

On the OAK-D, however, we have:

  • Additional stereo cameras with synchronized global shutters, so that they capture their images at exactly the same time
  • A slightly different image sensor, the OmniVision OV9282
  • A lower resolution of 1280x800
  • A frame rate of over 120FPS, with the same FOV as the main camera
  • An F-number of 2.2
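Why do two synchronized cameras give you depth? The same point shows up shifted sideways between the left and right images (the disparity), and depth follows from Z = f * B / d. Here is a minimal sketch using the 1280x800 resolution and 81 degree DFOV from the specs above. The 7.5 cm baseline is my assumption, not a number from the Kickstarter page, and the ideal pinhole model ignores lens distortion.

```python
import math

WIDTH, HEIGHT = 1280, 800   # stereo sensor resolution from the specs above
DFOV_DEG = 81.0             # diagonal field of view from the specs above
BASELINE_M = 0.075          # ASSUMPTION: 7.5 cm between the two stereo cameras


def focal_length_px(width, height, dfov_deg):
    """Focal length in pixels for an ideal pinhole camera with this diagonal FOV."""
    half_diagonal = math.hypot(width, height) / 2
    return half_diagonal / math.tan(math.radians(dfov_deg) / 2)


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres: Z = f * B / d. Zero disparity means 'infinitely far'."""
    if disparity_px <= 0:
        return math.inf
    return focal_px * baseline_m / disparity_px


f_px = focal_length_px(WIDTH, HEIGHT, DFOV_DEG)
print(f"Focal length: {f_px:.0f} px")
for d in (1, 10, 50):
    print(f"disparity {d:>2} px -> {depth_from_disparity(d, f_px, BASELINE_M):.2f} m")
```

Notice how quickly depth grows as the disparity shrinks: far-away objects only shift by a pixel or two, which is why stereo depth gets less precise with distance.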

Myriad X Specs

Now, having these high-end cameras is great, but what’s the use if you can’t utilize their full power? This is where the brains of the kits come into play. They use the Myriad X Vision Processing Unit, or VPU, for processing the visual information from the cameras. If you have ever used the Neural Compute Stick from Intel, you should be familiar with the power that this AI chip provides.

  • All OAK modules with the Myriad X offer a compute capacity of 4 trillion ops/sec (4 TOPS)
  • For comparison, the Jetson Xavier NX has 21 TOPS and the Google Edge TPU has 4 TOPS
  • 16 high-performance SHAVE cores. SHAVE stands for Streaming Hybrid Architecture Vector Engine, an architecture designed primarily for accelerating machine vision processing
  • 20+ vision accelerators
  • 450GB/sec of memory bandwidth


Awesome, okay, I’ve got this device. Now what can I use it with?

  • Okay, Windows: check
  • Ubuntu: check
  • Mac: uuuhh, I don’t have one, too expensive for me, but from sources: check
  • Raspberry Pi, ROS2 and Jetsons: check

Hardware Layout

Looking at the physical hardware: at this time I could only get my hands on the OAK-1, due to the high demand for the OAK-D. Essentially, the device is quite small, at around 65x36mm.

First things first, this thing popping out here is the 12MP RGB camera sensor that we spoke about. It seems bulging cameras are the in thing these days.

Over here is the USB Type-C connector, which provides power and data transmission. Sometimes you’ll also want to hit reset, so there’s an app, I mean there’s a button, for that.

Underneath the heat sink, you will find peripherals such as UART, SPI, I2C and several GPIO pins.


Okay, so we’ve had a look at the hardware. Let’s look at its capabilities.

To get the best idea of the high-level features of the kits, let’s quickly browse through the Kickstarter page.

  • Okay, so they show the OAK kits here. Okay, detect and track anything. Nice to have when you are playing hide and seek at 30FPS.
  • Scrolling down, you can also stream your child crying in real-time 4K with the H.265 codec. Let’s click play. Oh sorry, my bad, you can stream your child running bubbles in 4K at 30FPS.
  • Next, combine live depth and AI. Hmmm, looks like Brandon is experiencing some rapid temperature changes there. Better get that checked out.
  • Just kidding, the color represents the depth data from the camera.
  • The last one we have here is easily train your own neural networks, and that’s really nice. From the demo, I can train Skynet to pick strawberries for me :P. I really like strawberries.

Programming Languages

Okay, you’ve got this kit, it’s plugged in and you’re ready to code. Now what language do you use? It comes natively with Python examples, so that’s really great. But C++ is also supported, as the API was written in C++ with pybind11 for the Python bindings.

There is also support for MicroPython, particularly on the Myriad X in the Pipeline Builder, which we’ll discuss in a moment.

Out of the Box Examples

Now this is where I was most impressed. Normally, manufacturers give you a few ready-made examples and then let you venture off into unknown territory to develop other common apps. With the OpenCV kits, you get a lot right out of the box. Let’s take a look.

  • So we have object detection, which you can use for detecting fruit. They also have an application for mask detection, which we covered in my YOLOv4 course. Speaking of YOLOv4, Brandon mentioned that this kit will soon be able to run tiny YOLOv4. So I can’t wait for that.
  • Moving on, they have face detection. Hey look, it’s Brandon and Satya. I wonder what Brandon saw that made him so surprised.
  • Vehicle detection, number plate detection and OCR… wow, I really like this.
  • Pedestrian detection with reidentification, nice.
  • Pose estimation with 3D location. This would be really great if we could integrate it into Unity. You know, for avatar overlaying.
  • Text detection with OCR, for when you’d rather read wrestling comic books than watch WWE.
  • Lastly, semantic segmentation, but it’s depth assisted. Cool.

I must say I’m really impressed with what comes right out of the box. What more do you need? And of course it is also fully OpenVINO compatible, should you wish to go deeper with the tools.

Pipeline Builder

When I spoke to Brandon during my interview, he mentioned that the kits have a pipeline builder that lets you drag and drop blocks and then generate a script that runs your image processing pipeline. Instead of developing this builder from scratch, they leveraged PyFlow, which is a general-purpose visual scripting framework for Python.

So for example, if I want to build a face detection app, I can just drop in a face detection block and then drag in any other transformations and parameters for customization. I think this will be quite useful for rapid prototyping.
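To give a feel for what "blocks chained into a script" means, here is a tiny conceptual sketch in plain Python. To be clear, this is not the PyFlow API and not what the builder actually generates. Every name here (make_pipeline, crop_center, threshold, count_hits) is a hypothetical stand-in, and the "frame" is just a small grid of numbers.

```python
from functools import reduce


def make_pipeline(*blocks):
    """Chain blocks so the output of each one feeds the next."""
    return lambda data: reduce(lambda value, block: block(value), blocks, data)


# Hypothetical stand-in blocks; a real builder would wrap camera and
# neural-network nodes instead of these toy functions.
def crop_center(frame):
    """Keep the middle half of each row."""
    return [row[len(row) // 4: 3 * len(row) // 4] for row in frame]


def threshold(level):
    """Return a block that marks pixels at or above `level` as 1, others as 0."""
    return lambda frame: [[1 if px >= level else 0 for px in row] for row in frame]


def count_hits(frame):
    """Stand-in for a detector: count the marked pixels."""
    return sum(sum(row) for row in frame)


pipeline = make_pipeline(crop_center, threshold(128), count_hits)
frame = [[0, 200, 250, 0],
         [0, 10, 130, 0]]
print(pipeline(frame))
```

Swapping a block, or changing a parameter like the threshold level, changes the behaviour without touching the rest of the chain, which is the appeal of a drag-and-drop builder for prototyping.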

Stretch Goals

Lastly, let’s look at all the unlocked goals.

  • At the $250k mark, the OAK-D kits will get an IMU. This is really nice, I mean IMUs can be used to assist with image stabilization and motion-based deblurring.
  • They also plan to have a Power over Ethernet variant, meaning that you can run a long cable to connect to your OAKs. By long I mean the length of a football field, whereas USB is limited to just 5 meters.
  • At the $1 million mark, the kits will have an aluminum or aluminum/ case. I think I will just 3D print mine.
  • There are options for Wi-Fi and Bluetooth versions. Now I wonder if they will send the video over Wi-Fi or just use it to transfer information, like for IoT applications.
  • And the final milestone is the FREE model suite. Wait, what do they mean by high quality? Do they mean models with high accuracy and frame rate? Or state-of-the-art models?


Awesome, so we covered a lot of features. My opinion of this device is that it has a lot, I mean a lot, of features and out-of-the-box items, so I would classify this hardware as one of the best options for embedded computer vision AI, not only for its capabilities but also for its form factor and flexibility. It’s easy to use and to get started with, which we shall see in the upcoming tutorials, and because of its partnership with OpenCV, I can only imagine how disruptive this device will be to the market.

If you are interested in enrolling in my best-selling courses on YOLOR and YOLOX, then sign up over here for when they get released: Click Here



Ritesh Kanjee

CEO Augmented Startups — M(Eng) Electronic Engineer, YouTuber 100'000+ Subscribers.