OpenCV AI Kit — Interview with CEO Brandon Gilles

Ritesh Kanjee · Published in Augmented Startups · Aug 6, 2020 · 18 min read

I’m not sure if you have heard about the OpenCV AI Kit, or OAK, devices that have taken Kickstarter by storm! Well, if not, these kits have achieved over $700,000 on Kickstarter, which is phenomenal considering their original $20,000 goal.

To learn why these kits are so amazing, I got on an interview with the CEO of Luxonis, Brandon Gilles.

Interview with Brandon Gilles — CEO of Luxonis and Creator of the OpenCV AI Kit

Ritesh — Hi to Brandon and welcome to the Augmented Startups Virtual Studio!

Brandon — My pleasure to be here.

Ritesh — Q1. So Brandon, tell us a bit about yourself, what do you do?

Brandon — I’m the founder and CEO of Luxonis and the Chief Architect of the OpenCV AI Kit.

Ritesh — Q2. How did you get started in this space? After you graduated from university, what was your journey like up until this point?

Brandon — A mentor of mine quit the company for which we were both working. We both loved the company and the business was (and is still) on a huge upward trajectory when he quit. So in quizzing him on what the hell he was doing, he mentioned that ‘artificial intelligence is the biggest opportunity of my life’.

This was a shock to me.

I hadn’t been following AI at all… the last I had really thought about it was a roommate in college (programming in LISP, in 2004) talking about how useless it is. Fast forward to 2017, and apparently things had changed… but I had no idea.

I started Googling… calling colleagues, etc. And it turned out another one of my mentors (in wireless charging) had even already started his own AI company. I was WAY behind the times.

I spent a ton of time catching back up… mostly reading, writing code, following PyImageSearch tutorials, etc. Discovering that I was 4 years late to the AI party.

A year later, in 2017, after getting back up to speed and investing a year of weekends, with a shaky plan in hand… I too left my job to start a company around embedded computer vision and AI.

The plan was actually to leverage the nascent capabilities in embedded AI-based CV to make an augmented-reality system that laser-tag facilities could install.

In prototyping and building this, tragedy struck around me — a whole slew of friends, family, and contacts were struck by distracted drivers. One was killed, one resulted in a traumatic brain injury, and the least impacted suffered broken backs, hips, and femurs.

So although I was fond of the multiplayer, augmented-reality laser-tag idea (and still am, although The VOID seems to be doing a GREAT job there), I hard-switched to seeing if AI-based embedded computer vision could be used to keep people safe. Like an early warning system to get the car to swerve.

All of the research, prototyping, and experience from this journey up until this point led me to discover three things:

  1. Spatial AI is insanely powerful; the combination of depth sensing with AI is like a cheat-code for solving many problems.
  2. Although it is easy to prototype spatial AI solutions (depth camera + AI accelerator + other stuff), there was no tractable way to embed this into an actual product.
  3. The Intel Movidius Myriad X had been architected to be an embedded spatial AI solution you can build into an actual product, but there was no hardware, firmware, or software platform which allowed its use in this way… it could only be used as a neural accelerator.

And here we are today — with the OAK platform — which allows anyone to embed spatial AI into their products.

Ritesh — Q3. How big is your team?

Brandon — We’re lean and mean, but backed by a huge open source community (OpenCV, OpenVINO, OpenCL, Python, microPython, etc.) which is what makes all of this so powerful. We’re standing on the shoulders of teams of giants.

Ritesh — Q4. Cool, so tell us a bit about the OpenCV AI Kit. I see on Kickstarter it’s doing incredibly well. What is this board all about?

Brandon — The OpenCV AI Kit is the culmination of all of this effort. There are so many industries that can benefit from this. Being believers in the power of disruptive technology, and seeing early signs of how widespread this could be useful, we took it upon ourselves to open source as much as we could — to enable engineers to go build their own things with these embedded super powers.

And so that’s OAK… it is the open source ecosystem of hardware, software, and AI training that gives engineers and makers all over the world the capability to embed never-before-possible, human-like perception into actual products.

Ritesh — Q4.1 What were the pain points you were trying to solve with this kit?

Brandon — Mainly, there was no way to embed spatial AI (i.e. human-like perception) into actual products. This is the first solution that allows it. So for example, you can make glasses for the visually impaired that convey to the user, in real time (through audio), a 3D view of the world. It’s like sci-fi stuff… and we’re able to be a part of that story, to enable these sorts of applications.

So the pain point was that all these really mature, incredibly useful techniques existed for real-time 3D intelligence, but there was no way to actually leverage these in an end product.

Now you can, anyone can.

Ritesh — Q5. What drove you to follow through with this project?

Brandon — We were trying to solve an incredibly serious life-safety problem. And what was standing in the way of solving that problem was this platform not existing.

We had a choice to make… give up on this mission, hope that someone eventually builds the platform so we could solve this problem at some unknown later date, or just build the damn thing ourselves. So we did.

Ritesh — Q6. I see it comes in two flavors, the OAK-D and OAK-1 — for those who are watching, please explain the key difference between these two kits.

Brandon — Yes, so OAK-D is really where all the magic happens. It’s what allows all this spatial AI, this human-like perception, to happen: what an object is, where it is in physical space, and other properties about it, in real time. So for example, where all the strawberries are in a field of view, their exact location in mm, and their approximate ripeness (which would allow a machine to automatically sort them while picking them).

OAK-1 is the limited cousin. Not all problems need spatial information… OAK-1 allows embedding such capabilities in devices where spatial data is not needed. A good example of this is automatic sports filming. In this case OAK-1 can be used to run a neural model that estimates where the action is in a sport, and automatically, losslessly zooms (up to 13x, from 12.3MP to 720p) to film the action and produce HD content of the actual action in the game — just like an amateur cameraman would do.
And spatial information is not necessary in this case… although if you used OAK-D you could get statistics like how far a given player ran in miles, etc. during the game.
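
To make the “lossless zoom” idea concrete, here is a minimal, illustrative Python/OpenCV sketch of the underlying trick (not the OAK firmware): instead of scaling pixels up, you crop a native-resolution 1280x720 window out of the full frame around the detected action. Assuming a 4056x3040 (12.3MP) frame, the pixel-count ratio (4056 × 3040) / (1280 × 720) ≈ 13.4 is where the ~13x figure comes from.

    import cv2

    # Illustration only: crop a native-resolution 720p window out of a 12.3MP frame,
    # centered on wherever a detector says the action is. No upscaling, so no
    # quality loss -- hence "lossless zoom".
    FRAME_W, FRAME_H = 4056, 3040   # assumed 12.3MP sensor frame
    ROI_W, ROI_H = 1280, 720        # 720p output window

    def crop_720p(frame, cx, cy):
        """Return a 1280x720 crop centered on (cx, cy), clamped to the frame edges."""
        x = min(max(cx - ROI_W // 2, 0), FRAME_W - ROI_W)
        y = min(max(cy - ROI_H // 2, 0), FRAME_H - ROI_H)
        return frame[y:y + ROI_H, x:x + ROI_W]

    frame = cv2.imread("full_res_frame.jpg")      # hypothetical 4056x3040 capture
    zoomed = crop_720p(frame, cx=2028, cy=1520)   # center point would come from a detector
    # (4056 * 3040) / (1280 * 720) ~= 13.4 -> the "13x" zoom figure in pixel-count terms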

Ritesh — Q7. Wow! I would like to know about the hardware in the OpenCV AI Kit. What’s powering these units?

Brandon — Both OAK-1 and OAK-D are based on the Intel Movidius Myriad X. The special sauce that enables the new capabilities brought forth by OAK all lies in the optimizations done at the lowest level on this chip… tons of long hours tuning high-bandwidth buses, fitting multiple features temporally into the same 128K slice of high-speed cache, etc.

So with the OpenCV AI Kit we were able to get a ton of functionality out of the Myriad X that is otherwise impossible…

Ritesh — Q8. How do these devices compare to something like the Jetson boards (such as the Jetson Nano) or the Google Coral? Why would people buy your kits over the already available alternatives? What sets your boards apart?

Brandon — It largely depends on the problem you are trying to solve. So unlike the Jetson Nano and Google Coral, all OAK devices are actually cameras — either a single 12.3MP camera as in OAK-1, or the 3-camera spatial AI solution in OAK-D (12.3MP color, 2x 1MP global-shutter grayscale).

So OAK-D is like taking a $200 stereo depth camera, a $200 12MP color camera, and either a Jetson or a Google Coral, and making a baby.

So it’s those 3 things together, in one thing that can be embedded into a product.

Now the interesting thing is that they’re not really competitors. We have customers using OAK-D and OAK-1 as an AI camera on the Jetson Nano, Jetson TX2, Jetson NX, etc. In these cases they are trying to solve complex problems for which OAK-D does the first stages of heavy lifting (3D object localization, feature tracking, optical flow, video encoding, etc.) and the Jetson then does further stages of heavy lifting.

Conversely, there are variants of OAK-1 and OAK-D (also open source) that can be used completely standalone… where the customer can program microPython on either to interface over SPI/I2C/UART with motor controllers, actuators, etc., based on onboard computer vision/AI results. And we have open source reference designs of OAK with an onboard ESP32 for easy AIoT solutions.

So differences are two-fold:

  1. OAK devices are tightly integrated AI camera solutions, both spatial and non-spatial. Not dev boards.
  2. OAK can be embedded without any need for Linux, and with a ~200ms boot time, for ultra-low-power, long-battery-life operation.

Ritesh — Q9. Now with great hardware, comes great firmware and software. What is the API like?

Brandon — So the hardware ecosystem is important, don’t get me wrong — getting the sourcing right, FCC and CE compliance, getting everything modular so it’s easy to integrate into products, the right architecture for the full application space, etc. That’s all important, and we’re super excited to share this wealth of open-source hardware capabilities to allow fast and low-risk building of products off of these FCC/CE-qualified modules.

But, probably 95%+ of the work has been in firmware, software, AI training, etc. and tying this all together with the hardware to make a cohesive, 30-second setup developer experience.

So that’s a core value add of the platform.

We’ve made it so an artist can use this to make an interactive sculpture that for example playfully mimics a bystander’s pose, etc.

You can go from zero to running real-time object localization (i.e. 3D object detection) in 30 seconds, on any platform (Linux, macOS, Windows, etc.). And then you can embed this into a device and have it communicating over SPI with your microcontroller or motor controller moments later. So we’re talking about very few dependencies… this thing will run OpenVINO neural models, yet also talk with an 8-pin microcontroller like an ATTiny8.
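
To give a flavor of what that looks like in code, here is a minimal sketch of a spatial detection pipeline using the DepthAI Python API. Treat it as a sketch: the calls shown reflect the later Gen2-style API and may differ between releases, and the model blob path is a placeholder for an OpenVINO model compiled for the Myriad X.

    import depthai as dai

    pipeline = dai.Pipeline()

    # 12MP color camera provides the frames the detector runs on
    cam = pipeline.create(dai.node.ColorCamera)
    cam.setPreviewSize(300, 300)       # MobileNet-SSD input size
    cam.setInterleaved(False)

    # The two global-shutter grayscale cameras feed stereo depth
    left = pipeline.create(dai.node.MonoCamera)
    left.setBoardSocket(dai.CameraBoardSocket.LEFT)
    right = pipeline.create(dai.node.MonoCamera)
    right.setBoardSocket(dai.CameraBoardSocket.RIGHT)
    stereo = pipeline.create(dai.node.StereoDepth)
    left.out.link(stereo.left)
    right.out.link(stereo.right)

    # Spatial detection network: 2D detections fused with depth, on-device
    nn = pipeline.create(dai.node.MobileNetSpatialDetectionNetwork)
    nn.setBlobPath("mobilenet-ssd.blob")   # placeholder path to a compiled model
    nn.setConfidenceThreshold(0.5)
    cam.preview.link(nn.input)
    stereo.depth.link(nn.inputDepth)

    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName("detections")
    nn.out.link(xout.input)

    with dai.Device(pipeline) as device:
        q = device.getOutputQueue("detections")
        while True:
            for det in q.get().detections:
                # spatialCoordinates are in millimetres, relative to the camera
                print(det.label, det.spatialCoordinates.x,
                      det.spatialCoordinates.y, det.spatialCoordinates.z)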

And conversely, an expert can make the thing sing and dance, using OpenCL (and Intel’s great work there) to compile arbitrary computer vision algorithms for the SHAVES. Then with our pipeline builder, these can be allocated anywhere in the CV pipeline.

So do you have a year’s worth of work built into some proprietary algorithm? Don’t worry, you don’t have to tell us about it; you can use OpenCL to convert it into assembly-level, SHAVE-optimized code that runs in our pipeline builder.

And this gets me to the 3 modalities of OAK:

  1. Very fast, easy, limited flexibility — so some finite number of features (e.g. depth, neural inference, video encoding, feature tracking, optical flow, median filtering, WARP/de-WARP, motion estimation, background subtraction). So these are like discrete building blocks we’ve optimized the hell out of. But there are only so many.
  2. Slow, easy, very flexible — microPython code can be run on the Myriad X directly, allowing things like talking whatever protocol you want over OAK’s SPI, I2C, and UART interfaces — so you can interface with some protocol you may already have on the other side of these interfaces, on a microcontroller or some other equipment. You can run your code as a node… so say you want to also pull in audio data and then run neural inference on it… you can. Implement your own interface code to pull the data from an ADC, for example, and then this output can be fed into a neural inference pipeline. (There is a rough sketch of such a node right after this list.)
  3. Fast, flexible, but requiring experts. OpenCL can be used to convert any computer vision algorithm you have (within the 512MB RAM limit of OAK) to run hardware accelerated on the SHAVES. So if you’re an expert and you have such an algorithm that either isn’t covered by 1 above, or is say your proprietary faster/better version of what’s in 1, you can convert it, and run this custom code with full hardware acceleration from the SHAVES on OAK.
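
Here is the rough sketch of modality 2 promised above. In later DepthAI releases the on-device microPython hook is exposed as a “Script” node, and SPI output goes through an SPIOut node; the exact node and method names below follow that Gen2-style API and should be treated as assumptions that may differ by version. The script just emits a heartbeat; in a real design it would consume neural results and translate them into motor or actuator commands.

    import time
    import depthai as dai

    pipeline = dai.Pipeline()

    # On-device (micro)Python node -- runs directly on the Myriad X.
    script = pipeline.create(dai.node.Script)
    script.setScript("""
    import time
    while True:
        buf = Buffer(4)
        buf.setData(list(b'PING'))      # in a real pipeline: encode detection results here
        node.io['to_mcu'].send(buf)
        time.sleep(1)
    """)

    # Push each message out over SPI to an attached microcontroller (e.g. an ESP32).
    spi = pipeline.create(dai.node.SPIOut)
    spi.setStreamName("cmd")
    spi.setBusId(0)
    script.outputs['to_mcu'].link(spi.input)

    with dai.Device(pipeline):
        time.sleep(60)   # the pipeline keeps running on the device; the host just waits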

Ritesh — Q9.1 What software examples come straight out of the box?

Brandon — It’s a huge list… but here’s a quick from-the-brain rundown:

  • 3D object localization working with object detectors like MobileNet-SSD v1/v2, tiny YOLOv3, and YOLOv3, with YOLOv4-VPU in progress. So this mode we call ‘monocular neural inference fused with stereo disparity depth’. In short, it runs a standard 2D object detector and fuses the bounding box results with 3D depth data to get a 3D object detector. This is super valuable as it means you don’t need to retrain the networks — you can use your standard 2D-trained networks — and more importantly, you don’t need 3D training data to get 3D results.
  • 3D feature (and object) localization leveraging stereo neural inference. So this means running the same neural model in parallel on both cameras. This is super useful for tiny objects or for neural networks that return single feature points — like facial landmark estimators, pose estimators, etc. And it’s also useful for objects that semi-global-matching depth estimation algorithms struggle with — like shiny things, chrome, mirrors, glasses, etc. So stereo AI lets you get 3D locations of things you couldn’t with standard disparity depth and/or SGBM techniques.

So those are the two core Spatial AI modes. Then there’s a pile of other computer vision capabilities that can be tied together in arbitrary series and parallel combinations — only limited by the overall speed of the system and total RAM.
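
As a back-of-the-envelope illustration of the first mode, fusing a 2D bounding box with a depth map, here is a short sketch of the math. It is a simplification, not the on-device implementation, and it assumes a pinhole camera model with a depth map aligned to the detector’s frame; the numbers in the usage comment are hypothetical.

    import numpy as np

    def locate_3d(depth_mm, bbox, fx, fy, cx, cy):
        """Fuse a 2D detection with a depth map to get a rough 3D position.

        depth_mm        HxW depth image in millimetres (e.g. from stereo disparity)
        bbox            (xmin, ymin, xmax, ymax) of the detection, in pixels
        fx, fy, cx, cy  pinhole intrinsics of the camera the detection came from
        returns         (X, Y, Z) in millimetres, in the camera coordinate frame
        """
        xmin, ymin, xmax, ymax = bbox
        roi = depth_mm[ymin:ymax, xmin:xmax]
        z = np.median(roi[roi > 0])        # robust depth estimate; ignore invalid zeros
        u = (xmin + xmax) / 2.0            # bounding-box center in pixels
        v = (ymin + ymax) / 2.0
        x = (u - cx) * z / fx              # pinhole back-projection
        y = (v - cy) * z / fy
        return x, y, z

    # e.g. locate_3d(depth, (640, 300, 820, 560), fx=870.0, fy=870.0, cx=640.0, cy=360.0)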

  • h.264 and h.265 encoding. Including live ROI adjustment — so allowing live zoom based on AI results, or motion results, or both. So since the RGB camera is 12MP, this allows 13x zoom with 720p encoded video output, or 6x zoom for 1080p encoded video output.
  • Feature tracking including optical flow and IMU-assist
  • Motion estimation
  • Background subtraction
  • ‘SmartMotion’ — a combination of motion detection and neural inference which leverages the motion estimation hardware blocks to run neural inference only on sections of the camera feed that are moving — greatly increasing object detection distance for fixed installs (such as automatic sports filming).

There’s a bunch more features like that as well… but I can’t remember the rest off the top of my head.

We use PyFlow to allow dragging and dropping these features together. PyFlow outputs a JSON as well… so this allows programmatically defining the pipeline easily, which has interesting use cases I won’t get into here.

So this covers the computer vision and AI capabilities (including the spatial AI modes). Now there is a slew of things that can be run with a single CLI line (or by clicking buttons, if you so choose):

  • Standard class stuff like person, TV/monitor, desk, sheep, etc.
  • Person detection for ADAS, retail, etc.
  • Various face detection including facial landmark, age estimation, facial expression (happy, sad, etc.)
  • And a whole slew of others like this.

And then we have a slew of models that we’ve released with example programs, which are also just a single CLI command (or button) to get running, including:

  • Strawberries
  • Mask/no-mask/improperly-worn mask
  • Helmet
  • High-vis vest
  • Safety goggles
  • And then a slew like that.

And for example apps, we show how to MJPEG stream, make a social distance monitor, pick strawberries, etc.

Ritesh — Q10. For people just starting out in computer vision and embedded AI, how easy is it for someone to get started with your kit, considering the scenarios of the student, the graduate, or even the hobbyist?

Brandon — So our overarching goal is to allow an artist with no programming experience to use the platform to do their bidding. Specifically, to allow an artist to make a sculpture that mimics you. And funnily enough, we’ve had exactly that happen: an artist took it, looked at the files, and exported what was necessary to control Arduino-based servos.

And then we have had an undergrad student (over the course of a weekend) get from zero to a robot picking strawberries.

So it’s very easy. We believe in what we call ‘discoverability’, and we architect everything around that.

So our definition of that is this: designing something to be ‘discoverable’ means that even if it is crazy-powerful, the power never gets in your way. It never slaps you in the face. You can get up and running with the absolute minimal effort… all configuration parameters are set to intelligent defaults, and it’s one button or one line of code to get the whole thing doing what it is designed to do — perceive the world like a human, and give results in real time.

So getting to that point on OAK is under 30 seconds.
And then this is where the ‘discoverability’ part comes in… the user has now got the full power up and running, it just works, but they don’t know why, and they don’t know what settings they can control.

So now with this fully working system, they can go poke around and see what to change. And if they break something, they can revert back to a known-working state easily.

This is the opposite of many very-powerful systems… where to get the thing running at all, you have to configure all the settings, know all the variables, and figure out what to set them to that’s valid and gets the thing to run.

So our goal — and what we’ve made — is the opposite: it just works. And if you don’t want to know why, you don’t have to.

But if you start digging… you’ll discover that there’s a ton of power and flexibility just below the surface that is at your full control.

Ritesh — Q11. I saw that there is a drag-and-drop visual programming interface called the pipeline builder. Can you tell us a bit more about that?

Brandon — Yes. So we started out architecting our own UI, in addition to, of course, all the firmware and software support. Then our hardware engineer, poking around GitHub over a weekend while thinking about what we were making, found PyFlow and PyFlowOpenCV.

It was the exact UI we were rebuilding from scratch (ours was web-based though), and was very intuitive and easily supported everything we needed from the UI.

And best yet, it also supports OpenCV. So the whole idea of the pipeline builder is to be able to flexibly define a pipeline that then runs entirely on OAK, unaided.

But we also architected it so that between any nodes in the pipeline, the data can flow to or from the host. This was intended for QA and automated testing, for example of the whole pipeline or sections of the pipeline, with scripted/automated tests — with OAK still running it exactly as it would in the field.

So with PyFlow, we realized that this didn’t have to just be an automated test capability… the pipeline could seamlessly integrate with a host at any stage — and still in the same visual pipeline builder.

And the architecture already supports this.

So that’s a cool thing about it… you can bring any stage back to the host and use host-side OpenCV functions in the same graphical pipeline and seamlessly send back results to OAK.

The pipeline builder also supports making custom nodes that execute microPython code directly on OAK. So you can make nodes that interact with the outside world through any of the interfaces that OAK supports (including SPI, I2C, UART, etc.).

An example use case I like to mention here is making a robot that automatically follows you. OAK-D nodes return metadata indicating that it’s you (the 3D object localization ROI fed into person identification), and you can put a microPython node right after this that, say, sends out the commands to drive your motor controllers.

Social Distancing Monitoring App — https://augmentedstartups.info/yolov4release

Ritesh — Q12. Now, my courses have been mainly focusing on YOLOv4 over the past few months. Would there be a possibility of porting something like YOLOv4 tiny to this platform any time soon?

Brandon — Yes. So AlexeyAB had actually mentioned making an optimized version specifically for the Myriad X. And definitely before OAK ships we will have YOLOv4 tiny running on the platform. YOLOv3 tiny already works (and we include a custom retrained mask/no-mask/improper-mask YOLOv3 tiny model, and we share the full Colab Notebook used to train this).

We’re doing some more work on pose estimation now for the Spatial AI competition Phase I winners, which paused YOLOv4 tiny work… but we’ll be on it soon and it could be running as soon as a couple weeks.

Ritesh — Q13. What are the future plans for these kits?

Brandon — So we’ve been at this kit since 2018… and we’re already starting on the Gen2 OAK. The nice thing for OAK backers is they’ll have an upgrade path where anything they make with OAK now will just get faster and more capable, and since the design is modular — they can just pop in a Gen2 module and off they go. We’re talking 2021-ish.

Ritesh — Q14. How do you see these kits being used to combat real-world issues like COVID-19?

Brandon — I swear I didn’t plant this question! Actually, we are on a robot (Akara.ai’s Violet) that automatically cleans using UV-C. It uses OAK-D for spatial perception (i.e. don’t run into things) and, more importantly, for telling where people are and disabling the UV-C light, because UV-C is terrible for the eyes.

So then the other use case is embedded spatial mask detection with social distancing. So counting how many times people who weren’t with each other cross paths where one is not wearing a mask.

Probably the most important use case is for the visually impaired. The visually impaired get yelled at, have things thrown at them, or worse, end up accidentally too close and catch COVID-19. Many people who are visually impaired or blind simply do not leave their houses now.

Depression among the visually impaired and blind is already over 3x that of the rest of society.

With OAK-D, a visual assistance device can be made to provide this spatial awareness, helping people with visual impairments navigate and maintain social distance with confidence.

We had over 60 entries for such a device in our OpenCV Spatial AI Competition sponsored by Intel. And we awarded quite a few. We will be doing a follow-up competition specifically for this sort of device.

Ritesh — Q15. So how do people watching this video get their hands on these kits, and when can they expect them to be delivered and in stores?

Brandon — So the Kickstarter campaign is done in the spirit of Kickstarter: pooling orders together to allow a guy in his garage to get the pricing of a corporation buying a 31k-unit order… The trade-off is that even our suppliers don’t hold stock at this level, so the lead times on the components are ~10 weeks. So with that, coupled with manufacturing, test, and distribution, we’re looking at December for the Kickstarter backers.

But you can buy it now! We have full-MSRP units, produced at lower volume, in stock now. So if you’re watching this and have a pressing problem and need to get going now, we can get you units now. In fact, there are over 15 separate teams already building, in parallel, the visual-assistance device I mentioned, using OAK-D.

And several hundred customers building devices around OAK-1 and OAK-D already.

Ritesh — Q16. I see you also plan to release a crash course — here at Augmented Startups we are all about training and making learning easier through courses. Would you be interested in partnering up with us here at Augmented Startups for a collaboration on a course?

Brandon — Yes. We’re all about increasing the reach and actionability of machine learning and computer vision systems. So let me get you in touch with Dr. Mallick from OpenCV on this one — I asked him and he’s interested to see what you have in mind. Could be some cool courses here with augmented reality.

Ritesh — Q17. And lastly how do people contact you if they have any questions?

Brandon — So we have a community Slack, which has me and the other engineers in it, and also a discussion forum.

Ritesh — Awesome! Thank you so much, Brandon. It was a pleasure having you on the show, and we’re looking forward to innovating with your OpenCV AI Kits.

Brandon — Thank you for having me on the show.

If you are interested in enrolling in my upcoming course on YOLOv4, then sign up over here to enroll now — Click Here


Ritesh Kanjee — CEO of Augmented Startups, M(Eng) Electronic Engineer, YouTuber with 100,000+ subscribers.