20/20 Vision: Dr. Huican Zhu on The Future of Computer Vision

Amino Capital · AMINO insights
Jul 23, 2020 · 12 min read

With recent, rapid advancements in computer vision technologies and their increasing presence in everyday life, it’s unsurprising that the computer vision domain is top-of-mind both for investors looking to deploy capital in a growing space and for startup founders looking to advance it through entrepreneurship. The growth of computer vision has been of particular interest for Amino Capital: as a data-focused venture firm with many successful portfolio companies in the space, such as Orbeus (acquired by Amazon), Grokstyle (acquired by Facebook), Daedalean.ai, AIFI.io, Voyage.auto, BrainKey, and Wyze, Amino seeks to identify new, innovative technologies and help them grow by leveraging the firm’s expertise and resources.

Selected AMINO portfolio companies in computer vision

Additionally, Amino’s team of technologists is doubly interested in the growth of computer vision because of its expertise: partner Dr. Huican Zhu, for instance, is a pioneer of computer vision as the inventor of Google Image Search. Amino Fellow Patricia Tang recently sat down for a conversation with Zhu to learn more about his thoughts on the growth of computer vision over the years, emerging applications in the space, and his advice for potential startup founders in computer vision.

This interview has been lightly edited for clarity.

You invented Google Image Search in 2000. Could you talk about the gap you saw in Google’s functionality at the time, how Google Image Search dealt with that, as well as about the evolution of Google Search over time?

When I started Google Image Search and when we launched the product, the product was very straightforward compared to what you see today. Back then, Google had only web search, which was text-based. You could only search for HTML pages as well as for some text. There was no image search, so I made Image Search.

But our image search engine at the time was also text-only, in the sense that you could only type in something, like “cat” or some celebrity’s name, and you would get a list of images related to your query. You’d get images of cats back, or pictures of Jennifer Lopez. The images came back to you because they happened to appear on webpages that mentioned the keywords you searched. So, the first iteration of Google Image Search was keyword-based.

Today, Google Image Search has definitely evolved a lot, especially with advancements in AI, deep learning, and neural networks. It can do so many things: for example, similar image search and reverse image search. It’s not directly part of Google Image Search, but Google Goggles can even do OCR (Optical Character Recognition) for you: if you give it an image, it can extract all the text in that image. For example, if you take a picture of a restaurant, it can give you the name of the restaurant and its opening hours.

It can do all kinds of other image classification. It can recognize objects for you: if you give it an image it can tell you whether it is a dog, a cat, or a flower. Sometimes it can even describe images: for instance, it can process a picture and be able to say, “This is a kid flying a kite at the beach.” So, overall Google Image Search is doing much more complicated stuff.

As an early adopter of computer vision, how have you seen the computer vision domain change in the past 20+ years? What have been some notable applications of computer vision technology?

When you say computer vision, I think you mean two things: the first is image classification, and the second is object detection. For image classification, basically you give the computer an image and it’ll tell you whether the image is a flower, an animal, or something else. Or, you give the computer an image of a person and it’ll tell you which celebrity it is. So, the technology categorizes whole images. Object detection, in contrast, tells you what’s in an image, especially in the typical case where multiple objects are present. Object detection technologies in self-driving cars, for example, need to detect whether there are pedestrians nearby, what the color of the traffic light ahead is, things like that. I think in both domains, AI and deep learning have helped these technology areas improve quite a bit.
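The distinction Zhu draws can be sketched in a few lines of toy Python. The labels, boxes, and scores below are made-up illustrations, not the output of any real model: the point is only that classification assigns one label to a whole image, while detection returns a list of labeled boxes.

```python
# Toy sketch (not a real model) of the two output shapes described above:
# classification -> one label per image; detection -> many labeled boxes.

def classify(scores):
    """Image classification: pick the single best label for the image."""
    return max(scores, key=scores.get)

def detect(candidates, min_score=0.5):
    """Object detection: keep every object whose confidence clears a bar."""
    return [d for d in candidates if d["score"] >= min_score]

# One label for the whole image...
label = classify({"cat": 0.91, "dog": 0.06, "flower": 0.03})

# ...versus several labeled boxes (x1, y1, x2, y2) within the same image.
objects = detect([
    {"label": "pedestrian",    "box": (40, 60, 90, 200),  "score": 0.88},
    {"label": "traffic light", "box": (300, 10, 330, 80), "score": 0.95},
    {"label": "bicycle",       "box": (10, 10, 20, 20),   "score": 0.30},
])
```

In a real system the scores would come from a trained network, but the shape of the answer is the same: one label versus a list of located objects.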

Such technologies have been applied everywhere in everyday life. All smartphones have facial recognition technology built in today. Some companies deploy facial recognition beyond personal use: for instance, to verify whether you’re an employee of the company.

Grokstyle, one of our portfolio companies, has been in the news lately for its GrokNet technology powering Facebook’s new AI shopping tool, after being acquired by Facebook last year. Grokstyle’s technology helps you identify objects in images. Given a picture, it can tell you what the furniture in the picture is, or, for example, what kind of bag a person is carrying and what its brand is. Grokstyle was working with IKEA, the Scandinavian furnishings company, to deploy its technology. If you want to buy a piece of furniture from IKEA, for instance, you would want to know what it looks like in your home. So, Grokstyle uses AR technology to show you what a table would look like when it’s placed in your family room. That’s all based on computer vision.

With all these applications in mind, including Grokstyle, what do you think have been some important breakthroughs in this field that enable these technologies and where do you see room for growth?

Computer vision technology is enabled by deep learning, but within the broader category of deep learning there are also different technologies.

For image classification, it’s mostly a matter of neural networks. Object detection used to be a very difficult problem because there are many different types of objects, and they may appear in different parts of the image. Nowadays, there are advances like YOLO (You Only Look Once), a deep learning technique that makes object detection very efficient. Because of this, object detection can now be applied everywhere.
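As a minimal illustration of the geometry such detectors rely on, here is a pure-Python sketch of two standard post-processing primitives commonly paired with YOLO-style detectors (this is not the YOLO network itself, just common building blocks): intersection-over-union (IoU), which measures how much two boxes overlap, and non-maximum suppression (NMS), which drops duplicate boxes covering the same object.

```python
# Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.

def iou(a, b):
    """Intersection-over-union: overlap area / combined area of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes that overlap the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep
```

For example, two near-duplicate detections of the same pedestrian would have a high IoU, so NMS keeps only the higher-scoring one; a box elsewhere in the image survives untouched.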

Besides the applications I’ve already mentioned, there’s also process automation based on computer vision. The next breakthroughs in computer vision that we can anticipate will be in healthcare, where we definitely want to see more progress. Today, we rely on doctors to read X-ray images, to understand CT scans, and to diagnose patients based on what they see. But doctors might make mistakes, and different doctors classify diseases differently. Computer vision can definitely make good doctors better.

As a follow-up: how do VCs fit into the picture? More specifically, what are Amino’s priorities with regards to investments in computer vision?

At Amino, our investment theme is data. We focus on big data, so we invest in companies that can utilize data. Computer vision/AI is definitely one area where businesses use big data for analysis and prediction as part of their core business models. So, Amino is really interested in investing in computer vision and object detection, especially in healthcare. We’ve invested, for example, in a company called BrainKey, which uses object detection and computer vision to analyze CT, MRI, and MRA scans of people’s brains. Based on those analyses, it can tell whether someone has a tumor, estimate their brain age, and spot early signs of neurodegenerative diseases.

Generally, I think all the image recognition work in healthcare done currently by doctors and technicians will eventually be replaced by computers. Amino is definitely excited to be in this area and is very interested in investing in this area.

How do you think Amino leads with regards to investing in computer vision technologies, and how does Amino use its platform to be a leader in this space?

At Amino, our partners are all tech-savvy, and we feel very comfortable writing the first check into technology companies, especially in domains like big data and machine learning. That’s our advantage.

Also, we are based in Silicon Valley, which is where technological trends come from. We definitely want to take advantage of both our location and our areas of expertise. By investing in technologies like computer vision and AI, we can do much better than other investors in this regard.

You said that technology trends start in Silicon Valley. I’m curious, though, about where computer vision technologies in particular come from. We see in the case of Grokstyle: here’s a computer vision technology that started in academia at Cornell, not Silicon Valley, and then spun off into the private sector. Where do you see these technologies coming out of mainly? Are they coming out of Silicon Valley, or are they increasingly coming out of academia like in Grokstyle’s case?

That’s a good question. If you look at deep learning overall, the technologies are actually not coming from Silicon Valley. The real force behind recent advancements in deep learning actually comes from academia. But the problem with academia is that academics don’t have the huge amounts of data needed to prove that their algorithms work effectively. A huge company like Google can then pick up the algorithms these academics create and prove their effectiveness with the huge amount of data available to it. As a result, Google is able to use these technologies, improve upon them, and advance these fields.

Even Grokstyle, for example, started at Cornell from Professor Kavita Bala and her PhD student Sean Bell. But, they moved to San Francisco because it’s where the talent, venture funding and tech giants are. A lot of ideas come from academia, but founders move to Silicon Valley if they want to build a successful company.

Besides talent, the companies here in Silicon Valley, for the most part, have a huge amount of data. Large companies like Facebook and Google have a big advantage over smaller companies precisely because they have so much data. For computer vision to work, you have to have data to work on!

So, yeah…academia is still a driving force behind a lot of the technologies.

You spoke about academics interested in scaling their technologies moving to Silicon Valley to take advantage of the region’s resources as well as about taking advantage of the resources of large companies. How do you think early-stage computer vision founders moving to Silicon Valley can build winning companies? In addition, what is the “best path” to success: is it scaling to IPO or shooting for acquisition from one of these big players given the fact that these big players have a lot of data?

Startups need to either partner with big companies that have access to a lot of data or find ways to be able to succeed independently. In the case of Grokstyle, they were able to do something with IKEA as an independent company before the Facebook acquisition, but they still needed a lot of resources in order to integrate completely with IKEA. Facebook definitely gave the company some leverage in that regard.

Compared to those big companies, startups definitely need to find ways to grow their own data. They need to find partnerships with data providers. That doesn’t necessarily mean big tech companies like Facebook or Google…it just means that founders need to figure out how to collect data. That’s definitely a problem that startups have to overcome.

With that in mind, another computer vision company we’ve invested in, Orbeus, was acquired by Amazon in 2015. It had some users before the acquisition, but it definitely grew a lot faster afterwards. It was also considered a very successful acquisition: the Orbeus team joined Amazon, and their ReKognition API was rebranded as the AWS Rekognition API offered to developers. So, that’s one way startups can achieve success and exit: through acquisition.

But, as venture backers, ideally we want to see these startups pursue IPOs, since that’s much better for us as investors, not only financially but also with regards to looking out for companies’ success as organizations. To do that, startups need to figure out ways to pursue partnerships to get data, or become a platform that collects enormous amounts of data itself. In some countries, they might be able to deploy their technologies in the public sector to get data. I think that’s how a lot of companies in China succeeded and became billion-dollar startups, such as SenseTime: they were able to, for instance, deploy facial recognition technology in shopping centers and other public spaces. On the other hand, facial and object recognition companies in the US, such as our portfolio companies Orbeus and Grokstyle, rarely reached billion-dollar valuations. Fortunately, in recent years, we’ve seen startups like Wyze Cam, one of our portfolio companies, take advantage of both computer vision and hardware excellence to better protect families and businesses. The key to their success is that they made AI-powered hardware very affordable, much like the Tesla Model 3. We are excited to see AI startups finally finding a very sustainable business model in the US.

Seems like there are a lot of paths to success for these startups. To that point, seeing as Amino is an early-stage fund, I’m sure that you get to see a lot of emerging technologies early-on and get to help these early-stage founders achieve success. What do you focus on with regards to helping them succeed?

A lot of our founding partners, LPs, and advisors are company executives in Silicon Valley, so those connections are valuable in raising further rounds of funding or pursuing an acquisition. The circle we have in Silicon Valley is also tech-savvy, so we are happy to make connections and introduce top-tier talent when startups are experiencing fast growth.

Another thing is that we’ve invested in a lot of companies, over 160 in fact, so we have our own circles of startups and startup founders. We organize meetings and technical discussions for founders across our portfolio to get together and get to know each other. These kinds of bonds help them succeed.

So sharing the learnings, in a sense.

Yeah.

Given your expertise in computer vision, my final questions center around that. First, how do you see the computer vision landscape changing in the next few years or so and, second, how do you think startup founders and investors can take advantage of these changes to build innovative startups or to invest in new technologies?

Computer vision helps automate a lot of things. Right now, we’re in a very…strange and interesting time during the COVID-19 pandemic. In a lot of situations, we’re trying to avoid contact with each other. So, computer vision is definitely a huge help here in eliminating the need for humans to do things in places like factories and warehouses. We can use computer vision to replace manual work like that. That’ll definitely help with the current situation.

On the other hand, I think the impact on society…there can be some negative impacts. Privacy is definitely an issue. If you deploy computer vision everywhere then people will feel like they’re monitored everywhere. That’s a huge concern for Western society. The other thing is: computer vision will definitely be deployed more and more, and some people will lose their jobs to automation. We need to think about how society deals with the situation where many people lose their jobs due to computer vision and automation.

With more and more data available as well as with advances in algorithms and hardware, I think computer vision will definitely be deployed everywhere for applications in everything from healthcare to autonomous vehicles. I also think robotic process automation will be a major application of computer vision.

Deep learning has definitely been a factor for computer vision’s rapid advancement in recent years. I think we should be really excited about this, and I think VCs and investors want to ride this wave and invest more in startups doing things in the computer vision domain.

Last question as a follow-up to that thought: what do you think the role of VCs is in accelerating the growth of the computer vision field?

When VCs recognize the potential of a certain domain in tech, a lot of money will flow in.

With regards to connections for startups, VCs add value by sharing their experience and their networks, as we have been there and have done that. That can help startup founders form partnerships and find potential buyers/investors to help grow their businesses.

Makes sense, thank you! And thank you so much for your time!

Thank you!

Written by Patricia Tang, Northwestern ’20


Amino Capital (AMINO insights): an early-stage venture firm based in Palo Alto, focused on data-driven technologies