Insights from the 2019 Visual Deep Learning Survey

Anthony Chaudhary
Diffgram
Apr 26, 2019

Thank you very much to everyone who completed the lengthy 23-question survey. The survey took an in-depth look at how deep learning systems are structured and at plans for 2019. Here I add my opinions on the data and include the raw charts.

Top 5 takeaways:

79% of primary use cases are not very time-sensitive.

This stands in stark contrast to the considerable effort spent reducing prediction time from, say, 500 ms to 400 ms. While there’s clearly a need for speed in some cases, they are the exception, not the rule.

89% of annotation work is not easy, requiring hours or days of training, or subject matter experts, e.g. an engineer, doctor, or lawyer.

As models improve, annotation work is expected to get more complex over time. 1/3 of cases already require a subject matter expert!

Less than 10% have an automated method to manage exceptions, and most aren’t sure how to build one.

69% plan to use active learning

Given how new active learning is in the context of deep learning, this is surprisingly high. Active learning is when you use your model’s results as the starting point for the next person’s annotations. Fast Annotation Net is one example.
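
To make the idea concrete, here is a minimal sketch of such a pre-labeling loop in Python. Everything in it (the StubModel, the field names, the least-confident-first ordering) is a hypothetical stand-in, not Fast Annotation Net or any specific tool’s API:

    import random

    class StubModel:
        """Hypothetical stand-in for a trained detector."""
        def predict(self, image):
            # A real model would return detected boxes and confidences.
            return {"boxes": [(10, 10, 50, 50)], "score": random.random()}

    def pre_label(model, unlabeled_images):
        """Draft annotations with the current model, least confident first."""
        drafts = []
        for image in unlabeled_images:
            prediction = model.predict(image)
            drafts.append({"image": image, **prediction})
        # Humans review low-confidence drafts first: a correction there
        # teaches the next training run the most.
        return sorted(drafts, key=lambda d: d["score"])

    queue = pre_label(StubModel(), ["img_001.jpg", "img_002.jpg"])
    for draft in queue:
        print(f"review {draft['image']} (score {draft['score']:.2f})")

The annotator then corrects these drafts instead of labeling from scratch, and the corrected labels feed the next training run.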

In-house annotation is 33% more popular than outsourcing.

And a noticeable percentage plan to use simulated data!

Takeaways

Bias is as much a concern as system cost. Data annotation remains the biggest challenge.

On average, each person selected 3 items. This supports there being multiple large challenges to tackle. A team needs to solve most of these problems to deploy a useful system.

About 50% plan to implement a system within 6 months.

1/3 plan to pilot a system in 2020. This is a bit like the vase-that-looks-like-two-faces illusion. One could say this supports the majority not planning to deploy systems for a while, and/or that the majority are trying to get something out the door but are limited by the challenges mentioned above.

78% of classes are somewhat or very complex

This is consistent with the earlier finding that 89% of annotation work is considered “not easy.”

In house and open source tools dominate.

Some are starting to use commercial tools like Diffgram.

59% use at least 2 models for a single use case

Many public examples focus on training a single model, so it’s interesting to see such a high percentage working with at least 2 models! Even more striking, 12% say they are using 10+ models! This is a direction Diffgram supports through its Enterprise product, but one that would be difficult to implement in house.
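
One common way to use two models for a single use case is to chain them, e.g. a detector that proposes regions followed by a classifier that labels each crop. The sketch below uses hypothetical stub classes, not Diffgram’s actual implementation:

    class Detector:
        """Hypothetical stage 1: propose regions of interest."""
        def predict(self, image):
            return [(0, 0, 100, 100)]  # one (x, y, w, h) box

    class Classifier:
        """Hypothetical stage 2: label a cropped region."""
        def predict(self, region):
            return "defect"

    def crop(image, box):
        # Stand-in for actually cropping the image array.
        return (image, box)

    def run_pipeline(image, detector, classifier):
        """Detector proposes regions; classifier labels each one."""
        results = []
        for box in detector.predict(image):
            label = classifier.predict(crop(image, box))
            results.append({"box": box, "label": label})
        return results

    print(run_pipeline("frame_42.jpg", Detector(), Classifier()))

Coordinating versions, retraining, and data flow across several such stages is much of what makes 10+ model setups hard to build in house.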

74% use only public off-the-shelf models, while 20% use a mixture of those and secret / proprietary models.

I predict this will shift more and more towards public algorithms over time. With AI progress being faster than Moore’s law, it just doesn’t make sense (in most cases) to try to keep something proprietary here.

53% of datasets contain 501–10k samples, and 19% contain more than 10k samples.

No single dataset is larger than 1 billion examples, but a small percentage are working with over 1 million examples per set.

91% plan to work with 2 or more data sets

With a small percentage planning to work with over 100 data sets! Most public work talks about single data sets, so seeing the vast majority work with many at a time is very interesting.

83% plan to retrain models on a regular basis

It seems people are well aware of the concept of model drift, with 1/3 having a daily or weekly retraining plan.
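
A minimal sketch of such a retraining policy, with an assumed weekly interval and an assumed accuracy floor (both hypothetical; tune per use case):

    import datetime

    RETRAIN_INTERVAL = datetime.timedelta(days=7)  # assumed weekly cadence
    ACCURACY_FLOOR = 0.90                          # assumed drift threshold

    def should_retrain(last_trained, live_accuracy, now=None):
        """Retrain on a fixed schedule, or sooner if live metrics slip."""
        now = now or datetime.datetime.now()
        overdue = now - last_trained >= RETRAIN_INTERVAL
        drifting = live_accuracy < ACCURACY_FLOOR
        return overdue or drifting

    last = datetime.datetime(2019, 4, 19)
    print(should_retrain(last, live_accuracy=0.93))  # True once a week has passed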

Who took the survey?

82% from engineering or science roles

53% from software organizations and 21% from engineering or design

70% from teams of 1–30 people and 29% from teams of 31–10,000 people

Thanks again to everyone who took this survey!

Diffgram has great support for the most in-demand approaches, such as object detection, active learning, and regular retraining.

If you need to deploy an AI system, Diffgram (where I work) is one of the best ways to do it! You can create a free account to try it. And if you are an exceptional software engineer, we encourage you to apply.
