The fastest way to analyze models for object detection

Pierre-Nicolas Tiffreau
Picsellia

--

Training an object detection model should be an easy task today, but it can still feel like a struggle. In this article, we will learn how to choose, tune, and train any deep-learning architecture seamlessly to get around this problem.

Do you ever feel like you can’t keep up with all the trendy new models coming out of research labs every day?

It seems like I see plots like this every day:

model benchmark

And then I think:

I used to train YOLOv3 a few years ago and it was awesome. Let’s check out those brand-new architectures and build even better models!

So I start searching for tutorials, GitHub repos, or even pre-trained weights so I can try those fancy new architectures on my own data and see if I can produce better AI models for my company’s tasks.

But the truth is, even when I find the resources needed to train those architectures, I usually end up feeling that the results are not much better than the ones I managed to get a few years ago.

Is it my fault? Did I make a mistake in my code? Maybe my dataset isn’t that great? How can I quickly tell whether one model is better than another?

So how can you really be sure of your ability to train models and reproduce results? Always have the best architectures at your fingertips? And, finally, choose the best algorithms for YOUR business problems?

That’s what we will cover in this article. We will divide the work into a few parts:

  • Choose a suitable dataset (for object detection)
  • Select a bunch of models and create training scenarios
  • Compare the training logs and metrics
  • Compare their performance live on some images

Let’s go!

For the sake of simplicity, we will use the Picsellia platform for this tutorial, as it gives us a great set of tools to do all of the above seamlessly.

Choose a dataset

As we are going to train models for an object detection task, we have to choose a dataset for… you guessed it! Object detection!

In order to get meaningful results, we will train our models on ~6,500 images annotated with 144,000 cars and 106,000 pedestrians, which look like this:

ground truth example

The dataset is called VisDrone and can be found on Picsellia, fully annotated with 11 different classes.

Select models

Picsellia offers a set of ready-to-use SOTA architectures, including the most recent ones (for example, the EfficientDet-dX family).

some available architectures

For our test, we will compare the following architectures (brand-new ones against older ones):

  • efficientDet-d0 (base efficientDet)
  • efficientDet-d2 (heavier efficientDet)
  • faster-rcnn-resnet50 (older model)
  • ssd-mobilenet-640 (lighter older model)

First, let’s create a project where we can schedule a training run for each of these models.

project dashboard

We can see that our dataset is properly attached. Now let’s create the different experiments.

Create the training scenarios

We will create some ‘dummy’ experiments, meaning that we will not try to optimize the hyperparameters in this article (don’t worry, we will cover that subject in the next one).

Instead, I will keep the default parameters proposed by Picsellia and use the very same parameters for each and every experiment, so that we are only comparing the architectures.
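
For illustration, the kind of shared configuration we are talking about could look like this (a minimal sketch; the parameter names and values are hypothetical, not Picsellia’s actual defaults):

    # Hypothetical shared training configuration, kept identical across
    # all four experiments so that only the architecture varies.
    shared_parameters = {
        "batch_size": 8,            # images per training step
        "learning_rate": 1e-3,      # initial learning rate
        "num_train_steps": 20000,   # total optimization steps
        "input_resolution": 640,    # input image size in pixels
    }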

experiment setup panel

We will initialize our different tests through the dedicated UI, where we can select a base model from the ones we saw earlier, with sensible parameters already configured.

After repeating the operation for all the models, we can check that everything is set up.

list of experiments

Looks good! If we open any of the experiments and check the files, we can see that all the files needed for training have been successfully cloned from the pre-trained models.

experiment files

Now we just have to launch the trainings and save all the metrics and logs to the platform, thanks to Picsellia’s Python SDK.
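
As a rough sketch, pushing a metric to the platform from a training loop could look something like this (the client setup and method names follow the general shape of Picsellia’s SDK documentation, but treat them as assumptions and refer to the official docs; the project and experiment names are placeholders):

    # Minimal sketch of metric logging with the Picsellia Python SDK.
    # NOTE: method names are assumptions based on the SDK docs; verify
    # against the official documentation before use.
    from picsellia import Client
    from picsellia.types.enums import LogType

    client = Client(api_token="YOUR_API_TOKEN", organization_name="YOUR_ORG")
    project = client.get_project("visdrone-benchmark")      # placeholder name
    experiment = project.get_experiment("efficientdet-d0")  # placeholder name

    # Inside the training loop, send each new loss value to the platform.
    training_losses = [2.31, 1.87, 1.52]  # stand-in for your real loop
    for step, loss_value in enumerate(training_losses):
        experiment.log("total-loss", float(loss_value), LogType.LINE)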

One way to do this is to open the ‘launch’ tab of one of those experiments and copy the command that runs the pre-packaged Docker image, ready for training.

(For obvious reasons, we have blurred our account and project tokens, but you can replace them with your own.)

docker snippet for training

Now I will just copy-paste this command on our server equipped with NVIDIA GPUs and watch the magic happen!

Let’s do this for the 4 experiments we just created and then wait a few hours for everything to train.

Compare the results

Once your trainings are all finished, go back to your experiment list in Picsellia, select them all, and click ‘Compare’.

You should now see a dashboard with all the different training logs, evaluation metrics, and more.

the experiment dashboard

What can we conclude about our trained models?

To perform a first, admittedly simplistic analysis, we will only look at a few evaluation metrics:

  • the mAP (mean Average Precision)
  • the AR (Average Recall)

And one of the training logs:

  • the total loss

If you don’t know what these metrics are and want to learn about them in detail, I encourage you to check out this blog post, which explains everything in depth!
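
In short, these numbers come from the COCO evaluation protocol. Assuming your ground truth and detections are exported as COCO-format JSON files (the file names below are placeholders), a minimal sketch of computing them with pycocotools looks like this:

    # Minimal sketch: computing mAP and AR with pycocotools,
    # assuming COCO-format ground-truth and detection files.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    gt = COCO("ground_truth.json")        # ground-truth annotations
    dt = gt.loadRes("detections.json")    # model predictions

    evaluator = COCOeval(gt, dt, iouType="bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()

    # evaluator.stats holds the 12 standard COCO metrics, among them:
    #   stats[0] -> mAP averaged over IoU thresholds 0.50:0.95
    #   stats[7] -> AR with at most 10 detections per image (AR@10)
    print("mAP:", evaluator.stats[0], "| AR@10:", evaluator.stats[7])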

total loss for all experiments

Here we can see that our models are learning something, as the loss curves are slowly decreasing. But they are quite noisy…

To address that, since we have a lot of training images, we could try increasing the batch size, making the learning rate decay faster than it currently does (see the next figure), or using other, more advanced techniques.
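
For instance, a faster exponential decay schedule with tf.keras could be sketched like this (the initial rate, decay steps, and decay rate are illustrative values, not the defaults used in these experiments):

    import tensorflow as tf

    # Halve the learning rate every 5,000 steps instead of relying on
    # the slower default decay.
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-3,
        decay_steps=5000,
        decay_rate=0.5,
        staircase=True,
    )
    optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)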

Coming back to the loss plot, we can see that the EfficientDet models seem to converge faster than the others, and that the variance of faster-rcnn does not seem to decrease over time.

Keep in mind that our analysis is in NO WAY exhaustive, and that each architecture would need different parameters to fully optimize its training process. But it gives us a good intuition of how they behave relative to each other, in very little time.

Learning rate for all experiments

If we sort our experiments by AR@10 score (average recall with at most 10 detections per image; the higher, the better), we can see that our faster-rcnn model seems to perform better than the other architectures. This means it is the model least likely to miss objects.

metrics table (part 1)

We can also see that faster-rcnn has the best overall mAP, which means it is the most precise model as well.

Here are the other available metrics (which are variations of mAP or AR); we can see that they follow the same trend.

metrics table (part 2)

As I said, this is not a complete analysis; your job as a data scientist is to explore the models in depth so you can compare them once you have found the best parameters for each one.

Now that we have performed a quick ‘quantitative’ analysis, let’s try our models on some images to see how they perform on real data.

Compare the performance

In Picsellia, you can find what is called a Playground. It is a place where you can try out your models live, right after you have trained them and stored the weights and checkpoints.

For the test, we will use the following image, shown with its ground-truth overlay. This image was used for evaluation, so it is not part of the training set.

Now we will try our models, adjust the confidence threshold, and see how they perform.

Testing all the models

Here are the steps shown in the animation above:

  • Try the efficientDet-d2 model and set a reasonable threshold
  • Try all the other models with the same threshold
  • Adjust the threshold until we can see some pedestrians
  • Try all the models with this threshold
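
Under the hood, adjusting that threshold simply discards every detection whose confidence score falls below it. Here is a minimal sketch of what that filtering looks like (the array layout is the usual detector output convention, not anything Picsellia-specific):

    import numpy as np

    def filter_detections(boxes, scores, classes, threshold=0.5):
        # Keep only detections whose confidence is at or above the threshold.
        keep = scores >= threshold
        return boxes[keep], scores[keep], classes[keep]

    # Lowering the threshold reveals low-confidence objects (e.g. small,
    # distant pedestrians) at the cost of more false positives.
    boxes = np.array([[10, 10, 50, 80], [30, 40, 60, 90]])  # (N, 4) box coords
    scores = np.array([0.92, 0.34])                         # (N,) confidences
    classes = np.array([0, 1])                              # (N,) class ids
    print(filter_detections(boxes, scores, classes, threshold=0.3))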

As we can see, faster-rcnn seems to perform best on our image, which means our earlier quick analysis wasn’t that far from the truth!

results with faster-rcnn

Conclusion

The goal of this article was to see whether we could rapidly identify, use, and compare several model architectures (more or less recent) and perform a quick analysis to guide our future, fuller exploratory experiments.

As we have seen, with this first approach it is not the newest, fanciest architectures that perform best. That doesn’t mean they couldn’t pull ahead with a real, exhaustive training.

The whole thing took me half a day, training included. That’s good news: it means you don’t have to spend days or weeks (or even months) exploring a project to know for sure whether it has a chance to succeed.

I hope you enjoyed this article. Soon, we will dive into a single architecture and see how to efficiently perform a hyperparameter search, and what influence each parameter has on the training process and on the model’s results.

If you think Picsellia offers a great way to run experiments and want to join, claim your 14-day free trial here! We’ll get back to you shortly and get you started in no time.

Please tell me in the comments what you would want me to discuss in the next articles!

See you soon! 👋

--

Tech and software enthusiast. Cofounder and CTO @Picsellia, building end-to-end AI products.