How to Build Your Own Computer Vision Model? Part 1

Halil Yılmaz · Analytics Vidhya · Nov 11, 2020 · 4 min read
Photo: https://news.mit.edu/

How do you build your own computer vision model? Let's answer this question by first understanding deep learning.

How does it work?

Deep learning works by imitating biological neural networks. In the human brain, information travels between neurons as small electrical signals passed across synapses.
Artificial neural networks perform similar operations with the help of layers built to imitate this structure. The models we create work the way a baby learns: it sees an object, stores it in memory, and later recognizes it. For example, if we want to build an image-recognition model, we first feed it photographs of the object we want it to recognize. Then, when we show it a different photo containing the same object, we expect it to recognize it. I will cover the details of this recognition process in my next article.

The layers and model architecture are shown above in their simplest form; deep learning basically works with this architecture. Data enters through the input layer, is processed by the hidden layers, and the result comes out of the output layer. The key idea is to find the correct weights by repeatedly updating the W (weight) values. After this brief overview, let's see how images are handled in computer vision.
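To make this concrete, here is a minimal sketch in Python of a forward pass through one hidden layer. The sizes and weights are arbitrary; this toy network is only an illustration, not the model we will train later.

    # A toy fully connected network: input -> hidden -> output.
    # Sizes and weights are made up; this only illustrates the role of W.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.random.rand(4)          # input layer: 4 features
    W1 = np.random.rand(8, 4)      # weights between input and hidden layer
    W2 = np.random.rand(2, 8)      # weights between hidden and output layer

    hidden = sigmoid(W1 @ x)       # hidden layer activations
    output = sigmoid(W2 @ hidden)  # output layer, e.g. 2 class scores
    print(output)
    # Training repeatedly adjusts W1 and W2 until the outputs match the labels.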

When an image is given to the model, the machine breaks it into pixels and converts it into a single-row (1 × n) array. Since this is an introduction, we will go into the details in the second article.
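As a rough illustration with NumPy and Pillow (picture1.jpg is just a placeholder filename), the flattening step looks like this:

    # Read an image and flatten it into a single (1, n) row vector.
    import numpy as np
    from PIL import Image

    img = np.array(Image.open("picture1.jpg").convert("L"))  # grayscale pixel grid
    row = img.reshape(1, -1)                                  # 1 x n array of pixel values
    print(img.shape, "->", row.shape)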

Why does it get things wrong?

One recent news story: an AI camera confused the referee's bald head with the ball and tracked the referee's head instead of the ball.
What do you think could be the reason for this? Yes, a lack of ball photos in the data uploaded to the model beforehand, or a lack of human photographs from every angle and perspective. Unfortunately, computers are the stupidest creatures ever; if you want to teach one something, you have to explain it in great detail.

How are data sets created?

For this, you can take photos from a very nice and large database, Google's Open Images.

https://storage.googleapis.com/openimages/web/index.html

You can access it from the link above. There are useful tools for downloading photos from it; OIDv4_ToolKit, an open-source tool created by EscVM, is very handy for this.
In this series we will use darknet's YOLO framework. The toolkit lets us download the images, and with a small script we can convert their annotations into YOLO format. First of all, let's examine what this format looks like.

YOLO (You Only Look Once) expects a txt file with the same name as each image, for example picture1.jpg and picture1.txt. So what is in these txt files? Each line holds the class number and the coordinates of an object we want to introduce to the model. Class numbers start from 0 and go up to the number of classes minus one. The coordinates are the object's center x, center y, width, and height, normalized to [0, 1]. OIDv4_ToolKit instead gives annotations as left, top, right, and bottom pixel values, so it does not produce txt files in this form; we create them by putting a small Python script in the same directory. A sample txt file should look like the following.
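For illustration only (the numbers below are made up), a picture1.txt describing one object of class 0 might contain:

    0 0.512300 0.463800 0.241000 0.318500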

‘0’ indicates the class. The other four values are the normalized x-center, y-center, width, and height of the object's bounding box.
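Here is a minimal sketch of what such a conversion script (the convert_annotations.py mentioned later) might do. It assumes OIDv4_ToolKit's default layout, where the annotations sit in a Label subfolder as "<class name> <left> <top> <right> <bottom>" in pixel coordinates; the class list and paths are example values.

    # convert_to_yolo.py -- illustrative sketch, not the exact convert_annotations.py
    import os
    from PIL import Image

    CLASSES = ["Ball"]                      # example class list; index becomes the YOLO class id
    IMAGE_DIR = "OID/Dataset/train/Ball"    # example path
    LABEL_DIR = os.path.join(IMAGE_DIR, "Label")

    for label_file in os.listdir(LABEL_DIR):
        if not label_file.endswith(".txt"):
            continue
        image_path = os.path.join(IMAGE_DIR, label_file.replace(".txt", ".jpg"))
        img_w, img_h = Image.open(image_path).size

        yolo_lines = []
        with open(os.path.join(LABEL_DIR, label_file)) as f:
            for line in f:
                parts = line.strip().split()
                # the class name may contain spaces; coordinates are the last four fields
                name = " ".join(parts[:-4])
                left, top, right, bottom = map(float, parts[-4:])
                class_id = CLASSES.index(name)
                # YOLO format: class x_center y_center width height, all in [0, 1]
                x_center = (left + right) / 2 / img_w
                y_center = (top + bottom) / 2 / img_h
                width = (right - left) / img_w
                height = (bottom - top) / img_h
                yolo_lines.append(f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")

        # write the YOLO-format txt next to the image, as darknet expects
        with open(os.path.join(IMAGE_DIR, label_file), "w") as f:
            f.write("\n".join(yolo_lines))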

With the script above, you can convert the txt files into this form and place them in the same directory as the images. The final version should be as follows.
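For example (the class name and file names are illustrative), every image ends up paired with a txt file of the same name:

    Ball/
        Label/            (original OIDv4_ToolKit annotations)
        picture1.jpg
        picture1.txt      (YOLO-format annotation generated by the script)
        picture2.jpg
        picture2.txt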

The steps for cloning and downloading should be as follows:
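For example, assuming EscVM's repository on GitHub:

    git clone https://github.com/EscVM/OIDv4_ToolKit.git
    cd OIDv4_ToolKit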

You can download the toolkit by running the git clone command above.
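The toolkit lists its Python dependencies in a requirements.txt file, which you can install with pip:

    pip install -r requirements.txt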

Install the requirements with the command above.
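A typical downloader call looks like the following; 'Ball' and the limit of 500 are example values, and the exact flags are described in the toolkit's README:

    python3 main.py downloader --classes Ball --type_csv train --limit 500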

And finally, with the command above, you can use the main.py file to download the images and label files for the classes you want. Note: remember, the larger your dataset, the better your model will perform.

After the convert_annotations.py script is run, the directory will match the layout shown earlier, which means the dataset is in YOLO format.

Conclusion

In this article, we touched briefly on the working logic of deep learning and examined how to download and create the data sets used for computer vision. In the next article, we will talk about building a model and processing our data set.
