TV Stations Logo Recognition

Implementing a logo detection application based on Deep Learning deployed on Flutter.

Leonardo Lara
Analytics Vidhya
8 min read · Sep 2, 2019


AI is revolutionizing systems and causing a global paradigm shift.
In countries like Brazil, with its powerful media environment, that shift is especially relevant because citizens have no visibility into how their public measurement systems work, e.g.:

  • Audience measurement (a monopoly held by a single, unreliable institute)
  • Election polls (conducted by manipulated institutes)

The Beep App Project


I was part of the Beep App, a project that proposed giving the population the ability to interact with audience data while also delivering trustworthy, real-time information to advertisers and media agents.


What can we expect from machines in object recognition?

Machines are fantastic: nowadays they are better than humans at recognizing patterns, shapes, colors, and human faces. What's more, they surpass our listening abilities and can clone a person's voice in a matter of seconds.

Object Detection task of the Large Scale Visual Recognition Challenge (LSVRC). Source: image-net.org

Deep learning models have outperformed human-level detection accuracy in object recognition on the ImageNet competition since 2015. This kind of technology has wide applications: surveillance, automatic identification, mapping resources with drones, and so on.

How can an audience measurement system be automated?

To achieve this goal, I trained two deep learning models on the Darknet neural network framework: one captures the TV station logo from a close distance, giving higher detection accuracy, and the other works from a greater distance, making the smartphone more convenient to handle.

The logo objects for detection:

Logos for Object Detection

Here is a demonstration of the final trained convolutional neural network:

Trained Convolutional Neural Network for Logo Detection

I decided to use YOLOv2 (You Only Look Once) because of its excellent accuracy combined with its shorter training time. I couldn't use YOLOv3 because some of its operations aren't supported in the conversion to TensorFlow Lite (FlatBuffer model). I also had to build a lightweight version of the model using the “Tiny” YOLO configuration so it would work properly on mobile.

PROJECT GITHUB:

This project is available in my GitHub repository: (Link)

Part 1: Getting Data

For the close-distance model, I got the data from Google Images using the Python library google_images_download (Link). I demonstrate its use in my GitHub web-scraping script, where I downloaded all 483 Brazilian TV station logos (Link).
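As a rough sketch of that scraping step (the keywords, limit, and output folder below are illustrative placeholders, not the exact values from the project's script):

from google_images_download import google_images_download

# Download logo images for a few example queries (hypothetical keywords).
response = google_images_download.googleimagesdownload()
arguments = {
    "keywords": "Rede Globo logo,SBT logo,Record TV logo",
    "limit": 100,                 # images per keyword
    "format": "png",
    "output_directory": "logos",  # hypothetical output folder
}
paths = response.download(arguments)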

TV Program: National News from Globo Television 2019

For the greater-distance model, I got the data from recorded videos of TV programs on YouTube, which show the logos in the corner of the screen. From these, I extracted at least 2,000 full-screen images per TV station using the FFmpeg video-manipulation tool.


ffmpeg -i RECORD1.mp4 -vf "fps=1" -q:v 2 record_%03d.jpeg

With this command I extracted 2,000 frames from the video (RECORD1.mp4), one per second, generating images named record_xxx.jpeg to feed the model.

Part 2: Data Augmentation

For this purpose, I used the vision module of the Fastai deep learning library on Google Colab. The process is explained in the notebook:

Data Augmentation Script

This notebook applies all kinds of transformations to the TV logo images: illumination changes, resizing, cropping, changes in viewing angle, blur, and so on. Data augmentation is a way to artificially expand your dataset and improve the final accuracy of the model. Here is the data augmentation of the Globo Television logo:

Globo’s Logo Data Augmentation
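A minimal sketch of what the notebook does, using the fastai v1 vision API (the transform parameters and file path below are illustrative, not the notebook's exact values):

from fastai.vision import open_image, get_transforms

# Illustrative augmentation settings: rotation, zoom, lighting and warp changes.
tfms = get_transforms(do_flip=False, max_rotate=15.0, max_zoom=1.2,
                      max_lighting=0.4, max_warp=0.3)

# Re-open the image each time and apply the training transforms to get variants.
augmented = [open_image('logos/globo.png').apply_tfms(tfms[0], size=224)
             for _ in range(8)]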

Part 3: Annotation

So I had at least 2,000 augmented logo images for each TV station. Is that ready for training? Not yet: the training process requires a step called annotation so the model can learn each logo's exact shape and position on the screen.

Yolo Mark Tool

Yolo Mark was chosen because it easily produces annotations in the Darknet standard. For each image, the software generates a .txt file containing the class and the coordinates of the bounding boxes.
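For reference, each line of such a .txt file holds a class index followed by the box center, width, and height, all normalized to the image size; the values below are made up for illustration (a small box near the top-right corner, where a logo usually sits):

0 0.912 0.081 0.094 0.067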

Part 4: Training on Darknet

What does the computer see?


A CNN (convolutional neural network) processes an image through many layers, extracting patterns of shapes and colors with filters (kernels); the result then goes to a classification stage that decides whether the object belongs to a specific class. All of these parameters can be configured for training.
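To give a feel for what a single filter does, here is a toy example (not part of the project's code) that slides a 3x3 edge-detection kernel over a random grayscale image:

import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(416, 416)   # stand-in for a 416x416 input frame
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])  # classic edge-detection kernel
feature_map = convolve2d(image, kernel, mode='same')  # one "channel" of patterns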


I used the AlexeyAB fork of Darknet, a popular version with more options that is faster and more accurate than the original Darknet.

To start training, you basically need three files:

1- (.data file): tells Darknet the paths to the image files and .txt annotations, plus the list of classes (an example is shown after this list).

2- (darknet19_448.conv.23): pre-trained weights used for transfer learning into the new model, so training doesn't start from scratch.

3- (.cfg file): the configuration file. I used the YOLOv2 Tiny file and configured parameters such as filters, batch size, and subdivisions.
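For reference, a Darknet .data file is a small plain-text file along these lines (the class count and paths below are illustrative, not the project's exact files):

classes = 10
train = data/train.txt
valid = data/valid.txt
names = data/logopaca.names
backup = backup/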

So, I started training:

./darknet detector train logopaca.data cfg/logopaca_yolov2-tiny.cfg darknet19_448.conv.23 -map

I added the “-map” option to the command in order to visualize the history of the loss function, which is a penalty score driven by false positives and false negatives. The mAP indicator calculates the mean Average Precision of the model at regular intervals during training, and it's really useful for selecting the best model.

I decided to use the weights from iteration 14,000 instead of later iterations because:

1- The variation in the loss value was almost null at 14,000, which indicates a reduced chance of overfitting compared to later iterations.

2- The mAP was a good 98.74%, which indicates excellent precision in detection.

3- The F1-score was 0.96. It combines precision and recall, so it reflects how well the model detects the specific TV station logos while also accounting for false negatives (a quick sketch of these metrics follows below).
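These metrics come straight from the confusion-matrix counts; a minimal sketch with made-up counts (chosen only to illustrate the arithmetic, not taken from the training log):

# Hypothetical counts, for illustration only.
tp, fp, fn = 960, 40, 40    # true positives, false positives, false negatives

precision = tp / (tp + fp)  # fraction of detections that were correct
recall = tp / (tp + fn)     # fraction of logos that were actually found
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")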

Here is a summary of the 14,000-iteration weight file, showing the precision for each TV station:

Calculation of mAP for the 14,000-iteration weight file.

With the resulting .weights file, I started translating the Darknet model to TensorFlow format using Darkflow.

This command translates it to the Protocol Buffers (.pb) format:

sudo flow --model logopaca_yolov2-tiny.cfg --load logopaca_yolov2-tiny_14000.weights --savepb

From the Darkflow documentation: when saving the .pb file, a .meta file will also be generated alongside it. This .meta file is a JSON dump of everything in the meta dictionary, containing information necessary for post-processing such as anchors and labels. The created .pb file can be used to migrate the graph to mobile devices (Java / C++ / Objective-C++).

With the .pb and .meta files, I converted the graph to the TensorFlow Lite (.tflite) format using “tflite_convert” from the TensorFlow library on Ubuntu:

tflite_convert --graph_def_file=built_graph/logopaca_yolov2-tiny.pb --output_file=built_graph/logopaca_yolov2_tiny_far.lite --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE --input_shape=1,416,416,3 --input_array=input --output_array=output --inference_type=FLOAT
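After the conversion, it's worth checking that the .tflite file loads and has the expected input shape. A minimal sketch using the TensorFlow Lite interpreter in Python (a generic sanity check, not part of the project's code):

import numpy as np
import tensorflow as tf

# Load the converted model and inspect its input/output tensors.
interpreter = tf.lite.Interpreter(model_path="built_graph/logopaca_yolov2_tiny_far.lite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(input_details[0]['shape'])   # expected: [1, 416, 416, 3]
print(output_details[0]['shape'])

# Run one dummy frame through the network as a sanity check.
dummy = np.random.rand(1, 416, 416, 3).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()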

Part 5: Deploy on Flutter

Flutter added support for image streaming to the camera plugin. This makes it possible to capture individual frames from the camera preview. With this ability I was able to connect the stream to the tflite (TensorFlow Lite) plugin and run object detection on those frames in real time.

Camera Plugin: https://pub.dev/packages/camera

First, I call startImageStream on the camera controller to get a CameraImage img (the image format), along with its height, width, and planes (the raw bytes of the image).

controller.startImageStream((CameraImage img) {
  if (!isDetecting) {
    isDetecting = true;
    int startTime = new DateTime.now().millisecondsSinceEpoch;

The tflite plugin exposes the TensorFlow Lite API and supports object detection (SSD MobileNet and YOLO), Pix2Pix, Deeplab, and PoseNet on both iOS and Android.

The plugin has a detectObjectOnFrame function that receives these parameters and returns the recognitions:

Tflite.detectObjectOnFrame(
  bytesList: img.planes.map((plane) {
    return plane.bytes;
  }).toList(),
  model: widget.model == yolo ? "YOLO" : "SSDMobileNet",
  imageHeight: img.height,
  imageWidth: img.width,
  imageMean: widget.model == yolo ? 0 : 127.5,
  imageStd: widget.model == yolo ? 255.0 : 127.5,
  numResultsPerClass: 1,
  threshold: widget.model == yolo ? 0.2 : 0.4,
).then((recognitions) {
  int endTime = new DateTime.now().millisecondsSinceEpoch;
  print("Detection took ${endTime - startTime}");
  widget.setRecognitions(recognitions, img.height, img.width);
  isDetecting = false;
});

To display the recognitions, I chose to put the camera preview inside RotatedBox and OverflowBox widgets. The first rotates the image according to the device orientation (landscape or portrait); the second fits the preview exactly to the screen size.

Widget build(BuildContext context) {
  if (controller == null || !controller.value.isInitialized) {
    return Container();
  }
  var tmp = MediaQuery.of(context).size;
  var screenH = math.max(tmp.height, tmp.width);
  var screenW = math.min(tmp.height, tmp.width);
  tmp = controller.value.previewSize;
  var previewH = math.max(tmp.height, tmp.width);
  var previewW = math.min(tmp.height, tmp.width);
  var screenRatio = screenH / screenW;
  var previewRatio = previewH / previewW;
  return RotatedBox(
    quarterTurns: MediaQuery.of(context).orientation == Orientation.landscape ? 3 : 0,
    child: OverflowBox(
      maxHeight: MediaQuery.of(context).orientation == Orientation.landscape
          ? screenW / previewW * previewH
          : screenH,
      maxWidth: MediaQuery.of(context).orientation == Orientation.landscape
          ? screenW
          : screenH / previewH * previewW,
      child: CameraPreview(controller),
    ),
  );
}

For the bounding boxes drawn over the detections, I took the base code from the ticketdamoa project and made some modifications to improve device compatibility and to show the class name on each box.

Widget build(BuildContext context) {
  List<Widget> _renderBoxes() {
    return results.map((re) {
      var _x = re["rect"]["x"];
      var _w = re["rect"]["w"];
      var _y = re["rect"]["y"];
      var _h = re["rect"]["h"];
      var scaleW, scaleH, x, y, w, h;
      // Scale the normalized box coordinates to the preview filling the screen width.
      scaleH = screenW / previewW * previewH;
      scaleW = screenW;
      var difH = (scaleH - screenH) / scaleH;
      x = _x * scaleW;
      w = _w * scaleW;
      y = (_y - difH / 2) * scaleH;
      h = _h * scaleH;
      if (_y < difH / 2) h -= (difH / 2 - _y) * scaleH;
      return Positioned(
        left: math.max(0, x),
        top: math.max(0, y),
        width: w,
        height: h,
        child: Container(
          padding: EdgeInsets.only(top: 5.0, left: 5.0),
          decoration: BoxDecoration(
            border: Border.all(
              color: Color.fromRGBO(37, 213, 253, 1.0),
              width: 3.0,
            ),
          ),
          // Label the box with the detected class name and confidence.
          child: Text(
            "${re['detectedClass']} ${(re['confidenceInClass'] * 100).toStringAsFixed(0)}%",
            style: TextStyle(
              color: Color.fromRGBO(37, 213, 253, 1.0),
              fontSize: 14.0,
              fontWeight: FontWeight.bold,
            ),
          ),
        ),
      );
    }).toList();
  }

The Final Result:

Close Distance:

Far Distance:
