Did you know you can train your deep learning model in your browser?

Giscle
Jan 19, 2018

As we near completion of the Perception system of our Autonomy stack, we thought of sharing a survey of the available deep learning frameworks and segmentation models. We use a few of them on an everyday basis, but there are also a few frameworks that we have not used yet. Some, such as TensorFlow, Keras, PyTorch and MXNet, are very popular; some, such as Theano and Caffe, are becoming outdated; and a few can even run in your browser. We have tried to cover all the frameworks; if we have missed any, please let us know in the comments so that we can include them in this blog. We have also been working on semantic segmentation, so we discuss the available segmentation models as well.

Deep Learning Frameworks:

TensorFlow — One of the most popular deep learning libraries out there, TensorFlow was developed by the Google Brain team and open-sourced in 2015. Positioned as a second-generation machine learning system, TensorFlow is used primarily through its Python API and can run on multiple CPUs and GPUs. It is available on desktop and mobile platforms. It also supports other languages such as C++ and R, and can be used to create deep learning models either directly or through wrapper libraries built on top of it.

Keras — Although TensorFlow is a very good deep learning library, creating models using only TensorFlow can be a challenge, as it is a fairly low-level library and can be quite complex for a beginner. To tackle this challenge, Keras was built as a simplified interface for building efficient neural networks in just a few lines of code, and it can be configured to work on top of TensorFlow. Written in Python, Keras is lightweight, easy to use, and straightforward to learn. For these reasons, TensorFlow has incorporated Keras as part of its core API.
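
To give a feel for the "few lines of code" claim, here is a minimal sketch of a small classifier defined through the Keras API bundled with TensorFlow. It assumes TensorFlow 2.x is installed; the input size (4 features), the layer widths and the 3 output classes are hypothetical values chosen only for illustration.

```python
from tensorflow import keras

# Hypothetical toy problem: 4 input features, 3 output classes.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),     # hidden layer
    keras.layers.Dense(3, activation="softmax"),  # class probabilities
])

# Configure the optimizer, loss and metric used during training.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Total number of trainable weights and biases in the two Dense layers.
num_params = model.count_params()
```

Training would then be a single call to `model.fit` on your own feature and label arrays.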

Caffe — Built with expression, speed, and modularity in mind, Caffe is one of the earliest deep learning libraries, developed mainly by the Berkeley Vision and Learning Center (BVLC). It is a C++ library with a Python interface, and its primary application is modelling convolutional neural networks (CNNs). One of the major benefits of this library is that a number of pre-trained networks are available for immediate use from the Caffe Model Zoo. The library is efficient at modelling CNNs and solving image processing problems. Following in the footsteps of Caffe, Facebook recently open-sourced Caffe2, a new lightweight, modular deep learning framework that offers greater flexibility for building high-performance deep learning models.

Torch — A Lua-based deep learning framework that has been used and developed by big players such as Facebook, Twitter and Google. It makes use of C/C++ libraries as well as CUDA for GPU processing. Torch was built to achieve maximum flexibility and to make the process of building models extremely simple. More recently, the Python implementation of Torch, called PyTorch, has become popular and is gaining rapid adoption.

PyTorch — A Python package for building deep neural networks and performing complex tensor computations. While Torch uses Lua, PyTorch leverages the rising popularity of Python, allowing anyone with basic Python programming knowledge to get started with deep learning. PyTorch improves upon Torch’s architectural style: models no longer have to be assembled from container modules and can instead be written as ordinary Python code, which makes the entire modelling process easier and more transparent.

Deeplearning4j — DL4J is a popular deep learning framework written in Java that also supports other JVM languages. It is widely used as a commercial, industry-focused distributed deep learning platform. The advantage of DL4J is that it brings together the power of the whole Java ecosystem for efficient deep learning, as it can run on top of popular Big Data tools such as Apache Hadoop and Apache Spark.

MXNet — One of the deep learning frameworks with the broadest language support, including R, Python, C++ and Julia. This is helpful because anyone who knows one of these languages can train deep learning models without stepping out of their comfort zone. Its backend is written in C++ and CUDA, and, like Theano, it manages its own memory. MXNet is also popular because it scales very well and can work with multiple GPUs and machines, which makes it very useful for enterprises. This is one of the reasons why Amazon made MXNet its reference library for deep learning. In November 2017, AWS announced the availability of ONNX-MXNet, an open-source Python package for importing ONNX (Open Neural Network Exchange) deep learning models into Apache MXNet.

CNTK — The Microsoft Cognitive Toolkit (CNTK) is an open-source toolkit for training deep learning models. It is highly optimized and supports languages such as Python and C++. Known for its efficient resource utilization, it can be used to implement efficient reinforcement learning models or generative adversarial networks (GANs). It is designed for high scalability and performance, and is reported to provide performance gains over toolkits such as Theano and TensorFlow when running on multiple machines.

Deeplearn.js — With deeplearn.js, one can now train neural network models right in the browser! Originally developed by the Google Brain team, deeplearn.js is an open-source, JavaScript-based deep learning library that runs on both WebGL 1.0 and WebGL 2.0.

BigDL — A distributed deep learning library for Apache Spark, designed to scale very well. With BigDL, one can run deep learning applications directly on Spark or Hadoop clusters by writing them as Spark programs. It offers rich deep learning support and uses Intel’s Math Kernel Library (MKL) to ensure high performance. BigDL can also load pre-trained Torch or Caffe models into Spark. For adding deep learning functionality to a massive dataset stored on a cluster, this is a very good library to use.

Segmentation Models:

SegNet — A deep encoder-decoder architecture for multi-class pixelwise segmentation, researched and developed by members of the Computer Vision and Robotics Group at the University of Cambridge, UK.
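
One detail of SegNet worth illustrating, although it is not spelled out in the description above, is that its decoder upsamples feature maps by reusing the max-pooling positions recorded in the encoder. The following is a minimal pure-Python sketch of that idea on a single toy 4x4 feature map; the function names and the values are made up for illustration, and a real network would apply this per channel between learned convolution layers.

```python
def max_pool_2x2_with_indices(fmap):
    """Encoder step: 2x2 max pooling that also records where each max was."""
    pooled, indices = [], []
    for r in range(0, len(fmap), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]), 2):
            # All (value, position) pairs inside the current 2x2 window.
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, height, width):
    """Decoder step: put each pooled value back at its recorded position."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
pooled, indices = max_pool_2x2_with_indices(feature_map)
restored = max_unpool_2x2(pooled, indices, 4, 4)
```

The restored map is sparse: only the positions that held a maximum are filled, and in the real architecture the following decoder convolutions densify it.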

ICNet — Real-time semantic segmentation finds many practical applications, yet it faces the fundamental difficulty of reducing a large portion of the computation needed for pixel-wise label inference. ICNet, an image cascade network based on a compressed PSPNet, incorporates multi-resolution branches under proper label guidance to address this challenge. The system yields real-time inference on a single GPU card, with decent-quality results on the challenging Cityscapes dataset.

R-CNN — This approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Because the method combines region proposals with CNNs, its authors call it R-CNN: Regions with CNN features. They also compare R-CNN to OverFeat, a sliding-window detector based on a similar CNN architecture, and report that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset.

Full-Resolution Residual Networks (FRRN) — A ResNet-like architecture for semantic segmentation that exhibits strong localization and recognition performance. It combines multi-scale context with pixel-level accuracy by using two processing streams within the network. One stream carries information at the full image resolution, enabling precise adherence to segment boundaries. The other stream undergoes a sequence of pooling operations to obtain robust features for recognition. The two streams are coupled at the full image resolution using residuals. According to the authors, the approach performs well without additional processing steps and without pre-training.
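
The coupling of the two streams can be sketched on a toy one-dimensional signal. This is only an illustration under loud assumptions: the function names are hypothetical, and the averaging followed by a ReLU is a stand-in for the learned convolutions that a real FRRN uses at this point.

```python
def max_pool(signal, factor):
    """Downsample by taking the max over non-overlapping windows."""
    return [max(signal[i:i + factor]) for i in range(0, len(signal), factor)]


def upsample(signal, factor):
    """Nearest-neighbour upsampling: repeat every value `factor` times."""
    return [value for value in signal for _ in range(factor)]


def coupling_unit(residual_stream, pooling_stream, factor):
    """One toy step that lets the two streams exchange information."""
    # Bring the full-resolution stream down to the pooled resolution.
    pooled_residual = max_pool(residual_stream, factor)
    # Stand-in for the learned convolutions: average the two inputs, then ReLU.
    new_pooling = [max(0.0, 0.5 * (p + r))
                   for p, r in zip(pooling_stream, pooled_residual)]
    # Feed the result back to full resolution and add it as a residual.
    correction = upsample(new_pooling, factor)
    new_residual = [z + c for z, c in zip(residual_stream, correction)]
    return new_residual, new_pooling


residual = [1.0, 2.0, 3.0, 4.0]   # full-resolution stream
pooling = [2.0, 0.0]              # pooled stream at half resolution
new_residual, new_pooling = coupling_unit(residual, pooling, 2)
```

Because the full-resolution stream is only ever updated by addition, fine boundary detail is carried through the network while the pooled stream gathers wider context.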

Pyramid Scene Parsing Network (PSPNet) — PSPNet exploits global context information through different-region-based context aggregation, performed by its pyramid pooling module. The authors report that this global prior representation is effective at producing good-quality results on the scene parsing task, and that PSPNet provides a superior framework for pixel-level prediction tasks. The approach achieved state-of-the-art performance on various datasets at the time of publication: it came first in the ImageNet scene parsing challenge 2016, the PASCAL VOC 2012 benchmark and the Cityscapes benchmark. A single PSPNet set a new record of 85.4% mIoU on PASCAL VOC 2012 and an accuracy of 80.2% on Cityscapes.
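
The idea behind pyramid pooling can be sketched in plain Python on a single-channel toy feature map: average the map over grids of several sizes, upsample each result back to the original size, and stack everything as extra channels. The function names, the bin sizes and the values below are assumptions for illustration; the actual PSPNet works on multi-channel features and adds learned convolutions and smoother upsampling.

```python
def average_pool_to_bins(fmap, bins):
    """Average the map over a bins x bins grid (sizes must divide evenly)."""
    height, width = len(fmap), len(fmap[0])
    bin_h, bin_w = height // bins, width // bins
    pooled = []
    for i in range(bins):
        row = []
        for j in range(bins):
            cells = [fmap[r][c]
                     for r in range(i * bin_h, (i + 1) * bin_h)
                     for c in range(j * bin_w, (j + 1) * bin_w)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, height, width):
    """Stretch a small grid back to height x width by repeating values."""
    bins = len(pooled)
    return [[pooled[r * bins // height][c * bins // width]
             for c in range(width)]
            for r in range(height)]


def pyramid_pooling(fmap, bin_sizes):
    """Return the original map plus one context channel per pyramid level."""
    height, width = len(fmap), len(fmap[0])
    channels = [fmap]
    for bins in bin_sizes:
        pooled = average_pool_to_bins(fmap, bins)
        channels.append(upsample_nearest(pooled, height, width))
    return channels


feature_map = [
    [1.0, 1.0, 3.0, 3.0],
    [1.0, 1.0, 3.0, 3.0],
    [5.0, 5.0, 7.0, 7.0],
    [5.0, 5.0, 7.0, 7.0],
]
channels = pyramid_pooling(feature_map, bin_sizes=(1, 2))
```

The coarsest level (one bin) gives every pixel the same global average, which is the "global prior" in its simplest form; finer levels add context about progressively smaller regions.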

The above work was done by Nidhi Saxena.

We started Giscle to bring the change we wanted as students, as teens and as citizens, and to build a future in which we want to live. If you also want to bring change and build the future, please share your passion at passion@giscle.com
