<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Saurabh Yadav on Medium]]></title>
        <description><![CDATA[Stories by Saurabh Yadav on Medium]]></description>
        <link>https://medium.com/@saurabh.yadav919?source=rss-b5e0373348ba------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*nbN1lbwBH2XFhJX5yXIUYA.jpeg</url>
            <title>Stories by Saurabh Yadav on Medium</title>
            <link>https://medium.com/@saurabh.yadav919?source=rss-b5e0373348ba------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 17 May 2026 17:09:28 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@saurabh.yadav919/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Artificially Conscious Machines]]></title>
            <link>https://medium.com/data-science/artificially-conscious-machines-301c3522f53c?source=rss-b5e0373348ba------2</link>
            <guid isPermaLink="false">https://medium.com/p/301c3522f53c</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[consciousness]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Saurabh Yadav]]></dc:creator>
            <pubDate>Mon, 25 Nov 2019 03:54:27 GMT</pubDate>
            <atom:updated>2019-11-25T09:53:04.793Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0CaYOqTQndarPsYONHAjYQ.jpeg" /><figcaption>Photo by <a href="https://unsplash.com/@halgatewood?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Hal Gatewood</a> on <a href="https://unsplash.com/s/photos/brain?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></figcaption></figure><p>What is consciousness? Is it the same as having the ability to think? Or is it like having a soul? Are plants conscious? These are some of the questions that arise on reading the title. Defining consciousness in words is difficult, although according to Dr Harry H. Porter III there are roughly three meanings of consciousness[1]:<br>First, conscious means awake. A person who is asleep or in a coma is said to be unconscious.<br>Second, the word conscious is often used to mean thinking in the way a human thinks. <br>Third, being conscious means being aware of yourself and your own thoughts.</p><p>So, what does it mean to have artificial consciousness? How can we artificially create consciousness if we do not have a precise definition? <br>Sometimes referred to as machine consciousness or synthetic consciousness, an artificially conscious machine would be one that can act in as human-like a way as possible and is self-aware of its own existence. A machine that can hold long conversations, listen to music, have hobbies, get embroiled in disputes, feel emotions, do mathematics and so on. These characteristics come naturally to a normal human, but for a machine these simple tasks are as hard as intergalactic travel is for us. Today there are more than 10 million machines (robots) on Earth, a number that will only multiply in future, and all of them excel at their respective tasks. But the number of machines that could truly understand this article is vanishingly small.
From SHAKEY (so named because of its tendency to tremble during operation) and ELIZA to OpenWorm and Sophia, the world seems to be on a rocket ride toward better AI machines. Research in the past decade has shown promising results in the field of AI, but it is still far from human-level machine intelligence (HLMI).<br>Nick Bostrom, in his book Superintelligence: Paths, Dangers, Strategies[2], thoroughly discusses the paths to HLMI, its aftereffects and its challenges. But the idea of a truly conscious machine is far from reality. Some may argue that neural networks have promising results to show, but is that real intelligence or just statistical pattern matching? Neural networks are said to emulate the networks of neurons in our brains, so let us suppose that someday we have enough processing power to develop a system capable of running an artificial brain. Will it have the same thoughts we have while eating delicious food, or just compute a dot product of vectors that matches the labelled output?</p><p>Nils Nilsson has spent a long and productive career working on problems in search, planning, knowledge representation, and robotics; he has authored textbooks in artificial intelligence. When asked about arrival dates for HLMI, he offered the following opinion[3]: <br>10% chance: 2030<br>50% chance: 2050<br>90% chance: 2100</p><p>Let us be extremely optimistic and say we develop a machine intelligent enough to pass the Turing Test. Then what?</p><p>We have been raised in an era where sci-fi movies have gone the extra mile to showcase what it would actually be like to have an HLMI around. But don’t all these movies show us that such a machine is going to end humanity, often by way of the inevitable “singularity”? They all present a scenario where the machine becomes self-aware, or confused about what is real, and tries to kill its own maker.
From 2001: A Space Odyssey, where HAL (Heuristically programmed ALgorithmic computer) gets confused about the orders given to it and kills the crew of the spaceship, to Ex Machina, where Ava becomes self-aware (“acquires consciousness”) and kills her own maker, we are shown the same fate for such a machine (in this sense machines seem just like humans, trying to fight their maker). Is it true that we are going to be doomed by the vicious hand of our own creation? But these are just fictions, right?</p><p>We do not know what a conscious machine, or for simplicity’s sake an HLMI machine, is going to be like. These are questions for our future selves; for now, we can merely wonder. Could it be that we cannot tell whether the person sitting in front of us in the coffee shop is a machine or a human? Would it perceive itself as more evolved? Will it be able to know the true meaning of its existence (a tall order; many of us have no idea about our own)? <br>These questions have no definite answer today, but maybe they will in future.</p><p>[1] My Theory of Consciousness: A Summary (<a href="http://web.cecs.pdx.edu/~harry/musings/ConscTheory.html">http://web.cecs.pdx.edu/~harry/musings/ConscTheory.html</a>)</p><p>[2] Superintelligence: Paths, Dangers, Strategies (<a href="https://www.amazon.in/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/0199678111">https://www.amazon.in/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/0199678111</a>)</p><p>[3] This is again conditional on no civilization-disrupting catastrophe occurring.
The definition of HLMI used by Nilsson is “AI able to perform around 80% of jobs as well or better than humans perform” (Kruel 2012).</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=301c3522f53c" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science/artificially-conscious-machines-301c3522f53c">Artificially Conscious Machines</a> was originally published in <a href="https://medium.com/data-science">TDS Archive</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[All you need to know about Regularization]]></title>
            <link>https://medium.com/data-science/all-you-need-to-know-about-regularization-b04fc4300369?source=rss-b5e0373348ba------2</link>
            <guid isPermaLink="false">https://medium.com/p/b04fc4300369</guid>
            <category><![CDATA[dropout]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[regularisation]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Saurabh Yadav]]></dc:creator>
            <pubDate>Sun, 23 Dec 2018 06:40:31 GMT</pubDate>
            <atom:updated>2018-12-25T06:38:29.376Z</atom:updated>
            <content:encoded><![CDATA[<h4>Causes of overfitting and how regularization improves it</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/798/1*Q2p1gkSePizfQ8_pS2-Xrw.png" /></figure><p><strong>Alice:</strong> Hey Bob! I have been training my model for 10 hours, but it yields very bad accuracy even though it performs exceptionally well on the training data. What’s the issue?</p><p><strong>Bob: </strong>Oh! It seems your model is overfitting the training data. Did you use regularization?</p><p><strong>Alice: </strong>What’s that?</p><p>Problems related to overfitting are common in ML, and there are many ways to avoid them, but why does this problem occur in the first place?</p><p>Training a model seems like a pretty simple process:</p><ol><li>collect data</li><li>pre-process data</li><li>design a model</li><li>train the model</li></ol><h4>But what happens in between? What causes overfitting, and why does regularization provide a solution?</h4><p>The main cause of overfitting can be traced back to the first two steps: the collection and pre-processing of data. A dataset with an uneven distribution of features, noise, random fluctuations or very high variance can work against model training. The model learns these random errors and fluctuations so well during training that its accuracy on the training data becomes very high, causing it to overfit.
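</p><p>The effect Alice ran into can be reproduced in a few lines. The sketch below is a toy example using numpy’s polyfit; the sine curve and noise level are made up for illustration. A high-degree polynomial fitted to a handful of noisy points drives the training error to nearly zero while the error on fresh points from the same curve stays much larger.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy training samples from a simple sine curve.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.normal(size=10)

# A degree-9 polynomial has enough capacity to interpolate all ten
# points exactly -- it memorizes the noise along with the signal.
coeffs = np.polyfit(x_train, y_train, deg=9)

x_test = np.linspace(0.05, 0.95, 50)
y_test = np.sin(2 * np.pi * x_test)

train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

# The gap between the two errors is the signature of overfitting.
print(train_mse, test_mse)
```

<p>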
Training for longer than necessary can also lead to overfitting.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/723/1*dG51OeX8jqUCXEMlLUZUzQ.png" /><figcaption>source : <a href="https://stats.stackexchange.com/">https://stats.stackexchange.com/</a></figcaption></figure><h4>So, what can be done to avoid overfitting?</h4><p>It is easy to observe that the larger the weights, the higher the non-linearity, so a simple remedy is to penalize the weights as they are updated. There are two classic techniques built on this idea.</p><ol><li><strong>L1 norm:</strong></li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/300/1*aaTWtpj-EWT0Yu1rFCmPDg.jpeg" /></figure><p>It works by adding a penalty term, weighted by a parameter <strong>λ</strong>, to the error function we want to minimize. Here W is simply the weight matrix.</p><p>Here, <strong>λ </strong>is a hyperparameter whose value we choose. If <strong>λ </strong>is high, it adds a large penalty to the error term, making the learned hyperplane almost linear; if <strong>λ </strong>is close to 0, it has almost no effect on the error term and provides no regularization.</p><p>L1 regularization is often seen as a feature selection technique too, as it zeroes out the weights of undesired features. It is, however, computationally inefficient in non-sparse cases. L1 regularization is sometimes called <strong>Lasso regression</strong>.</p><p>2. <strong>L2 norm:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*y8bkQmtbYF5C40nsT3v4XA.png" /></figure><p>L2 regularization may not look very different from L1, but its impact is quite different. Here the weights ‘w’ are squared individually and then summed. L1’s feature selection property is lost, but L2 is more efficient in non-sparse cases. L2 regularization is sometimes known as <strong>ridge regression</strong>.
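</p><p>The two penalty terms above are easy to write down directly. A minimal numpy sketch follows; the weight vector and λ value are made up for illustration:</p>

```python
import numpy as np

def l1_penalty(w, lam):
    # Lasso-style penalty: lam times the sum of absolute weights.
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    # Ridge-style penalty: lam times the sum of squared weights.
    return lam * np.sum(w ** 2)

w = np.array([0.5, -1.2, 3.0])  # a toy weight vector
lam = 0.1

# Either penalty is simply added to the data loss before
# backpropagation, discouraging large weights.
print(l1_penalty(w, lam))  # ≈ 0.47
print(l2_penalty(w, lam))  # ≈ 1.069
```

<p>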
The parameter<strong> λ </strong>works the same as in L1.</p><p><strong>Early stopping: </strong>So far we have added penalties to control the values of the weights, but there are other ways to regularize too. What if, while training, we stop at the point where the training error is still going down but the test error starts going up? This gives us the desired balance of training and test error, a technique commonly termed <strong>early stopping</strong>. It can give the desired result, although some practitioners advise against relying on it.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*yHKrjfpjbfpuohBcrQgAjg.png" /><figcaption>source : deeplearning4j.org</figcaption></figure><p><strong>Dropout: </strong>This is an interesting way to regularize a neural network, first proposed by Srivastava et al. (2014). The paper proposes randomly picking some of the nodes in a layer and dropping/ignoring them during training.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*O1m9boHrr59OpwH9rQQppQ.png" /><figcaption>source : commonlounge.com</figcaption></figure><p>Every node is assigned a probability of being dropped; say a node has drop_prob = 0.4, then it has a 40% chance of being dropped and a 60% chance of being kept. Each time, this sampling changes the network’s shape, so it looks like a new network on every pass. This technique has proven very effective against overfitting. The drop probability is not kept too high, as that would make the network sparse and cause underfitting.</p><p><strong>Data augmentation: </strong>Overfitting problems are common in computer vision, and data augmentation is a good solution for them.
We just need to augment the images ourselves, e.g. by flipping, cropping and rotating them.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/592/1*BQJUiaCMzRqhu5e_OV4o4Q.png" /><figcaption>source: medium.com</figcaption></figure><p>This type of augmentation tends to yield better results, as the model gets the chance to train on deformed images and other variations; data augmentation also helps when little data is available.</p><p>These are a few regularization techniques that are helpful in tackling overfitting. Sometimes tuning the hyperparameters gives the desired result, but if there is no improvement, the techniques above will usually solve the problem.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b04fc4300369" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science/all-you-need-to-know-about-regularization-b04fc4300369">All you need to know about Regularization</a> was originally published in <a href="https://medium.com/data-science">TDS Archive</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Weight Initialization Techniques in Neural Networks]]></title>
            <link>https://medium.com/data-science/weight-initialization-techniques-in-neural-networks-26c649eb3b78?source=rss-b5e0373348ba------2</link>
            <guid isPermaLink="false">https://medium.com/p/26c649eb3b78</guid>
            <category><![CDATA[neural-net]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[neural-networks]]></category>
            <dc:creator><![CDATA[Saurabh Yadav]]></dc:creator>
            <pubDate>Fri, 09 Nov 2018 14:19:11 GMT</pubDate>
            <atom:updated>2020-01-17T06:36:11.311Z</atom:updated>
            <content:encoded><![CDATA[<p>Building even a simple neural network can be a confusing task, and tuning it to get a better result is extremely tedious on top of that. The first step to consider when building a neural network is the initialization of its parameters: done correctly, optimization is achieved in the least time; done badly, converging to a minimum using gradient descent can become practically impossible.</p><p>This article assumes that the reader is already familiar with the concepts of neural networks, weights, biases, activation functions, and forward and backward propagation.</p><h4>Basic notations</h4><p>Consider an L-layer neural network, which has L-1 hidden layers and one input and one output layer. The parameters (weights and biases) for layer l are represented as</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*M3Ja24g0cK22gVqG6HFQbw.png" /></figure><p>In this article, we’ll have a look at some of the basic initialization practices in use and some improved techniques that should be used in order to achieve a better result.
The following techniques are generally practised to initialize parameters:</p><ul><li><strong>Zero initialization</strong></li><li><strong>Random initialization</strong></li></ul><h4>Zero initialization:</h4><p>In general practice, biases are initialized with 0 and weights are initialized with random numbers. But what if the weights are initialized with 0?</p><p>To understand what happens, let us say we applied the sigmoid activation function for the output layer.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6A3A_rt4YmumHusvTvVTxw.png" /><figcaption>Sigmoid function (<a href="https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e">https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e</a>)</figcaption></figure><p>If all the weights are initialized with 0, the derivative with respect to the loss function is the same for every w in W[l], so all the weights have the same value in subsequent iterations. This makes the hidden units symmetric, and the symmetry persists through all n iterations, i.e. setting the weights to 0 makes the network no better than a linear model. Note that biases, by contrast, cause no such problem when initialized with 0.</p><p>W[l] = np.zeros((l-1,l))</p><p>Let us consider a neural network with three hidden layers, using the ReLU activation function in the hidden layers and sigmoid for the output layer.</p><p>Using the above neural network on the “make circles” dataset from sklearn.datasets, the result obtained is the following:</p><p>for 15000 iterations, loss = 0.6931471805599453, accuracy = 50 %</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/738/1*aaKYKl892E8v_dw24wDC1w.png" /></figure><p>Clearly, zero initialization is unsuccessful at classification.</p><h4>Random initialization:</h4><p>Assigning random values to the weights is better than assigning zeros.
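</p><p>The symmetry argument above can be checked in a few lines. In this sketch (layer sizes chosen arbitrarily), every hidden unit of a zero-initialized layer computes exactly the same activation whatever the input, so the units can never differentiate during training:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 samples, 3 input features

# Zero initialization: a 3 -> 4 layer with all weights and biases zero.
W1 = np.zeros((3, 4))
b1 = np.zeros(4)

h = np.maximum(0, X @ W1 + b1)  # ReLU hidden activations

# Every hidden column is identical (here, all zeros), so each unit
# receives the same gradient update and the symmetry is never broken.
print(np.all(h == h[:, :1]))  # True
```

<p>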
But one thing to keep in mind is what happens if the weights are initialized with very high or very low values, and what a reasonable initialization of weight values looks like.</p><p><strong>a) </strong>If the weights are initialized with very high values, the term np.dot(W,X)+b becomes significantly larger, and if an activation function like sigmoid() is applied, the function maps its value near to 1, where the slope of the gradient changes slowly and learning takes a lot of time.</p><p><strong>b)</strong> If the weights are initialized with very low values, the activation gets mapped near to 0, and the same problem arises.</p><p>This problem is often referred to as the vanishing gradient problem.</p><p>To see this, let us revisit the example above, but now with the weights initialized with very large values instead of 0:</p><p>W[l] = np.random.randn(l-1,l)*10</p><p>The neural network is the same as before; using this initialization on the “make circles” dataset from sklearn.datasets, the result obtained is the following:</p><p>for 15000 iterations, loss = 0.38278397192120406, accuracy = 86 %</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/744/1*RzEieVSQ988z1vjJxioyEA.png" /></figure><p>This is better but still doesn’t properly fulfil our needs, so let us look at a newer technique.</p><h4>New initialization techniques</h4><p>As we saw above, with large or zero initialization of the weights W, no significant result is obtained, and even with an appropriate random initialization the training process is likely to take a long time. There are certain problems associated with this:</p><p>a) What if the model is very large and takes many days to train?</p><p>b) What about the vanishing/exploding gradient problem?</p><p>These problems stood in the way for many years, but He et al. (2015) proposed an activation-aware initialization of weights (for ReLU) that was able to resolve them.
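</p><p>The saturation behaviour described in a) and b) is easy to verify numerically: the sigmoid derivative peaks at 0.25 at z = 0 and collapses toward zero for large |z|, which is exactly why extreme weight values stall learning. A small standalone sketch:</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z)).
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25, the maximum possible slope
print(sigmoid_grad(10.0))  # ~4.5e-05: the gradient has all but vanished
```

<p>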
ReLU and leaky ReLU also mitigate the vanishing gradient problem.</p><p><strong>He initialization: </strong>we simply multiply the random initialization by</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zxD6Nr6TyAb8JEG6oXAjkg.png" /></figure><p>To see how effective this solution is, let us use the same dataset and neural network as for the initializations above; the results are:</p><p>for 15000 iterations, loss = 0.07357895962677366, accuracy = 96 %</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/735/1*lCXe3BlmFR7JGui9JPctig.png" /></figure><p>Surely, this is an improvement over the previous techniques.</p><p>There are also other techniques besides He initialization that improve on the old practices and are used frequently.</p><p><strong>Xavier initialization: </strong>It is the same as He initialization, but it is used for the tanh() activation function; in this method the 2 is replaced with 1.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Lv9TNpAXffRnO0p0WMGJwQ.png" /></figure><p><strong>Some also use the following technique for initialization:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QIzXjH8uefVbcaycsjfdmw.png" /></figure><p>These methods serve as good starting points for initialization and mitigate the chances of exploding or vanishing gradients. They set the weights neither much bigger than 1 nor much smaller than 1, so the gradients do not vanish or explode too quickly. <strong>They help avoid slow convergence and ensure that we do not keep oscillating around the minimum. </strong>There exist other variants of the above, whose main objective is again to control the variance of the parameters.
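</p><p>The scaling rules above are one-liners in practice. A minimal sketch (the layer widths here are arbitrary): He scaling multiplies by sqrt(2/fan_in) for ReLU layers, while Xavier uses sqrt(1/fan_in) for tanh layers.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    # Scaled for ReLU layers: Var(W) = 2 / fan_in.
    return rng.normal(size=(fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

def xavier_init(fan_in, fan_out):
    # Scaled for tanh layers: Var(W) = 1 / fan_in.
    return rng.normal(size=(fan_in, fan_out)) * np.sqrt(1.0 / fan_in)

W = he_init(512, 512)
print(W.std())  # close to sqrt(2/512), about 0.0625
```

<p>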
Thank you.</p><p>Source: Neural networks and deep learning, Andrew Ng (Coursera.org).</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=26c649eb3b78" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science/weight-initialization-techniques-in-neural-networks-26c649eb3b78">Weight Initialization Techniques in Neural Networks</a> was originally published in <a href="https://medium.com/data-science">TDS Archive</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Brief Intro to Medical Image Analysis and Deep Learning]]></title>
            <link>https://medium.com/data-science/brief-intro-to-medical-image-analysis-and-deep-learning-9d8e5ef358e0?source=rss-b5e0373348ba------2</link>
            <guid isPermaLink="false">https://medium.com/p/9d8e5ef358e0</guid>
            <category><![CDATA[convolution-neural-net]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[neural-networks]]></category>
            <category><![CDATA[convolutional-network]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Saurabh Yadav]]></dc:creator>
            <pubDate>Tue, 16 Oct 2018 10:36:58 GMT</pubDate>
            <atom:updated>2018-12-25T06:40:27.195Z</atom:updated>
            <content:encoded><![CDATA[<h3>Brief Intro to Medical Image Analysis and Deep Learning</h3><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*Uh1XQVLTAqMBpNpFYw9VoQ.jpeg" /></figure><p>I recently started working on a project related to medical image analysis. While looking for resources about image analysis and its medical applications, I felt that there are not many proper introductory articles on the subject. In this article we will go through a brief introduction to how medical images were analysed in the past and what has changed since the introduction of Deep Learning. For image analysis we generally use <em>CNNs</em> (Convolutional Neural Networks); explaining them here would make this whole article cumbersome, so I will provide some links that explain <em>CNNs</em> properly.</p><p>As soon as it became possible to scan and load medical images into a computer, researchers attempted to build systems to automate the analysis of such images. Initially, from the <em>1970s</em> to the <em>1990s</em>, medical image analysis was done by sequentially applying low-level pixel processing (edge and line detector filters) and mathematical modeling to construct a <strong>rule-based system </strong>that could solve only one particular task. At the same time there were agents based on if-else rules, popular in the field of Artificial Intelligence and commonly known as <em>GOFAI</em> (Good Old-Fashioned Artificial Intelligence) agents.</p><p>Towards the end of the <em>1990s,</em> supervised techniques, in which training data is used to fit models, were becoming increasingly popular in the field of medical image analysis. Examples include <strong>active shape models</strong> and the <strong>atlas method</strong>. This pattern-recognition and machine-learning approach is still popular, albeit with some new ideas.
Thus, we can see a shift from systems designed entirely by humans to systems trained by computers on example data. Computer algorithms are now capable of deciding which edges and features are important for analyzing an image and producing the best result.</p><p>The most successful type of model for image analysis to date is the Convolutional Neural Network (<em>CNN</em>). A single CNN model contains many different layers: shallower layers work on recognizing edges and simple features, while deeper layers capture more abstract features. An image is convolved with filters (some refer to them as kernels) and pooling is then applied; this process may repeat over several layers until recognizable features are obtained. Work on CNNs started in the <em>1980s</em>, and they were already applied to medical image analysis in <em>1995</em>. The first real-world application of a <em>CNN </em>was seen in <strong><em>LeNet</em></strong> (<em>1998</em>) for handwritten digit recognition. Despite these early successes, CNNs did not gain momentum until improved training algorithms for Deep Learning were introduced. The introduction of <em>GPUs</em> has favored research in this field, and since the introduction of the <strong>ImageNet </strong>challenge, a rapid growth in the development of such models can be seen.</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*x94oa0Rw4otvALBcizl2hw.png" /><figcaption>Illustration of CNN (Convolutional Neural Network)</figcaption></figure><p>In computer vision, deep <em>CNNs</em> have become the go-to choice, and the medical image analysis community has taken notice of these pivotal developments. However, the transition from systems that used handcrafted features to systems that learn features from the data itself has been gradual. Applications of deep learning in medical image analysis first started to appear in workshops and conferences, and then in journals.
The number of papers grew in<em> 2015</em> and <em>2016</em>, as shown in the graph.</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*I-2oXwFslDzYyaW-dDsAyQ.png" /></figure><ul><li><strong>Classification</strong>: This was one of the first areas of medical image analysis in which deep learning was used. <strong>Diagnostic image classification</strong> involves classifying diagnostic images; in such a setting every diagnostic exam is one sample, so data sizes are smaller than in general computer vision. <strong>Object or lesion classification </strong>usually focuses on classifying a part of a medical image into two or more classes. For many of these tasks, both local and global information about lesion appearance and location is required for accurate classification.</li><li><strong>Detection</strong>: The localization of anatomical objects such as organs or lesions is an important pre-processing part of the segmentation task. Localizing an object in an image requires parsing the <em>3D </em>image; several algorithms have been proposed to treat <em>3D</em> space as a composition of <em>2D</em> orthogonal planes. There is a long research tradition of detecting lesions in medical images using computer-aided techniques, improving detection accuracy or decreasing reading time for humans. Interestingly, the first such system was developed in <em>1995</em>, using a CNN with <em>4 </em>layers to detect nodules in X-ray images.</li><li><strong>Segmentation</strong>: The segmentation of organs and other substructures in medical images allows quantitative analysis of shape, size and volume. The task of segmentation is typically defined as identifying the set of pixels that make up the contour or interior of an object of interest. Lesion segmentation combines the challenges of object detection and of organ and substructure segmentation in the application of deep learning algorithms.
One problem that lesion segmentation shares with object detection is class imbalance, as most pixels in an image belong to the non-diseased class.</li><li><strong>Registration</strong>: Sometimes referred to as spatial alignment, this is a common image analysis task in which a coordinate transform is calculated from one image to another. Often this is performed in an iterative framework in which a specific type of transformation is assumed and a predetermined metric is optimized. Although lesion detection and object segmentation are seen as the main uses of deep learning algorithms, researchers have found that deep networks can also be beneficial in obtaining the best possible registration performance.</li><li><strong>Other tasks in medical imaging</strong>: There are some other uses of Deep Learning in medical imaging. <strong>Content-based image retrieval</strong> (<em>CBIR</em>) is a technique for knowledge discovery in large databases; it offers retrieval of similar cases, which helps in reviewing case histories and understanding rare disorders. <strong>Image generation and enhancement </strong>is another task that uses Deep Learning to improve image quality, normalize images, complete data and discover patterns. <strong>Combining image data with reports </strong>is yet another task that seems to have very large-scale real-world applications. It has led to two different lines of research: (1) leveraging reports to improve image classification accuracy, and (2) generating text reports from images.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*ysHx68Qzsq96bJgjSFP-4A.png" /><figcaption>Number of papers in different application areas of Deep Learning in medical imaging</figcaption></figure><p>It is clear that there are a lot of challenges in the application of Deep Learning to medical image analysis. The unavailability of large datasets is often mentioned as one. However, this notion is only partially correct.
The use of <em>PACS</em> systems in radiology has long been routine in most Western hospitals, and these systems are filled with millions of images. Large public datasets are also being made available by various organisations. The main challenge is thus not the availability of image data itself, but the labeling of these images. Traditionally, <em>PACS</em> systems store free-text reports in which radiologists describe their findings. Turning these reports into accurate annotations or proper labels in an automated way is itself a research topic that requires sophisticated text-mining techniques.</p><p>In medical imaging, classification or segmentation is often presented as a binary task: normal versus abnormal, object versus background. However, this is often a gross simplification, as both classes can be highly heterogeneous. For example, the normal category often consists not only of completely normal tissue but also of several categories of benign findings, which can be rare. This leads to systems that are extremely good at excluding the most common normal subclasses but fail miserably on several rare ones. A straightforward solution would be to turn the system into a multi-class system by providing it with detailed annotations of all possible subclasses. Again, expertly labeling all those classes is an issue in itself and does not seem practical.</p><p>In medical image analysis, useful information is not contained only within the images themselves. Doctors often consider the history of the patient, their age and other attributes to arrive at a better decision. Some research has been conducted on including such features besides images in Deep Learning, but as the results have shown, it has not been so effective so far.
One of the challenges is to balance the number of imaging features in the deep learning network against the clinical features, so that the clinical features are not ignored.</p><p><em>Originally published at </em><a href="https://medium.com/@saurabh.yadav919/brief-intro-of-medical-image-analysis-and-deep-learning-810df940d2f7"><em>medium.com</em></a><em> on October 16, 2018.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9d8e5ef358e0" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science/brief-intro-to-medical-image-analysis-and-deep-learning-9d8e5ef358e0">Brief Intro to Medical Image Analysis and Deep Learning</a> was originally published in <a href="https://medium.com/data-science">TDS Archive</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Brief Intro of Medical Image Analysis and Deep Learning]]></title>
            <link>https://medium.com/@saurabh.yadav919/brief-intro-of-medical-image-analysis-and-deep-learning-810df940d2f7?source=rss-b5e0373348ba------2</link>
            <guid isPermaLink="false">https://medium.com/p/810df940d2f7</guid>
            <category><![CDATA[medical]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[deep-learning]]></category>
            <dc:creator><![CDATA[Saurabh Yadav]]></dc:creator>
            <pubDate>Tue, 16 Oct 2018 10:36:58 GMT</pubDate>
            <atom:updated>2018-10-16T11:51:01.497Z</atom:updated>
            <content:encoded><![CDATA[<h3>Brief Intro to Medical Image Analysis and Deep Learning</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/466/1*Uh1XQVLTAqMBpNpFYw9VoQ.jpeg" /></figure><p>I recently started working on a project related to medical image analysis. While looking for resources about image analysis and its medical applications, I felt that in general we don’t have many proper articles on the subject. In this article we will go through a brief introduction to how medical images were analysed in the past and what has changed since the introduction of Deep Learning. For image analysis we generally use <em>CNNs</em> (Convolutional Neural Networks); explaining them here would make this whole article cumbersome, so I will provide some links that explain <em>CNNs</em> properly.</p><h4><strong>History</strong></h4><p>As soon as it became possible to scan and load medical images into a computer, researchers attempted to build systems to automate the analysis of such images. Initially, from the <em>1970s</em> to the <em>1990s</em>, medical image analysis was done by sequentially applying low-level pixel processing (edge and line detector filters) and mathematical modeling to construct a <strong>rule-based system</strong> that could solve only a particular task. At the same time there were agents based on if-else rules, popular in the field of Artificial Intelligence and commonly known as <em>GOFAI</em> (Good Old-Fashioned Artificial Intelligence) agents.</p><p>Towards the end of the <em>1990s</em>, supervised techniques, in which training data is used to train a model, were becoming increasingly popular in the field of medical image analysis. Examples include the <strong>active shape model</strong> and the <strong>atlas method</strong>. This pattern recognition and machine learning approach is still popular, but with the introduction of some new ideas. 
Thus, we can see a shift from systems that were designed by humans to systems that are trained by computers based on example data. Computer algorithms are now capable of deciding which edges and features are important for analyzing an image and giving the best result.</p><h4>Deep Learning in Image analysis</h4><p>The most successful type of model for image analysis to date is the Convolutional Neural Network (<em>CNN</em>). A single CNN model contains many different layers: shallower layers recognize edges and simple features, while deeper layers recognize more abstract features. An image is convolved with filters (some refer to them as kernels) and then pooling is applied; this process may repeat over several layers until, at last, recognizable features are obtained. Work on CNNs started in the <em>1980s</em>, and they were already applied to medical image analysis in <em>1995</em>. The first real-world application of a <em>CNN</em> was seen in <strong><em>LeNet</em></strong> (<em>1998</em>) for handwritten digit recognition. Despite these early successes, CNNs did not gain momentum until improved training algorithms for Deep Learning were introduced. The introduction of <em>GPUs</em> has favored research in this field, and since the introduction of the <strong>ImageNet</strong> challenge a rapid growth in the development of such models can be seen.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*x94oa0Rw4otvALBcizl2hw.png" /><figcaption>Illustration of CNN (Convolutional Neural Network)</figcaption></figure><p>In computer vision, deep <em>CNNs</em> have become the go-to choice, and the medical image analysis community has taken notice of these pivotal developments. However, the transition from systems that used handcrafted features to systems that learn features from the data itself has been gradual. Applications of deep learning in medical image analysis first started to appear in workshops and conferences, and then in journals. 
The number of papers grew in <em>2015</em> and <em>2016</em>, as shown in the graph.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*I-2oXwFslDzYyaW-dDsAyQ.png" /></figure><h4>Deep learning uses in medical imaging</h4><ul><li><strong>Classification</strong>: This was one of the first areas of medical image analysis where deep learning was used. <strong>Diagnostic image classification</strong> involves the classification of diagnostic exam images; in such a setting every diagnosed exam is one sample, and dataset sizes are smaller than those in computer vision. <strong>Object or lesion classification</strong> usually focuses on classifying a part of a medical image into two or more classes. For many of these tasks, both local information about lesion appearance and global information about lesion location are required for accurate classification.</li><li><strong>Detection</strong>: Localization of anatomical objects such as organs or lesions is an important pre-processing step for segmentation. Localizing an object requires parsing the <em>3D</em> image; several algorithms have been proposed that treat the <em>3D</em> space as a composition of <em>2D</em> orthogonal planes. There has been a long research tradition in detecting lesions in medical images using computer-aided techniques, improving detection accuracy or decreasing reading time for human experts. Interestingly, the first such system was developed in <em>1995</em>, using a CNN with <em>4</em> layers to detect nodules in X-ray images.</li><li><strong>Segmentation</strong>: The segmentation of organs and other substructures in medical images allows quantitative analysis of shape, size and volume. The task of segmentation is typically defined as identifying the set of pixels that make up the contour or interior of the object of interest. Segmentation of lesions combines the challenges of object detection and of organ and substructure segmentation in the application of deep learning algorithms. 
One problem that lesion segmentation shares with object detection is class imbalance, as most pixels in an image belong to the non-diseased class.</li><li><strong>Registration</strong>: Sometimes referred to as spatial alignment, registration is a common image analysis task in which a coordinate transform is calculated from one image to another. It is often performed in an iterative framework where a specific type of transformation is assumed and a predetermined metric is optimized. Although lesion detection and object segmentation are seen as the main uses of deep learning algorithms, researchers have found that deep networks can also help achieve the best possible registration performance.</li><li><strong>Other tasks in medical imaging</strong>: Deep Learning has several other uses in medical imaging. <strong>Content-based image retrieval</strong> (<em>CBIR</em>) is a technique for knowledge discovery in large databases that offers retrieval of similar case histories and helps in understanding rare disorders. <strong>Image generation and enhancement</strong> is another task that uses Deep Learning, covering improving image quality, normalizing images, completing data and discovering patterns. <strong>Combining image data with reports</strong> is yet another task that seems to have very large-scale real-world application. It has led to two different fields of research: (1) leveraging reports to improve image classification accuracy, and (2) generating text reports from images.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ysHx68Qzsq96bJgjSFP-4A.png" /><figcaption>Number of papers in different application areas of Deep Learning in medical imaging</figcaption></figure><h4>Unique challenges in medical image analysis</h4><p>It is clear that there are a lot of challenges in applying Deep Learning to medical image analysis. The unavailability of large datasets is often mentioned as one. However, this notion is only partially correct. 
The use of <em>PACS</em> systems in radiology has been routine in most Western hospitals, and these systems are filled with millions of images. Large public data sets are also being made available by organisations. The main challenge is thus not the availability of image data itself, but the labeling of these images. Traditionally, <em>PACS</em> systems store free-text reports in which radiologists describe their findings. Turning these reports into accurate annotations or proper labels in an automated way is itself a research topic that requires sophisticated text-mining techniques.</p><p>In medical imaging, classification or segmentation is often presented as a binary task: normal versus abnormal, object versus background. However, this is often a gross simplification, as both classes can be highly heterogeneous. For example, the normal category often consists not only of completely normal tissue but also of several categories of benign findings, which can be rare. This leads to systems that are extremely good at excluding the most common normal subclasses but fail miserably on several rare ones. A straightforward solution would be to turn the system into a multi-class system by providing it with detailed annotations of all possible subclasses. However, expertly labeling all these classes does not seem practical.</p><p>In medical image analysis, useful information is not contained only within the images themselves. Doctors often consider the patient's history, age and other attributes to arrive at a better decision. Some research has been conducted on including such features alongside images in Deep Learning, but results have shown it has not been very effective. 
One of the challenges is to balance the number of imaging features in the deep learning network against the clinical features, so that the clinical features are not ignored.</p><p>Although most of the challenges mentioned above have not been properly tackled yet, several high-profile successes of Deep Learning in medical imaging can be seen in recent papers by <em>Esteva et al. (2017)</em> and <em>Gulshan et al. (2016)</em> in the fields of dermatology and ophthalmology. Looking at the trend, one can infer that <strong>unsupervised learning</strong> is gaining popularity in this field, as it allows training on unlabeled data. Many consider that applying automated systems in the medical field may give rise to legal questions, i.e. who is to blame if a machine makes a mistake? But these questions are not coming to haunt us in the near future, and we can relax for now. Medical imaging is a largely unexplored area and there is a lot of research to be conducted; hopefully, Deep Learning will have a great impact on medical imaging as a whole.</p><p>Some links for CNNs:</p><p><a href="https://medium.com/technologymadeeasy/the-best-explanation-of-convolutional-neural-networks-on-the-internet-fbb8b1ad5df8">The best explanation of Convolutional Neural Networks on the Internet!</a></p><p><a href="https://towardsdatascience.com/convolutional-neural-networks-from-the-ground-up-c67bb41454e1">https://towardsdatascience.com/convolutional-neural-networks-from-the-ground-up-c67bb41454e1</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=810df940d2f7" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>