Co-founder of https://www.critiq.tech. ML, CS, & math enthusiast. https://www.linkedin.com/in/andre-ye-501746150/

Here’s a function: *f*(*x*). It’s expensive to calculate, not necessarily an analytic expression, and you don’t know its derivative.

Your task: find the global minimum.

This is certainly a difficult task, more difficult than many other optimization problems within machine learning. Gradient descent, for one, has access to a function’s derivatives and can exploit mathematical shortcuts for faster evaluation.

Alternatively, in some optimization scenarios the function is cheap to evaluate. If we can get hundreds of results for variants of an input *x* in a few seconds, a simple grid search can be employed with good results.

Or, an entire host of non-conventional, gradient-free optimization methods can be used, like particle swarm optimization or simulated annealing. …
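To make the cheap-evaluation case concrete, here is a minimal grid search over a hypothetical black-box objective (the function and the grid bounds are illustrative assumptions, not from the article):

```python
import math

def f(x):
    # Hypothetical black-box objective: we can only evaluate it,
    # not differentiate it. Its global minimum sits near x ≈ 2.18.
    return (x - 2) ** 2 + math.sin(5 * x)

# Evaluate f on a dense grid over an assumed search interval
# and keep the best point seen.
grid = [-10 + 20 * i / 9999 for i in range(10000)]
best_x = min(grid, key=f)
best_f = f(best_x)
```

When each evaluation is expensive, this brute-force approach becomes infeasible, which is exactly what motivates smarter strategies like simulated annealing or Bayesian optimization.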

Neural networks are notoriously bad at handling tabular data: it seems that their spotlight has primarily been on specialized forms of data like images, text, and sound. The convolutional neural network is one such example.

By converting non-image data, or even sequential data, into an *image*, convolutional neural networks can exploit their special properties: computational efficiency and local focus. Furthermore, they can leverage the unique insights and nonlinearities of unsupervised learning.

Research into convolutional neural networks (CNNs) has risen dramatically in recent years as computing power and data have become increasingly available. CNNs, however, are used almost exclusively for *image data* — and why not? …
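A naive sketch of the conversion step, assuming each row of the table is a flat feature vector (real methods such as DeepInsight instead *learn* a layout that places related features near each other):

```python
import math
import numpy as np

def tabular_to_image(row, fill=0.0):
    """Pad a 1-D feature vector to the next perfect square and
    reshape it into a 2-D 'image' a CNN could consume."""
    n = len(row)
    side = math.ceil(math.sqrt(n))          # smallest square that fits the row
    padded = np.full(side * side, fill, dtype=float)
    padded[:n] = row                        # features first, padding last
    return padded.reshape(side, side)

# A 7-feature row becomes a 3x3 'image' with two padded cells.
img = tabular_to_image(np.arange(7, dtype=float))
```

This row-major layout is arbitrary; the placement of features is exactly the part that serious tabular-to-image methods spend their effort on.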

Take a moment to locate the nearest major city around you.

If you were to travel there now, which mode of transportation would you use? You may take a car, a bus, or a train. Perhaps you’ll ride a bike, or even purchase an airplane ticket.

Regardless of how you get to the location, you incorporate *probability* into the decision-making process. Perhaps there’s a 70 percent chance of rain, or the risk of a car crash causing a traffic jam. If your bike tire is old, it may pop: another large probabilistic factor.

On the other hand, there are *deterministic* costs — for instance, the cost of gas or an airplane ticket — as well as deterministic rewards — like the substantially faster travel time of flying (depending on the location) or perhaps a free Uber/Lyft ride. …

Deep learning has a big problem: it needs to gobble up massive quantities of data before it can generalize reasonably well and become useful. This is in fact one of the limitations of deep learning that restricts its application in many fields where data is either scarce or difficult to obtain.

Contrast this with humans — who, while often portrayed as the losing side of a human-versus-machine intelligence battle, can learn complex concepts from only a few training examples. A child who has no idea what a cat or a dog is can learn to classify them after seeing just a few images of each. …

Deep neural networks have a big problem — they’re constantly hungry for data. When there is too little data — an amount that would be acceptable for other algorithms — deep neural networks have tremendous difficulty generalizing. This phenomenon highlights a gap between human and machine cognition; humans can learn complex patterns with few training examples (albeit at a slower rate).

While research in self-supervised learning is growing, developing setups in which explicit labels are unnecessary (the labels are cleverly derived from the training data itself), its use cases remain restricted.

Semi-supervised learning, another quickly growing field, utilizes latent variables learned through unsupervised training to boost the performance of supervised learning. This is an important concept, but its scope is limited to use cases where there is a relatively large unsupervised-to-supervised data ratio, and where the unlabeled data is compatible with labeled data. …

Kaggle is the data scientist’s go-to place for datasets, discussions, and perhaps most famously, competitions with prizes of tens of thousands of dollars to build the best model.

With all the flurried research and hype around deep learning, one would expect neural network solutions to dominate the leaderboards. It turns out, however, that neural networks — while indeed very powerful algorithms — have fairly narrow applications, being really only useful in image recognition, language modelling, and occasionally sequence prediction.

Instead, top winners of Kaggle competitions routinely use *gradient boosting*. …
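As a sketch of the core idea (a minimal squared-loss gradient booster built from decision stumps; Kaggle winners use heavily tuned libraries like XGBoost or LightGBM, not this):

```python
import numpy as np

def fit_stump(x, r):
    """Find the threshold split of 1-D inputs x that best fits residuals r
    in squared error, predicting a constant on each side."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pl, pr = left.mean(), right.mean()
        err = ((left - pl) ** 2).sum() + ((right - pr) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, pl, pr)
    _, t, pl, pr = best
    return lambda z, t=t, pl=pl, pr=pr: np.where(z <= t, pl, pr)

def gradient_boost(x, y, n_stages=50, lr=0.1):
    """Squared-loss gradient boosting: each stage fits a stump to the
    current residuals (the negative gradient) and is added with a small
    learning rate."""
    pred = np.full_like(y, y.mean())
    stumps = []
    for _ in range(n_stages):
        residual = y - pred              # negative gradient of squared loss
        stump = fit_stump(x, residual)
        pred = pred + lr * stump(x)
        stumps.append(stump)
    base = y.mean()
    return lambda z: base + lr * sum(s(z) for s in stumps)

# Toy data: a noisy step function the ensemble should recover.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.where(x < 5, 1.0, 3.0) + rng.normal(0, 0.1, 200)
model = gradient_boost(x, y)
mse = float(np.mean((model(x) - y) ** 2))
```

Each stage only needs to correct what the ensemble so far gets wrong, which is why even weak learners like stumps compound into a strong model.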

Here is a set of one-dimensional data: your task is to find a way to perfectly separate the data into two classes with one line.
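One way to approach this puzzle, sketched in code: no single threshold on *x* works when one class sits between the two halves of the other, but lifting each point with a feature map like φ(x) = (x, x²) makes a single line suffice (the data and the threshold 2.25 are hypothetical, chosen for illustration):

```python
import numpy as np

# Class 0 sits between the two halves of class 1: no single
# threshold on x alone separates them.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])

# Lift each point into 2-D with phi(x) = (x, x^2). In that space
# the horizontal line x^2 = 2.25 separates the classes perfectly.
phi = np.column_stack([x, x ** 2])
pred = (phi[:, 1] > 2.25).astype(int)
```

This is the intuition behind the kernel trick: separate in a higher-dimensional feature space what is inseparable in the original one.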

What happens to a human mind when it is hallucinating?

Most drugs that produce hallucinations — experiences in which you see, hear, feel, or smell something that does not exist — do so by flooding the brain with ‘pleasure chemicals’. Hallucinogens like PCP and LSD overwhelm serotonin receptors and lead to distorted visual perception of shapes and colors, as well as altered senses (sounds, smells, tastes).

With similar logic, one can simulate hallucinations — referred to as ‘deep dreams’ — in neural networks by ‘overstimulating’ them. The results are often dreamlike, vibrant, psychedelic images.

Convolutional neural networks — CNNs — form the basis of image recognition, which is undoubtedly one of the most important applications of deep learning. Unfortunately, much deep learning research is done under the ‘perfect-world’ constraints of curated datasets, in pursuit of a few percentage points of accuracy. Thus, we’ve developed architectures that work tremendously well on benchmark tests but not necessarily in the real world.

Adversarial examples or inputs (think *adversary*: *enemy*) are indistinguishable from regular images to the human eye, but can completely fool a variety of image recognition architectures. …
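A minimal sketch of one classic attack of this kind, the Fast Gradient Sign Method (FGSM), run against a hypothetical two-weight logistic classifier (the weights, input, and epsilon are all illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A hypothetical pre-trained linear classifier on 2-pixel 'images'.
w = np.array([2.0, -1.0])
b = 0.0

def predict(x):
    return sigmoid(w @ x + b)   # probability of class 1

def fgsm(x, y, eps):
    """FGSM: nudge x by eps in the direction of the *sign* of the loss
    gradient w.r.t. the input. For logistic loss, dL/dx = (p - y) * w."""
    p = predict(x)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

x = np.array([1.0, 0.5])        # confidently classified as class 1
x_adv = fgsm(x, y=1.0, eps=0.9)  # small per-pixel change flips the label
```

The perturbation is bounded per pixel by eps, which is what makes real adversarial examples nearly invisible to humans while flipping the model's decision.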

Neural networks *are* fundamentally supervised — they take in a set of inputs, perform a series of complex matrix operations, and return a set of outputs. As the world is increasingly populated with unlabeled data, simple and standard unsupervised algorithms no longer suffice. We need to somehow apply the deep power of neural networks to unlabeled data.

Luckily, creative applications of self-supervised learning — artificially creating labels from data that is unlabeled by nature, like tilting an image and training a network to predict the degree of rotation — have played a huge part in applying deep learning to unsupervised data. …
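The rotation pretext task described above can be sketched as a label-generation step (a minimal version assuming square images stored as NumPy arrays):

```python
import numpy as np

def make_rotation_batch(images, rng):
    """Self-supervised pretext task: rotate each image by a random
    multiple of 90 degrees and use that multiple (0-3) as the label."""
    rotated, labels = [], []
    for img in images:
        k = rng.integers(0, 4)          # 0, 90, 180, or 270 degrees
        rotated.append(np.rot90(img, k))
        labels.append(k)
    return np.stack(rotated), np.array(labels)

rng = np.random.default_rng(0)
images = [np.arange(9).reshape(3, 3) for _ in range(4)]
batch, labels = make_rotation_batch(images, rng)
```

A network trained to predict these free labels must learn something about object orientation and structure, and those learned features can then be reused for a downstream supervised task.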