And how can you do it, too

There are three things I am incredibly passionate about in my life: mathematics, machine learning, and teaching those to others. One form of expression for this is writing. I love writing: I feel like I am capable of taking complex ideas and explaining them simply. I learned this skill during five years of teaching mathematics to university students.

In the past year, I have been publishing on Medium consistently, with varying success. On average, my posts attracted around 10,000 views. Some reached 50,000 after a few months, some are still stuck below 1000.

Last week, I published my 38th article on Medium. It was a massive 4,000-word read, laying out a roadmap towards understanding the mathematics behind neural networks from the ground up. …


Understanding the inner workings of neural networks from the ground up

Knowing the mathematics behind machine learning algorithms is a superpower. If you have ever built a model for a real-life problem, you have probably experienced that being familiar with the details can go a long way if you want to move beyond baseline performance. This is especially true when you want to push the boundaries of the state of the art.

However, most of this knowledge is hidden behind layers of advanced mathematics. Understanding methods like stochastic gradient descent might seem difficult since it is built on top of multivariable calculus and probability theory.

With proper foundations, though, most ideas can be seen as quite natural. If you are a beginner and don’t necessarily have formal education in higher mathematics, creating a curriculum for yourself is hard. In this post, my goal is to present a roadmap, taking you from absolute zero to a deep understanding of how neural networks work. …


An introduction to weight pruning, quantization, and knowledge distillation

Modern state-of-the-art neural network architectures are HUGE. For instance, you have probably heard about GPT-3, OpenAI’s newest revolutionary NLP model, capable of writing poetry and telling interactive stories.

Well, GPT-3 has around 175 billion parameters.

To put this number in perspective, consider the following. A $100 bill is approximately 6.14 inches wide. If you lay 175 billion bills right next to each other, the line will stretch about 16,958,649 miles. For comparison, Earth’s circumference is 24,901 miles, measured along the equator. So, it would take ~681 round trips until we ran out of the money.
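A quick way to check this kind of back-of-the-envelope estimate is to write it out. The sketch below assumes 63,360 inches per mile (a conversion factor not stated in the text) and takes the other figures from the paragraph above; it comes out at roughly 17 million miles, or about 681 trips around the equator.

```python
# Figures from the text, plus one assumed conversion factor.
PARAMETERS = 175_000_000_000        # GPT-3 parameter count, one bill per parameter
BILL_WIDTH_INCHES = 6.14            # approximate width of a $100 bill
EARTH_CIRCUMFERENCE_MILES = 24_901  # measured along the equator
INCHES_PER_MILE = 63_360            # assumption: 5,280 feet x 12 inches

# Total length of the line of bills, converted from inches to miles.
line_length_miles = PARAMETERS * BILL_WIDTH_INCHES / INCHES_PER_MILE

# How many times that line wraps around the equator.
round_trips = line_length_miles / EARTH_CIRCUMFERENCE_MILES

print(f"Line length: {line_length_miles:,.0f} miles")
print(f"Trips around the equator: {round_trips:.0f}")
```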

Unfortunately, as opposed to money, more is not always better when it comes to the number of parameters. Sure, more parameters seem to mean better results, but they also mean much higher costs. According to the original paper, GPT-3 required 3.14E+23 FLOPs of compute to train, and the computing cost itself is in the millions of dollars. …


An introduction to knowledge distillation

If you have ever used a neural network to solve a complex problem, you know that they can be enormous, containing millions of parameters. For instance, the famous BERT model has about 110 million.

To illustrate the point, this is the number of parameters for the most common architectures in NLP, as summarized in the recent State of AI Report 2020 by Nathan Benaich and Ian Hogarth.

The number of parameters in given architectures. Source: State of AI Report 2020 by Nathan Benaich and Ian Hogarth

In Kaggle competitions, the winning models are often ensembles, composed of several predictors. …


Yes, you are good enough, but you have to put in the work. Photo by Hello I'm Nik 🎞 on Unsplash

What can the musings of an old mathematician teach a data scientist?

Ever since the inception of data science, there have been some debates that are still unsettled. If you follow the discussions in the community, you have probably seen many pieces and posts about the following questions.

Do you need formal education or hands-on experience?

Should you focus on theory or practice?

Specialize in one area or strive to be a generalist?

The answers you might find typically lie on an extreme part of the spectrum. (Especially when the conversation takes place on Twitter, which is totally unsuitable for expressing complex arguments due to its restricted format.) …


Photo by chuttersnap on Unsplash

How to get started with one of the most in-demand data science skills

If you enroll in an average machine learning or data science course, chances are, you are only going to hear about algorithms. Some are more practical and teach you how to use certain frameworks and train models, but the majority don’t go beyond that.

However, this is only a small part of the entire machine learning pipeline. As an engineer or data scientist, your task rarely begins and ends with method development. Rather, most of the time is spent on data engineering and on managing the model-serving infrastructure.

Structure of a machine learning system. Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, et al.

As the community of professionals soon realized this, more and more effort went into managing machine learning operations throughout the entire life cycle. …


Start tracking your experiments and be incredibly efficient

Let’s travel back in time a few decades.

Linus Pauling, the only person to have won two unshared Nobel Prizes and one of the greatest chemists of all time, was a well-organized person. Among other things, he was known for meticulously keeping notebooks containing his experiments, conclusions, and ideas.

Over his life’s work, he left behind 46 research notebooks, which is an impressive number.

Pauling was not the only scientist who did this: so did Charles Darwin, Alexander Graham Bell, Thomas Edison, and practically every scientist before and after their time.

A page from one of Alexander Graham Bell’s notebooks. Source: Library of Congress

Notebooks provide an excellent tool to help reproduce experiments, formulate hypotheses, and draw conclusions. When an experiment has several sensitive parameters (like humidity, temperature, and light conditions for a plant biologist), reproducing results is impossible without keeping track of them. …


Photo by Roman Synkevych on Unsplash

Use API marketplaces to turn your code into a business

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

Times are changing rapidly for developers.

A few years ago, if you wanted to make money with code, you had two options. The classical low-risk, low-reward method was to get a 9–5 job. However, if you wanted more control, the only other way was to go self-employed and take a much higher risk.

This is not the case anymore. …


Photo by Radu Florin on Unsplash

How many hours have you spent debugging because of dynamic typing?

When I first started with Python, I was coming from a C background. I had no extensive software development experience back then, and the taste of freedom provided by dynamic typing was so sweet. Functions leveraging polymorphism and duck typing allowed me to do a lot with a little.

Later, as my experience grew and I became involved in large-scale projects, it dawned on me that this freedom is both a blessing and a curse. As the number of contributors grows and the code is pushed closer to production grade, not having static typing or type checking can lead to nasty surprises.
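To make the kind of surprise concrete, here is a minimal sketch. The function names are hypothetical, and the use of type annotations with an external checker such as mypy is an assumption about one possible mitigation, not something stated in the text above.

```python
def total_price(unit_price, quantity):
    # No annotations: anything that supports `*` is accepted (duck typing).
    return unit_price * quantity


def total_price_typed(unit_price: float, quantity: int) -> float:
    # Same logic, with annotations that a static type checker can verify.
    # Python itself does not enforce these annotations at runtime.
    return unit_price * quantity


# Numeric input behaves as intended.
print(total_price(3.5, 4))

# A string slips through silently: `*` now means repetition, not multiplication.
print(total_price("3.5", 4))
```

Both calls run without raising an error, which is exactly the problem: the second one returns a repeated string instead of a number. With the annotated version, a static checker would be expected to flag a call such as `total_price_typed("3.5", 4)` before the code ever runs.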

This was a feeling shared by many in the Python ecosystem. However, keeping the freedom allowed by dynamic typing while mitigating its negative impact is difficult. …


How to reason about matrices by looking at graphs

To study structure,
tear away all flesh so
only the bone shows.

Linear algebra. Graph theory. If you are a data scientist, you have encountered both of these fields in your study or work at some point. They are part of a standard curriculum, frequently used tools in the kit of every engineer.

What is rarely taught, however, is that they have a very close and fruitful relationship. Graphs can be used to prove strong structural results about matrices easily and beautifully.

To begin our journey, first, we shall take a look at how a matrix can be described with a graph. …
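As a rough illustration of the idea, the sketch below uses one common convention, assumed here rather than taken from the article: each row or column index of a square matrix becomes a node, and every nonzero entry in row i, column j becomes a directed edge from node i to node j, weighted by that entry. The function name `matrix_to_digraph` is hypothetical.

```python
def matrix_to_digraph(matrix):
    """Build a weighted directed graph from a square matrix.

    Assumed convention: an edge i -> j exists for every nonzero matrix[i][j],
    and the entry itself is the edge weight.
    """
    graph = {i: [] for i in range(len(matrix))}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:
                graph[i].append((j, value))
    return graph


A = [
    [0, 2, 0],
    [0, 0, 5],
    [1, 0, 3],
]

graph = matrix_to_digraph(A)
for node, edges in graph.items():
    print(node, "->", edges)
```

In this example the nonzero diagonal entry shows up as a loop from node 2 to itself, and the zero entries produce no edges at all, so the sparsity pattern of the matrix is visible directly in the graph.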

About

Tivadar Danka

I want to democratize machine learning. Co-founder @telestoAI. Math PhD with an INTJ personality.
