5 questions for a core developer
A short interview with the core DS engineer of AMAI
I have worked with many core developers on products of various scales. Most of them don’t speak at conferences, don’t take online courses, don’t write thousands of publications and don’t spend the whole working day in Telegram chats. My main conclusion is that the closer you are to the development of a core product, the less exposed you are to the illusions of your field.
The following is a short interview with AMAI core developer Andrey, who joined the company as a junior without any experience and, within 2 years, developed a model for a TTS engine with incredibly realistic voice quality, in one of the most technologically complex areas of AI.
What background do you have?
I studied mathematics and got a master’s degree in algebraic methods of information security. During my studies there were courses on machine learning methods, but I didn’t go to the classes; I just showed the teacher what I was working on. I started studying neural networks as a hobby in the 4th year of my bachelor’s degree, and at the same time I wrote my senior thesis on the topic. I have never participated in conferences. At the very beginning of my education I took three weeks of Vorontsov’s course, but then they asked us to solve an integral, and I gave up: I had been solving integrals for a whole year before that, so now I’m traumatized.
How much does the mathematical background help you write neural networks?
Ironically, various theorems from classical statistics do not work here. I can’t even recall a suitable reference of the kind: “Assume this proposition, and we get some result from string theory, and that’s probably why convolutional neural networks work.” The universal approximation theorem states that a network with a single hidden layer can approximate any continuous function on a compact domain, given enough hidden units. Where does that get us? Nowhere.
GAN algorithms refer to the “Nash equilibrium”, but no one knows how to achieve it. Publications that reduce gradient descent or neural networks to something biological also look like fitting the problem to the answer.
In this sense, I think that if strong AI is built in the next decade, we still won’t understand how it works, just as image classifiers are developed now without much interpretation. The past year brought a big step forward, but our understanding is still not at the required level.
What is important not to miss when starting work on a DS project?
First comes collecting the datasets and searching them for artifacts. Then you need to set up all the necessary metrics, so that there is never any doubt of the kind “I wonder if the previous model was better or worse?”. To do this, you can, for example, run the dataset through the model and look at the most problematic samples. Tracking the progress of a model over a long period is a broader issue: for TTS, the metrics are subjective and rely on evaluation by people, which is costly, so it is easier to use other machine learning models for the evaluation.
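As a rough illustration of "run the dataset through the model and look at the most problematic samples", here is a minimal sketch. It is not AMAI's actual tooling: the function name `rank_problem_samples`, the choice of mean squared error as the per-sample score, and the toy data are all assumptions made for the example.

```python
import numpy as np

def rank_problem_samples(model_fn, inputs, targets, k=5):
    """Return indices and errors of the k samples the model handles worst.

    model_fn is a hypothetical callable mapping a batch of inputs to predictions.
    The score is per-sample mean squared error (an assumption for this sketch).
    """
    preds = model_fn(inputs)
    # Flatten everything except the batch axis, then average per sample.
    sq_err = (preds - targets) ** 2
    errors = sq_err.reshape(len(sq_err), -1).mean(axis=1)
    # Indices of the k largest errors, worst first.
    worst = np.argsort(errors)[::-1][:k]
    return worst, errors[worst]

# Toy data: a "model" that doubles its input, and targets that match it
# everywhere except two deliberately corrupted samples (dataset artifacts).
rng = np.random.default_rng(0)
inputs = rng.normal(size=(100, 8))
targets = inputs * 2.0
targets[[3, 42]] += 10.0

def model_fn(x):
    return x * 2.0

worst, errs = rank_problem_samples(model_fn, inputs, targets, k=2)
print(sorted(worst.tolist()), errs)
```

Running the same ranking with the previous and the current model on a fixed dataset gives a direct way to compare them, and the top of the list is often where mislabeled or broken samples show up.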
Which machine learning models are suitable specifically for TTS?
Feed-forward, convolutional, LSTM, GRU, Transformer. Other models don’t seem to have any special advantages in the tasks I have worked on.
What kind of experience is important for a DS specialist?
The experience of inventing your own models to solve tasks, of finding original solutions and combinations of known things. It helps me communicate in my line of work and search for information. And reading publications: I read about two a week, sometimes more, plus many more abstracts and blog posts. Among recent publications, I liked these: