Machine Learning: A Love Affair Between Statistics and Computation

Lei Feng's Network (search for the "Lei Feng's Network" public account): this article is a reprint. It is based on talks given by Professor Zhang Zhihua at the 9th China R Conference and in two lectures at Shanghai Jiao Tong University, later transcribed and edited. Professor Zhang is a professor of computer science and engineering at Shanghai Jiao Tong University, an adjunct professor at the university's research center for data science, and a doctoral supervisor in both computer science and technology and in statistics. Before joining Shanghai Jiao Tong University, he was a professor in the College of Computer Science at Zhejiang University and an adjunct professor at Zhejiang University's research center for statistics. He works mainly on teaching and research in artificial intelligence, machine learning, and applied statistics, and has so far published more than 70 papers in important international journals and computer science conferences. He is a guest reviewer for the American "Mathematical Reviews" and serves on the executive editorial board of the Journal of Machine Learning Research, the flagship journal of machine learning. His open courses, "Introduction to Machine Learning" and "Statistical Machine Learning", have attracted wide attention.

The recent powerful rise of artificial intelligence and machine learning, and in particular the man-machine match between AlphaGo and the Korean 9-dan player Lee Sedol, has once again shown us the vast potential of artificial intelligence and machine learning technologies, and it has also touched me deeply. Facing this unprecedented technological revolution, as a scholar who has been engaged in teaching and research on statistical machine learning for more than ten years, I would like to take this opportunity to share with you some of my personal thoughts and reflections.

My talk consists of two parts. In the first part, I review the links between machine learning and related disciplines, in particular statistics, computer science, and operations research and optimization, as well as its mutually reinforcing relationship with industry and the startup community. In the second part, I try to use the concepts of "hierarchical", "adaptive", and "averaging" to draw out some of the research ideas and thinking behind the many and varied machine learning models and computational methods.

Part One: Review and Reflection

1. What is machine learning?

Needless to say, big data and artificial intelligence are today's most fashionable terms, and they will bring profound changes to our lives in the future. Data is the fuel and intelligence is the goal, while machine learning is the rocket that carries us toward intelligence. The machine learning masters Michael Jordan and Tom Mitchell regard machine learning as the intersection of computer science and statistics, lying at the core of artificial intelligence and data science.

“It is one of today’s rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science” — M. I. Jordan

Informally, machine learning means mining useful value from data. Data by itself is dead; it does not automatically yield useful information. How do we discover what is valuable? The first step is to find an abstract representation of the data; on that basis we build models and then estimate the model parameters, which is a matter of computation; and to cope with massive data, we must design efficient algorithms.

I summarize this process as: machine learning = matrix + statistics + optimization + algorithm. First, once data is given an abstract representation, it tends to form a matrix or a graph, and a graph can in fact also be understood as a matrix. Statistics is the primary tool and approach for modeling, and most models are defined as optimization problems; in particular, frequentist statistical methods essentially lead to optimization problems. Bayesian model computation, of course, also involves random sampling methods. But when it comes to big-data problems in particular, we need efficient implementations, and many good techniques from algorithms and data structures in computer science can help us solve this problem.
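To make the "modeling as optimization" point concrete, here is a minimal sketch (a toy illustration of my own, assuming NumPy is available; the data and the regularization parameter `lam` are made up) of ridge regression, where the frequentist estimate is exactly the solution of a penalized least-squares problem:

```python
import numpy as np

# Frequentist estimation posed as optimization: ridge regression minimizes
#   L(w) = ||Xw - y||^2 + lam * ||w||^2,
# whose minimizer has the closed form w* = (X^T X + lam*I)^{-1} X^T y.
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))                    # design matrix (the "matrix" part)
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=n)     # noisy observations

lam = 1e-3
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(w_hat)  # close to w_true
```

The same estimate could equally be reached by an iterative optimizer such as gradient descent, which is where efficient algorithms enter once the data no longer fit comfortably in memory.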

Borrowing from Marr's three levels in his account of computer vision, I likewise divide machine learning into three levels: elementary, intermediate, and advanced. The elementary stage is data acquisition and feature extraction. The intermediate stage is data processing and analysis, which itself has three aspects. First, application-driven work: simply put, applying existing models and methods to solve practical problems, which we can understand as data mining. Second, driven by the needs of applications, proposing and developing models, methods, and algorithms and studying the mathematical principles or theoretical foundations that support them; I see this as the core content of the discipline of machine learning. Third, reaching some kind of intelligence through inference. Finally, the advanced stage is intelligence and cognition, that is, attaining the goal of intelligence. Seen this way, data mining and machine learning are essentially the same; the difference is that data mining is more grounded in databases, while machine learning sits closer to the intelligence side.

2. Statistics and computation

People in machine learning usually have strong computational ability and good intuition for solving problems, while statisticians excel at theoretical analysis and have strong modeling capability; the two sides are therefore highly complementary.

Boosting, the SVM, and sparse learning have been among the most active directions of statistical machine learning over the past twenty, or at least the past ten, years, and it is now hard to say whether statistics or computer science has contributed more to them. For example, the theory of the SVM was in fact worked out quite early by Vapnik and others, but it was the computer science community that invented effective algorithms, and later much open-source code became available for everyone to use, which made the SVM a benchmark classification algorithm. For another example, KPCA is a nonlinear dimensionality-reduction method proposed by computer scientists that turns out to be equivalent to classical MDS. The latter appeared very early in the statistics community, but had computer scientists not rediscovered it, some good ideas might have stayed buried.
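The interplay can be made concrete with a toy example of my own (not code from the talk): the SVM objective is a statistical model, while what made it practical were efficient algorithms. The sketch below trains a linear SVM on the hinge loss with a Pegasos-style stochastic subgradient method:

```python
import numpy as np

# Linear SVM via stochastic subgradient descent (Pegasos-style) on
#   min_w  lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i <w, x_i>)
rng = np.random.default_rng(1)
n, d = 400, 2
X = rng.normal(size=(n, d))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)  # linearly separable labels

w = np.zeros(d)
lam = 0.01
for t in range(1, 2001):
    i = rng.integers(n)                  # pick one sample at random
    eta = 1.0 / (lam * t)                # decaying step size
    margin = y[i] * (w @ X[i])
    grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
    w = w - eta * grad

acc = np.mean(np.sign(X @ w) == y)       # training accuracy of the learned w
print(acc)
```

This is the algorithmic side of the story; the statistical side is that the hinge-loss objective above is what gives the SVM its margin-based generalization theory.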

Machine learning has now become a mainstream direction within statistics: many famous statistics departments have recruited PhDs in machine learning as faculty. Computation is becoming more and more important in statistics: traditional multivariate statistical analysis takes matrix computation as its tool, while modern high-dimensional statistics takes optimization as its tool. Conversely, computer science departments have begun to offer advanced statistics courses, including core courses such as empirical processes.

Next, let us look at how machine learning is seen within computer science. The recently written, not yet published book "Foundations of Data Science" is co-authored by Avrim Blum, John Hopcroft, and Ravindran Kannan; Hopcroft is a Turing Award winner. In its opening part, the book observes that the development of computer science can be divided into three stages: early, middle, and modern. The early stage was about making computers run, and its focus lay in developing programming languages, compiler theory, operating systems, and the mathematical theories supporting them. The middle stage was about making computers useful and handy, with the focus on algorithms and data structures. The third stage is about giving computers a broad range of applications, with the emphasis of development shifting from discrete mathematics to probability and statistics. As we can see, it is this third stage that machine learning cares about.

The computer industry now calls machine learning a "universal discipline": it is ubiquitous. On the one hand, machine learning has its own disciplinary system; on the other hand, it has two important radiating roles. One is to provide the applied sciences with ways and means of solving problems: put simply, machine learning translates difficult mathematics into pseudocode that engineers can turn into programs. The other is to supply new research questions to some traditional disciplines, such as statistics, theoretical computer science, and operations research and optimization.

3. Lessons from the development of machine learning

The history of machine learning tells us that developing a discipline requires a pragmatic attitude. Trendy concepts and names do play a role in popularizing a subject, but what a discipline ultimately rests on are its research problems, methods, techniques, and supporting foundations, and on the value it creates for society.

Machine learning is a cool name: taken literally, its purpose is to give machines the ability to learn like people. But as we saw above, during its golden decade of development the machine learning field did not hype "intelligence" very much. Instead, it concerned itself with introducing statistics to establish the theoretical foundations of the discipline, oriented itself toward data analysis and processing, took unsupervised learning and supervised learning as its two main research problems, and proposed and developed a series of models, methods, and algorithms that effectively solved practical problems faced by industry. In recent years, driven by data and a dramatic increase in computing power, a number of infrastructure-oriented machine learning systems have been developed, and the strong rise of deep neural networks has brought industry profound change and opportunity.

Machine learning also illustrates the importance and necessity of interdisciplinary work. Genuine interdisciplinarity, however, is not a matter of simply borrowing a few nouns or concepts; it requires real fusion. Professor Michael Jordan is both a leading computer scientist and a leading statistician, which is why he was able to shoulder the task of establishing statistical machine learning. He is also very pragmatic and never deals in empty concepts and frameworks. The path he follows is bottom-up: he starts from concrete issues of models, methods, and algorithms, and then systematizes step by step. Professor Geoffrey Hinton is one of the world's most famous cognitive psychologists and computer scientists. Although he achieved distinction early and enjoys an outstanding reputation in academia, he has remained active on the front line, writing his own code. Many of his ideas are simple, practical, and very effective, so he deserves to be called a great thinker. Thanks to his wisdom and his practice, deep learning has ushered in a revolutionary breakthrough.

Machine learning is also an open and inclusive discipline. We can say that it has been jointly created by academia, industry, and the startup (or competition) community. Academia is the engine, industry is the driving force, and the startup community is the vitality and the future. Academia and industry should have their respective responsibilities and division of labor. The responsibility of academia is to build and develop the discipline of machine learning and to train specialists in the field, while large projects should be more market-driven and be implemented and completed by industry.

Part Two: A Few Simple Ideas

In this section, my attention returns to machine learning research itself. Machine learning has rich content, and new methods and new technologies are continually being proposed and discovered. Here I try to use the concepts of "hierarchical", "adaptive", and "averaging" to draw out some of the ideas and thinking behind the many and varied machine learning models and computational methods. I hope this helps you understand some machine learning models and methods and offers inspiration for future research.

1. Hierarchical

First, let us consider the idea of hierarchy. Specifically, we will look at three examples.

The first example is the latent variable model, which is a hierarchical model. As an extension of probabilistic graphical models, latent variable models are among the most important methods of multivariate data analysis. Latent variables have three important features. First, they let us replace strong independence assumptions with the weaker assumption of conditional independence. The famous de Finetti theorem supports this. The theorem says that a set of random variables is exchangeable if and only if, conditional on some parameter, they can be represented as a mixture of conditionally independent and identically distributed random variables. This gives a set of exchangeable random variables a hierarchical representation: first draw a parameter from one distribution, and then, given this parameter, draw the random variables independently from another distribution. Second, introducing latent variables can simplify computation: the expectation-maximization (EM) algorithm and the more general data-augmentation techniques are based on this idea. Specifically, some complex distributions, such as the t-distribution and the Laplace distribution, can be represented as Gaussian scale mixtures to simplify calculation. Third, latent variables may themselves carry a physical interpretation that fits the application scenario. For example, in the latent Dirichlet allocation (LDA) model, the latent variables correspond to topics.
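The Gaussian scale mixture point can be checked numerically. In the sketch below (my own toy example, not from the talk), drawing a precision tau from a Gamma(nu/2, rate nu/2) distribution and then x from N(0, 1/tau) yields, marginally, a Student-t with nu degrees of freedom — exactly the conditionally Gaussian representation that EM-style algorithms exploit:

```python
import numpy as np

# Gaussian scale mixture: tau ~ Gamma(nu/2, rate=nu/2), x | tau ~ N(0, 1/tau)
# gives a marginal Student-t with nu degrees of freedom. (With an exponential
# mixing density one gets the Laplace distribution instead.)
rng = np.random.default_rng(2)
nu = 10.0
n = 200_000
tau = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)  # NumPy uses scale = 1/rate
x = rng.normal(size=n) / np.sqrt(tau)

# The t distribution with nu d.o.f. has variance nu / (nu - 2) = 1.25 here.
print(x.var())
```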

Latent Dirichlet Allocation

In the second example, we look at hierarchical Bayesian models. When estimating by MCMC sampling, the hyper-parameters at the top level always need to be set by hand, and the convergence of the MCMC algorithm naturally depends on these given hyper-parameters. If we lack good experience for choosing them, one possibility is to add another layer: the more layers there are, the weaker the dependence on the hyper-parameters becomes.

Hierarchical Bayesian Model

The third example: deep learning also embodies the hierarchical idea. If all nodes are laid flat and every pair is connected, we get a fully connected graph. A deep CNN can instead be viewed as a structural regularization of these connections, and regularization is a central idea of statistical learning theory. CNNs and RNNs are the two main deep neural network models, used mainly in image processing and natural language processing respectively. Studies have shown that hierarchical structures have stronger learning ability.
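The "structural regularization" view can be illustrated in one dimension with a toy example of my own: a convolution is just a linear map whose weight matrix is constrained to share a single small kernel along its rows, instead of the independent weights of a fully connected layer.

```python
import numpy as np

# A 1-D convolution as a weight-shared, sparse linear map: the dense matrix W
# below has only len(kernel) = 3 free parameters instead of 4 * 6 = 24.
kernel = np.array([1.0, -2.0, 0.5])
n = 6
W = np.zeros((n - len(kernel) + 1, n))
for r in range(W.shape[0]):
    W[r, r:r + len(kernel)] = kernel         # same kernel shifted along each row

x = np.arange(1.0, n + 1)
out_conv = np.convolve(x, kernel[::-1], mode="valid")  # cross-correlation with kernel
print(np.allclose(W @ x, out_conv))  # True: the two computations agree
```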

Deep Learning

2. Adaptive

Next we turn to the idea of adaptation, and again we will see its role through several examples.

The first example is adaptive importance sampling. Importance sampling can often improve sampling performance, and adaptive importance sampling improves it further.
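Here is a minimal sketch of plain (non-adaptive) importance sampling, a toy example of my own: we estimate the small tail probability P(X > 3) for X ~ N(0, 1) by sampling from a proposal N(3, 1) centred on the rare region and reweighting by the density ratio. An adaptive variant would additionally tune the proposal using the samples seen so far.

```python
import numpy as np

# Importance sampling of a rare event: sample z from the proposal q = N(3, 1)
# and weight by p(z)/q(z), where p = N(0, 1) is the target.
rng = np.random.default_rng(3)
n = 10_000
z = rng.normal(loc=3.0, size=n)
w = np.exp(-0.5 * z**2 + 0.5 * (z - 3.0)**2)   # p(z)/q(z); normalizing constants cancel
est = np.mean(w * (z > 3.0))
print(est)  # close to the true value P(X > 3) ~ 0.00135
```

With the same budget, naive sampling from N(0, 1) would see only about 13 tail hits and give a far noisier estimate.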

The second example is adaptive column selection. Given a matrix A, we want to select some of its columns to form a matrix C and then use CC^+A to approximate the original matrix A, with the approximation error as small as possible. This is an NP-hard problem. In an adaptive approach, we first select a small set of columns C_1, then construct the residual of A with respect to C_1, define a probability distribution from this residual, sample a further set of columns C_2 according to it, and let C be the combination of C_1 and C_2.
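A rough numerical sketch of that two-stage procedure (my own toy example; the matrix sizes are arbitrary): after the first batch of columns C_1 is chosen, the sampling probabilities for the second batch are taken proportional to the squared column norms of the residual A - C_1 C_1^+ A.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(50, 20)) @ rng.normal(size=(20, 40))  # a 50x40 test matrix

def residual(C, A):
    # A - C C^+ A: the part of A outside the column span of C
    return A - C @ np.linalg.pinv(C) @ A

idx1 = rng.choice(A.shape[1], size=5, replace=False)       # first batch C_1
E1 = residual(A[:, idx1], A)
err1 = np.linalg.norm(E1) / np.linalg.norm(A)

probs = (E1**2).sum(axis=0) / (E1**2).sum()                # adapt to the residual
idx2 = rng.choice(A.shape[1], size=5, replace=False, p=probs)
C = A[:, np.concatenate([idx1, idx2])]                     # C = [C_1, C_2]
err2 = np.linalg.norm(residual(C, A)) / np.linalg.norm(A)
print(err1, err2)  # err2 <= err1: the adaptive round can only improve the fit
```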

The third example is adaptive stochastic iterative algorithms. Consider a regularized empirical risk minimization problem: when there is a great deal of training data, batch computation can be very time-consuming, so stochastic methods are usually used. Stochastic gradient or stochastic dual gradient algorithms give unbiased estimates of the full gradient, and adaptive techniques can reduce the variance of these estimates.
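The variance-reduction claim can be demonstrated directly with the control-variate construction used by methods such as SVRG (a toy sketch of my own, not code from the talk): the corrected stochastic gradient g_i(w) - g_i(w_ref) + full_grad(w_ref) is still unbiased, but its variance shrinks as the reference point approaches w.

```python
import numpy as np

# Compare the variance of plain per-sample gradients of a least-squares loss
# with SVRG-style corrected gradients built around a nearby reference point.
rng = np.random.default_rng(5)
n, d = 1000, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(size=n)

def grad_i(w, i):
    return (X[i] @ w - y[i]) * X[i]          # gradient of one sample's loss

w = rng.normal(size=d)
w_ref = w + 0.01 * rng.normal(size=d)        # a snapshot close to w
full_ref = X.T @ (X @ w_ref - y) / n         # full gradient at the snapshot

plain = np.array([grad_i(w, i) for i in range(n)])
reduced = np.array([grad_i(w, i) - grad_i(w_ref, i) + full_ref for i in range(n)])

var_plain = plain.var(axis=0).sum()
var_reduced = reduced.var(axis=0).sum()
print(var_plain, var_reduced)  # the corrected estimator's variance is far smaller
```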

The fourth example is boosting. Boosting adaptively adjusts the weight of each sample: specifically, it increases the weights of misclassified samples and decreases the weights of correctly classified ones.
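One round of this reweighting, in the AdaBoost form, looks as follows (a toy sketch with made-up labels and predictions):

```python
import numpy as np

# One AdaBoost round: up-weight the samples the weak learner got wrong,
# down-weight the ones it got right, then renormalize.
y = np.array([1, 1, -1, -1, 1, -1])          # true labels
pred = np.array([1, -1, -1, 1, 1, -1])       # a weak learner's predictions
w = np.full(len(y), 1 / len(y))              # initial uniform weights

err = w[pred != y].sum()                     # weighted error (= 1/3 here)
alpha = 0.5 * np.log((1 - err) / err)        # the learner's vote in the ensemble
w = w * np.exp(-alpha * y * pred)            # adaptive reweighting step
w = w / w.sum()
print(w)  # mistakes now carry weight 0.25 each, correct samples 0.125 each
```

A classic consequence: under the new weights, the same weak learner's error is exactly 1/2, so the next round is forced to learn something new.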

3. Averaging

In fact, boosting also embodies averaging, which is the third idea I want to discuss. Simply put, boosting combines a set of weak classifiers into a strong classifier. The first benefit is that it can reduce the risk of overfitting; second, it can reduce the risk of falling into local optima; third, it can enlarge the hypothesis space. Bagging is another classical ensemble learning algorithm: it divides the training data into several groups, trains a model on each small data set, and combines these models into a strong classifier. This is a two-level form of ensemble learning.
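The variance-reduction benefit of averaging is easiest to see in the idealized case of independent, unbiased but noisy predictors (a toy sketch of my own; in bagging the ensemble members are correlated, so the gain is smaller but still real):

```python
import numpy as np

# Averaging B unbiased noisy predictors keeps the bias at zero while the
# variance falls roughly like 1/B.
rng = np.random.default_rng(6)
truth = 2.0
single = truth + rng.normal(size=10_000)                     # one noisy predictor
bagged = truth + rng.normal(size=(10_000, 25)).mean(axis=1)  # average of B = 25
print(single.var(), bagged.var())  # roughly 1.0 vs 1/25 = 0.04
```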

The classic Anderson acceleration technique also uses averaging to speed up convergence. Specifically, it is a stacking process in which the weights of the combination are obtained by solving a residual least-squares problem. The benefit of this technique is that it does not require much extra computation, and it often also makes the numerical iteration more stable.
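A rough sketch of Anderson acceleration for a linear fixed-point iteration x ← Ax + b (my own example, not from the talk; this version keeps the full history, and the combination weights come from a small least-squares solve on residual differences):

```python
import numpy as np

rng = np.random.default_rng(7)
d = 20
M = rng.normal(size=(d, d))
S = (M + M.T) / 2
A = 0.9 * S / np.linalg.norm(S, 2)              # symmetric contraction, norm 0.9
b = rng.normal(size=d)
x_star = np.linalg.solve(np.eye(d) - A, b)      # the fixed point of g

def g(x):
    return A @ x + b

def anderson(iters=30):
    x = np.zeros(d)
    G_hist, F_hist = [], []                      # past g-values and residuals
    for _ in range(iters):
        gx = g(x)
        f = gx - x                               # current residual
        G_hist.append(gx)
        F_hist.append(f)
        if len(F_hist) > 1:
            dF = np.diff(np.array(F_hist), axis=0).T
            dG = np.diff(np.array(G_hist), axis=0).T
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)  # least-squares weights
            x = gx - dG @ gamma                  # weighted combination of history
        else:
            x = gx
    return x

x_plain = np.zeros(d)
for _ in range(30):
    x_plain = g(x_plain)                         # plain fixed-point iteration

err_aa = np.linalg.norm(anderson() - x_star)
err_plain = np.linalg.norm(x_plain - x_star)
print(err_aa, err_plain)  # the accelerated iterate ends up far closer to x_star
```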

Another example of using averaging is distributed computing. Distributed computation is in many cases not synchronous but asynchronous; what can we do in the asynchronous case? The simplest approach is to let the workers run independently, average all their results at certain moments, distribute the average back to every worker, let them run independently again, and so on. This is like a warm-start process.
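A one-shot version of this averaging scheme can be sketched as follows (a toy example of my own; a real asynchronous system would repeat the average-and-redistribute cycle): split the data across workers, solve locally, and average the local solutions.

```python
import numpy as np

# One-shot distributed averaging: each of K workers solves least squares on
# its own shard, and the coordinator averages the K local solutions.
rng = np.random.default_rng(8)
n, d, K = 4000, 4, 8
X = rng.normal(size=(n, d))
w_true = np.array([1.0, 2.0, 3.0, 4.0])
y = X @ w_true + 0.1 * rng.normal(size=n)

local_solutions = [
    np.linalg.lstsq(X[i::K], y[i::K], rcond=None)[0]  # worker i's shard
    for i in range(K)
]
w_avg = np.mean(local_solutions, axis=0)
print(w_avg)  # close to w_true
```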

As we have seen, these ideas are often used in combination: boosting, for example, is both adaptive and an averaging method. Hierarchy, adaptation, and averaging are straightforward ideas, but they are also very useful.


*Originally published at **cutedisney.wordpress.com** on August 5, 2016.*