Incremental Learning with Support Vector Machines (ISVM)
Support Vector Machines (SVMs) are a popular tool for learning from large amounts of high-dimensional data. However, learning incrementally from previous SVM results may sometimes be preferable, because computing an SVM from scratch is very costly in terms of time and memory, or because the SVM is used in an online learning setting. The paper [1] presents an approach for incremental learning with Support Vector Machines that improves on existing approaches. Empirical evidence shows that this approach can effectively deal with changes in the target concept resulting from the incremental learning setting.
I continue to develop the subject of online incremental learning, which I introduced in my article “An Introduction to Online Machine Learning.” The ability to incrementally learn from batches of data is an important feature that makes a learning algorithm more applicable to real-world problems. Incremental learning may be used to keep the memory and time consumption of the learning algorithm at a manageable level, or because one needs to make predictions at a time when the whole data set is not yet available (the online setting).

The most important question in incremental learning is whether the target concept may change between the learning steps or is assumed to be constant. The first case is called concept drift; the second is true incremental learning. The practical difference between the two is that in the concept drift setting, old examples may be misleading, as they are examples of an old target concept that may be quite different from the concept one is trying to learn. In the case of true incremental learning, all examples contain the same information about the target concept. As a consequence, one can judge the performance of an incremental learning algorithm simply by comparing its results to those of the same algorithm trained on all data simultaneously as the gold standard. This article deals with incremental learning.
Support Vector Machines have been successfully used for learning from large and high-dimensional data sets. This is because the generalization ability of an SVM does not depend on all the training data but only on a subset thereof, the so-called Support Vectors. Unfortunately, training an SVM can itself be very time-consuming, especially when dealing with noisy data. As the number of Support Vectors is typically very small compared to the number of training examples, SVMs promise to be an effective tool for incremental learning by compressing the data of the previous batches to their Support Vectors. This approach to incremental learning with Support Vector Machines has been investigated, and it has been shown that incrementally trained SVMs compare very well to their non-incrementally trained equivalents.
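To make the idea concrete, here is a minimal sketch of the SV-incremental scheme, assuming scikit-learn's SVC as the underlying learner (the paper does not prescribe any particular library): after each batch, only the support vectors are kept and merged with the next batch before retraining.

```python
import numpy as np
from sklearn.svm import SVC

def sv_incremental_fit(batches, C=1.0):
    """Sketch of SV-incremental learning: compress each processed batch to its support vectors."""
    X_keep, y_keep = None, None
    model = None
    for X_new, y_new in batches:  # batches is an iterable of (X, y) pairs
        if X_keep is None:
            X_train, y_train = X_new, y_new
        else:
            X_train = np.vstack([X_keep, X_new])
            y_train = np.concatenate([y_keep, y_new])
        model = SVC(kernel="linear", C=C).fit(X_train, y_train)
        # keep only the support vectors as a compressed summary of all data seen so far
        X_keep, y_keep = X_train[model.support_], y_train[model.support_]
    return model
```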
The problem of drifting concepts in incremental Support Vector Machine learning has also been addressed. It has been experimentally validated that SVMs handle drifting concepts well with respect to three criteria: stability of the result across the learning steps, improvement of the prediction accuracy as training progresses, and recoverability from errors caused by the drifting concepts. In another approach to handling drifting concepts, a performance estimator was used to detect whether a drift in the underlying concept had occurred; when a drift was detected, the old data was discarded and training took place only on the new data.
Support Vector Machines
Support Vector Machines (SVMs) are based on the work of Vladimir Vapnik in statistical learning theory. Statistical learning theory deals with the question of how a function can be found, from a given class of functions, that minimizes the expected risk
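In the usual notation of statistical learning theory, the expected risk of a decision function f can be written as

$$R(f) = \int L\big(f(x), y\big)\, dP(y \mid x)\, dP(x)$$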
for a loss function L, when the distribution of the examples P(x) and their classifications P(y|x) is unknown and has to be estimated from finitely many examples.
The SVM algorithm solves this problem by minimizing the regularized risk, which is a weighted sum of the empirical risk on the data and a complexity term ||w||²:
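One common way to write this (with the constant C weighting the empirical risk against the complexity term) is

$$R_{\mathrm{reg}}(f) = \frac{1}{2}\lVert w \rVert^2 + C \, R_{\mathrm{emp}}(f)$$

where R_emp(f) denotes the average loss of f on the training examples.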
In their basic formulation, SVMs find a linear decision function y = f(x) = sign(w · x + b) that both minimizes the prediction error on the training set and promises the best generalization performance. Given the examples (x1, y1), …, (xn, yn), this is done by solving the following optimization problem:
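In the standard soft-margin form (with slack variables ξi measuring the training errors), the problem reads

$$\min_{w,\, b,\, \xi} \ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i$$

$$\text{subject to} \quad y_i \,(w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n.$$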
The hyperplane vector w has a representation in terms of the training examples (xi, yi) and their Lagrangian multipliers alpha_i, which are calculated during the optimization process:
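In standard notation this expansion is

$$w = \sum_{i} \alpha_i\, y_i\, x_i$$

so that only the examples with alpha_i > 0, the Support Vectors, contribute to the decision function.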
The optimal constant C for the learning problem is usually determined by some model selection technique, e.g., cross-validation.
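As a small illustration (scikit-learn assumed; X and y stand for your training data), C could be chosen like this:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def select_C(X, y, candidates=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Pick the regularization constant C by 5-fold cross-validation."""
    search = GridSearchCV(SVC(kernel="linear"), {"C": list(candidates)}, cv=5)
    search.fit(X, y)
    return search.best_params_["C"]
```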
New Incremental Learning Algorithm
As stated above, the SV-incremental algorithm suffers from the fact that the Support Vectors do not describe the whole data set but only the decision function induced by it. To make up for this in the incremental learning algorithm, an error on the old Support Vectors (which represent the old learning set) must be made more costly than an error on a new example. Fortunately, this can easily be accomplished in the Support Vector algorithm. Let (xi, yi) be the old Support Vectors and (x'i, y'i) be the new examples.
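The modified training problem can then be written in a form consistent with this description (the slack variables ξi of the old Support Vectors are weighted by a factor L, while the slack variables ξ'j of the new examples keep weight 1):

$$\min_{w,\, b,\, \xi,\, \xi'} \ \frac{1}{2}\lVert w \rVert^2 + C \Big( L \sum_{i} \xi_i + \sum_{j} \xi'_j \Big)$$

$$\text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \qquad y'_j\,(w \cdot x'_j + b) \ge 1 - \xi'_j, \qquad \xi_i,\ \xi'_j \ge 0.$$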
This modification of the SVM problem can be viewed as training the SVM with respect to a new loss function. A natural choice is to let L be the number of examples in the previous batch divided by the number of Support Vectors.
This comes from approximating the average error of an arbitrary decision function over all examples by its average error over just the Support Vectors. In other words, every Support Vector represents a constant fraction of all examples. This algorithm will be called the SV-L-incremental algorithm.
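A minimal sketch of one SV-L-incremental step, again assuming scikit-learn (its per-sample weights play the role of the factor L; this is an illustration, not the paper's implementation):

```python
import numpy as np
from sklearn.svm import SVC

def sv_l_incremental_step(X_sv_old, y_sv_old, n_prev_examples, X_new, y_new, C=1.0):
    """One SV-L-incremental step: an old support vector counts L times as much as a new example."""
    L = n_prev_examples / len(X_sv_old)  # examples in the previous batch per support vector
    X_train = np.vstack([X_sv_old, X_new])
    y_train = np.concatenate([y_sv_old, y_new])
    weights = np.concatenate([np.full(len(X_sv_old), L), np.ones(len(X_new))])
    model = SVC(kernel="linear", C=C).fit(X_train, y_train, sample_weight=weights)
    # the new support vectors become the compressed memory for the next incremental step
    return model, X_train[model.support_], y_train[model.support_]
```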
Reference
[1] Stefan Rüping, “Incremental Learning with Support Vector Machines,” 2002.