Chapter 2 — Concept Learning — Part 2

Pralhad Teggi
8 min read · Feb 1, 2020


In the earlier story, we looked into the FIND-S algorithm and its limitations. Now, in this story, we shall see how we can address the limitations of the FIND-S algorithm.

1. Definitions — Consistent and Version Space

We begin with a few basic definitions:

Definition — Consistent

A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example (x, c(x)) in D.
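
Equivalently, in Mitchell's notation:

Consistent(h, D) ≡ (∀ ⟨x, c(x)⟩ ∈ D) h(x) = c(x)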

Note the difference between the definitions of consistent and satisfies:

A training example x is said to satisfy hypothesis h when h(x) = 1, regardless of whether x is a positive or negative example of the target concept.

An example x is said to be consistent with hypothesis h iff h(x) = c(x).

In the previous story, we discussed the FIND-S algorithm with an example; the set of training examples D is listed below.

FIND-S produced the hypothesis h = <Sunny, Warm, ?, Strong, ?, ?>. Now, for each example (x, c(x)) in D, we evaluate whether h(x) = c(x):

  1. (<Sunny, Warm, Normal, Strong, Warm, Same>, Yes) → h(x) = c(x)
  2. (<Sunny, Warm, High, Strong, Warm, Same>, Yes) → h(x) = c(x)
  3. (<Rainy, Cold, High, Strong, Warm, Change>, No) → h(x) = c(x)
  4. (<Sunny, Warm, High, Strong, Cool, Change>, Yes) → h(x) = c(x)

Since h(x) = c(x) holds for every example, we can say the hypothesis h is consistent with the set of training examples D.

Let's say we have a hypothesis h1 = <?, ?, ?, Strong, ?, ?>. Is this hypothesis consistent with the set of training examples D?

For training example (3), h1(x) = 1 but c(x) = No, so h1(x) ≠ c(x). So hypothesis h1 is not consistent with D.

Let's say we have a hypothesis h2 = <?, Warm, ?, Strong, ?, ?>. Is this hypothesis consistent with the set of training examples D?

All four training examples satisfy h2(x) = c(x), so hypothesis h2 is consistent with D.
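
As a concrete check, here is a minimal Python sketch of this consistency test. The tuple encoding of hypotheses, with '?' as the wildcard, and the helper names h_classifies and consistent are our own conventions for illustration:

```python
# Minimal consistency check for conjunctive hypotheses.
# '?' accepts any attribute value; a concrete value must match exactly.

def h_classifies(h, x):
    """h(x): True (positive) when every constraint in h is satisfied by x."""
    return all(a == '?' or a == b for a, b in zip(h, x))

def consistent(h, D):
    """h is consistent with D iff h(x) = c(x) for every example in D."""
    return all(h_classifies(h, x) == c for x, c in D)

# The four EnjoySport training examples (True = Yes, False = No).
D = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]

h1 = ('?', '?', '?', 'Strong', '?', '?')
h2 = ('?', 'Warm', '?', 'Strong', '?', '?')
print(consistent(h1, D))  # False: h1 labels example 3 positive, but c(x) is No
print(consistent(h2, D))  # True: h2(x) = c(x) on all four examples
```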

Definition — Version Space

The version space, denoted VS_H,D with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D
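
In set notation, the definition reads:

VS_H,D ≡ { h ∈ H | Consistent(h, D) }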

In the above example, we found two hypotheses from H that are consistent with D:

h = <Sunny, Warm, ?, Strong, ?, ?> and h2 = <?, Warm, ?, Strong, ?, ?>

Both of these hypotheses therefore belong to the version space (we will list the complete version space for this problem shortly).

2. The List-Then-Eliminate algorithm

One obvious way to represent the version space is simply to list all of its members. This leads to a simple learning algorithm, which we might call the LIST-THEN-ELIMINATE algorithm:

  1. VersionSpace ← a list containing every hypothesis in H
  2. For each training example ⟨x, c(x)⟩, remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
  3. Output the list of hypotheses in VersionSpace

That is, the algorithm first initializes the version space to contain all hypotheses in H and then eliminates any hypothesis found inconsistent with any training example.
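
Here is a minimal Python sketch of this procedure, reusing h_classifies and D from the snippet above. The explicit attribute domains and the skipping of hypotheses containing the all-rejecting 0 value are our own simplifications:

```python
from itertools import product

# Attribute domains for EnjoySport (Sky, AirTemp, Humidity, Wind, Water, Forecast).
domains = [
    ['Sunny', 'Cloudy', 'Rainy'],
    ['Warm', 'Cold'],
    ['Normal', 'High'],
    ['Strong', 'Weak'],
    ['Warm', 'Cool'],
    ['Same', 'Change'],
]

def list_then_eliminate(domains, D):
    """Enumerate every conjunctive hypothesis, then drop the inconsistent ones."""
    # Each attribute is either '?' or one concrete value from its domain.
    # (Hypotheses containing '0' are skipped: they reject everything and
    # cannot survive a positive example.)
    version_space = list(product(*[values + ['?'] for values in domains]))
    for x, c in D:
        version_space = [h for h in version_space if h_classifies(h, x) == c]
    return version_space

for h in list_then_eliminate(domains, D):
    print(h)  # prints the six hypotheses of the version space
```

Even on EnjoySport this enumerates 972 candidate hypotheses before filtering, which is exactly why the approach does not scale.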

In principle, the LIST-THEN-ELIMINATE algorithm can be applied whenever the hypothesis space H is finite. However, since it requires exhaustively enumerating all hypotheses in H, it is not feasible in practice for any but the most trivial hypothesis spaces.

Representation for Version Spaces —

Instead of enumerating every member, we can represent the version space more compactly in terms of its most specific and most general members.

For the above EnjoySport training examples D, the following hypotheses are consistent with D. In other words, this list of hypotheses is the version space:

h1 = <Sunny, ?, ?, ?, ?, ?>
h2 = <?, Warm, ?, ?, ?, ?>
h3 = <Sunny, ?, ?, Strong, ?, ?>
h4 = <Sunny, Warm, ?, ?, ?, ?>
h5 = <?, Warm, ?, Strong, ?, ?>
h6 = <Sunny, Warm, ?, Strong, ?, ?>

In this list there are two extremes: the maximally general hypotheses (h1 and h2) and the maximally specific hypothesis (h6). Let's define these two extremes as the general boundary G and the specific boundary S.

Definition — G

The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
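
Formally:

G ≡ { g ∈ H | Consistent(g, D) ∧ (¬∃ g' ∈ H) [ (g' >_g g) ∧ Consistent(g', D) ] }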

Definition — S

The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D.
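
Formally:

S ≡ { s ∈ H | Consistent(s, D) ∧ (¬∃ s' ∈ H) [ (s >_g s') ∧ Consistent(s', D) ] }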

Version Space representation theorem

Let X be an arbitrary set of instances and let H be a set of Boolean-valued hypotheses defined over X. Let c : X → {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {⟨x, c(x)⟩}. For all X, H, c, and D such that S and G are well defined,

VS_H,D = { h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥_g h ≥_g s) }

where x ≥_g y means x is more general than or equal to y.

For the EnjoySport concept learning problem, this version space is exactly the six-hypothesis list shown above, bounded by the general boundary set G = {h1, h2} and the specific boundary set S = {h6}.

In other words, the theorem says a hypothesis h from H belongs to the version space exactly when there exist some s in the specific boundary S and some g in the general boundary G such that g ≥_g h and h ≥_g s; the version space is precisely the set of hypotheses sandwiched between the two boundaries.

3. Candidate Elimination algorithm

  • The Candidate-Elimination algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.
  • It begins by initializing the version space to the set of all hypotheses in H; that is, by initializing the G boundary set to contain the most general hypothesis in H as
    G0 ← { <?, ?, ?, ?, ?, ?> }
    and initializing the S boundary set to contain the most specific hypothesis as
    S0 ← { <0, 0, 0, 0, 0, 0> }
  • These two boundary sets delimit the entire hypothesis space, because every other hypothesis in H is both more general than S0 and more specific than G0.
  • As each training example is considered, the S and G boundary sets are generalized and specialized, respectively, to eliminate from the version space any hypotheses found inconsistent with the new training example.
  • After all examples have been processed, the computed version space contains all the hypotheses consistent with these examples and only these hypotheses.

Let's look at the algorithm steps:

Initialize G to the set of maximally general hypotheses in H, and initialize S to the set of maximally specific hypotheses in H. Then, for each training example d:

  • If d is a positive example: remove from G any hypothesis inconsistent with d. For each hypothesis s in S that is not consistent with d, remove s from S and add to S all minimal generalizations h of s such that h is consistent with d and some member of G is more general than h. Then remove from S any hypothesis that is more general than another hypothesis in S.
  • If d is a negative example: remove from S any hypothesis inconsistent with d. For each hypothesis g in G that is not consistent with d, remove g from G and add to G all minimal specializations h of g such that h is consistent with d and some member of S is more specific than h. Then remove from G any hypothesis that is less general than another hypothesis in G.
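
Below is a compact Python sketch of these steps for the conjunctive-hypothesis case, continuing the encoding from the earlier snippets (h_classifies, D, domains). The helper names more_general, min_generalize, and min_specialize are our own; with conjunctive hypotheses the minimal generalization of s is unique and S stays a singleton, so the symmetric pruning of S is omitted:

```python
def more_general(h1, h2):
    """True when h1 is more general than or equal to h2."""
    # '0' in h2 means h2 rejects everything at that attribute.
    return all(a == '?' or a == b or b == '0' for a, b in zip(h1, h2))

def min_generalize(s, x):
    """The (unique) minimal generalization of s that covers instance x."""
    return tuple(xv if sv == '0' else (sv if sv == xv else '?')
                 for sv, xv in zip(s, x))

def min_specialize(g, x, domains):
    """All minimal specializations of g that exclude instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, gv in enumerate(g) if gv == '?'
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    S, G = {('0',) * n}, {('?',) * n}
    for x, c in examples:
        if c:  # positive example: prune G, generalize S
            G = {g for g in G if h_classifies(g, x)}
            S = {h for s in S
                 for h in ([s] if h_classifies(s, x) else [min_generalize(s, x)])
                 if any(more_general(g, h) for g in G)}
        else:  # negative example: prune S, specialize G
            S = {s for s in S if not h_classifies(s, x)}
            G = {h for g in G
                 for h in ([g] if not h_classifies(g, x)
                           else min_specialize(g, x, domains))
                 if any(more_general(h, s) for s in S)}
            # keep only the maximally general members of G
            G = {g for g in G
                 if not any(g != g2 and more_general(g2, g) for g2 in G)}
    return S, G
```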

4. Candidate elimination algorithm with an example

  • Here are the training examples D: the same four EnjoySport examples listed in Section 1.
  • The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the set of all hypotheses in H.
  • It initializes the G boundary set to contain the most general hypothesis in H: G0 = <?, ?, ?, ?, ?, ?>
  • It initializes the S boundary set to contain the most specific (least general) hypothesis: S0 = <0, 0, 0, 0, 0, 0>
  • First training example: it is a positive example. When it is presented to the CANDIDATE-ELIMINATION algorithm, the algorithm checks the S boundary and finds that it is overly specific: it fails to cover the positive example. The boundary is therefore revised by moving it to the least general hypothesis that covers the new example, giving S1 = {<Sunny, Warm, Normal, Strong, Warm, Same>}.
  • No update of the G boundary is needed in response to this training example because G0 correctly covers this example
  • When the second training example is presented, it has a similar effect, generalizing S further to S2 = {<Sunny, Warm, ?, Strong, Warm, Same>} and again leaving G unchanged, i.e., G2 = G1 = G0.
  • Now the third training example: it is a negative example. When it is presented, it reveals that the G boundary of the version space is overly general; that is, the hypothesis in G incorrectly predicts that this new example is positive. The hypothesis in the G boundary must therefore be specialized until it correctly classifies the new negative example, giving G3 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}.
  • Given that there are six attributes that could be specified to specialize G2, why are there only three new hypotheses in G3?
    For example, the hypothesis h = <?, ?, Normal, ?, ?, ?> is a minimal specialization of G2 that correctly labels the new example as negative, but it is not included in G3.
    The reason is that this hypothesis is inconsistent with the previously encountered positive examples: it rejects the second positive example, whose Humidity value is High.
  • Now the fourth training example: it is a positive example. When it is presented, it further generalizes the S boundary to S4 = {<Sunny, Warm, ?, Strong, ?, ?>}. It also removes one member of the G boundary (<?, ?, ?, ?, ?, Same>), because that member fails to cover the new positive example, leaving G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}.
  • After processing these four examples, the boundary sets S4 and G4 delimit the version space of all hypotheses consistent with the incrementally observed training examples: the entire version space is the set of hypotheses bounded by S4 and G4, namely the six hypotheses listed earlier. A quick run of the Python sketch below reproduces these boundaries.
  • This learned version space is independent of the sequence in which the training examples are presented (because in the end it contains all hypotheses consistent with the set of examples).
  • As further training data is encountered, the S and G boundaries will move monotonically closer to each other, delimiting a smaller and smaller version space of candidate hypotheses.
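
Running the sketch from Section 3 on the four training examples reproduces exactly these boundaries (assuming D and domains from the earlier snippets):

```python
S4, G4 = candidate_elimination(D, domains)
print(S4)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G4)  # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
```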

5. Remarks on Version Space and Candidate elimination algorithm

Will the Candidate Elimination Algorithm Converge to the Correct Hypothesis?

The version space learned by the Candidate elimination algorithm will converge toward the hypothesis that correctly describes the target concept, provided

(1) there are no errors in the training examples, and

(2) there is some hypothesis in H that correctly describes the target concept.

What will happen if the training data contains errors?

Suppose, for example, that the second training example above is incorrectly presented as a negative example instead of a positive example.

Let's run the candidate elimination algorithm on this data and see the result.

After processing all the training examples, the algorithm removes the correct target concept from the version space.
The S and G boundary sets eventually converge to an empty version space if sufficient additional training data is available. Such an empty version space indicates that there is no hypothesis in H consistent with all observed training examples.
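
To see this concretely, we can rerun the earlier sketch with the second example mislabeled (again assuming D, domains, and candidate_elimination from the snippets above). With this particular data the boundaries already empty out after the four examples alone:

```python
D_noisy = list(D)
D_noisy[1] = (D_noisy[1][0], False)  # example 2 mislabeled as negative
S_bad, G_bad = candidate_elimination(D_noisy, domains)
print(S_bad, G_bad)  # set() set(): no conjunctive hypothesis fits all four labels
```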

A similar symptom will appear when the training examples are correct but the target concept cannot be described by the hypothesis representation; for example, if the target concept is a disjunction of attribute values and the hypothesis space supports only conjunctive descriptions.

References —

  1. Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997.
