# Prerequisites before starting Mathematics for Data Science

Machine learning models rely on vector calculus, probability, and optimisation, much as a delicious meal relies on its ingredients. Applied machine learning is essentially about combining these mathematical ingredients in clever ways to create useful (delicious?) models.

# SETS

Sets are one of the most fundamental concepts in mathematics. They are so fundamental that they are not defined in terms of anything else. On the contrary, other branches of mathematics are defined in terms of sets, including linear algebra. Put simply, sets are well-defined collections of objects. Such objects are called elements or members of the set. The players of a cricket team, the student marks in a class, and the IPL teams are all examples of sets. The captain of the cricket team, the first student's mark in the marks list, and the Mumbai Indians team are all examples of “members” or “elements” of their corresponding sets. We denote a set with an upper-case italic letter, such as A. In the context of linear algebra, we say that a line is a set of points, and the set of all lines in the plane is a set of sets. Similarly, we can say that vectors are sets of points, and matrices sets of vectors.

# BELONGING AND INCLUSION

We build sets using the notion of belonging. We denote that a belongs to A (or is an element or member of A) with the Greek letter epsilon as:

a ∈ A

Another important idea is inclusion, which allows us to build subsets. Consider sets A and B. When every element of A is an element of B, we say that A is a subset of B, or that B includes A. The notation is:

A ⊂ B or B ⊃ A

Belonging and inclusion are derived from the axiom of extension: two sets are equal if and only if they have the same elements. This axiom may sound trivially obvious, but it is necessary to make belonging and inclusion rigorous.
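
To make belonging, inclusion, and extension concrete, here is a minimal sketch using Python's built-in `set` type. The team names are illustrative data I chose for the example. Note that Python's `<=` tests "every element of A is an element of B", which matches the definition of ⊂ used above.

```python
# Illustrative data (an assumption for this sketch): two small sets of teams.
A = {"Mumbai Indians", "Chennai Super Kings"}
B = {"Mumbai Indians", "Chennai Super Kings", "Gujarat Titans"}

# Belonging: a ∈ A
is_member = "Mumbai Indians" in A

# Inclusion: A ⊂ B means every element of A is also an element of B.
a_subset_of_b = A <= B
b_includes_a = B >= A

# Extension: two sets are equal exactly when they have the same elements,
# regardless of the order or repetition used when writing them down.
same_set = {1, 2, 3} == {3, 2, 1, 1}

print(is_member, a_subset_of_b, b_includes_a, same_set)
```

All four values come out true here, while `B <= A` is false because "Gujarat Titans" is not in A.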

# SET SPECIFICATION

In general, anything we assert about the elements of a set results in generating a subset. In other words, asserting things about sets is a way to manufacture subsets. Take as an example the set of all dogs, which I’ll denote as D. I can now assert “d is black”. Such an assertion is true for some members of the set of all dogs and false for others. Hence, such a sentence, evaluated for all members of D, generates a subset: the set of all black dogs. This is denoted as:

B={ d ∈ D : d is black }

or

B={ d ∈ D | d is black }

The colon (:) or vertical bar (|) reads as “such that”. Therefore, we can read the above expression as: all elements d in D such that d is black. And that’s how we obtain the set B from D.

Set generation, as defined before, depends on the axiom of specification: to every set A and to every condition S(x) there corresponds a set B whose elements are exactly those elements x∈A for which S(x) holds.

A condition S(x) is any sentence or assertion about elements of A. Valid sentences are either of belonging or equality. When we combine belonging and equality assertions with logic operators (not, if, and, or, etc.), we can build any legal set.
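
Set specification maps closely onto a Python set comprehension, so a small sketch may help. Everything here is an assumption made for illustration: the dogs' names, the `(name, colour)` representation, and the helper `is_black`, which plays the role of the condition S(d).

```python
# Illustrative data (an assumption): each dog is a (name, colour) tuple.
D = {("Rex", "black"), ("Bella", "brown"), ("Max", "black"), ("Luna", "white")}


def is_black(d):
    """Condition S(d): true when the dog's colour is black."""
    _, colour = d
    return colour == "black"


# B = { d ∈ D : d is black }
B = {d for d in D if is_black(d)}

# Conditions can be combined with logic operators, here "not".
not_black = {d for d in D if not is_black(d)}

print(sorted(B))
print(sorted(not_black))
```

Because B is built only from elements of D, it is guaranteed to be a subset of D, which is exactly what the axiom of specification promises.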

# ORDERED PAIRS

Pairs of sets come in two flavors: unordered and ordered. We care about pairs of sets because we need them to define relations and functions (from here on I’ll denote sets with lower-case letters for convenience, but keep in mind we’re still talking about sets).

Consider a pair of sets x and y. An unordered pair is a set whose elements are x and y, written {x,y}, and {x,y}={y,x}. Presentation order therefore does not matter: the set is the same.

In machine learning, we usually do care about presentation order. For this, we need to define an ordered pair (I’ll introduce this at an intuitive level, to avoid introducing too many new concepts). An ordered pair is denoted as (x,y), with x as the first coordinate and y as the second coordinate. An ordered pair has the property that (x,y)≠(y,x) whenever x≠y.
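
The difference between the two kinds of pair can be sketched in Python, where a `frozenset` ignores presentation order and a `tuple` preserves it. The integers 1 and 2 are stand-ins for the sets x and y, chosen only to keep the example short.

```python
# Stand-in values for x and y (an assumption for this sketch).
x, y = 1, 2

# Unordered pair {x, y}: a frozenset ignores presentation order.
unordered_equal = frozenset([x, y]) == frozenset([y, x])

# Ordered pair (x, y): a tuple distinguishes first and second coordinates.
ordered_equal = (x, y) == (y, x)

# The coordinates of an ordered pair can be read back by position.
first, second = (x, y)

print(unordered_equal, ordered_equal, first, second)
```

Here `unordered_equal` is true and `ordered_equal` is false, because x and y differ.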

# RELATIONS

From ordered pairs, we can derive the idea of relations among sets or between elements and sets. Relations can be binary, ternary or N-ary. Here we are just concerned with binary relations. In set theory, relations are defined as sets of ordered pairs, and denoted as R. Hence, when (x,y)∈R, we can express the relation between x and y as:

x R y

Further, for any z∈R, there exist x and y such that z=(x,y).

From the definition of R, we can obtain the notions of domain and range. The domain is a set defined as:

dom R={x: for some y (x R y)}

This reads as: the set formed by the values of x such that, for at least one y, x has a relation with y.

The range is a set defined as:

ran R={y: for some x (x R y)}

This reads as: the set formed by the values of y such that, for at least one x, x has a relation with y.
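
As a rough illustration of these definitions, a relation can be stored as a Python set of tuples, with the domain and range computed by set comprehensions. The relation "x is less than y" on {1, 2, 3} and the helper names `domain` and `rng` are my own choices for this sketch.

```python
# Illustrative relation (an assumption): "x is less than y" on {1, 2, 3},
# stored as a set of ordered pairs.
R = {(1, 2), (1, 3), (2, 3)}


def domain(relation):
    """dom R = {x : for some y, x R y}"""
    return {x for (x, _) in relation}


def rng(relation):
    """ran R = {y : for some x, x R y}"""
    return {y for (_, y) in relation}


dom_R = domain(R)  # {1, 2}
ran_R = rng(R)     # {2, 3}

# "x R y" is simply membership of the pair (x, y) in R.
one_related_to_two = (1, 2) in R
two_related_to_one = (2, 1) in R

print(dom_R, ran_R, one_related_to_two, two_related_to_one)
```

Note that 3 is missing from the domain because nothing in the set is greater than 3, and 1 is missing from the range because nothing is less than 1.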

# FUNCTIONS

Consider a pair of sets X and Y. We say that a function f from X to Y is a relation such that:

dom f = X and

for each x ∈ X there is a unique element y ∈ Y with (x,y) ∈ f.

A function “transforms” or “maps” or “sends” x to y, and for each “argument” x there is a unique value y that f “assumes” or “takes”.

We typically denote a function or transformation or mapping from X to Y as:

f : X→Y

or

f(x) = y

The simplest way to see the effect of this definition of a function is with a chart. In Fig. 1, the left pane shows a valid function, i.e., each value of x maps to exactly one value of y. The right pane is not a function, since some values of x map to multiple values of y. This is also called the vertical line test.
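
The two conditions in the definition of a function can also be checked mechanically. The sketch below is one possible way to do it, assuming a relation is stored as a set of `(x, y)` tuples; `is_function` is a hypothetical helper written for this example, and the sets X, Y and relations f, g, h are illustrative data.

```python
def is_function(relation, X, Y):
    """Return True if `relation` (a set of (x, y) pairs) is a function from X to Y."""
    # Every pair must be drawn from X and Y.
    if any(x not in X or y not in Y for x, y in relation):
        return False
    # Condition 1: dom f = X.
    if {x for x, _ in relation} != X:
        return False
    # Condition 2: each x is paired with exactly one y.
    image = {}
    for x, y in relation:
        if x in image and image[x] != y:
            return False
        image[x] = y
    return True


# Illustrative data (an assumption for this sketch).
X = {1, 2, 3}
Y = {"a", "b"}

f = {(1, "a"), (2, "a"), (3, "b")}            # each x has exactly one y
g = {(1, "a"), (1, "b"), (2, "a"), (3, "b")}  # 1 maps to two values of y
h = {(1, "a"), (2, "b")}                      # 3 has no value, so dom h != X

print(is_function(f, X, Y), is_function(g, X, Y), is_function(h, X, Y))
```

The relation g fails for the same reason the right pane fails the vertical line test: one x is paired with more than one y. Two different values of x sharing the same y, as in f, is perfectly fine.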