Published in Geek Culture

Prerequisites before starting Mathematics for Data Science

Machine learning models need vector calculus, probability, and optimisation, much as a delicious meal needs its ingredients. Applied machine learning is essentially about combining these mathematical ingredients in clever ways to create useful (delicious?) models.


Sets are one of the most fundamental concepts in mathematics. They are so fundamental that they are not defined in terms of anything else. On the contrary, other branches of mathematics, including linear algebra, are defined in terms of sets. Put simply, sets are well-defined collections of objects. Such objects are called elements or members of the set. The players of a cricket team, the students' marks in a class, and the IPL teams are all examples of sets. The captain of the cricket team, the first mark in the marks list, and the Mumbai Indians team are examples of “members” or “elements” of their corresponding sets. We denote a set with an upper-case italic letter, such as A. In the context of linear algebra, we say that a line is a set of points, and the set of all lines in the plane is a set of sets. Similarly, we can say that vectors are sets of points, and matrices are sets of vectors.


We build sets using the notion of belonging. We denote that a belongs to A (or is an element or member of A) with the symbol ∈, derived from the Greek letter epsilon:

a ∈ A

Another important idea is inclusion, which allows us to build subsets. Consider sets A and B. When every element of A is an element of B, we say that A is a subset of B, or that B includes A. The notation is:

A ⊂ B or B ⊃ A

Belonging and inclusion are derived from the axiom of extension: two sets are equal if and only if they have the same elements. This axiom may sound trivially obvious, but it is necessary to make belonging and inclusion rigorous.
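As a minimal sketch, Python's built-in set type mirrors these three ideas: `in` plays the role of ∈, `<=` tests inclusion, and `==` behaves like the axiom of extension. The element values below are arbitrary and chosen only for illustration.

```python
# Illustrative sets; the element values are arbitrary assumptions.
A = {1, 2, 3}
B = {1, 2, 3, 4}

# Belonging: a ∈ A
is_member = 2 in A            # True

# Inclusion: every element of A is also an element of B
is_subset = A <= B            # True
is_superset = B >= A          # True

# Extension: sets with the same elements are equal,
# no matter the order or repetition used when writing them down
same = {3, 2, 1, 1} == A      # True
```

Note that Python's `<` tests for a proper subset (inclusion plus inequality), so `<=` is the closer match to the inclusion defined here.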


In general, anything we assert about the elements of a set results in generating a subset. In other words, asserting things about the elements of a set is a way to manufacture subsets. Take as an example the set of all dogs, which I’ll denote as D. I can now assert “d is black”. Such an assertion is true for some members of the set of all dogs and false for others. Hence, such a sentence, evaluated for all members of D, generates a subset: the set of all black dogs. This is denoted as:

B={ d ∈ D : d is black }

or

B={ d ∈ D | d is black }

The colon (:) or vertical bar (|) reads as “such that”. Therefore, we can read the above expression as: all elements d in D such that d is black. And that’s how we obtain the set B from D.

Set generation, as defined before, depends on the axiom of specification: to every set A and to every condition S(x) there corresponds a set B whose elements are exactly those elements x∈A for which S(x) holds.

A condition S(x) is any sentence or assertion about elements of A. Valid sentences are either of belonging or of equality. When we combine belonging and equality assertions with logic operators (not, if, and, or, etc.), we can build any legal set.
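The set-builder notation above maps closely onto a Python set comprehension. The following is a small sketch, assuming a made-up set of dogs represented as (name, colour) tuples; the names, colours, and the helper `is_black` are hypothetical.

```python
# Hypothetical data: each dog is a (name, colour) tuple.
D = {("Rex", "black"), ("Bella", "brown"), ("Max", "black"), ("Luna", "white")}

def is_black(d):
    """The condition S(d): 'd is black'."""
    return d[1] == "black"

# B = { d ∈ D : d is black }
B = {d for d in D if is_black(d)}

# By construction, the specified set is a subset of the original set
B_is_subset_of_D = B <= D
```

Conditions can be combined with `not`, `and`, and `or` inside the comprehension, in the same spirit as the logic operators mentioned above.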


Pairs of sets come in two flavors: unordered and ordered. We care about pairs of sets as we need them to define a notion of relations and functions (from here I’ll denote sets with lower-case for convenience, but keep in mind we’re still talking about sets).

Consider a pair of sets x and y. An unordered pair is a set whose elements are x and y, written {x,y}, and {x,y}={y,x}. Therefore, presentation order does not matter: the set is the same.

In machine learning, we usually do care about presentation order. For this, we need to define an ordered pair (I’ll introduce this at an intuitive level, to avoid introducing too many new concepts). An ordered pair is denoted as (x,y), with x as the first coordinate and y as the second coordinate. An ordered pair has the property that (x,y)≠(y,x) whenever x≠y.
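One way to see the difference in code, as a rough sketch: a Python `frozenset` behaves like an unordered pair, while a `tuple` behaves like an ordered pair. The strings standing in for x and y are placeholders, not sets in the strict sense used above.

```python
# Placeholder values standing in for x and y
x, y = "a", "b"

# Unordered pair: a set, so presentation order is irrelevant
unordered_equal = frozenset([x, y]) == frozenset([y, x])   # True

# Ordered pair: a tuple, so swapping coordinates gives a different pair
ordered_equal = (x, y) == (y, x)                           # False

# The coordinates of an ordered pair can be recovered by position
first, second = (x, y)
```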


From ordered pairs, we can derive the idea of relations among sets or between elements and sets. Relations can be binary, ternary, or N-ary. Here we are concerned only with binary relations. In set theory, a relation is defined as a set of ordered pairs, denoted as R. Hence, we can express the relation between x and y as:

x R y

Further, for any z∈R, there exist x and y such that z=(x,y).

From the definition of R, we can obtain the notions of domain and range. The domain is a set defined as:

dom R={x: for some y (x R y)}

This reads as: the set of all x such that, for at least one y, x is related to y.

The range is a set defined as:

ran R={y: for some x (x R y)}

This reads as: the set of all y such that, for at least one x, x is related to y.
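To make the definitions concrete, here is a short sketch that stores a relation as a Python set of tuples and computes its domain and range. The pairs in `R` and the helper `related` are invented for illustration.

```python
# A hypothetical binary relation R, stored as a set of ordered pairs (x, y)
R = {(1, "a"), (1, "b"), (2, "a"), (3, "c")}

def related(x, y, relation):
    """x R y holds exactly when the pair (x, y) belongs to the relation."""
    return (x, y) in relation

# dom R = {x : for some y (x R y)}
dom_R = {x for (x, _) in R}

# ran R = {y : for some x (x R y)}
ran_R = {y for (_, y) in R}
```

Here 1 is related to both "a" and "b", which is allowed for a relation in general.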


Consider a pair of sets X and Y. We say that a function f from X to Y is a relation such that:

dom f = X and

for each x ∈ X there is a unique element y ∈ Y with (x,y) ∈ f.

A function “transforms” or “maps” or “sends” x to y, and for each “argument” x there is a unique value y that f “assumes” or “takes”.

We typically denote a relation or function or transformation or mapping from X to Y as:

f : X→Y

or

f(x) = y

A simple way to see the effect of this definition of a function is with a chart. In Fig. 1, the left pane shows a valid function, i.e., each value of x maps to exactly one value of y. The right pane is not a function, since some values of x map to multiple values of y. This check is also called the vertical line test.

Fig. 1

For f:X→Y, the domain of f equals X, but the range does not necessarily equal Y. Recall that the range includes only those elements of Y that are related to at least one element of X.
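The two conditions in the definition can be checked mechanically for small finite sets. The sketch below is one possible way to do so; the helper `is_function` and the example sets are assumptions made for illustration.

```python
def is_function(relation, X, Y):
    """Return True if `relation` (a set of (x, y) pairs) is a function from X to Y."""
    # Every pair must pair an element of X with an element of Y
    if any(x not in X or y not in Y for (x, y) in relation):
        return False
    # dom f = X and uniqueness: each x in X has exactly one image
    for x in X:
        images = {y for (a, y) in relation if a == x}
        if len(images) != 1:
            return False
    return True

# Hypothetical example sets and relations
X = {1, 2, 3}
Y = {"a", "b", "c"}
f = {(1, "a"), (2, "a"), (3, "b")}             # a valid function
g = {(1, "a"), (1, "b"), (2, "c"), (3, "c")}   # 1 has two images
h = {(1, "a"), (2, "b")}                       # 3 has no image, so dom h != X

# The range of f is a subset of Y but need not equal Y
range_f = {y for (_, y) in f}
```

In this example `range_f` contains only "a" and "b", so the range of f is strictly smaller than Y even though f is a valid function from X to Y.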

The ultimate goal of machine learning is learning functions from data, i.e., transformations or mappings from the domain onto the range of a function. This may sound simplistic, but it’s true. The domain X is usually a vector (or set) of variables or features mapping onto a vector of target values. Finally, I want to emphasise that in machine learning the words transformation and mapping are used interchangeably, but both just mean function.

This is all I’ll cover about sets and functions. My goals were just to introduce: (1) the concept of a set, (2) basic set notation, (3) how sets are generated, (4) how sets allow the definition of functions, (5) the concept of a function. Set theory is a monumental field, but there is no need to learn everything about sets to understand linear algebra.

In upcoming posts, I will cover the linear algebra topics needed for data science.


