# Prerequisites before starting Mathematics for Data Science

Machine learning models need vector calculus, probability, and optimisation, like the ingredients of a delicious meal. Applied machine learning is essentially about combining these mathematical ingredients in clever ways to create useful (delicious?) models.

# SETS

Sets are one of the most fundamental concepts in mathematics. They are so fundamental that they are not defined in terms of anything else. On the contrary, other branches of mathematics are defined in terms of sets, including linear algebra. Put simply, **sets are well-defined collections of objects**. Such objects are called **elements or members** of the set. The players of a cricket team, a list of student marks in a class, and the IPL teams are all examples of sets. The captain of the cricket team, the first student in the marks list, and the Mumbai Indians team are all examples of “members” or “elements” of their corresponding sets. We denote a set with an upper case italic letter, as in **A**. In the context of linear algebra, we say that a line is a set of points, and the set of all lines in the plane is a set of sets. Similarly, we can say that *vectors* are sets of points, and *matrices* are sets of vectors.

# BELONGING AND INCLUSION

We build sets using the notion of **belonging**. We denote that *a* *belongs* to (or is an *element* or *member* of) **A** with the Greek letter epsilon as:

a ∈ A

Another important idea is **inclusion**, which allows us to build *subsets*. Consider sets **A** and **B**. When every element of **B** is an element of **A**, we say that **B** is a *subset* of **A**, or that **A** *includes* **B**. The notation is:

B ⊂ A **or** A ⊃ B

Belonging and inclusion are derived from **axiom of extension**: *two sets are equal if and only if they have the same elements*. This axiom may sound trivially obvious but is necessary to make belonging and inclusion rigorous.
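Belonging, inclusion, and the axiom of extension map directly onto Python’s built-in `set` type. Here is a minimal sketch (the element names are made up for illustration):

```python
# Belonging and inclusion with Python's built-in set type.
A = {"captain", "bowler", "keeper"}
B = {"captain", "keeper"}

print("captain" in A)  # belonging: a ∈ A → True
print(B <= A)          # inclusion: B ⊂ A (issubset) → True

# Axiom of extension: two sets are equal iff they have the same elements.
print(A == B)          # False: A has an element B lacks
```

Note that `B <= A` is Python’s subset test, equivalent to `B.issubset(A)`.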

# SET SPECIFICATION

In general, anything we assert about the elements of a set results in **generating a subset**. In other words, asserting things about sets is a way to manufacture subsets. Take as an example the set of all dogs, which I’ll denote as **D**. I can now assert “*d* is black”. Such an assertion is true for some members of the set of all dogs and false for others. Hence, such a sentence, evaluated for *all* members of **D**, generates a subset: *the set of all black dogs*. This is denoted as:

B = { d ∈ D : d is black }

or

B = { d ∈ D | d is black }

The colon (:) or vertical bar (|) reads as “such that”. Therefore, we can read the above expression as: *all elements **d** in **D** such that **d** is black*. And that’s how we obtain the set **B** from **D**.

Set generation, as defined before, depends on the **axiom of specification**: *to every set **A** and to every condition S(x) there corresponds a set **B** whose elements are exactly those elements x ∈ A for which S(x) holds.*

A condition S(x) is any *sentence* or *assertion* about elements of **A**. Valid sentences are either of *belonging* or *equality*. When we combine belonging and equality assertions with logic operators (not, if, and, or, etc.), we can build any legal set.
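The set-builder expression { d ∈ D : d is black } has a direct counterpart in a Python set comprehension. A minimal sketch, with made-up dog names encoded as (name, colour) pairs:

```python
# The set D of all dogs, each represented as a (name, colour) pair.
D = {("rex", "black"), ("fido", "brown"), ("luna", "black")}

# Axiom of specification: the condition S(d) = "d is black"
# selects the subset B = { d ∈ D : d is black }.
B = {d for d in D if d[1] == "black"}

print(B)  # contains exactly the two black dogs
```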

# ORDERED PAIRS

Pairs of sets come in two flavors: *unordered* and *ordered*. We care about pairs of sets because we need them to define the notions of relations and functions (from here I’ll denote sets with lower-case letters for convenience, but keep in mind we’re still talking about sets).

Consider a pair of sets *x* and *y*. An **unordered pair** is a set whose elements are *x* and *y*, written {x, y}, and {x, y} = {y, x}. Therefore, presentation order does not matter; the set is the same.

In machine learning, we usually do care about presentation order. For this, we need to define an **ordered pair** (I’ll introduce this at an intuitive level, to avoid introducing too many new concepts). An **ordered pair** is denoted as **(x, y)**, with *x* as the *first coordinate* and *y* as the *second coordinate*. An ordered pair has the property that **(x, y) ≠ (y, x)** whenever x ≠ y.
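Python makes the unordered/ordered distinction concrete: sets are unordered, tuples are ordered. A quick sketch:

```python
# Unordered pairs are sets: {x, y} and {y, x} are the same set.
print({1, 2} == {2, 1})  # True

# Ordered pairs are tuples: order matters, so (x, y) != (y, x) when x != y.
print((1, 2) == (2, 1))  # False
```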

# RELATIONS

From ordered pairs, we can derive the idea of **relations** among sets or between elements and sets. Relations can be binary, ternary, or N-ary. Here we are concerned only with binary relations. In set theory, **relations** are defined as *sets of ordered pairs*, and denoted as **R**. Hence, we can express the relation between **x** and **y** as:

x **R** y

Further, for any z∈R, there exist x and y such that z=(x,y).

From the definition of R, we can obtain the notions of **domain** and **range**. The **domain** is a set defined as:

**dom** R={x: for some y (x R y)}

This reads as: the set of values x such that, for at least one y, x has a relation with y.

The **range** is a set defined as:

**ran** R={y: for some x (x R y)}

This reads as: the set of values y such that, for at least one x, x has a relation with y.
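Since a binary relation is just a set of ordered pairs, its domain and range fall out of two comprehensions. A minimal sketch with an invented relation R:

```python
# A binary relation as a set of ordered pairs.
R = {(1, "a"), (1, "b"), (2, "a")}

# dom R = {x : for some y, x R y}
dom_R = {x for (x, y) in R}
# ran R = {y : for some x, x R y}
ran_R = {y for (x, y) in R}

print(sorted(dom_R))  # [1, 2]
print(sorted(ran_R))  # ['a', 'b']
```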

# FUNCTIONS

Consider a pair of sets X and Y. We say that a **function** from X to Y is a relation f such that:

**dom** f = **X**

and such that for each **x ∈ X** there is a unique element **y ∈ Y** with **(x, y) ∈ f**.

A function “*transforms*” or “*maps*” or “*sends*” x onto y, and for each “*argument*” x there is a unique value y that f “*assumes*” or “*takes*”.

We typically denote a relation or function or transformation or mapping from X onto Y as:

**f : X→Y**

or

**f(x) = y**

The simple way to see the effect of this definition of a function is with a chart. In **Fig. 1**, the left pane shows a valid function, i.e., each value of x *maps* onto exactly one value of y. The right pane is not a function, since some values of x *map* onto multiple values of y. This is also called the *vertical line test*.

For f : X→Y, the *domain* of f equals X, but the *range* does not necessarily equal Y. Just recall that the *range* includes only those elements of Y that are related to some element of X.

**The ultimate goal of machine learning is learning functions from data**, i.e., transformations or mappings from the *domain* onto the *range* of a function. This may sound simplistic, but it’s true. The *domain* X is usually a vector (or set) of *variables* or *features* mapping onto a vector of *target* values. Finally, I want to emphasise that in machine learning the words transformation and mapping are used interchangeably, but both just mean function.

This is all I’ll cover about sets and functions. My goals were just to introduce: (1) **the concept of a set**, (2) **basic set notation**, (3) **how sets are generated**, (4) **how sets allow the definition of functions**, (5) **the concept of a function**. Set theory is a monumental field, but there is no need to learn everything about sets to understand linear algebra.

In the upcoming blogs, I will cover all topics on linear algebra needed for data science.