A brief introduction to Graph theory
First part : main concepts and complete graphs
Graph theory is a branch of data science and big data. For example, it is a powerful tool for unsupervised learning since it can be used for classification, clustering, data visualization of complex networks and many more. But first things first, let’s start with main concepts and definitions.
Definitions
First of all, what is a graph ? A graph G is an ordered pair G=(V, E) where V represents a set of Vertices and E a set of Edges. In practice, we can add W as the Weights associated to each edge so that G=(V, E, W). In this article, we will consider this last definition of a graph.
Graphs can be both directed or undirected. In undirected graphs, edges go both ways but in directed graphs, edges are directed from one vertex to another.
A complete graph is a particular graph where every vertex is linked to every other vertex (see illustration below). A complete graph is undirected.
Illustrations
Let’s illustrate this concept of graphs with a railway map :
- each train station represents a vertex
- each path from a train station to another is an edge
- the cost of the ticket (associated to each path) to go from one train station to another is the weight
The weights can represent many things. In our previous example the weight is the cost of the ticket but it could also be the distance between each train station or any other indicator.
Graphs in practice
For simplification purposes, we will now only consider complete graphs. But keep in mind that there are many types of graphs.
In practice, graphs are often represented as a Cartesian coordinate plane :
- Vertices are points of the plan with (x,y) coordinates
- Edges are vectors of the plan
- Edges’ weights are euclidean distance
Example : undirected complete graph of four vertices
This complete graph “G” has 4 vertices and 6 edges. From left to right, the vertices’ coordinates are A (0,0), B (2,2), C (2,5), D (4,0).
If we consider that the weights associated to each edge is the euclidean distance (rounded to plain integer), we can now fully represent this graph as :
G = ((A,B,2), (A,C,5), (A,D,4), (B,C,3), (B,D,2), (C,D,5))
Each element of the list above is an edge. You can read its first element as :
“There is an edge (both ways) between vertices A and B and the associated weight to this edge is 2.”
Note : in the above diagram, the x-axis and the y-axis do not have the same scales. That’s why the distance between A and D looks bigger than the distance between A and C (or C and D) but in fact is not.
Now that you got the main concepts about graphs we will see how to implement and visualize them in python in my next articles.
About me : french student in data sciences, always up to new challenges. Feel free to add me on LinkedIn, Twitter or Medium.