Derivation of mathematical definition of convexity

Sarv Parteek Singh
4 min readMay 13, 2020

The formal definition of a convex function seems discombobulating at a first glance, and the intuition behind it is often lost behind the formalism in most standard texts. Here, we will work our way towards the formal definition of a convex function starting with the basic definition involving no mathematical notation.

Basic definition
“A real-valued function defined on an n-dimensional interval is called convex (or convex downward or concave upward) if the line segment between any two points on the graph of the function lies above or on the graph” [1].
Let us examine this definition for a function f(x) whose domain is the set of all real numbers (x ∈ ℝ). The definition implies that the green line segment in Fig 1 is always above or on the black curve representing the function f(x) between the end points of the green line segment.

Fig 1: Classical representation of a convex function. For all points x* between x₁ and x₂, the black curve lies on or below the green line segment.

Mathematically, if we can prove that any point x* between x₁ and x₂ (x* ∈ [x₁, x₂]) will always result in f(x*) h(x*), where h(x) defines the line joining f(x₁) and f(x₂), then we have proved that the function is convex.

Interlude: Describing points on a line
Before we can proceed with the mathematical representation of the definition of convexity, we need to understand how to describe points on a line.

Fig 2: Description of an intermediate point between 2 points on the real axis

Let us consider two points A and B on the real axis (Fig 2). Let their distance from the origin O be x₁ and x₂ respectively. Now, consider a point C in between A and B, which divides the line segment AB in the ratio 1-λ: λ, λ ∈ [0,1]. Let the length of AB be l.
Now,
x* = x₁+ (1-λ)l
⇒ x* = x₁+ (1-λ)(x₂-x₁)
⇒ x*
= λx₁+ (1-λ)x₂ … (1)

Back to convexity
Now that we are equipped with the ability to describe position of points on a line segment in terms of the positions of its end points, let us get to Fig 3, which is an annotated version of Fig 1.

Fig 3: Classical convex function. For convexity, value at S ⩽ value at R

Let us assume that point C divides the line segment AB in the ratio 1-λ: λ for λ ∈ [0,1] . Thus, eq. (1) holds here.
Now, recall that to arrive at a generic mathematical definition of convex functions, we want to use f(x*) h(x*) . To do this, we need to find the values of these terms.

f(x*): If we extend a vertical line upwards from C, it intersects the function f(x) at S and line segment PQ at R. So, the value of f(x) at S is f(x*). As x* is given by (1), we get
f(x*) = f( λx₁+ (1-λ)x₂) …(2)

h(x*): The point R divides the line segment PQ in the same ratio as point C divides the line segment AB. In order to see this, let us draw a line segment PS through P, parallel and equal to AB (Fig 4). Let CR intersect PS at point T. Then, T divides PS in the ratio 1-λ: λ.

Fig 4: CR divides AB and PQ in the ratio 1-λ:λ

In triangles SPQ and TPR,
∠SPQ =∠TPR
∠QSP =∠RTP = π/2 rad
Therefore, by AA similarity, ΔSPQ ~ ΔTPR.
As ratios of sides of similar triangles are equal, we get

(AC = PT and CB = TS, as ACTP and CBST are rectangles)
Thus, R divides PQ in the same ratio as C divides AB i.e. 1-λ: λ.
Using (1),
h(x*) = λ f(x₁) + (1-λ) f(x₂) …(3)

Using (2), (3) and f(x*) h(x*), the definition of convexity turns into
f( λx₁+ (1-λ)x₂) λ f(x₁) + (1-λ) f(x₂)

For higher dimensions, this definition is very similar, and is formally given as [1]:
Let X be a convex set in a vector space and let f: X →ℝ be a function.
f is called convex if:
∀x₁, x₂ ∈ X, ∀ λ ∈ [0, 1]: f(λ x₁ + (1-λ) x₂) ⩽ λ f(x₁) + (1-λ)f(x₂)
f
is called strictly convex if:
∀x₁ ≠ x₂ ∈ X, ∀ λ ∈ [0, 1]: f(λ x₁ + (1-λ )x₂) <λ f(x₁) + (1-λ)f(x₂)
A function f is said to be (strictly) concave if -f is (strictly) convex.

References
[1] https://en.wikipedia.org/wiki/Convex_function

--

--