In Mathematics Words Matter

Jan 9, 2023

T: And how did you get involved in education research? What were the reasons in the first place? Because as I can see you are way too busy with other scientific work.
H: When you have four kids that have trouble learning physics you wonder about what the problem is, and then you see that most students have similar difficulties. Well, I got seriously interested in investigating the problem of learning physics in 1976. I was a full professor at that time and I had finished my decade of teaching all graduate courses exclusively, so I started my turn to teach introductory physics courses. And I began lengthy talks about it with an experienced colleague — actually the chair who hired me in the first place. He was one of the most dedicated teachers I have ever known.
T: And his name?
H: His name was Richard Stoner, and his office was right next to mine. He was so excited about teaching that he would regale me about it almost daily. He showed me all his examinations and data on student performance. This data was unique, because he believed that there was too much emphasis on quantitative problem solving in the usual physics course. So he designed very interesting qualitative questions for students to answer with qualitative arguments. He was frustrated because he could not write an examination on which the class average was better than 40%. So he would come to me and talk about the mistakes students made. I got very curious. I thought there must be something systematic going on.

The above is from an interview with mathematical physicist David Hestenes in the Eurasia Journal of Mathematics, Science, and Technology Education, a very interesting interview.

Professor Hestenes eventually took on two grad students in physics education, one a very seasoned high school teacher with a substantial amount of very high-quality data. The result was their formulation of their Model Theory of Cognition, based on their research and modern cognitive science. Their results are very much consistent with other theories, such as that put forth by cognitive linguist George Lakoff and cognitive scientist Rafael Núñez in their Where Mathematics Comes From: How the Embodied Mind Brings Mathematics Into Being.

At any rate, we navigate and learn about our world by constructing mental models of that world, and we have evolved the ability to represent those mental models with symbolic language, transforming them into conceptual models, which enables us to communicate unambiguously between minds. But sometimes, even in Professor Hestenes’ work, I find extremely poor word choices and symbolic errors, or what I would term errors, anyway. Perhaps this is just a manifestation of Gilbert Simondon’s Process of Individuation, his problematic, where potentials wanting to be actualized exist in tension; or perhaps it is a manifestation of Open-Ended Intelligence, which depends intimately on Simondon’s philosophy.

Two examples I will briefly explore: the term “differential” as used in Multivariable Calculus is inconsistent with Scalar Calculus, a special case of Multivariable Calculus; the term “disc” from topology becomes problematic under attempts to import it to Analytic Geometry and Calculus.

In standard textbooks on Multivariable Calculus, they define the Differential by (we’re working with vectors here):

Definition 3.4 (Differential) Let f : U ⊆ R^n → R^m, U open. Fix x ∈ U. Suppose that there is a linear transformation (f_x)’, also R^n → R^m, such that
f(x + h) = f(x) + (f_x)’(h) + r(h) (3.1)
where
lim_(h→0) r(h)/|h| = 0. (3.2)
Then (f_x)’ is called the differential of f at x. We say that f is differentiable at x.
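To make Definition 3.4 concrete, here is a minimal numerical sketch. The map f(x, y) = (xy, x + y²), the base point (1, 2), and the helper names are my own illustrative choices, not anything from the textbook; the linear transformation is the hand-computed Jacobian applied to h. The sketch only checks that |r(h)|/|h| shrinks as h shrinks, as (3.2) requires.

```python
import math

def f(x, y):
    """Illustrative map R^2 -> R^2 (an assumption, chosen for simplicity)."""
    return (x * y, x + y * y)

def differential(x, y, h):
    """Apply the linear map (f_x)' at (x, y) to h = (h1, h2).

    The matrix is the hand-computed Jacobian [[y, x], [1, 2y]].
    """
    h1, h2 = h
    return (y * h1 + x * h2, h1 + 2.0 * y * h2)

def remainder_ratio(x, y, h):
    """Return |r(h)| / |h| where r(h) = f(x + h) - f(x) - (f_x)'(h)."""
    base = f(x, y)
    moved = f(x + h[0], y + h[1])
    linear = differential(x, y, h)
    r = (moved[0] - base[0] - linear[0], moved[1] - base[1] - linear[1])
    return math.hypot(*r) / math.hypot(*h)

# Shrink h along the diagonal direction (t, t) and watch the ratio fall.
steps = [10.0 ** -k for k in range(1, 6)]
ratios = [remainder_ratio(1.0, 2.0, (t, t)) for t in steps]
for t, ratio in zip(steps, ratios):
    print(f"|h| ~ {t:.0e}: |r(h)|/|h| = {ratio:.3e}")
```

For this particular f the remainder works out by hand to r(h) = (h1·h2, h2²), so the ratio is of the same size as h itself, which is why it goes to zero.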

Okay, now apply this to scalar calculus, scalar calculus being a special case of vector calculus. We can define a linear transformation, R → R, by:

(f_x)’(h) = f′(x)h;

= f′(x)dx;

= dy.

And it’s really simple to show that:

lim_(h → 0) r(h)/h = lim_(h → 0) (f(x + h) − f(x) − f′(x)h)/h;

= lim_(h → 0) (f(x + h) − f(x))/h − lim_(h → 0) (f′(x)h)/h;

= f′(x) − f′(x);

= 0;

as it should.
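The same limit can be checked numerically. This is only a sketch: the choice f(x) = sin x, the point x = 1, and the function names are assumptions made for illustration.

```python
import math

def f(x):
    """Sample differentiable function (an illustrative assumption)."""
    return math.sin(x)

def f_prime(x):
    """Its derivative, supplied by hand."""
    return math.cos(x)

def remainder_ratio(x, h):
    """Return r(h)/h with r(h) = f(x + h) - f(x) - f'(x) * h."""
    return (f(x + h) - f(x) - f_prime(x) * h) / h

x0 = 1.0
ratios = [abs(remainder_ratio(x0, 10.0 ** -k)) for k in range(1, 6)]
for k, ratio in zip(range(1, 6), ratios):
    print(f"h = 1e-{k}: |r(h)/h| = {ratio:.3e}")
```

Each tenfold reduction in h reduces the ratio by roughly a factor of ten, consistent with r(h) being of higher order than h.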

But this tells us quite clearly that what is called the “differential” should actually be called the “differential generator.” Because it is clearly NOT the differential. In calculus the differential is a geometric object.

Consider the derivative, defined by

f′(x) = dy/dx.

In this simple equality, dy stands for “differential in y” and dx for “differential in x.” It is a straightforward process to conceptually show that this is a standard fraction which describes the slope of the line tangent to f(x) at a point (x,y). Since this is a standard fraction, dy = f′(x)dx and this relates the differential to the integral, the definite integral simply telling us that the area under a curve f′(x) is the sum of all of the dy’s. That’s the Fundamental Theorem of Calculus.

These concepts go all of the way back to Gottfried Leibniz and include the idea of an infinitesimal. Infinitesimals were not put on a solid foundation until the 1960s, when Abraham Robinson established his non-standard analysis, but it is best to use infinitesimals to understand what is happening here. In his 1940 classic, Differential and Integral Calculus, Ross Middlemiss inserts Chapter 13, The Differential, in between his unit on differentiation and his unit on integration; he utilizes the concept of the infinitesimal. He defines an infinitesimal as any variable quantity which is approaching zero as a limit. He then discusses the relative order of infinitesimals. Let α and β be infinitesimals, then:

If lim α/β = 0, α is of higher order than β;

If lim α/β = k ≠ 0, α is of the same order as β;

If lim α/β = ∞, α is of lower order than β.
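A small numerical sketch of the three cases, taking β = h and letting h shrink toward zero. The particular choices α = h², α = 3h, and α = √h are my own examples, not Middlemiss's.

```python
import math

def ratios(alpha, beta, hs):
    """Evaluate alpha(h)/beta(h) along a sequence of shrinking h values."""
    return [alpha(h) / beta(h) for h in hs]

hs = [10.0 ** -k for k in range(1, 7)]

def beta(h):
    """Reference infinitesimal."""
    return h

higher = ratios(lambda h: h * h, beta, hs)         # ratio tends to 0
same = ratios(lambda h: 3.0 * h, beta, hs)         # ratio tends to k = 3
lower = ratios(lambda h: math.sqrt(h), beta, hs)   # ratio grows without bound

for h, a, b, c in zip(hs, higher, same, lower):
    print(f"h = {h:.0e}: h^2/h = {a:.1e}, 3h/h = {b:.1f}, sqrt(h)/h = {c:.1e}")
```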

These apply to the differential of a function because inherent to the definition of the derivative are the infinitesimals, △x and △y; in the definition of the derivative

f′(x) = lim_(h → 0) (f(x + h) − f(x))/h

both are approaching zero as a limit. Here, h = △x.

To be completely clear, let y = f(x) be differentiable on (a,b) and let P = (x,y), Q = (x + △x,y + △y) be two points on the curve within (a,b). Let f′(x) be the slope of the tangent line to the curve at the point P, let L = (x + △x,y), and let T be the point where that tangent line intersects the line LQ. Middlemiss calls △x and △y increments, and these can have a principal component and a secondary component. For instance, in the above situation, △x = PL and △y = LQ; △x has only the principal component, but △y has the principal component LT and the secondary component TQ. As infinitesimals, the secondary component is of higher order than the principal component and hence becomes negligible when the increment of the independent variable, △x most generally, is sufficiently small, i.e. sufficiently close to its zero limit. Furthermore, the principal component of the increment of the independent variable, △x most generally, is the entire increment, provided the variable is defined directly and not parametrically. We see this in the case above.

So, to reiterate, in the above example, △x = PL, △y = LQ, and we let dx (the differential in x) and dy (the differential in y) indicate the principal components of these increments; these principal components are the differentials of the function y = f(x), in x and y respectively, at the point (x,y) on the curve. So dx = △x, since it has no secondary component, and dy = LT (notice the difference between dy and △y). But since f′(x) describes the slope at P, f′(x) = LT/PL = dy/dx, and the differential in terms of the dependent variable y is dy = f′(x)dx. In the special case of f(x) = x, since f′(x) = 1, dy = dx. Special cases matter!
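Here is a minimal sketch of the difference between △y and dy, using f(x) = x² as an illustrative function of my own choosing. The names LQ, LT, and TQ in the comments refer to the segments described above; for this f, the secondary component works out by hand to (△x)².

```python
def f(x):
    """Illustrative curve y = x^2 (an assumption for this sketch)."""
    return x * x

def f_prime(x):
    """Slope of the tangent line at x."""
    return 2.0 * x

def components(x, delta_x):
    """Split the increment in y into principal and secondary components."""
    delta_y = f(x + delta_x) - f(x)   # LQ, the full increment
    dy = f_prime(x) * delta_x         # LT, the principal component
    secondary = delta_y - dy          # TQ, the secondary component
    return delta_y, dy, secondary

for delta_x in (0.5, 0.25, 0.125):
    delta_y, dy, secondary = components(1.0, delta_x)
    print(f"dx = {delta_x}: delta_y = {delta_y}, dy = {dy}, "
          f"TQ = {secondary}, TQ/dx = {secondary / delta_x}")
```

The ratio TQ/dx shrinks along with dx, which is what it means for the secondary component to be of higher order.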

So, you see here, we are free to treat dy/dx as a standard fraction, and this enables the chain rule and the relation dy = f′(x)dx. Now we just let our f′(x) be the f(x) we are integrating, do you see? So the definite integral is just the summation of all of the dy’s, where the dy’s are taken from the x-axis to the curve, on the interval of integration! This is the Fundamental Theorem of Calculus.

Let P = {x_i}, i = 0, 1, …, n, be a regular partition of [a,b], the interval of integration, with x_0 = a and x_n = b, and let F(x) be an antiderivative of our function f(x). Then

F(b) − F(a) = F(x_n) − F(x_0);

= (F(x_n) − F(x_(n−1))) + (F(x_(n−1)) − F(x_(n−2))) + … + (F(x_1) − F(x_0));

= ∑ (F(x_i) − F(x_(i−1))), summed over i from 1 to n.

By the Mean Value Theorem, we can find a c_i in the interval [x_(i−1),x_i] such that

F(x_i) − F(x_(i−1)) = F′(c_i)(x_i − x_(i−1)) = f(c_i)△x.

But then, by substitution

F(b) − F(a) = ∑ f(c_i)△x.

Taking the limit as n → ∞

F(b) − F(a) = lim_(n → ∞) ∑ f(c_i)△x;

= ∫ f(x)dx, on the interval [a,b].

So, as you can see, the notation just tells us exactly what we are doing when we take the definite integral. This is an unambiguous conceptual model and the “differential” from Multivariable Calculus must be consistent with that of Scalar Calculus. Currently it is not.
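The summation can be sketched numerically. Two assumptions to flag: the example f(x) = cos x with antiderivative F(x) = sin x on [0, π/2] is my own, and the sample points are midpoints of the subintervals rather than the c_i supplied by the Mean Value Theorem. The c_i make the sum exactly F(b) − F(a) for every n; midpoints only approach it in the limit, which is enough to illustrate the idea.

```python
import math

def riemann_sum(f, a, b, n):
    """Sum f(c_i) * dx over a regular partition of [a, b] into n pieces.

    Midpoints stand in for the Mean Value Theorem points c_i (an assumption).
    """
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

f = math.cos   # integrand
F = math.sin   # a known antiderivative of the integrand
a, b = 0.0, math.pi / 2

exact = F(b) - F(a)
sizes = (10, 100, 1000)
errors = [abs(riemann_sum(f, a, b, n) - exact) for n in sizes]
for n, err in zip(sizes, errors):
    print(f"n = {n}: |sum - (F(b) - F(a))| = {err:.3e}")
```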

With the circle, since topological considerations have become popular, we see people trying to import terms from topology into Analytic Geometry and Calculus, the word “disc” in particular. Technically speaking, a circle is a one-dimensional manifold, and from the Wikipedia page for Dimension we have:

The uniquely defined dimension of every connected topological manifold can be calculated. A connected topological manifold is locally homeomorphic to Euclidean n-space, in which the number n is the manifold’s dimension.
For connected differentiable manifolds, the dimension is also the dimension of the tangent vector space at any point.

This leads to the idea that a circle has no area, but in Analytic Geometry and Calculus we are taught that circles have area. Some have tried to update this situation obviously without putting a great deal of thought into it. They try to say that “discs” have area, but “disc” is a term of art from topology and it does not make any sense to say, “a square disc has area” or “a triangular disc has area,” etc. This situation is easily resolved with the tiniest effort when one realizes that Analytic Geometry already has the requisite term of art: sheet! A hyperboloid of two sheets? And it makes complete sense to say “a circular sheet has area,” “a square sheet has area,” “a triangular sheet has area,” etc. So simple, but try suggesting this to mathematicians and they get all bent out of shape, like you’re the one being irrational. I find the situation annoying.

And likewise in Geometric Algebra: taking v as a vector, 1/v does not make sense because it is nowhere defined. Hence, it does not make sense to say that v^(-1) = 1/v. The reciprocal of a vector in Geometric Algebra is v^(-1) = v/v², where 1/v² is a scalar. This IS defined, but you cannot legitimately cancel vectors in that definition. It is defined because, with the geometric product, v² = v · v + v ∧ v = v · v, since v ∧ v = 0; that is, v² equals the dot product of v with itself.
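A minimal sketch of this, restricted to two-dimensional vectors. The function names and the tuple representation (scalar part, bivector coefficient) are my own assumptions for illustration, not an established library API.

```python
def geometric_product(a, b):
    """Geometric product of two 2D vectors.

    Returns (scalar part a . b, coefficient of the bivector part a ^ b).
    """
    dot = a[0] * b[0] + a[1] * b[1]
    wedge = a[0] * b[1] - a[1] * b[0]
    return dot, wedge

def inverse(v):
    """Return v / v^2, where v^2 is the scalar from the geometric product."""
    v_squared, _ = geometric_product(v, v)
    return (v[0] / v_squared, v[1] / v_squared)

v = (3.0, 4.0)
v_inv = inverse(v)
scalar, bivector = geometric_product(v, v_inv)
print("v^2 =", geometric_product(v, v))
print("v * v^(-1) =", (scalar, bivector))
```

The product of v with v/v² comes out as the scalar 1 with no bivector part (up to floating-point rounding), which is what one wants from a reciprocal, and no vector was ever divided by a vector along the way.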

Okay, rant over.

Written by Wes Hansen