In my first job out of college, Princeton University to be precise, I got to be a full-time high school teacher in Jersey City, New Jersey. I could probably claim to have taught in the “inner city”; however, Kennedy Boulevard is wide and spacious and the school was a pristine gem, run by Dominicans, so I don’t expect any medals for bravery. The students were all girls, young women (ladies) to be precise.
I started well into the school year, in October. Some of the nuns had died in a car crash, as my own father would years later. They were in desperate need of someone looking for school teacher work, and there I was, attending graduate school classes at St. Peter’s College, expressing to everyone my fondest intentions. We made a good match. I only quit because I wanted to seek my fortune. I wasn’t ready to settle down. That’s somewhat par for the course among teachers of that age, only four or five years beyond high school.
This was way back in the 1980s, when the World Wide Web was still a gleam in Ted Nelson’s eye. He wrote Computer Lib / Dream Machines, which introduced me to the MEMEX of Dr. Vannevar Bush. Bush, who headed the wartime Office of Scientific Research and Development under presidents Roosevelt and Truman, and whose report Science, the Endless Frontier paved the way for the National Science Foundation, had fairly accurately envisaged today’s search engines.
Fast forward and we’re swimming in mountains of data, with screaming fast GPUs ready to discover the eigenpatterns. We only need to figure out the right Machine Learning algorithms. Ensembles of these critters, with bagged and boosted data to train on, get set free in the wilderness to predict on their own, with some degree of trust invested, thanks to training and testing. The Bayesians will talk about how sure of themselves they are, but we may also just go by track record.
A lot of folks were shocked at the speed with which the engineering community rebranded around Big Data and Data Science. The data scientists would be the ones to orchestrate Machine Learning, the panning for gold. The gold would then fund an Internet of Things. Toasters would talk to pacemakers. Cars would drive off on some AI-envisioned mission. Humans would be left on the side of the road, to fend for themselves as epiphenomena. Whatever consciousness was, it wasn’t competitive.
That was the shocking dystopia coming from pockets in the Silicon Valley at least (I was in the Silicon Forest). Others applied a more positive spin, but the “lipstick on a pig” meme from the Sarah Palin campaign still lingered: people were becoming more skeptical of the dystopia they were being manipulated into introjecting. Some were suffering from heartburn and acid reflux, having ingested a Taste of Tomorrow. Engineers realized they needed to keep working on their spin.
Where was I in all this? I was a Windows refugee, my language no longer supported by Microsoft. From programming medical research computers in a major hospital, I had gone over to open source. Getting a conservative Catholic hospital to embrace anything “open” was an uphill battle in those days, and I failed. Medical institutions are eager to protect the confidentiality of their patients. I certainly never shared anything private I got to look at. My job was to scrub away identities so that medical researchers could do their data science without compromising confidentiality. I admit, this was important work. But then Visual FoxPro died and the new tools were still too new.
I got back into teaching at that point. Coming full circle, I’m back to being a high school calculus teacher, except these days we call it code school. Students flock to us in order to pick up some coding skills, which come with enough math skills to promote further math learning in whatever direction the career goes. Coding gives students the traction needed to master more of the calculus, or at least appreciate its role in the algorithms behind the spreadsheets and Pandas dataframes.
These days, I sometimes tweet to high school teachers about our parallel code school curriculum, suggesting more of a merger. Why learn the same skills in so many redundant tracks? Why not synergize around the concept of Vector, for example? Vectors extend to many dimensions as features and records, the raw material of Machine Learning. In three dimensions we get XYZ. That’s a strong bridge right there. Why not use NumPy?
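Here is a minimal sketch of that bridge, assuming NumPy is installed. The numbers are made up for illustration: two XYZ vectors first, then a tiny table of records with four features each, handled with exactly the same operations.

```python
import numpy as np

# Two ordinary XYZ vectors, the kind a geometry class already draws.
a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 0.0, 1.0])

length_a = np.linalg.norm(a)   # sqrt(1 + 4 + 4)
dot_ab = a @ b                 # 1*2 + 2*0 + 2*1

# The same idea in more dimensions: each row is one record,
# each column one feature (hypothetical values).
records = np.array([
    [2.0, 0.0, 1.0, 3.0],
    [4.0, 2.0, 1.0, 1.0],
    [0.0, 4.0, 1.0, 2.0],
])

centroid = records.mean(axis=0)                          # average record
distances = np.linalg.norm(records - centroid, axis=1)   # one distance per record

print(length_a, dot_ab)
print(centroid, distances)
```

Nothing changes in the code when we go from three dimensions to four, which is the point: the vector concept students meet as XYZ is the one they will use on feature tables.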
From some starting point, clearly still in error, a learning algorithm follows Monuments to Calculus, these various gradient descent pathways, such that the margin of error, or variance (deviance), becomes less and less. Support Vector Machines were and still are among the star players. Today we have Deep Learning with many hidden layers, some of them convolutional.
In other words, if your calculus students are asking “what is all this good for, teacher?” you now have an answer. We use the Greek letter “nabla” to signify our calculus-savvy approach. We come in for a landing thanks to hyperparameter settings and a learning rate (another hyperparameter).
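A minimal sketch of such a landing, in plain Python. The bowl-shaped function `f`, the starting point, and the learning rate are all my own illustrative choices, not anything a particular library prescribes. In one dimension “nabla f” is just the ordinary derivative.

```python
def f(w):
    """A simple bowl with its lowest point at w = 3."""
    return (w - 3.0) ** 2 + 1.0


def grad_f(w):
    """Derivative of f, worked out by hand: d/dw (w - 3)^2 = 2(w - 3)."""
    return 2.0 * (w - 3.0)


def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step downhill, against the gradient."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad_f(w)
    return w


w_final = gradient_descent(start=-5.0)
print(w_final, f(w_final))  # w_final ends up very close to 3
```

The learning rate matters. For this particular bowl, any rate above 1.0 makes each step overshoot by more than it corrects, and the sequence flies off instead of landing. Having students find that threshold by experiment is a decent exercise.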
Have students do a little research on the importance of Machine Learning to today’s cyber-economy, and they should be persuaded. But then make sure you both tolerate and encourage skepticism as well. Machine Learning, and even more so Deep Learning, are sometimes oversold.
In sum, we’re prepared to tour AI as a vista, with Monuments to Calculus a kind of Mt. Rushmore. Thanks to Newton and Laplace, Leibniz and Ada, we’ve got our “calculus engines” more than ever attuned to real world data, and are inheriting all the prediction challenges these entail. Everyone longs for a Crystal Ball.
Best of all, Machine Learning begins right where the calculus begins, with y = mx + b. The “m” for “slope” (delta y over delta x) gets flipped over, to become “w” for “weight”, whereas “b” stands for “bias” and so works as is. Linear Regression, followed by Logistic Regression, becomes the new thoroughfare, through which calculus students flow into Data Science. Code Schools have already become some of the guardians along this path.
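To make y = wx + b concrete, here is a small sketch of Linear Regression in plain Python. The data points are invented. I use the textbook closed-form answer, which comes from setting the partial derivatives of the squared error with respect to w and b to zero; a gradient descent loop would be another route to the same line.

```python
# Made-up measurements that roughly follow a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.7]


def fit_line(xs, ys):
    """Least-squares weight w and bias b for the model y = w*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of products of deviations from the means.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w = s_xy / s_xx           # the slope, now called the weight
    b = y_mean - w * x_mean   # the intercept, now called the bias
    return w, b


w, b = fit_line(xs, ys)
print(w, b)            # roughly 1.94 and 1.12
print(w * 5.0 + b)     # a prediction for an x we have not seen
```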
Another lesson to take back to the high schools and code schools, from the front lines of Machine Learning, is that we start from different places with our beliefs, and although we may converge, we’re not obligated to agree on every point. Bayesianism helped make room for what some call “subjectivist” statistics. One’s own belief system, qua belief system, remains in view, adding a meta level we all should have, given the plethora of namespaces. Gone are the days when “mathematically correct” meant “in a zero sum game”.
Machine Learning allows different “search engines” (predictors, discriminators, relevancy filters) to come down on different sides of some fence. Search engines do not deterministically all fetch the same kettle of fish. How others have searched before makes for priors and in Bayesianism we’re free to start with differing priors.
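Here is one way to sketch differing priors converging, using the standard Beta-Binomial update in plain Python. The two observers, their priors, and the 70-out-of-100 data are hypothetical, chosen only to show the effect.

```python
def update(prior_a, prior_b, successes, failures):
    """Beta(a, b) prior plus binomial data gives a Beta(a + s, b + f) posterior."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: the observer's best single guess."""
    return a / (a + b)


agnostic_prior = (1, 1)   # flat: every success rate equally plausible
optimist_prior = (8, 2)   # already leaning toward an 80% success rate

successes, failures = 70, 30   # both observers see the same evidence

agnostic_post = update(*agnostic_prior, successes, failures)
optimist_post = update(*optimist_prior, successes, failures)

prior_gap = abs(beta_mean(*agnostic_prior) - beta_mean(*optimist_prior))
post_gap = abs(beta_mean(*agnostic_post) - beta_mean(*optimist_post))

print(prior_gap)  # how far apart they started
print(post_gap)   # how far apart they are after the shared evidence
```

The two never land on exactly the same number, but the gap shrinks as the shared evidence piles up. They converge without being obligated to agree on every point.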
The task, then, is to have some overarching or presiding algorithm solicit the opinions of an ensemble. Random Forests provide Decision Trees for this purpose. The presiding algorithm (some “main”) collapses a weighted sum. That’s very quantum mechanical. How did the light beam know which path would keep Action to a minimum? Did it try them all?
I’m referring to a Principle of Least Action, which we find in early thermodynamics. According to ISEPP president Terry Bristol, we have the French speakers (Maupertuis, the Carnots) to thank for some of the earliest formulations of this generalized principle. Action per time interval comes back in energy units (mvd/t). Integrating energy over a time interval (delta t) takes us back to units of Action (mvd). The jump to Planck’s constant is not that great a logical leap (E = hf where f = 1/t). We jump from thermodynamics to quantum mechanics.
The link here is optimization, maximizing efficiency in several dimensions. Talking about the tradeoff between kinetic and potential energy keeps it to two dimensions and that’s wonderful. We need to keep it simple. More Monuments to Calculus figure in.
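A rough numerical sketch of that tradeoff, in plain Python. The setup is my own assumption for illustration: a ball of unit mass thrown straight up so that it returns to its starting height after two seconds. The action is approximated by summing (kinetic minus potential) energy over small time steps, and the parabola of ordinary mechanics is compared with two wobbled paths sharing the same endpoints.

```python
import math

G = 9.81     # gravitational acceleration, m/s^2
MASS = 1.0   # kg (illustrative)
T = 2.0      # total flight time in seconds (illustrative)
N = 200      # number of time steps


def action(path, total_time=T, mass=MASS, g=G):
    """Approximate the integral of (kinetic - potential) energy along a path."""
    dt = total_time / (len(path) - 1)
    total = 0.0
    for y0, y1 in zip(path, path[1:]):
        velocity = (y1 - y0) / dt
        kinetic = 0.5 * mass * velocity ** 2
        potential = mass * g * 0.5 * (y0 + y1)   # height at the step's midpoint
        total += (kinetic - potential) * dt
    return total


times = [T * i / N for i in range(N + 1)]

# The textbook parabola: up and back down under constant gravity.
true_path = [0.5 * G * t * (T - t) for t in times]


def wobble(amplitude):
    """Same start and end heights, but bulging above or below the parabola."""
    return [y + amplitude * math.sin(math.pi * t / T)
            for y, t in zip(true_path, times)]


s_true = action(true_path)
s_high = action(wobble(0.5))
s_low = action(wobble(-0.5))

print(s_true, s_high, s_low)
```

Both wobbled paths come out with a larger action than the parabola. In general the principle only promises a stationary value, but for this simple falling-body case the true path really is the minimum.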
Long time readers of my stuff know that I’m partial to tetrahedral wedge deltas, and a five-fold symmetrical way of dividing the sphere. Ed Popko’s Divided Spheres could be a reference at this point. The high frequency geodesic ball, closest packed with other such balls, is where our code school geometry often begins. Using tetrahedrons to approximate a sphere corresponds to the ancient Greek method of closing in on a circle with polygons to obtain Pi, the ratio of a circle’s circumference (same as perimeter) to its diameter.
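The ancient Greek method is easy to sketch in plain Python. This version assumes a circle of diameter 1, starts with inscribed and circumscribed hexagons, and doubles the number of sides using the classical perimeter recurrence. The function name and defaults are mine; Archimedes of course had to approximate the square roots by hand.

```python
import math


def archimedes_bounds(doublings=4):
    """Squeeze Pi between polygon perimeters around a circle of diameter 1."""
    sides = 6
    inner = 3.0                    # perimeter of the inscribed hexagon
    outer = 2.0 * math.sqrt(3.0)   # perimeter of the circumscribed hexagon
    for _ in range(doublings):
        # Doubling the sides: harmonic mean for the outer polygon,
        # then geometric mean for the inner polygon.
        outer = 2.0 * inner * outer / (inner + outer)
        inner = math.sqrt(inner * outer)
        sides *= 2
    return sides, inner, outer


sides, lower, upper = archimedes_bounds()
print(sides, lower, upper)   # 96 sides, with Pi trapped between the two perimeters
```

Four doublings reach the 96-sided polygons Archimedes himself used. More doublings tighten the squeeze, which is the same move as raising the frequency of a geodesic ball.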
Our code school geometry might go with Python. Mine certainly does. That’s the language I jumped to when Visual FoxPro went down. I learned to become productive in this language and to make it a glue between productivity tools. Microsoft Office would not be necessary. On the other hand Python was quite capable of working with COM objects and talking dot NET. Windows and Python continued to get along. I ended up on an Apple, sharing a bash shell with Linux. But now Windows is supporting bash as well. The Python community absorbed quite a few FoxPro refugees, we now see in retrospect.
I bring in Python to round out my sense of how we might go forward with the calculus. Conventional notations are here to stay, but dot notation is pretty much mainstream by now too, with most Object Oriented Languages (OOLs) supporting it. Going back and forth between textbook typography and operational mathematics is what Mathematica and MathCAD are all about as well. Open source solutions are not the only solutions. Think of your school, and your server. Make choices appropriate to your predicted future and optimize curriculum for your students. For a curriculum writer such as myself, that’s about the best one can do.
One final point: in bridging from the calculus to modern day stats (data science) I am not suggesting a competitive relationship with the already well-established link to Newtonian mechanics and all the solar system astronomy that entails. Kepler’s Laws are about the Principle of Least Action just as much. We’re not talking either/or here. Astronomers use Machine Learning on the job, so if anything we’re talking about feeding two for the price of one.