Work Less, Not Faster: When Grade-School Math is Actually Useful

Kevin Huang
Atelier de Sécurité
8 min read · Jan 3, 2021

--

Many of us are familiar with Moore’s Law, the observation that chip performance is expected to double roughly every 18 months. This “law” is now more scripture than fact, as improvements in chip performance have hit diminishing returns. The problem that now surfaces is that the sheer volume of data being produced, along with the heavy processing of that data (for things like deep learning), puts tremendous strain on our finite computing resources.

You might argue that one could buy more cloud computing resources or on-site hardware (like graphics cards for mining rigs), but ultimately those are expensive resources. If you’re like most of us without infinite slush funds, you’ll need to learn to make do with the resources you have.

Disclaimer: I am not a trained computer scientist or mathematician.

I have, however, been in the trenches of analyst grunt work, in which I’ve had to do my best with a run-of-the-mill Windows laptop (if you were lucky, you might get a 64-bit one), trying to crank out precariously large operations in Excel. Nudge the program the wrong way, and it’ll crash, nullifying the hours you spent staring at the pinwheel of death.

It was this experience that led me to understand that we don’t just need better machines — we need more performant methods. I thus started picking up SQL, then Python, for my analytics tasks, and even then I could see that these tools demanded an understanding of optimized methods: SQL queries should use indexes, and arithmetic operations in Python should be ‘vectorized’ whenever possible. You don’t want to be that person making the server crash because of your shitty code (trust me, I’ve been there).

Performant code allows you to accomplish your goals with less, which in my mind makes a good programmer a tremendous asset.

Push it to the Limit


I recently came across a LeetCode-esque problem that went a bit like this:

  • Consider two numbers A and B, which are both repeatedly reduced by 1 until the smaller of the two reaches zero
  • Count how many of the resulting pairs give a remainder of 0 when the larger is divided by the smaller

Essentially, there are two series of numbers, where the one starting from the larger number keeps a constant offset from the one starting from the smaller:

  • [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
  • [18, 17, 16, 15, 14, 13, 12, 11, 10, 9]

In the example above, we’d have 4 pairs which result in a remainder of 0. For those unfamiliar with arithmetic operators in programming, we can use the % (modulo) operator to return the integer remainder of a division, and == (equal to) to test whether that remainder is zero.

16 % 8 == 0 would return True.

The Tortoise Doesn’t Win Here

For smaller values of A and B, a ‘for loop’ that iterates over each pair and counts how many give a remainder of zero works fine. As the numbers get larger and larger, however (the validation tests include numbers in the billions), this method isn’t going to cut it, because Python has to step through each individual pair one at a time. The challenge also enforces a pretty strict time limit, which disqualified this brute-force method (list comprehensions are still for loops!).
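As a rough sketch (the function name and structure here are my own, not the challenge’s starter code), the brute force boils down to something like this:

def count_zero_remainders(small, big):
    # Walk both series down in lockstep, counting the pairs where
    # the larger number divides evenly by the smaller one.
    count = 0
    while small > 0:
        if big % small == 0:
            count += 1
        small -= 1
        big -= 1
    return count

print(count_zero_remainders(10, 18))  # prints 4 for the example above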

Multitasking is Not Enough

Performing the modulus calculation using the vectorization method we mentioned earlier proved to be quite helpful in our timed trials, cutting the time spent crunching our series (converted into NumPy arrays) by roughly 30x. This is because NumPy can operate on whole blocks of elements at once (loops still occur in some form, just not in Python itself).
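A minimal sketch of what that looks like with NumPy (the array construction is mine, just mirroring the toy example above):

import numpy as np

small = np.arange(10, 0, -1)    # [10, 9, ..., 1]
big = small + 8                 # constant offset of 8 -> [18, 17, ..., 9]
count = np.count_nonzero(big % small == 0)
print(count)                    # 4, same answer as the loop

The whole modulus-and-compare happens in one pass over the arrays instead of one Python iteration per pair.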

However, this had its own problems: within the context of the challenge, the arrays could only be stored in memory, which caused my Jupyter kernels to die when I tested the very large numbers. Chopping the arrays up into smaller ones to be looped over kept the kernel alive, but still couldn’t reach the speed requirements for the challenge.
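For what it’s worth, a chunked version looks roughly like this (the chunk size is arbitrary, and this is a sketch of the idea rather than my exact code):

import numpy as np

def count_chunked(small, diff, chunk=10_000_000):
    # Process the descending 'small' series one slice at a time, so we
    # never hold the full billion-element arrays in memory at once.
    total = 0
    for start in range(small, 0, -chunk):
        stop = max(start - chunk, 0)
        s = np.arange(start, stop, -1)
        total += np.count_nonzero((s + diff) % s == 0)
    return total

print(count_chunked(10, 8))  # 4 for the toy example

It stays alive on big inputs, but it still has to touch every single element, which is why it couldn’t hit the time limit.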

Laziness is a Virtue

Don’t get me wrong, good programmers are still probably hard workers. What made the difference regarding this challenge, however, was a critical piece of advice from my friend gh0st_sec: “Do less work instead of working faster”.

In the greater context of life as opposed to the silo of programming challenges: understand what really needs to get done to accomplish the objective, and focus on that rather than killing yourself to crank out what you think has to get done.

And a good way to do that, at least for math-oriented problems, is to think about the math itself.

Elegance > Brute Force

Let’s take another look at the problem itself:

  • For the values small, small-1, small-2, ..., 0 and big, big-1, big-2, ..., 0, we’re looking for how many times big_n % small_n == 0.
  • We know that there is a constant difference of big - small across all the pairs

Given the constant diff = big - small, we can convert big % small == 0 into (small + diff) % small == 0.

That expression can be interpreted to say that (small + diff) / small equals some integer a. By performing basic algebra and multiplying both sides by small, we get small + diff = a * small, which can be simplified to diff = (a - 1) * small. Because a and small are both integers, a - 1 must also be an integer, thus:

In the cases where big % small == 0, small must be a factor of diff. Setting aside the cases where A == B (diff == 0), the answer we’re looking for is the number of factors of diff that are less than or equal to the starting value of small.

Let’s go back to our example (A=10, B=18):

  • [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
  • [18, 17, 16, 15, 14, 13, 12, 11, 10, 9]

The diff is eight, and the pairs where big % small == 0 have values of small that are factors of 8 (8, 4, 2, 1).
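Without spelling out a full solution, here is a quick sanity check of that claim on the toy example (assuming the series runs from the starting value of small down to 1):

diff = 8
start_small = 10

# Factors of diff no larger than the starting value of small
factors = [f for f in range(1, start_small + 1) if diff % f == 0]
# Values of small whose pair gives a zero remainder
zero_remainders = [s for s in range(1, start_small + 1) if (s + diff) % s == 0]

print(factors)                     # [1, 2, 4, 8]
print(factors == zero_remainders)  # True

Both lists come out identical, because (s + diff) % s is just diff % s.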

An algorithm written with this logic gave us close to a 10x speedup for smaller numbers, and allowed us to process the huge ones with ease. One could say that its complexity is near O(1), because we focus on operating on the single diff value, and our processing time is barely affected by the size of small.

Note: I have intentionally not spelled out the full solution in code, to avoid spoiling the challenge for anyone who might be working on it.

How We Think About Math & Programming

The math I used to arrive at my solution was not high-level at all, yet it wasn’t obvious to me, perhaps because of the way I was taught it. My K-12 math education in the US mostly consisted of memorizing equations and plugging away at problems until they became muscle memory. Unfortunately, that leaves math quite the dry topic, only good for crunching numbers like last quarter’s sales. It’s no wonder we don’t have enough kids going into STEM.

Back to Basics... or Not

I recently felt the need to brush up on my math to become a better analyst and purchased Algebra by I.M. Gelfand, which surprised me with the rigor of its material. Although it does review some very basic knowledge, it quickly escalated into proof-based problem-solving that I had never come across in my grade-school education.

Take, for example, this problem:

Two integers differ by 2. Multiply them and add 1 to the product. Prove that the result is a perfect square (the square of an integer). For example,

3 * 5 + 1 = 16 = 4^2

13 * 15 + 1 = 196 = 14^2
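A quick numeric check of the claim is easy to write (in Python, say):

# Check n * (n + 2) + 1 against (n + 1)^2 for a range of integers
for n in range(-1000, 1001):
    assert n * (n + 2) + 1 == (n + 1) ** 2
print("No counterexamples found in this range")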

It is not enough of a ‘proof’ to plug in a couple of examples and claim that the statement holds for all such integers. Let’s look at how Gelfand works through it:

Solution 1: Let n denote the smaller number. The other number is (n + 2). Their product is n(n + 2) = n^2 + 2n. Adding 1, we get n^2 + 2n + 1 = (n + 1)^2 (the formula for the square of the sum).

Solution 2: Let n denote the bigger number. The other number is (n - 2). Their product is n(n - 2) = n^2 - 2n. Adding 1, we get n^2 - 2n + 1 = (n - 1)^2 (the formula for the square of the difference).

Solution 3: If we want to be fair and not choose between the bigger and the smaller number, let us denote by n the number halfway between them. Then the smaller number is n - 1, the bigger one is n + 1, and the product is (n + 1)(n - 1) = n^2 - 1 (the difference of squares formula), which is a perfect square minus one; adding 1 gives exactly n^2.

Simple, yet elegant reasoning that covers every case.

Calculation vs Contemplation

Algebra being treated in this way may have a lot to do with Gelfand’s Soviet perspective on education, which focuses more on abstract thinking than on how to solve a specific type of problem. Having just connected this kind of thinking to the programming challenge above, I can only wonder whether this style of education contributes to the former USSR’s strength in producing programmers and hackers.

“Working an integral or performing a linear regression is something a computer can do quite effectively. Understanding whether the result makes sense — or deciding whether the method is the right one to use in the first place — requires a guiding human hand. When we teach mathematics we are supposed to be explaining how to be that guide. A math course that fails to do so is essentially training the student to be a very slow, buggy version of Microsoft Excel.”
Jordan Ellenberg, How Not to Be Wrong: The Power of Mathematical Thinking

Gelfand’s proof-based abstractions use deep understanding of the topic to simplify an otherwise complex problem. I believe that in the same manner, programming should not be a high-powered way of doing the same inefficient work — rather, combined with a deeper understanding of the problem at hand, it can be used to create simpler solutions for tough challenges.
