Big Diff

Mahati Meeru Nilaya
Frontend Weekly
Published in
5 min readOct 25, 2016

Diff algorithms: History & Functionality for Newbies

A diff algorithm is basically a way to compare two disparate sets of data and turn data A into data B. The driving aspect of many programming languages and more recently, many front-end based user-interface libraries.

A diff algorithm is not a singular unique entity. The algorithm itself probably consists of some combination of lower order sorting and comparing mechanisms many programmers, statisticians, linguists, poets, musicians, and mathematicians are familiar with. A diff algorithm could even use something as basic as a bubble sort, some sort of find method and some sort of delete method!

There isn’t any singular ‘true’ diff algorithm or the ‘best’ diff algorithm. There are simply really good ones and ones that are rather slow and bad. (The bubble sort involved would probably rate in the later category) There are hundreds of diff algorithms though the ones that frameworks like ReactJS utilize are incredibly versatile, fast, precise, and generally considered the ‘presiding’ algorithm.

A bit of History & Vocab

Edit Distance:
The number of “moves” it takes to convert one string into another.
Converting "Bad" into "Mad would have a very short or low Edit Distance. Presumably this would simply require the deletion of the 'B' and insertion of an 'M'.
But converting "Post-Colonialism" into "heteronormativity" would have a long or high Edit Distance.

In programming this algorithm that calculates the edit distance between two strings is usually called the Wagner Fisher algorithm but…its been invented by tons of people across the world at different times so its really rather colonially-appropriationist to give it singular naming rights.

So what does this matter?

Well, back in 1976 the first diff algorithm paper in computer programming was published for Unix. This was considerably faster than the “Wagner Fisher algorithm” and could be used for more than simple strings. This didn’t just calculate the Edit distance between two strings but went on to make the conversion in that optimal number of moves! Exciting! This algorithm, called the Hunt-McIlroy diff was a godsend to then slowly emerging field of biotech, particularly in DNA sequencing and comparisons.

Textbooks before this period probably couldn’t have fun facts like what percent of our genome is exactly the same as various other animals!

Modern Diffs & ReactJS

The diff algorithm utilized by ReactJS does far more than compare and sort through two different strings. The algorithm governing this framework is utilized so that changes to the DOM are made incredibly efficiently and with distinct precision.

Say a team of provenance researchers and non-shitty Dan Brown type art historians wanted to pull off the biggest art heist ever and convert a painting in the school of Peter Paul Rubens into a true Rubens painting. Think of of the DOM as the school painting of Daniel and the Lion’s Den and not Ruben’s actual original.

School Painting

Original

Obviously we need to change Daniel, some of the expressions on the faces of the various lions and some overall positioning. Deleting Daniel, the faces of the lions, and other smaller elements would potentially be catastrophic as we might not be able to imitate the style well enough again! Also…it would take AGES!

With a framework like React, we can pinpoint the exact things we need to change:

  • the positioning of Daniel’s hands
  • 3 faces of lions, particularly their jaw and brow structure
  • the light from the cave

and modify ONLY those elements.

How does React DO THAT?

Magic.

(Mostly, in a math sense)

So this is not a React tutorial. That may happen…later. Instead, what is being asked is — How does the algorithm governing React work well enough that, given good structural organization by the developer and actual directions, the algorithm is able to scalpel and change these very distinct features of the DOM?

To go back to the Ruben’s example, lets assume that both the actual artist, Peter Paul Ruben, and our imitator painted the various elements of the painting in layers (like the layers of color in Homage to the square)
If we convert these layers into more easily accessible graphical info, they would look like this:

Lets say that the figure of Daniel is on the second level, a parent, and the distinct faces of the lions are on the 3rd level, children.

React uses a tree edit distance algorithm, a tree alignment distance algorithm, and a tree inclusion algorithm to grab Daniel in the parent level and the lions on the child level and swap them out with the corresponding from the true painting!

But how exactly do the tree based algorithms work?
Unfortunately that may be a deviation for another blog and another blog post as it relies heavily on deep mathematical understanding.

The ReactJS github page actually links you to the academic paper that outlines some of their core algorithms.

Philip Bille Paper: Algorithms

The paper is edited so that it does not go so far as to publish the actual full form algorithm. But there is a detailed discussion of the basic algorithmic structure, the common concerns and outlines the top four ‘almost-good-enough’ algorithms for edit distance and alignment distance.
Below is a sample of the basic structure for the edit distance and a sample solution:

The paper also assumes awareness off some of the more recent diff algorithms available such as:
Xdelta — very fast for binary files
Rsync — binary files over network links
bsdiff — compiled executable files

Based on this internal structure and functionality, its easy for one using ReactJS in the programming to see how a lot of this internal structure governs syntax and organization!

--

--

Mahati Meeru Nilaya
Frontend Weekly

An Archaeological Art Historian turned Web Developer seeking to creatively integrate art, design, and code!