The Idea of Data Alchemy

Charles M Rice
Dec 13, 2016

Charles Rice | Freelance Data Scientist | Analytic Translator

It is little remarked that Isaac Newton spent much of his life as an alchemist. The smoother pebble he sought on the beach all those years was the Great Work of alchemy, the Philosopher’s Stone — a compound that would create permanent transmutations (“projection” in the language of alchemists) of base matter like lead into more valuable metals like gold or silver.

Today, we have cast aside such notions as transmutation and alchemy as quaint, mystical pseudoscience. (Not to mention expensive: while it is now possible to turn atoms of lead into atoms of gold, the cost of transmuting a single atom far exceeds the spot price of an ounce.) But a new branch of science is on the rise. Described breathlessly as the “sexiest job of the 21st century,” data science promises to do what Newton could not: turn lead into gold or, at least, turn leads into cash. And in truth it bears a striking resemblance to the Great Work of alchemy.

Our wired world has become a massive data generator. Every time you surf the Web, pay with a credit card, or even drive a car, you make a small contribution to the terabytes of new data created every day.

But that data, sitting inert in the cloud and spread over hundreds of servers around the world, is useless. The immediately useful bits (the location of your Uber pick-up, your shipping address from Gilt) are obvious and put to use right away. But those bits are a tiny portion of the whole. The rest is worse than the lead the alchemists tried to transmute: at least lead could be used for musket balls or pencils. At the volumes in which it currently exists, data is dross, the tailings left over from mining once every useful bit of ore has been extracted, piled in heaps near the mill or turned into landfill.

The data alchemist looks at these great heaping mounds of inert data, moldering in the cloud, and sees opportunity. Using his alembic, still, and retort — statistical analysis, coding, and substantive expertise — he takes data and turns it into knowledge, intelligence, and insight.

Let me share an example of what I mean.

During General Assembly’s Data Science Immersive course, our instructors came up with a rather ingenious project. The (in)famous Michelin Red Guide was coming to Washington, which meant that some restaurants in DC would receive one, two, or even three coveted stars. Our task as apprentice data scientists was to predict which restaurants in town would get those stars, and how many stars they would get.

For an expert in hospitality and fine dining, the likely recipients were obvious. A handful of restaurants, well-known as the best or most innovative in town, would receive the stars, and the only question was which ones and at which level. The director of Michelin North America even said that there would not be any surprises at the 2- and 3-star level.

We students were not experts in fine dining or hospitality. All we had was the internet, a strong command of statistics, and some very powerful tools. But from the mass of found, unstructured data (I will go into the particulars of successful approaches to this problem in a future post) we created precise, accurate predictions of which restaurants would receive stars, and how many stars they would receive. Some predictions cast too wide a net and some were too imprecise, but on the whole the predictions were very close to the target.
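
For readers who want "too wide a net" made concrete, one common way to score this kind of prediction is with precision (what share of the picks were right) and recall (what share of the actual winners were picked). The sketch below is illustrative only: the restaurant labels are made-up placeholders, the function name is my own, and none of it is the data or the method from the course project.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for a set of predicted star recipients."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)  # restaurants we picked that really got a star
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical labels standing in for restaurants (assumption, not real results).
actual_stars = {"A", "B", "C", "D"}

# A wide net catches every winner but drags in many false alarms.
wide_net = {"A", "B", "C", "D", "E", "F", "G", "H"}

# A narrow net is right every time but misses half the winners.
narrow_net = {"A", "B"}

wide_scores = precision_recall(wide_net, actual_stars)
narrow_scores = precision_recall(narrow_net, actual_stars)

print("wide net   (precision, recall):", wide_scores)
print("narrow net (precision, recall):", narrow_scores)
```

A wide net trades precision for recall, and a narrow one does the reverse; a good set of predictions keeps both high.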

Broad and imprecise predictions are not quite as bad as the explosions, poisonings, and insanities generated by the early alchemists, and the ability to predict uncertain outcomes with high degrees of accuracy and precision is infinitely more valuable than merely turning lead into gold. But the contemporary data scientist is still much closer to his alchemical ancestor than not: his interest lies in turning something copious and cheap into something rare and valuable, and his methods are often just as mysterious as the bubbling stills and retorts of Isaac Newton’s laboratory.

Charles Rice is a data scientist and recovering speechwriter in Washington DC. He’s pursuing a data-focused career and interested in freelance, contract, and full-time opportunities.

Originally published at https://www.linkedin.com on December 13, 2016.
