How I Reduced My Query’s Run Time From 30 Min. To 30 Sec. In 1 Hour

The query optimization steps a senior data engineer took to reduce the run time of a query processing 1 billion+ rows.

Zach Quinn
Learning SQL

Snail crawling over an object.
Photo by Pascal van de Vendel on Unsplash

Despite the terrifying prospect of increasingly sophisticated cyberattacks, one of the quickest ways to break data infrastructure isn’t at all malicious. All you have to do is introduce your ingestion process to something new, typically in a table’s schema.

A new type. A new field. Or, a field disappearing altogether.

The following query optimization case doesn’t begin with frantic Slack messages signaling a crippled pipeline. It isn’t a response to a carelessly added field upstream. It simply starts with a request fellow data engineers get weekly: “Can you add this new field?”

Since this use case was inspired by work, I can’t provide details about the specific data and request. I can tell you, however, that this was a string field that served as a supplemental ID. Before I added this field, the view I had created executed in less than 15 seconds, since it processed a very small amount of data daily (fewer than 10,000 rows).

After I added the field, the run time quickly increased to 30 minutes.
