Refactoring A Python ETL Pipeline (With Example)

My step-by-step process for revising an existing Python pipeline fetching data from the Reddit API.

Zach Quinn
Pipeline: Your Data Engineering Resource

[Image: House with walls knocked out and wires dangling. Caption: Refactoring is like renovation for your code. Photo by Jørgen Larsen on Unsplash.]

Revise and Refactor Your Python ETL Pipelines

Revisiting a Python script you wrote months or years ago is like discovering an assignment from high school: you cringe at what you should have known, and you marvel at how far you’ve come.

If you’ve had any experience with coding, particularly in Python, you’ll know that programming is an iterative process.

Problems are best solved incrementally, and the best code is written and rewritten a bit at a time.

Unlike your high school term paper or a college thesis gathering dust in the attic, production scripts can and should periodically be revisited and revised to make the code more concise, readable and configurable.

This practice is a software engineering process known as refactoring.

The point of refactoring is to preserve what a piece of code does while improving how it is written: its readability, its structure and how easily it can be changed.
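To make that idea concrete, here is a minimal sketch of a before-and-after refactor. It is not taken from the Reddit pipeline discussed in this article; the sample records, the function names and the `min_score` parameter are all hypothetical, chosen only to show that the output stays the same while the code becomes easier to read and configure.

```python
# Hypothetical sample records standing in for API results (not real Reddit data).
POSTS = [
    {"title": "Intro to ETL", "score": 120},
    {"title": "Low effort post", "score": 3},
    {"title": "Refactoring tips", "score": 45},
]


def get_titles_before(posts):
    # "Before" version: hard-coded threshold, manual loop, terse names.
    out = []
    for p in posts:
        if p["score"] >= 10:
            out.append(p["title"].upper())
    return out


def get_titles_after(posts, min_score=10):
    """Return upper-cased titles of posts scoring at least min_score."""
    # "After" version: same logic, but the threshold is a parameter
    # and the loop is a single readable comprehension.
    return [post["title"].upper() for post in posts if post["score"] >= min_score]


# Refactoring should not change behavior: both versions agree on the same input.
assert get_titles_before(POSTS) == get_titles_after(POSTS)
```

The final assertion is the important part. A quick check like this, or a proper unit test, is what lets you rewrite code with confidence that you have only changed its form.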
