Refactoring A Python ETL Pipeline (With Example)
My step-by-step process for revising an existing Python pipeline that fetches data from the Reddit API.
Revise and Refactor Your Python ETL Pipelines
Revisiting a Python script you wrote months or years ago is like discovering an assignment from high school: you cringe at what you should have known, and you marvel at how far you’ve come.
If you’ve had any experience with coding, particularly in Python, you’ll know that programming is an iterative process.
The best problems are solved incrementally and the best code is written and rewritten a bit at a time.
Unlike a high school term paper or a college thesis gathering dust in the attic, production scripts can and should be revisited periodically to make the code more concise, readable and configurable.
In software engineering, this practice is known as refactoring.
The point of refactoring is to keep a piece of code doing exactly what it did before while making it easier to read and change.
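To make that definition concrete, here is a minimal before-and-after sketch. It is not taken from the actual pipeline: the function names, the `score` and `title` fields, and the sample data are all hypothetical, chosen only to show the behavior staying fixed while the code becomes more readable and configurable.

```python
# Hypothetical "before" version: it works, but the score threshold and
# the output fields are hard-coded and the names say little.
def get_posts_before(posts):
    out = []
    for p in posts:
        if p["score"] > 100:
            out.append({"title": p["title"], "score": p["score"]})
    return out


# Hypothetical "after" version: same default output, but the threshold
# and the fields to keep are now parameters.
def filter_posts(posts, min_score=100, fields=("title", "score")):
    """Keep posts scoring above min_score, reduced to the given fields."""
    return [
        {field: post[field] for field in fields}
        for post in posts
        if post["score"] > min_score
    ]


# Made-up sample records standing in for API results.
sample_posts = [
    {"title": "Intro to ETL", "score": 250, "author": "a"},
    {"title": "Low effort post", "score": 3, "author": "b"},
]

# Refactoring should not change behavior: both versions must agree.
assert get_posts_before(sample_posts) == filter_posts(sample_posts)
```

The closing assertion is the important part. A quick check that the old and new versions return the same result on the same input is one simple way to confirm that a refactor changed the code's shape and not its behavior.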