TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Python for Data Engineers

17 min read · Oct 21, 2023


Photo by Boitumelo on Unsplash

In this story, I will discuss advanced data engineering techniques in Python. Python is arguably the most popular programming language for working with data. During my almost twelve-year career in data engineering, I have run into many situations where code had issues. This story is a brief summary of how I resolved them and learned to write better code. I will show a few techniques that make ETL pipelines faster and help improve the performance of our code.

List comprehensions

Imagine you are looping through a list of tables. Typically, we would do this:

data_pipelines = ['p1', 'p2', 'p3']
processed_tables = []
for table in data_pipelines:
    processed_tables.append(table)

Instead, we could use a list comprehension. Not only is it typically faster, because it avoids looking up and calling .append() on every iteration, it also makes the code more concise:

processed_tables = [table for table in data_pipelines]
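To check the speed claim on your own workload, here is a minimal sketch that builds the same list both ways and times them with the standard timeit module. The upper-casing step, the "skip p2" filter and the 10,000-table input are illustrative assumptions, not part of the original pipeline, and the timings will vary by machine and Python version.

```python
import timeit

data_pipelines = ['p1', 'p2', 'p3']

# Loop version: build the list with repeated .append() calls
def with_loop(tables):
    processed = []
    for table in tables:
        processed.append(table.upper())
    return processed

# Comprehension version: same result, built in a single expression
def with_comprehension(tables):
    return [table.upper() for table in tables]

# A comprehension can also filter; the "skip p2" rule is purely illustrative
active_tables = [table.upper() for table in data_pipelines if table != 'p2']

# Both versions must produce identical output before comparing speed
assert with_loop(data_pipelines) == with_comprehension(data_pipelines)

# Hypothetical larger input; actual timings depend on machine and Python version
big_input = [f"table_{i}" for i in range(10_000)]
loop_time = timeit.timeit(lambda: with_loop(big_input), number=100)
comp_time = timeit.timeit(lambda: with_comprehension(big_input), number=100)
print(f"loop: {loop_time:.3f}s  comprehension: {comp_time:.3f}s")
```

The optional if clause lets filtering and transformation live in one readable expression instead of a loop body with nested conditionals.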

The same syntax in parentheses gives a generator expression, which makes it easy to transform (the T in ETL) each row of a very large file:

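import json

def etl(item):
    # Do some data transformation here
    return json.dumps(item)

# json_data: an iterable of JSON-serialisable records (hypothetical sample)
json_data = [{"id": 1}, {"id": 2}]
data = "\n".join(etl(item) for item in json_data)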
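One caveat: str.join consumes the whole generator and builds the complete output string in memory, so for files that truly do not fit in RAM it can help to write each transformed row out as soon as it is produced. The sketch below assumes newline-delimited JSON input; the "processed" flag, the helper names read_records and run, and the in-memory io.StringIO stand-ins for real files are all hypothetical.

```python
import io
import json

def etl(item):
    # Hypothetical transformation: tag every record with a processed flag
    item["processed"] = True
    return json.dumps(item)

def read_records(lines):
    # Lazily parse one JSON object per non-empty line (newline-delimited JSON)
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

def run(source, sink):
    # Generator expression: each record is read, transformed and written
    # one at a time, so the full dataset is never held in memory at once
    for row in (etl(record) for record in read_records(source)):
        sink.write(row + "\n")

# In-memory stand-ins; a real file object from open("input.ndjson") behaves the same way
source = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')
sink = io.StringIO()
run(source, sink)
print(sink.getvalue())
```

Because every stage is a generator, memory use stays roughly constant regardless of input size, at the cost of one write call per row.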


Published in TDS Archive



Written by 💡Mike Shakhomirov

Data Engineer, Data Strategy and Decision Advisor, Keynote Speaker | linktr.ee/mshakhomirov | @MShakhomirov
