Ingesting Malformed Text Files in Python: Common Errors and Fixes

How data engineers can identify, reconfigure and address malformed CSV files in batch pipelines.

The following is dedicated to my past self, who tried to fix malformed files through brute force and willpower. I hope you find something useful here that helps you avoid a similar situation and saves you a few wasted working hours.

Glasses framing scrolling text on a computer screen.
Photo by Alex Chumak on Unsplash

Zach Quinn

DE @ Forbes. Pipeline: A Data Engineering Resource. Editor: Learning SQL. Opinions are my own.
