Important Python Topics for Data Engineering

19 Must-do Topics in Python Every Beginner Should Know

Nnamdi Samuel
Art of Data Engineering
4 min read · Jan 24, 2024

The Python programming language is vast! For data engineering, you do not need to know all of it.

Many data engineers and other professionals use Python every day for data extraction and ingestion, data transformation and cleaning, data orchestration, and automating repetitive tasks.

In this article, I’ve highlighted important topics you should know! Learn how to use them and become a professional as you practice.

1. Variables: In Python, you can assign values to variables. Variables are dynamically typed, meaning you don’t need to declare their type explicitly.

x = 10
name = "John"

2. Data Types: Python has various data types, including integers, floats, strings, booleans, lists, tuples, sets, and dictionaries.

age = 25
height = 5.9
name = "Alice"
is_student = True
fruits = ["apple", "orange", "banana"]

3. Operators: Python supports various operators for arithmetic, comparison, logical operations, etc.

result = 10 + 5
is_greater = 20 > 15
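
To make the operator families above a little more concrete, here is a minimal sketch that uses only built-in operators. The variable names are made up for this example.

```python
# Arithmetic operators
total = 10 + 5            # addition
quotient = 7 / 2          # true division always returns a float
floor_quotient = 7 // 2   # floor division drops the fractional part
remainder = 7 % 2         # modulo gives the remainder
power = 2 ** 3            # exponentiation

# Comparison operators return booleans
is_equal = total == 15
is_smaller = floor_quotient < quotient

# Logical operators combine booleans
in_range = total > 10 and total < 20
outside = not in_range or remainder == 0

print(total, quotient, floor_quotient, remainder, power)
print(is_equal, is_smaller, in_range, outside)
```

Note that `/` and `//` behave differently, which matters when you compute things like batch counts or page numbers.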

4. If Statements: Used for conditional execution.

if x > 0:
    print("Positive")
elif x < 0:
    print("Negative")
else:
    print("Zero")

5. Loops: ‘for’ and ‘while’ loops for iterating over sequences or executing code repeatedly.

for fruit in fruits:
    print(fruit)

counter = 0
while counter < 5:
    print(counter)
    counter += 1
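
A few built-in helpers often appear alongside loops. The sketch below shows range(), enumerate(), break, and continue; the list and variable names are only for illustration.

```python
fruits = ["apple", "orange", "banana"]

# range(3) produces the integers 0, 1, 2
squares = []
for number in range(3):
    squares.append(number * number)

# enumerate() yields (index, item) pairs while iterating
labelled = []
for index, fruit in enumerate(fruits):
    labelled.append(f"{index}: {fruit}")

# continue skips to the next iteration; break leaves the loop early
first_long_fruit = None
for fruit in fruits:
    if len(fruit) <= 5:
        continue              # skip names with 5 characters or fewer
    first_long_fruit = fruit
    break                     # stop at the first longer name

print(squares)
print(labelled)
print(first_long_fruit)
```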

6. Functions: Blocks of reusable code. You can define your own functions or use built-in ones.

def greet(name):
    return "Hello, " + name + "!"

result = greet("Alice")
print(result)
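
Functions can also take default values and keyword arguments. Here is a small sketch that extends the greet() idea; the `greeting` parameter is an addition made for this example and is not part of the function above.

```python
def greet(name, greeting="Hello"):
    """Return a greeting; `greeting` falls back to "Hello" when omitted."""
    return f"{greeting}, {name}!"

default_message = greet("Alice")                    # uses the default greeting
custom_message = greet("Bob", greeting="Welcome")   # overrides it by keyword

print(default_message)
print(custom_message)
```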

7. Modules: Python files containing reusable code. You can import modules to use their functions and variables.

import math
print(math.sqrt(25))

8. Exception Handling: Handling errors and exceptions using try and except.

try:
    result = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero.")
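
A try block can also carry `else` and `finally` clauses. The sketch below wraps the division in a hypothetical helper called safe_divide() to show when each clause runs.

```python
def safe_divide(numerator, denominator):
    """Hypothetical helper: return the quotient, or None when dividing by zero."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        print("Cannot divide by zero.")
        return None
    else:
        # Runs only when the try block raised no exception
        return result
    finally:
        # Runs in every case, even after a return statement above
        print("Division attempted.")

ok = safe_divide(10, 2)
failed = safe_divide(10, 0)
print(ok, failed)
```

Returning None on failure is just one design choice. In a pipeline you might prefer to log the error and re-raise it instead.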

9. Lists: Ordered, mutable collection of elements.

fruits = ["apple", "orange", "banana"]

Common Operations:
Accessing elements: fruits[0]
Adding elements: fruits.append("grape")
Removing elements: fruits.remove("orange")
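
Putting the list operations above together, along with slicing and a list comprehension, a minimal runnable sketch might look like this:

```python
fruits = ["apple", "orange", "banana"]

first = fruits[0]           # indexing starts at 0
last = fruits[-1]           # negative indexes count from the end

fruits.append("grape")      # add to the end of the list
fruits.remove("orange")     # remove the first matching value

first_two = fruits[:2]      # slicing returns a new list
upper_fruits = [fruit.upper() for fruit in fruits]   # list comprehension

print(first, last)
print(fruits)
print(first_two)
print(upper_fruits)
```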

10. Tuples: Ordered, immutable collection of elements.

coordinates = (10, 20)

Common Operations:
Accessing elements: coordinates[0]
Tuples are useful for representing fixed groups of values that should not change.
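
To see what "immutable" means in practice, here is a short sketch. It unpacks a tuple, shows that assignment to an element fails, and uses a tuple as a dictionary key; the "warehouse" label is made up for the example.

```python
coordinates = (10, 20)

# Unpacking assigns each element to its own variable
x_value, y_value = coordinates

# Tuples cannot be modified in place; item assignment raises TypeError
try:
    coordinates[0] = 99
    was_modified = True
except TypeError:
    was_modified = False

# Because tuples are immutable (and hashable when their items are),
# they can serve as dictionary keys
locations = {coordinates: "warehouse"}

print(x_value, y_value)
print(was_modified)
print(locations[(10, 20)])
```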

11. Sets: Unordered collection of unique elements. A set is mutable; frozenset is its immutable counterpart.

unique_numbers = {1, 2, 3, 4, 5}

Common Operations:
Adding elements: unique_numbers.add(6)
Set operations: union, intersection, difference.
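
The set operations listed above can be sketched as follows. The second set, `even_numbers`, is made up for the example, and the results are sorted only so that the printed output is predictable.

```python
unique_numbers = {1, 2, 3, 4, 5}
even_numbers = {2, 4, 6, 8}

unique_numbers.add(6)
unique_numbers.add(6)   # adding an existing element has no effect

union = unique_numbers | even_numbers          # elements in either set
intersection = unique_numbers & even_numbers   # elements in both sets
difference = unique_numbers - even_numbers     # elements only in the first set

# A common use: removing duplicates from a list
deduplicated = sorted(set([1, 1, 2, 3, 3]))

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
print(deduplicated)
```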

12. Dictionaries: Mutable collection of key-value pairs with unique keys. Since Python 3.7, dictionaries preserve insertion order.

student = {"name": "Alice", "age": 25, "grade": "A"}

Common Operations:
Accessing values: student["name"]
Adding or updating values: student["grade"] = "B"
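
Here is a slightly fuller sketch of everyday dictionary operations. The "email" and "enrolled" keys are invented for illustration.

```python
student = {"name": "Alice", "age": 25, "grade": "A"}

name = student["name"]                    # raises KeyError if the key is missing
email = student.get("email", "unknown")   # .get() returns a default instead

student["grade"] = "B"                    # update an existing key
student["enrolled"] = True                # add a new key
removed_age = student.pop("age")          # remove a key and return its value

# Iterating over key-value pairs
summary = [f"{key}={value}" for key, value in student.items()]

print(name, email, removed_age)
print(list(student))
print(summary)
```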

13. Strings: Ordered sequence of characters.

message = "Hello, Python!"

Common Operations:
String concatenation: greeting = "Hello, " + name + "!"
String methods for manipulation, such as upper(), replace(), split(), and strip().
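
A few of the string methods you are likely to reach for when cleaning data are sketched below. All of them return new strings, because strings are immutable.

```python
message = "Hello, Python!"

upper_message = message.upper()                # change case
replaced = message.replace("Python", "Data")   # substitute a substring
words = message.split(", ")                    # split into a list
stripped = "  padded  ".strip()                # trim surrounding whitespace
joined = "-".join(["2024", "01", "24"])        # join a list into one string
first_five = message[:5]                       # slicing works as with lists

name = "Alice"
greeting = f"Hello, {name}!"                   # f-strings format values inline

print(upper_message, replaced, words)
print(stripped, joined, first_five, greeting)
```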

14. Arrays (NumPy): Homogeneous, multidimensional arrays for numerical operations.

import numpy as np
matrix = np.array([[1, 2, 3], [4, 5, 6]])

15. Opening and Closing Files: To work with a file, you first open it with the built-in open() function, and you close it when you are done.

# Opening a file for reading (the file must already exist)
file_path = "example.txt"
file = open(file_path, "r")  # "r" stands for read mode
file.close()  # close the file when you are done with it

# Opening a file for writing (existing content is overwritten)
file_path = "output.txt"
file = open(file_path, "w")  # "w" stands for write mode
file.close()

# Opening a file for appending (new content goes at the end)
file_path = "log.txt"
file = open(file_path, "a")  # "a" stands for append mode
file.close()
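
In practice, a `with` block is a common way to make sure a file gets closed, even if an error occurs. Here is a minimal sketch; it works in a temporary directory (an assumption made so the example does not touch your real files).

```python
import os
import tempfile

# A temporary directory keeps this example away from real files
with tempfile.TemporaryDirectory() as temp_dir:
    file_path = os.path.join(temp_dir, "example.txt")

    # The file is closed automatically when the with block ends
    with open(file_path, "w", encoding="utf-8") as file:
        file.write("first line\n")
    closed_after_write = file.closed

    # Append mode adds to the end instead of overwriting
    with open(file_path, "a", encoding="utf-8") as file:
        file.write("second line\n")

    with open(file_path, "r", encoding="utf-8") as file:
        content = file.read()

print(closed_after_write)
print(content)
```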

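16. Reading from Files: There are different methods for reading from a file that is open in read mode. Each call continues from the current position in the file, so treat the three calls below as alternatives rather than steps to run in sequence.

# Read the entire file:
content = file.read()

# Read one line at a time:
line = file.readline()

# Read all remaining lines into a list:
lines = file.readlines()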
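
To show how the reading methods interact, the sketch below first writes a small sample file into a temporary directory (the file content is made up) and then reads it back in two ways.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    file_path = os.path.join(temp_dir, "example.txt")
    with open(file_path, "w", encoding="utf-8") as file:
        file.write("alpha\nbeta\ngamma\n")

    with open(file_path, "r", encoding="utf-8") as file:
        first_line = file.readline()         # reads up to and including "\n"
        remaining_lines = file.readlines()   # continues from the current position
        leftover = file.read()               # nothing is left, so this is empty

    # Iterating over the file object reads one line at a time,
    # which avoids loading a large file into memory at once
    cleaned = []
    with open(file_path, "r", encoding="utf-8") as file:
        for line in file:
            cleaned.append(line.rstrip("\n"))

print(repr(first_line), remaining_lines, repr(leftover))
print(cleaned)
```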

17. Writing to Files: To write to a file, open it in write ("w") or append ("a") mode.

# To write to a file, you can use the write() method.
file.write("Hello, this is a line of text.\n")

# To write multiple lines, pass a list of strings to writelines().
# It does not add line breaks, so each string ends with "\n".
lines = ["Line 1\n", "Line 2\n", "Line 3\n"]
file.writelines(lines)
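
Here is a runnable sketch of the writing methods. It writes to a file in a temporary directory (an assumption for the example) and reads the result back to check it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    file_path = os.path.join(temp_dir, "output.txt")

    with open(file_path, "w", encoding="utf-8") as file:
        # write() returns the number of characters written
        characters_written = file.write("Hello, this is a line of text.\n")
        file.writelines(["Line 1\n", "Line 2\n"])

    # Append mode keeps the existing content and adds to the end
    with open(file_path, "a", encoding="utf-8") as file:
        file.write("Line 3\n")

    with open(file_path, "r", encoding="utf-8") as file:
        written_lines = file.read().splitlines()

print(characters_written)
print(written_lines)
```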

18. Handling Exceptions
When working with files, it’s important to handle potential exceptions, such as when a file doesn’t exist or when there are issues with file permissions. Wrapping file operations in a try/except block is good practice.

try:
    with open("example.txt", "r") as file:
        content = file.read()
except FileNotFoundError:
    print("File not found.")
except PermissionError:
    print("Permission error.")

19. Working with Different File Formats
JSON: The ‘json’ module is useful for reading and writing JSON data.
CSV: Python’s ‘csv’ module is handy for working with CSV files.
XML: The ‘xml.etree.ElementTree’ module is commonly used for parsing XML files.

import json
import csv
import xml.etree.ElementTree as ET

# Example for JSON
with open("data.json", "r") as json_file:
    data = json.load(json_file)

# Example for CSV (the csv module expects files opened with newline="")
with open("data.csv", "r", newline="") as csv_file:
    csv_reader = csv.reader(csv_file)
    for row in csv_reader:
        print(row)

# Example for XML
tree = ET.parse("data.xml")
root = tree.getroot()
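
If you want to try these modules without creating data files first, here is a small sketch that works on in-memory text instead. The records and the XML snippet are made up for illustration, and io.StringIO stands in for a real file.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

records = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]

# JSON: convert to text and back
json_text = json.dumps(records)
from_json = json.loads(json_text)

# CSV: write rows from dictionaries, then read them back
csv_buffer = io.StringIO(newline="")
writer = csv.DictWriter(csv_buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)
csv_buffer.seek(0)                               # rewind before reading
from_csv = list(csv.DictReader(csv_buffer))      # every value comes back as a string

# XML: parse from a string and read attributes
xml_text = "<people><person name='Alice' age='25'/><person name='Bob' age='30'/></people>"
root = ET.fromstring(xml_text)
names = [person.get("name") for person in root.findall("person")]

print(from_json)
print(from_csv)
print(names)
```

Notice that CSV gives you strings for every field, so numeric columns such as `age` need converting after reading.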

Python’s versatility makes it a valuable tool for data engineers working on diverse projects and data processing tasks.

As you grow in your profession, there may be a need to learn more topics. Get as many resources as required and learn as you go!
