The Ultimate Collection: 125 Python Packages for Data Science, Machine Learning, and Beyond
All about Python packages
Introduction:
Python, one of the world’s most popular programming languages, boasts a vast ecosystem of modules and packages, with over 350,000 available to developers. This rich collection empowers Python developers to tackle a diverse range of tasks, from data analysis and machine learning to web development and automation. This article surveys the most important modules in areas such as data science, machine learning, web development, and more.
(Figure: the most downloaded PyPI packages.)
Contents:
Let's walk through the most important packages used in Python.
1. Calendar:
import calendar
cal = calendar.TextCalendar()
cal.prmonth(2023, 3)
     March 2023
Mo Tu We Th Fr Sa Su
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31
- General calendar-related functions.
- Display calendars and handle dates.
- Support for leap years, weekdays, and month ranges
2. Collections:
from collections import Counter
words = ["apple", "banana", "apple", "orange", "banana", "apple"]
word_count = Counter(words)
print(word_count)
Counter({'apple': 3, 'banana': 2, 'orange': 1})
- Specialized container datatypes.
- Counter, defaultdict, OrderedDict, namedtuple, and deque.
- More efficient and flexible alternatives to built-in types.
3. bisect:
import bisect
sorted_list = [1, 3, 4, 4, 6, 8]
position = bisect.bisect_left(sorted_list, 4)
print(position)
#2
- Array bisection algorithms for sorted sequences.
- Binary search, insertion, and more.
- Efficiently find and maintain sorted order in lists.
4. heapq:
import heapq
nums = [4, 7, 2, 5, 1, 3]
heapq.heapify(nums)
smallest = heapq.heappop(nums)
print(smallest)
#1
- Heap queue algorithms (priority queues).
- Maintain a sorted collection of items with efficient insertions and removals.
- Useful for scheduling, priority-based tasks, and more.
5. json:
import json
data = {"name": "John", "age": 30}
json_string = json.dumps(data)
decoded_data = json.loads(json_string)
print(json_string)
print(decoded_data)
#{"name": "John", "age": 30}
#{'name': 'John', 'age': 30}
- Encode and decode JSON data.
- Serialize and deserialize Python objects to JSON format.
- Store and exchange data in a lightweight, human-readable format.
6. configparser:
import configparser
config = configparser.ConfigParser()
config.read("example.ini")
name = config.get("section", "name")
age = config.getint("section", "age")
print(name, age)
- Configuration file parser
- Read and write data from INI files
- Manage application settings and user preferences
7. sched:
import sched
import time
def print_event(event_name):
    print(f"Event: {event_name}")
s = sched.scheduler(time.time, time.sleep)
s.enter(5, 1, print_event, ("Event 1",))
s.enter(10, 1, print_event, ("Event 2",))
s.run()
#Event: Event 1
#Event: Event 2
- General-purpose event scheduler
- Schedule and execute tasks at specific times or intervals
- Perform timed operations, such as periodic updates or reminders
8. random:
import random
random_float = random.random()
random_int = random.randint(1, 10)
random_choice = random.choice(["apple", "banana", "orange"])
print(random_float)
print(random_int)
print(random_choice)
#0.8998656473050128
#6
#orange
- Generate random numbers and make random choices
- Support for uniform, Gaussian, and other distributions
- Shuffle and sample from sequences
9. secrets:
import secrets
token = secrets.token_hex(16)
print(token)
#97f3108ff85aef4b0b00c3c2154ae873
- Generate cryptographically strong random numbers and tokens
- Suitable for passwords, account authentication, and security tokens
- A safer choice than the random module for security-sensitive code
10. difflib:
import difflib
text1 = "The quick brown fox jumps over the lazy dog"
text2 = "The quick red fox jumps over the lazy dog"
differ = difflib.Differ()
diff = list(differ.compare(text1.split(), text2.split()))
print("\n".join(diff))
The
quick
- brown
+ red
fox
jumps
over
the
lazy
dog
- Helpers for computing differences between sequences
- Create and display diffs for text, lists, and other data
- Identify changes, corrections, and updates in the data
11. timeit:
import timeit
def slow_function():
    return sum(range(100000))
execution_time = timeit.timeit(slow_function, number=100)
print(f"Execution time: {execution_time} seconds")
#Execution time: 0.2170043410001199 seconds
- Measure the execution time of small code snippets
- Test performance and optimize code
- Compare the speed of different solutions and implementations
12. pdb:
import pdb
def buggy_function(x):
    y = x * 2
    pdb.set_trace()
    return y + x
result = buggy_function(10)
print(result)
- Python debugger
- Set breakpoints, step through code, and inspect variables
- Debug and troubleshoot code interactively
13. xml.etree.ElementTree:
import xml.etree.ElementTree as ET
xml_string = """
<person>
<name>Alice</name>
<age>30</age>
<city>New York</city>
</person>
"""
root = ET.fromstring(xml_string)
name = root.find("name").text
age = int(root.find("age").text)
city = root.find("city").text
print(name, age, city)
#Alice 30 New York
- XML processing library
- Parse, manipulate, and generate XML documents
- Read and write XML data in a structured and hierarchical format
14. HTMLParser:
from html.parser import HTMLParser
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print(f"Start tag: {tag}")
    def handle_endtag(self, tag):
        print(f"End tag: {tag}")
    def handle_data(self, data):
        print(f"Data: {data.strip()}")
parser = MyHTMLParser()
parser.feed("<html><head><title>Example</title></head><body><p>Hello, world!</p></body></html>")
Start tag: html
Start tag: head
Start tag: title
Data: Example
End tag: title
End tag: head
Start tag: body
Start tag: p
Data: Hello, world!
End tag: p
End tag: body
End tag: html
- Basic HTML and XHTML parser
- Extract information and data from HTML documents
- Build custom web scrapers and data extraction tools
15. re:
import re
text = "The quick brown fox jumps over the lazy dog"
pattern = r"\b\w{5}\b"
five_letter_words = re.findall(pattern, text)
print(five_letter_words)
# ['quick', 'brown', 'jumps']
- Regular expression operations
- Search, match, and manipulate text based on patterns
- Validate and clean data, extract information, and perform advanced text manipulation
16. argparse:
import argparse
parser = argparse.ArgumentParser(description="Example command line program.")
parser.add_argument("--input", type=str, required=True, help="Input file")
parser.add_argument("--output", type=str, help="Output file")
args = parser.parse_args()
print(f"Input file: {args.input}")
print(f"Output file: {args.output}")
- Command-line option and argument parsing
- Create user-friendly command-line interfaces for your scripts and programs
- Handle options, arguments, and flags with ease and flexibility
17. logging:
import logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
logging.debug("This is a debug message")
logging.info("This is an info message")
logging.warning("This is a warning message")
logging.error("This is an error message")
logging.critical("This is a critical message")
2023-03-19 20:02:27,337 [INFO] This is an info message
2023-03-19 20:02:27,337 [WARNING] This is a warning message
2023-03-19 20:02:27,337 [ERROR] This is an error message
2023-03-19 20:02:27,337 [CRITICAL] This is a critical message
- Flexible event logging system
- Log messages with different severity levels to various outputs
- Debug, monitor, and analyze your applications and systems
18. decimal:
from decimal import Decimal
a = Decimal("0.1")
b = Decimal("0.2")
c = a + b
print(c)
#0.3
- Decimal fixed-point and floating-point arithmetic
- Perform precise and accurate calculations with decimal numbers
- Suitable for financial applications, scientific simulations, and other numerically sensitive tasks
19. fractions:
from fractions import Fraction
a = Fraction(1, 3)
b = Fraction(1, 6)
c = a + b
print(c)
#1/2
- Rational number arithmetic
- Perform calculations with fractions and exact rational numbers
- Useful for exact arithmetic and applications with a focus on accuracy and precision
20. sqlite3:
import sqlite3
conn = sqlite3.connect("example.db")
c = conn.cursor()
c.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
c.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("Alice", 30))
conn.commit()
for row in c.execute("SELECT * FROM users"):
    print(row)
conn.close()
# (1, 'Alice', 30)
- Manage and interact with SQLite databases directly from your Python code
- Store, query, and manipulate data in a lightweight, serverless, and self-contained format
21. requests:
import requests
response = requests.get("https://api.example.com/data")
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Error: {response.status_code}")
- HTTP library for making requests
- Interact with RESTful APIs, download files, and scrape web content
- Simplify HTTP requests with a user-friendly and feature-rich API
22. flask:
from flask import Flask, jsonify
app = Flask(__name__)
@app.route("/")
def hello():
    return "Hello, World!"
@app.route("/api/data")
def api_data():
    data = {"name": "Alice", "age": 30}
    return jsonify(data)
if __name__ == "__main__":
    app.run()
* Serving Flask app '__main__'
* Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit
- Lightweight web application framework
- Build web applications, APIs, and microservices quickly and easily
- Offers a flexible and extensible architecture for your web projects
23. pytest:
# Install pytest using pip:
# pip install pytest
# Create a test file named "test_example.py" with the following content:
def add(a, b):
    return a + b
def test_add():
    assert add(1, 2) == 3
    assert add(-1, 1) == 0
    assert add(0, 0) == 0
# To run the tests, use the `pytest` command-line tool:
# pytest test_example.py
- Testing framework for Python applications
- Write and organize tests for your code, libraries, and projects
- Discover, execute, and report on tests with ease and flexibility
24. scipy:
import numpy as np
from scipy.optimize import minimize
def objective_function(x):
    return x[0] ** 2 + x[1] ** 2
initial_guess = np.array([1, 1])
result = minimize(objective_function, initial_guess)
print(result.x)
[-1.07505143e-08 -1.07505143e-08]
- Scientific computing library
- Provides a wide range of algorithms and tools for optimization, integration, interpolation, and more
- Builds on the NumPy library to offer advanced functionality for scientific applications and research
25. os:
import os
print(os.listdir("."))
#['.config', 'example.db', 'sample_data']
- Interact with the operating system
- File and directory management
- Environment variables and process information
26. glob:
import glob
print(glob.glob("*.txt"))
- Find all pathnames matching a specified pattern
- Wildcard pattern matching
- The simple way to list files in a directory
27. itertools:
import itertools
for combo in itertools.combinations("ABC", 2):
    print(combo)
('A', 'B')
('A', 'C')
('B', 'C')
- Iterator building blocks
- Combinatoric generators, such as permutations and combinations
- Memory-efficient looping
28. time:
import time
start_time = time.time()
time.sleep(2)
end_time = time.time()
print(f"Elapsed time: {end_time - start_time} seconds")
Elapsed time: 2.002392292022705 seconds
- Time access and conversions
- Measure performance and delays
- Handle time zones and daylight saving time
29. datetime:
from datetime import datetime, timedelta
now = datetime.now()
one_week_from_now = now + timedelta(weeks=1)
print(now)
print(one_week_from_now)
2023-03-19 20:02:27.337600
2023-03-26 20:02:27.337600
- Manipulate dates and times
- Perform arithmetic with dates
- Format and parse dates and times
30. hashlib:
import hashlib
message = "Hello, world!"
hashed = hashlib.sha256(message.encode("utf-8")).hexdigest()
print(hashed)
# 315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3
- Cryptographic hashing and message digest algorithms
- Support for SHA, MD5, and other hash functions
- Create secure hashes and check message integrity
31. urllib:
from urllib.request import urlopen
from urllib.parse import urlparse
url = "https://www.medium.com"
response = urlopen(url)
parsed_url = urlparse(url)
print(response.read())
print(parsed_url)
- URL handling modules
- Open URLs, encode and decode data, and parse URLs
- Work with HTTP, HTTPS, and FTP protocols
32. flake8:
# example.py
def add(a, b):
    return a + b
print(add(1, 2))
- Run flake8 example.py to check for style and quality issues
- Enforce PEP 8 code style
- Catch errors and improve code readability
33. pathlib:
from pathlib import Path
current_dir = Path(".")
for file in current_dir.iterdir():
    print(file)
#.config
#example.db
#sample_data
- Object-oriented filesystem paths
- Simplifies file and directory operations
- Works on Windows, macOS, and Linux
34. smtplib:
import smtplib
from_email = "you@example.com"
to_email = "recipient@example.com"
message = f"Subject: Hello\n\nHello, {to_email}!"
with smtplib.SMTP("smtp.example.com", 587) as server:
    server.starttls()
    server.login(from_email, "your-password")
    server.sendmail(from_email, to_email, message)
- Send emails using the Simple Mail Transfer Protocol (SMTP)
- Authenticate with email servers
- Send plain-text and HTML emails
35. email:
from email.message import EmailMessage
import smtplib
msg = EmailMessage()
msg.set_content("Hello, world!")
msg["Subject"] = "Greetings"
msg["From"] = "you@example.com"
msg["To"] = "recipient@example.com"
with smtplib.SMTP("smtp.example.com", 587) as server:
    server.starttls()
    server.login("you@example.com", "your-password")
    server.send_message(msg)
- Create, manipulate, and parse email messages
- Complements the smtplib and imaplib modules
35. yaml (PyYAML):
import yaml
data = {"name": "John", "age": 30}
yaml_data = yaml.dump(data)
print(yaml_data)
age: 30
name: John
- Read and write YAML (YAML Ain’t Markup Language) files
- Human-readable data serialization format
- Supports complex data structures and custom data types
36. platform:
import platform
print(platform.system())
print(platform.python_version())
Linux
3.9.16
- Access system and platform information
- Retrieve the OS version, hardware details, and Python version
- Write cross-platform code
37. math:
import math
print(math.sqrt(9))
print(math.sin(math.pi / 6))
3.0
0.49999999999999994
- Basic mathematical functions and constants
- Trigonometry, logarithms, exponentiation, and more
- Floating-point arithmetic and rounding functions
38. statistics:
import statistics
data = [1, 2, 3, 4, 5, 6]
print(statistics.mean(data))
print(statistics.median(data))
print(statistics.stdev(data))
3.5
3.5
1.8708286933869707
- Basic statistical functions
- Calculate the mean, median, mode, variance, and standard deviation
- Works with ints, floats, Decimals, and Fractions
39. queue:
import queue
q = queue.Queue()
for i in range(5):
    q.put(i)
while not q.empty():
    print(q.get())
0
1
2
3
4
- FIFO (first-in, first-out), LIFO (last-in, first-out), and priority queues
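The module also provides LIFO and priority variants; a minimal sketch with illustrative values:
lifo = queue.LifoQueue()
for i in range(3):
    lifo.put(i)
print(lifo.get())  # 2, last in, first out
pq = queue.PriorityQueue()
pq.put((2, "low"))
pq.put((1, "high"))
print(pq.get())  # (1, 'high'), the lowest priority value comes out first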
40. tempfile:
import tempfile
with tempfile.NamedTemporaryFile(mode="w+t") as temp_file:
    temp_file.write("Hello, world!")
    temp_file.seek(0)
    print(temp_file.read())
Hello, world!
- Create temporary files and directories
- Automatically clean up resources after use
- Named and unnamed, file-like objects and file system entries
41. uuid:
import uuid
random_uuid = uuid.uuid4()
print(random_uuid)
4cef1f15-10ee-4dbb-84ec-c6527925258b
- Universally unique identifiers (UUIDs)
- Generate and manipulate 128-bit identifiers
- Useful for generating unique keys, identifiers, or tokens
42. zipfile:
import zipfile
with zipfile.ZipFile("archive.zip", "w") as zf:
    zf.write("file.txt")
- Use the ‘zipfile’ module to create and extract ZIP archives.
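Reading the archive back works the same way; a minimal sketch, assuming archive.zip exists:
with zipfile.ZipFile("archive.zip") as zf:
    print(zf.namelist())        # list the archive members
    zf.extractall("extracted")  # extract everything into a directory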
43. csv:
import csv
with open("data.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["Name", "Age", "City"])
    writer.writerow(["Alice", 30, "New York"])
Use the ‘csv’ module to read and write CSV files.
44. copy:
import copy
original_list = [[1, 2], [3, 4]]
shallow_copy = copy.copy(original_list)
deep_copy = copy.deepcopy(original_list)
Use the ‘copy’ module to create shallow and deep copies of lists or other mutable objects.
45. atexit:
import atexit
def goodbye():
    print("Goodbye!")
atexit.register(goodbye)
Use the ‘atexit’ module to register functions to be called when the program exits.
46. pickle:
import pickle
data = {"a": 1, "b": 2}
serialized = pickle.dumps(data)
deserialized = pickle.loads(serialized)
- Use the ‘pickle’ module to serialize and deserialize Python objects, which can be useful for storing data or communication between processes.
47. pprint:
import pprint
data = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}
pprint.pprint(data, indent=4)
{'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]}
Use the ‘pprint’ module to pretty-print complex data structures for better readability.
48. fileinput:
import fileinput
with fileinput.input(files=('file1.txt', 'file2.txt')) as f:
    for line in f:
        print(f.filename(), f.lineno(), line, end='')
Use the ‘fileinput’ module to read multiple files line by line, treating them as a single input stream.
49. doctest:
def add(a, b):
    """
    >>> add(1, 2)
    3
    """
    return a + b
if __name__ == "__main__":
    import doctest
    doctest.testmod()
Use the ‘doctest’ module to test your code by writing examples in your function’s docstring, allowing you to keep documentation and tests close to the code.
50. inspect:
import inspect
def my_function():
    pass
print(inspect.getsource(my_function))
def my_function():
    pass
Use the ‘inspect’ module to retrieve information about live objects, such as their source code, documentation, or call stack.
51. locale:
import locale
locale.setlocale(locale.LC_ALL, '')
formatted_number = locale.format_string("%d", 1234567, grouping=True)
print(formatted_number)  # e.g. 1,234,567 in an en_US locale
Use the ‘locale’ module to work with locale-specific formatting of numbers, dates, and currency.
52. traceback:
import traceback
try:
    1 / 0
except ZeroDivisionError:
    traceback.print_exc()
Traceback (most recent call last):
File "<ipython-input-43-507d9690673b>", line 4, in <module>
1 / 0
ZeroDivisionError: division by zero
Use the ‘traceback’ module to print and format exception tracebacks, which can be useful for debugging and logging.
53. zlib:
import zlib
data = b"example data" * 100
compressed = zlib.compress(data)
decompressed = zlib.decompress(compressed)
Use the ‘zlib’ module to compress and decompress data using the DEFLATE algorithm, which can be useful for reducing storage space or network bandwidth usage.
54. fnmatch:
import fnmatch
filenames = ["file1.txt", "file2.pdf", "file3.txt"]
txt_files = [name for name in filenames if fnmatch.fnmatch(name, "*.txt")]
Use the ‘fnmatch’ module to filter filenames or other strings using shell-style wildcards, which can be helpful when working with file lists.
55. sys:
import sys
python_version = sys.version
script_name = sys.argv[0]
Use the ‘sys’ module to access system-specific parameters and functions, such as command-line arguments, Python version, or the current interpreter’s path.
56. shutil:
import shutil
shutil.copy("source.txt", "destination.txt")
shutil.move("old_location.txt", "new_location.txt")
Use the ‘shutil’ module to perform high-level file operations, such as copying or moving files and directories.
57. typing:
from typing import List, Tuple
def my_function(numbers: List[int]) -> Tuple[int, int]:
    return min(numbers), max(numbers)
Use the ‘typing’ module to annotate your functions and classes with type hints, which can improve code readability and facilitate static type checking with tools like Mypy.
58. pkgutil:
import pkgutil
for importer, module_name, _ in pkgutil.iter_modules():
    print(module_name)
Use the ‘pkgutil’ module to work with Python packages, such as listing all installed modules or iterating through a package’s modules.
59. array:
import array
arr = array.array("i", [1, 2, 3, 4, 5])
Use the ‘array’ module to create and manipulate arrays of fixed-size numeric types, which can be more memory-efficient and faster than using lists for large amounts of numerical data.
60. shelve:
import shelve
with shelve.open("my_shelf") as db:
    db["data"] = {"key": "value"}
with shelve.open("my_shelf") as db:
    print(db["data"])
Use the ‘shelve’ module to create and work with persistent dictionaries, which store key-value pairs on disk and can be used as a simple database for small-scale applications.
61. numpy:
import numpy as np
array = np.array([1, 2, 3, 4, 5])
mean = np.mean(array)
NumPy is a powerful library for numerical computing in Python, providing support for arrays, matrices, and various mathematical operations.
62. pandas:
import pandas as pd
data = {"A": [1, 2, 3], "B": [4, 5, 6]}
df = pd.DataFrame(data)
pandas is a popular library for data manipulation and analysis, providing data structures like DataFrame and Series for handling tabular and time-series data.
63. matplotlib:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
plt.plot(x, y)
plt.show()
matplotlib is a plotting library for creating static, interactive, and animated visualizations in Python, such as line, scatter, and bar plots.
64. seaborn:
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
sns.boxplot(x="day", y="total_bill", data=tips)
plt.show()
seaborn is a statistical data visualization library built on top of matplotlib, providing a high-level interface for creating informative and attractive statistical graphics.
65. scikit-learn:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression().fit(X_train, y_train)
scikit-learn is a popular library for machine learning in Python, providing tools for classification, regression, clustering, and various other learning tasks.
66. statsmodels:
import statsmodels.api as sm
import numpy as np
X = np.random.rand(100)
y = 2 * X + 0.5 * np.random.randn(100)
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
statsmodels is a library for estimating statistical models and performing statistical tests, offering a wide range of statistical models such as linear regression, logistic regression, and time series analysis.
67. plotly:
import plotly.express as px
data = px.data.iris()
fig = px.scatter(data, x="sepal_width", y="sepal_length", color="species")
fig.show()
Plotly is a library for creating interactive, web-based visualizations using Python, such as line charts, bar charts and scatter plots, with support for advanced features like animations and 3D plots.
68. bokeh:
from bokeh.plotting import figure, output_file, show
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
output_file("line.html")
p = figure(title="Line plot example", x_axis_label="x", y_axis_label="y")
p.line(x, y, legend_label="y=2x", line_width=2)
show(p)
Bokeh is a library for creating interactive, web-based visualizations in Python, providing a flexible and high-level interface for creating complex, feature-rich plots.
69. folium:
import folium
m = folium.Map(location=[45.523, -122.675], zoom_start=13)
folium.Marker([45.524, -122.674], popup="Portland, Oregon").add_to(m)
m.save("map.html")
folium is a library for creating interactive maps using Python and the popular JavaScript library Leaflet, allowing you to easily visualize spatial data and add various map layers and markers.
70. geopandas:
import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world.plot()
geopandas is a library for working with geospatial data in Python, extending the functionality of pandas by adding support for geospatial data types and operations.
71. wordcloud:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
text = "Python is a great language for data analysis and visualization"
wc = WordCloud(background_color="white").generate(text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
wordcloud is a library for creating word clouds, which are a popular way to visualize the frequency of words in a text corpus.
72. pydot:
import pydot
graph = pydot.Dot(graph_type="digraph")
node_a = pydot.Node("A")
node_b = pydot.Node("B")
edge = pydot.Edge(node_a, node_b)
graph.add_edge(edge)
graph.write_png("graph.png")
pydot is a library for creating and manipulating graph descriptions in the DOT language, which is a popular language for representing directed and undirected graphs.
73. tensorflow:
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax")
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)
TensorFlow is an open-source library for machine learning and artificial intelligence, providing a flexible platform for defining and running computational graphs, with support for deep learning models.
74. pytorch:
import torch
import torch.nn as nn
import torch.optim as optim
X = torch.randn(100, 20)
y = torch.randint(0, 2, (100,))
model = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 2))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
for epoch in range(10):
    optimizer.zero_grad()
    y_pred = model(X)
    loss = criterion(y_pred, y)
    loss.backward()
    optimizer.step()
PyTorch is an open-source machine learning library based on the Torch library, providing tensor computation and deep learning capabilities with strong GPU acceleration support.
75. xgboost:
import numpy as np
import xgboost as xgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
param = {"objective": "reg:squarederror", "eval_metric": "rmse"}
bst = xgb.train(param, dtrain, num_boost_round=100, evals=[(dtest, "test")])
y_pred = bst.predict(dtest)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
XGBoost is a scalable and high-performance gradient boosting library for tree-based models, providing a flexible and efficient solution for supervised learning tasks.
76. lightgbm:
import numpy as np
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
dtrain = lgb.Dataset(X_train, label=y_train)
dtest = lgb.Dataset(X_test, label=y_test)
param = {"objective": "binary", "metric": "binary_logloss"}
bst = lgb.train(param, dtrain, num_boost_round=100, valid_sets=[dtest])
y_pred = np.round(bst.predict(X_test))
accuracy = accuracy_score(y_test, y_pred)
LightGBM is a gradient boosting framework that uses tree-based learning algorithms, designed to be efficient and scalable for large datasets and high-performance tasks.
77. catboost:
from catboost import CatBoostClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=2, verbose=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
CatBoost is a high-performance gradient boosting library, designed specifically for categorical feature handling and improving performance on datasets with categorical features.
78. Beautiful Soup:
from bs4 import BeautifulSoup
import requests
url = "https://www.medium.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
headings = soup.find_all("h1")
for heading in headings:
    print(heading.text)
BeautifulSoup is a library for parsing HTML and XML documents, providing an easy-to-use interface for navigating, searching, and modifying the parse tree.
79. Scrapy:
import scrapy
from scrapy.crawler import CrawlerProcess
class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://www.example.com"]
    def parse(self, response):
        headings = response.css("h1::text").getall()
        print(headings)
process = CrawlerProcess()
process.crawl(ExampleSpider)
process.start()
Scrapy is an open-source web crawling framework for extracting structured data from websites, with built-in support for selectors, item pipelines, and concurrent crawling.
80. nltk:
import nltk
nltk.download("punkt")
text = "Natural Language Processing is an interesting field of study."
tokens = nltk.word_tokenize(text)
print(tokens)
nltk (Natural Language Toolkit) is a library for working with human language data (text), providing tools for text processing, classification, tokenization, stemming, and more.
81. spaCy:
import spacy
nlp = spacy.load("en_core_web_sm")
text = "Apple Inc. is a technology company based in Cupertino, California."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
spaCy is an open-source library for advanced Natural Language Processing tasks, providing support for part-of-speech tagging, named entity recognition, and various other NLP tasks.
82. gensim:
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess
sentences = ["Machine learning is a subset of artificial intelligence.",
             "Deep learning is a subfield of machine learning."]
tokenized_sentences = [simple_preprocess(s) for s in sentences]
model = Word2Vec(tokenized_sentences, min_count=1)
print(model.wv["machine"])
gensim is a library for unsupervised topic modeling and natural language processing, providing implementations for popular algorithms like Word2Vec, FastText, and LDA.
83. pymongo:
import pymongo
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["example_db"]
collection = db["example_collection"]
data = {"name": "John Doe", "age": 30, "city": "New York"}
result = collection.insert_one(data)
print(result.inserted_id)
pymongo is a Python driver for MongoDB, allowing you to work with MongoDB databases using Python-like syntax and data structures.
84. openpyxl:
import openpyxl
workbook = openpyxl.Workbook()
sheet = workbook.active
sheet["A1"] = "Hello"
sheet["B1"] = "World"
workbook.save("example.xlsx")
openpyxl is a library for reading and writing Excel files (xlsx), allowing you to work with Excel spreadsheets using Python.
85. xlrd:
import xlrd
workbook = xlrd.open_workbook("example.xls")
sheet = workbook.sheet_by_index(0)
for row in range(sheet.nrows):
    print(sheet.row_values(row))
xlrd is a library for reading data and formatting information from legacy Excel files (xls; versions before 2.0 also handled xlsx), allowing you to extract data from Excel spreadsheets using Python.
86. xlwt:
import xlwt
workbook = xlwt.Workbook()
sheet = workbook.add_sheet("Sheet1")
sheet.write(0, 0, "Hello")
sheet.write(0, 1, "World")
workbook.save("example.xls")
xlwt is a library for writing data and formatting information to Excel files (xls), allowing you to create Excel spreadsheets using Python.
87. PyPDF2:
import PyPDF2
with open("example.pdf", "rb") as file:
    reader = PyPDF2.PdfReader(file)  # PyPDF2 3.x API
    print(f"Number of pages: {len(reader.pages)}")
    page = reader.pages[0]
    print(page.extract_text())
PyPDF2 is a library for working with PDF files, allowing you to extract text, metadata, and other information from PDF documents using Python.
88. pdfminer:
from pdfminer.high_level import extract_text
text = extract_text("example.pdf")
print(text)
pdfminer is a library for extracting text, metadata, and other information from PDF files, providing a more advanced and customizable interface for working with PDF documents using Python.
89. pytesseract:
from PIL import Image
import pytesseract
image = Image.open("example.png")
text = pytesseract.image_to_string(image)
print(text)
pytesseract is an OCR (Optical Character Recognition) library for Python, allowing you to extract text from images using the Tesseract OCR engine.
90. Pillow:
from PIL import Image
image = Image.open("example.jpg")
image.thumbnail((100, 100))
image.save("thumbnail.jpg")
Pillow is a fork of the Python Imaging Library (PIL) that provides extensive file format support, an efficient internal representation, and powerful image processing capabilities for Python.
91. holoviews:
import holoviews as hv
hv.extension("bokeh", logo=False)
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]
curve = hv.Curve((x, y), "x", "y")
curve.opts(width=400, height=300, line_color="red")
HoloViews is a high-level visualization library for creating interactive plots with concise expressions, allowing you to create complex and flexible visualizations without writing large amounts of code.
92. dask:
import dask.array as da
x = da.random.random((10000, 10000), chunks=(1000, 1000))
y = x + x.T
z = y[::2, 5000:].mean(axis=1)
result = z.compute()
print(result)
Dask is a parallel computing library for Python that allows you to parallelize operations on large data structures like arrays, dataframes, and lists, providing an alternative to NumPy, pandas, and other libraries for handling large-scale data.
93. pyarrow:
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
data = pa.Table.from_pandas(pd.DataFrame({"A": range(5), "B": range(5, 10)}))
pq.write_table(data, "example.parquet")
PyArrow is a cross-language development platform for in-memory data, providing tools for working with Apache Arrow, a standardized columnar memory format for high-performance analytics.
94. sympy:
from sympy import symbols, Eq, solve
x, y = symbols("x y")
eq1 = Eq(3 * x + 4 * y, 12)
eq2 = Eq(x - y, 2)
solutions = solve((eq1, eq2), (x, y))
print(solutions)
SymPy is a Python library for symbolic mathematics, allowing you to perform algebraic manipulations, calculus, linear algebra, and more using symbolic expressions.
95. redis:
import redis
r = redis.Redis(host="localhost", port=6379, db=0)
r.set("name", "John Doe")
print(r.get("name"))
redis-py is a Python client for Redis, a high-performance in-memory data store, providing an easy-to-use interface for working with Redis data structures like strings, hashes, lists, sets, and sorted sets.
96. lxml:
from lxml import etree
xml_data = "<root><element>text</element></root>"
root = etree.fromstring(xml_data)
element = root.find("element")
print(element.text)
lxml is a library for processing XML and HTML in Python, providing a fast and easy-to-use interface for parsing, validating, and manipulating XML and HTML documents.
97. OpenCV:
import cv2
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
resized_img = cv2.resize(img, (100, 100))
cv2.imwrite('resized_image.jpg', resized_img)
OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library, providing a wide range of functionalities for image and video processing, including object detection, feature extraction, and image transformation.
98. IMDbPY:
from imdb import IMDb
ia = IMDb()
movie = ia.get_movie('0133093') # The Matrix (1999)
print(movie.summary())
IMDbPY is a Python package for accessing the IMDb’s movie database, providing an easy-to-use interface for retrieving information about movies, people, characters, and companies.
99. Hugging Face Transformers:
from transformers import pipeline
summarizer = pipeline("summarization")
text = "Hugging Face is a company based in New York and Paris that provides state-of-the-art natural language processing models."
summary = summarizer(text, max_length=25, min_length=5, do_sample=False)
print(summary[0]["summary_text"])
Hugging Face Transformers is a library for working with state-of-the-art natural language processing models, such as BERT, GPT, and RoBERTa, providing an easy-to-use interface for tasks like text classification, summarization, and generation.
100. joblib:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from joblib import dump, load
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.3)
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
dump(model, "random_forest_digits.joblib")
loaded_model = load("random_forest_digits.joblib")
print("Loaded model score:", loaded_model.score(X_test, y_test))
joblib is a library for saving and loading trained models, as well as parallelizing tasks for faster computation.
101. SQLAlchemy:
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
Base = declarative_base()
engine = create_engine('sqlite:///mydb.sqlite3')
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
- SQL toolkit and Object-Relational Mapper (ORM)
- Simplifies database operations
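Continuing the snippet above, a minimal sketch of persisting and querying a row through the ORM:
user = User(name="Alice")
session.add(user)
session.commit()
for user in session.query(User).all():
    print(user.id, user.name)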
102. MLFlow:
import mlflow
mlflow.start_run()
mlflow.log_param("learning_rate", 0.01)  # log an illustrative parameter
mlflow.log_metric("accuracy", 0.95)      # log an illustrative metric
mlflow.end_run()
- The platform for managing the ML lifecycle
- Provides tools for tracking experiments, packaging code, and sharing results
103. TensorFlow Extended (TFX):
- End-to-end platform for deploying production ML pipelines
- Code snippet: import tensorflow_data_validation as tfdv
- Offers a set of libraries for data validation, transformation, and serving
- Integrates with TensorFlow for machine learning model training
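A minimal data-validation sketch with the tensorflow_data_validation library named above, assuming a local data.csv file:
import tensorflow_data_validation as tfdv
stats = tfdv.generate_statistics_from_csv(data_location="data.csv")
schema = tfdv.infer_schema(stats)                    # infer a schema from the statistics
anomalies = tfdv.validate_statistics(stats, schema)  # check the data against the schema
tfdv.display_anomalies(anomalies)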
104. Prefect:
- Workflow management system for building, scheduling, and monitoring data pipelines
- Code snippet: from prefect import task, Flow
- Provides a Pythonic way to define tasks and their dependencies
- Supports distributed execution and offers a UI for monitoring
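A minimal sketch using the Prefect 1.x API shown above (Prefect 2.x replaces Flow with a @flow decorator):
from prefect import task, Flow
@task
def extract():
    return [1, 2, 3]
@task
def transform(data):
    return [x * 2 for x in data]
with Flow("etl") as flow:  # wire the tasks into a flow
    result = transform(extract())
flow.run()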
105. Kedro:
- Open-source framework for creating reproducible, maintainable, and modular data science code
- Code snippet: from kedro.pipeline import Pipeline
- Facilitates organizing code into pipelines and nodes
- Integrates with various data processing and ML libraries
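A minimal pipeline sketch; the "raw_data" and "clean_data" names are hypothetical data catalog entries:
from kedro.pipeline import Pipeline, node
def preprocess(raw):
    return [r for r in raw if r is not None]  # drop missing records (illustrative)
pipeline = Pipeline([
    node(preprocess, inputs="raw_data", outputs="clean_data"),
])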
106. Apache Airflow:
- Workflow management platform for scheduling and monitoring data pipelines
- Code snippet: from airflow import DAG
- Allows creating, scheduling, and monitoring workflows using Directed Acyclic Graphs (DAGs)
- Supports a wide variety of operators for integrating various services
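A minimal Airflow 2.x-style sketch of a daily DAG with a single Python task; the DAG and task names are illustrative:
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
def greet():
    print("Hello from Airflow")
with DAG("example_dag", start_date=datetime(2023, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    greet_task = PythonOperator(task_id="greet", python_callable=greet)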
107. PyTorch Lightning:
- Lightweight PyTorch wrapper for high-performance AI research
- Code snippet: import pytorch_lightning as pl
- Simplifies training, evaluation, and deployment of PyTorch models
- Offers built-in support for distributed training and mixed-precision
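A minimal LightningModule sketch; the model and hyperparameters are illustrative, and train_dataloader in the final comment stands for a user-supplied DataLoader:
import pytorch_lightning as pl
import torch
from torch import nn
class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(20, 2)
    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x), y)
    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)
# train with: pl.Trainer(max_epochs=5).fit(LitClassifier(), train_dataloader)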
108. Optuna:
- Automatic hyperparameter optimization framework
- Code snippet: import optuna
- Supports various optimization algorithms
- Offers easy integration with popular ML libraries
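A minimal sketch minimizing a toy objective:
import optuna
def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2  # minimum at x = 2
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)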
109. Ray:
- Distributed computing framework for parallel and distributed Python applications
- Code snippet: import ray
- Enables scaling and parallelizing Python applications easily
- Provides a simple API for implementing distributed algorithms
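A minimal sketch of parallelizing a function with remote tasks:
import ray
ray.init()
@ray.remote
def square(x):
    return x * x
futures = [square.remote(i) for i in range(4)]  # the calls run in parallel across workers
print(ray.get(futures))  # [0, 1, 4, 9]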
110. ONNX:
- Open Neural Network Exchange, a format for interchangeable AI models
- Code snippet: import onnx
- Allows models to be trained in one framework and run in another
- Supported by a wide range of frameworks, runtimes, and hardware platforms
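A minimal sketch of loading and validating a model, assuming an existing model.onnx file exported from some framework:
import onnx
model = onnx.load("model.onnx")
onnx.checker.check_model(model)  # validate the model structure
print(onnx.helper.printable_graph(model.graph))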
111. Scikit-image: Image processing library:
from skimage import io, filters
image = io.imread('image.png')
edges = filters.sobel(image)
112. Celery: A distributed task queue for Python:
from celery import Celery
app = Celery('tasks', broker='pyamqp://guest@localhost//')
@app.task
def add(x, y):
    return x + y
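Assuming a broker and a worker are running, a minimal sketch of calling the task above asynchronously:
result = add.delay(4, 4)       # enqueue the task
print(result.get(timeout=10))  # 8, once a worker has processed it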
113. TensorBoard: A visualization toolkit for TensorFlow (also usable from PyTorch, as here):
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
writer.add_scalar("Training loss", 0.25, global_step=1)  # log an illustrative scalar
writer.close()
114. Boto3: The Amazon Web Services (AWS) SDK for Python:
import boto3
s3 = boto3.resource("s3")
for bucket in s3.buckets.all():
    print(bucket.name)
115. Click: A library for creating beautiful command-line interfaces:
import click
@click.command()
@click.option('--count', default=1, help='Number of greetings.')
def hello(count):
    for _ in range(count):
        click.echo('Hello, World!')
if __name__ == '__main__':
    hello()
116. Keras-tuner:
A library for hyperparameter tuning of Keras models:
from kerastuner.tuners import RandomSearch
tuner = RandomSearch(
    build_model,  # a user-defined function that returns a compiled Keras model
    objective='val_loss',
    max_trials=5,
    executions_per_trial=3,
    directory='my_dir',
    project_name='helloworld')
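A minimal usage sketch; x_train, y_train, x_val, and y_val stand for user-supplied training and validation arrays:
tuner.search(x_train, y_train, epochs=5, validation_data=(x_val, y_val))
best_model = tuner.get_best_models(num_models=1)[0]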
117. Imbalanced-learn:
A Python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance.
from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X, y)  # X, y: an existing imbalanced dataset
118. Tqdm:
A fast, extensible progress bar for loops and other iterable objects.
from tqdm import tqdm
import time
for i in tqdm(range(10)):
    time.sleep(0.1)
119. Tabulate:
A library for pretty-printing tabular data.
from tabulate import tabulate
table = [["Alice", 24], ["Bob", 19]]
headers = ["Name", "Age"]
print(tabulate(table, headers=headers))
120. Pendulum:
A library to work with dates and times more easily.
import pendulum
now = pendulum.now()
print(now.to_date_string())
121. PuLP:
A linear programming library in Python.
from pulp import LpProblem, LpVariable, LpMaximize
prob = LpProblem("My Problem", LpMaximize)
x = LpVariable("x", 0, 4)
y = LpVariable("y", -1, 1)
prob += 2 * x + 3 * y  # illustrative objective: maximize 2x + 3y
prob += x + y <= 4     # an illustrative constraint
prob.solve()
print(x.value(), y.value())
122. Graphviz:
A library for creating, manipulating, and rendering graphs.
from graphviz import Digraph
g = Digraph('G', filename='hello.gv')
g.edge('Hello', 'World')
g.view()
123. PySpark:
The Python library for Spark, an open-source, distributed computing system that provides a high-level API for distributed data processing.
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").appName("Word Count").getOrCreate()
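A minimal sketch of creating and filtering a DataFrame with the session above (illustrative data):
df = spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"])
df.filter(df.age > 40).show()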
124. Elasticsearch-py:
The official Elasticsearch client library for Python.
from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
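A minimal indexing-and-search sketch using the 7.x-style client created above; the "articles" index name is illustrative:
doc = {"title": "Hello", "body": "An example document"}
es.index(index="articles", id=1, body=doc)
es.indices.refresh(index="articles")  # make the document searchable immediately
result = es.search(index="articles", body={"query": {"match": {"body": "example"}}})
print(result["hits"]["hits"][0]["_source"]["title"])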
125. SHAP:
A library to explain the output of any machine learning model using Shapley values.
import shap
explainer = shap.Explainer(model)
shap_values = explainer(X_test)
Conclusion:
Python’s vast array of modules and packages showcases its adaptability and versatility across a wide range of applications, including data analysis, machine learning, and web development. With over 350,000 packages at their disposal, developers can leverage Python’s extensive ecosystem to efficiently tackle complex problems and accelerate their projects. This article has provided an overview of some of the most powerful and widely used Python modules, demonstrating their potential in various domains.
References:
Official project sites and documentation for some of the modules mentioned in the article:
- Matplotlib: https://matplotlib.org/
- Seaborn: https://seaborn.pydata.org/
- Plotly: https://plotly.com/python/
- NumPy: https://numpy.org/
- Pandas: https://pandas.pydata.org/
- SciPy: https://www.scipy.org/
- Scikit-learn: https://scikit-learn.org/
- TensorFlow: https://www.tensorflow.org/
- Keras: https://keras.io/
- PyTorch: https://pytorch.org/
- NLTK: https://www.nltk.org/
- SpaCy: https://spacy.io/
- Gensim: https://radimrehurek.com/gensim/
- OpenCV: https://opencv.org/
- Requests: https://docs.python-requests.org/
- Beautiful Soup: https://www.crummy.com/software/BeautifulSoup/
- Scrapy: https://scrapy.org/
- Flask: https://flask.palletsprojects.com/
- Django: https://www.djangoproject.com/
- SQLAlchemy: https://www.sqlalchemy.org/
- https://pypistats.org/top
- https://pypi.org/
- https://hugovk.github.io/top-pypi-packages/
- https://pypistats.org/
- Python Standard Library: https://docs.python.org/3/library/