
The Ultimate Collection: 125 Python Packages for Data Science, Machine Learning, and Beyond

All about Python packages

Senthil E · Published in Analytics Vidhya
23 min read · Mar 22, 2023

Introduction:

Python, one of the world’s most popular programming languages, boasts a vast ecosystem of modules and packages, with over 350,000 available to developers. This rich collection of resources empowers Python developers to tackle a diverse range of tasks, from data analysis and machine learning to web development and automation. This article covers the most important modules in areas such as data science, machine learning, web development, and more.

Figure: the most downloaded PyPI packages.

Contents:

Let's look at the most important packages used in Python.

1. calendar:

import calendar

cal = calendar.TextCalendar()
cal.prmonth(2023, 3)

     March 2023
Mo Tu We Th Fr Sa Su
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31
  • General calendar-related functions.
  • Display calendars and handle dates.
  • Support for leap years, weekdays, and month ranges

2. collections:

from collections import Counter

words = ["apple", "banana", "apple", "orange", "banana", "apple"]
word_count = Counter(words)

print(word_count)

Counter({'apple': 3, 'banana': 2, 'orange': 1})
  • Specialized container datatypes.
  • Counter, defaultdict, OrderedDict, namedtuple, and deque.
  • More efficient and flexible alternatives to built-in types.

3. bisect:

import bisect

sorted_list = [1, 3, 4, 4, 6, 8]
position = bisect.bisect_left(sorted_list, 4)

print(position)

#2
  • Array bisection algorithms for sorted sequences.
  • Binary search, insertion, and more.
  • Efficiently find and maintain sorted order in lists.

4. heapq:

import heapq

nums = [4, 7, 2, 5, 1, 3]
heapq.heapify(nums)

smallest = heapq.heappop(nums)
print(smallest)
#1
  • Heap queue algorithms (priority queues).
  • Maintain a sorted collection of items with efficient insertions and removals.
  • Useful for scheduling, priority-based tasks, and more.

5. json:

import json

data = {"name": "John", "age": 30}
json_string = json.dumps(data)
decoded_data = json.loads(json_string)

print(json_string)
print(decoded_data)

#{"name": "John", "age": 30}
#{'name': 'John', 'age': 30}
  • Encode and decode JSON data.
  • Serialize and deserialize Python objects to JSON format.
  • Store and exchange data in a lightweight, human-readable format.

6. configparser:

import configparser

config = configparser.ConfigParser()
config.read("example.ini")

name = config.get("section", "name")
age = config.getint("section", "age")

print(name, age)
  • Configuration file parser
  • Read and write data from INI files
  • Manage application settings and user preferences

7. sched:

import sched
import time

def print_event(event_name):
    print(f"Event: {event_name}")

s = sched.scheduler(time.time, time.sleep)
s.enter(5, 1, print_event, ("Event 1",))
s.enter(10, 1, print_event, ("Event 2",))
s.run()

#Event: Event 1
#Event: Event 2
  • General-purpose event scheduler
  • Schedule and execute tasks at specific times or intervals
  • Perform timed operations, such as periodic updates or reminders

8. random:

import random

random_float = random.random()
random_int = random.randint(1, 10)
random_choice = random.choice(["apple", "banana", "orange"])

print(random_float)
print(random_int)
print(random_choice)

#0.8998656473050128
#6
#orange
  • Generate random numbers and make random choices
  • Support for uniform, Gaussian, and other distributions
  • Shuffle and sample from sequences

9. secrets:

import secrets

token = secrets.token_hex(16)
print(token)

#97f3108ff85aef4b0b00c3c2154ae873
  • Generate cryptographically strong random numbers and tokens
  • Suitable for passwords, authentication tokens, and security keys
  • Preferred over the random module for security-sensitive code

10. difflib:

import difflib

text1 = "The quick brown fox jumps over the lazy dog"
text2 = "The quick red fox jumps over the lazy dog"

differ = difflib.Differ()
diff = list(differ.compare(text1.split(), text2.split()))

print("\n".join(diff))
  The
  quick
- brown
+ red
  fox
  jumps
  over
  the
  lazy
  dog
  • Helpers for computing differences between sequences
  • Create and display diffs for text, lists, and other data
  • Identify changes, corrections, and updates in the data

11. timeit:

import timeit

def slow_function():
    return sum(range(100000))

execution_time = timeit.timeit(slow_function, number=100)

print(f"Execution time: {execution_time} seconds")

#Execution time: 0.2170043410001199 seconds
  • Measure the execution time of small code snippets
  • Test performance and optimize code
  • Compare the speed of different solutions and implementations

12. pdb:

import pdb

def buggy_function(x):
    y = x * 2
    pdb.set_trace()
    return y + x

result = buggy_function(10)
print(result)
  • Python debugger
  • Set breakpoints, step through code, and inspect variables
  • Debug and troubleshoot code interactively

13. xml.etree.ElementTree:

import xml.etree.ElementTree as ET

xml_string = """
<person>
<name>Alice</name>
<age>30</age>
<city>New York</city>
</person>
"""

root = ET.fromstring(xml_string)
name = root.find("name").text
age = int(root.find("age").text)
city = root.find("city").text

print(name, age, city)

#Alice 30 New York
  • XML processing library
  • Parse, manipulate, and generate XML documents
  • Read and write XML data in a structured and hierarchical format

14. HTMLParser:

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print(f"Start tag: {tag}")

    def handle_endtag(self, tag):
        print(f"End tag: {tag}")

    def handle_data(self, data):
        print(f"Data: {data.strip()}")

parser = MyHTMLParser()
parser.feed("<html><head><title>Example</title></head><body><p>Hello, world!</p></body></html>")
Start tag: html 
Start tag: head
Start tag: title
Data: Example
End tag: title
End tag: head
Start tag: body
Start tag: p
Data: Hello, world!
End tag: p
End tag: body
End tag: html
  • Basic HTML and XHTML parser
  • Extract information and data from HTML documents
  • Build custom web scrapers and data extraction tools

15. re:

import re

text = "The quick brown fox jumps over the lazy dog"
pattern = r"\b\w{5}\b"

five_letter_words = re.findall(pattern, text)

print(five_letter_words)

# ['quick', 'brown', 'jumps']
  • Regular expression operations
  • Search, match, and manipulate text based on patterns
  • Validate and clean data, extract information, and perform advanced text manipulation

16. argparse:

import argparse

parser = argparse.ArgumentParser(description="Example command line program.")
parser.add_argument("--input", type=str, required=True, help="Input file")
parser.add_argument("--output", type=str, help="Output file")

args = parser.parse_args()

print(f"Input file: {args.input}")
print(f"Output file: {args.output}")
  • Command-line option and argument parsing
  • Create user-friendly command-line interfaces for your scripts and programs
  • Handle options, arguments, and flags with ease and flexibility

17. logging:

import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")

logging.debug("This is a debug message")
logging.info("This is an info message")
logging.warning("This is a warning message")
logging.error("This is an error message")
logging.critical("This is a critical message")
2023-03-19 20:02:27,337 [INFO] This is an info message
2023-03-19 20:02:27,338 [WARNING] This is a warning message
2023-03-19 20:02:27,338 [ERROR] This is an error message
2023-03-19 20:02:27,339 [CRITICAL] This is a critical message
  • Flexible event logging system
  • Log messages with different severity levels to various outputs
  • Debug, monitor, and analyze your applications and systems

18. decimal:

from decimal import Decimal

a = Decimal("0.1")
b = Decimal("0.2")
c = a + b

print(c)

#0.3
  • Decimal fixed-point and floating-point arithmetic
  • Perform precise and accurate calculations with decimal numbers
  • Suitable for financial applications, scientific simulations, and other numerically sensitive tasks

19. fractions:

from fractions import Fraction

a = Fraction(1, 3)
b = Fraction(1, 6)
c = a + b

print(c)

#1/2
  • Rational number arithmetic
  • Perform calculations with fractions and exact rational numbers
  • Useful for exact arithmetic and applications with a focus on accuracy and precision

20. sqlite3:

import sqlite3

conn = sqlite3.connect("example.db")
c = conn.cursor()

c.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
c.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("Alice", 30))
conn.commit()

for row in c.execute("SELECT * FROM users"):
    print(row)

conn.close()

# (1, 'Alice', 30)
  • Manage and interact with SQLite databases directly from your Python code
  • Store, query, and manipulate data in a lightweight, serverless, and self-contained format

21. requests:

import requests

response = requests.get("https://api.example.com/data")

if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Error: {response.status_code}")
  • HTTP library for making requests
  • Interact with RESTful APIs, download files, and scrape web content
  • Simplify HTTP requests with a user-friendly and feature-rich API

22. flask:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello, World!"

@app.route("/api/data")
def api_data():
    data = {"name": "Alice", "age": 30}
    return jsonify(data)

if __name__ == "__main__":
    app.run()
* Serving Flask app '__main__'
* Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit
  • Lightweight web application framework
  • Build web applications, APIs, and microservices quickly and easily
  • Offers a flexible and extensible architecture for your web projects

23. pytest:

# Install pytest using pip:
# pip install pytest

# Create a test file named "test_example.py" with the following content:

def add(a, b):
    return a + b

def test_add():
    assert add(1, 2) == 3
    assert add(-1, 1) == 0
    assert add(0, 0) == 0

# To run the tests, use the `pytest` command-line tool:
# pytest test_example.py
  • Testing framework for Python applications
  • Write and organize tests for your code, libraries, and projects
  • Discover, execute, and report on tests with ease and flexibility

24. scipy:

import numpy as np
from scipy.optimize import minimize

def objective_function(x):
    return x[0] ** 2 + x[1] ** 2

initial_guess = np.array([1, 1])
result = minimize(objective_function, initial_guess)

print(result.x)
[-1.07505143e-08 -1.07505143e-08]
  • Scientific computing library
  • Provides a wide range of algorithms and tools for optimization, integration, interpolation, and more
  • Builds on the NumPy library to offer advanced functionality for scientific applications and research

25. os:

import os

print(os.listdir("."))

#['.config', 'example.db', 'sample_data']
  • Interact with the operating system
  • File and directory management
  • Environment variables and process information

26. glob:

import glob

print(glob.glob("*.txt"))
  • Find all pathnames matching a specified pattern
  • Wildcard pattern matching
  • A simple way to list files matching a pattern

27. itertools:

import itertools

for combo in itertools.combinations("ABC", 2):
    print(combo)

('A', 'B')
('A', 'C')
('B', 'C')
  • Iterator building blocks
  • Combinatoric generators, such as permutations and combinations
  • Memory-efficient looping

28. time:

import time

start_time = time.time()
time.sleep(2)
end_time = time.time()

print(f"Elapsed time: {end_time - start_time} seconds")

Elapsed time: 2.002392292022705 seconds
  • Time access and conversions
  • Measure performance and delays
  • Handle time zones and daylight saving time

29. datetime:

from datetime import datetime, timedelta

now = datetime.now()
one_week_from_now = now + timedelta(weeks=1)

print(now)
print(one_week_from_now)

2023-03-19 20:02:27.337600
2023-03-26 20:02:27.337600
  • Manipulate dates and times
  • Perform arithmetic with dates
  • Format and parse dates and times

30. hashlib:

import hashlib

message = "Hello, world!"
hashed = hashlib.sha256(message.encode("utf-8")).hexdigest()

print(hashed)

# 315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3
  • Cryptographic hashing and message digest algorithms
  • Support for SHA, MD5, and other hash functions
  • Create secure hashes and check message integrity

31. urllib:

from urllib.request import urlopen
from urllib.parse import urlparse

url = "https://www.medium.com"
response = urlopen(url)
parsed_url = urlparse(url)

print(response.read())
print(parsed_url)
  • URL handling modules
  • Open URLs, encode and decode data, and parse URLs
  • Work with HTTP, HTTPS, and FTP protocols

32. flake8:

# example.py
def add(a, b):
    return a + b

print(add(1, 2))
  • Run 'flake8 example.py' to check for style and quality issues
  • Enforce PEP 8 code style
  • Catch errors and improve code readability

33. pathlib:

from pathlib import Path

current_dir = Path(".")
for file in current_dir.iterdir():
    print(file)

#.config
#example.db
#sample_data
  • Object-oriented filesystem paths
  • Simplifies file and directory operations
  • Works on Windows, macOS, and Linux

34. smtplib:

import smtplib

from_email = "you@example.com"
to_email = "recipient@example.com"
message = f"Subject: Hello\n\nHello, {to_email}!"

with smtplib.SMTP("smtp.example.com", 587) as server:
    server.starttls()
    server.login(from_email, "your-password")
    server.sendmail(from_email, to_email, message)
  • Send emails using the Simple Mail Transfer Protocol (SMTP)
  • Authenticate with email servers
  • Send plain-text and HTML emails

35. email:

from email.message import EmailMessage
import smtplib

msg = EmailMessage()
msg.set_content("Hello, world!")
msg["Subject"] = "Greetings"
msg["From"] = "you@example.com"
msg["To"] = "recipient@example.com"

with smtplib.SMTP("smtp.example.com", 587) as server:
    server.starttls()
    server.login("you@example.com", "your-password")
    server.send_message(msg)
  • Create, manipulate, and parse email messages
  • Complements the smtplib and imaplib modules

35. yaml:

import yaml

data = {"name": "John", "age": 30}
yaml_data = yaml.dump(data)

print(yaml_data)

age: 30
name: John
  • Read and write YAML (YAML Ain’t Markup Language) files
  • Human-readable data serialization format
  • Supports complex data structures and custom data types

36. platform:

import platform

print(platform.system())
print(platform.python_version())

Linux
3.9.16
  • Access system and platform information
  • Retrieve the OS version, hardware details, and Python version
  • Write cross-platform code

37. math:

import math

print(math.sqrt(9))
print(math.sin(math.pi / 6))

3.0
0.49999999999999994
  • Basic mathematical functions and constants
  • Trigonometry, logarithms, exponentiation, and more
  • Floating-point arithmetic and rounding functions

38. statistics:

import statistics

data = [1, 2, 3, 4, 5, 6]

print(statistics.mean(data))
print(statistics.median(data))
print(statistics.stdev(data))

3.5
3.5
1.8708286933869707
  • Basic statistical functions
  • Calculate the mean, median, mode, variance, and standard deviation
  • Pure-Python functions that work on lists, tuples, and other iterables of numbers

39. queue:

import queue

q = queue.Queue()

for i in range(5):
    q.put(i)

while not q.empty():
    print(q.get())

0
1
2
3
4
  • FIFO (first-in, first-out), LIFO (last-in, first-out), and priority queues
  • Thread-safe queues for exchanging data between producer and consumer threads

40. tempfile:

import tempfile

with tempfile.NamedTemporaryFile(mode="w+t") as temp_file:
    temp_file.write("Hello, world!")
    temp_file.seek(0)
    print(temp_file.read())

Hello, world!
  • Create temporary files and directories
  • Automatically clean up resources after use
  • Named and unnamed, file-like objects and file system entries

41. uuid:

import uuid

random_uuid = uuid.uuid4()
print(random_uuid)

4cef1f15-10ee-4dbb-84ec-c6527925258b
  • Universally unique identifiers (UUIDs)
  • Generate and manipulate 128-bit identifiers
  • Useful for generating unique keys, identifiers, or tokens

42. zipfile:

import zipfile

with zipfile.ZipFile("archive.zip", "w") as zf:
    zf.write("file.txt")
  • Use the ‘zipfile’ module to create and extract ZIP archives.

43. csv:

import csv

with open("data.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["Name", "Age", "City"])
    writer.writerow(["Alice", 30, "New York"])

Use the ‘csv’ module to read and write CSV files.

44. copy:

import copy

original_list = [[1, 2], [3, 4]]
shallow_copy = copy.copy(original_list)
deep_copy = copy.deepcopy(original_list)

Use the ‘copy’ module to create shallow and deep copies of lists or other mutable objects.

45. atexit:

import atexit

def goodbye():
    print("Goodbye!")

atexit.register(goodbye)

Use the ‘atexit’ module to register functions to be called when the program exits.

46. pickle:

import pickle

data = {"a": 1, "b": 2}
serialized = pickle.dumps(data)
deserialized = pickle.loads(serialized)
  • Use the ‘pickle’ module to serialize and deserialize Python objects, which can be useful for storing data or communication between processes.

47. pprint:

import pprint

data = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}
pprint.pprint(data, indent=4)

{'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]}

Use the ‘pprint’ module to pretty-print complex data structures for better readability.

48. fileinput:

import fileinput

with fileinput.input(files=('file1.txt', 'file2.txt')) as f:
    for line in f:
        print(f.filename(), f.lineno(), line, end='')

Use the ‘fileinput’ module to read multiple files line by line, treating them as a single input stream.


49. doctest:

def add(a, b):
    """
    >>> add(1, 2)
    3
    """
    return a + b

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Use the ‘doctest’ module to test your code by writing examples in your function’s docstring, allowing you to keep documentation and tests close to the code.

50. inspect:

import inspect

def my_function():
    pass

print(inspect.getsource(my_function))

def my_function():
    pass

Use the ‘inspect’ module to retrieve information about live objects, such as their source code, documentation, or call stack.

51. locale:

import locale

locale.setlocale(locale.LC_ALL, '')
formatted_number = locale.format_string("%d", 1234567, grouping=True)

Use the ‘locale’ module to work with locale-specific formatting of numbers, dates, and currency.

52. traceback:

import traceback

try:
    1 / 0
except ZeroDivisionError:
    traceback.print_exc()

Traceback (most recent call last):
File "<ipython-input-43-507d9690673b>", line 4, in <module>
1 / 0
ZeroDivisionError: division by zero

Use the ‘traceback’ module to print and format exception tracebacks, which can be useful for debugging and logging.

53. zlib:

import zlib

data = b"example data" * 100
compressed = zlib.compress(data)
decompressed = zlib.decompress(compressed)

Use the ‘zlib’ module to compress and decompress data using the DEFLATE algorithm, which can be useful for reducing storage space or network bandwidth usage.

54. fnmatch:

import fnmatch

filenames = ["file1.txt", "file2.pdf", "file3.txt"]
txt_files = [name for name in filenames if fnmatch.fnmatch(name, "*.txt")]

Use the ‘fnmatch’ module to filter filenames or other strings using shell-style wildcards, which can be helpful when working with file lists.

55. sys:

import sys

python_version = sys.version
script_name = sys.argv[0]

Use the ‘sys’ module to access system-specific parameters and functions, such as command-line arguments, Python version, or the current interpreter’s path.

56. shutil:

import shutil

shutil.copy("source.txt", "destination.txt")
shutil.move("old_location.txt", "new_location.txt")

Use the ‘shutil’ module to perform high-level file operations, such as copying or moving files and directories.

57. typing:

from typing import List, Tuple

def my_function(numbers: List[int]) -> Tuple[int, int]:
    return min(numbers), max(numbers)

Use the ‘typing’ module to annotate your functions and classes with type hints, which can improve code readability and facilitate static type checking with tools like Mypy.

58. pkgutil:

import pkgutil

for importer, module_name, _ in pkgutil.iter_modules():
    print(module_name)

Use the ‘pkgutil’ module to work with Python packages, such as listing all installed modules or iterating through a package’s modules.

59. array:

import array

arr = array.array("i", [1, 2, 3, 4, 5])

Use the ‘array’ module to create and manipulate arrays of fixed-size numeric types, which can be more memory-efficient and faster than using lists for large amounts of numerical data.

60. shelve:

import shelve

with shelve.open("my_shelf") as db:
    db["data"] = {"key": "value"}

with shelve.open("my_shelf") as db:
    print(db["data"])

Use the ‘shelve’ module to create and work with persistent dictionaries, which store key-value pairs on disk and can be used as a simple database for small-scale applications.


61. numpy:

import numpy as np

array = np.array([1, 2, 3, 4, 5])
mean = np.mean(array)

NumPy is a powerful library for numerical computing in Python, providing support for arrays, matrices, and various mathematical operations.

62. pandas:

import pandas as pd

data = {"A": [1, 2, 3], "B": [4, 5, 6]}
df = pd.DataFrame(data)

pandas is a popular library for data manipulation and analysis, providing data structures like DataFrame and Series for handling tabular and time-series data.


63. matplotlib:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
plt.plot(x, y)
plt.show()

matplotlib is a plotting library for creating static, interactive, and animated visualizations in Python, such as line, scatter, and bar plots.

64. seaborn:

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
sns.boxplot(x="day", y="total_bill", data=tips)
plt.show()

seaborn is a statistical data visualization library built on top of matplotlib, providing a high-level interface for creating informative and attractive statistical graphics.

65. scikit-learn:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing

# load_boston was removed in scikit-learn 1.2; the California housing
# dataset is a drop-in regression replacement
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression().fit(X_train, y_train)

scikit-learn is a popular library for machine learning in Python, providing tools for classification, regression, clustering, and various other learning tasks.

66. statsmodels:

import statsmodels.api as sm
import numpy as np

X = np.random.rand(100)
y = 2 * X + 0.5 * np.random.randn(100)
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()

statsmodels is a library for estimating statistical models and performing statistical tests, offering a wide range of statistical models such as linear regression, logistic regression, and time series analysis.

67. plotly:

import plotly.express as px

data = px.data.iris()
fig = px.scatter(data, x="sepal_width", y="sepal_length", color="species")
fig.show()

Plotly is a library for creating interactive, web-based visualizations using Python, such as line charts, bar charts and scatter plots, with support for advanced features like animations and 3D plots.

68. bokeh:

from bokeh.plotting import figure, output_file, show

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

output_file("line.html")
p = figure(title="Line plot example", x_axis_label="x", y_axis_label="y")
p.line(x, y, legend_label="y=2x", line_width=2)
show(p)

Bokeh is a library for creating interactive, web-based visualizations in Python, providing a flexible and high-level interface for creating complex, feature-rich plots.

69. folium:

import folium

m = folium.Map(location=[45.523, -122.675], zoom_start=13)
folium.Marker([45.524, -122.674], popup="Portland, Oregon").add_to(m)
m.save("map.html")

folium is a library for creating interactive maps using Python and the popular JavaScript library Leaflet, allowing you to easily visualize spatial data and add various map layers and markers.

70. geopandas:

import geopandas as gpd

world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world.plot()

geopandas is a library for working with geospatial data in Python, extending the functionality of pandas by adding support for geospatial data types and operations.

71. wordcloud:

from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = "Python is a great language for data analysis and visualization"
wc = WordCloud(background_color="white").generate(text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()

wordcloud is a library for creating word clouds, which are a popular way to visualize the frequency of words in a text corpus.

72. pydot:

import pydot

graph = pydot.Dot(graph_type="digraph")
node_a = pydot.Node("A")
node_b = pydot.Node("B")
edge = pydot.Edge(node_a, node_b)
graph.add_edge(edge)
graph.write_png("graph.png")

pydot is a library for creating and manipulating graph descriptions in the DOT language, which is a popular language for representing directed and undirected graphs.


73. tensorflow:

import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax")
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)

TensorFlow is an open-source library for machine learning and artificial intelligence, providing a flexible platform for defining and running computational graphs, with support for deep learning models.

74. pytorch:

import torch
import torch.nn as nn
import torch.optim as optim

X = torch.randn(100, 20)
y = torch.randint(0, 2, (100,))

model = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 2))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):
    optimizer.zero_grad()
    y_pred = model(X)
    loss = criterion(y_pred, y)
    loss.backward()
    optimizer.step()

PyTorch is an open-source machine learning library based on the Torch library, providing tensor computation and deep learning capabilities with strong GPU acceleration support.

75. xgboost:

import numpy as np
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# load_boston was removed in scikit-learn 1.2; the California housing
# dataset is a drop-in regression replacement
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

param = {"objective": "reg:squarederror", "eval_metric": "rmse"}
bst = xgb.train(param, dtrain, num_boost_round=100, evals=[(dtest, "test")])

y_pred = bst.predict(dtest)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

XGBoost is a scalable and high-performance gradient boosting library for tree-based models, providing a flexible and efficient solution for supervised learning tasks.

76. lightgbm:

import numpy as np
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

dtrain = lgb.Dataset(X_train, label=y_train)
dtest = lgb.Dataset(X_test, label=y_test)

param = {"objective": "binary", "metric": "binary_logloss"}
bst = lgb.train(param, dtrain, num_boost_round=100, valid_sets=[dtest])

y_pred = np.round(bst.predict(X_test))
accuracy = accuracy_score(y_test, y_pred)

LightGBM is a gradient boosting framework that uses tree-based learning algorithms, designed to be efficient and scalable for large datasets and high-performance tasks.

77. catboost:

from catboost import CatBoostClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=2, verbose=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

CatBoost is a high-performance gradient boosting library, designed specifically for categorical feature handling and improving performance on datasets with categorical features.


78. Beautiful Soup:

from bs4 import BeautifulSoup
import requests

url = "https://www.medium.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

headings = soup.find_all("h1")
for heading in headings:
    print(heading.text)

BeautifulSoup is a library for parsing HTML and XML documents, providing an easy-to-use interface for navigating, searching, and modifying the parse tree.

79. Scrapy:

import scrapy
from scrapy.crawler import CrawlerProcess

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://www.example.com"]

    def parse(self, response):
        headings = response.css("h1::text").getall()
        print(headings)

process = CrawlerProcess()
process.crawl(ExampleSpider)
process.start()

Scrapy is an open-source web crawling framework for extracting structured data from websites.

80. nltk:

import nltk

nltk.download("punkt")
text = "Natural Language Processing is an interesting field of study."
tokens = nltk.word_tokenize(text)
print(tokens)

nltk (Natural Language Toolkit) is a library for working with human language data (text), providing tools for text processing, classification, tokenization, stemming, and more.

81. spaCy:

import spacy

nlp = spacy.load("en_core_web_sm")
text = "Apple Inc. is a technology company based in Cupertino, California."
doc = nlp(text)

for ent in doc.ents:
    print(ent.text, ent.label_)

spaCy is an open-source library for advanced Natural Language Processing tasks, providing support for part-of-speech tagging, named entity recognition, and various other NLP tasks.

82. gensim:

from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

sentences = ["Machine learning is a subset of artificial intelligence.",
             "Deep learning is a subfield of machine learning."]
tokenized_sentences = [simple_preprocess(s) for s in sentences]

model = Word2Vec(tokenized_sentences, min_count=1)
print(model.wv["machine"])

gensim is a library for unsupervised topic modeling and natural language processing, providing implementations for popular algorithms like Word2Vec, FastText, and LDA.


83. pymongo:

import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["example_db"]
collection = db["example_collection"]

data = {"name": "John Doe", "age": 30, "city": "New York"}
result = collection.insert_one(data)
print(result.inserted_id)

pymongo is a Python driver for MongoDB, allowing you to work with MongoDB databases using Python-like syntax and data structures.

84. openpyxl:

import openpyxl

workbook = openpyxl.Workbook()
sheet = workbook.active
sheet["A1"] = "Hello"
sheet["B1"] = "World"

workbook.save("example.xlsx")

openpyxl is a library for reading and writing Excel files (xlsx), allowing you to work with Excel spreadsheets using Python.

85. xlrd:

import xlrd

workbook = xlrd.open_workbook("example.xls")
sheet = workbook.sheet_by_index(0)

for row in range(sheet.nrows):
    print(sheet.row_values(row))

xlrd is a library for reading data and formatting information from Excel files (xls; support for xlsx was removed in xlrd 2.0), allowing you to extract data from Excel spreadsheets using Python.

86. xlwt:

import xlwt

workbook = xlwt.Workbook()
sheet = workbook.add_sheet("Sheet1")
sheet.write(0, 0, "Hello")
sheet.write(0, 1, "World")

workbook.save("example.xls")

xlwt is a library for writing data and formatting information to Excel files (xls), allowing you to create Excel spreadsheets using Python.

87. PyPDF2:

import PyPDF2

with open("example.pdf", "rb") as file:
    reader = PyPDF2.PdfReader(file)  # PdfFileReader was removed in PyPDF2 3.0
    print(f"Number of pages: {len(reader.pages)}")

    page = reader.pages[0]
    print(page.extract_text())

PyPDF2 is a library for working with PDF files, allowing you to extract text, metadata, and other information from PDF documents using Python.

88. pdfminer:

from pdfminer.high_level import extract_text

text = extract_text("example.pdf")
print(text)

pdfminer is a library for extracting text, metadata, and other information from PDF files, providing a more advanced and customizable interface for working with PDF documents using Python.

89. pytesseract:

from PIL import Image
import pytesseract

image = Image.open("example.png")
text = pytesseract.image_to_string(image)
print(text)

pytesseract is an OCR (Optical Character Recognition) library for Python, allowing you to extract text from images using the Tesseract OCR engine.


90. Pillow:

from PIL import Image

image = Image.open("example.jpg")
image.thumbnail((100, 100))
image.save("thumbnail.jpg")

Pillow is a fork of the Python Imaging Library (PIL) that provides extensive file format support, an efficient internal representation, and powerful image processing capabilities for Python.

91. holoviews:

import holoviews as hv
hv.extension("bokeh", logo=False)

x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

curve = hv.Curve((x, y), "x", "y")
curve.opts(width=400, height=300, line_color="red")

HoloViews is a high-level visualization library for creating interactive plots with concise expressions, allowing you to create complex and flexible visualizations without writing large amounts of code.

92. dask:

import dask.array as da

x = da.random.random((10000, 10000), chunks=(1000, 1000))
y = x + x.T
z = y[::2, 5000:].mean(axis=1)

result = z.compute()
print(result)

Dask is a parallel computing library for Python that allows you to parallelize operations on large data structures like arrays, dataframes, and lists, providing an alternative to NumPy, pandas, and other libraries for handling large-scale data.

93. pyarrow:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

data = pa.Table.from_pandas(pd.DataFrame({"A": range(5), "B": range(5, 10)}))
pq.write_table(data, "example.parquet")

PyArrow is a cross-language development platform for in-memory data, providing tools for working with Apache Arrow, a standardized columnar memory format for high-performance analytics.

94. sympy:

from sympy import symbols, Eq, solve

x, y = symbols("x y")
eq1 = Eq(3 * x + 4 * y, 12)
eq2 = Eq(x - y, 2)

solutions = solve((eq1, eq2), (x, y))
print(solutions)

SymPy is a Python library for symbolic mathematics, allowing you to perform algebraic manipulations, calculus, linear algebra, and more using symbolic expressions.

95. redis:

import redis

r = redis.Redis(host="localhost", port=6379, db=0)
r.set("name", "John Doe")
print(r.get("name"))

redis-py is a Python client for Redis, a high-performance in-memory data store, providing an easy-to-use interface for working with Redis data structures like strings, hashes, lists, sets, and sorted sets.

96. lxml:

from lxml import etree

xml_data = "<root><element>text</element></root>"
root = etree.fromstring(xml_data)
element = root.find("element")
print(element.text)

lxml is a library for processing XML and HTML in Python, providing a fast and easy-to-use interface for parsing, validating, and manipulating XML and HTML documents.

97. OpenCV:

import cv2

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
resized_img = cv2.resize(img, (100, 100))
cv2.imwrite('resized_image.jpg', resized_img)

OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library, providing a wide range of functionalities for image and video processing, including object detection, feature extraction, and image transformation.

98. IMDbPY:

from imdb import IMDb

ia = IMDb()

movie = ia.get_movie('0133093') # The Matrix (1999)
print(movie.summary())

IMDbPY is a Python package for accessing the IMDb’s movie database, providing an easy-to-use interface for retrieving information about movies, people, characters, and companies.

99. Hugging Face Transformers:

from transformers import pipeline

summarizer = pipeline("summarization")
text = "Hugging Face is a company based in New York and Paris that provides state-of-the-art natural language processing models."
summary = summarizer(text, max_length=25, min_length=5, do_sample=False)

print(summary[0]["summary_text"])

Hugging Face Transformers is a library for working with state-of-the-art natural language processing models, such as BERT, GPT, and RoBERTa, providing an easy-to-use interface for tasks like text classification, summarization, and generation.

100. joblib:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from joblib import dump, load

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.3)

model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

dump(model, "random_forest_digits.joblib")

loaded_model = load("random_forest_digits.joblib")
print("Loaded model score:", loaded_model.score(X_test, y_test))

joblib is a library for saving and loading trained models, as well as parallelizing tasks for faster computation.

101. SQLAlchemy:

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()
engine = create_engine('sqlite:///mydb.sqlite3')

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
  • SQL toolkit and Object-Relational Mapper (ORM)
  • Simplifies database operations (see the usage sketch below)
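
Continuing the mapping above, a quick usage sketch that adds a row and reads it back through the session:

session.add(User(name="Alice"))
session.commit()

first_user = session.query(User).first()
print(first_user.name)  # Alice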

102. MLflow:

import mlflow

mlflow.start_run()
mlflow.log_param("param_name", 0.01)    # placeholder parameter value
mlflow.log_metric("metric_name", 0.95)  # placeholder metric value
mlflow.end_run()
  • An open-source platform for managing the ML lifecycle
  • Provides tools for tracking experiments, packaging code, and sharing results

103. TensorFlow Extended (TFX):

  • End-to-end platform for deploying production ML pipelines
  • Code snippet: import tensorflow_data_validation as tfdv (see the sketch below)
  • Offers a set of libraries for data validation, transformation, and serving
  • Integrates with TensorFlow for machine learning model training
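
As a small illustration of the data-validation piece, here is a minimal sketch using TensorFlow Data Validation; the CSV path is a placeholder:

import tensorflow_data_validation as tfdv

# Compute summary statistics for a CSV dataset and infer a schema from them
stats = tfdv.generate_statistics_from_csv(data_location="data.csv")
schema = tfdv.infer_schema(stats)

# Check the statistics against the schema and report any anomalies
anomalies = tfdv.validate_statistics(stats, schema)
print(anomalies)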

104. Prefect:

  • Workflow management system for building, scheduling, and monitoring data pipelines
  • Code snippet: from prefect import task, Flow (see the sketch below)
  • Provides a Pythonic way to define tasks and their dependencies
  • Supports distributed execution and offers a UI for monitoring
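
A minimal sketch using the Prefect 1.x API referenced above (Prefect 2.x replaces Flow with a @flow decorator):

from prefect import task, Flow

@task
def extract():
    return [1, 2, 3]

@task
def transform(data):
    return [x * 2 for x in data]

# Task dependencies are inferred from the data passed between tasks
with Flow("etl") as flow:
    numbers = extract()
    transform(numbers)

flow.run()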

105. Kedro:

  • Open-source framework for creating reproducible, maintainable, and modular data science code
  • Code snippet: from kedro.pipeline import Pipeline (see the sketch below)
  • Facilitates organizing code into pipelines and nodes
  • Integrates with various data processing and ML libraries
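
A minimal sketch of a one-node pipeline; "raw_data" and "clean_data" are hypothetical dataset names that would be defined in the project's data catalog:

from kedro.pipeline import Pipeline, node

def preprocess(raw_data):
    # Placeholder transformation
    return raw_data

pipeline = Pipeline([
    node(preprocess, inputs="raw_data", outputs="clean_data"),
])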

106. Apache Airflow:

  • Workflow management platform for scheduling and monitoring data pipelines
  • Code snippet: from airflow import DAG (see the sketch below)
  • Allows creating, scheduling, and monitoring workflows using Directed Acyclic Graphs (DAGs)
  • Supports a wide variety of operators for integrating various services
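
A minimal Airflow 2.x sketch of a daily DAG with a single Python task; the DAG and task IDs are placeholders:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def say_hello():
    print("Hello from Airflow")

with DAG("hello_dag",
         start_date=datetime(2023, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    hello_task = PythonOperator(task_id="say_hello", python_callable=say_hello)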

107. PyTorch Lightning:

  • Lightweight PyTorch wrapper for high-performance AI research
  • Code snippet: import pytorch_lightning as pl (see the sketch below)
  • Simplifies training, evaluation, and deployment of PyTorch models
  • Offers built-in support for distributed training and mixed-precision
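
A minimal sketch of a LightningModule; the toy model and the training data loader are placeholders:

import pytorch_lightning as pl
import torch
from torch import nn

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(20, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# trainer = pl.Trainer(max_epochs=5)
# trainer.fit(LitClassifier(), train_dataloader)  # train_dataloader is user-supplied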

108. Optuna:

  • Automatic hyperparameter optimization framework
  • Code snippet: import optuna (see the sketch below)
  • Supports various optimization algorithms
  • Offers easy integration with popular ML libraries
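
A minimal sketch minimizing a toy objective:

import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2  # minimum at x = 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)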

109. Ray:

  • Distributed computing framework for parallel and distributed Python applications
  • Code snippet: import ray (see the sketch below)
  • Enables scaling and parallelizing Python applications easily
  • Provides a simple API for implementing distributed algorithms
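
A minimal sketch running four tasks in parallel:

import ray

ray.init()

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]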

110. ONNX:

  • Open Neural Network Exchange, a format for interchangeable AI models
  • Code snippet: import onnx (see the sketch below)
  • Allows models to be trained in one framework and run in another
  • Supported by many frameworks and runtimes, including PyTorch, TensorFlow, and ONNX Runtime
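
A minimal sketch that loads and validates an exported model; "model.onnx" is a placeholder path:

import onnx

model = onnx.load("model.onnx")
onnx.checker.check_model(model)
print(onnx.helper.printable_graph(model.graph))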

111. Scikit-image: Image processing library:

from skimage import io, filters

image = io.imread('image.png', as_gray=True)  # sobel expects a single-channel image
edges = filters.sobel(image)

112. Celery: A distributed task queue for Python:

from celery import Celery

app = Celery('tasks', broker='pyamqp://guest@localhost//')

@app.task
def add(x, y):
    return x + y

113. TensorBoard: A visualization toolkit for TensorFlow (also usable from PyTorch):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
loss, step = 0.42, 1  # placeholder values for illustration
writer.add_scalar("Training loss", loss, global_step=step)
writer.close()

114. Boto3: The Amazon Web Services (AWS) SDK for Python:

import boto3

s3 = boto3.resource("s3")

115. Click: A library for creating beautiful command-line interfaces:

import click

@click.command()
@click.option('--count', default=1, help='Number of greetings.')
def hello(count):
    for _ in range(count):
        click.echo('Hello, World!')

116. Keras-tuner:

A library for hyperparameter tuning of Keras models:

from kerastuner.tuners import RandomSearch

tuner = RandomSearch(
    build_model,  # build_model is a user-defined function returning a compiled Keras model
    objective='val_loss',
    max_trials=5,
    executions_per_trial=3,
    directory='my_dir',
    project_name='helloworld')

117. Imbalanced-learn:

A Python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance.

from imblearn.over_sampling import SMOTE

smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X, y)  # X, y: your imbalanced features and labels

118. Tqdm:

A fast, extensible progress bar for loops and other iterable objects.

from tqdm import tqdm
import time

for i in tqdm(range(10)):
    time.sleep(0.1)

119. Tabulate:

A library for pretty-printing tabular data.

from tabulate import tabulate
table = [["Alice", 24], ["Bob", 19]]
headers = ["Name", "Age"]
print(tabulate(table, headers=headers))

120. Pendulum:

A library to work with dates and times more easily.

import pendulum
now = pendulum.now()
print(now.to_date_string())

121. PuLP:

A linear programming library in Python.

from pulp import LpProblem, LpVariable, LpMaximize
prob = LpProblem("My Problem", LpMaximize)
x = LpVariable("x", 0, 4)
y = LpVariable("y", -1, 1)

122. Graphviz:

A library for creating, manipulating, and rendering graphs.

from graphviz import Digraph
g = Digraph('G', filename='hello.gv')
g.edge('Hello', 'World')
g.view()

123. PySpark:

The Python library for Spark, an open-source, distributed computing system that provides a high-level API for distributed data processing.

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").appName("Word Count").getOrCreate()
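
Continuing the session above, a tiny word-count sketch:

df = spark.createDataFrame([("hello",), ("world",), ("hello",)], ["word"])
df.groupBy("word").count().show()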

124. Elasticsearch-py:

The official Elasticsearch client library for Python.

from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

125. Shap:

A library to explain the output of any machine learning model using Shapley values.

import shap

explainer = shap.Explainer(model)  # model: a trained model, e.g. the XGBoost or LightGBM model above
shap_values = explainer(X_test)

Conclusion:

Python’s vast array of modules and packages showcases its adaptability and versatility across a wide range of applications, including data analysis, machine learning, web development, and more. With over 350,000 packages at their disposal, developers can leverage Python’s extensive ecosystem to efficiently tackle complex problems and accelerate their projects. This article has provided an overview of some of the most powerful and widely used Python modules, demonstrating their potential in various domains.

References:

Official sites and documentation for some of the modules mentioned in the article:

  1. Matplotlib: https://matplotlib.org/
  2. Seaborn: https://seaborn.pydata.org/
  3. Plotly: https://plotly.com/python/
  4. NumPy: https://numpy.org/
  5. Pandas: https://pandas.pydata.org/
  6. SciPy: https://www.scipy.org/
  7. Scikit-learn: https://scikit-learn.org/
  8. TensorFlow: https://www.tensorflow.org/
  9. Keras: https://keras.io/
  10. PyTorch: https://pytorch.org/
  11. NLTK: https://www.nltk.org/
  12. SpaCy: https://spacy.io/
  13. Gensim: https://radimrehurek.com/gensim/
  14. OpenCV: https://opencv.org/
  15. Requests: https://docs.python-requests.org/
  16. Beautiful Soup: https://www.crummy.com/software/BeautifulSoup/
  17. Scrapy: https://scrapy.org/
  18. Flask: https://flask.palletsprojects.com/
  19. Django: https://www.djangoproject.com/
  20. SQLAlchemy: https://www.sqlalchemy.org/
  21. PyPI Stats (top packages): https://pypistats.org/top
  22. PyPI: https://pypi.org/
  23. Top PyPI Packages: https://hugovk.github.io/top-pypi-packages/
  24. PyPI Stats: https://pypistats.org/
  25. Python Standard Library: https://docs.python.org/3/library/
