Getting Started with SQLAlchemy
Implementing the SQLAlchemy for database operations In Python
What is SQLAlchemy?
SQLAlchemy is an open-source SQL toolkit and object-relational mapper(ORM) for Python.
It facilitates communication between Python programs and databases. Most of the time, this library is used as an ORM tool.
What is ORM?
ORM is a programming technique for representing database records as corresponding objects in programming languages. Therefore, we could interact and perform operations on those database records as if we are dealing with objects.
ORM gives us the power to turn database records into objects, making database interactions and modifications as effortless as handling objects in programming languages.
What are the pros and cons of ORM?
Pros
- Conciseness. ORM allows users to use programming languages of their choice with more concise and simple syntax than using raw SQL queries, reducing the amount of code needed.
- Optimal. ORM also enables users to take advantage of the optimization of Object-Oriented Programmings such as inheritance, encapsulation, and abstraction by representing database records as objects.
- Flexibility. With ORM, users can easily switch between different database management systems (MySQL, Sqlite3, PostgreSQL,…) without having to possess a sound understanding of these systems as well as SQL language.
Cons
- Time. Because ORMs are frameworks, users would have to spend time learning and familiarizing themselves with these tools.
- Less control. Using ORM, users will have less control and initiative with databases.
Let’s learn how to use SQLAlchemy, a fairly well-known ORM for the Python programming language, and see how this tool could help you save a significant amount of time and effort when working with the database.
Installing
!pip install sqlalchemy
!pip install pymysql
Importing:
try:
import sqlalchemy as db
from sqlalchemy import create_engine
from sqlalchemy import *
import pymysql
print("all imported")
except:
print("error in importing")
Engine Configuration In SQLAlchemy
According to SQLAlchemy documentation: The Engine is the starting point for any SQLAlchemy application. It’s “home base” for the actual database and its DBAPI delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database/DBAPI combination.
The general structure can be illustrated as follows:
Where above, an Engine references both a Dialect and a Pool, which together interpret the DBAPI’s module functions as well as the behavior of the database.
Creating an engine is just a matter of issuing a single call, create_engine(). The API dialect of the database we are using is passed in create_engine.
For PostgreSQL
Default:
engine = create_engine(‘postgresql://scott:tiger@localhost/mydatabase’)
Others:
engine = create_engine (‘postgresql+psycopg2://scott:tiger@localhost/mydatabase’)
engine = create_engine(‘postgresql+pg8000://scott:tiger@localhost/mydatabase’)
For MySQL
Default
engine = create_engine(‘mysql://scott:tiger@localhost/foo’)
Others:
mysqlclient
- engine = create_engine(‘mysql+mysqldb://scott:tiger@localhost/foo’)
PyMySQL
- engine = create_engine(‘mysql+pymysql://scott:tiger@localhost/foo’)
Similarly for other databases:
Oracle:
engine = create_engine(‘oracle://scott:tiger@127.0.0.1:1521/sidname’)
Microsoft SQL Server
engine = create_engine(‘mssql+pymssql://scott:tiger@hostname:port/dbname’)
SQLite
SQLite connects to file-based databases, using the Python built-in module sqlite3 by default. As SQLite connects to local files, the URL format is slightly different. The “file” portion of the URL is the filename of the database. For a relative file path, this requires three slashes. The three slashes are followed by the absolute path:
engine = create_engine(‘sqlite:///foo.db’)
Let’s get started and Connect to my Mysql database :
format for mysql : create_engine('mysql+pymysql://<username>:<password>@<host>/<dbname>')
dialect+driver : mysql+pymysql
engine = db.create_engine('mysql+pymysql://root:shelvi31@127.0.0.1/errors?host=localhost?port=3306')connection = engine.connect()
print(engine.table_names())
REFLECTION: reads the database and builds SQLAlchemy Table Objects
SQLAlchemy can be used to automatically load tables from a database using something called reflection. Reflection is the process of reading the database and building the metadata based on that information.
Importing metadata and table needed for reflection. MetaData is an object is a catalog that stores DB information such as a table, such that we don't need to keep looking them up.
from sqlalchemy import MetaData,Table
metadata = db.MetaData()#Defining table object by giving name of table stores in db:
dummy = Table("dummy",metadata,autoload = True, autoload_with=engine)#Using Function repr to view the details of the table that we stored as dummy:
print(repr(dummy));# Print only the column names
print(dummy.columns.keys())
Note that the Engine and its underlying Pool do not establish the first actual DBAPI connection until the Engine.connect() method is called, or an operation which is dependent on this method such as Engine.execute() is invoked.
Querying
query = db.select([dummy]).where(dummy.columns.seic >= 20)
result = connection.execute(query)
resultset = result.fetchall()
resultset[0:4]#Converting the result database to df:
import pandas as pd
df = pd.DataFrame(resultset)
df.columns = resultset[0].keys()
df.head(10)
Executing more Queries
Using Where:
query = db.select([dummy.columns.seic,dummy.columns.seic_benefit]).where(dummy.columns.seic >= 70)
result = connection.execute(query)
resultset = result.fetchall()
resultset[:10]
Using In:
query = db.select([dummy.columns.seic,dummy.columns.seic_benefit]).where(dummy.columns.seic.in_([72.6,70,60]))
result = connection.execute(query)
resultset = result.fetchall()
resultset[:][(60.0, '89664'), (60.0, '17082'), (70.0, '76108'), (60.0, '27696')]
AND, OR, NOT
# SQL :
# SELECT * FROM dummy
# WHERE seic = 70 AND NOT scheme1_benefit = 'Eligible'
# SQLAlchemy :
query = db.select([dummy.columns.seic,dummy.columns.scheme1_benefit]).where(db.and_(dummy.columns.seic >= 80, dummy.columns.scheme1_benefit != 'Eligible'))
result = connection.execute(query)
resultset = result.fetchall()
resultset[:]# SQL :
# SELECT * FROM dummy
# WHERE seic = 70 OR NOT scheme1_benefit = 'Eligible'
# SQLAlchemy :
query = db.select([dummy.columns.seic,dummy.columns.scheme1_benefit]).where(db.or_(dummy.columns.seic >= 80, dummy.columns.scheme1_benefit != 'Eligible'))
result = connection.execute(query)
resultset = result.fetchall()
resultset[:]
ORDER BY:
# SQL :
# SELECT * FROM dummy
# ORDER BY seic DESC, seic_benefit DESC
# SQLAlchemy :
query = db.select([dummy]).where(dummy.columns.seic==60).order_by(db.desc(dummy.columns.seic), dummy.columns.seic_benefit)
result = connection.execute(query)
resultset = result.fetchall()
resultset[:10]
FUNCTIONS:
# SQL :
# SELECT COUNT(seic)
# FROM dummy
# WHERE seic==70
# SQLAlchemy :
query = db.select([db.func.count(dummy.columns.seic)]).where(dummy.columns.seic==60)
result = connection.execute(query)
resultset = result.fetchall()
resultset[:10][(3,)]
other functions include avg, sum, min, max…
GROUP BY:
# SQL :
# SELECT SUM(seic) as SEIC
# FROM dummy
# GROPU BY scheme1_benefit
# SQLAlchemy :
query= db.select([db.func.sum(dummy.columns.seic).label('SEIC')]).group_by(dummy.columns.scheme1_benefit)
result = connection.execute(query)
resultset = result.fetchall()
resultset[:5][(13828.999965667725,), (15699.400007247925,)]
DISTINCT
# SQL :
# SELECT DISTINCT seic
# FROM dummy
# SQLAlchemy :
query = db.select([db.func.count(dummy.columns.seic.distinct())])
result = connection.execute(query)
resultset = result.fetchall()
resultset[:5][(197,)]
Find my Jupyter Notebook and dataset here: https://github.com/shelvi31/SQLALchemy-in-python