Mastering File Handling in Python: A Comprehensive Guide Part 12 (End of the Series )

Published in

Python’s Gurus

10 min readJun 5, 2024

Saving data with shelve

A shelf is a persistent dictionary-like object. The beauty of it is that the values you save into a shelf can be any objects you can pickle, so you’re not restricted like you would be if you were using a database. Albeit interesting and useful, the shelve module is used quite rarely in practice. Just for completeness, let’s see a quick example of how it works:

# persistence/shelf.py
import shelve

class Person:
  def __init__(self, name, id):
    self.name = name
    self.id = id
with shelve.open('shelf1.shelve') as db:
  db['obi1'] = Person('Obi-Wan', 123)
  db['ani'] = Person('Anakin', 456)
  db['a_list'] = [2, 3, 5]
  db['delete_me'] = 'we will have to delete this one...'
  print(list(db.keys())) # 'ani', 'delete_me', 'a_list', 'obi1']

  del db['delete_me'] # gone!
  print(list(db.keys())) # ['ani', 'a_list', 'obi1']
  print('delete_me' in db) # False
  print('ani' in db) # True
  a_list = db['a_list']
  a_list.append(7)
  db['a_list'] = a_list
  print(db['a_list']) # [2, 3, 5, 7]

Apart from the wiring and the boilerplate around it, this example resembles an exercise with dictionaries. We create a simple Person class and then we open a shelve file within a context manager. As you can see, we use the dictionary syntax to store four objects: two Person instances, a list, and a string. If we print the keys, we get a list containing the four keys we used. Immediately after printing it, we delete the (aptly named) delete_me key/value pair from the shelf. Printing the keys again shows the deletion has succeeded. We then test a couple of keys for membership and, finally, we append number 7 to a_list. Notice how we have to extract the list from the shelf, modify it, and save it again.

If this behavior is undesired, there is something we can do:

# persistence/shelf.py
with shelve.open('shelf2.shelve', writeback=True) as db:
  db['a_list'] = [11, 13, 17]
  db['a_list'].append(19) # in-place append!
  print(db['a_list']) # [11, 13, 17, 19]

By opening the shelf with writeback=True, we enable the writeback feature, which allows us to simply append to a_list as if it actually was a value within a regular dictionary. The reason why this feature is not active by default is that it comes with a price that you pay in terms of memory consumption and slower closing of the shelf. Now that we have paid homage to the standard library modules related to data persistence, let’s take a look at one of the most widely adopted ORMs in the Python ecosystem: SQLAlchemy.

Saving data to a database

For this example, we are going to work with an in-memory database, which will make things simpler for us. In the source code of the book, we have left a couple of comments to show you how to generate a SQLite file, so we hope you’ll explore that option as well.

You can find a free database browser for SQLite at https://
dbeaver.io. DBeaver is a free multi-platform database tool for
developers, database administrators, analysts, and all people who
need to work with databases. It supports all popular databases:
MySQL, PostgreSQL, SQLite, Oracle, DB2, SQL Server, Sybase,
MS Access, Teradata, Firebird, Apache Hive, Phoenix, Presto,
and so on.

Before we dive into the code, allow us to briefly introduce the concept of a relational database.

A relational database is a database that allows you to save data following
the relational model, invented in 1969 by Edgar F. Codd. In this model, data is stored in one or more tables. Each table has rows (also known as records, or tuples), each of which represents an entry in the table. Tables also have columns (also known as attributes), each of which represents an attribute of the records. Each record is identified through a unique key, more commonly known as the primary key, which is the union of one or more columns in the table. To give you an example: imagine a table called Users, with columns id, username, password, name, and surname.

Such a table would be perfect to contain users of our system; each row would represent a different user. For example, a row with the values 3, fab, my_wonderful_pwd, Fabrizio, and Romano would represent Fabrizio’s user in the system.

The reason why the model is called relational is because you can establish relations between tables. For example, if you added a table called PhoneNumbers to our fictitious database, you could insert phone numbers into it, and then, through a relation, establish which phone number belongs to which user.

In order to query a relational database, we need a special language. The main standard is called SQL, which stands for Structured Query Language. It is born out of something called relational algebra, which is a family of algebras used to model data stored according to the relational model and perform queries on it. The most common operations you can perform usually involve filtering on the rows or columns, joining tables, aggregating the results according to some criteria, and so on. To give you an example in English, a query on our imaginary database could be: Fetch all users (username, name, surname) whose username starts with “m”, who have
at most one phone number. In this query, we are asking for a subset of the columns in the User table. We are filtering on users by taking only those whose username starts with the letter m, and even further, only those who have at most one phone number.

Now, each database comes with its own flavor of SQL. They all respect the standard to some extent, but none fully do, and they are all different from one another in some respects. This poses an issue in modern software development. If our application contains SQL code, it is quite likely that if we decided to use a different database engine, or maybe a different version of the same engine, we would find our SQL code needs amending.

This can be quite painful, especially since SQL queries can become very complicated quite quickly. In order to alleviate this pain a little, computer scientists have created code that maps objects of a programming language to tables of a relational database. Unsurprisingly, the name of such a tool is Object-Relational Mapping (ORM).

In modern application development, you would normally start interacting with a database by using an ORM, and should you find yourself in a situation where you can’t perform a query you need to perform through the ORM, you would then resort to using SQL directly. This is a good compromise between having no SQL at all, and using no ORM, which ultimately means specializing the code that interacts with the database, with the aforementioned disadvantages.

In this section, we would like to show an example that leverages SQLAlchemy, one of the most popular third-party Python ORMs. You will have to pip install it into the virtual environment for this chapter. We are going to define two models (Person and Address), each of which maps to a table, and then we’re going to populate the database and perform a few queries on it.

Let’s start with the model declarations:

# persistence/alchemy_models.py
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import (
  Column, Integer, String, ForeignKey, create_engine)
from sqlalchemy.orm import relationship

At the beginning, we import some functions and types. The first thing we need to do then is to create an engine. This engine tells SQLAlchemy about the type of database we have chosen for our example, and how to connect to it:

# persistence/alchemy_models.py
engine = create_engine('sqlite:///:memory:')
Base = declarative_base()

class Person(Base):
  __tablename__ = 'person'
  id = Column(Integer, primary_key=True)
  name = Column(String)
  age = Column(Integer)

  addresses = relationship(
    'Address',
    back_populates='person',
    order_by='Address.email',
    cascade='all, delete-orphan'
)

  def __repr__(self):
    return f'{self.name}(id={self.id})'

class Address(Base):
  __tablename__ = 'address'

  id = Column(Integer, primary_key=True)
  email = Column(String)
  person_id = Column(ForeignKey('person.id'))
  person = relationship('Person', back_populates='addresses')

  def __str__(self):
    return self.email
  __repr__ = __str__
Base.metadata.create_all(engine)

Each model then inherits from the Base table, which in this example simply consists of the default, returned by declarative_base(). We define Person, which maps to a table called person, and exposes the attributes id, name, and age. We also declare a relationship with the Address model, by stating that accessing the addresses attribute will fetch all the entries in the address table that are related to the particular Person instance we’re dealing with. The cascade option affects how creation and deletion work, but it is a more advanced concept, so we suggest you ignore it for now and maybe investigate more later on.

The last thing we declare is the __repr__() method, which provides us with
the official string representation of an object. This is supposed to be a representation that can be used to completely reconstruct the object, but in this example, we simply use it to provide something in output. Python redirects repr(obj) to a call to obj.__repr__().

We also declare the Address model, which will contain email addresses,
and a reference to the person they belong to. You can see the person_id
and person attributes are both about setting a relation between the Address
and Person instances. Note also how we declare the __str__() method on
Address, and then assign an alias to it, called __repr__(). This means that calling either repr() or str() on Address objects will ultimately result in calling the __str__() method. This is quite a common technique in Python, used to avoid duplicating the same code, so we took the opportunity to show it to you here.

On the last line, we tell the engine to create tables in the database according to our models.

The create_engine() function supports a parameter called echo,
which can be set to True, False, or the string "debug", to enable
different levels of logging of all statements and the repr() of their
parameters. Please refer to the official SQLAlchemy documentation
for further information.

A deeper understanding of this code would require more space than we can afford, so we encourage you to read up on database management systems (DBMS), SQL, relational algebra, and SQLAlchemy.

Now that we have our models, let’s use them to persist some data!

Take a look at the following example:

# persistence/alchemy.py
from alchemy_models import Person, Address, engine
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)
session = Session()

First we create session, which is the object we use to manage the database. Next, we proceed by creating two people:

anakin = Person(name='Anakin Skywalker', age=32)
obi1 = Person(name='Obi-Wan Kenobi', age=40)

We then add email addresses to both of them, using two different techniques. One assigns them to a list, and the other one simply appends them:

obi1.addresses = [
Address(email='obi1@example.com'),
Address(email='wanwan@example.com'),
]

anakin.addresses.append(Address(email='ani@example.com'))
anakin.addresses.append(Address(email='evil.dart@example.com'))
anakin.addresses.append(Address(email='vader@example.com'))

We haven’t touched the database yet. It’s only when we use the session object that something actually happens in it:

session.add(anakin)
session.add(obi1)
session.commit()

Adding the two Person instances is enough to also add their addresses (this is thanks to the cascading effect). Calling commit() is what actually tells SQLAlchemy to commit the transaction and save the data in the database. A transaction is an operation that provides something like a sandbox, but in a database context.

As long as the transaction hasn’t been committed, we can roll back any modification we have done to the database, and by doing so, revert to the state we were in before starting the transaction itself. SQLAlchemy offers more complex and granular ways to deal with transactions, which you can study in its official documentation, as it is quite an advanced topic. We now query for all the people whose name starts with Obi by using like(), which hooks to the LIKE operator in SQL:

obi1 = session.query(Person).filter(
Person.name.like('Obi%')
).first()
print(obi1, obi1.addresses)

We take the first result of that query (we know we only have Obi-Wan anyway), and print it. We then fetch anakin by using an exact match on his name, just to show you a different way of filtering:

anakin = session.query(Person).filter(
Person.name=='Anakin Skywalker'
).first()
print(anakin, anakin.addresses)

We then capture Anakin’s ID, and delete the anakin object from the global frame (this does not delete the entry from the database):

anakin_id = anakin.id
del anakin

The reason we do this is because we want to show you how to fetch an object by its ID. Before we do that, we write the display_info() function, which we will use to display the full content of the database (fetched starting from the addresses, in order to demonstrate how to fetch objects by using a relation attribute in SQLAlchemy):

def display_info():

# get all addresses first
  addresses = session.query(Address).all()
# display results
  for address in addresses:
    print(f'{address.person.name} <{address.email}>')
# display how many objects we have in total
  print('people: {}, addresses: {}'.format(
    session.query(Person).count(),
    session.query(Address).count())
  )

The display_info() function prints all the addresses, along with the respective person’s name, and, at the end, produces a final piece of information regarding the number of objects in the database. We call the function, then we fetch and delete anakin. Finally, we display the info again, to verify that he has actually disappeared from the database:

display_info()

anakin = session.query(Person).get(anakin_id)
session.delete(anakin)
session.commit()

display_info()

The output of all these snippets run together is the following (for your convenience, we have separated the output into four blocks, to reflect the four blocks of code that actually produce that output):

$ python alchemy.py
Obi-Wan Kenobi(id=2) [obi1@example.com, wanwan@example.com]

Anakin Skywalker(id=1) [
  ani@example.com, evil.dart@example.com, vader@example.com
]

Anakin Skywalker <ani@example.com>
Anakin Skywalker <evil.dart@example.com>
Anakin Skywalker <vader@example.com>
Obi-Wan Kenobi <obi1@example.com>
Obi-Wan Kenobi <wanwan@example.com>
people: 2, addresses: 5

Obi-Wan Kenobi <obi1@example.com>
Obi-Wan Kenobi <wanwan@example.com>
people: 1, addresses: 2

As you can see from the last two blocks, deleting anakin has deleted one Person object and the three addresses associated with it. Again, this is due to the fact that cascading took place when we deleted anakin.

This concludes our brief introduction to data persistence. It is a vast and, at times, complex domain that we encourage you to explore, learning as much theory as possible. Lack of knowledge or proper understanding, when it comes to database systems, can really bite.

That’s it for today. It’s the last one from the File Handling Series. See you soon with a new series.

If you love my blogs consider purchasing me a book.

Python’s Gurus🚀

Thank you for being a part of the Python’s Gurus community!

Before you go:

Be sure to clap x50 time and follow the writer ️👏️️
Follow us: Newsletter
Do you aspire to become a Guru too? Submit your best article or draft to reach our audience.