Manage cleanup of blob data in Amazon PostgreSQL RDS

dev writer
3 min readAug 30, 2021

--

Amazon provide managed Postgresql relational database for your cloud based applications. The biggest advantage of this compared to hosting your own server with PostgreSQL installed is that Amazon will take care of all the database administrative task of storage management, replication, high availability and throughput. Of course it all comes with a price. Since this a fully managed instance by Amazon, it limits the tools that you can run on these databases. One such tool is vacuumlo, this is the tool for reclaiming memory no longer referenced by any binary large object. But wait what is a binary large object or blob?

“PostgreSQL has a large object facility, which provides stream-style access to user data that is stored in a special large-object structure. Streaming access is useful when working with data values that are too large to manipulate conveniently as a whole” as explained in PostgreSQL documentation here. All large objects are stored in a single system table named pg_largeobject. Each large object also has an entry in the system table pg_largeobject_metadata.

Any PostgreSQL table having blob data type maintains the OID of the actual large object and the actual object storage is in pg_largeobject table. When the row from the table is deleted, the memory used by the large object is not cleaned up unlike other data types. This is because the same oid could be referenced in some other user table too.

To free the memory used up by large object that is no longer referenced by the user tables lo_unlink(<oid>) function needs to be invoked. In traditional standalone PostgreSQL setup, a utility called vacuum is available the one we talked about earlier in this article. It will clean up memory associated with all oids that correspond to deleted rows from table. It does this by finding all the oids that are currently present(hence in use) in all the tables in the database, it will then unlink all remaining oids that are present in pg_largeobject. The unlink as mentioned above frees up the memory linked to the oid. If however we do not invoke either of lo_unlink or vaccumlo then the database will keep on bloating over a period of time affecting performance and eventually functionality.

Cleaning up Large Objects through application code.

Any application which is using db is aware of the tables and schema. One possible approach could be to unlink simultaneously with delete for any entity. What this means is that a delete entity operation on table containing blob data is now a three step operation instead of earlier one (just delete).

  1. Fetch OID value from the column of the to be deleted row
  2. Delete the row
  3. Invoke the lo_unlink for OIDs from step 1

The below snippet shows the code

@Override
public void deleteByIds(Collection<Long> ids) {

//Step 1 - Get the oids corresponding to the ids
List<BigInteger> oidsToUnlink = getOidsForIds(ids);

//Step 2 Perform the deletion
deleteEntityById(ids);

//Step 3 Unlink the oids
unlinkOids(oidsToUnlink);

}

The tricky part if how to fetch the OID’s long value because the lo_unlink function needs the long value. The below snippet shows the example

private List<BigInteger> getOidsForIds(Collection<Long> ids) {    Query query = entityManager.createNativeQuery("Select user_object from users where user_id in :ids ");

query.setParameter("ids" , ids);
List resultList = query.getResultList(); return (List<BigInteger>) resultList;

}

I am using JPA here but you can use JDBCTemplate too basically any interface that supports running native SQL queries. The user_object column has the OID value and user_id is the primary key.

Once We have the OIDs then invoke the unlink for the OIDs sequentially.

private void unlinkOids(List<BigInteger> oidsToUnlink) {    for (BigInteger oid: oidsToUnlink) {        try {
Query query = entityManager.createNativeQuery("Select * from lo_unlink(:oidToUnlink) ");
query.setParameter("oidToUnlink", oid); query.getResultList();
}
catch (Exception ex) {

}
}
}

Summary : So we worked around the problem of missing vaccum_lo by cleaning up the OIDs through our application code.

--

--

dev writer

Avid reader, software developer, father, introvert. Here on medium to vent out my thoughts and learnings.