Symfony — Detect All Changes On Doctrine Entities

Filip Horvat
6 min read · Feb 27, 2024


Detecting all changes on Doctrine entities can be challenging. Let’s explore why.

Suppose we want to track all changes in the database because we want to mirror them in Elasticsearch (ES).

We aim to use Doctrine’s postPersist, postUpdate, and postRemove events, assuming that these events cover all changes in our database. In this example, we will disregard manual changes in the database, migration rollbacks, and bulk DQL queries, which bypass entity lifecycle events entirely, such as:

$this->entityManager->createQuery('UPDATE '.ES_User::class." e SET e.name = 'test@test.com' WHERE e.id = 1")->execute();

We specifically intend to track all changes made through Doctrine entities, such as the one illustrated here:

$user = new User();
$this->entityManager->persist($user);
$this->entityManager->flush();

When the listener receives a postPersist, postUpdate, or postRemove event, our goal is to dispatch a message containing the entity’s className and identifier. A worker then consumes the message from a queue and synchronizes the entity with Elasticsearch outside the current execution, so Elasticsearch updates never slow down our application. Because the message carries only the className and identifier, a single generic Elasticsearch message handler can do the rest of the work, and entity-specific cases can be handled in one central place:

use Doctrine\Bundle\DoctrineBundle\Attribute\AsDoctrineListener;
use Doctrine\ORM\Event\PostPersistEventArgs;
use Doctrine\ORM\Event\PostRemoveEventArgs;
use Doctrine\ORM\Event\PostUpdateEventArgs;
use Doctrine\ORM\Events;

#[AsDoctrineListener(event: Events::postPersist)]
#[AsDoctrineListener(event: Events::postUpdate)]
#[AsDoctrineListener(event: Events::postRemove)]
class DoctrineListener
{
    public function postPersist(PostPersistEventArgs $args): void
    {
        // dispatch a "created" message with className and identifier
    }

    public function postUpdate(PostUpdateEventArgs $args): void
    {
        // dispatch an "updated" message with className and identifier
    }

    public function postRemove(PostRemoveEventArgs $args): void
    {
        // dispatch a "deleted" message with className and identifier
    }
}

The listener should be universal, meaning it applies to every entity in our application.

In each of the above methods (postPersist, postUpdate, and postRemove), our objective is to retrieve the className and identifier, then dispatch a message containing this data. For instance, the postPersist implementation might resemble the following:

public function postPersist(PostPersistEventArgs $args): void
{
    // get the entity
    $entity = $args->getObject();

    // get the class name of the entity
    $className = get_class($entity);

    // get the identifier of the entity through its getter, e.g. "id" -> getId()
    $meta = $this->entityManager->getClassMetadata($className);
    $idField = $meta->getSingleIdentifierFieldName();
    $getter = 'get'.ucfirst($idField);
    $identifier = $entity->{$getter}();

    $this->bus->dispatch(new EntityChanged($className, $identifier, 'created'));
    // dispatches, for example, ('App\Entity\User', 1, 'created')
}

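We want a universal listener, which is why we fetch the identifier this way: the application may use identifiers of various types and names, and our goal is to cover all of them. Even so, this version assumes a conventional getter, and getSingleIdentifierFieldName() throws a MappingException for composite keys. ClassMetadata::getIdentifierValues($entity) avoids both assumptions by returning the identifier as a field => value array. As this example shows, the process is not as trivial and elegant as it might initially seem.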
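The snippets dispatch an `EntityChanged` message whose class the article does not show. Below is a minimal sketch, assuming PHP 8.1+ and a plain value object routed to an async Messenger transport. The constructor arguments mirror the `('App\Entity\User', 1, 'created')` call above; the allowed action list and the validation are illustrative additions, not part of the original design.

```php
<?php

// Hypothetical message class: carries only what a worker needs to reload the row.
final class EntityChanged
{
    private const ACTIONS = ['created', 'updated', 'deleted'];

    public function __construct(
        public readonly string $className,
        // a scalar ID, or a field => value array such as getIdentifierValues() returns
        public readonly int|string|array $identifier,
        public readonly string $action,
    ) {
        // Illustrative guard: reject actions the generic handler would not understand.
        if (!in_array($action, self::ACTIONS, true)) {
            throw new InvalidArgumentException("Unknown action: {$action}");
        }
    }
}

$message = new EntityChanged('App\Entity\User', 1, 'created');
```

Keeping the payload to class name, identifier and action keeps the message trivially serializable, and the worker always loads the current state from the database instead of a snapshot taken inside the request.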

While it might seem natural to use postPersist for inserts, note that postPersist is dispatched during flush(), right after the INSERT has been executed but before the flush transaction is committed. Consequently, if we dispatch a message with the entity’s ID and className at that point and a worker picks it up quickly, the worker may not find the entity: under the usual isolation levels, the new row is not visible to other database connections until the commit. This is the first challenge in our approach.

Another issue is that a change may never actually be committed, for various reasons:

$this->entityManager->beginTransaction();
$this->entityManager->persist($entity);
$this->entityManager->flush();    // postPersist fires here
$this->entityManager->rollback(); // ...yet the INSERT is rolled back

Note that persist() alone does not dispatch postPersist (only prePersist); the post* events fire during flush(). Still, we cannot guarantee that a change for which postPersist was dispatched is actually stored: flush() can throw after some statements have already run, in which case Doctrine rolls back its transaction, and when flush() runs inside an outer transaction, as above, that transaction can still be rolled back afterwards. If we listen for postPersist and sync the change to Elasticsearch immediately, we may sync a change that was never stored. This may seem unlikely in any single code path, but a generic solution for all entities and cases cannot rule it out, so it’s crucial to be aware of this limitation.

A related issue arises when flushing many entities, since the flush can take seconds or even minutes. During that window, a worker that handles an early message reads the row from the database and may still see the old, committed state (or no row at all, for a new entity), so Elasticsearch can end up out of sync with the data committed moments later. This may be acceptable in many scenarios, but developers should be aware of the potential delay and mismatch between the database and Elasticsearch.

You might consider using postFlush instead. However, PostFlushEventArgs only gives you the entity manager, not the entities that changed. A common workaround is to collect the entities in postPersist, postUpdate, and postRemove and run the actual logic in postFlush. This works, but it is admittedly a kind of hack, because it needs extra bookkeeping to achieve the desired behavior. postFlush runs only after Doctrine has committed the flush transaction, although an outer transaction opened around flush() can still roll back.

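use Doctrine\Bundle\DoctrineBundle\Attribute\AsDoctrineListener;
use Doctrine\ORM\Event\PostFlushEventArgs;
use Doctrine\ORM\Event\PostPersistEventArgs;
use Doctrine\ORM\Event\PostRemoveEventArgs;
use Doctrine\ORM\Event\PostUpdateEventArgs;
use Doctrine\ORM\Event\PreRemoveEventArgs;
use Doctrine\ORM\Events;
use Symfony\Component\Messenger\MessageBusInterface;

#[AsDoctrineListener(event: Events::postPersist)]
#[AsDoctrineListener(event: Events::postUpdate)]
#[AsDoctrineListener(event: Events::preRemove)]
#[AsDoctrineListener(event: Events::postRemove)]
#[AsDoctrineListener(event: Events::postFlush)]
class DoctrineListener
{
    // [entity, action] pairs collected during the current flush
    protected array $entities = [];

    // identifiers captured in preRemove, keyed by spl_object_id(),
    // because postRemove sees a generated ID that is already reset to null
    protected array $removedIds = [];

    public function __construct(private MessageBusInterface $bus)
    {
    }

    public function postPersist(PostPersistEventArgs $args): void
    {
        $this->entities[] = [$args->getObject(), 'created'];
    }

    public function postUpdate(PostUpdateEventArgs $args): void
    {
        $this->entities[] = [$args->getObject(), 'updated'];
    }

    public function preRemove(PreRemoveEventArgs $args): void
    {
        $entity = $args->getObject();
        $meta = $args->getObjectManager()->getClassMetadata(get_class($entity));
        $this->removedIds[spl_object_id($entity)] = $meta->getIdentifierValues($entity);
    }

    public function postRemove(PostRemoveEventArgs $args): void
    {
        $this->entities[] = [$args->getObject(), 'deleted'];
    }

    public function postFlush(PostFlushEventArgs $args): void
    {
        $entityManager = $args->getObjectManager();
        $entities = $this->entities;
        // reset first, so a failing dispatch cannot replay these entries on the next flush
        $this->entities = [];

        foreach ($entities as [$entity, $action]) {
            $key = spl_object_id($entity);
            $meta = $entityManager->getClassMetadata(get_class($entity));
            $identifier = $action === 'deleted'
                ? ($this->removedIds[$key] ?? [])
                : $meta->getIdentifierValues($entity);
            unset($this->removedIds[$key]);

            // getName() is the real class name, even when $entity is a proxy
            $this->bus->dispatch(new EntityChanged($meta->getName(), $identifier, $action));
        }
    }
}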
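The postFlush bookkeeping can also be kept in a small, framework-independent buffer. The following is a minimal sketch, assuming the listener records a class name, a scalar identifier and an action per change; `ChangeBuffer`, `record()` and `drain()` are hypothetical names. Entries are keyed by class and ID so the same row is dispatched only once per flush (useful, for example, when several deleted roles point at the same user), and a deletion wins over an earlier created/updated entry, on the assumption that the handler treats "created" and "updated" alike as "reindex from the database".

```php
<?php

// Hypothetical, framework-independent buffer: a listener would call record()
// from postPersist/postUpdate/postRemove and drain() from postFlush.
final class ChangeBuffer
{
    /** @var array<string, array{className: string, id: int|string, action: string}> */
    private array $pending = [];

    public function record(string $className, int|string $id, string $action): void
    {
        $key = $className . '#' . $id;

        // A deletion wins over any later created/updated entry for the same row.
        if (($this->pending[$key]['action'] ?? null) === 'deleted') {
            return;
        }

        $this->pending[$key] = ['className' => $className, 'id' => $id, 'action' => $action];
    }

    /** Returns the pending changes in insertion order and empties the buffer. */
    public function drain(): array
    {
        $changes = array_values($this->pending);
        $this->pending = [];

        return $changes;
    }
}

$buffer = new ChangeBuffer();
$buffer->record('App\Entity\User', 1, 'updated');
$buffer->record('App\Entity\User', 1, 'updated');
$buffer->record('App\Entity\Role', 3, 'deleted');
$buffer->record('App\Entity\Role', 3, 'updated');

$changes = $buffer->drain();
```

Draining empties the buffer before the caller dispatches anything, so an exception thrown by the message bus cannot make the same changes reappear on the next flush, at the cost of losing that batch.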

Another issue is that you will sometimes receive proxy objects, like this:

public function postUpdate(PostUpdateEventArgs $args): void
{
    dump(get_class($args->getObject()));
    // Proxies\__CG__\App\Entity\User
}

I will not delve into the details of why this happens, but in such cases, if you intend to dispatch a message with the class name and identifier, you will need to retrieve the actual class using:

use Doctrine\Common\Util\ClassUtils;
// ...
// $className is Proxies\__CG__\App\Entity\User
$className = ClassUtils::getRealClass($className);
// $className is now App\Entity\User
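The resolution itself is plain string handling: Doctrine’s proxy class names put the marker `__CG__` between the proxy namespace and the real class name. A minimal standalone sketch of that idea, assuming the default marker; `realClassName()` is a hypothetical helper, not part of Doctrine.

```php
<?php

// Hypothetical helper mirroring the string handling of ClassUtils::getRealClass()
// for the default proxy marker: drop everything up to and including "\__CG__\".
function realClassName(string $className): string
{
    $marker = '\\__CG__\\';
    $pos = strrpos($className, $marker);

    // No marker found: the name already belongs to a real class.
    if ($pos === false) {
        return $className;
    }

    return substr($className, $pos + strlen($marker));
}

$real = realClassName('Proxies\__CG__\App\Entity\User');
```

Inside a listener you can also let Doctrine resolve it: `$entityManager->getClassMetadata(get_class($entity))->getName()` returns the real entity class name for a proxy as well.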

One more noteworthy issue involves the deletion of entities, particularly when there’s a need to update related entities. For instance, if you have entities like User and Role, where the User JSON in ES contains embedded roles:

{
  "id" : 1,
  "name" : "test",
  "roles" : [
    {
      "id" : 1,
      "name" : "Role one"
    },
    {
      "id" : 2,
      "name" : "Role two"
    }
  ]
}

Deleting a role requires updating the related entities (users). This scenario presents a challenge as it involves handling related updates or deletions in ES, ensuring that the data remains consistent and accurately reflects the changes made in the database.

Because roles are embedded, you can filter the users index in ES by role, for example asking ES for all users that have the role named “Role one”. If that role is removed from the database, we need to track the change, update the JSON of every affected user accordingly, and remove the role from it:

{
  "id" : 1,
  "name" : "test",
  "roles" : [
    {
      "id" : 2,
      "name" : "Role two"
    }
  ]
}
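On the Elasticsearch side, the worker has to turn the first document into the second for every affected user. Here is a minimal sketch of that transformation on the decoded document, assuming roles are embedded as an array of `{id, name}` objects as in the example; `withoutRole()` is a hypothetical name. Re-serializing each user from the database is an equally valid alternative, since the users still exist after the role is deleted.

```php
<?php

// Hypothetical transformation applied to each affected user document
// after a role is deleted: drop the embedded role with the given ID.
function withoutRole(array $userDocument, int $roleId): array
{
    $userDocument['roles'] = array_values(array_filter(
        $userDocument['roles'] ?? [],
        static fn (array $role): bool => $role['id'] !== $roleId,
    ));

    return $userDocument;
}

$user = json_decode(
    '{"id": 1, "name": "test", "roles": [{"id": 1, "name": "Role one"}, {"id": 2, "name": "Role two"}]}',
    true,
);
$updated = withoutRole($user, 1);
```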

So if you listen to the postRemove event when a role is removed, you will not be able to fetch the role’s users in order to update every ES user document that contains the role. In postRemove, Doctrine has already reset a generated identifier to null, so you can get the className but not the identifier. That also means a worker cannot fetch the entity, because at that point it no longer exists in the database.

So you should probably listen to the preRemove event instead, which fires as soon as remove() is called, while the role and its identifier are still intact. But then you need to store all related entities somewhere and trigger the ES sync for them later, in postFlush, because you do not want to sync changes to ES before they are made, or when you are not sure they will be made at all. Doing that is neither elegant nor trivial.

For these reasons, fetching the related entities of a deleted entity has to happen in preRemove, which contradicts our initial idea of delegating everything to the ES message handler and handling all tasks in workers.
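The preRemove bookkeeping can be modelled without Doctrine to make the order of events explicit. This is a minimal sketch under stated assumptions: `User`, `Role`, `RoleRemovalCollector` and its method names are hypothetical, and in a real listener the methods would be called from preRemove, postRemove and postFlush, with the role’s users collection possibly being loaded lazily in preRemove. Each ID returned by `postFlush()` would then become one "updated" message for that user.

```php
<?php

// Hypothetical stand-ins for the Role and User entities (no Doctrine involved).
final class User
{
    public function __construct(public int $id) {}
}

final class Role
{
    /** @param list<User> $users */
    public function __construct(public ?int $id, public array $users = []) {}
}

// Hypothetical collector: captures related user IDs in preRemove, confirms them in
// postRemove (the DELETE has run) and releases them once per flush in postFlush.
final class RoleRemovalCollector
{
    /** @var array<int, list<int>> user IDs captured in preRemove, keyed by spl_object_id() */
    private array $captured = [];

    /** @var list<int> user IDs whose role deletion has been executed */
    private array $confirmed = [];

    public function preRemove(Role $role): void
    {
        // The role still has its ID and users here, so read the related IDs now.
        $this->captured[spl_object_id($role)] = array_map(
            static fn (User $user): int => $user->id,
            $role->users,
        );
    }

    public function postRemove(Role $role): void
    {
        // Do not rely on $role->id here: Doctrine nulls generated IDs before postRemove.
        $key = spl_object_id($role);
        array_push($this->confirmed, ...($this->captured[$key] ?? []));
        unset($this->captured[$key]);
    }

    /** @return list<int> each affected user ID once; empties the list */
    public function postFlush(): array
    {
        $ids = array_values(array_unique($this->confirmed));
        $this->confirmed = [];

        return $ids;
    }
}

$alice = new User(10);
$bob = new User(11);
$roleOne = new Role(1, [$alice, $bob]);
$roleTwo = new Role(2, [$alice]);

$collector = new RoleRemovalCollector();
$collector->preRemove($roleOne); // $entityManager->remove($roleOne)
$collector->preRemove($roleTwo); // $entityManager->remove($roleTwo)

// Simulated flush(): the DELETEs run, generated IDs are nulled, then postRemove fires.
foreach ([$roleOne, $roleTwo] as $role) {
    $role->id = null;
    $collector->postRemove($role);
}

$toReindex = $collector->postFlush();
```

Capturing in preRemove but releasing only after postRemove means a remove() that is never flushed produces no messages; its captured entry simply goes unused.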

The above are just a few of the issues you should be aware of when developing a universal Doctrine listener to cover all changes made, enabling you to update Elasticsearch or use it in other scenarios.

That’s all. I hope you enjoyed it!


Filip Horvat

Senior Software Engineer, Backend PHP Developer, Located in Croatia, Currently working at myzone.com