Please don’t: using the same models for write and read in PHP

Fernando Castillo
Oct 20, 2024

Models are a great tool for communicating with a data store. They let us define what the data looks like, which ensures it is compatible with the data store, typically a database. Once we have a model that validates our input and helps us write that data, we could be tempted to use it for retrieving data as well. Except in basic CRUD applications, that is usually not a good idea. Let’s see why.

Set up a model to work with

Let’s use a simple User model and a repository interface; we don’t need the implementation details here. Let’s also assume we have some assertion library that we use to make sure every model we create is valid.

class User
{
    public function __construct(
        public string $email,
        public string $name,
        public string $password,
    ) {
        Assert::email($email);
        Assert::notEmpty($name);
        Assert::password($password, strength: 3);
    }
}

interface UserRepository
{
    public function save(User $user): void;
}

So, in the main use case, we receive the data for a new user and the model validates that the name is not empty, the email is a valid email address and the password complies with whatever we defined as strength level 3. Then we send it to the repository and save it. Job done.

$user = new User(
    $request->get('email'),
    $request->get('name'),
    $request->get('password'),
);

$repository->save($user);

Problem: Model properties that should not be read

Now we want to read a user by email from the database and return a JSON representation of it, so that a client can display a user profile. What happens if we add a read method to our repository that reuses the same model?

interface UserRepository
{
    public function save(User $user): void;
    public function get(string $email): User;
}

// Inside some controller class
return new Response(
    json_encode(
        $repository->get($request->get('email'))
    ),
);

So what are we getting here?

{
    "email": "peter@dailybugle.com",
    "name": "Peter Parker",
    "password": "$2y$10$OEaTphGkW0HQv4QNxtptQOE.BSQDnpwrB.\/VGqIgjMEhvIr22jnFK"
}

The first thing that should cross our minds when we see this is that passwords, even hashed ones, should never, ever be sent from the server in any kind of communication. This is a serious security concern.

Although this is probably the worst possible information leak caused by using a write model as a read model, it’s not the only one. Another common issue is simply sending irrelevant information to the client. For example, we could have an active boolean for enabling or disabling users that is useless to the client, because if the user is not active the request responds with a 404 Not Found anyway. Irrelevant data means sending bytes that will never be consumed, which hurts performance. It may be little, but it all adds up, and this has an easy solution.

So what do we do? Add a method that returns a restricted list of data? That could solve these problems.

class User
{
    // ...

    public function read(): array
    {
        return [
            'email' => $this->email,
            'name' => $this->name,
        ];
    }
}
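If we want the controller to keep calling `json_encode()` on the model directly, one way to make it emit only this restricted list is PHP’s built-in `JsonSerializable` interface: `json_encode()` calls `jsonSerialize()` instead of dumping every public property. A minimal sketch, assuming the `User` class above with its assertions left out so it runs on its own:

```php
<?php

// Trimmed-down User (validation omitted) that controls its own JSON output.
final class User implements JsonSerializable
{
    public function __construct(
        public string $email,
        public string $name,
        public string $password,
    ) {}

    // Only the fields a client may see; the password hash is never included.
    public function read(): array
    {
        return [
            'email' => $this->email,
            'name' => $this->name,
        ];
    }

    // json_encode() calls this for any object implementing JsonSerializable.
    public function jsonSerialize(): array
    {
        return $this->read();
    }
}

$user = new User('peter@dailybugle.com', 'Peter Parker', '$2y$10$somehash');
echo json_encode($user), PHP_EOL;
// {"email":"peter@dailybugle.com","name":"Peter Parker"}
```

The password no longer leaks, but the same class is still doing double duty for writing and reading.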

But there are more issues to solve. Let’s see.

Problem: Unnecessary validations

Speaking of performance, we have validations in the model constructor, but are they needed when we fetch data that is already in the database? The data must have been valid when it was stored, so it can be argued that running those validations again is a waste.

And it’s not only wasteful; it can be a real problem. Validations evolve, and if we read through a write model that runs them, that evolution can break our ability to fetch results. Suppose an application validates that user emails have a valid format, and at some point a new rule is added to blacklist some domains. The validation is updated, but the existing users can’t simply be changed, because they still expect communications at that email address.

Now we get a request for a list of 100 users, one of which has a blacklisted domain. What happens? The whole request fails. And what do we send back? A 400 Bad Request, as if some user input were wrong? This is not the client’s fault but the server’s, so it should be some kind of 500 error.
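To make the failure concrete, here is a sketch in which the email rule gained a domain blocklist after some users were stored. The hypothetical `Assert` calls are replaced with plain `filter_var()` checks and an `InvalidArgumentException`, and the blocked domain and sample rows are invented for the example:

```php
<?php

final class User
{
    // Hypothetical rule added later: emails from these domains are now rejected.
    private const BLOCKED_DOMAINS = ['oldmail.example'];

    public function __construct(
        public string $email,
        public string $name,
    ) {
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException("Invalid email: $email");
        }
        $domain = substr(strrchr($email, '@'), 1);
        if (in_array($domain, self::BLOCKED_DOMAINS, true)) {
            throw new InvalidArgumentException("Blocked domain: $domain");
        }
    }
}

// Rows as they might come back from the database; the second one was
// stored before the blocklist rule existed.
$rows = [
    ['email' => 'peter@dailybugle.com', 'name' => 'Peter Parker'],
    ['email' => 'mj@oldmail.example', 'name' => 'Mary Jane'],
    ['email' => 'betty@dailybugle.com', 'name' => 'Betty Brant'],
];

$users = null;
$error = null;
try {
    // Hydrating through the validating constructor re-runs today's rules on old data.
    $users = array_map(fn (array $row) => new User($row['email'], $row['name']), $rows);
} catch (InvalidArgumentException $e) {
    // One legacy row makes the whole list fail, although the client did nothing wrong.
    $error = $e->getMessage();
}

echo $error ?? 'Listing succeeded', PHP_EOL;
```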

To avoid this, I’ve seen some complex solutions that use Reflection to create an instance without calling the constructor. If we really had to use the write model where we don’t want validation, I would rather move the assertions to a static constructor, like this.

class User
{
    public function __construct(
        public string $email,
        public string $name,
        public string $password,
    ) {}

    public static function create(string $email, string $name, string $password): self
    {
        Assert::email($email);
        Assert::notEmpty($name);
        Assert::password($password, strength: 3);

        return new self($email, $name, $password);
    }
}

This way, when creating a new model that requires validation I can call User::create(), and use the plain constructor when fetching data from the database. This solves some issues, but there are more.
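A self-contained sketch of that split, with the hypothetical `Assert` calls swapped for plain PHP checks so it runs on its own. The password strength rule is reduced to a minimum length here, which is an assumption for the example and not what `strength: 3` means:

```php
<?php

final class User
{
    // Plain constructor: no validation, used when hydrating rows from the database.
    public function __construct(
        public string $email,
        public string $name,
        public string $password,
    ) {}

    // Named constructor for new input: validates before building the model.
    public static function create(string $email, string $name, string $password): self
    {
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException('Invalid email');
        }
        if (trim($name) === '') {
            throw new InvalidArgumentException('Name must not be empty');
        }
        // Stand-in for Assert::password(..., strength: 3): only a length check here.
        if (strlen($password) < 12) {
            throw new InvalidArgumentException('Password too weak');
        }

        return new self($email, $name, $password);
    }
}

// New user coming from a request: goes through the validating factory.
$new = User::create('peter@dailybugle.com', 'Peter Parker', 'correct-horse-battery');

// Row read from the database: the constructor accepts it as stored,
// even if it would fail today's rules.
$stored = new User('not-an-email', 'Old Account', '$2y$10$storedhash');
```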

Problem: Adding extra data to the model

Another common situation is the client needing some more data for the view. In our example, the view might need to show the number of comments the user has posted. That’s not part of the model, but it seems wasteful not to include it in the same HTTP response and make the client wait for a second one just because the data does not match the write model.

Even if we add the data to the same response, sticking to the write model means we can’t use a single database query to get the whole set of data, even though in many cases a simple SQL join would do. Instead, we fetch the write model, run another query for the missing data, and combine both before sending them to the client.

return new Response(
    json_encode(
        array_merge(
            // read() returns the restricted array; array_merge() cannot take the User object itself
            $repository->get($request->get('email'))->read(),
            ['comments' => $commentRepository->count($request->get('email'))]
        )
    ),
);

It works, but it means an extra database query, with its impact on performance. It also hurts reusability: you can’t just call the repository somewhere else, you also need to copy and paste the comments part.

Problem: Are inserts and updates really the same?

One last problem, which is not strictly about write versus read models: when we update a model, can we really use the same class we use to create it?

When we create a new user with this model, we expect a name, an email and a password. That’s fine for creating a user, but in our example our security expert requires passwords to be updated in a specific way: the user requests a password change, an email with a time-limited token is sent to the user, and that token is validated before the new password is accepted.

The password must never be updated in any other way. So what do we do if we reuse the model we already have for updating the user? We end up with two different places in the code that update the user: one for the password and another for everything else.

interface UserRepository
{
    public function save(User $user): void;
    public function update(User $user): void;
}

// Updating name
$user = new User(
    $request->get('email'),
    $request->get('name'),
    'WHAT DO WE DO WITH PASSWORD HERE?',
);

$repository->update($user);

// Updating password
$user = new User(
    $request->get('email'),
    'WHAT DO WE DO WITH NAME HERE?',
    $request->get('password'),
);

$repository->update($user);

Now the model carries data that must not be processed, which makes our repository implementation unnecessarily complex. It also forces whoever creates the model to provide data that isn’t available and won’t be used, which makes the code much harder to understand. Finally, it makes the implementation fragile: used incorrectly, it can update something that should not be updated, just because it is in the model. If processing a name change triggers a password update, that’s a serious problem.
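Here is a sketch of how that goes wrong, assuming a naive repository (hypothetical, an in-memory array standing in for a full-row `UPDATE` statement) that writes every property of the model it receives:

```php
<?php

final class User
{
    public function __construct(
        public string $email,
        public string $name,
        public string $password,
    ) {}
}

// Hypothetical repository keyed by email that overwrites every column on update.
final class InMemoryUserRepository
{
    /** @var array<string, array{name: string, password: string}> */
    private array $rows = [];

    public function save(User $user): void
    {
        $this->rows[$user->email] = ['name' => $user->name, 'password' => $user->password];
    }

    public function update(User $user): void
    {
        // Same as save(): the whole row is replaced, password included.
        $this->rows[$user->email] = ['name' => $user->name, 'password' => $user->password];
    }

    public function row(string $email): array
    {
        return $this->rows[$email];
    }
}

$repository = new InMemoryUserRepository();
$repository->save(new User('peter@dailybugle.com', 'Peter Parker', '$2y$10$realhash'));

// A name change forced to pass *something* as the password...
$repository->update(new User('peter@dailybugle.com', 'Spider-Man', ''));

// ...silently wipes the stored password hash.
print_r($repository->row('peter@dailybugle.com'));
```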

Solution: Individual model for each case

How can we solve all the problems when reading a user? A dedicated model will do.

final readonly class UserRead
{
    public function __construct(
        public string $email,
        public string $name,
        public int $commentCount,
    ) {}
}

We can have another repository to fetch it.

interface UserReadRepository
{
    public function get(string $email): UserRead;
}

Its implementation, assuming a relational SQL database, would not select the password from the table, since the password is not in the read model, solving problem number 1. The read model has no validations, solving problem number 2. And it has a place for the comment count, which the new repository can fill with a join in a single query, solving problem number 3.
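A minimal sketch of such an implementation, using PDO with an in-memory SQLite database so it runs standalone. The `users` and `comments` table layouts, the `PdoUserReadRepository` name and the exception on a missing user are assumptions for the example; `UserRead` and `UserReadRepository` are repeated from above with readonly properties instead of a readonly class:

```php
<?php

final class UserRead
{
    public function __construct(
        public readonly string $email,
        public readonly string $name,
        public readonly int $commentCount,
    ) {}
}

interface UserReadRepository
{
    public function get(string $email): UserRead;
}

// Hypothetical PDO implementation: one query, no password column, no validation.
final class PdoUserReadRepository implements UserReadRepository
{
    public function __construct(private PDO $pdo) {}

    public function get(string $email): UserRead
    {
        // The LEFT JOIN keeps users with no comments; COUNT(c.id) then yields 0.
        $statement = $this->pdo->prepare(
            'SELECT u.email, u.name, COUNT(c.id) AS comment_count
               FROM users u
               LEFT JOIN comments c ON c.user_email = u.email
              WHERE u.email = :email
              GROUP BY u.email, u.name'
        );
        $statement->execute(['email' => $email]);
        $row = $statement->fetch(PDO::FETCH_ASSOC);

        if ($row === false) {
            throw new RuntimeException("User not found: $email");
        }

        return new UserRead($row['email'], $row['name'], (int) $row['comment_count']);
    }
}

// Throwaway in-memory schema and data, invented for the example.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT NOT NULL, password TEXT NOT NULL)');
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, user_email TEXT NOT NULL, body TEXT NOT NULL)');
$pdo->exec("INSERT INTO users (email, name, password) VALUES ('peter@dailybugle.com', 'Peter Parker', 'hashed-password')");
$pdo->exec("INSERT INTO comments (user_email, body) VALUES ('peter@dailybugle.com', 'First!'), ('peter@dailybugle.com', 'Nice photo')");

$repository = new PdoUserReadRepository($pdo);
echo json_encode($repository->get('peter@dailybugle.com')), PHP_EOL;
// {"email":"peter@dailybugle.com","name":"Peter Parker","commentCount":2}
```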

What’s more, if we need other representations of a user, each one should get its own read model. We could have a UserWithLastCommentsRead, for example.
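The exact shape depends on the view; as one hypothetical example, such a model could hold a list of small comment read models. The `CommentRead` class and its fields are invented for illustration:

```php
<?php

// Hypothetical read model for a single comment, shaped for this view only.
final class CommentRead
{
    public function __construct(
        public readonly string $body,
        public readonly string $createdAt,
    ) {}
}

// Hypothetical second representation of a user: profile data plus the latest comments.
final class UserWithLastCommentsRead
{
    /** @param list<CommentRead> $lastComments */
    public function __construct(
        public readonly string $email,
        public readonly string $name,
        public readonly array $lastComments,
    ) {}
}

$view = new UserWithLastCommentsRead(
    'peter@dailybugle.com',
    'Peter Parker',
    [
        new CommentRead('Nice photo', '2024-10-19 18:30:00'),
        new CommentRead('First!', '2024-10-18 09:15:00'),
    ],
);

echo json_encode($view, JSON_PRETTY_PRINT), PHP_EOL;
```

Each read model stays small and tied to one view, so changing one screen does not ripple into the others.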

And the update problems? You probably guessed it: an individual model for each update.

final readonly class UserDataUpdate
{
    public function __construct(
        public string $email,
        public string $name,
    ) {
        Assert::notEmpty($name);
    }
}

final readonly class UserPasswordUpdate
{
    public function __construct(
        public string $email,
        public string $password,
    ) {
        Assert::password($password, strength: 3);
    }
}

interface UserRepository
{
    public function save(User $user): void;
    public function updateData(UserDataUpdate $userDataUpdate): void;
    public function updatePassword(UserPasswordUpdate $userPasswordUpdate): void;
}

Now there is no room for mix-ups and no unnecessary data. Each update is isolated and much better protected against bugs.
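To show what that isolation looks like in practice, here is a sketch of the two update methods backed by an in-memory array (hypothetical, standing in for one targeted `UPDATE` per method). The update models are repeated without their assertions, and hashing with `password_hash()` inside `updatePassword()` is an assumption for the example, not something the design above prescribes:

```php
<?php

// Update models repeated from above, assertions omitted for brevity.
final class UserDataUpdate
{
    public function __construct(
        public readonly string $email,
        public readonly string $name,
    ) {}
}

final class UserPasswordUpdate
{
    public function __construct(
        public readonly string $email,
        public readonly string $password,
    ) {}
}

// Hypothetical in-memory stand-in for the two update methods of UserRepository.
final class InMemoryUserUpdates
{
    /** @var array<string, array{name: string, password: string}> */
    public array $rows = [];

    public function updateData(UserDataUpdate $update): void
    {
        $this->assertExists($update->email);
        // Only the name is written; this method never sees a password.
        $this->rows[$update->email]['name'] = $update->name;
    }

    public function updatePassword(UserPasswordUpdate $update): void
    {
        $this->assertExists($update->email);
        // Only the password is written, hashed here (an assumption for the example).
        $this->rows[$update->email]['password'] = password_hash($update->password, PASSWORD_DEFAULT);
    }

    private function assertExists(string $email): void
    {
        if (!isset($this->rows[$email])) {
            throw new RuntimeException("Unknown user: $email");
        }
    }
}

$store = new InMemoryUserUpdates();
$store->rows['peter@dailybugle.com'] = [
    'name' => 'Peter Parker',
    'password' => password_hash('old-secret-password', PASSWORD_DEFAULT),
];

// A name change cannot touch the password...
$store->updateData(new UserDataUpdate('peter@dailybugle.com', 'Spider-Man'));
// ...and a password change cannot touch the name.
$store->updatePassword(new UserPasswordUpdate('peter@dailybugle.com', 'new-secret-password'));
```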

Note that I didn’t add email validation to the update models. That is intentional: the email is only used to find the user, and if the validation evolves, as discussed before, we would not be able to find older users whose emails are no longer valid but are still in the database.

Last words

This is really not that different from how we model objects in the real world. In any particular context, we never consider everything about a real-life object. Take a car, for example.

If a car is modeled by a driver, we can expect the position of the seat and the rear-view mirrors to be really important, while those are irrelevant to a mechanic doing some maintenance. The mechanic will probably care more about engine metrics that don’t matter to the driver. And a kid at school learning about means of transport will probably just care that it is a land vehicle with four wheels.

If we use different models for the same real-life objects, we can certainly do the same for the models in our code.

If you like my content, you can buy me a coffee to support it.
