EF is smarter than you think

10 min read · Jun 17, 2024

Despite how popular EF is, developers are often just too lazy to read the documentation 😬. As a result, a lot of extra and, most of the time, redundant code gets written.

In today’s article, we will explore common code samples and ways to improve them. You’ll learn how to make your Entity Framework (EF) code cleaner and neater. Additionally, we will cover some advanced techniques that you can share and discuss with friends 😉.

Without further ado, let’s begin.

Domain

In all the examples below, the following entities will be used:

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
    public ICollection<Address> Addresses { get; set; }
}

public class Address
{
    public int Id { get; set; }
    public string Name { get; set; }

    public int UserId { get; set; }
}

No need for DbSet

Everyone who has worked with EF knows that you need to define a DbSet for each entity in your DbContext. This way, Entity Framework includes the entities in its model, maps them to tables in the database, and exposes them through the corresponding properties.

public class ApplicationDbContext : DbContext
{
    public DbSet<User> Users { get; set; }
    public DbSet<Address> Addresses { get; set; }
}

However, you don’t actually need to do it. As long as your entity is configured and the configuration is registered in the DbContext, EF can figure out which tables should be created:

public class ApplicationDbContext : DbContext
{
    // public DbSet<User> Users { get; set; }
    // public DbSet<Address> Addresses { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // search for all Fluent API configurations in the assembly
        // modelBuilder.ApplyConfigurationsFromAssembly(typeof(ApplicationDbContext).Assembly);

        modelBuilder.Entity<User>();
        modelBuilder.Entity<Address>();
    }
}

Later, the table can be accessed through the .Set<TEntity>() method:

await using var dbContext = new ApplicationDbContext();

await dbContext.Set<User>().AddAsync(new User());

                        .  .  .

This is useful when you want to limit direct access from DbContext to some tables (which is often the case in DDD) or when implementing generic operations.
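As a rough illustration of the "generic operations" case, here is a minimal sketch of a repository built on top of Set<TEntity>(). The Repository<TEntity> class, its method names, and GenericRepositoryDemo are hypothetical helpers, not part of EF. The sketch also assumes the Microsoft.EntityFrameworkCore.InMemory package, used only so that it can run without a real database.

```csharp
using Microsoft.EntityFrameworkCore;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// No DbSet<User> property: the entity is only registered in OnModelCreating.
public class ApplicationDbContext : DbContext
{
    // Assumption: in-memory provider, used only to keep the sketch self-contained.
    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseInMemoryDatabase("generic-repository-demo");

    protected override void OnModelCreating(ModelBuilder modelBuilder)
        => modelBuilder.Entity<User>();
}

// Hypothetical helper: one implementation that works for any mapped entity.
public class Repository<TEntity> where TEntity : class
{
    private readonly DbContext _dbContext;

    public Repository(DbContext dbContext) => _dbContext = dbContext;

    public TEntity? GetById(params object[] key)
        => _dbContext.Set<TEntity>().Find(key);

    public void Add(TEntity entity)
    {
        _dbContext.Set<TEntity>().Add(entity);
        _dbContext.SaveChanges();
    }
}

public static class GenericRepositoryDemo
{
    public static string? Run()
    {
        using var dbContext = new ApplicationDbContext();
        var users = new Repository<User>(dbContext);

        var user = new User { Name = "John" };
        users.Add(user); // the key value is generated when the entity is added

        return users.GetById(user.Id)?.Name;
    }
}
```

Calling GenericRepositoryDemo.Run() stores a user and reads it back without the context ever exposing a Users property.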

Update a collection of child items

Often, the UI allows changing multiple entities at the same time. It is a common task to update a parent entity and all its children. For example, on a single page we can change the user’s name and add, update, or remove its addresses.

Even though it does not seem like the most complicated thing to do, in practice it has lots of developers scratching their heads.

Here is just an approximate example of the code I have seen (please, do not try to understand it, just give it a quick look):

using (var dbContext = new AppDbContext())
{
    // Retrieve the user and its addresses from the database
    var user = dbContext.Users.Include(u => u.Addresses).First(u => u.Id == userId);

    // Update the user's name
    user.Name = newName;

    // Add/Update/Delete addresses
    var existingAddressNames = user.Addresses.Select(a => a.Name);
    var addressesToDelete = existingAddressNames.Except(newAddressNames).ToList();
    var addressesToAdd = newAddressNames.Except(existingAddressNames).ToList();

    // Remove addresses that are no longer in the updated list
    foreach (var addressName in addressesToDelete)
    {
        var addressToDelete = user.Addresses.FirstOrDefault(a => a.Name == addressName);
        if (addressToDelete != null)
        {
            user.Addresses.Remove(addressToDelete);
            dbContext.Addresses.Remove(addressToDelete);
        }
    }

    // Add new addresses
    foreach (var addressName in addressesToAdd)
    {
        var newAddress = new Address { Name = addressName };
        user.Addresses.Add(newAddress);
        dbContext.Addresses.Add(newAddress);
    }

    // Update addresses
                .  .  .

    // Save changes to the database
    dbContext.SaveChanges();
}

It could be optimized with heavy use of LINQ and other kinds of refactoring, and I am pretty sure there are some bugs too. But my point is that this is quite a lot of work; you can tell just by looking at how massive it is. And the more entities you have, the bigger that code becomes.

A much simpler approach would be just to delete all the addresses and repopulate them:

using (var dbContext = new AppDbContext())
{
    // Retrieve the user and its addresses from the database
    var user = dbContext.Users.Include(u => u.Addresses).First(u => u.Id == userId);
    
    // Update the user's name
    user.Name = newName;

    // Update addresses
    user.Addresses.Clear();
    user.Addresses.Add(new Address
    {
        Id = 2, // this one has an Id and needs to be updated
        Name = "Rename Existing",
    });
    user.Addresses.Add(new Address
    {
        Name = "Add New",
    });

    // Save changes to the database
    dbContext.SaveChanges();
}

I would expect EF to delete all user’s addresses and insert new ones 🤔. However, that is not how it works.

Let’s check the generated SQL:

DELETE 
FROM [Address]
WHERE [Id] = 1;

UPDATE [Address] 
SET [Name] = 'Rename Existing'
WHERE [Id] = 2;

INSERT INTO [Address] ([Name], [UserId])
VALUES ('Add New', 1);

Updating child entities has never been so simple 😱

We do not have to write any complex algorithm to figure out which entities were added, which were updated, and which were deleted. EF is smart enough to do it by itself, thanks to the Change Tracker 😏.
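To see what the Change Tracker has worked out before anything is saved, you can list the tracked entries and their states. The following is a minimal sketch under a few assumptions: it uses the Microsoft.EntityFrameworkCore.InMemory package instead of a real database, ChangeTrackerDemo is a hypothetical helper name, and it covers a simpler variant of the edit above (rename the user, remove one address, add a new one).

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public ICollection<Address> Addresses { get; set; } = new List<Address>();
}

public class Address
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public int UserId { get; set; }
}

public class AppDbContext : DbContext
{
    public DbSet<User> Users => Set<User>();
    public DbSet<Address> Addresses => Set<Address>();

    // Assumption: in-memory provider, used only to keep the sketch self-contained.
    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseInMemoryDatabase("change-tracker-demo");
}

public static class ChangeTrackerDemo
{
    public static List<string> Run()
    {
        int userId;
        using (var seed = new AppDbContext())
        {
            var john = new User { Name = "John" };
            john.Addresses.Add(new Address { Name = "Home" });
            john.Addresses.Add(new Address { Name = "Work" });
            seed.Users.Add(john);
            seed.SaveChanges();
            userId = john.Id;
        }

        using var dbContext = new AppDbContext();
        var user = dbContext.Users
            .Include(u => u.Addresses)
            .First(u => u.Id == userId);

        user.Name = "Joe";
        user.Addresses.Remove(user.Addresses.First(a => a.Name == "Home"));
        user.Addresses.Add(new Address { Name = "Add New" });

        // Entries() runs change detection first, so the states reflect the edits above.
        return dbContext.ChangeTracker.Entries()
            .Select(e => $"{e.Entity.GetType().Name}: {e.State}")
            .ToList();
    }
}
```

Printing the returned list shows one line per tracked entity with its EntityState, which is the information SaveChanges() uses to decide between INSERT, UPDATE, and DELETE.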

Access changes from another scope

Okay, this time, to understand the problem, I would like you to actually analyze the code. But don’t worry, I will help you 🙂

Imagine we have the AppDbContext registered as Scoped. We also have two services, UserService and NameService. They both operate on the same user entity, and both try to update the same Name property. Let’s also say the original user’s name was John. And now the fun begins:

You can see in comment №1 that we are renaming our user to Joe, but the changes are not submitted to the DB yet. Then _nameService.UpdateUserName() is called, which also loads the same user from the database and updates its name. Could you tell what name the user will have in question 1? Is it “John” or “Joe”?
Also notice that in comment №2 we are renaming our user to Jonathan; however, this time SaveChanges() is called. Could you tell what value will be in the name field when we return to the original code in question 2? Is it “Joe” or “Jonathan”?

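class UserService
{
    // the same Scoped AppDbContext instance is injected into both services
    private readonly AppDbContext _dbContext;
    private readonly NameService _nameService;

    public UserService(AppDbContext dbContext, NameService nameService)
    {
        _dbContext = dbContext;
        _nameService = nameService;
    }

    public void UpdateUser()
    {
        var user = _dbContext.Users.First(u => u.Id == 1);
        // user.Name == "John"
        user.Name = "Joe"; // 1

        _nameService.UpdateUserName();
        // user.Name == ??? (question 2)

        _dbContext.SaveChanges();
    }
}

class NameService
{
    private readonly AppDbContext _dbContext;

    public NameService(AppDbContext dbContext) => _dbContext = dbContext;

    public void UpdateUserName()
    {
        var user = _dbContext.Users.First(p => p.Id == 1);
        // user.Name == ??? (question 1)
        user.Name = "Jonathan"; // 2

        _dbContext.SaveChanges();
    }
}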

If I were someone who didn’t know how EF works, I would guess “John” for the first case and “Joe” for the second. After all, that is how variable scope works. However, EF is smarter in this case. Here is the actual result:

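class UserService
{
    // the same Scoped AppDbContext instance is injected into both services
    private readonly AppDbContext _dbContext;
    private readonly NameService _nameService;

    public UserService(AppDbContext dbContext, NameService nameService)
    {
        _dbContext = dbContext;
        _nameService = nameService;
    }

    public void UpdateUser()
    {
        var user = _dbContext.Users.First(u => u.Id == 1);
        // user.Name == "John"
        user.Name = "Joe"; // 1

        _nameService.UpdateUserName();
        // user.Name == "Jonathan"

        user.Name = "Here's Johnny";

        _dbContext.SaveChanges();
    }
}

class NameService
{
    private readonly AppDbContext _dbContext;

    public NameService(AppDbContext dbContext) => _dbContext = dbContext;

    public void UpdateUserName()
    {
        var user = _dbContext.Users.First(p => p.Id == 1);

        // user.Name == "Joe"
        user.Name = "Jonathan"; // 2

        _dbContext.SaveChanges();
    }
}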

Even though the entity is reloaded in another service, we still have access to the changes made by other code. That happens because it is actually not a new entity.

Thanks to the Change Tracker, when EF fetches the entity, it checks its primary key, sees that an entity with this key is already tracked, and returns the already tracked instance instead of creating a new one. Therefore, all the changes made to an entity in one scope are present in another.

This has both advantages and disadvantages. On the one hand, we can be sure the entity we are working with is always up to date, regardless of manipulations done by other methods. On the other, we can accidentally submit changes we were not aware of.
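The behavior described in this section can be checked with a few lines. This is only a sketch under stated assumptions: it uses the Microsoft.EntityFrameworkCore.InMemory package so that it runs without a database, and IdentityResolutionDemo is a hypothetical helper name. Two queries on one DbContext stand in for the two services sharing a Scoped context.

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public DbSet<User> Users => Set<User>();

    // Assumption: in-memory provider, used only to keep the sketch self-contained.
    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseInMemoryDatabase("identity-resolution-demo");
}

public static class IdentityResolutionDemo
{
    public static (bool SameInstance, string NameSeenBySecondQuery) Run()
    {
        int userId;
        using (var seed = new AppDbContext())
        {
            var john = new User { Name = "John" };
            seed.Users.Add(john);
            seed.SaveChanges();
            userId = john.Id;
        }

        using var dbContext = new AppDbContext();

        var first = dbContext.Users.First(u => u.Id == userId);
        first.Name = "Joe"; // changed in memory only, SaveChanges() is not called

        // A second query for the same key, like the one issued by the other service.
        var second = dbContext.Users.First(u => u.Id == userId);

        return (ReferenceEquals(first, second), second.Name);
    }
}
```

Both variables point to the same tracked object, so the second query sees the unsaved name. If you need an independent copy with the values currently stored in the database, a query with AsNoTracking() bypasses the Change Tracker.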

Find() vs First()

Here is another example.

We have a method that loads a user and updates its name and then loads the same user and updates its addresses.

class UserService
{
    public void UpdateUser(int userId, string userName, List<Address> addresses)
    {
        UpdateName(userId, userName);
        UpdateAddresses(userId, addresses);
    }

    private void UpdateName(int userId, string userName)
    {
        var user = _dbContext
          .Users
          .First(u => u.Id == userId);

        user.Name = userName; 
    
        _dbContext.SaveChanges();
    }

    private void UpdateAddresses(int userId, List<Address> addresses)
    {
        var user = _dbContext
          .Users
          .Include(u => u.Addresses)
          .First(u => u.Id == userId);
        
        user.Addresses = addresses;
    
        _dbContext.SaveChanges();
    }
}

You can see that the same user is fetched from the database twice. As we already know, Entity Framework will return the same entity in both cases. However, there is still a problem: it will send two SELECT requests even though the user is already tracked. I guess EF is not that smart after all 😒

Surely we can have another method that loads the user and then uses it in both UpdateName() and UpdateAddresses(). However, there is another solution.

To avoid fetching the user twice, you can use Find() instead of First() (or FirstOrDefault(), Single(), SingleOrDefault()).

Find() returns an entity by its primary key. It checks the Change Tracker first: if the entity is not tracked yet, a request to the database is made and the result is tracked. For all future calls on the same DbContext, the entity is returned immediately from the tracker without another query.

It works this way because Find() is an actual method of DbSet, where EF’s developers have direct access to the Change Tracker. In comparison, First() is just an extension method for IQueryable with a predicate on any field (not necessarily the primary key). For the same reason, Find() cannot be chained after Include(); it is only available on the DbSet itself.

abstract class DbSet<TEntity>
{
                .  .  .
     public virtual TEntity? Find(params object?[]? keyValues);
     public virtual ValueTask<TEntity?> FindAsync(params object?[]? keyValues);
                .  .  .
}
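One way to observe the difference between Find() and First() is to count the SQL commands EF sends. The sketch below is an illustration under assumptions: it uses the Microsoft.EntityFrameworkCore.Sqlite package with an in-memory SQLite database, the constructor parameters of AppDbContext and the FindDemo class are hypothetical, and matching on the text "SELECT" in the log message is a simplification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Diagnostics;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class AppDbContext : DbContext
{
    private readonly SqliteConnection _connection;
    private readonly Action<string> _log;

    // Hypothetical constructor: a shared open connection plus a log sink.
    public AppDbContext(SqliteConnection connection, Action<string> log)
    {
        _connection = connection;
        _log = log;
    }

    public DbSet<User> Users => Set<User>();

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options
            .UseSqlite(_connection)
            // Forward only "command executed" events: one message per SQL statement.
            .LogTo(_log, new[] { RelationalEventId.CommandExecuted });
}

public static class FindDemo
{
    public static (int Selects, bool SameInstance) Run()
    {
        // The in-memory database lives as long as this connection stays open.
        using var connection = new SqliteConnection("Data Source=:memory:");
        connection.Open();

        int userId;
        using (var seed = new AppDbContext(connection, _ => { }))
        {
            seed.Database.EnsureCreated();
            var john = new User { Name = "John" };
            seed.Users.Add(john);
            seed.SaveChanges();
            userId = john.Id;
        }

        var commands = new List<string>();
        using var dbContext = new AppDbContext(connection, commands.Add);

        var first = dbContext.Users.Find(userId);               // not tracked yet: queries the database
        var second = dbContext.Users.Find(userId);              // already tracked: served by the Change Tracker
        var third = dbContext.Users.First(u => u.Id == userId); // LINQ query: goes to the database again

        var sameInstance = ReferenceEquals(first, second) && ReferenceEquals(second, third);
        return (commands.Count(c => c.Contains("SELECT")), sameInstance);
    }
}
```

With this setup, only the first Find() and the First() call should show up in the command log, while all three variables refer to the same tracked instance.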

Revert changes with a transaction

It often happens that your business logic is not only about updating some fields. Frequently, you update some properties, perform some calculations, and later update the rest of the model. Imagine something goes wrong during the calculation:

class UserService
{
    public void UpdateUser(int userId, string userName, List<Address> addresses)
    {
        UpdateName(userId, userName);

        // some business logic that can do this:
        // throw new BusinessLogicException(💥)

        UpdateAddresses(userId, addresses);
    }

    private void UpdateName(int userId, string userName)
    {
        var user = _dbContext
          .Users
          .Find(userId);

        user.Name = userName; 
    
        _dbContext.SaveChanges();
    }

    private void UpdateAddresses(int userId, List<Address> addresses)
    {
        // Find() is not available after Include(), so First() is used here
        var user = _dbContext
          .Users
          .Include(u => u.Addresses)
          .First(u => u.Id == userId);
        
        user.Addresses = addresses; 
    
        _dbContext.SaveChanges();
    }
}

It means some data is saved, while some isn’t. This leads to data inconsistency.

Of course, we could add some try/catches here and there, write compensating actions, make our code complicated as hell, and so on. But why would we do it 🤔?

With EF, it can be done just in a few lines using BeginTransaction():

class UserService
{
    public void UpdateUser(int userId, string userName, List<Address> addresses)
    {
        using (var transaction = _dbContext.Database.BeginTransaction())
        {
            UpdateName(userId, userName);

            // some business logic that can do this:
            // throw new BusinessLogicException(💥)

            UpdateAddresses(userId, addresses);

            transaction.Commit();
        }
    }
                    .  .  .
}

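By default, each SaveChanges() call creates its own transaction and commits it. But if EF sees that there is already an ongoing transaction, it will attach all changes to it instead.

This time there won’t be any data inconsistency. UpdateName() can call SaveChanges() as many times as it wants; the data won’t be committed until transaction.Commit() is hit. If an exception is thrown before that, the transaction is disposed without being committed, and everything is rolled back.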
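The rollback can be verified end to end with a small experiment. This is a sketch under assumptions: it uses the Microsoft.EntityFrameworkCore.Sqlite package with an in-memory SQLite database (a relational provider is needed for real transactions), TransactionDemo is a hypothetical helper, and InvalidOperationException stands in for the business logic failure.

```csharp
using System;
using System.Linq;
using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<User> Users => Set<User>();
}

public static class TransactionDemo
{
    public static string Run()
    {
        // The in-memory database lives as long as this connection stays open.
        using var connection = new SqliteConnection("Data Source=:memory:");
        connection.Open();
        var options = new DbContextOptionsBuilder<AppDbContext>()
            .UseSqlite(connection)
            .Options;

        int userId;
        using (var seed = new AppDbContext(options))
        {
            seed.Database.EnsureCreated();
            var john = new User { Name = "John" };
            seed.Users.Add(john);
            seed.SaveChanges();
            userId = john.Id;
        }

        try
        {
            using var dbContext = new AppDbContext(options);
            using var transaction = dbContext.Database.BeginTransaction();

            dbContext.Users.Find(userId)!.Name = "Joe";
            dbContext.SaveChanges(); // written inside the transaction, not committed yet

            // Stand-in for a failing business rule; Commit() is never reached.
            throw new InvalidOperationException("business rule violated");
        }
        catch (InvalidOperationException)
        {
            // The transaction was disposed without Commit(), which rolls it back.
        }

        using var verify = new AppDbContext(options);
        return verify.Users.Single(u => u.Id == userId).Name;
    }
}
```

Reading the user through a fresh context after the failure returns the original name, even though SaveChanges() was called before the exception.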

Linq chaining

And the last one. I have seen my coworker write complex queries like this:

var result = _dbContext
    .Users
    .Where(u => u.Name.Contains("J") && u.Addresses.Count > 1 && u.Addresses.Count < 10 &&
        (idToSearch == null || u.Id == idToSearch))
    .ToList();

He tries to put all the conditions in a single .Where() statement.

So, I suggested rewriting it this way:

var query = _dbContext
    .Users
    .Where(u => u.Name.Contains("J"))
    .Where(u => u.Addresses.Count > 1 && u.Addresses.Count < 10);

if (idToSearch is not null)
{
    query = query
      .Where(u => u.Id == idToSearch);
}

var result = query
    .ToList();

The reasoning is as follows:

splitting the conditions into separate Where clauses makes the code more readable, especially when dealing with complex conditions or long expressions
it is better formatted, allowing a developer to focus on each filter individually
each condition is separated, making it easier to understand the intent of each filter
with separate Where clauses, you can reuse or modify conditions independently. If you need to change one of the conditions, you can do so without affecting the others. It provides more flexibility when maintaining or evolving the code
changes to a specific condition are displayed more clearly in version control systems like Git, making the PR review process smoother

He agreed with me but refused to do it since it would decrease performance. I was puzzled at first. How is that possible 🤔? Only then did I realize that he thinks of those LINQ operators as if they were executed against a regular collection.

EF will not filter rows by name, then filter the remaining rows by addresses, and so on. It converts those filter operators to SQL and combines them into a single WHERE clause. Technically speaking, it does not matter which approach you use.
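If you want to check this yourself, ToQueryString() returns the SQL EF would run for a query without executing it. Here is a minimal sketch, assuming the Microsoft.EntityFrameworkCore.Sqlite package; WhereChainingDemo is a hypothetical helper, and the two conditions are simplified versions of the ones above.

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public DbSet<User> Users => Set<User>();

    // Assumption: SQLite provider. The queries are only translated, never executed,
    // so no tables have to exist.
    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlite("Data Source=:memory:");
}

public static class WhereChainingDemo
{
    public static (string Single, string Chained) Run()
    {
        using var dbContext = new AppDbContext();

        // All conditions in one Where()
        var single = dbContext.Users
            .Where(u => u.Name.Contains("J") && u.Id > 1)
            .ToQueryString();

        // The same conditions split across two Where() calls
        var chained = dbContext.Users
            .Where(u => u.Name.Contains("J"))
            .Where(u => u.Id > 1)
            .ToQueryString();

        return (single, chained);
    }
}
```

Printing both strings side by side lets you compare the SQL that each form produces.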

I would suggest using this approach with regular collections too, since:

the code base is consistent
most of the time, the in-memory collections we are working with are small
the performance difference between the two approaches is negligible since even regular Linq does not work this way, but this is for another story🙃
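The last point can be illustrated without EF at all. The sketch below logs every predicate call to show that chained Where() calls on an in-memory collection are deferred and that elements flow through the filters one at a time, rather than the first filter producing an intermediate list. DeferredLinqDemo and the sample names are made up for the illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredLinqDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var names = new[] { "John", "Anna", "Joe" };

        var query = names
            .Where(n => { log.Add($"first filter: {n}"); return n.StartsWith('J'); })
            .Where(n => { log.Add($"second filter: {n}"); return n.Length > 3; });

        log.Add("query built");      // no predicate has run yet
        var result = query.ToList(); // each element passes through both filters in turn

        log.Add($"result: {string.Join(", ", result)}");
        return log;
    }
}
```

The log starts with "query built", and "second filter: John" appears before "first filter: Anna": the collection is walked once, and an element that fails the first filter never reaches the second.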

Conclusion

EF proves to be a powerful tool that offers more capabilities than meets the eye. By leveraging its features effectively, developers can simplify complex tasks, optimize performance, and ensure data consistency.

Understanding the inner workings of EF allows us to unlock its full potential and streamline our code, making development a smoother and more efficient process.

So, the next time you find yourself working with EF, remember that it’s smarter than you think 😅.
