Enhancing your EF Core insert performance

Gabriel Ribeiro Guerra
5 min readFeb 13, 2023

--

In this article, we are going to improve our insert performance when dealing with large lists on Entity Framework Core.

Have you ever struggled with insert performance when it comes to a considerable amount of data while using Entity Framework Core? If the answer is yes, then you should read those steps bellow and you may speed up your operations up to 5 times!

How is this possible?

We are going to use a nuget package called EFCore.BulkExtensions, which will provide us some bulk operations. In this article, we are going to focus only at the bulk insert operation, that in the background, uses MySQL’ s native feature LOAD DATA INFILE, known for reading a file and inserting rows at very high speed.

Setting up our project

Lets create our customer model

namespace efcore_bulk_insert_example.Models
{
public class Customer
{
public int Id { get; set; }
public string Name { get; set; }
public string LastName { get; set; }
public DateTime BirthDate { get; set; }
public string SSN { get; set; }
public string Email { get; set; }
public string PhoneNumber { get; set; }
public DateTime CreatedAt { get; set; }
}
}

Then, we create our Context

using efcore_bulk_insert_example.Models;
using Microsoft.EntityFrameworkCore;
namespace efcore_bulk_insert_example.Context
{
public class BulkExampleContext : DbContext
{
public DbSet<Customer> Customers { get; set; }
public BulkExampleContext(DbContextOptions options) : base(options){ }
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
optionsBuilder.UseSnakeCaseNamingConvention();
}
}
}

Now, we create our connectionString, with the AllowLoadLocalInfile parameter as true, so we can use the LOAD DATA INFILE feature.

var defaultDBConnection = "Server=localhost; Database=efcorebulk; Uid=admin;Pwd=root; SSL Mode=None; AllowLoadLocalInfile=true";

Then, we use the connectionString when adding the dbContext

builder.Services.AddDbContext<BulkExampleContext>(options =>
{
options
.UseMySql(defaultDBConnection, new MySqlServerVersion(new Version(8, 0, 32)), opt => {
opt.EnableRetryOnFailure(
maxRetryCount: 3,
maxRetryDelay: TimeSpan.FromSeconds(30),
errorNumbersToAdd: null);
});
});

Now, we install EFCore.BulkExtensions

Last but not least, we create a repository for our customer entity and context is ready to use the bulk extension methods

using efcore_bulk_insert_example.Models;
namespace efcore_bulk_insert_example.Repository.Customers
{
public interface ICustomersRepository
{
Task BulkInsertAsync(List<Customer> customers);
Task AddRangeAsync(List<Customer> customers);
}
}
using EFCore.BulkExtensions;
using efcore_bulk_insert_example.Context;
using efcore_bulk_insert_example.Models;
namespace efcore_bulk_insert_example.Repository.Customers
{
public class CustomersRepository : ICustomersRepository
{
protected BulkExampleContext _context { get; set; }
public CustomersRepository(BulkExampleContext context)
{
_context = context;
}
public async Task BulkInsertAsync(List<Customer> customers)
{
await _context.BulkInsertAsync(customers);
}

public async Task AddRangeAsync(List<Customer> customers)
{
await _context.AddRangeAsync(customers);
await _context.SaveChangesAsync();
}
}
}

Database

For the database, we have a mysql v8.0.32 image running on docker. To run the image on your computer, open your cmd as an administrator, paste and execute the command bellow.

docker run --name mysql8 -p 3306:3306 -e MYSQL_ROOT_PASSWORD=root -e MYSQL_DATABASE=efcorebulk -e MYSQL_USER=admin -e MYSQL_PASSWORD=root -d mysql:8.0.32

As the API project needs to specify that Load Data Infile is enabled (on the client-side), we also need to specify on the MySQL server side. We do that by setting the local_infile variable to TRUE:

docker exec -it mysql8 /bin/bash
mysql -u root -p  # (input root as password)
SET GLOBAL LOCAL_INFILE = TRUE;

And for the last step, we just need to create the table that we are using for this example:

CREATE TABLE `customers` (
`id` INT PRIMARY KEY AUTO_INCREMENT NOT NULL,
`name` varchar(128) DEFAULT NULL,
`last_name` varchar(128) DEFAULT NULL,
`birth_date` datetime DEFAULT NULL,
`ssn` char(4) DEFAULT NULL,
`email` varchar(128) DEFAULT NULL,
`phone_number` varchar(64) DEFAULT NULL,
`created_at` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Testing

Now that our project is all set up, lets write some random data generator code block so we can compare the performances between AddRangeAsync and BulkInsertAsync.

So we can generate the random data, we’ll install and use the nuget package Bogus

Once we install the package, we’ll be able to generate 100k random customers , as shown bellow at the two endpoints of our ExampleController

using Bogus;
using efcore_bulk_insert_example.Models;
using efcore_bulk_insert_example.Repository.Customers;
using Microsoft.AspNetCore.Mvc;
using System.Diagnostics;
namespace efcore_bulk_insert_example.Controllers
{
[ApiController]
[Route("[controller]")]
public class ExampleController : ControllerBase
{
private ICustomersRepository _customersRepository;
public ExampleController(ICustomersRepository customersRepository)
{
_customersRepository = customersRepository;
}
[HttpGet("without-bulk")]
public async Task<IActionResult> WithoutBulk()
{
var sw = Stopwatch.StartNew();
var faker = new Faker<Customer>()
.RuleFor(x => x.Name, f => f.Name.FirstName())
.RuleFor(x => x.LastName, f => f.Name.LastName())
.RuleFor(x => x.Email, f => f.Internet.Email(f.Person.FirstName).ToLower())
.RuleFor(x => x.BirthDate, f => f.Date.Recent(1000))
.RuleFor(x => x.CreatedAt, f => f.Date.Recent(400))
.RuleFor(x => x.SSN, f => f.Random.Number(9999).ToString())
.RuleFor(x => x.PhoneNumber, f => f.Person.Phone);
var values = faker.Generate(100000);
await _customersRepository.AddRangeAsync(values);
sw.Stop();
return Ok($"Without bulk took {sw.ElapsedMilliseconds} ms");
}
[HttpGet("with-bulk")]
public async Task<IActionResult> WithBulk()
{
var sw = Stopwatch.StartNew();
var faker = new Faker<Customer>()
.RuleFor(x => x.Name, f => f.Name.FirstName())
.RuleFor(x => x.LastName, f => f.Name.LastName())
.RuleFor(x => x.Email, f => f.Internet.Email(f.Person.FirstName).ToLower())
.RuleFor(x => x.BirthDate, f => f.Date.Recent(1000))
.RuleFor(x => x.CreatedAt, f => f.Date.Recent(400))
.RuleFor(x => x.SSN, f => f.Random.Number(9999).ToString())
.RuleFor(x => x.PhoneNumber, f => f.Person.Phone);
var values = faker.Generate(100000);
await _customersRepository.BulkInsertAsync(values);
sw.Stop();
return Ok($"With bulk took {sw.ElapsedMilliseconds} ms");
}
}
}

Now, we can call both endpoints and compare their performances:

Without bulk — 20607 ms

With Bulk — 4298 ms

Some other results

I’ve tested three different sets, with 10, 100, and 500k customers, and we can see that the bigger the set is, the bigger the advantage BulkInsertAsync has over AddRangeAsync.

Do remember that, as the EFCore.BulkExtensions docs says

practical general usage could be made in a way to override regular SaveChanges and if any list of Modified entities entries is greater then say 1000 to redirect to Bulk version.

so, if you don’t have a list bigger than 1k registers, the advantage might not be that big.

Thats it fellas! If you want the repository for the API used in article, click here.

Happy coding!

--

--