The Four Deadly Sins of LINQ Data Access: Part 2–Too Many Columns

Published in Falafel Software, Feb 28, 2014

Introduction

Last time I talked about the sin of Overly Chatty Queries and pointed out two other sins while showing an example of an Overly Chatty Query and how to fix it. One of those sins was the sin of Too Many Columns, and that’s what I’m going to discuss this time.

Confession

Here is the code snippet I showed last time, which demonstrates how to eagerly load related entities so that you get all the data you need in a single query instead of one additional query per loop iteration:

var orders =
    from order in Orders.Include(o => o.Customer) // Alternately, Orders.Include("Customer")
    where order.ShipCountry == "USA"
    select order;
    
foreach (var order in orders)
{
    Console.WriteLine(order.Customer.ContactName);
}

As I said back then, this query does repent of being Overly Chatty, but it still selects too many columns. The loop only needs Customer.ContactName, yet every column of the Customers table (along with every column of Orders) is included in the results so that complete Order and Customer objects can be populated. For illustration, here is the SQL generated when these statements execute:

SELECT
[Extent1].[OrderID] AS [OrderID],
[Extent1].[CustomerID] AS [CustomerID],
[Extent1].[EmployeeID] AS [EmployeeID],
[Extent1].[OrderDate] AS [OrderDate],
[Extent1].[RequiredDate] AS [RequiredDate],
[Extent1].[ShippedDate] AS [ShippedDate],
[Extent1].[ShipVia] AS [ShipVia],
[Extent1].[Freight] AS [Freight],
[Extent1].[ShipName] AS [ShipName],
[Extent1].[ShipAddress] AS [ShipAddress],
[Extent1].[ShipCity] AS [ShipCity],
[Extent1].[ShipRegion] AS [ShipRegion],
[Extent1].[ShipPostalCode] AS [ShipPostalCode],
[Extent1].[ShipCountry] AS [ShipCountry],
[Extent2].[CustomerID] AS [CustomerID1],
[Extent2].[CompanyName] AS [CompanyName],
[Extent2].[ContactName] AS [ContactName],
[Extent2].[ContactTitle] AS [ContactTitle],
[Extent2].[Address] AS [Address],
[Extent2].[City] AS [City],
[Extent2].[Region] AS [Region],
[Extent2].[PostalCode] AS [PostalCode],
[Extent2].[Country] AS [Country],
[Extent2].[Phone] AS [Phone],
[Extent2].[Fax] AS [Fax]
FROM  [dbo].[Orders] AS [Extent1]
LEFT OUTER JOIN [dbo].[Customers] AS [Extent2] ON [Extent1].[CustomerID] = [Extent2].[CustomerID]
WHERE N'USA' = [Extent1].[ShipCountry]

Including all those unused columns means more work for SQL Server, more data being transmitted over the network, and more memory usage in .NET, all of which add up to reduced performance and scalability.

Repentance

The solution is quite simple: only request the columns you plan to use, like this:

var orders =
    from order in Orders
    where order.ShipCountry == "USA"
    select new { order.OrderID, order.Customer.ContactName };
    
foreach (var order in orders)
{
    Console.WriteLine(order.ContactName);
}

This results in a much smaller SQL query:

SELECT
[Extent1].[OrderID] AS [OrderID],
[Extent2].[ContactName] AS [ContactName]
FROM  [dbo].[Orders] AS [Extent1]
LEFT OUTER JOIN [dbo].[Customers] AS [Extent2] ON [Extent1].[CustomerID] = [Extent2].[CustomerID]
WHERE N'USA' = [Extent1].[ShipCountry]
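Why does the projection shrink the SQL? Because the query is still an IQueryable, the select clause is recorded in the query's expression tree, which Entity Framework translates as a whole, rather than being applied after full rows arrive. The following is a minimal sketch of that idea, not Entity Framework itself: it assumes hypothetical Order and Customer classes and an in-memory AsQueryable() source in place of the Northwind context, plus a hypothetical ProjectedMembers helper that reads back the members listed in the query's final Select.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-ins for the Northwind entities; only a few columns are modeled.
public class Customer
{
    public string CustomerID { get; set; }
    public string ContactName { get; set; }
    public string Phone { get; set; }
}

public class Order
{
    public int OrderID { get; set; }
    public string ShipCountry { get; set; }
    public Customer Customer { get; set; }
}

public static class ProjectionDemo
{
    // In-memory data standing in for context.Orders (assumption: no database involved).
    public static IQueryable<Order> SampleOrders() => new List<Order>
    {
        new Order { OrderID = 1, ShipCountry = "USA",
                    Customer = new Customer { CustomerID = "A", ContactName = "Ann", Phone = "555-0100" } },
        new Order { OrderID = 2, ShipCountry = "UK",
                    Customer = new Customer { CustomerID = "B", ContactName = "Bob", Phone = "555-0101" } },
    }.AsQueryable();

    // Hypothetical helper: returns the member names created by the query's final Select.
    public static IReadOnlyList<string> ProjectedMembers<T>(IQueryable<T> query)
    {
        // "where ... select new { ... }" ends in a call to Queryable.Select.
        if (!(query.Expression is MethodCallExpression call) || call.Method.Name != "Select")
            throw new InvalidOperationException("The query does not end in a Select.");

        // The selector is stored as a quoted lambda, e.g. order => new { order.OrderID, ... }.
        var selector = (LambdaExpression)((UnaryExpression)call.Arguments[1]).Operand;
        if (!(selector.Body is NewExpression created) || created.Members == null)
            throw new InvalidOperationException("The selector does not build an anonymous type.");

        return created.Members.Select(m => m.Name).ToList();
    }
}
```

Running the Repentance query against SampleOrders() and passing it to ProjectedMembers yields OrderID and ContactName only; Phone and CustomerID never appear in the expression, which is what allows a LINQ provider such as Entity Framework to leave those columns out of the SELECT.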

Indulgence

“But,” some of you are probably thinking, “we use the Repository Pattern to separate our data access code, and we don’t want a million different methods returning a million different combinations of columns, not to mention all the extra classes you would have to define just to pass these custom projections back to the caller!” Fair enough. This is probably the kind of thing you were imagining:

// Written as a LINQPad program, where "this" is the typed NorthwindEntities data context
void Main()
{
    using (var repository = new OrderRepository(this))
    {
        var orders = repository.GetOrderContacts("USA");

        foreach (var order in orders)
        {
            Console.WriteLine(order.ContactName);
        }
    }
}
 
public class OrderRepository : IDisposable
{
    private NorthwindEntities context;
     
    public OrderRepository(NorthwindEntities context)
    {
        this.context = context;
    }
    private bool disposed = false;
     
    protected virtual void Dispose(bool disposing)
    {
        if (!this.disposed)
        {
            if (disposing)
            {
                context.Dispose();
            }
        }
        this.disposed = true;
    }
     
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }
     
    /*
    Create one method for each custom projection
    */
    public IEnumerable<OrderContactName> GetOrderContacts(string shipCountry)
    {
        return (
            from order in context.Orders
            where order.ShipCountry == shipCountry
            select new OrderContactName { OrderID = order.OrderID, ContactName = order.Customer.ContactName }
        ).ToList();
    }
}
 
/*
Create one data class for each custom projection
*/
public class OrderContactName
{
    public int OrderID { get; set; }
    public string ContactName { get; set; }
}

I can see at least three objections to doing things this way:

Proliferation of data classes, requiring more code to be written and maintained
Proliferation of specialized methods in the repository
Possible duplication of query logic within the repository, only with different outputs

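But there is a better way, thanks to the magic of generic type parameters! In short, you declare your repository method with a generic type parameter and have it return a collection of that type. The caller then passes in a function that accepts an IQueryable of the entity type and returns an IQueryable of the generic type. Because that function works on IQueryable rather than IEnumerable, whatever projection it applies is composed into the query before ToList() executes it, so it still ends up in the generated SQL. Sometimes it’s clearer just to show the code, so here’s the new repository method: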

public IEnumerable<TResult> GetOrderContacts<TResult>(string shipCountry, Func<IQueryable<Order>, IQueryable<TResult>> transform)
{
    return transform(
        from order in context.Orders
        where order.ShipCountry == shipCountry
        select order
    ).ToList();
}

And this is how you would call it:

var orders = repository.GetOrderContacts("USA", os => os.Select(o => new { o.OrderID, o.Customer.ContactName }));
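The same pattern can be exercised without a database. This is a minimal sketch, not the author's code: the hypothetical InMemoryOrderRepository wraps an in-memory list via AsQueryable() in place of context.Orders, and the Order and Customer classes are simplified stand-ins. The filter is written once, and each caller chooses its own output shape (and even ordering).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the Northwind entities (hypothetical).
public class Customer
{
    public string ContactName { get; set; }
    public string Phone { get; set; }
}

public class Order
{
    public int OrderID { get; set; }
    public string ShipCountry { get; set; }
    public Customer Customer { get; set; }
}

// Hypothetical repository: an in-memory IQueryable stands in for context.Orders.
public class InMemoryOrderRepository
{
    private readonly IQueryable<Order> orders;

    public InMemoryOrderRepository(IEnumerable<Order> orders)
    {
        this.orders = orders.AsQueryable();
    }

    // The filter lives here once; the caller's transform picks the output shape.
    public IEnumerable<TResult> GetOrderContacts<TResult>(
        string shipCountry,
        Func<IQueryable<Order>, IQueryable<TResult>> transform)
    {
        var filtered =
            from order in orders
            where order.ShipCountry == shipCountry
            select order;

        // transform composes further operators onto the IQueryable;
        // nothing executes until ToList() runs the combined query.
        return transform(filtered).ToList();
    }
}
```

Two callers can then ask for different shapes from the same filter, for example os => os.Select(o => new { o.OrderID, o.Customer.ContactName }) or os => os.OrderBy(o => o.OrderID).Select(o => o.OrderID). One caution: a transform that detours through AsEnumerable() (and back through AsQueryable() to satisfy the signature) still compiles, but against a real database it would fetch full Order rows and project them in memory, reintroducing Too Many Columns.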

This has the following benefits:

No extra data classes
No extra repository methods
No duplication of repository query logic

Next time, I’ll circle back to the topic of Underly Chatty Calls. Yes, there is such a thing! Now, go forth and select Too Many Columns no more!

Written by Falafel Software Bloggers