Improve performance, avoid boilerplate code and follow best practices when fetching object lists in .NET apps
Complex applications in the corporate environment (e.g. ERPs) usually display various lists of objects. These items often combine attributes from numerous entities (tables), such as names and codes, so displaying such a list requires many lookups when a normalized relational database is used (as opposed to NoSQL databases, which favor denormalization over lookups).
A typical example is a list showing brief contract data: although the general contract data is stored in a single table (creator identifier, creation date, a status like Awaiting approval from a big boss etc.), many attributes require lookups: creator name, status name, category name etc.
Fetching data for these lists can be done in several ways. The following examples are based upon a real list within the Metro Data Warehouse Automatic Testing web application (a project I’ve been working on with my colleague Claudiu Andrei for almost a year). This application is developed using ASP.NET MVC 5, which relies on Entity Framework 6 for its own data access (from a SQL Server instance; other data is fetched from a Unix file system and Teradata databases).
The list containing relevant information regarding test results for MDW developers and QA specialists is one example. The columns include Test name, Importing System (ETL entity), Environment (database), Test requester, Result, Duration etc. I will consider a typical filtering scenario: get all test results within the last week from an environment of interest.
1. Direct fetch with eager loading — this involves directly gathering all required information using the mechanisms provided by EF (i.e. the Include extension method):
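A minimal sketch of this approach follows; the entity, property and context names (TestResult, ImportingSystem etc.) are illustrative assumptions, not the actual schema of the application:

```csharp
// Sketch only: entity and property names are assumptions.
// using System.Data.Entity; — brings the lambda-based Include extension method (EF6)

var oneWeekAgo = DateTime.Today.AddDays(-7);

var results = context.TestResults
    .Include(r => r.Test.ImportingSystem)   // Test name + Importing System
    .Include(r => r.Environment)            // Environment (database)
    .Include(r => r.Requester)              // Test requester
    .Include(r => r.Result)                 // Result lookup
    .Where(r => r.StartDate >= oneWeekAgo
             && r.EnvironmentId == environmentId)
    .ToList();
```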
Generated query is too large to be included here, but it clearly includes fetching data from all the involved tables, so it is quite expensive:
[cold run]: CPU = 187, Reads = 2745, Duration = 232
[warm run]: CPU = 31, Reads = 1417, Duration = 102
This is not efficient, especially under heavy load, so it must be optimized.
2. Using a view to aggregate information
This method involves creating a database view to gather all the required information. Entity Framework allows mapping data models to views (read-only):
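Such a read-only model might look like the following sketch; the view name and column list are assumptions for illustration:

```csharp
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

// Sketch only: the view name and its columns are assumptions.
[Table("TestResultDetailsView")]
public class TestResultDetails
{
    [Key]
    public int Id { get; set; }

    public string TestName { get; set; }
    public string ImportingSystemName { get; set; }
    public string EnvironmentName { get; set; }
    public string RequesterName { get; set; }
    public string ResultName { get; set; }
    public DateTime StartDate { get; set; }
    public int EnvironmentId { get; set; }
    public TimeSpan Duration { get; set; }
}
```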
Data selection is straightforward:
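With the hypothetical view model above, the query reduces to a single filter over one mapped object:

```csharp
// Sketch only: context and property names are assumptions.
var oneWeekAgo = DateTime.Today.AddDays(-7);

var results = context.TestResultDetails
    .Where(d => d.StartDate >= oneWeekAgo
             && d.EnvironmentId == environmentId)
    .ToList();
```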
This way is slightly more efficient than the first one:
— [cold run]: CPU = 0, Reads = 1929, Duration = 205
— [warm run]: CPU = 0, Reads = 1929, Duration = 64
3. Caching small-table items to avoid server-side JOINs
Most of the items used in JOINs come from small tables which are rarely changed but heavily read. This looks like the perfect scenario for caching these items and serving them from memory.
The caching mechanism uses generic repositories, whose implementation is outside the scope of this article. They provide a simple and fast way to find an item by its identifier (O(1)). Another advantage is that most operations never reach the database: everything is loaded when the web server starts, so no further database reads are required.
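As an illustration only (the actual repositories are not shown here), such a cached lookup repository could be sketched as follows; the IIdentifiable interface and the class shape are assumptions:

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumption: cached entities expose an integer identifier.
public interface IIdentifiable
{
    int Id { get; }
}

// Items are loaded once (e.g. at web server start) and kept in a dictionary,
// so finding an item by id is O(1) and never touches the database afterwards.
public class CachedRepository<T> where T : class, IIdentifiable
{
    private readonly Dictionary<int, T> items;

    public CachedRepository(IEnumerable<T> allItems)
    {
        items = allItems.ToDictionary(i => i.Id);
    }

    public T Find(int id)
    {
        return items.TryGetValue(id, out var item) ? item : null;
    }
}
```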
On top of these caches, the application defines a custom mapping mechanism to directly obtain derived properties (e.g. TestName from ImportingSystemTestId).
Here is what is required to get the list of test results presented at the beginning of the article:
a. A service model to hold the basic information returned by the filtering service:
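A sketch of such a service model, with illustrative property names (only identifiers, no display names):

```csharp
// Sketch only: property names are assumptions based on the list's columns.
public class TestResultModel
{
    public int Id { get; set; }
    public int TestId { get; set; }
    public int ImportingSystemId { get; set; }
    public int EnvironmentId { get; set; }
    public int RequesterId { get; set; }
    public int ResultId { get; set; }
    public TimeSpan Duration { get; set; }
}
```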
The service model does not include displayed information (only identifiers), since that is a concern of the view.
b. A service method to accept filter values and provide a list of service models:
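A hedged sketch of the filtering service; the repository's Query() method and the property names are assumptions:

```csharp
// Sketch only: testResultRepository.Query() and the projected
// properties are assumptions, not the application's actual API.
public IReadOnlyCollection<TestResultModel> GetTestResults(
    DateTime minStartDate, int environmentId)
{
    return testResultRepository.Query()
        .Where(r => r.StartDate >= minStartDate
                 && r.EnvironmentId == environmentId)
        .Select(r => new TestResultModel
        {
            Id = r.Id,
            TestId = r.TestId,
            ImportingSystemId = r.ImportingSystemId,
            EnvironmentId = r.EnvironmentId,
            RequesterId = r.RequesterId,
            ResultId = r.ResultId,
            Duration = r.Duration
        })
        .ToList();
}
```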
c. A view model to include the information that must be displayed. A special mapping is defined between the service model and the view model, so that the developer does not have to write all the assignments:
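The view model might be sketched as below; the exact constructor signature of the attribute is an assumption:

```csharp
// Sketch only: the attribute's constructor argument (the name of the
// identifier property on the service model) is an assumption.
public class TestResultViewModel
{
    public int Id { get; set; }

    // Each attribute tells the mapping service which identifier on the
    // service model the displayed list item should be resolved from.
    [StandardListItemMappingInfo(nameof(TestResultModel.TestId))]
    public StandardListItem<int> Test { get; set; }

    [StandardListItemMappingInfo(nameof(TestResultModel.EnvironmentId))]
    public StandardListItem<int> Environment { get; set; }

    [StandardListItemMappingInfo(nameof(TestResultModel.RequesterId))]
    public StandardListItem<int> Requester { get; set; }

    public TimeSpan Duration { get; set; }
}
```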
The mapping is done through a specialized service that knows how to automatically populate properties marked with [StandardListItemMappingInfo] attribute. Its implementation can be found here and is an extension of auto-mapping mechanism provided by AutoMapper. However, its usage is straightforward:
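In use, this amounts to a couple of lines; the service and method names below are illustrative assumptions:

```csharp
// Sketch only: service and method names are assumptions.
var serviceModels = testResultService.GetTestResults(oneWeekAgo, environmentId);
var viewModels = mappingService
    .Map<TestResultModel, TestResultViewModel>(serviceModels);
```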
So, by properly defining the view model and the service model, the mapping is done automatically.
Note: StandardListItem<T> contains only a key and a value, so this scenario does not cover cases where extra properties are required. However, in the vast majority of cases only the “value” of a joined list item is needed.
What about performance?
One should expect the last method to be faster on the database side, since it fetches data from fewer tables:
— [cold run]: CPU = 47, Reads = 955, Duration = 57
— [warm run]: CPU = 16, Reads = 907, Duration = 5
However, there is also some mapping overhead: about 10 ms for about 250 records. Still, clearly fewer (page) reads and faster execution.
Why use such a complex solution for a simple task?
Although it looks complex, the actual code required for a particular list is pretty minimal: define the service model and the view model based on the required information, define the filtering logic based on the filters the view allows, and perform the mapping. It may seem a big overhead, but the mapping mechanism is generic and is reused throughout the whole application (loading lists, loading single entities etc.).
There are also a few architectural advantages:
i. Separation between logic and view — the service does not have to know how the information is displayed. Maybe I want to define a public API that requires the same data, but few or none of the names. Or this service is used by another one to perform some operation on the results.
ii. Easier automated testing — performing unit/regression tests is easier when the service returns only what is relevant business-wise. Not using Entity Framework’s navigation properties allows replacing all generic repositories with in-memory ones and fully controlling the data when performing regression tests.
iii. Less code — the developer focuses on writing the business logic (i.e. filtering logic), not lots of assignments.