Python Django EAV Model for Users to Easily Add Variables
Django ORM and our Motivation for EAV
Basic object-relational models in Python Django do not allow users to modify the data model within the application. For example, personal health assessments involve variables such as blood pressure, height, weight, waist size, blood glucose level. The basic model for collecting that data in Django would define every one of those variables in advance in our Python code, from which Django creates an object-relational mapping that amounts to a database table with one column for each variable.
If there are other variables we want to collect later, such as smoking status or cholesterol level, then we need to update the Python code for the model and perform Django migration to add new columns to our database table. Then we also need to update templates, forms, and reports to include the new variables. Doing this frequently could require and needlessly occupy development resources. So, what if we want the users of our Django application to be able to add variables themselves?
This is a particularly common request for applications involving collection of data in the field, in our case interview surveys and health assessments of voluntary participants. From this demand arises a desire to use Entity-Attribute-Value (EAV) models. The promise of an infinite number of user-defined variables combined in infinite ways tempts users seeking to maximize the value and longevity of an application. Real-life implementations of such models are non-trivial due to the administrative needs, domain-specific business logic and validation requirements, data entry interface, and reporting functionality.
Django Model Incorporating EAV
An EAV or sparse matrix model can make our implementation of data entry and reporting far more difficult because we are shifting from operating on a single record for all of the collected information for a participant to creating and maintaining a dynamic number of equivalent value records, each value record then mapped to the attribute or variable collected and a common parent entity reflecting our participant (or in a more complex case one instance of our participant completing a survey). Fortunately, Django is able to abstract some of this implementation for us.
See below for a representation of the sparse matrix as it applies to our Django data model. In practice there can be many empty or zero values because only a small subset of the potential questions for a survey are asked of and answered by a particular participant at an event.
See below for a conceptual entity relationship diagram of the EAV model.
Django data models to match our conceptual ERD look something like the following listings.
The Question model represents a question or measurement. Each Question is a variable for which users can collect data. Each question has question_text such as “What is your weight in pounds?” We define two types of questions — those where the answer is a number and those where the answer is text. In a real application you can code question types beyond integers and strings such as dates or digital signatures.
The Survey is a group of Question. Users can pick and choose from the all questions to construct a Survey. The ManyToManyField method tells Django that each Question our users create can be used in any Survey.
The Event is an occassion where users collect data for each Question in a Survey such as a company health fair. The ForeignKey method tells Django that every Event can have only one Survey or group of Question to ask. Every event also has an event_date on which the event is held.
The Participant is a person who answers the questions asked by a user at an Event. Each Participant always has a name, the Event where they were given the Survey, the date and time they answered the Survey, and any notes the user added about the Participant while entering answers.
The QuestionResponse is one Participant’s answer to a Question. We expect to have one QuestionResponse per Participant for every Question in the Survey for an Event. We use ForeignKey to tell Django that every QuestionResponse has one Participant and one Question. Django can figure out the rest from our previous models.
We need to allow for a means to store and retrieve values of multiple types (numeric and textual being the most common, but dates, times, and constrained choice selections are also desirable). Our QuestionResponse model uses a single value field defined as a large text object. This has the trade-off of the simplicity of using one underlying field for storage of all values versus the lack of explicit specification and enforcement of a particular data type at the model level for each of our questions. Alternatives include defining additional fields for storing values of other data types (in which case we must have logic that determines which field to retrieve when we want to display the collected data), and defining logically-separate relations for values of different data types (this technique is used for EAV product attributes in the Magento e-commerce platform, for example).
We also need to allow for varying the controls used by our Django form to match the type of data expected for a question. An additional method defined on our Question class allows us to define a separate collection of classes with form control specifications to match each question type we define within the attribute portion of our EAV model. In this way we can generate views that utilize the best UI and client-side validation for a user-defined question. Packages such as Floppy Forms or Crispy Forms may be used in place of or as an additional layer on top of our model to further customize input widgets. For the sake of simplicity, we will inline our widgets in our single data collection template in this example.
Reports with Dynamic EAV Columns in Django
Django templates do not allow passing arguments to functions and do not allow accessing dictionary values by a variable key (see https://code.djangoproject.com/ticket/12486). The Django philosophy is do all the data retrieval and manipulation within the Python code for models, forms, and views, and pass the data within a context for the template to display. (The “view” in Django then is essentially an MVC controller and the template is an MVC view). Obviously, this prevents a straightforward attempt to report our EAV survey responses in tabular format within a Django template.
One solution is to provide a function within our Event model to produce the virtual tabular view on the EAV survey responses. Using the prefetch_related Django queryset function on the question responses for the set of all event participants ensures that we will query our many-to-many relationship between participants and questions answered a single time when generating the report, rather than round-trip to our database multiple times for every participant or every collected data point. It is important that we define a consistent logical sort order for the questions within our Survey so we can efficiently iterate over every possible question and find the answer if present. Also note we are only iterating over the questions assigned to the Survey for the selected Event. The database may contain any number of other questions with data collected from other surveys without hindering our efforts.
Our Event method or member function eav_report_data builds an ordered list for each participant of the event such that the list contains one value that corresponds to each question on the Survey for the selected Event. We also pad the front of the list with two additional pieces of data, namely the participant identification key and participant name, for use in our report. The get_sorted_questions function added to our Survey model class ensures a consistent sort order is referenced across our application when compiling and displaying these reports.
Our view retrieves the requested report information from the Event using our eav_report_data function and passes it in context to our template for rendering. Our template then only needs to iterate once over the ordered list of questions to define the columns of our report, and then iterate once over the set of ordered lists of participant responses, rendering each response value in order corresponding to the list of questions above.
Simple refinements to such a reporting mechanism may include defining additional properties for specifying a custom sort order for our questions for collection or reporting, or a short name or unique textual key for each of our questions. Using Django templates it is also trivial to produce the same report output in a CSV or other delimited format.
Data Entry via Django Forms
For data collection, we create a very simple view for adding a participant to an event, a view for managing that participant, and a view for answering our questions. Utilizing our get_sorted_questions function of Survey allows us to provide a user-interface to walk a participant through the questions forwards and backwards with an indication of progress. Every response is saved to our QuestionResponse model which is in effect our sparse matrix, representing the many-to-many relationship between a participant at an event and an answered question. This example demonstrates a fully-functional though minimal interface for data collection, using only built-in Django packages and features. In practice an Ajax-enabled responsive front-end provides a more impressive-looking user interface and is able to leverage the same back-end code.
Some of the data we collect is not EAV-appropriate, the participant name in this example (typically other demographic data such as address as well), because we intend to always collect it for every subject participant. There is a disadvantage to storing this data in key-value format in our QuestionResponse model since it will make it less apparent how to access the data in our views (especially important in more advanced functionality such as automated email messages to our participants) and increase our storage and processing overhead needlessly. This data is modeled as fields of our Participant model, and added specifically by field name to our report and other views where needed. A proper EAV implementation in practice will nearly always work alongside a traditional non-EAV model to balance dynamic capability with performance and ease-of-development.
User Updates through Django Admin
For configuration of our Event, Survey, and Question objects we could define custom views with elaborate authentication and process logic depending on the project requirements. In this example, we enable the Django admin templates for these models. The built-in admin functionality allows an authenticated user a convenient means of defining any number of questions dynamically, grouping them arbitrarily into surveys, and scheduling events using those surveys. The data collection and reporting views continue to work with any new questions we define, fitting themselves to the surveys we configure.
Originally Published July 8, 2016
- An Introduction to Entity-Attribute-Value Design for Generic Clinical Study Data Management Systems
- Optimized Entity Attribute Value Model: A Search Efficient Representation of High Dimensional and Sparse Data
- Is this survey table considered an entity-attribute-value model?
- Alternatives to EAV anti pattern
- SQL Server — Survey Data Model — How to avoid EAV and excessive denormalization?
- Big Data Analytics: Third International Conference, BDA 2014, New Delhi
- EAV FAIL
- SQL Processing and Data Analysis with the EAV Model
Originally published at Equastat.