EAV: Flexible RDBMS Schema
Problem Statement
Allow the end-user to add custom columns on the fly to the tables. For example, a Lead table has 10 fixed columns like id, name, email, phonenumber, status, etc. But a user can add additional columns say leadprovider which is not needed by any other companies.
Option 1: EAV
Implementation
EAV or Entity Attribute Value is an industry wide accepted solution to manage flexible columns in a RDBMS based system. This helps extending tables tenant wise based on each tenant need. This approach does not have a limitation on the scale as data is being stored in a separate table.
Pros
- Space efficient
- Scalable
- Industry Best Practice
Cons
- Complexity
- Performance
Option 2: EAV — Multiple Level
Implementation
A variation of the EAV model, where we will have an Attribute and Value table per entity that we want to keep customizable.
Example
Entity: Lead
Attribute: Lead_Attribute
Value: Lead_Value
Entity: Product
Attribute: Product_Attribte
Value: Product_Value
Pros
- Space efficient
- Scalable
- Better performance than EAV Single level
Cons
- More tables to manage
- Higher Complexity
- Performance
Option 3: Nullable Attributes
Implementation
In this approach, instead of having additional tables, for values, we will add extra generic attribute columns.
Example
Entity: Lead
Columns
Id
Name
Phone
..
Attribute1
Attribute2
Attribute3
Attribute4
Entity
Lead_Attribute
companyID
AttributeName
Pros
- Easier implementation than EAV
Cons
- Extra columns are always there even if they are not needed
- Performance impact
- Storage Impact
Option 4: Dynamic Attributes
Implementation
Various databases support this particular use case with a special form of columns. For example, MariaDB has the option to create Dynamic columns.
https://mariadb.com/kb/en/dynamic-columns/
This solves the very specific use case we have, that one can create columns on the go.
Pros
- In-Built support for dynamic columns
Cons
- This makes us dependent on MariaDB
- It will be challenging to integrate Analytics tools easily as there is a specific way to fetch dynamic columns.
Option 5: JSON based storage for additional attributes
Implementation
Another way to store additional columns is to add them as a single column in JSON. This will help keep all the additional information in a single attribute.
Pros
- Performance
- Low Complexity
Cons
- Using additional columns for analytics purposes might be difficult
Recommendation
Option 1: EAV
As this is a tried and tested solution accepted throughout the industry, this looks like the best option we need to go ahead with. It has certain drawbacks like additional coding complexity and performance impact due to additional tables and joins. But as this is an Industry accepted solution, it will be easier to implement and handle any corner cases.
Option 5: JSON based storage
The advantage of JSON-based storage is simplicity in implementation. The challenge with this approach might be towards analytics as the data is stored in single column.
References
https://mariadb.com/kb/en/dynamic-columns/
https://inviqa.com/blog/understanding-eav-data-model-and-when-use-it
https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
https://db.in.tum.de/research/publications/conferences/sigmod2009-mtd.pdf