In the last weeks I was working with two projects. It was the first time in Pernix I have be in charge of projects. At first, telling others what they should do and assign tasks were not easy but as time passed by, it became less difficult. My experiences with past projects help a lot while I understood the code and architecture, so the effort pays off.
One of my tasks was to start with a minor improvement to a database design. So, normalization rules came to play the most important character. But I found a little problem while I was proposing a design: there were some fields that not always had data, how can I manage that? — I thought. As the vast majority of cases the answer is it depends. Normalization is a industry standard design pattern but doesn’t mean to follow it blindy. There is a denormalization concept too:
It’s important to remember that the optimal relational database implementation requires that all tables be at least in third normal form (3NF). […] Good database design also considers processing (or reporting) requirements and processing speed. The problem with normalization is that as tables are decomposed to conform to normalization requirements, the number of database tables expands.
- Database Systems (Coronel, Morris and Rob)
So, when is ok to denormalize? The problem I described above is a particular case which introduce some NULL cells, but the key point is there will not be much. The meaning of NULL makes sense in this problem because it is applied to special cases. Additionally, normalize these fields could incur in more JOIN operations heading to increase heavy I/O operations. Finally I opted to simply denormalize those specific fields.
Another important lesson is to use pagination when working with data. For example, if some application has a huge list of clients and it is showed in a webpage; is preferable to display the clients ten by ten in each page as Google does. But does it mean Google search all coincidences once and for all? Similarly, and application does not need to download all data from a database, it is better to provide only what it needs. Keep in mind this “simply” (to program pagination in some cases is not so simply, it depends of DB features) suggestion surely will improve application performance.