Pre-Work, Procrastination and Normalization
A look into my procrastination and a brief introduction to the four forms of normalization.
Pre-Work Reflection
Finishing the Launch Academy work in a timely fashion would be far less challenging if it were spelled out clearly how long each stage of each module should take. The fact that things do not take the same amount of time for everyone makes providing this information impractical. Add to that old college habits of procrastination that die hard, an overestimation of my ability to speed through the pre-work, and a weekend trip to New York the weekend prior to the pre-work's expected completion date, and you have my pre-work experience.
Right off the bat, trying to speed through the pre-work is the wrong approach. All the problems presented in the pre-work should be approached without a time limitation. The more time you can take to understand the concepts, work on creative solutions to the suggested problems, and refactor those solutions, the better.
The parts of the pre-work I am referring to in particular are the readings of Learn to Program and Beginning Database Design. In theory, you could take infinite time to read these two books, because they ask open-ended questions. But if you accept that there are a finite number of solutions and apply the law of diminishing returns to the time spent on them, it is not hard to figure out when to stop looking for better solutions and move on to the next stage of the pre-work. With the fear of not completing the pre-work in time in the back of my head, I rushed these steps, to say the least. In fact, I was so worried I would not finish the pre-work that I rarely worked on the problems (the most important part, YIKES!). I usually read a problem, thought about it, then moved on, telling myself I would come back to it when I finished the pre-work. I still intend to, once I finish this blog post.
With the discussion of my poor time management skills out of the way, I would like to take time for a more technical conversation about Beginning Database Design. The objective of a database is to store data and make it available for future analysis. The problem is that no matter how well the data model is implemented, it will be incapable of answering all potential future analysis questions. This is largely because of the changing nature of data (or the changing data of nature), how we observe the world, and our ability to ask new questions. We will always be able to collect more data, but the other important problem Clare Churcher addresses is creating data models that do not hide already-stored data from future analysis. Not only can failing to store the correct data be costly, but failing to store the data correctly can keep us from turning the data we do store into useful knowledge. Beginning Database Design introduces the fundamentals for building more robust data models that anticipate future uses of the data we choose to collect (or not collect).
I attempted to read BDD in a day. I do not recommend this. By the time I got to chapter 8, "Normalization," my brain was a bit data'd out, which is why I want to take the time to revisit the concepts introduced in that chapter. As defined by BDD, "Normalization is a formal way of checking the fields to ensure they are in the right table or to see if perhaps we might need restructured or additional tables to help keep our data accurate" (Churcher, 113). Sounds simple enough, but it was the normal forms that I struggled to understand. Normal forms refer to the different levels of normalization, each of which addresses additional situations where problems might occur.
The first normal form says that "we should not try to cram several pieces of data into a single field" (Churcher, 118). The example dealt with a table that listed different plant species along with their multiple uses. The suggested solution was a separate table for plant uses, making each use easier to store and access. It is vital to remember that "A table is not in the first normal form if it is keeping multiple values for a piece of information" (Churcher, 118).
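A minimal Python sketch of that decomposition (the table and field names here are hypothetical, not Churcher's exact example):

```python
# A hypothetical plants table that violates first normal form:
# several uses are crammed into a single "uses" field.
plants = [
    {"plant_id": 1, "name": "Aloe vera", "uses": "medicinal, cosmetic"},
    {"plant_id": 2, "name": "Lavender", "uses": "fragrance, culinary, medicinal"},
]

def to_first_normal_form(plants):
    """Split the multi-valued "uses" field into a separate table,
    one row per plant/use pair."""
    plant_uses = []
    for plant in plants:
        for use in plant["uses"].split(", "):
            plant_uses.append({"plant_id": plant["plant_id"], "use": use})
    return plant_uses

plant_uses = to_first_normal_form(plants)
# Each use is now a single value in its own row,
# e.g. {"plant_id": 1, "use": "medicinal"}
```

After the split, asking "which plants are medicinal?" is a simple lookup rather than string parsing.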
The second normal form, along with the first, helps deal with update issues: redundant storage of data can lead to update inconsistencies. "A table is in the second normal form if it is in the first normal form AND we need ALL the fields in the key to determine the values of the non-key fields." To remedy a violation, "if a table is not in the second normal form, remove those non-key fields that are not dependent on the whole of the primary key. Create another table with these fields and the part of the primary key on which they do depend" (Churcher, 120). In other words, when a primary key consists of more than one field and not all of those fields are needed to identify some non-key field, we can refactor the table by moving that non-key field into another table along with the part of the key it depends on.
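The same refactoring can be sketched in Python with a hypothetical order-items table (my own example, not from the book), whose composite key is (order_id, product_id):

```python
# "product_name" depends only on product_id -- part of the composite key
# (order_id, product_id) -- so this table is not in second normal form,
# and "Trowel" is stored redundantly.
order_items = [
    {"order_id": 1, "product_id": 10, "quantity": 2, "product_name": "Trowel"},
    {"order_id": 1, "product_id": 11, "quantity": 1, "product_name": "Gloves"},
    {"order_id": 2, "product_id": 10, "quantity": 5, "product_name": "Trowel"},
]

def to_second_normal_form(order_items):
    """Move fields that depend on only part of the key into their own table."""
    products = {}
    stripped = []
    for row in order_items:
        products[row["product_id"]] = {"product_id": row["product_id"],
                                       "product_name": row["product_name"]}
        stripped.append({"order_id": row["order_id"],
                         "product_id": row["product_id"],
                         "quantity": row["quantity"]})
    return stripped, list(products.values())

order_items_2nf, products = to_second_normal_form(order_items)
# "Trowel" now lives in one row of products; renaming the product
# means updating one row instead of every order line.
```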
We can still have problems after normalizing to the second normal form. Hence the third normal form: "A table is in the third normal form if it is in a second normal form AND no non-key fields depend on a field(s) that is not the primary key." To correct a violation, "if a field is not in the third normal form, remove the non-key fields that are dependent on a field(s) that is not the primary key. Create another table with this field(s) and the field on which it does depend" (Churcher, 121). This addresses the case where a non-key field can be determined by another non-key field; storing both in the same table can lead to inconsistency when those fields are updated.
Lastly, there is Boyce-Codd normal form: a table is in this form if "every determinant could be a primary key." The best way to summarize this is "A table is based on the key, the whole key, and nothing but the key (so help me Codd)." This means that any field (or combination of fields) that determines the values of other fields must itself be capable of uniquely identifying a row.
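To make "every determinant could be a primary key" concrete, here is a small sketch in Python (my own hypothetical example, using the classic student/subject/tutor case) that checks both halves of the rule against sample rows:

```python
def is_determinant(rows, determinant, dependent):
    """Fields form a determinant for `dependent` if each distinct
    determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[f] for f in determinant)
        val = tuple(row[f] for f in dependent)
        if seen.setdefault(key, val) != val:
            return False
    return True

def could_be_primary_key(rows, fields):
    """Fields could serve as a primary key if their combined values
    are unique for every row."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    return len(keys) == len(set(keys))

# Each tutor teaches exactly one subject, so "tutor" is a determinant
# of "subject" -- but "Dr. X" appears in two rows, so "tutor" could not
# be a primary key. A determinant that cannot be a key means the table
# is not in Boyce-Codd normal form.
lessons = [
    {"student": "Ann", "subject": "Math", "tutor": "Dr. X"},
    {"student": "Bob", "subject": "Math", "tutor": "Dr. X"},
    {"student": "Ann", "subject": "Physics", "tutor": "Dr. Y"},
]
```

The usual remedy mirrors the earlier forms: split the tutor-to-subject dependency out into its own table.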
It is important to realize that normalizing tables helps address a number of issues Churcher mentions earlier in the chapter dealing with the modification, insertion, and deletion of records. I apologize to anyone who read this entire post only to find out that the only person to learn something about normalization was me, by writing it. For further clarification on normalization and the other concepts referenced in this post, please see Clare Churcher's Beginning Database Design. She does a much better job of explaining these concepts (along with many other great database concepts) and accompanies them with practical examples.