Data Modeling: Intro
What is a data model?
Data Modeling refers to the practice of documenting software and business system design. The “modeling” of these various systems and processes often involves the use of diagrams, symbols, and textual references to represent the way the data flows through a software application or the Data Architecture within an enterprise.
Why Is the Data Model Important?
When designing programs or report layouts, for example, we generally settle for a design that “does the job,” even though we recognize that with more time and effort we might be able to develop a more elegant solution. The data model deserves more care than that, for three reasons.
- Leverage
The key reason for giving special attention to data organization is leverage in the sense that a small change to a data model may have a major impact on the system as a whole. For most commercial information systems, the programs are far more complex and take much longer to specify and construct than the database. But their content and structure are heavily influenced by the database design.
- Conciseness
A data model is a very powerful tool for expressing information systems requirements and capabilities. Its value lies partly in its conciseness. It implicitly defines a whole set of screens, reports, and processes needed to capture, update, retrieve, and delete the specified data.
- Data Quality
The data held in a database is usually a valuable business asset built up over a long period. Inaccurate data (poor data quality) reduces the value of the asset and can be expensive or impossible to correct.
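The conciseness point can be made concrete with a minimal sketch. It assumes a hypothetical table customer(customer_id, name, email), which is not from the original text, and shows how the table definition alone is enough to derive the basic statements for capturing, retrieving, updating, and deleting its data. The helper name crud_statements is also an assumption made for illustration.

```python
def crud_statements(table, key, columns):
    """Derive the four basic SQL statements implied by one table definition."""
    all_cols = [key] + columns
    col_list = ", ".join(all_cols)
    placeholders = ", ".join("?" for _ in all_cols)
    assignments = ", ".join(f"{c} = ?" for c in columns)
    return {
        "create": f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
        "retrieve": f"SELECT {col_list} FROM {table} WHERE {key} = ?",
        "update": f"UPDATE {table} SET {assignments} WHERE {key} = ?",
        "delete": f"DELETE FROM {table} WHERE {key} = ?",
    }


# Hypothetical table used only for illustration.
statements = crud_statements("customer", "customer_id", ["name", "email"])
for operation, sql in statements.items():
    print(f"{operation}: {sql}")
```

Nothing beyond the table name, its key, and its columns was needed to produce these statements, which is the sense in which a few lines of model stand in for a much larger set of screens and processes.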
What Makes a Good Data Model?
We can identify some general criteria for evaluating and comparing data models.
- Completeness
Does the model support all the necessary data? An incomplete model shows up as missing columns, or as records that cannot be stored at all.
- Nonredundancy
Does the model specify a database in which the same fact could be recorded more than once?
- Enforcement of Business Rules
How accurately does the model reflect and enforce the rules that apply to the business’ data?
- Data Reusability
Will the data stored in the database be reusable for purposes beyond those anticipated in the process model?
- Stability and Flexibility
How well will the model cope with possible changes to the business requirements? Can any new data required to support such changes be accommodated in existing tables? Alternatively, will simple extensions suffice? Or will we be forced to make major structural changes, with corresponding impact on the rest of the system?
- Elegance
Does the data model provide a reasonably neat and simple classification of the data?
- Communication
How effective is the model in supporting communication among the various stakeholders in the design of a system? Do the tables and columns represent business concepts that the users and business specialists are familiar with and can easily verify? Will programmers interpret the model correctly?
- Integration
How will the proposed database fit with the organization’s existing and future databases?
- Conflicting Objectives
Our overall goal is to develop a model that provides the best balance among these possibly conflicting objectives.
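As one illustration of these criteria, the nonredundancy question can be sketched with Python's built-in sqlite3 module. The tables and values below are hypothetical, chosen only for the example. The first design repeats a customer's city on every order, so a partial update leaves the database holding two versions of the same fact. The second design records the city once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Redundant design: the customer's city is repeated on every order row.
conn.execute(
    "CREATE TABLE order_flat ("
    "order_id INTEGER PRIMARY KEY, customer_name TEXT, customer_city TEXT)"
)
conn.executemany(
    "INSERT INTO order_flat VALUES (?, ?, ?)",
    [(1, "Acme", "Sydney"), (2, "Acme", "Sydney")],
)
# Updating only one row leaves two conflicting versions of the same fact.
conn.execute("UPDATE order_flat SET customer_city = 'Melbourne' WHERE order_id = 1")
flat_cities = {
    row[0]
    for row in conn.execute(
        "SELECT customer_city FROM order_flat WHERE customer_name = 'Acme'"
    )
}

# Nonredundant design: the city is recorded once, in the customer table.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.execute(
    "CREATE TABLE customer_order ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(customer_id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme', 'Sydney')")
conn.executemany("INSERT INTO customer_order VALUES (?, ?)", [(1, 1), (2, 1)])
conn.execute("UPDATE customer SET city = 'Melbourne' WHERE customer_id = 1")
normalized_cities = {
    row[0]
    for row in conn.execute(
        "SELECT c.city FROM customer_order o "
        "JOIN customer c ON c.customer_id = o.customer_id"
    )
}

print("flat design:", sorted(flat_cities))
print("normalized design:", sorted(normalized_cities))
```

In the flat design the two order rows now disagree about where the customer is located. In the normalized design every order sees the single updated value. The normalized version costs an extra table and a join, which is a small instance of the trade-offs among the objectives listed above.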
Where Do Data Models Fit In?
Any sound methodology for developing information systems that require stored data will include a data-modeling phase. The main difference between the various mainstream methodologies is whether the data model is produced before, after, or in parallel with the process model.
- Process-Driven Approaches
We naturally tend to think of systems in terms of what they do. We first identify all of the processes and the data that each requires. The data modeler then designs a data model to support this fairly precise set of data requirements, typically using “mechanical” techniques such as normalization.
- Data-Driven Approaches
Data-driven approaches develop the data model before the detailed process model. They have since generally evolved into parallel or “blended” methodologies.
- Parallel (Blended) Approaches
Having grasped this theoretical distinction between process-driven and data-driven approaches, do not expect to encounter a pure version of either in practice. It is virtually impossible to do data modeling without some investigation of processes or to develop a process model without considering data. At the very least, this means that process modelers and data modelers need to communicate regularly. Indeed, they may well be the same person or multiskilled members of a team charged with both tasks.
- Object-Oriented Approaches
We have seen increasing use of object-oriented approaches to system specification and development, and, for a while, it seemed (at least to some) that these would largely displace conventional “data-centric” development.
- Prototyping Approaches
Rather than spend a long time developing a detailed paper specification, the designer adopts a “cut and try” approach: quickly build a prototype, show it to the client, modify it in the light of comments, show it to the client again, and so forth.
- Agile Methods
Agile methods can be seen as a backlash against “heavy” methodologies, which are characterized as bureaucratic, unresponsive to change, and generating large quantities of documentation of dubious value.
Database Design Stages and Deliverables
Conceptual, Logical, and Physical Data Models
The Three-Schema Architecture and Terminology
Who Should Be Involved in Data Modeling?
At this stage, let us just note that at least the following people have a stake in the model and should expect to be involved in its development or review:
- System users, owners, and/or sponsors
We will need to verify that the model meets their requirements. Our ultimate aim is to produce a model that contributes to the most cost-effective solution for the business, and the users’ informed agreement is an important part of ensuring that this is achieved.
- Business specialists
May be called on to verify the accuracy and stability of business rules incorporated in the model, even though they themselves may not have any immediate interest in the system. For example, we might involve strategic planners to assess the likelihood of various changes to the organization’s product range.
- Data modeler
Has overall responsibility for developing the model and ensuring that other stakeholders are fully aware of its implications for them: “Do you realize that any change to your rule that each policy is associated with only one customer will be very expensive to implement later?”
- Process modelers
Will need to specify programs to run against the database. They will want to verify that the data model supports all the required processes without requiring unnecessarily complex or sophisticated programming. In doing so, they will need to gain an understanding of the model to ensure that they use it correctly.
- Physical database designer
Will need to assess whether the physical data model needs to differ substantially from the logical data model to achieve adequate performance, and, if so, propose and negotiate such changes. This person (or persons) will need to have an in-depth knowledge of the capabilities of the chosen DBMS.
- Systems integration manager
Organizing the modeling task to ensure that the necessary expertise is available, and that the views of all stakeholders are properly taken into account, is one of the major challenges of data modeling.
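The data modeler's question about policies and customers can be sketched in code. This is a minimal illustration using Python's built-in sqlite3 module. The table and column names are assumptions made for the example and do not come from the original text. It shows how the rule "each policy is associated with only one customer" ends up built into the table structure, and why relaxing it later means a structural change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# The one-customer rule is built into the structure: a policy row has
# exactly one customer_id column, and policy_id identifies the row.
conn.execute("""
    CREATE TABLE policy (
        policy_id   INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    )
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.execute("INSERT INTO policy VALUES (100, 1)")

# Attaching a second customer to policy 100 would need a second row with
# the same primary key, which the database rejects.
try:
    conn.execute("INSERT INTO policy VALUES (100, 2)")
    second_customer_allowed = True
except sqlite3.IntegrityError:
    second_customer_allowed = False

# Relaxing the rule needs a new structure (a link table), and every program
# that reads policy.customer_id would have to change with it.
conn.execute("""
    CREATE TABLE policy_customer (
        policy_id   INTEGER NOT NULL REFERENCES policy(policy_id),
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        PRIMARY KEY (policy_id, customer_id)
    )
""")
conn.executemany("INSERT INTO policy_customer VALUES (?, ?)", [(100, 1), (100, 2)])
holders = conn.execute(
    "SELECT COUNT(*) FROM policy_customer WHERE policy_id = 100"
).fetchone()[0]

print("second customer allowed in original design:", second_customer_allowed)
print("customers on policy 100 after redesign:", holders)
```

The redesign itself is only a few lines, but in a real system every query, screen, and report that assumes one customer per policy would need revisiting, which is why stakeholders should confirm such rules while the model is still being developed.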
Is Data Modeling Still Relevant?
Whether as a result of asking this question or not, many organizations have reduced their commitment to data modeling, most visibly through providing fewer jobs for professional data modelers. Before proceeding, then, we look at the challenges to the relevance of data modeling (and data modelers).
- Costs and Benefits of Data Modeling
A formal data-modeling phase, undertaken by skilled modelers, should reduce the costs of database development (through the greater efficiency of competent people), and of the overall system (through the leverage effect of a good quality model).
- Data Modeling and Packaged Software
After the package is purchased, we may still have considerable say as to how individual tables and attributes are defined and used. There is plenty of room for expensive errors here and thus plenty of room for data modelers to ensure that good practices are followed. If modifications and extensions are to be made to the functionality of the package, the data modeler will be concerned to ensure that the database is used as intended.
- Data Integration
Poor data integration remains a major issue for most organizations. The use of packages often exacerbates the problem, as different vendors organize and define data in different ways.
- Data Warehouses
The data model for a warehouse will usually need to support high volumes of data subject to complex ad hoc queries, and accommodate data formats and definitions inherited from independently designed packages and legacy systems.
- Personal Computing and User-Developed Systems
Owning a sophisticated tool is not the same thing as being able to use it effectively, and much time and effort is wasted by amateurs attempting to build applications without an understanding of basic design principles.
- Data Modeling and XML
XML has been widely adopted as a format for the transfer of data between applications and enterprises, and a variety of tools have been developed to generate XML and process data in XML format.
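As a small sketch of XML used as a transfer format, the following uses Python's standard xml.etree.ElementTree module to turn one record into XML and read it back. The customer record and its element names are hypothetical, chosen only for the example. The structure of the XML document still has to be designed, just as a table does.

```python
import xml.etree.ElementTree as ET


def customer_to_xml(customer):
    """Serialize one customer record (a dict) to an XML string."""
    root = ET.Element("customer", attrib={"id": str(customer["customer_id"])})
    ET.SubElement(root, "name").text = customer["name"]
    ET.SubElement(root, "city").text = customer["city"]
    return ET.tostring(root, encoding="unicode")


def customer_from_xml(xml_text):
    """Rebuild the customer record from its XML representation."""
    root = ET.fromstring(xml_text)
    return {
        "customer_id": int(root.get("id")),
        "name": root.findtext("name"),
        "city": root.findtext("city"),
    }


# Hypothetical record used only for illustration.
record = {"customer_id": 1, "name": "Acme", "city": "Sydney"}
xml_text = customer_to_xml(record)
print(xml_text)
print(customer_from_xml(xml_text) == record)
```

Choosing to carry the identifier as an attribute and the other values as child elements is a modeling decision in its own right, and the sending and receiving applications have to agree on it.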
Summary
Data modeling is a design process. The data model cannot be produced by a mechanical transformation from hard business facts to a unique solution. Rather, the modeler generates one or more candidate models, using analysis, abstraction, past experience, heuristics, and creativity. Quality is assessed according to a number of factors including completeness, nonredundancy, faithfulness to business rules, reusability, stability, elegance, integration, and communication effectiveness. There are often trade-offs involved in satisfying these criteria.
Data modeling is the process of creating a data model to communicate data requirements and to document structures and entity types. The resulting model serves as a visual guide for designing and deploying databases with high-quality data as part of application development.
Reference
What Is Data Modeling? — DATAVERSITY
Data Modeling Theory and Practice Graeme Simsion