Trends in Data Modeling: The Move Towards Business Users
As the first step in database creation and design, data modeling involves building a conceptual picture of how data items relate to one another. By carrying a model from conceptual design through a logical model and on to a physical schema, data modeling allows data to be managed as a corporate-wide resource. Once considered an exclusive field, accessible only to Data Scientists and IT departments, data modeling has become easier and more user-friendly, and cloud computing, machine learning, and the broader push towards automation suggest it will keep moving in that direction.
While end users still need some basic technical knowledge, data modeling is slowly moving into the hands of the businesses that actually need the models and the databases. Because of this trend, the field of data management as a whole is shifting to accommodate those end users. According to the research firm Gartner, “Predictive analytics vendors are trying to reach a broader audience than traditional statisticians and data scientists by adding more exploration and visualization capabilities for novices and business users” (Kempe). Giving “novice and business users” greater control over data modeling does not necessarily mean they will do the same kind of modeling that IT professionals and Data Scientists do today; rather, the trend points towards automating the modeling process so that end users can focus on what actually matters to them: the data (Kempe).
Much of this automation, and the reduction in data modeling’s innate complexity, is made possible by machine learning algorithms that identify similarities between existing and incoming data. By establishing these patterns, vendors of easy-to-use modeling tools can better predict future data, which in turn makes it possible to recycle patterns.
Pattern recycling is another recent trend, significant because it reuses existing patterns, essentially creating new data models from older ones (Kempe). What is interesting is that these patterns can be reused across industries, not just within a single company. The basic structure of most data models and the basic structure of most businesses or organizations have a great deal in common. For instance, across industries, most businesses will have entities such as employees, customers, and products, along with several others. Within the food industry, we can add entities like menu, server, and host; within a specific restaurant, say The Cheesecake Factory, we can add cheesecake, bar, and pasta. Essentially, “one third of a data model contains fields common to all business, one third contains fields common to the industry, and the other third is specific to the organization” (Kempe). This layered structure allows data models to be recycled based on common patterns, and it furthermore allows users themselves to create new models that address their specific needs.
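This “thirds” structure can be sketched in code. The entity and field names below are hypothetical, invented only to illustrate how an organization-specific layer can extend an industry-wide layer, which itself extends a layer common to all businesses:

```python
from dataclasses import dataclass

# Layer 1: fields common to virtually all businesses (hypothetical names).
@dataclass
class Employee:
    employee_id: int
    name: str

# Layer 2: fields common to one industry -- here, restaurants.
@dataclass
class Server(Employee):
    section: str = "main floor"   # tables this server covers

# Layer 3: fields specific to one organization, e.g. a single restaurant.
@dataclass
class CheesecakeFactoryServer(Server):
    dessert_certified: bool = False   # org-specific training flag

# A new organization reuses layers 1 and 2 and only adds its own third.
s = CheesecakeFactoryServer(employee_id=7, name="Ana", dessert_certified=True)
print(s.name, s.section, s.dessert_certified)   # -> Ana main floor True
```

A different restaurant chain could recycle `Employee` and `Server` unchanged and swap in its own third layer, which is the essence of reusing a model across an industry.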
Algorithms that support pattern recycling include “Deep Learning,” which is well suited to Big Data sets; “Ensemble Learning,” which aggregates the outputs of a series of predictive analytics models; and “Bootstrap Aggregating,” which improves the precision of machine learning methods. Ensemble Learning in particular helps with pattern recycling because, in combining predictive models, it also combines their outcomes (Kempe).
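As a rough illustration of bootstrap aggregating, the sketch below trains several deliberately weak threshold models on bootstrap resamples of a toy data set and combines their predictions by majority vote. The data set and the “model” are invented for the example; real systems would use proper learners:

```python
import random
from collections import Counter

# Toy labeled data: (feature, label), where label is 1 for "large" values.
DATA = [(x, int(x > 5)) for x in range(11)]

def bootstrap_sample(data, rng):
    # Draw len(data) examples with replacement -- the "bootstrap" step.
    return [rng.choice(data) for _ in data]

def train_threshold_model(sample):
    # A deliberately weak learner: threshold at the sample's mean feature.
    threshold = sum(x for x, _ in sample) / len(sample)
    return lambda x: int(x > threshold)

def bagged_predict(models, x):
    # The "aggregating" step: majority vote across all bootstrap models.
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

rng = random.Random(42)
models = [train_threshold_model(bootstrap_sample(DATA, rng))
          for _ in range(25)]
print(bagged_predict(models, 9), bagged_predict(models, 1))   # -> 1 0
```

Each resample gives a slightly different threshold, and the vote smooths out those differences, which is the precision gain bagging is meant to provide.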
These machine learning algorithms are also supplemented by the potential of Natural Language Processing to provide results to queries. In fact, the current trend indicates that users may eventually be able to ask data-related questions in plain language as well.
As mentioned earlier, cloud computing in the form of Data-as-a-Service, or DaaS, is also something we will see more of in the future. DaaS builds on the idea that a product can be provided on demand without geographic proximity: a service can be delivered to consumers regardless of where the provider is located and regardless of the organizational separation between provider and consumer (Dyche). In the case of DaaS, the product is data. The current trend is towards employing DaaS commercially as well as, occasionally, in larger organizations such as the United Nations (“Statistical Data as a Service and Internet Mashups”).
DaaS builds on the idea that data quality can be managed in a centralized location; as a result, it offers customers more agile access, lower costs, and higher-quality data.
Because of the user-friendly nature of DaaS, customers need no prior knowledge of the underlying data in order to access the data they need. On the provider’s end, experts in data modeling can build the base once and outsource only the presentation layer, keeping the cost of user interfaces minimal. This works because the changes requested by users generally touch only the presentation layer and are therefore easy to implement (Dyche).
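One way to picture this separation is a data layer owned by the provider with interchangeable presentation functions on top of it. Everything below is a hypothetical sketch: the function names and the canned records are invented, and a real DaaS feed would arrive over HTTP rather than from a hard-coded list:

```python
import json

# Provider layer: centralized, quality-controlled data. In a real DaaS
# setup this would be a network call; canned records keep the sketch
# self-contained.
def fetch_sales_data():
    return [{"region": "East", "units": 120},
            {"region": "West", "units": 95}]

# Presentation layer: the only part that changes per customer request.
def render_as_table(records):
    header = f"{'region':<8}{'units':>6}"
    rows = [f"{r['region']:<8}{r['units']:>6}" for r in records]
    return "\n".join([header, *rows])

def render_as_json(records):
    return json.dumps(records)

data = fetch_sales_data()
print(render_as_table(data))
# Swapping in render_as_json(data) changes the output format without
# touching the provider's data layer at all.
```

Because a customer request like “show me JSON instead of a table” swaps one small function, the expensive, expert-built data layer never has to change.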
Of course, DaaS is a form of cloud computing, and as such it leaves the customer constantly reliant on the service provider’s ability to avoid server downtime. The DaaS model also leaves customers in the odd position of “renting” data rather than downloading or keeping it in any way. They can still use it to analyze and produce graphs, charts, and other visualizations, but downloading the data is generally not an option (“Exploring PBBI’s Vision for Geospatial Data as a Service”).
Cloud computing, however, is the reason machine learning has become so integrated into data analytics today, and why that trend is predicted to continue. It is only because of cloud computing and its scalability that we have so much data in the first place, and these mass quantities of data are what make automated data modeling increasingly relevant and increasingly necessary.
Data modeling is an ever-expanding field in Information Technology, with new modeling tools and features added regularly to increase efficiency. Most current trends indicate that both the field and the process are moving steadily towards automation. Through machine learning, pattern recycling, cloud computing, and the ways these concepts combine to produce offerings like Data-as-a-Service, the advancements in data modeling are making the process more and more accessible to end users, in most cases the businesses themselves.
This does have implications for the future of Data Scientists, since part of their current role involves creating, testing, and monitoring data models based on the needs of the business. As automation absorbs that part of the role, the need for their assistance beyond initial calibration diminishes (Kempe). It does, however, free their time to focus on more complex analytic applications that increase efficiency. Overall, the current trend towards user-friendly data modeling seems to bring much of the complexity down to a base level that will hopefully not only improve understanding of the process but also improve efficiency and overall productivity.