Foldability — Why not Data?
Data has been our biggest game changer, yet its availability, accessibility and usability have been constant challenges. Those challenges have routinely turned into chaos, and we have ended up cutting our outcomes into small chunks that best fit the immediate business requirements. At the end of the day this leaves the project in a state of flux and leads to silos of additions that never deliver what the business actually wanted. The underlying issue is that we do not care about the foldability of our data. The term foldability is derived from fold, which refers to the ability to collapse a structure into a compact form. Why is it that data cannot be folded? Or are we so rigid about the technology or software platform that we simply do not bother to practise foldability as an attribute of the data? Any data architect who discusses this topic agrees that our problems would have been simpler to handle had we made foldability the focus of data. However, we have created mountains of data over the last 59 years, and implementing foldability now is not an easy exercise.
Having said this, the new generation of developers and architects asks "why?" when we describe data sharing as an issue, because they are used to Google or Bing providing data on demand in the format they need. From an enterprise perspective this is not simple: there are legal, compliance, regulatory and privacy requirements that the data must meet. Yet my argument is that if data is foldable, we can manage all these requirements and still use the data across different groups, with each group extracting what it requires. In other words, foldability gives the data an implementation of master data and metadata that is defined and understood by the enterprise.
Take the example of customer and look at what we typically do today. We start with a conceptual model of the data and develop a table with attributes, called Customer_Table → First Name, Last Name, Address 1, Address 2, City, State, Zip, Email, Phone_Number_Cell, Phone_Number_Home, Gender, Marital_Status. Then we develop a logical model and a physical model, and the table is used in the database by applications. When marketing looks at the table, it says some columns are missing, and another project is opened to add them. At the same time inside sales looks at the table and asks for a few other columns, and the campaign management team asks for still other fields. This creates a massive effort and results in three tables as the outcome. Good or bad, this is the truth.
Now, if we have data foldability, the same table is first defined as a file, and my favorite format is JSON. The file will look like this:
{
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "address": { "type": "string" },
    "city": { "type": "string" },
    "state": { "type": "string" },
    "zip": { "type": "string" },
    "email": { "type": "string" },
    "phone": { "type": "string" },
    "twitter_id": { "type": "string" },
    "facebook_url": { "type": "string" }
  }
}
The beauty of this file model is that we can add attributes as needed and increase the depth and the nesting levels of the data as needed, and the same file can be acquired every day with whatever changes have been made. Each team can extract the data it requires from this file. Code can be written to pull the data according to the analytics and reporting requirements, the data can be loaded into a table that uses the file as an external schema link, and the results can be computed. Each vertical needs to understand this foldable concept of data; then we can work on standards as needed and implement solutions with more ease.
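The per-team extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample record, the `TEAM_VIEWS` mapping and the `extract_for_team` helper are hypothetical names invented for this example, and the field names are taken from the schema above.

```python
import json

# Hypothetical customer record that follows the schema above (values are made up).
CUSTOMER_JSON = """
{
  "first_name": "Ada",
  "last_name": "Lovelace",
  "address": "12 Example Street",
  "city": "Springfield",
  "state": "IL",
  "zip": "62701",
  "email": "ada@example.com",
  "phone": "555-0100",
  "twitter_id": "@ada",
  "facebook_url": "https://facebook.example/ada"
}
"""

# Hypothetical per-team views: each team lists only the attributes it needs.
TEAM_VIEWS = {
    "marketing": ["first_name", "last_name", "email", "twitter_id", "facebook_url"],
    "inside_sales": ["first_name", "last_name", "phone", "city", "state"],
    "campaign_management": ["email", "zip", "twitter_id"],
}


def extract_for_team(record, team):
    """Return only the attributes the given team asked for; absent ones are skipped."""
    return {name: record[name] for name in TEAM_VIEWS[team] if name in record}


customer = json.loads(CUSTOMER_JSON)
sales_view = extract_for_team(customer, "inside_sales")
print(sales_view)
```

With this shape, adding a new attribute to the file does not require changing any team's view; a team only edits its own list when it wants the new attribute.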
Encryption, security and sensitive data can all be handled with JSON: folding the attributes creates a set of objects and sub-objects that can be transported with great ease and flexibility. From this point the move to the cloud becomes a simple journey, with JSON and Parquet file formats as standards. Multi-cloud systems become a reality, and applications and platforms can be handled across all cloud vendors.
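One possible reading of "folding the attributes" is grouping flat fields into named sub-objects so that a sensitive group can be withheld or protected as a single unit. The sketch below assumes that reading; the `FOLDS` grouping and the `fold` and `without_folds` helpers are hypothetical, and it only drops the sensitive sub-object rather than encrypting it.

```python
import copy

# Hypothetical grouping of flat attributes into named sub-objects ("folds").
FOLDS = {
    "name": ["first_name", "last_name"],
    "contact": ["email", "phone"],  # treated as sensitive in this sketch
    "location": ["address", "city", "state", "zip"],
}


def fold(flat):
    """Collapse a flat record into sub-objects; ungrouped keys stay at the top level."""
    grouped = {key for keys in FOLDS.values() for key in keys}
    folded = {k: v for k, v in flat.items() if k not in grouped}
    for fold_name, keys in FOLDS.items():
        sub = {k: flat[k] for k in keys if k in flat}
        if sub:
            folded[fold_name] = sub
    return folded


def without_folds(folded, sensitive):
    """Return a copy with the named sub-objects removed as whole units."""
    return {k: copy.deepcopy(v) for k, v in folded.items() if k not in sensitive}


flat = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "city": "Springfield",
    "state": "IL",
    "email": "ada@example.com",
    "phone": "555-0100",
    "twitter_id": "@ada",
}

folded = fold(flat)
shareable = without_folds(folded, {"contact"})
print(shareable)
```

Because the sensitive fields live in one sub-object, a group that is not allowed to see contact details receives the same record minus that single key, and the rest of the structure is unchanged.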
The concept of data portability through APIs has existed for decades. Today major innovators such as Apple, Google, Microsoft and Facebook are working on the open source Data Transfer Project, which uses new API integration techniques to move data across channels with ease. However, the underlying problem of whether we can "fold" the data has not yet been addressed. Addressing it will lead us to better insights and yield the results we have long been chasing.
In the world of the Internet of Things this expectation should not come as a surprise, and the opportunity to innovate should be embraced and boldly taken forward. Data-critical missions will then yield results faster, because we will no longer be constrained by the length and depth of the data, both of which become easy to manage.