‘Tis but an SQL wound, with MongoDB
Here I am, with a reference to Monty Python, coming primarily from the structurally controlled world of relational database management systems.
I’ve been designing and developing information and process management systems, architectures, concepts, data models, data structures and algorithms for more than a decade. All of this facilitated the creation of a solution; a solution which, in part or in whole, helps solve a larger problem, or even a potential future one.
One of the most mundane and time-consuming parts, apart from talking to people who talk in circles or in synergy, is designing, developing and testing data models against a given data structure or process. With experience, certain trends have revealed themselves that simplify the modelling process, especially for recurring generic problems. This experience has saved me a great deal of time designing data models for storage in a database.
In the past, it was easy to model and accommodate various problems with relational database management systems (or RDBMS for short). This was partly because of the formal training in my Computer Science degree, and partly because people managed their information through basic tables of data, usually in the form of something like a Microsoft Excel spreadsheet. Products like Excel filled a gap, or part of a solution to a problem: storing data with a relationship to something else. This allowed us to focus on the next thing, managing the processes using that data.
As business and the way we communicate evolve, so does what data must be stored, and how, to accommodate the current and changing data structures and processes. The solution to this is fairly simple, but also time consuming: we need to analyse the changes, determine how to alter the data structures for the changed processes, modify the data models, and transform and translate the database tables to accommodate it all.
Sometimes this process of change can take long; so long that just before you are done, the processes you are trying to accommodate have changed again. In the business world, this also costs a lot of money.
Conway’s Law, coined by a computer programmer called Melvin Conway, said the following in 1968:
organizations which design systems… are constrained to produce designs which are copies of the communication structures of these organizations
With Conway’s Law in mind, we need to be able to adapt systems rapidly to reflect those communication structures, be it for the business as a whole, or the communication and contractual structures per customer. At times it can be a deal breaker if the systems cannot accommodate the communication or customer requirements. Being able to adapt, and rapidly, has become increasingly important in recent years, if not the norm.
Although the traditional days of project management still exist in companies with loads of corporate red tape, these days it’s more about being agile, lean and quick on your feet (or fingers). There isn’t time any more to first do a complete analysis, create use cases and write a complete user requirement specification, then get it all reviewed and signed off by non-technical people before development and testing even start. The users want changes now, or in a few weeks, not in 6 to 12 months.
Relational database systems earned their glory by saving space through referencing, and by extensive indexing of the stored data. This made systems that work with lots of data both pleasant and useful. Changing and transforming the data models for these systems is a pain though, especially if done properly and not just slapped into the data as an extra column. It does not suit the agile world too well. Storage space is not a problem any more, but with large data sets (now popularised under the buzzword “big data”), joining data from the various references is time consuming and slows the process down.
It slows down because of how it must create joined pieces of data of matching lengths and sizes, while maintaining referential integrity over a large amount of data, all processed through a standardised (and sometimes bastardised) structured language called SQL, the Structured Query Language.
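As a rough illustration of what every such report query pays for, here is a plain-Python sketch of an equi-join over hypothetical, normalised customer and invoice “tables” (the names and data are made up for the example):

```python
# Hypothetical normalised tables: every report query must re-join them.
customers = [
    {"id": 1, "name": "Acme"},
    {"id": 2, "name": "Globex"},
]
invoices = [
    {"id": 10, "customer_id": 1, "total": 150.0},
    {"id": 11, "customer_id": 2, "total": 90.0},
    {"id": 12, "customer_id": 1, "total": 60.0},
]

def join_invoices_to_customers(customers, invoices):
    """Naive equi-join: for each invoice, look up its customer by key."""
    by_id = {c["id"]: c for c in customers}  # key index, as an RDBMS would build
    return [
        {"invoice": inv["id"],
         "customer": by_id[inv["customer_id"]]["name"],
         "total": inv["total"]}
        for inv in invoices
    ]

joined = join_invoices_to_customers(customers, invoices)
```

With two tiny lists this is instant, but the same key-matching work, multiplied across many tables and millions of rows, is where the hours go.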
NoSQL might sound like there is no SQL (pronounced S-Q-L, not “sequel”; it’s not a continuation of something). More commonly, though, NoSQL stands for “not only SQL”. Until recently I believed NoSQL meant just that, no SQL, and I believe many others still do. If you are familiar with RDBMS products like Microsoft’s SQL Server, (unfortunately Oracle’s) MySQL, MariaDB, PostgreSQL and others, you will be familiar with SQL. NoSQL database systems might not comply with the complete SQL standard, as they might not support all the language requirements. They might not use SQL at all, but their own variation of a <Insert your own phrase> Query Language.
My first exposure to a NoSQL-based system was through an object-oriented database system called InterSystems Caché. Caché itself, and its speed gains over traditional relational SQL-based databases, were part of the motivation for why we started using it. Licensing from InterSystems was very complex though, which rather deterred me from using it.
Recently, while working with relational databases again, I started looking at improving the performance of these systems, as the biggest problem was extracting financial reports. Due to the amount of data and the complexity of the data structures, joins and data references involved, reports can take six to eight hours to process.
I’ve always enjoyed consistency.
Diving into MongoDB, I’ve realised there is more than just consistency here. There is a mind shift towards simpler thinking: storing data in almost the same way our data structures are built. This helps us keep the context of the data we are working with. Also, there is (mostly) no joining required in MongoDB, because relationships are generally stored within the master data. Which brings us back to consistency and keeping context.
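A small sketch of what that looks like (the document shape and field names are my own illustration, not a prescribed schema): the customer and its invoices live in one document, so reading the relationship needs no join step at all.

```python
# One MongoDB-style document: the relationship is embedded in the master data.
customer_doc = {
    "name": "Acme",
    "invoices": [  # the child rows live inside the parent document
        {"invoice": 10, "total": 150.0},
        {"invoice": 12, "total": 60.0},
    ],
}

# Reporting over the document is a plain traversal, with full context in hand:
customer_total = sum(inv["total"] for inv in customer_doc["invoices"])
```

The trade-off, of course, is that the embedded data is kept with its owner rather than normalised away, which is exactly the shift in thinking described above.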
Although it feels like a mind-shift, it’s actually so easy to adapt and use MongoDB.
I started creating some test scenarios for myself, processing data and extracting data for the reports, now with similar data stored in smallish data sets, or collections as MongoDB calls them. Extracting the information takes approximately four to five minutes with the traditional system, running on a dedicated server with an indexed relational database. With the preliminary, yet-to-be-indexed MongoDB database running on my three-year-old notebook, I extracted the same information in three to four seconds! Yes, three to four seconds!
That on its own was a major jump in performance which blew my mind. I have now built myself a data synchronisation process which automatically extracts the data from the traditional database, transforms it and saves it as MongoDB documents. This process runs in parallel using asynchronous and concurrent computing. Once I have a fairly complete replica of the current database, I’m hoping to run the tests again to see the difference in real-world performance.
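The shape of that sync process can be sketched roughly like this, using Python’s standard `concurrent.futures` pool. The `extract` and `transform` functions here are stand-ins I made up for the example; the real pipeline reads rows from the relational database and writes the resulting documents to MongoDB.

```python
from concurrent.futures import ThreadPoolExecutor

def extract(batch_id):
    """Stand-in for reading one batch of rows from the relational database."""
    return [{"id": batch_id * 10 + i, "amount": i * 1.5} for i in range(3)]

def transform(rows):
    """Reshape flat rows into a MongoDB-style document (illustrative shape)."""
    return {"batch_total": sum(r["amount"] for r in rows), "items": rows}

def sync(batch_ids):
    # The batches are independent of each other, so extract/transform
    # can run concurrently across a worker pool.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda b: transform(extract(b)), batch_ids))

documents = sync([1, 2, 3])
```

In the real process, each resulting document would then be inserted into a MongoDB collection; batching the work this way is what lets the extraction and transformation overlap instead of running one row at a time.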
If the difference is significant enough, it will be motivation enough to either push the current third-party provider to improve the system we are using, or to just build our own. Exciting times lie ahead and I can’t wait to get involved.
In time I’m going to add follow-up stories to tell of my experiences: areas where I struggled, or that were simplified by MongoDB and NoSQL. Perhaps someone else will find value in what I learn throughout the process, and be inspired to create something big.