’Tis but an SQL wound, with MongoDB: Part II

Gareth David
Nov 16, 2016

Welcome to the second story about my dive into the world of MongoDB and NoSQL as a practical solution. If you missed the first story, you can find it here: ’Tis but an SQL wound, with MongoDB

These are some things I’ve learned about MongoDB and using it. I’ve also started doing some of the training provided by MongoDB University, and the lessons have taught me a few things that are either not covered in the documentation or not explained as clearly. Don’t get me wrong, the documentation is quite comprehensive. What I would love to see is more examples within the API reference showing how to use each function. The Microsoft Developer Network (MSDN) is a great example of this type of documentation.

Apart from the comprehensive documentation, the drivers are also well supported by MongoDB’s own developers.

Designing systems is not what you expect

I say this because system design is moving away from traditional engineering. It is a shift in mindset, both in how to approach designing a solution and in how to expand on an existing one. When I first started programming, I would see a problem, create a solution for it and only then worry about storing the data.

During my formal studies in Computer Science, Mathematics, Statistics and Information Management, we were taught in more depth how to store data with database management systems, how they work and how to interact with them. This was primarily for Relational Database Management Systems (RDBMS for short). With the System Development Life Cycle (SDLC), we were trained that the traditional, and best, approach is to analyse the problem and write documentation describing the problem and the proposed solution. Then you design the data models to fit the proposed solution, implement the processes using those data models, and design and create the user interfaces for the processes. Finally, you test it all with the users. After this, the cycle repeats.

From an engineering perspective, this works. In the physical world, if you make a mistake in the piping design for a refinery, you don’t want that mistake to be built. The problem is that software mainly lives in a virtual world, yet we follow models developed for the physical one.

In my mind, the whole SDLC can be summarized by the following sequence for system development:

Data Modelling --> Process Design --> UI/UX Design --> Testing

What often happens, though, is that people describe a problem: what they would like to see, or how they would like to work with the system. They don’t come to you talking about the data, but about what they want to experience. A great analyst might help with this, but in today’s environment the analytical role is increasingly taken on by designers and developers themselves, due to budget constraints and other reasons.

As a result, in my experience, the sequence that tends to happen more often is this:

UX Design --> Process Design --> UI/UX Migration --> Data Modelling
|                                  ˄                       |
|                                  |                       ˅
+---> UX Testing/Implementation ---+                    Testing

The clear difference here is that user experience design is an ongoing process throughout the design of the business processes, which both feed into the user interface and ultimately dictate what data needs to be stored. It seems a slightly backward way of thinking.

Is it really, though? Consider how people struggle to see something virtual being developed. It’s a constant fight to explain how we are meeting targets for something they cannot see, especially with the people paying for the development. With the latter approach, non-technical people get constant visual feedback.

MongoDB is an application-centric mind shift

It forces developers to treat data storage as just that: data storage. The data should be stored logically, be easy and quick to retrieve, and make sense.

But hang on, don’t RDBMSs with structured tables store data logically and enable easy, quick retrieval? Yes, they do, but for this to happen the design has to go through several iterations. You start by looking at the problem and separating out the facts, or future scenarios, that need to be stored. You group logically similar data together and ensure that each piece of information is generally stored only once, with everything else referencing it through foreign keys and indices.

You end up with what is commonly known as a normalized data model.

Getting to this state is not always that difficult, but it can be time-consuming. You also need to guess, to an extent, what will be required and how users will want to interact with the system, to make sure you cover all the bases. Usually, down the line, you discover that you need extra data columns, or that something can be removed because you overstated the complexity of the problem or solution.

This isn’t application-centric at all; it is data-centric. I’m using the terms application-centric and data-centric to describe which of the two dictates what the solution will ultimately look like architecturally.

Application-centric is where the application or solution drives what needs to be developed, with the application as the focal point of the overall solution.

Data-centric is where the data drives what needs to be developed, with the data as the focal point of the overall solution.

With the traditional SDLC mindset, the data-centric approach makes sense. In the latter version, following modern trends, you already know what the users want, how they will work and which processes are involved. All you need to do is store the data logically together. In my mind, MongoDB enables this quite easily.

The data is supposed to support the application/solution from the ground up, logically.

I repeat “logically” because we tend to build normalized data models, which are very space-efficient. Recombining the data to retrieve information from it can, at times, be very time-consuming, even with the best processors, servers and amounts of memory.

Take a project management solution as an example, and let’s keep it simple. You’ll have projects with milestones. Each milestone will have tasks, along with the hours required to complete each task. Like I said, a basic example.

The first thing would be to model all the data to be stored. Traditionally, a quick guess tells us we’ll need at least three tables: Projects, Milestones and Tasks. The relationships are that a Project can have many Milestones, a Milestone belongs to one Project but can have many Tasks, and lastly, a Task belongs to one Milestone.

This gives us tables something like this:

+----------+        +------------+        +-------+
|          |        |            |        |       |
| Projects | -----> | Milestones | -----> | Tasks |
|          |        |            |        |       |
+----------+        +------------+        +-------+
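To make the three-table design concrete, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for any RDBMS. The table and column names (projects, milestones, tasks, project_id, milestone_id, hours) are my own assumptions for illustration; the point is that each child row points at its parent through a foreign key.

```python
import sqlite3

# In-memory database for illustration; table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on

conn.executescript("""
CREATE TABLE projects (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE milestones (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),    -- many milestones per project
    name       TEXT NOT NULL
);
CREATE TABLE tasks (
    id           INTEGER PRIMARY KEY,
    milestone_id INTEGER NOT NULL REFERENCES milestones(id), -- many tasks per milestone
    name         TEXT NOT NULL,
    hours        REAL NOT NULL
);
""")

conn.execute("INSERT INTO projects (id, name) VALUES (1, 'Website')")
conn.execute("INSERT INTO milestones (id, project_id, name) VALUES (10, 1, 'Design')")
conn.executemany(
    "INSERT INTO tasks (milestone_id, name, hours) VALUES (?, ?, ?)",
    [(10, "Wireframes", 8.0), (10, "Style guide", 5.0)],
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM tasks WHERE milestone_id = 10").fetchone()[0]
print(count)  # 2
```

With `PRAGMA foreign_keys = ON`, inserting a task for a milestone that does not exist fails with an `IntegrityError`, which is the referential integrity the normalized model relies on.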

That is data-centric, and it is sufficient. In MongoDB, however, you would have only a single collection (the equivalent of a table) called Projects. Each project document would contain the data contextually related to that project. A similar model would look like this:

+--------------------+
|                    |
|      Project       |
|                    |
| +----------------+ |
| |                | |
| |   Milestones   | |
| |                | |
| | +------------+ | |
| | |            | | |
| | |   Tasks    | | |
| | |            | | |
| | +------------+ | |
| |                | |
| +----------------+ |
|                    |
+--------------------+

Now for me this, logically, makes more sense.
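For comparison, here is roughly what the same kind of data could look like as a single project document. It is a plain Python dict standing in for a MongoDB document, with field names that are my own assumptions. With a driver such as PyMongo you would pass a dict like this to a collection’s `insert_one`, but the sketch below runs without a database.

```python
# One project document: milestones and their tasks are embedded, not referenced.
# Field names are hypothetical.
project = {
    "_id": 1,
    "name": "Website",
    "milestones": [
        {
            "name": "Design",
            "tasks": [
                {"name": "Wireframes", "hours": 8.0},
                {"name": "Style guide", "hours": 5.0},
            ],
        },
        {
            "name": "Build",
            "tasks": [
                {"name": "Templates", "hours": 20.0},
            ],
        },
    ],
}

# Everything about the project travels together, so a total is a simple walk over the document.
total_hours = sum(
    task["hours"]
    for milestone in project["milestones"]
    for task in milestone["tasks"]
)
print(total_hours)  # 33.0
```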

But isn’t the data the same? In part, yes. The benefit of this model is that when you retrieve, or work with, a project, you already have all the milestones relating to that project, and even all their tasks. To get this from normalized data models in the structured world, you either need a second query to filter out the relevant milestones and then a third query to filter each milestone’s tasks, or you “just join” them and let the database system do it for you through the foreign keys and indices. With hundreds of thousands of records in each table, simply re-joining the data into the structure the application works with becomes time-consuming.
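A rough sketch of that difference, using sqlite3 for the relational side and a plain dict for the document side. The table, column and field names are assumptions, and three rows obviously say nothing about performance at scale; the sketch only shows that rebuilding the nested project takes a join plus a regrouping step in application code, while the document is already in that shape.

```python
import sqlite3

# Hypothetical tables and sample rows for one project.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE milestones (id INTEGER PRIMARY KEY, project_id INTEGER, name TEXT);
CREATE TABLE tasks      (id INTEGER PRIMARY KEY, milestone_id INTEGER, name TEXT, hours REAL);
INSERT INTO projects   VALUES (1, 'Website');
INSERT INTO milestones VALUES (10, 1, 'Design'), (11, 1, 'Build');
INSERT INTO tasks      VALUES (100, 10, 'Wireframes', 8.0), (101, 10, 'Style guide', 5.0),
                              (102, 11, 'Templates', 20.0);
""")

# Relational side: join the three tables (inner joins drop milestones without tasks;
# a LEFT JOIN would keep them), then regroup the flat rows into a nested shape.
rows = conn.execute("""
    SELECT p.name, m.id, m.name, t.name, t.hours
    FROM projects p
    JOIN milestones m ON m.project_id = p.id
    JOIN tasks t      ON t.milestone_id = m.id
    WHERE p.id = ?
    ORDER BY m.id, t.id
""", (1,)).fetchall()

milestones = {}
for _project_name, milestone_id, milestone_name, task_name, hours in rows:
    milestone = milestones.setdefault(milestone_id, {"name": milestone_name, "tasks": []})
    milestone["tasks"].append({"name": task_name, "hours": hours})

rebuilt = {"name": rows[0][0], "milestones": list(milestones.values())}

# Document side: the stored document already has this shape, so one read returns it.
document = {
    "name": "Website",
    "milestones": [
        {"name": "Design", "tasks": [{"name": "Wireframes", "hours": 8.0},
                                     {"name": "Style guide", "hours": 5.0}]},
        {"name": "Build", "tasks": [{"name": "Templates", "hours": 20.0}]},
    ],
}

print(rebuilt == document)  # True
```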

Now, I know that MongoDB limits the size of a single document (16 MB), which can force you to split data out into another collection, but this happens less often than in a tabular world.

The way MongoDB structures and stores data is close to a one-to-one mapping of how my application represents it, visually or in code. You get performance plus logically related data. It just makes sense.
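To illustrate the near one-to-one mapping, here is a sketch with Python dataclasses standing in for the application’s model. The class and field names are hypothetical. `dataclasses.asdict` recursively turns the object graph into nested dicts and lists, which is the same shape a document store keeps.

```python
from dataclasses import dataclass, field, asdict
from typing import List

# Hypothetical application model for the project example.
@dataclass
class Task:
    name: str
    hours: float

@dataclass
class Milestone:
    name: str
    tasks: List[Task] = field(default_factory=list)

@dataclass
class Project:
    name: str
    milestones: List[Milestone] = field(default_factory=list)

project = Project(
    name="Website",
    milestones=[Milestone("Design", [Task("Wireframes", 8.0), Task("Style guide", 5.0)])],
)

# asdict recurses into nested dataclasses and lists, giving the document shape directly.
document = asdict(project)
print(document["milestones"][0]["tasks"][1])  # {'name': 'Style guide', 'hours': 5.0}
```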

MongoDB is schema-less

The last thing I want to talk about in this story is schema design. Think of data modelling as defining what data needs to be stored; the schema turns that what into how.

In a structured world, schemas are important. The problem comes when these schemas have to change, which impacts the system as a whole. In data-centric systems, every record in a table has exactly the same fields, with the same data types and sizes, so a schema change can have a significant effect. This is especially true when a single table starts needing to hold two different types of record. The solution is then extra columns, or an extra table or tables.
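A minimal SQLite sketch of what such a change means in practice: a new column is added to the table as a whole, so every existing row gains it and holds NULL until someone fills it in. The table and the hourly_rate column are hypothetical.

```python
import sqlite3

# Hypothetical tasks table with two existing rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT NOT NULL, hours REAL NOT NULL)")
conn.executemany("INSERT INTO tasks (name, hours) VALUES (?, ?)",
                 [("Wireframes", 8.0), ("Internal review", 2.0)])

# A new requirement (billing) changes the table for every row, not just the rows that need it.
conn.execute("ALTER TABLE tasks ADD COLUMN hourly_rate REAL")  # hypothetical billing column

rows = conn.execute("SELECT name, hourly_rate FROM tasks ORDER BY id").fetchall()
print(rows)  # [('Wireframes', None), ('Internal review', None)]
```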

Let’s say that, in the project management solution above, you now get billable and non-billable tasks. Each type has its own set of columns, plus some shared ones. You can just add extra columns, but then many records carry columns they don’t use, which can complicate indexing slightly. You can add one extra table, but then you also have to add another reference, either from the milestones table to the new table or from the billable tasks back to their milestone. Lastly, you can keep the tasks table with only the shared columns, move the billable columns into a second Billable Tasks table and the non-billable ones into a third Non-billable Tasks table, while milestones keep referencing tasks. You now have this:

+----------+      +------------+      +-------+     +----------+
|          |      |            |      |       |     |          |
| Projects | ---> | Milestones | ---> | Tasks | --- | Billable |
|          |      |            |      |       |     |  Tasks   |
+----------+      +------------+      +-------+     +----------+
                                          |
                                          ˅
                                   +--------------+
                                   |              |
                                   | Non-billable |
                                   |    Tasks     |
                                   |              |
                                   +--------------+
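The third option could be sketched in SQLite as follows. Shared columns stay in tasks, and each sub-table reuses the task’s id as its own key. The milestone_id, hourly_rate and reason columns are my own hypothetical choices. Reading a task back together with its type-specific data now takes two extra joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared columns stay in tasks; type-specific columns move to their own tables.
-- All column names here are hypothetical.
CREATE TABLE tasks (
    id           INTEGER PRIMARY KEY,
    milestone_id INTEGER NOT NULL,
    name         TEXT NOT NULL,
    hours        REAL NOT NULL
);
CREATE TABLE billable_tasks (
    task_id     INTEGER PRIMARY KEY REFERENCES tasks(id),
    hourly_rate REAL NOT NULL
);
CREATE TABLE non_billable_tasks (
    task_id INTEGER PRIMARY KEY REFERENCES tasks(id),
    reason  TEXT
);
INSERT INTO tasks VALUES (1, 10, 'Wireframes', 8.0), (2, 10, 'Internal review', 2.0);
INSERT INTO billable_tasks VALUES (1, 95.0);
INSERT INTO non_billable_tasks VALUES (2, 'internal');
""")

# Reading a task with its type-specific data needs LEFT JOINs to both sub-tables.
rows = conn.execute("""
    SELECT t.name, b.hourly_rate, n.reason
    FROM tasks t
    LEFT JOIN billable_tasks b     ON b.task_id = t.id
    LEFT JOIN non_billable_tasks n ON n.task_id = t.id
    ORDER BY t.id
""").fetchall()
print(rows)  # [('Wireframes', 95.0, None), ('Internal review', None, 'internal')]
```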

In MongoDB it’s different: there is no strict schema. You do not have to define in the database system how you want to save the data; you just save it. So a MongoDB collection (table) can hold documents (records) that all have different structures.

It’s not up to MongoDB, as a database system, to control the structure; it’s up to the driving factor, the application, to decide what needs to be stored. MongoDB just provides the facility to store and retrieve the data. Whatever the structure, after the changes described above, the data “structure” in MongoDB might still look like:

+--------------------+
|                    |
|      Project       |
|                    |
| +----------------+ |
| |                | |
| |   Milestones   | |
| |                | |
| | +------------+ | |
| | |            | | |
| | |   Tasks    | | |
| | |            | | |
| | +------------+ | |
| |                | |
| +----------------+ |
|                    |
+--------------------+

That’s because MongoDB is not data-centric, but application-centric.
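In document terms, billable and non-billable tasks can sit side by side in the same milestone’s tasks array, each carrying only the fields it needs. This is again a plain Python dict standing in for a MongoDB document, and the field names (billable, hourly_rate, reason) and the task_cost helper are hypothetical; the application code, not the database, decides how to interpret each shape.

```python
# Tasks of different shapes in one array; field names are hypothetical.
project = {
    "name": "Website",
    "milestones": [
        {
            "name": "Design",
            "tasks": [
                # Billable task: carries a rate.
                {"name": "Wireframes", "hours": 8.0, "billable": True, "hourly_rate": 95.0},
                # Non-billable task: no rate field at all, just a reason.
                {"name": "Internal review", "hours": 2.0, "billable": False, "reason": "internal"},
                # Task saved before billing existed: neither field is present.
                {"name": "Kick-off", "hours": 1.0},
            ],
        },
    ],
}

def task_cost(task):
    """Hypothetical application-side rule: only billable tasks cost anything."""
    if task.get("billable"):
        return task["hours"] * task["hourly_rate"]
    return 0.0

total = sum(task_cost(t) for m in project["milestones"] for t in m["tasks"])
print(total)  # 760.0
```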

For me, personally, this saves a lot of time and accommodates the way systems are actually being designed, as opposed to an academic or engineering method.

Does this mean that MongoDB systems will have inferior designs? Not at all. You are simply freed from the structured chaos of strictly controlled environments. The focus is no longer on having the right data model to fit the proposed processes; rather, it is on whether the system does what the user wants it to do, regardless of how the data is stored.

The benefits here are obvious to me; others might not agree, based on their experience and perception. That’s fine, and that’s their choice.

I think that’s all for now. I will continue in a follow-up story about my experiences working with and exploring MongoDB.
