Database — Database Options (Part 10)
What are the options, and why we need to choose one over another?
We’re going to start with the simplest kind; desktop database applications, meaning an application you would install yourself on your own desktop or laptop, and you’d use the same application to both design the database and also to use it and enter data into it.
There have been many products in this category over the years. The two most relevant now are Microsoft Access and FileMaker. There are other options, like the open source alternatives like Apache Base and Bento.
Now these kinds of applications are not targeted primarily at professional developers, although development skills are often very useful. They are targeted at experienced users and business users. These are for internal use with small groups and small to medium size businesses with small to medium amounts of data.
Reasons to use?
- simple to install, and easy to use.
- provides templates to get started, which can be customized.
- provides tools to design your database and the user interface to interact with the database.
- has reporting tools that generate reports based on the data.
Reasons NOT to use?
- when many users are accessing it.
- having a large amount of data.
- building a website database.
The software used to maintain relational databases (RDBMS) are the classic database management systems we have been focusing on during these tutorials. The most obvious in this category are Oracle, IBM’s DB2, Microsoft’s SQL Server, MySQL, SQLite, along with several others.
Now, of course, these options aren’t identical. There are differences between them. There are differences in features, implementation and hardware requirements and expense.
The RDBMS are often split up into a family of different applications. So, you first install the database engine, and use a command line or install a separate application to interact with it, like MySQL Workbench for MySQL. But, none of them provide any application targeted at the end user, and you need to build that yourself.
The larger and complex your data will get, the more time and effort you need to maintain. In large companies, database administrator’s full time job is to maintain the database management system, and the database inside it. And being a good database administrator requires not just a good general knowledge of database design, but the specifics of the DBMS you’re using.
While RDBMS are the most widely used with the longest history, they aren’t necessarily always the best option. Let’s take a look at a couple of other database types.
Object-based and XML-based Databases
With the rise of XML as a format for exchanging data, there are also a handful of DBMS oriented specifically around using the XML as their internal structure, rather than the tables of columns and rows.
If XML is naturally your only kind of data, they are worth looking at. If, however, what you have is some XML, there’s nothing that stops you from just storing XML in a regular relational database table.
We can store XML data directly as a column value, where the column type is a text or XML, then either use SQL to work with it or XQuery (if stored as XML type).
One of the down sides with the RDBMS is the objects we would generate in in the object-oriented languages don’t necessarily map exactly to rows in the database table.
There have been an Object-Oriented DBMS (OODBMS) over the years trying to fix this issue, things like Objectivity/DB, VelocityDB, and Versant.
But, like XML-based databases, they never reached the popularity level of relational database systems.
Here’s the reason, relational databases are still very popular with object-oriented languages. Yes, the problem does exist, that software objects don’t always match one on one with rows in a table. But, what’s very popular is to use an ORM Framework; An Object-Relational Mapping Software (ORM).
This is a software framework that helps us map from objects in an object-oriented language to regular relational database tables and back again. These frameworks include Hibernate in Java, the Core Data Framework for Objective-C, the ActiveRecord Library in Ruby, and so on.
The most impactful developments in database technology came with the rise of what are called the NoSQL databases, and we’ll talk about those next.
NoSQL as a category can be a little confusing. These days that term is taken to mean not only SQL rather than No SQL at all.
Unlike in relational databases, where the skills are broadly transferable between, say Oracle and SQL Server, or DB2 and MySQL. In NoSQL, just because you know MongoDB, doesn’t mean you have any head start on learning Neo4J.
They’re very different from each other. They aren’t traditional relational databases, and they’re getting away from those rules. The most differences:
- They don’t use SQL as a query language.
- Usually they don’t store data in tables, a few of them do.
- Usually they don’t really care about relationships.
- Usually they don’t provide ACID transactions, though a few can.
- Usually they don’t require a formal data schema.
- Several have been developed and oriented around web and large-scale applications.
- Most of them are open source projects.
But, beyond that, there are a lot of categories of NoSQL databases:
They store a self-contained document (a piece of data) that describes its own schema, rather than rows and columns.
The format you’re actually storing might be XML, or JSON, which is just has a loose structure where every new piece of data that you store can have its own schema.
Each document, each piece of data, is given a unique id, but beyond that, you have flexibility. You don’t need to provide a formal schema. You don’t need to define relationships.
Examples in this category would be CouchDB and MongoDB.
This is another category which is all about having no predefined schema for your data, simply storing and retrieving everything based on a key-value pair.
The records are stored and retrieved using a key that uniquely identifies the record, and it’s used to quickly find the data within the database.
The value could be assigned to whatever data type you want, it could be integer, string, large, small, binary data, or whatever.
There’s a lot of crossover here with document databases like MongoDB and CouchDB, as they could fall both under the document databases and key-value databases categories. But, other well-known NoSQL databases would be MemcachedDB, Riak, and Project Voldemort.
This is when your data is in a graph structure; small connected nodes, with relation between them, where there’s no one master point, and everything can be connected to everything else.
While many of the NoSQL databases tend to de-emphasize relationships, this goes the other way. It’s all about having small nodes of data that have connections to other small nodes of data.
Neo4J is probably the best here, though there are others like AllegroGraph.
Relational Vs NoSQL Databases
The reasons to choose NoSQL databases over the relational ones:
— Do you need a flexible schema?
If you can’t define a relational database, because when you start to draw out your tables and columns, you just have no idea what they’ll be, because the data you need to store will change from moment to moment.
— Do you have a very large amount of data?
If you have hundreds of millions or billions of rows, then HBase database is a good candidate. If you only have a few thousand or a few million rows, a traditional relational database might be a better choice.
— Do you value scaling over consistency?
The traditional relational databases are oriented towards never losing a single thing, because for things like banking systems or airline bookings or patient information, you just can’t lose a thing.
If, however, you’re more concerned about being able to scale very fast, maybe your database needs to do vast amounts of statistics but no particular one entry is all that important. And some of the NoSQL databases are really more oriented to that kind of approach than the never lose a single row.
If you have well-defined data, don’t underestimate the ability of traditional relational databases, even if it might not be the newest technology around. The newer technologies are just here to fix different kinds of problems, but, your problem can be already fixed with a traditional relational database.