Data intelligence is not an afterthought
Looking at my LinkedIn news stream, it is perfectly clear that Big Data Analytics, the art of getting insights from large amounts of diverse and unstructured data, is booming.
Of course, if you have these gigantic piles of data, and you have no idea where to start, setting up a process to finally make some sense out of it all seems like a valid choice. In fact, I have deep respect for those wizards who apply machine learning algorithms and are able to distill value from those giant data lakes.
The downfall of data-centric thinking
The current trend to apply artificial intelligence (AI), machine learning (ML) or other so-called intelligent algorithms to distill insights from data, while being extremely helpful, is sometimes taken too far.
As a CTO who has helped many startups and scaleups over the last few years, I have heard far more people talk about Big Data and analytics than about setting up well-structured data in the first place.
And that’s an issue.
Secondly, there is an ongoing trend to abstract data out of developers’ sight. Hence the popularity of Microsoft Entity Framework (EF) and other ORM frameworks, which promote a “Code First” approach and suggest that structuring data is unimportant and that you had better focus on the code.
Entity Framework eliminates the need for most of the data-access code that developers usually need to write.
(From Introduction to Entity Framework)
Thirdly, I have come across many developers who switched to NoSQL databases, where you don’t even have to think about structuring your data. After all, cloud storage is unlimited, so who cares about structuring, normalizing and optimizing data storage?
Making your data important again
As someone who has been in software development for more than 25 years, I don’t think it is old-fashioned to start a good solution architecture by thinking about the data. That means making sure the selected data structures are optimized for intelligent data retrieval, so that you do not need additional data-collecting solutions such as full-text search engines or data mining.
“[…] The immaturity of NoSQL languages meant more complexity was needed at the application level. The lack of JOINs also led to denormalization, which led to data bloat and rigidity.”
(From “Why SQL is beating NoSQL, and what this means for the future of data”)
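To illustrate the point about JOINs and denormalization, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables and their columns are hypothetical, made up for this example. The customer's email is stored in exactly one place and a JOIN combines it with the orders at read time, so nothing has to be copied into every order record.

```python
import sqlite3

# Hypothetical schema for illustration only: customers and their orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL
    );
""")

conn.execute(
    "INSERT INTO customers (id, name, email) VALUES (1, 'Ada', 'ada@example.com')"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
    [(1, 1500), (1, 2500)],
)

# The email lives in one row, so changing it is a single-row update ...
conn.execute("UPDATE customers SET email = 'ada@new.example.com' WHERE id = 1")

# ... and the JOIN pairs the current email with every order at read time.
rows = conn.execute("""
    SELECT o.id, c.email, o.total_cents
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

print(rows)
```

In a denormalized store the email would typically be duplicated into each order, and the same change would mean rewriting every one of those copies.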
If you are a startup and your data is part of your IP, you should be spending time on creating a solid data architecture. Making sense of all the data you build up or collect is far easier when you’ve done some proper thinking beforehand.
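As a small, hedged example of what thinking beforehand can look like in practice: suppose you already know you will ask "how many events did this user generate?". The sketch below, again using Python's sqlite3 module with a hypothetical events table, compares the query plan for that question before and after adding an index designed for it. The table, column and index names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for illustration: one row per user event.
conn.execute("""
    CREATE TABLE events (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        kind       TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO events (user_id, kind, created_at) VALUES (?, ?, ?)",
    [(i % 10, "login", f"2024-01-{i % 28 + 1:02d}") for i in range(1000)],
)

# A question we know in advance the data must answer.
QUESTION = "SELECT COUNT(*) FROM events WHERE user_id = ?"


def plan(sql, params):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


plan_before = plan(QUESTION, (3,))

# Design the storage for the question: index the column we filter on.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = plan(QUESTION, (3,))

count = conn.execute(QUESTION, (3,)).fetchone()[0]

# The exact plan wording differs between SQLite versions, but the first
# plan is a full table scan and the second uses idx_events_user.
print(plan_before)
print(plan_after)
print(count)
```

The answer is the same in both cases. What changes is whether the database has to read every row to find it, which starts to matter once the table grows.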
Putting your data first, really taking the time to architect your data domain, means looking at your business proposition and crafting data scenarios, just like you would do with application requirements. Because in the end, your data contains the gold that you’re after. Don’t you agree that having the gold up for grabs is better than having to dig for it?