Host in, or Host out: that is the question!
I still remember, when I started on this IT journey back in 1998, how important it was to have your own email domain, your own web server and, if possible, your own dedicated space in a data centre.
Efficiency is doing things right; Effectiveness is doing the right things.
I was working for a big national company in Argentina when I saw a data centre for the very first time. Full of lights blinking green and red, lots of server cabinets, switches, routers. It was an awesome experience. I dreamed every day of working in a place like that.
Then I became a partner in a start-up, back when the term “start-up” didn’t exist; let’s say a “company of 5 people”. We had a cabinet room with 6 blades, hosting more than 400 websites and email domains. It was incredible: we had our own DNS servers and our own NAS.
After a while, I joined a multinational company where terms like “API” and “SOA” were part of everyday conversation between teams. I was part of the Operations team, supporting more than 150 developers.
We had to come up with a solution for our monitoring service. We were hosting more than 500 VM instances internally, plus a fairly large number of EC2 instances.
A multi-region Nagios solution was put in place. It helped with the monitoring, but we were using 1 node per region, plus another node for the dashboard. It was reliable, but we had to over-engineer the solution to make it robust.
There were almost no “managed solutions” for monitoring apart from CloudWatch, and it had no dashboards back then, so it was hard to visualise on a big monitor, and it covered EC2 resources only.
We used Centreon at the beginning, and then we ended up with plain Nagios and Thruk.
We distributed the monitoring and also added automation and self-healing for hundreds of instances going up and down every hour based on load.
After a few years, I started to realise that the most important thing for a software company is to make software. Typically, an operations team is never more than 10% of the entire workforce. So when it comes to solutions, you have to be clever and understand that you won’t be able to learn everything by the time the business needs you to. If you need to know how to run a 24/7 monitoring system, a 24/7 backup system and a 24/7 logging system, plus handle tickets and keep yourself trained and up to date with technology, you realise the day only has 24 hours and your systems start to crash.
Then we hit a problem. As an Operations team, you must provide a system that is reliable and secure, but also cost-effective. That includes automation, monitoring, self-healing, logging capabilities, etc.
How can we provide that if a normal operations team is between 5 and 10 people, and we have to deal with day-to-day issues, deploy environments of more than 300 instances, set up systems, etc.?
Let’s start with a few reasons why every IT director should consider outsourcing commodity services such as logging, monitoring, user management, etc.
- Reduced service and support costs within a managed and predictable budget.
- Round-the-clock access to a help desk primed to resolve problems remotely and rapidly.
- Better quality of service, fewer IT failures and less downtime — thanks to well-defined SLAs.
It really depends on how mature your organisation is in terms of outsourcing. Is it ready to hand things over to a third-party company whose expertise is exactly that: taking in lots of logs, processing them and providing a nice interface to work with? That’s what we do now with Logz.io. We send an average of 120GB of logs daily. Do you know how much infrastructure we would need to process this?
It also produces pretty dashboards like this one:
If the only thing we need to do is programmatically install an agent that sends thousands of metrics to a managed service like HostedGraphite, and we don’t need to worry about backing up, patching or running monitoring systems 24/7, and the cost is a few cents per node, you tell me if it isn’t worth it. We send them around 100,000 metrics every minute. We have around 300 dashboards, and we managed to offload the creation of these to our support floor.
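To put those volumes in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the two figures quoted above (120GB of logs a day and 100,000 metrics a minute); the function names are mine, and it assumes decimal units, perfectly even traffic and one datapoint per metric per minute, none of which will be exactly true in real life.

```python
# Back-of-the-envelope rates from the figures quoted in this post.
# Assumptions (not measurements): 1 GB = 1000 MB, traffic is spread
# evenly over the day, and each metric reports one datapoint per minute.

SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


def average_mb_per_second(gb_per_day: float) -> float:
    """Average sustained ingest rate in MB/s for a daily log volume in GB."""
    return gb_per_day * 1000 / SECONDS_PER_DAY


def datapoints_per_day(metrics_per_minute: int) -> int:
    """Total metric datapoints stored per day at a steady per-minute rate."""
    return metrics_per_minute * MINUTES_PER_DAY


if __name__ == "__main__":
    print(f"{average_mb_per_second(120):.2f} MB/s of logs on average")
    print(f"{datapoints_per_day(100_000):,} metric datapoints per day")
```

That works out to roughly 1.39 MB of logs every second and 144,000,000 datapoints a day, all of which would have to be ingested, indexed, stored and backed up by someone. Real traffic is bursty, so peak rates would be higher than these averages.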
There are many of these companies out there. I am not telling you to go and pick these ones specifically; I don’t work for them or get any commission. They worked for us but may not work for you. Do a bit of digging and find the right one for you.
As a concluding note, remember that you will have to compromise on something at some point. If outsourcing your monitoring solution reduces your pain points and you treat monitoring or logging information as non-confidential data, outsourcing is the way to go. If you need to keep things in-house, you will need more people to take care of it, and that’s a business decision rather than an IT one.