From Data Center to Serverless Computing: Deciding Which Is Best
At every step from unmanaged to managed infrastructure, increased cost is traded for offloading maintenance to the chosen provider. Below we explore the incremental value added and the trade-offs as control is traded for convenience. Skip to the bottom for example case studies. The goal is to build some mental models of where and when to use different hosting types.
Data Center or Home/Office Server
Whether it's a cold room filled with Xeons or a server under the desk at home, fully self-managed infrastructure comes with more overhead than can be listed. At some point it becomes more and more vital that downtime be minimized, and it doesn't take long to realize how easily downtime can occur: power failure, ISP outage, hardware failure, slow upload speeds.
If you're not in the technology business, it's best to offload that work to a company that is, so you can focus on your own work rather than worrying whether the internet connection will go down.
It can make a lot of sense to offload dealing with a data center, hardware, power, and bandwidth to a provider that is fully dedicated to such tasks at scale. For example, ovh.com and hetzner.com offer fast bare metal machines for a relatively low monthly price. There may also be a co-location business near you that will rent you space in a server rack for a monthly fee; you bring your own server, and they take care of bandwidth and power.
This removes most of the troubles above, but now there are new challenges. I pay in advance, so I have to guess my maximum utilization for the month to avoid downtime. If I need more resources, like RAM, disk, or network capacity, I will need to order another server and migrate the data, possibly incurring some downtime, or ask a tech at the third party to take down the server and upgrade it. I now have to secure the machine and give proper access controls to any team members who need access. And it's one server in one data center: it can still go down due to hardware failure, application logic failure, or issues in the hosting company's infrastructure.
Straddling the border between unmanaged and managed are private cloud platforms like OpenStack, vSphere, Mesos, and even OVH Private Cloud. These abstract the real hardware, giving a cloud-like environment on top of in-house or off-site dedicated servers. The complexity of maintaining a cloud environment internally may offset the benefits of the flexibility it provides, so I will leave it outside the scope of this article.
Cloud compute has brought me some relief. I can have servers across many data centers, they can autoscale when load is high, and I only pay for what I use. I don't have to wait for a tech to upgrade my hardware. I can create networks, disks, servers, and autoscaling groups entirely from code, and spin it all down within seconds. Rolling updates are now possible without downtime. If hardware or applications fail, unhealthy instances can be deleted and replaced with new ones on the fly.
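As a toy illustration of what "autoscale when load is high" means, here is a sketch of target-tracking scaling logic. The real decision lives inside the provider's autoscaler; the function name and thresholds below are hypothetical.

```python
# Toy sketch of target-tracking autoscaling: size the fleet so that
# average CPU moves toward a target. Names and numbers are made up.
import math

def desired_instances(current_count: int,
                      avg_cpu_percent: float,
                      target_cpu_percent: float = 50.0,
                      min_count: int = 2,
                      max_count: int = 20) -> int:
    """Return how many instances we want, clamped to [min, max]."""
    if current_count == 0:
        return min_count
    # Instances needed = current load / load each instance should carry.
    needed = math.ceil(current_count * avg_cpu_percent / target_cpu_percent)
    return max(min_count, min(max_count, needed))

print(desired_instances(4, 90.0))   # overloaded fleet scales out
print(desired_instances(10, 10.0))  # idle fleet scales back in
```

In a real cloud this policy is declared in configuration (an autoscaling group with a target metric) rather than coded by hand, but the arithmetic is the same.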
If I need support, I still need highly skilled AWS professionals, who can be much more expensive than regular IT staff. If I over-provision, my account gets hacked, I forget I left something running, I underestimate bandwidth costs, or my team does not know AWS well, my costs can go sky high. If someone hacks my cloud account they can delete my whole business; they couldn't do that with a data center. And if a hardware exploit is found, my data could be stolen by an external party running on the same physical server.
It can be easier to provision a group of larger machines and run many isolated services on a cluster. Containerization makes software maintainable by enforcing that the development environment matches production; it avoids undocumented infrastructure changes, makes rolling updates possible, and uses configuration documents to create infrastructure with constraints on what should be connected to what. I can declaratively define what is required for a service to be called "healthy", and how much redundancy each service needs. With containers I can have 1,000 small services running on two big machines, rather than 1,000 small VMs to maintain.
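As a concrete sketch of that declarative style, a Kubernetes Deployment fragment might express "healthy" and "redundancy" like this (the image, port, and path are hypothetical placeholders):

```yaml
# Hypothetical Kubernetes Deployment: three replicas of a small service,
# with a readiness probe that defines what "healthy" means.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service
spec:
  replicas: 3                  # redundancy: keep three copies running
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      containers:
      - name: web
        image: example/web:1.0   # placeholder image
        ports:
        - containerPort: 8080
        readinessProbe:          # "healthy" = responds on /healthz
          httpGet:
            path: /healthz
            port: 8080
```

The scheduler then keeps reality matching the document: if a replica fails its probe or its host dies, a replacement is started automatically.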
While it brings cost savings, container management often requires specialized knowledge of Docker, Kubernetes, rkt, or ECS, plus continuous integration and continuous delivery. Understanding distributed systems, automated service deployment, and the interactions between all those services can be a challenge.
Serverless (AWS Lambda, Google Cloud Functions)
Here the cloud environment takes care of server patching, upgrades, and security. I upload my code, and it runs when requested. I have offloaded everything except writing the code. Serverless functions are fully scalable and I never deal with servers. Lambda@Edge allows use of CloudFront to serve requests for better latency.
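"Everything except writing the code" really is the whole surface area: a Python Lambda function is just a handler. The event shape below assumes an API Gateway proxy-style trigger; adjust for other trigger types.

```python
# Minimal AWS Lambda handler sketch. The event format assumes an
# API Gateway style trigger; other triggers deliver other shapes.
import json

def handler(event, context):
    # Pull a query parameter out of the (possibly empty) event.
    params = (event or {}).get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello {name}"}),
    }

# Locally we can invoke it the way Lambda would (context unused here):
print(handler({"queryStringParameters": {"name": "serverless"}}, None))
```

There is no web server, process manager, or OS in this file; all of that belongs to the provider.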
At the same time I lost flexibility. I can't just run Redis and a webapp in the same code. My code is limited to 5 minutes of runtime. Latency is not guaranteed, even if I keep the container hot. If I have hundreds of services that need to interact, it may be harder to manage than modular code or containers as microservices. It may also be difficult to move to another cloud provider, or to move a service back to run locally, because of full dependency on a single vendor's managed services.
The meaning of "stateless" in Lambda is different from stateless in containers or spot instances. In a container I can still keep an in-memory log buffer that flushes once a certain size is reached; when the container is terminated, it receives a signal and can flush the remaining logs. In Lambda I cannot batch outbound messages, however small; I have to add more complexity, such as Kinesis or SQS.
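The container-style buffer described above might look like this sketch; the class and names are hypothetical. In Lambda there is no reliable equivalent of the shutdown hook, which is why batching has to move into an external queue.

```python
# Sketch of an in-memory log buffer that flushes when full, and can be
# flushed again on a shutdown signal (e.g. SIGTERM in a container).
# In Lambda, the process may be frozen or reaped without such a hook.
import signal

class LogBuffer:
    def __init__(self, max_size=3, sink=print):
        self.max_size = max_size
        self.sink = sink          # where flushed batches are shipped
        self.buffer = []

    def log(self, message):
        self.buffer.append(message)
        if len(self.buffer) >= self.max_size:
            self.flush()

    def flush(self, *_signal_args):
        # Accepts the (signum, frame) args a signal handler receives.
        if self.buffer:
            self.sink(self.buffer)  # ship one batch
            self.buffer = []

logs = LogBuffer(max_size=3, sink=lambda batch: print("flushed:", batch))
# In a container we can hook the shutdown signal:
signal.signal(signal.SIGTERM, logs.flush)
logs.log("a"); logs.log("b"); logs.log("c")  # flushes at size 3
logs.log("d")                                 # still buffered
logs.flush()                                  # what SIGTERM would trigger
```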
Mock Case Study: Home server + Big Computation
I need to run Apache Spark big-data jobs on data that absolutely must stay intranet-only.
I buy a $1,000 used server off eBay with 32 cores and 256 GB of RAM, and I run my big data jobs on it. A similar server on AWS would cost $2,000 each month, in addition to bandwidth, data transfer, and data storage fees. Often people will keep a cluster of 5 to 10 m4.xlarge servers in EMR.
I just saved $23,000 or more by buying an old server, and I cross my fingers that it lasts at least a year. All state can be saved to a NAS in case of failure.
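The back-of-envelope math behind that number, using the figures above:

```python
# Rough cost comparison for the case study: a $1,000 used server versus
# a comparable cloud machine at ~$2,000/month (excluding bandwidth,
# transfer, and storage fees, which only widen the gap).
used_server = 1_000          # one-time cost, eBay
cloud_monthly = 2_000        # recurring cost
months = 12

cloud_total = cloud_monthly * months
savings = cloud_total - used_server
print(f"Cloud for a year: ${cloud_total:,}")
print(f"Savings if the server lasts a year: ${savings:,}")
```

Every month the server survives beyond the first year adds another $2,000 to the gap.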
Mock Case Study: Home server + Webserver
Occasional downtime is OK; I host a hobby site. I open port 443 on my router and serve a website. If the IP address changes, I use a dynamic DNS service like freedns.afraid.org. I never turn off the desktop or Raspberry Pi.
Mock Case Study: Shared hosting + Webserver
I have a site that gets 5,000 hits a month or less. I just want something simple that can host static HTML, or simple Python or PHP scripting. I don't want to configure DNS or keep my computer on all the time. A tiny VM or a shared host is fine; for example, 1and1.com has $5 servers.
Mock Case Study: Dedicated Server
I know Linux very well, or I know enough to safely deploy containers. Some downtime is OK during hardware upgrades, migrations, or host failure. I have off-site backups. I accept that I may not be able to quickly handle spikes in growth. I don't want to pay 10 times more for cloud VMs, and I love having the performance of a bare metal server. I love having NVMe SSDs for a tenth of what they'd cost on cloud providers.
Mock Case Study: Data Center
I have the IT resources to invest in vSphere, OpenStack, or Kubernetes, and the discipline to keep versioned infrastructure-as-code across all teams. I have data redundancy, bandwidth redundancy, and networking expertise. I'm unwilling to risk putting customer data on a third-party provider. It's better for me to run my data center fully in-house.
Mock Case Study: Cloud
I need autoscaling based on demand; my service load fluctuates wildly. Sometimes I need 1,000 servers to process a job and have it run for only an hour. I know how to use spot/preemptible instances to pay reduced prices when needed. I have a completely offshore team, so I can't have a physical data center. My business cannot afford downtime, so hosting in different regions and availability zones is essential. I have the expertise to avoid runaway costs, and the staff to maintain, patch, and upgrade servers.
Mock Case Study: Cloud + Autoscaler + CDN
My site needs to load as fast as possible. Previously I bought a dedicated server from OVH, but it was running in France while all my customers are in the US. It made more sense to run servers in US East, US West, and US Central, autoscale them, put my domains on Google DNS or Route 53, and put the site behind a CDN. Mobile traffic has gone up 100% because loading times and latency are so much lower.
Mock Case Study: Serverless
I need event-based triggers across all events on AWS, or webhooks from third parties. I also have a lot of services that run very infrequently or with sporadic jumps in demand. Latency is not a big issue and the code is relatively simple. If I do end up getting a lot of traffic, I'm willing to pay more than EC2 if I can avoid having to manage servers. I'm willing to use tools like Zappa and Apex to control complexity and deployment of the codebase as it gets larger. I know to avoid having too many services interacting in undocumented ways.
Dedicated Server vs Cloud
With dedicated servers you often pay a fixed rate for bandwidth, like a 1 Gbit line. With cloud computing you pay ingress/egress rates, which is essentially metered download and upload. The cost can vary depending on which regions data moves between, or across availability zones, and can increase in unpredictable ways if you use a load balancer or CDN.
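A rough sketch of why metered egress is harder to predict than a flat line; the per-GB rate and flat price below are illustrative examples, not any provider's actual price sheet.

```python
# Illustrative comparison: flat-rate dedicated bandwidth vs. metered
# cloud egress. All prices here are made-up examples.
flat_line_monthly = 100.0          # e.g. an unmetered 1 Gbit line

def cloud_egress_cost(gb_out, rate_per_gb=0.09):
    """Metered egress: cost scales linearly with outbound traffic."""
    return gb_out * rate_per_gb

for gb in (100, 1_000, 10_000):
    print(f"{gb:>6} GB out -> metered ${cloud_egress_cost(gb):,.2f}"
          f" vs flat ${flat_line_monthly:,.2f}")
```

The flat line is the same bill in a quiet month and a viral month; the metered bill tracks traffic, which is exactly what makes it hard to budget, especially once cross-region, cross-AZ, load balancer, and CDN rates each apply their own multiplier.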
Dedicated servers may take more time to allocate, to add RAM or disk space, or to recover from hardware failure compared to cloud. They sit in a single location and cannot autoscale. Cloud, in turn, is wildly more expensive, especially for SSDs, and can be slower due to virtualization.
Managed hosting comes with some risk. If a cloud provider is sold or poorly maintained, it's really hard to move off. It's possible for cloud providers to go down across whole availability zones; it's happened many times. It's also possible that a dedicated server runs for 6 years without downtime, or that it fails tomorrow afternoon. In the end, the increased cost of managed hosting is just a hedge: against the risk of downtime, for scaling up and down with demand, and for point-and-click infrastructure.
Google Cloud vs AWS vs IBM Cloud vs Alibaba Cloud
I found AWS to have an extensive catalog of services, while Google Cloud has a small number of killer apps, like BigQuery and Spanner. GCP makes it easy to quickly create new projects, view a dashboard of what's running, assign a single owner to each project, and track its budget; this makes it easy to ensure cloud environments are created per project rather than per department. Amazon Web Services has more open source third-party software to make things easier, and more third-party integration support. It has AWS Service Catalog to ensure governance and security, but that requires CloudFormation scripts and more setup. Google Cloud automatically gives Cloud Functions an endpoint, rather than routing through the comparatively expensive API Gateway, but AWS supports many more languages for serverless, and now has Serverless Aurora.
IBM Cloud has CPLEX and IBM Watson natural language services, with no cost per cloud function request. Alibaba Cloud is very inexpensive and sometimes offers discounts.
It's common to say, "I want the best-in-class hosting. Learning more than one environment would fragment our ecosystem. We pick one, learn it well, and stick with it." However, many companies are still struggling to move off the IBM mainframe and into the cloud because of that same train of thought. In my opinion, it's better to diversify between on-prem, cloud, and dedicated where it makes sense to do so. Technology is changing rapidly, and just as it's a bad idea to put all your money into a violently fluctuating investment or currency, it's a bad idea to sink all your money into the popular provider of the day.
Hackers can delete your cloud: https://threatpost.com/hacker-puts-hosting-service-code-spaces-out-of-business/106761/
New classes of CPU hardware exploits appeared recently and were patched. Such exploits, had they been discovered first by malicious third parties, could have been used to steal data from shared computing environments. https://meltdownattack.com/