From data center to Polycloud
We had to move out of the data center. Where do you move a technology company these days?
As we improved the way we develop FINN.no, we overloaded the underlying server network. We needed an alternative to on-premise hosting; otherwise we risked not being able to grow our hundred-million-USD business any further.
A network upgrade would require work that would likely lead to service disruptions. And seriously, a better network was not what we wanted; our real mission is to increase the company’s opportunities to innovate. The “good old data center” had become a liability. We decided to move to the cloud.
Infrastructure as a competitive edge
FINN.no is the dominant e-commerce site in the Norwegian classified advertising market. Facebook Marketplace competes with us using its global platform, and startups compete using cloud technologies. We did consider moving to a different on-premise infrastructure supplier, but local hosting companies focus on enterprise customers, and enterprise IT cannot provide the speed and flexibility we need to improve our products and capture new markets. The cloud offers a combination of modern infrastructure concepts and easier integration with business partners and services. Some clouds offer technologies that will propel us forward; others do not. How do you choose?
We are known for adopting technology to gain competitive advantages. Early, heavy adoption of Kafka, Kubernetes, Prometheus and Terraform has accelerated how we work and helped us deliver features faster, with less friction. Our focus on technology also helps attract outstanding talent: people who expect to work with great technology. We therefore decided to focus on technology when selecting a cloud partner.
A solution for a technology company
FINN implemented DevOps years ago. We have a true DevOps culture, making 200 production changes per day. We developed our own deployment abstraction tool for Kubernetes, FIAAS (FINN Infrastructure as a Service), to help developers manage their deployments quickly and safely. FIAAS merges simple application configuration with infrastructure defaults. Once tests are green and a service is deployed, statistics, errors and alarms are built in, which makes ongoing work efficient and comfortable. This way, both application developers and infrastructure developers can concentrate on enhancing their products.
All this is well and good! But the modern and efficient continuous deployment platform balances on top of an outdated server solution 😬.
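The idea of merging a small application configuration with platform-wide defaults can be sketched as a recursive dictionary merge. This is a minimal illustration under stated assumptions, not FIAAS's actual implementation: the keys in `INFRA_DEFAULTS` and the function `merge_config` are hypothetical and do not reflect FIAAS's real configuration schema.

```python
from copy import deepcopy
from typing import Any

# Hypothetical platform-wide defaults; the keys are illustrative only,
# not the real FIAAS configuration schema.
INFRA_DEFAULTS: dict[str, Any] = {
    "replicas": {"minimum": 2, "maximum": 5},
    "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
    "metrics": {"prometheus": {"enabled": True, "path": "/metrics"}},
    "alerts": {"enabled": True},
}


def merge_config(defaults: dict[str, Any], app: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge app-specific settings over platform defaults.

    Nested dicts are merged key by key; any other value set by the
    application replaces the default. Neither input is modified.
    """
    merged = deepcopy(defaults)
    for key, value in app.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# A developer only states what differs from the defaults.
app_config = {"replicas": {"maximum": 10}, "ports": [{"target_port": 8080}]}
effective = merge_config(INFRA_DEFAULTS, app_config)
print(effective["replicas"])  # {'minimum': 2, 'maximum': 10}
```

The design choice is that developers write only the few lines that make their service special, while metrics, alerting and resource defaults come from the platform and stay consistent across services.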
Servers and storage are expensive to administer (and seriously, a virtual server is just a server). What we need are solid abstractions that support more advanced software ecosystems. Kubernetes describes itself as “a platform for automating deployment, scaling, and operations of application containers across clusters of hosts”. Going from early adoption of Kubernetes in 2016 to having 97% of our apps running in one large Kubernetes cluster took three years. There is no reason to turn back.
Kubernetes, the key element of all that we do these days, provides pretty good abstraction. Containers and Kubernetes rely on simple principles for how a process is run in the operating system. As a result, you get fine-grained control over important application behaviours. Introducing Kubernetes while breaking the monolith into microservices has proven a huge success. The number of microservices in production grows by the week (currently 739).
We used to manage hundreds of servers. Today we have just two dozen identical Kubernetes nodes. Managing a few identical servers that run only stateless Kubernetes microservices requires just a fraction of the manpower needed to maintain hundreds of pet VMs. Kubernetes herds itself to a large degree, freeing people up to evolve the infrastructure services even further.
We plan to migrate our valuable Kubernetes workload to a high-quality, managed, cloud-based Kubernetes service. We plan to do this with our normal deployment tooling: we simply “start deploying to an additional Kubernetes target cluster”.
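One way to picture “deploying to an additional target cluster” is a pipeline step that runs the same deployment against a list of clusters, so migration becomes a matter of adding an entry to that list. The sketch below assumes hypothetical names throughout (`Target`, `deploy_everywhere`, the cluster and context names, and `fake_deploy`, which stands in for the real deployment call); it is not FINN's pipeline code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Target:
    """A Kubernetes cluster the pipeline deploys to (hypothetical names)."""
    name: str
    context: str  # e.g. a kubeconfig context name


def deploy_everywhere(
    app: str,
    version: str,
    targets: list[Target],
    deploy: Callable[[Target, str, str], bool],
) -> dict[str, bool]:
    """Run the same deployment against every target cluster.

    `deploy` is expected to report failure by returning False, so every
    target is attempted and the caller gets a per-cluster success map.
    """
    return {target.name: deploy(target, app, version) for target in targets}


# Migration step: the managed cloud cluster is simply added next to the existing one.
targets = [Target("on-prem", "onprem-prod"), Target("gke", "gke-prod")]


def fake_deploy(target: Target, app: str, version: str) -> bool:
    # Stand-in for the real call that applies manifests to the cluster.
    print(f"deploying {app}:{version} to {target.context}")
    return True


results = deploy_everywhere("search-api", "1.4.2", targets, fake_deploy)
print(results)  # {'on-prem': True, 'gke': True}
```

Because each service keeps its existing configuration, the only change during migration is the target list; once traffic has moved, the on-premise entry can be removed.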
Securing a good Kubernetes runtime for FINN.no services is the number one priority. The long-term goal is the potential this opens up for our business. Professional cloud services will help us provide our customers with stable services as we continue to grow. What we really want to get out of this journey is access to technologies, abstractions and services, and in particular room for more experimentation. We are adopting a Polycloud strategy because we prefer to use the best managed services from the best cloud providers at any given time. The current plan includes using managed Kubernetes (GKE) from Google, Amazon Redshift as our data warehouse and Fastly as our CDN. Business needs and cloud service maturity will determine which additional services we adopt.
Build vs buy
We want access to partners, services and APIs for new, cool technologies like image recognition, machine learning, AR, etc. We want access to capacity, so we can burst resource consumption for short periods of time and duplicate test systems. We want access to new and potent technologies. In short, we are looking for possibilities to work differently and more efficiently. We want more freedom. As Albert Camus said, “Freedom is nothing else but a chance to be better.” We think the cloud will give us more freedom to become faster and more innovative. To succeed, we need to identify cloud services and partners that offer a sensible balance of cost, quality and velocity.
Freedom combined with accurate feedback provides a good foundation for growing a business. FINN is data-driven. We have millions of time series that track how infrastructure, applications and customers behave. Developers combine metrics from different parts of the business to figure out how well a service behaves. As we relocate to the cloud, we want to expand our metrics to include cost details for every application. Developers should be able to act on increased operating costs just as they act on metrics showing a rise in user-facing errors or in an application's memory usage.
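A simple starting point for per-application cost metrics is to split a cluster's hourly cost across applications in proportion to the CPU they request, then expose the result as a gauge that sits alongside existing metrics. This is a minimal sketch under assumptions: allocating by CPU requests alone is a simplification chosen for illustration (real allocation may also weigh memory, storage and network), and the metric name `app_cost_per_hour`, the app names and the prices are hypothetical. The output follows the Prometheus text exposition format.

```python
def cost_per_app(cluster_cost_per_hour: float, cpu_requests: dict[str, float]) -> dict[str, float]:
    """Split a cluster's hourly cost across apps in proportion to requested CPU cores."""
    total_cpu = sum(cpu_requests.values())
    if total_cpu <= 0:
        raise ValueError("no CPU requested; cannot allocate cost")
    return {app: cluster_cost_per_hour * cpu / total_cpu for app, cpu in cpu_requests.items()}


def to_prometheus_text(costs: dict[str, float], metric: str = "app_cost_per_hour") -> str:
    """Render costs in the Prometheus text exposition format, one gauge sample per app."""
    lines = [
        f"# HELP {metric} Estimated hourly infrastructure cost per application.",
        f"# TYPE {metric} gauge",
    ]
    for app, cost in sorted(costs.items()):
        lines.append(f'{metric}{{app="{app}"}} {cost}')
    return "\n".join(lines)


# Hypothetical numbers: a cluster costing 12.0 per hour and three apps' CPU requests (cores).
costs = cost_per_app(12.0, {"search": 4.0, "ads": 2.0, "login": 2.0})
text = to_prometheus_text(costs)
print(text)
# # HELP app_cost_per_hour Estimated hourly infrastructure cost per application.
# # TYPE app_cost_per_hour gauge
# app_cost_per_hour{app="ads"} 3.0
# app_cost_per_hour{app="login"} 3.0
# app_cost_per_hour{app="search"} 6.0
```

Exposing cost as an ordinary gauge means teams can graph and alert on it with the same tools they already use for error rates and memory usage.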
We expect to start migrating FINN.no from on-premise to the cloud at the beginning of 2020. There are hundreds of applications and terabytes of data to move; dozens of database clusters, message buses, and monitoring and metrics stacks to redeploy; and new environments to implement, all using cloud-native paradigms. We expect the migration to take one year. This estimate is based on our experience moving our dev environment and on educated guesses.
We will use the coming months to prepare: What is the best way to provision our workload in the cloud? How do we improve security? How should the organization be structured? How do we link the cloud account to our parent company? And last but not least: how do we collect cost information and make it available to individual developer teams?
We will write more about how we prepare for and carry out FINN's journey to the cloud.