Why our heads are in the cloud - 2/3
While our feet are on-premises
In today’s episode
This is the second article in a series explaining why I Love My Local Farmer has decided to embrace the cloud. In this post, Inès Abderrahmane, CTO of I Love My Local Farmer, explains the business rationale behind this decision and highlights the value the company expects from the cloud. She also explains how they handled the security discussion. In the first post, Inès described their on-premises footprint and shared her thoughts and lessons learned so far. In the third article, she will disclose how the engineering team handled the discussion with the CFO.
I Love My Local Farmer is a fictional company inspired by customer interactions with AWS Solutions Architects. Any stories told in this blog are not related to a specific customer. Similarities with any real companies, people, or situations are purely coincidental. Stories in this blog represent the views of the authors and are not endorsed by AWS.
Our heads are in the cloud
A quick recap
In my previous post, I described our on-premises footprint and why we opened a second data centre in 2017 for disaster recovery rather than betting on the Cloud. I reiterate here that I have no regrets about this. We made the best decision we could with the skills, the knowledge and the constraints we had in 2016.
The preliminary analysis of Cloud technologies that we ran for the Disaster Recovery project opened a door. As I explained, it was appealing but we had a few questions to answer before jumping into it:
- What would be the benefits of the Cloud for us as a company?
- How would we secure our customer data in the Cloud?
- How would we manage our IT budget in the Cloud?
In the rest of the post, I cover the first two questions. I will answer the third in the last post of the series.
Speed up! That’s what our board repeats
In the early 2010s, we were nearly alone in our market segment. The demand for traceability of fresh and healthy produce was still emerging. Short food supply chains were not yet a hot topic. But during the 2010s, awareness of the climate crisis rose very quickly and consumer habits started to shift. More and more people wanted to know exactly what they were eating, where it was produced, and how. We benefited from being quick to move, and this shift supported our growth.
However, in the second half of the 2010s, traditional retailers spotted this new consumer interest and strengthened their offerings. Newcomers entered the market with new ideas and new offers. Now the competition is active and we have to sharpen our saw. As we were already a fully online business, we considered ourselves well armed, but when the board asked us to accelerate, we realized that we couldn’t. We had a minimum roll-out timeline of six months per significant project, which made it hard for the business to react to unexpected market changes (and COVID-19 has been a major one).
In my previous post, I presented our engineering team and highlighted some of our challenges. These challenges were the root cause of our slowness. Let me summarize here:
- Due to the lack of unit tests and integration tests, our releases were low quality with a lot of back-and-forth between development and testing teams
- Due to a lack of automation, reproducing deployments from development environments in test and production environments was tedious
- We did a lot of manual end-to-end testing
- We were overwhelmed by the heavy lifting tasks of patching and upgrading our servers whilst keeping the lights on
- We struggled to anticipate hardware and software failures
You may have spotted a pattern here: too many manual tasks and a lack of automation. To improve our practices, we were investigating DevOps cultural philosophies, practices and tools. While we were already practicing Continuous Integration, Monitoring and Logging, we identified that we were lagging behind best practices in this area. We also identified that we should invest in Infrastructure as Code (IaC) and Continuous Delivery (CD) practices. Infrastructure as Code would help us to standardize our environments, to track configuration changes and to deploy environments in repeatable and automated ways. Continuous Delivery practices would help us to automatically build, test and prepare each code change for a release to production.
If the cloud could benefit us as a company, it would definitely be by making us more agile. Our board was especially interested in the numbers highlighted in the 2018 IDC report on Creating Business Value with AWS:
- 62% more efficient IT infrastructure staff
- Nearly 3x more new features delivered
- 25% more productive application development teams
- 94% less time lost due to unplanned downtime
So, I decided to appoint a small team of cloud enthusiasts, which we called the “A-team”, to investigate. I gave them a clear mandate, with the full support of the other teams.
- the board asked us to speed-up our release cadence
- we were stuck with manual tasks and heavy lifting tasks
Multi-cloud or not multi-cloud? That is the question!
First, let’s evaluate this trendy question. As I already mentioned in the first post of the series, procurement asked us to consider multiple cloud providers so that we could get the best prices. I know that relying on a single provider can be risky, and part of my role is to mitigate risks. However, cloud providers’ offerings are broad and complex, and I had finite resources to spend on this investigation.
I also had to consider the cost of training my people on multiple cloud platforms. It would introduce a lot of complexity, like trying to learn several foreign languages at the same time. I could choose to have a dedicated team for each cloud provider. It would be easier. However, it is a luxury I cannot afford. Nor did I want to ask my teams to learn two or more cloud platforms. One is complex enough.
So, I decided to focus on only one cloud provider for now. I do not want to fail on our journey to the cloud because of complexity. We will learn, and I may revisit this decision later if needed. Our decision criteria were as follows:
- Security oriented: we have to protect the data of our customers, and we expect the cloud provider to protect ours.
- Reliability: having been burned by our Paris data centre outage, we no longer want to tolerate failures like that one.
From our perspective, AWS is the cloud provider that best fulfils these two requirements. It now has over 200 services. It puts security at the forefront of its priorities, and cloud computing experts like Zeus Kerravala have stated that its downtime is lower than that of other cloud providers.
- to keep complexity low, we decided to use only one cloud provider
- we chose AWS for its focus on security and its reliability
What’s in it for us?
The A-team found that DevOps practices and tooling are deeply integrated into AWS offers. Regarding Infrastructure as Code, with AWS CloudFormation we could describe our infrastructure resources in a JSON or YAML file, and deploy them consistently in different AWS Accounts or AWS Regions. As our development teams are not comfortable with large descriptive JSON or YAML files, they could use the AWS Cloud Development Kit (CDK) to define their infrastructure resources in one of several programming languages and still deploy them through the AWS CloudFormation service. We could also use third-party tools like Terraform. Terraform could also help us to improve our practices right now in our data centres. However, it raised a question in my mind: as I want to encourage DevOps practices, and especially automation and infrastructure as code, should I invest in Terraform to encourage these practices both on premises and in the cloud? I think the answer depends on how long you plan to stay on premises, and on whether you plan a full migration or intend to operate both on premises and in the cloud. Terraform is definitely a great choice. In our case, as we are mainly a developer shop, we chose to leverage our Java development skills and to concentrate on CDK for Java.
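To make the idea of "infrastructure described in a JSON file" concrete, here is a minimal sketch that builds a CloudFormation template for a single S3 bucket with default server-side encryption. It is written in Python purely for brevity and uses only the standard library; it is not the CDK, and it is not the template our team actually wrote. The logical ID `ProduceImagesBucket` and the helper `build_template` are hypothetical names chosen for illustration.

```python
import json


def build_template(bucket_logical_id: str = "ProduceImagesBucket") -> dict:
    """Return a minimal CloudFormation template describing one encrypted S3 bucket.

    The logical ID is a hypothetical name used only for this illustration.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template: one S3 bucket with default encryption",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Ask S3 to encrypt every new object at rest by default.
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                        ]
                    }
                },
            }
        },
    }


template = build_template()
# Serialize the description; this JSON text is what CloudFormation would consume.
template_json = json.dumps(template, indent=2)
print(template_json)
```

Because the whole environment is plain data, it can be versioned, reviewed and deployed repeatedly, for instance by saving the output to a file and passing it to `aws cloudformation deploy --template-file ... --stack-name ...` with a stack name of your choosing.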
Regarding Continuous Delivery, my team found that AWS is well supported by well-known Continuous Delivery tools. For instance, they found strong support for AWS in Jenkins plugins, GitLab, Octopus Deploy, GitHub Actions and Azure DevOps. But the icing on the cake is that AWS also provides built-in managed Continuous Delivery services: AWS CodeDeploy and AWS CodePipeline. The promise is a fully automated pipeline from the code change commit to the deployment in production.
The A-team also found some good insights on distributed application monitoring and the challenges of centralizing and analyzing logs in a system that involves more and more components. Being a former developer and still a tech enthusiast, I found that one service particularly caught my attention. AWS X-Ray would allow us to track a request through all the services it triggers and gain visibility into what is going on. It could bring us an additional level of detail on top of our current Nagios-based monitoring solution.
From an engineering perspective, we found evidence that the cloud offers services that could help us to improve our practices. It could speed up our development process and bring more agility to the company. Finally, these services are managed, so there is no need to install or operate new tools.
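The principle behind distributed tracing can be illustrated with a toy example: one trace ID follows the request, and each component records a timed segment that names its parent. The sketch below uses only the Python standard library and runs in a single process; it does not use the AWS X-Ray SDK, and the `Trace` class and the service names are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager


class Trace:
    """Toy trace: one ID shared by every segment recorded for a request."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.segments = []   # one record per component, in order of entry
        self._stack = []     # names of the segments currently open

    @contextmanager
    def segment(self, name):
        parent = self._stack[-1] if self._stack else None
        record = {"trace_id": self.trace_id, "name": name,
                  "parent": parent, "duration_s": None}
        self.segments.append(record)
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield record
        finally:
            # Always close the segment, even if the wrapped call fails.
            record["duration_s"] = time.perf_counter() - start
            self._stack.pop()


def place_order(trace):
    """Simulate a request crossing several (hypothetical) components."""
    with trace.segment("web-frontend"):
        with trace.segment("order-service"):
            with trace.segment("database"):
                pass  # the real work would happen here
        with trace.segment("notification-service"):
            pass


trace = Trace()
place_order(trace)
for seg in trace.segments:
    print(seg["trace_id"][:8], seg["name"], "parent:", seg["parent"])
```

In a real distributed system the trace ID travels between services, typically in a request header, and a tracing service assembles the segments into a single view of the request.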
- DevOps practices and tooling are deeply integrated into AWS offers
- AWS offers managed services for IaC and CI/CD
- X-Ray offers distributed tracing capabilities that can help us improve the understanding of our applications
Who is responsible for securing the data?
The A-team also took the time to investigate the questions about data security raised by Qiao Lǐ, our CISO. On premises, data security was about building walls between our production network, our corporate network and the internet. Data inside our production network was considered secure since only our employees could access the hardware. Moreover, we used separate hardware for production and for the rest of our usage: development, test and corporate IT systems. The walls we built were so high that, even during a production incident, multiple people had to be involved before developers could access the application logs. This clearly slowed down our investigations. We encrypted personal and sensitive data using keys. However, encryption was the exception and not the standard. You may see the flaw here: we weren’t handling the insider threat. So we had the warm feeling that we were secure when in fact we were not. At best, we satisfied compliance.
We realized this flaw when we learned how data security is handled in the cloud. In the cloud, we don’t manage the hardware, so how could we secure our data? This is where the “AWS Shared Responsibility Model” comes into play. AWS is responsible for the security of the cloud, while we are responsible for security in the cloud. OK, nice sentence, but what does it mean? Basically, AWS is responsible for protecting the infrastructure that runs all of the services it offers: hardware, software, networking and facilities, and only this. So we are in the pilot’s seat for everything else: network traffic, server-side encryption, client-side data, operating system configuration and updates, network and firewall configuration, platform, applications, identity and access management and, of course, customer data. One thing that does not change between on-premises and the cloud is that we have to carefully protect our applications. If someone successfully executes an OWASP Top 10 attack (SQL injection, XSS, XSRF, etc.), the attacker will gain access to the data the application has access to.
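Application-level protection stays our job wherever the application runs. As a small illustration of the SQL injection risk mentioned above, the following sketch contrasts a query built by string concatenation with a parameterized one. It uses Python’s built-in `sqlite3` module and an in-memory database; the `customers` table, the email addresses and both helper functions are hypothetical and do not reflect our real schema.

```python
import sqlite3

# In-memory database with a hypothetical customers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers (email) VALUES (?)",
    [("alice@example.com",), ("bob@example.com",)],
)


def find_unsafe(email):
    """Vulnerable: user input is concatenated straight into the SQL text."""
    query = "SELECT id, email FROM customers WHERE email = '" + email + "'"
    return conn.execute(query).fetchall()


def find_safe(email):
    """Safer: the driver binds the value, so it is never parsed as SQL."""
    return conn.execute(
        "SELECT id, email FROM customers WHERE email = ?", (email,)
    ).fetchall()


malicious = "nobody@example.com' OR '1'='1"
leaked = find_unsafe(malicious)   # the injected OR clause matches every row
blocked = find_safe(malicious)    # treated as a literal string, matches nothing

print("unsafe query returned", len(leaked), "rows")
print("safe query returned", len(blocked), "rows")
```

The same discipline applies on premises and in the cloud: the attacker in the first case reads everything the application’s database user can read, regardless of who operates the hardware.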
Thankfully, we are not on our own here. AWS provides a lot of best practices and security considerations for each service. The most useful resource for us on a daily basis has been the Security Pillar of the AWS Well-Architected Framework. This framework has been crafted from years of experience accumulated by AWS Solutions Architects while helping AWS customers design and review their architectures. Qiao also loved the Quick Start provided for PCI DSS compliance. It helps to quickly figure out what a PCI DSS compliant architecture on AWS could look like.
- AWS is responsible for the security of the cloud
- We are responsible for the security in the cloud
- AWS provides security best practices and a Quick Start for PCI DSS compliance
We had an issue with our velocity. Our board was not happy and we desperately needed to improve. I presented the IDC study to the board. Leveraging the cloud was the answer to our needs. We, as a company, have decided to act.
I made it very clear to everyone that we would start with one cloud provider only! It will keep the complexity of the transformation as low as possible. Data security and reliability are our main priorities. Based on publicly available reports, AWS is a top performer in those two areas. The deep integration of DevOps practices and services into its platform confirmed AWS as our first choice.
When the COVID-19 pandemic hit, we were still discussing whether we were ready or not. Obviously, the crisis changed the landscape, and the cloud turned out to be the only viable option to react quickly. The Working From Home and Delivery projects saved our business.
Based on the success of those two projects, our goal is now to reduce our time-to-market from half a year to two months. We want to prove that we are able to achieve the IDC outcomes.
Stay tuned for my next post, where I will explain how we won buy-in from our CFO.