Don’t Let a Dot Com Tell You How to Enterprise
One trend I’d love to bury with 2020 is strategy shaming.
I’ve seen speakers ridicule the question: “but what if we deploy without containers?” My question to those speakers is: How can you give unwarranted guidance or constraints to a solution without understanding the use case or environment?
The truth is some applications are still not a fit for containers or microservices. Some applications aren’t even fit for virtualization. Some people forget that not all workloads use HTTP(S) or REST APIs. I’ve seen an entire team of developers at a banking organization dedicated to an application used by just a few high roller customers. Such niche apps might not benefit from auto-scaling groups or serverless architecture.
For some low-latency organizations, an HTTP retry or TCP retransmission means it’s already too late. How do you really decide between monolith and distributed solutions? Do you need global scale-out resiliency for parallel tasks or do you need nanosecond performance for a serial operation?
It’s important to respect the different requirements between white collar (enterprise) and no collar (startup) environments. Note these are my personal opinions based on my own experiences, and not official opinions of HashiCorp.
The Tao of HashiCorp
At HashiCorp we actually have a simple set of guiding principles. This isn’t a suggestion to the industry; it’s our own set of guidelines, established during our own journey. A lot of industry specialists like to classify HashiCorp as a cloud company, but the truth is that the word “cloud” isn’t mentioned once in our tao.
Our tao is used to determine where our own efforts should be focused and often involves community input. In fact, I would say the biggest unwritten clause in our tao is community input. A lot of our development and roadmap comes directly from customer requests, including occasional new products. This includes our new Boundary and Waypoint projects, which were direct results of community requests. We don’t push tools or solutions that contradict needs or increase complexity, hence the emphasis on pragmatism.
The Tao of HashiCorp:
- Workflows, not technology
- Simple, Modular, Composable
- Communicating Sequential Processes
- Immutability
- Versioning through Codification
- Automation through Codification
- Resilient Systems
- Pragmatism
I like to think that these general principles apply to both startups and enterprises. There are a few that enterprises struggle with adopting though.
The biggest change in my experience is 4: Immutability. Disposable infrastructure is the universal helpdesk solution of “turn it off and turn it back on.” Volatile state in a system can often break things. Resetting a computer resets any memory leaks or runtime config that end up being root causes for most issues. Infrastructure is reset by trashing it and rebuilding it. This is another reason containers are designed to be stateless and quick to start up.
Unfortunately, in a heavily regulated world where logs and state have retention rules, this isn’t always easy. Understand that you can’t always release an “Agile” bull into an enterprise china shop.
Deploy Early, Deploy Often
Speed doesn’t always equal success. It’s nice to be able to deploy rapidly if you need to, but don’t let speed be a goal above safety. I can deploy a broken app 100 different ways but I’d rather deploy once successfully than deploy 100 times broken.
Chaos engineering is the growing practice of having self-inflicted, randomized failures triggered around your infrastructure to constantly test its resilience. That’s all well and good for social media or streaming services where users are buffering media minutes ahead, but it’s not good enough for a financial institution or marketplace that needs millisecond precision.
When you are sure you need to deploy, you should be able to do so with automation as quickly as possible, whether you’re a startup or an enterprise. The important part is testing or quality assurance on a release. If you’re looking for a non-opinionated multi-platform release tool, HashiCorp has open sourced and released a project called Waypoint free of charge.
What common practices and team strategies are good for Enterprise and Startups alike?
There’s a loaded phrase that has echoed around the DevOps scene for a few years now, and it makes me twitch a little every time I hear it.
Every company is a software company.
Generic statements like this are unhelpful. Most organizations don’t have the top priority of being a software company. The move to cloud reinforces this, as they often don’t want to manage their own hardware either. The goal is to keep their users and customers happy, and a requirement of that is to have the most efficient software and infrastructure available to support their customers. Never bury the business goals of an organization in software requirements or you risk taking your eyes off the prize. I don’t believe in mandates without understanding an organization’s requirements and use case, but there are some general recommendations that I think fit well for any digital transformation or greenfield startup.
Infrastructure as Code
First and foremost, having all of your infrastructure defined as code is a worthwhile goal for anybody. Your IaC profile becomes the DNA of your organization. DevOps is the collaborative gene therapy of that DNA. If a fire or natural disaster wipes out a building, datacenter, colo, or cloud AZ, your IaC profile can be used to clone the digital skeleton for your organization in a hurry.
Version control captures a full audit trail and supports rollback just like it does for application developers. Note that IaC does not include your data, which must still be backed up with a proper and tested DR plan. Whether you’re PXE booting stateless hardware or deploying thousands of VMs a day, consistency and accountability are much simpler with a collaborative code base.
Terraform is HashiCorp’s answer to this. Crucially, too many people blur the lines between infrastructure and application platforms. Terraform is great for managing lift and shift or cloud native workloads, but it’s not the most efficient for deploying individual applications.
For that we have orchestrator platforms like Nomad and Kubernetes. Going back 5 years the industry had a solid line between IaaS and PaaS, or Infrastructure as a Service and Platform as a Service. Infrastructure deals with machines, VMs, and dependent config for your applications. Platform deals with actually running your applications and ideally abstracts everything beneath it. It’s most efficient to carry that distinction with Terraform and leave the platform to orchestrators.
Use consultants as consultants. Use contractors as contractors. Do not mix up their purpose. I spent 4 years consulting and it was some of the best life experience I could ask for. Jumping from one challenge to the next not only builds up a consultant’s skills but also helps spread tips and tricks of the trade — though nothing in the form of sensitive IP. If you bring in a consultant for a year with no SOW or goals you’re using them incorrectly.
Get a contractor who is happy to be staff augmentation. Partners and SIs can always make good recommendations but make sure you know how many middlemen there are involved. I have been assigned really lucrative engagements where I found myself to be 4 layers of resellers and subcontractors away from the customer, which isn’t in anybody’s interest and isn’t a good value to you as a client. Use trusted partners where you can. My job is enabling HashiCorp’s partners so that you can trust them to manage our stack. I’m biased but it’s the most fun job at HashiCorp.
I think it’s brilliant to bring in new talent with a bounty incentive program. It helps to get a new perspective from a fresh hire by letting them prod your security practices and perform routine pen tests. I’ve spent a lot of time around large institutions and I’ve been shocked to see the range of security strategies.
While most lean toward overprotective, I’ve seen some actually too open and I’ve stumbled into a team’s entire confidential FX code base by accident. Your team should always report problems with surprise access. Security is especially important with most of us working remotely during COVID lockdowns.
Security is a universal need no matter where you sit on the cloud spectrum. HashiCorp’s Cloud Operating Model has an implicit zero trust approach to environments. Rather than lock & key cabling, you can assume that any packet may cross public lines or wifi and thus must be fully encrypted end to end. With this mindset, it’s odd to look back and wonder why this wasn’t the case even with secure cabling. Everything that crosses a trust barrier must be secure even if it’s just on a loopback interface between users.
We’re getting to the point where we see any static credentials as vulnerable as old telnet services. Even SSH keys have no expiration, hence the push towards SSH signed certificates and mTLS for all service mesh communications. If you’re still using helpdesk tickets to have someone create your credentials, you’re walking a tightrope of vulnerability. Your team may be perfectly trustworthy but what will forensics say during a postmortem when a DBA created an account and handed it to 5 of the application developers? Secrets between 6 people aren’t secrets whether you’re in the cloud or in a small business half rack.
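To make the contrast with shared static credentials concrete, here is a minimal sketch of the idea behind short-lived, per-requester credentials. It is not Vault’s API or any real product’s implementation; `Lease` and `issue_lease` are hypothetical names invented for illustration, and the one hour default TTL is an arbitrary assumption.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Lease:
    """A hypothetical credential that belongs to one requester and expires."""
    username: str
    password: str
    expires_at: datetime

    def is_valid(self, now=None):
        # A lease is only usable strictly before its expiry time.
        now = now or datetime.now(timezone.utc)
        return now < self.expires_at


def issue_lease(requester, ttl=timedelta(hours=1), now=None):
    """Generate a unique credential for a single requester (illustrative only)."""
    now = now or datetime.now(timezone.utc)
    return Lease(
        # The username embeds the requester so an audit trail can attribute use.
        username=f"{requester}-{secrets.token_hex(4)}",
        password=secrets.token_urlsafe(24),
        expires_at=now + ttl,
    )
```

Because every developer receives a separate lease, a postmortem can tie activity to one requester, and a leaked password stops working once the TTL passes.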
The perpetual helpdesk omnisolution: turn it off and turn it back on. Disposable infrastructure is the same concept scaled up. If your entire estate is broken in a massive outage but you have your data set aside, you can turn it off and on again by destroying and rebuilding your infrastructure. A key to this ability is making sure your data is separate from your infrastructure and all of your infrastructure is represented as code. This also tends to make it easier to scale out, since your infrastructure and applications are isolated from your data.
Sometimes application teams insist on running the latest generation hardware. They should be able to justify this spend whether it’s a CAPEX purchase or a cloud budget increase. Can the application be scaled horizontally? Early internet search engines managed to scan terabytes of index in a few milliseconds using a rack of early gen Pentium boxes.
What’s your app’s excuse for needing the latest hardware? What about instance size? I’ve seen customers paying hundreds of thousands of dollars for commercial licenses on 20 core servers complain about their application performance. A simple profile shows their database doing serial queries that utilize only a single CPU core. Why pay for leading edge hardware and software licensing per core if the application can’t even use it?
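The single-core observation can be put in numbers with Amdahl’s law, which bounds the speedup from extra cores by the fraction of work that can actually run in parallel. The sketch below is illustrative only; the function name and the sample fractions are assumptions, not measurements from any real workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can use extra cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads on a 20 core server.
fully_serial = amdahl_speedup(0.0, 20)     # 1.0: the other 19 cores add nothing
half_parallel = amdahl_speedup(0.5, 20)    # still below 2x
fully_parallel = amdahl_speedup(1.0, 20)   # the ideal case, close to 20x
```

With a fully serial query path the bound is exactly 1x, so licensing 20 cores pays for 19 that sit idle.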
One of the most common Terraform Sentinel policies is the instance size limitation. Make sure development environments don’t use expensive high performance instances or VMs. Develop on slow cost effective environments to code efficiently. Hardware engineers have spoiled today’s developers with solid state storage and fast networks.
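The logic of an instance size limit is simple enough to sketch. The following is not Sentinel syntax and does not use any Terraform API; it only illustrates the check in Python. The allow-list, the shape of the `plan` data, and the function name are all hypothetical, and the instance type names are AWS examples chosen purely for illustration.

```python
# Hypothetical allow-list for development; real limits depend on provider and budget.
ALLOWED_DEV_INSTANCE_TYPES = {"t3.micro", "t3.small", "t3.medium"}


def oversized_instances(planned_resources, environment):
    """Return the names of planned instances that break the dev size limit."""
    if environment != "dev":
        return []  # this sketch only restricts development environments
    return [
        resource["name"]
        for resource in planned_resources
        if resource.get("instance_type") not in ALLOWED_DEV_INSTANCE_TYPES
    ]


# Made-up plan data: one small web server and one very large database host.
plan = [
    {"name": "web", "instance_type": "t3.small"},
    {"name": "db", "instance_type": "m5.24xlarge"},
]
violations = oversized_instances(plan, "dev")  # ["db"]
```

A policy gate would typically refuse to apply the plan whenever the violations list is non-empty.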
The ultimate decision when dealing with open source ISVs is whether you get value out of buying Enterprise support. In this regard I fully admit my own bias, but I’ve also seen organizations that let open source estates start in a small lab and grow to run critical systems without even realizing it.
In the cases of HashiCorp products, a failure in open source deployments is a point of no return. Most pertinent is the replication of Vault, where a seal or a set of master keys is accidentally wiped out by a careless Terraform destroy. There is no way to retrieve those keys if they are lost, and if you don’t have an Enterprise replica with a separate seal, your Vault is effectively lost upon restart.
In other cases the primary use case involves audit trails for compliance or forensics. A streaming company may not care if an environment fails — they might just reset it and move on. A bank fudging a large financial transaction will need to know exactly what happened. If an error is caused by the product, they will need access to support for answers or fixes.
SaaS solutions present simpler value. Terraform Cloud saves you the trouble of maintaining local Terraform Enterprise or building your own pipelines and solutions which sometimes cost more in the long run and are never safe from coverups. If your team builds a platform around open source tools, they can sure as heck cover their tracks in logs when they mess up. A fellow Red Hatter once put it to me simply: OSS ISVs are not software companies, they are sales companies. Then another Red Hatter went even further, saying OSS ISVs aren’t sales companies, they’re insurance companies. Enterprise agreements establish a risk agreement that can really save you in a bind.
Retention and Churn
There’s no sugar coating it: cloud migrations are not sexy. Nobody ever ran a reality TV show about digital transformation. There is a lot of churn in this industry, so it’s important to hire and retain core talent by keeping the job interesting. That may feel like entitlement. Startups have a lot more freedom in this sense and try to make themselves attractive to talent even if they don’t pay as well. From my own perspective, I like to work with folks that have blue collar experience and respect their roots.
The good news is it’s not difficult to keep your workplace happy even when you’re in a formal Enterprise setting. Startups may have nerf fights or casual dress every day, but Enterprise can do helpful things like host community user groups or events. HashiCorp’s own HUGs (HashiCorp User Groups) are available around the globe and I look forward to when they resume in-person after COVID settles down. In the meantime you can reward teams for things like bug bounties (see above) and my favorite: have developer events where you challenge a team to find who can best optimize your application to run on old infrastructure. See the 1 Million Container Challenge we run with Nomad.
If your team has the bandwidth, offer them some outside charity work. Staring at a screen for the entirety of 2020 has most folks itchy to get out. Donate some work time to cleaning up the local area or volunteering with an org if you can. HashiCorp has charity programs as well as philanthropy funding programs, and our founders have their own initiatives too, which boost morale.
These guidelines may seem obvious, but now go back through the list again. Have you checked your IaC for secrets? Do a git search or recursive grep for “password” on all of your repos. Have you ever actually grepped your home directory for your personal password? You might be surprised what you find. Be sure to clear your shell history afterwards. Can your app perform just fine on 15 year old hardware? Why or why not? The questions are worth asking.
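If you would rather script the search than run grep by hand, a rough equivalent might look like the sketch below. It assumes a small, hypothetical pattern list and a simple heuristic (a keyword followed by `=` or `:`), so expect both false positives and misses; `find_secret_candidates` is a name made up for this example.

```python
import re
from pathlib import Path

# Hypothetical pattern list; extend it to match your own naming habits.
SECRET_PATTERN = re.compile(r"(password|passwd|secret|token)\s*[=:]", re.IGNORECASE)


def find_secret_candidates(root):
    """Return (path, line_number, line) for lines that look like hardcoded secrets."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything stored under a .git folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file, move on
        for number, line in enumerate(text.splitlines(), start=1):
            if SECRET_PATTERN.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_secret_candidates(".")` from the top of a repository returns the candidate lines for review. Treat the output itself as sensitive, since it may contain the very secrets you are hunting for.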
You might cry hypocrisy, as I suggest that Enterprise shouldn’t listen to startups, but I merely want to point out that nobody should suggest how an organization outside of their industry functions. I’ve shared with you how some of my own observations in the industry aren’t necessarily a universal fix. I can’t honestly advise a startup to operate the same way an Enterprise does, and I can’t prescribe startup models to an Enterprise. What I can do is share where I see overlap, and there is a bit worth noting.