Everbridge’s SaaS Ops and Engineering teams partner to move their life-saving products towards reliable automation
While many technical teams are tasked with ensuring their products are running around the clock, few have the responsibility of a product that has the potential to save lives. Everbridge’s SaaS Ops and Engineering teams discuss the demands of a “truly 24/7 shop”, the mission that motivates them, and the interesting challenges of an ever-evolving tech stack. For this story, we spoke to David Baker (Principal Data Architect), Clare Holley (Senior Database Reliability Engineer), Steven Ingles (Senior Quality Engineer), and Bill Slack (Engineering Manager). Interested in joining the team? Check out open roles or get in touch at RecruitingTeam@everbridge.com.
What does Everbridge do and why does it matter?
David: We’re a critical communications and critical events management platform. Our customers use us to send emergency notifications in situations like natural disasters, active shooters, and Amber Alerts. Lots of companies say they’re a 24/7 shop because they have a website or an app, but if those systems go down, they’re not putting people’s lives at risk. Those are the real stakes we deal with, but that’s one of the reasons I like working here. Everbridge doesn’t just make rich people richer. I could earn more at a bank, but this is much more satisfying.
Bill: I love knowing our technology saves lives. And it has applications beyond public safety too. Right now, we’re expanding into day-to-day critical events like IT service management. For example, we make it easy to notify people instantly if a company’s systems go down. Everbridge does have competitors in that space, but we can do it more reliably, and on a much larger scale.
Why did you decide to join the team?
Steven: It was a pretty simple decision for me. I chose this company because of the mission. Making sure people stay safe and get the information they need, when they need it — that’s always been important to me. I think Everbridge fills a real need in the world.
“What we’re doing has meaning, and you see that immediately.” — Clare
Clare: What we’re doing has meaning, and you see that immediately. Another reason for me was the technologies we use. Everbridge is a heterogeneous environment; as a DRE, or database reliability engineer, you’re supporting several databases, not just one. I’ve always liked that, because it helps me build new skills. You don’t get bored here. There’s always something to pique your interest.
There’s also a lot of flexibility, which was important to me. A lot of the work DREs do is at night, so it’s nice that we can work remotely. We use GoToMeeting to stay in touch. And I’m only a few hours from the Boston office, so I still have opportunities to visit there, or even the Pasadena office.
David: I was bored at my previous job, to be honest — although I didn’t realize exactly how bored I’d been until I got here. I walked away from a lot to join Everbridge, and I was nervous about that at first. But looking back, it was one of the best decisions I’ve ever made. I worked for a big corporation where a lot of people were just collecting a paycheck and waiting for retirement. I don’t know a single person like that here. Everyone is engaged. We’re all trying to move the company forward.
“We get to work directly with customers. That’s a big difference compared to working at a huge company. I really enjoy helping them solve new problems.” — Bill
Bill: I came from a company 10 times the size of Everbridge, but it was growing very slowly. It had actually been a startup before, and I missed the rapid change — all the new products rolling out and all the excitement that brings. That’s why I came here.
I also like that we get to work directly with customers. That’s a big difference compared to a huge company like Facebook, where you have two billion customers but you’re not interacting with any of them. The Everbridge customer base is constantly expanding, and I really enjoy helping them solve new problems.
How do SaaS Ops and Eng divide responsibilities?
Bill: We’ve had lots of conversations about what falls under which team. That’s still evolving, but I’ll tell you what we’re working towards: Engineering is responsible for delivering high-quality software, and SaaS Ops is responsible for the infrastructure on which that software runs. Some pieces of code are deployed on the Ops side, so there is some overlap. That’s where we have to collaborate to make sure our tool sets work together. If we do the work to make it reliable, we can deploy with the push of a button. No getting on the phone together at 1:00 a.m. to do upgrades.
David: We all think about both sides at the same time, but because we’re not shy about throwing new features into the product, we want to be sure we’re choosing the right stack. The primary concern of the Engineering team is fulfilling the product’s needs, while Ops is focused on operability.
How do the teams work together?
Clare: We’re partners. We’re coming together more now, as opposed to one team running ahead with an idea and then just throwing it back at the other for support. For example, we were working on the implementation of an application recently, and Engineering’s original plan was to use Elasticsearch. But we had a roundtable discussion with both teams, and we realized Elasticsearch might not be the best option, so we ended up going with a widely used relational database management system.
Bill: Or if we’re creating a load balancer, the developers will know what they need functionally, but Ops has to be in the loop too. Ops will understand how to design and implement what is needed to comply with our stringent security and data privacy requirements, as well as capacity and performance. That’s where the collaboration comes in.
Steven: As a QA engineer, working with the folks in SaaS Ops has been great. They support some of our QA environments and the surrounding infrastructure, so that relationship is really important. I joined a few months ago, and I’ve been impressed by how fast and responsive Ops is. They’ll drop everything to help out with something I need or listen to my concerns. And the reverse is true too. When Ops needs help implementing new features, we always stop and make sure they have the right information.
David: Yeah, compared to some of the other publicly traded companies I’ve worked for, the wall between Ops and Engineering here is much thinner. We don’t always link up as much as we’d like, but that’s because of time constraints, not unwillingness. The developers we support work very, very hard to give us software that doesn’t suck. That’s a big deal for us in Ops, because if lousy software makes it to production, that makes everyone’s life miserable.
Since DevOps has become standard practice here, the Ops team has changed a lot. We were understaffed when I first came on board, and people were siloed to specific projects. It felt like we were constantly saying no. Now we have more skilled people who can partner with Engineering. We can sit down, challenge architecture, and be helpful early on.
What’s exciting about the technology you’re using right now?
David: We’re using a lot of the newer, community-supported platforms and cool tools like NoSQL database technologies. Some of these platforms we didn’t have a clue about a year ago, and now we love how well they work for us. We don’t use any legacy software and we plan to continue down that road. Clare is working on one of our most exciting projects, implementing our new NoSQL databases with a cloud services provider, which is part of a broader movement toward automation.
Clare: This project will move us off data centers and onto a platform that auto-scales, which will help us maximize availability. By moving to it, we can keep operating even if a region goes down, and our customers will never notice a difference.
Bill: Engineering is completely on board with the move toward automation. We want to hire people who can help us with projects similar to Clare’s. I think we’re seeing a commitment from all sides to more modern platforms. We’re moving toward microservices that are highly scalable and resilient, so that we can scale up quickly when we experience growth spells. If a new customer comes on board with hundreds of thousands of users, for example, that can spike traffic. Writing software that’s easy to implement and can handle those loads is no small thing, but it’s interesting work.
“We use different sets of tools for every project, and share our results and experiences across teams. We’re learning from each other as we go.” — Steven
Steven: People in QA have been very open to changing the tools we use and trying new things, and we use a slightly different set of tools for every project. I like having that flexibility. We also share our results and experiences across teams, so we’re learning from each other as we go.
Everbridge often uses technologies I’m familiar with, but uses them in complex ways I hadn’t considered before. It took time to get up to speed, but now that I’m more familiar, I’ve been able to contribute to the testing and to introduce some automation within the product I work on.
Tell us about the automation success you’ve had so far.
Clare: I think one of the biggest wins was a notification engine that we moved from a legacy system to the new NoSQL database technology. We were able to spin up multiple services so if one went down, we could keep broadcasting. It’s allowed us to reach many more users and communities than we could in the legacy environment.
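To illustrate the idea Clare describes, running several services so that broadcasting continues when one goes down, here is a minimal Python sketch. It is not Everbridge's implementation; the `broadcast` function, the sender names, and the use of `ConnectionError` to signal an outage are all assumptions made for this example.

```python
def broadcast(message, senders):
    """Try each (name, send) pair in order; return the name of the first that succeeds.

    `senders` is a hypothetical list of redundant notification services.
    """
    errors = []
    for name, send in senders:
        try:
            send(message)
            return name
        except ConnectionError as exc:
            # This instance is down: record the failure and fall through to the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all senders failed: " + "; ".join(errors))


delivered = []

def failing_primary(message):
    # Simulates an instance that has gone down.
    raise ConnectionError("primary unreachable")

def healthy_secondary(message):
    delivered.append(message)

used = broadcast(
    "Evacuation notice",
    [("primary", failing_primary), ("secondary", healthy_secondary)],
)
print(used)
```

In this sketch the caller never needs to know which instance handled the message; it only sees a failure if every instance is unavailable.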
David: Capacity on demand comes to mind for me. That’s something we couldn’t do a couple of years ago, but during Hurricane Irma this fall, we were able to double our capacity by pushing a couple of buttons. Then, when we no longer had the need, we easily pared it down using a neat set of infrastructure-as-code and automation tools we’ve implemented.
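The scale-up and scale-down David mentions can be pictured as reconciling a declared target against what is currently running, which is the general idea behind infrastructure as code. The Python sketch below is only an illustration under that assumption; the `plan` function and the node names are hypothetical and do not come from any tool the team uses.

```python
def plan(current_nodes, desired_count):
    """Return (nodes_to_add, nodes_to_remove) needed to reach desired_count."""
    if desired_count < 0:
        raise ValueError("desired_count must be non-negative")
    delta = desired_count - len(current_nodes)
    if delta >= 0:
        return delta, []
    # Scaling down: retire the most recently added nodes first.
    return 0, current_nodes[delta:]


pool = ["node-1", "node-2"]

# Doubling capacity ahead of a surge.
scale_up = plan(pool, 4)

# Paring back down once the surge has passed.
surge_pool = ["node-1", "node-2", "node-3", "node-4"]
scale_down = plan(surge_pool, 2)

print(scale_up, scale_down)
```

Because the target is just a number in a declaration, doubling capacity or paring it back is a one-line change rather than a manual procedure.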
Bill: We also found a great piece of software for configuration management, as well as for service discovery and microservices. Say I have a service called “Messages.” Instead of needing to know it runs on these 5 or 10 nodes, you can just spin up the machine and it knows to talk to its console master. Then when I connect, all I have to say is “I want to talk to Messages” and it automatically routes me there. It makes implementations and customizations much easier. We could spin stuff up ourselves and that would be interesting, but do you really want to run a 24/7 piece of critical events software on something you wrote in a couple of days? With this, we’re building on the shoulders of giants.
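The "I want to talk to Messages" pattern Bill describes is service discovery: nodes register themselves under a service name, and callers look the service up by name instead of tracking addresses. The Python sketch below is a toy in-memory version for illustration only. The `ServiceRegistry` class, its methods, and the addresses are assumptions for this example, not the API of the software the team adopted.

```python
from collections import defaultdict


class ServiceRegistry:
    """Toy in-memory registry; a production system would use a dedicated tool."""

    def __init__(self):
        self._nodes = defaultdict(list)  # service name -> registered addresses
        self._cursors = {}               # service name -> round-robin position

    def register(self, service, address):
        """Called by a node when it spins up."""
        if address not in self._nodes[service]:
            self._nodes[service].append(address)

    def deregister(self, service, address):
        """Called when a node shuts down or fails a health check."""
        if address in self._nodes[service]:
            self._nodes[service].remove(address)

    def resolve(self, service):
        """Return one address for the service, rotating through its nodes."""
        nodes = self._nodes.get(service)
        if not nodes:
            raise LookupError(f"no nodes registered for {service!r}")
        index = self._cursors.get(service, 0) % len(nodes)
        self._cursors[service] = index + 1
        return nodes[index]


registry = ServiceRegistry()
for address in ("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"):
    registry.register("Messages", address)

# The caller only names the service; the registry picks a node.
picked = [registry.resolve("Messages") for _ in range(4)]
print(picked)
```

A real discovery tool adds health checks, replication, and failure handling on top of this lookup, which is the part Bill argues is not worth writing yourself for a 24/7 system.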
What makes someone successful on your teams?
David: You need to be comfortable with change. In Ops, what you’re working on changes from week to week. You can’t just build something and stare at it lovingly, because it might get torn to shreds a few weeks later. I also believe in that old saying, “the perfect is the enemy of the good.” By no means do you try to squeak by, but when you do a good job and you’re done, you move on to what’s next.
I’d also say you need to be comfortable invoking, writing, and maintaining code. Candidates are sometimes confused about how we define the site reliability engineer and DRE roles. They’re different from a traditional sysadmin or DBA position, and we’re looking for people who are excited about that and who want to own the full stack and use automation.
Bill: Lots of different personalities do well here, as long as you communicate with your team and try to understand what other people are doing. You definitely do need to be comfortable with change. There’s a lot going on at once, and any two given teams might approach things a little differently. But the goal is to eventually choose the best option and move forward together. That’s how we continue to improve.