Working at Amazon: Software Engineer
Just one person’s reflections on what Amazon does well
I have been a Frontend Engineer at AWS CloudFormation since July 2017. This is my first and only job in software engineering, and I thought I would write down some of my thoughts about my experience thus far. Having friends on other teams within Amazon and at other companies has given me some insight into what is distinctive about working at Amazon. This probably needs to be said: my thoughts and opinions here are my own, and I am in no way trying to represent Amazon.
The Leadership Principles
Amazon’s Leadership Principles have become one of the most influential statements of corporate culture in recent years. Numerous companies have adopted similar principles (with their own spin), hoping to replicate the kind of success Amazon has achieved since its founding.
Consider just how rare it is for a company to expand its core business in so many ways, with such success, in just 25 years. Since its founding, Amazon has grown from an online bookstore into an “everything” store. It practically invented cloud computing as a core business, redefined customer service, and even ventured into devices such as Alexa, Fire, and Kindle. Amazon’s business model and thinking (customers over profit) have even overturned traditional business principles. Now consider the many companies that died out because they were unable to pivot or expand their core business effectively: Blockbuster, Polaroid, Toys “R” Us, and others.
Contrary to what some say, the difference was not talent. Blue-chip companies had no shortage of talent, yet a tiny company fronting as an online bookstore in the mid-90s was able to take on the retail giants of its time. Even against the major tech companies of the day, such as Apple and Microsoft, Amazon Web Services was able to take the lion’s share of the cloud computing market within the past decade.
Now perhaps you are thinking that I’ve drunk the Amazon Kool-Aid since I work here, and you may be right. However, I’ve seen the way Amazon operates, even as an entry-level engineer, and these Leadership Principles are not just mantras people parrot in meetings: they are the foundation upon which Amazon thinks and builds all of its products. Anyone who has watched this company grow can tell you that Amazon’s success since its founding is no surprise. And if Amazon continues down the road as a Day 1 company (as opposed to a Day 2 company), it will continue to be successful in the near future.
All the sections below are simply ways in which those Leadership Principles manifest themselves in my job as a software engineer. I won’t go down the list comprehensively or in any organized fashion, but you will see how they inform our practices and thinking.
Let’s first talk about teams, and how Amazon, despite its size and the bureaucracy that usually comes with it, manages to innovate at the speed of much smaller startups.
Most Amazon organizations are huge, as you can imagine. However, each organization contains many sub-organizations, each sub-organization contains many teams, and so on. My team is made up of just 4 engineers and a manager (who also oversees other small teams within his or her product domain). We have many sister teams that work in relatively close collaboration with ours. Each team has a lean working environment. We are relatively autonomous in our decision-making because reacting quickly to customer issues and requests is a top priority for every team at Amazon. This means it is very rare for a senior executive to conceive of an idea in a vacuum and dedicate entire teams to it; managerial whims make for poor decisions. Instead, each engineer is empowered to drive their product the way customers want.
Likewise, culture is driven at the level of individual teams. Whether it’s how a team handles testing, operational work, or social outings, there is no single “This is how Amazon does X.” Practices differ even across sub-teams under the same umbrella team. Some teams work from home on certain days; some have regular food and snacks on Fridays. The point is, it is very hard to generalize about the culture of Amazon or Amazon Web Services. Some teams have more operational work, some have more new-feature work; some engineers find themselves working long hours, while others barely clock 35 hours a week. It all depends.
The Leadership Principles apply to everyone, even those in positions not typically seen as “leaders,” because we are all expected to lead. That means the company has to empower its employees to shape the culture of the teams they belong to. Again, the company should provide the structure and means for smooth and accurate decision-making, and Amazon does a pretty good job of that.
Start with the customer, work backwards
A well-known distinguishing feature of Amazon is its customer obsession. If we imagine the Leadership Principles as parts of a tree, customer obsession would be the trunk from which all the other principles branch out. Without the customer, Amazon is a headless chicken. We like to say, “Start with the customer, then work backwards.” Questions like “Does it make sense for our users?” and “What value does it bring our customers?” drive our thinking. To support this, our users have multiple channels to reach us: our product managers, our console feedback system, and social media and community channels (e.g., Reddit, Twitter, Slack). We read the feedback, group similar feedback, and prioritize it in our work. Sometimes we miss the point, which is why we keep iterating on feedback, over and over again.
New features often start with the question “What do our users want?” or “What change would alleviate our users’ pain?” Our product managers, who literally sit less than 10 feet from our desks, exist to answer these questions. They sit in our design meetings, help test our features, give input on wording, and imagine (or re-imagine) better customer experiences. Furthermore, from a feature’s conception through its marketing and eventual release, you can bet that not only the PMs but also the engineers are involved. We are all stakeholders in this symbiotic relationship with our customers. Software engineers aren’t mere bystanders but active participants in managing customer expectations. I regularly read customer feedback, review product marketing drafts for new features, and give input on improving the customer experience of our backend service and even our public documentation. From operational excellence (discussed below) to developing new features, “customer service” is a hat we all wear at Amazon.
Ownership and Meaningful Work
I’m certain that for anyone, having ownership over their work is an important factor in job satisfaction. Take away ownership, and you take away motivation. As such, software engineers at Amazon are tasked with owning their respective services. It is rarely “not my concern” or “not my job.” As a software engineer, I don’t simply own a small, isolated segment of a codebase; I own the entire experience of our service. This also means that my work cannot be evaluated in isolation. The highs and lows of our product affect our team in tangible ways. Some people prefer owning and maintaining their own “space,” but in my experience that’s not how it works here. In return, you get the satisfaction of having your name tied to the features you deliver to customers, and you become the subject matter expert.
One particular illustration of this ownership principle is that operational work is part of an Amazon software engineer’s job description. Amazon does not hire a separate DevOps team to test, deploy, and support our product; software engineers are expected to do all of that on top of their “coding.” Engineers are generally part of their team’s on-call rotation throughout the year, a 24/7 support system that escalates high-severity (read: customer-impacting) tickets. Managers are no exception to this rule, and resolving a ticket often involves many engineers and managers. The rationale for software engineers owning even operational support is an extension of the ownership principle: we cannot simply “punt” the problems engineers create to a separate team when things go awry. Think about it: if there are bugs in the application, root-causing the issue is best left to the people who wrote the code. It also raises coding and testing standards, since engineers want to make sure things don’t fall apart when their code reaches production. Responsibility is a two-way street. An engineer cannot push code to production, reap the satisfaction of doing so, and avoid the consequences of the issues that might arise. Admittedly, this might not be everyone’s cup of tea; however, it provides invaluable insight into advanced distributed systems and the trade-offs involved in architecture (even entry-level software engineers are often asked a system design question during onsite interview loops). And while this might be one of the least popular aspects of working at Amazon, the rationale is clean and simple: you own the product from its implementation to its maintenance.
Another component of job satisfaction is that the work must feel meaningful. In my experience, my work has high customer impact. As a front-end engineer who owns an AWS service console, my job isn’t just to add styling to a widget; it is to imagine a better experience for our users and bring it to life. I have personally worked on or reviewed a large percentage of our codebase and have designed and implemented entire features on my own during my tenure. Again, teams are lean. A small team of engineers (or, in my case, just one) could be tasked with implementing entire features by themselves. This gives us meaningful projects that not only add to our portfolios but also provide the kind of learning that only comes from having your hands deep in the product lifecycle.
There is often a perception that working at a large company will make you feel like a cog in a machine. While that risk exists at any large company, I suspect the problem of being a “cog” isn’t so much that you are just one among a sea of people; rather, it is the fear that being one among many will leave less ownership and meaningful work to go around. As mentioned above, Amazon does many things to ensure you have proper ownership and meaningful work. And if the work does not align with your goals, there are literally hundreds of teams across Amazon that welcome internal transfers. Many people have moved through multiple product groups to find their interest; it is no small benefit to have the mobility to try out cloud computing at AWS, or Alexa, or retail, or VR/AR, or our numerous machine learning teams. It is no exaggeration to say that a significant portion of the interview process across all roles at Amazon is based on the Leadership Principles, even for software engineers (I would estimate that about 40–60% of the interview weight rests on answers to behavioral questions). Because of this “leadership” hiring bar, internal transfers can be made with confidence on the basis of these shared principles, which are the primary concern; differences in things like relevant technical experience are secondary.
Any proposal at Amazon might start with the question, “What does the customer want?”; however, the quick follow-up has to be, “What does the data say?” Whether the data is direct customer feedback and/or click analytics, decisions at Amazon must be driven by sufficient data. If the saying goes, “Measure twice, cut once,” Amazon says, “Measure ten times, cut once.”
Documentation is a significant part of any good engineering team, but I suspect many companies do it out of obligation rather than as a centerpiece of their operations. Amazon loves writing. The six-pagers are an integral component of business decision-making. Why? Consider: “Writing is nature’s way of letting you know how sloppy your thinking is.” When people are tasked with persuading through writing, they are forced to reckon with the data and the source material to prove their point. Blind experimentation is not a thing at Amazon. Products and new features do not take off without data; if there is insufficient data, find more data points. At Amazon, you are expected to use critical thinking to interpret the data objectively and/or ask the right questions that lead to the data needed to make the right call.
Related to this is using data for operational work. Every week, our team’s on-call engineers hold a meeting to review graphs containing hundreds of thousands of data points from our service. We discuss every spike or dip to make sure there is no underlying issue. Again, data drives our thinking: merely guessing at the root cause of an issue is like shooting in the dark. When we ask, “What do the graphs look like?” we are starting with the data and working backwards.
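As a toy illustration of the question we ask about each spike or dip (is this point unusual compared with recent history?), here is a minimal sketch of a rolling z-score check. It is not our team’s actual tooling; the function `flag_anomalies`, the 5-point window, and the threshold of 3 standard deviations are hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(points, window=5, threshold=3.0):
    """Return indices of points that deviate sharply from the trailing window.

    A point is flagged when it lies more than `threshold` standard deviations
    from the mean of the `window` points immediately before it.
    Hypothetical helper: window (must be >= 2) and threshold are assumptions.
    """
    flagged = []
    for i in range(window, len(points)):
        history = points[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all counts as a spike or dip.
            if points[i] != mu:
                flagged.append(i)
            continue
        if abs(points[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Example: request latencies in ms with one spike (index 6) and one dip (index 12).
latencies = [100, 102, 98, 101, 99, 100, 250, 101, 99, 100, 102, 98, 20]
print(flag_anomalies(latencies))  # prints [6, 12]
```

A large spike inflates the standard deviation of every window that contains it, so points right after the spike are harder to flag until it ages out of the window. That is one reason a simple rule like this complements, rather than replaces, people looking at the graphs together.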
For an insightful case study into this line of thinking, check out this Twitter thread from Andrew Certain, a 20+ year veteran Senior Principal Engineer at Amazon:
Any resilient infrastructure should have high fault tolerance. For example, your website might be hosted on multiple servers across multiple regions so that if one region goes down, your business doesn’t go down with it. That’s basic system design. What I’m talking about here is not that kind of fault tolerance (as it relates to distributed systems) but how Amazon approaches “failures” such as product launches that flopped, features that missed the point, or even bugs that reached production and impacted many customers. For example, what happened as a result of that day in February 2017 when S3 went down and took a significant chunk of the internet with it? One thing’s for sure: AWS didn’t shutter its doors after that incident. My guess is that there was a COE.
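The “basic system design” kind of failure tolerance mentioned above can be sketched in a few lines: try a list of regions in order of preference and fall back to the next one when a call fails. This is a minimal illustration under stated assumptions, not how S3 or any AWS service actually handles failover; `call_with_failover`, `AllRegionsFailed`, and `fake_request` are hypothetical names, and the region identifiers are just example labels.

```python
from typing import Callable, Sequence


class AllRegionsFailed(Exception):
    """Raised when no region could serve the request (hypothetical)."""


def call_with_failover(regions: Sequence[str], request: Callable[[str], str]) -> str:
    """Try each region in order of preference; return the first successful response.

    `request` is a hypothetical function that sends the request to one region
    and raises an exception if that region is unhealthy.
    """
    errors = {}
    for region in regions:
        try:
            return request(region)
        except Exception as exc:  # sketch: treat any error as "this region is down"
            errors[region] = exc
    raise AllRegionsFailed(errors)


# Simulate an outage in the primary region.
def fake_request(region: str) -> str:
    if region == "us-east-1":
        raise ConnectionError(f"{region} is unavailable")
    return f"served from {region}"


print(call_with_failover(["us-east-1", "us-west-2"], fake_request))
# prints: served from us-west-2
```

Treating every exception as a regional outage keeps the sketch short; a production client would typically distinguish retryable errors from bad requests and add timeouts.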
COE. The first time I asked what the acronym stood for, a coworker jokingly said it stood for “Cessation of Employment.” He quickly clarified that it actually means “Correction of Error.” To this day, I believe this is one of those processes at Amazon that yields so much value. Processes often complicate things with tedium, and COEs are sometimes no exception; however, a well-written COE aims to ensure that the same issue never happens again. COEs are also presented to senior leadership so that lessons learned are shared across different organizations. That’s it. They exist as a learning mechanism for “failures.”
When I inadvertently introduced a bug to production, I was asked to “own” my first COE. My manager reassured me that this was for everyone’s growth and that things slipping into production are not any one person’s fault. The bug passed through multiple checks after I put out the code review: two teammates reviewed and approved the code, our integration tests passed in our deployment pipelines, and the alarms on our error metrics didn’t flag the failure until a customer reported it. A COE is designed to present a cogent summary of what happened, how it unfolded (the timeline), why it happened, and, most importantly, what’s next. COEs are written and presented liberally across many different scenarios. They are not a mark of shame but a process by which the team can focus its attention on avoiding similar issues in the future. There isn’t enough time in the world to spend on finding someone to blame or arguing why it isn’t your fault. Someone owns the COE, they present it, and action items are set to ensure the issue never happens again. Amazon tolerates failure not only because it can afford to but because there is so much to learn from it.
Cradle of Cloud Computing
As an employee of AWS, the cloud computing segment of Amazon, I take certain things for granted. One is that I work where cloud computing literally began. Some of the engineers who have worked here designed and architected the numerous services that AWS now offers to customers worldwide. And you can bet, since Amazon has a document-driven culture, that there are countless resources left behind by pioneers unrivaled in the world. It is hard to stay “Number One” at anything; it takes not only luck and innovation but persistence. AWS is relentless in trying to close the gap between what customers need and what cloud infrastructure offers. Though its lead may not last forever (competition breeds innovation), AWS did it first, does it best, and aims to stay on top, and being a part of that is something wholly unique to working at AWS.
It is no secret that Amazon has its issues, and I won’t even attempt to get into them here. However, I hope that by sharing one engineer’s experience, I can show what Amazon gets right, or at the very least, does well. I don’t know how long I will be here, as there is no guarantee that Amazon will want me to stay a long time or that my interests won’t point me elsewhere; all I know is that working here for the past two and a half years has been a special opportunity that has profoundly changed the way I think, not only about engineering but also about business, corporate culture, and so many other things not immediately adjacent to coding.