How 3 Magoosh Engineers Support 5 Million Users (and Still Sleep at Night)
The most common question I get from other engineers is: how do you manage everything with such a small team?
We’re a team of three full-stack, multi-platform (Rails and React Native) engineers. We build and support an online study platform and an API that powers five mobile apps, serving more than 5 million registered users.
So what’s the secret?
It’s not that we work insane hours to keep the site up, and no, we don’t drink excessive amounts of Blue Bottle either. We’ve maintained 99.9% uptime over the last two years, and our engineers still have a good work-life balance. Our three biggest secrets are (1) our prioritization process, (2) our willingness to outsource non-core application work, and (3) our commitment to high-quality code for non-experimental projects, which reduces ongoing maintenance.
For this article, I’ll focus on prioritization.
Each week we go through a strict prioritization process. We take a look at the tasks in our product & engineering queue and divide up the work between our three engineers. For larger projects that span multiple weeks, we try to break the work down into chunks that can be accomplished in a week or less. Almost always, we have more proposed tasks than time, so we practice this prioritization process a lot.
Here’s a flowchart that outlines our process:
Let’s walk through each step in this diagram. The five main questions we ask about the [feature/bugfix/improvement] are:
1. Is this a major issue preventing our students from studying?
- Is the site down? 💥
- Are students losing study progress?
- Are students having trouble accessing their content?
- Are new students blocked from signing up for Magoosh?
If the answer to any of the above is “yes,” we’ll certainly prioritize it for the week.
2. Is this task related to a quarterly goal?
- Each quarter, our company sets goals that multiple departments collaborate on. One such goal might be “ACT & SAT student engagement,” and many engineering tasks of various sizes can come out of it.
If the task is directly related to a goal, we’ll likely prioritize it!
3. What is the impact-to-effort ratio?
- How many students is this impacting? Is this issue occurring for all iOS students? Or just 1% of our Android students?
- How much engineering effort will it require?
- What is the revenue impact?
- Will it save time for someone else on the team?
- Is there a workaround — like using our website instead of the mobile app?
If the issue is impacting a large number of students, we’ll prioritize it. If its impact is relatively small and/or if there is an easy workaround, we likely won’t.
4. Can we solve most of the problem without engineering effort?
- Can someone else on the team solve this issue without code?
- Can we periodically run a query to temporarily address the issue?
- Non-engineers often use Zapier, Google Sheets, and other tools to automate some processes that otherwise would have to go through the engineering team. Is that possible here?
- How confident are we that the new feature will stick around, and solve the problem effectively? Is it possible that we’ll have to iterate on this later because we don’t have all the requirements?
New code requires long-term maintenance, so we try to avoid adding a lot of new code unless we’re sure it will be effective. 😀 This has the added bonus that engineers can always feel good knowing their work will be impactful!
5. Will an engineer learn something by working on this? Is someone excited to work on it?
- Even if the task isn’t very impactful, will this give an engineer a chance to work on a piece of the code that they’ve never touched? That in itself is very valuable. See bus factor.
- If it’s not educational, there’s always the fun factor. Will it knock out a pesky bug that’s been bouncing around for years? Will it fix some tiny margin spacing issue that’s been driving you insane? Then by all means, deploy that satisfying quick fix :)
Not every task that comes through our queue will give noticeable benefit to our students. However, it may still be really valuable to one of our engineers to poke around some new code and get a greater understanding of our codebase.
So where does this leave the [feature/bugfix/improvement]?
After our weekly meeting, all proposed tasks will end up in one of the following three states:
- Yes: this will be worked on this week.
- Revisit later: we’ll revisit this task fairly soon (in the next couple of weeks) and re-prioritize.
- Revisit much later: we may revisit this far in the future, or never. A task could be classified as tech debt, and we may come back to it if we’re working on related code as part of another task. Or a task could be classified as an engineering idea if it’s a “nice to have” and its absence doesn’t have any negative impact.
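The questions and outcomes above can be sketched in a few lines of Python. This is an illustration only, not code from Magoosh: the `Task` fields, the `Outcome` names, the `team_has_capacity` flag, and especially the order of the checks are assumptions, and the real flowchart may branch differently.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    YES = "Yes: work on it this week"
    REVISIT_LATER = "Revisit later"
    REVISIT_MUCH_LATER = "Revisit much later"


@dataclass
class Task:
    # One flag per question; all field names are hypothetical.
    blocks_studying: bool           # 1. major issue for students?
    tied_to_quarterly_goal: bool    # 2. related to a quarterly goal?
    good_impact_to_effort: bool     # 3. is the impact worth the effort?
    solvable_without_code: bool     # 4. can a non-engineering fix cover most of it?
    educational_or_fun: bool        # 5. will an engineer learn from or enjoy it?
    team_has_capacity: bool = True  # assumed final check: does anyone have time?


def prioritize(task: Task) -> Outcome:
    # A major issue is always handled this week.
    if task.blocks_studying:
        return Outcome.YES
    # Assumed ordering: if a no-code workaround covers most of the
    # problem, new code is deferred.
    if task.solvable_without_code:
        return Outcome.REVISIT_MUCH_LATER
    # Any one of the remaining reasons is treated as enough to proceed.
    worthwhile = (
        task.tied_to_quarterly_goal
        or task.good_impact_to_effort
        or task.educational_or_fun
    )
    if not worthwhile:
        return Outcome.REVISIT_MUCH_LATER
    # Worth doing, but only this week if someone has time.
    return Outcome.YES if task.team_has_capacity else Outcome.REVISIT_LATER
```

Under these assumptions, a task that is not an outage and not tied to a goal, but has a good impact-to-effort ratio and no workaround, comes out as `Outcome.YES` when someone has time and `Outcome.REVISIT_LATER` when nobody does.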
Let’s look at an example task
Here’s an example of how we might prioritize a task by using this chart. The task is to fix a bug in our coupon generator, which, when working properly, can generate thousands of unique coupons for organizations that purchase many accounts.
Let’s use the flowchart!
- Is it a major site issue? No — the site is still up and running fine, and core site functionality isn’t impacted.
- Is it related to a quarterly goal? No — sales to organizations aren’t tied to any current goals.
- Does it have a good impact to effort ratio? Yes — I’ve estimated the fix to take 2.5 hours and the fix could save someone on our team up to 2 hours per week, plus help out with more bulk sales.
- Can we get most of the way without engineering? No — tools like Zapier don’t have permission to create records in our database.
- Does the team have capacity this week? Yes — Albert has a few hours to spare this week!
So if we follow the flowchart, the answer is yes! We’ll work on the task this week.
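The impact-to-effort answer in this example comes down to simple arithmetic, which the short Python sketch below spells out. The function name `payback_weeks` is made up for illustration, and because the estimate was "up to 2 hours per week," the result is a best case rather than a guarantee.

```python
def payback_weeks(fix_hours: float, hours_saved_per_week: float) -> float:
    """Weeks until the time saved equals the time spent on the fix."""
    if hours_saved_per_week <= 0:
        raise ValueError("hours_saved_per_week must be positive")
    return fix_hours / hours_saved_per_week


# Figures from the coupon generator example:
# a 2.5 hour fix that saves up to 2 hours per week.
weeks = payback_weeks(2.5, 2.0)
print(f"Breaks even after {weeks:.2f} weeks")  # Breaks even after 1.25 weeks
```

A fix that pays for itself in under two weeks is an easy yes, and that is before counting the extra bulk sales it could help with.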
This may seem intimidating at first, like a huge hurdle to getting a feature built. But don’t fear! As Albert, another Magoosh engineer, has written, we have regular hack days where this flowchart doesn’t apply. Hack days have their own special flowchart, which looks like this:
That’s it! No magical self-coding website, no secret remote team of 15 engineers. This process is key to allowing us to have a tiny team that is still impactful. It’s also a process we can scale as we grow and add more engineers to our team.
Plus, we keep our code pretty tidy too, so that helps. :)
Thanks to the product & engineering teams (past & present) for helping with this post, and Alice for making the flowchart!