Technology migrations: unglamorous obligation or an impactful opportunity?
By Nikhil Garg
Coursera decided to adopt GraphQL-backed APIs to make our platform faster and better for our learners. To facilitate that, our engineers extended existing technologies used within Coursera to be compatible with GraphQL. In short, any system built on courier (an open-source data interchange system built by Coursera) and naptime (an open-source REST API library built by Coursera) could easily be extended to use GraphQL, along with many other benefits.
As a result, we began migrating legacy systems to ‘courier’ and ‘naptime’, starting with our peer review assignments system: one of Coursera’s older and more complex assignment systems, in which learners are graded by their peers.
For the peer review assignments system, the overall task was to:
- Migrate existing Scala models to courier schema models, which provide a type-safe, schema-driven way of sharing JSON data between backends, web and mobile clients.
- Migrate ad hoc Play routes/controllers to naptime, for a big gain in developer productivity, standardized REST APIs, and as an effort to DRY out our API code.
- Maintain backwards compatibility in both migrations for 1) the underlying stored JSON data, in case its existing format needed to change, and 2) existing legacy API routes and responses, to keep serving older mobile clients that had not been updated.
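To make the backwards-compatibility requirement concrete, here is a minimal Scala sketch. It is not the actual peer review code: `Submission`, `LegacySubmissionResponse`, `LegacyAdapter` and all field names are hypothetical, and stored records are modeled as plain `Map[String, String]` values instead of courier-generated classes so the example needs no libraries. The adapter reads records written in either the legacy or the new layout, and renders the new model in the legacy response shape that older mobile clients still expect.

```scala
// Hypothetical sketch: names and fields are illustrative, not Coursera's actual models.

// New, schema-driven model (in a real system this would be generated from a courier schema).
final case class Submission(id: String, learnerId: String, title: String, isDraft: Boolean)

// Response shape that older, un-updated mobile clients expect from the legacy route.
final case class LegacySubmissionResponse(submissionId: String, userId: String, title: String, status: String)

object LegacyAdapter {
  // Render the new model in the legacy response shape so old clients keep working.
  def toLegacy(s: Submission): LegacySubmissionResponse =
    LegacySubmissionResponse(
      submissionId = s.id,
      userId = s.learnerId,
      title = s.title,
      status = if (s.isDraft) "draft" else "submitted"
    )

  // Read a stored record written in either the legacy or the new layout.
  // Returns None if a required field is missing or malformed.
  def fromStored(fields: Map[String, String]): Option[Submission] =
    if (fields.contains("submissionId")) {
      // Legacy layout: the draft state was encoded as a status string.
      for {
        id     <- fields.get("submissionId")
        userId <- fields.get("userId")
        title  <- fields.get("title")
        status <- fields.get("status")
      } yield Submission(id, userId, title, isDraft = status == "draft")
    } else {
      // New layout: field names match the new model.
      for {
        id        <- fields.get("id")
        learnerId <- fields.get("learnerId")
        title     <- fields.get("title")
        isDraft   <- fields.get("isDraft").flatMap(_.toBooleanOption)
      } yield Submission(id, learnerId, title, isDraft)
    }
}
```

Keeping both directions in one small adapter lets the new model evolve freely while the legacy routes stay a thin translation layer.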
It was a three-month project involving both backend migrations and a huge frontend effort to support the newer APIs and models.
This post is about how we used this technology migration to increase our impact while staying excited, and how we came out of it proud of what we accomplished. It also describes how Coursera’s core values, a) Betterment, b) Boldness, c) Solidarity and d) Deep honesty, guided us throughout.
What went differently?
When asked to work on a technology migration, an engineer’s first reaction may well be: ‘not challenging’, ‘unglamorous’, ‘no growth’, ‘no accomplishment worth being proud of or sharing with others’, ‘just migrations, eh.’
I remember something my manager hinted at several times during our 1:1s: ‘Although getting a meaty project with an anticipated high impact is desirable, how one approaches any given project and how much we push ourselves in the process is equally important for growth, impact, and technical excellence.’
Looking back on how the project went: we used this migration opportunity to have a bigger impact on the organization. We learned and grew a lot along the way. We stayed motivated throughout, and would count it among the more technically challenging projects we have worked on. We’re proud of what we accomplished, and excited to share our work with others.
Here is a summary of what helped us: 1) owning the final product rather than ‘just migrations’, 2) leveraging this opportunity for bigger impact, and 3) upfront planning, with a timeline we held ourselves accountable to.
Owning the final product, not just the migrations
Owning a system typically means its future maintenance and feature iterations fall under your team’s goals. We understood that beyond migrating the system with its existing strengths and weaknesses, we had an opportunity to dive in and do much more: own the whole system end to end rather than ‘just migrations’.
We decided to be bold and empower ourselves to question legacy architecture decisions, and to be creative in improving or redesigning the architecture with newer ideas wherever we could. In doing so, we found the motivation to learn the system deeply and to look for opportunities to simplify it, making it easier to extend and maintain, and significantly reducing the onboarding cost for future developers working on it.
Choosing to take pride in the final product, rather than restricting ourselves to ‘just migrations’, gave the project a much-needed boost in learning and challenge, as well as in impact and team excitement.
The next two sections describe how we leveraged and prioritized these opportunities to deliver a bigger impact while keeping the timeline in check.
Leveraging for more impact
When making in-depth changes to a system, such as technology migrations, it’s easier to migrate the system as is, with its existing strengths and weaknesses, without putting in additional effort to ‘improve’ the overall architecture. IMO, treating a migration as a mundane task is a missed opportunity.
When engineers take on an in-depth technology migration, they already spend a lot of time thinking about model and architecture redesigns, acquiring context from the original authors, getting into the zone of a particular system, reading and understanding complex legacy code, reviewing code, regression testing, and so on. We decided to build on these basic building blocks and go the extra mile by also including design improvements and code refactors in the project scope.
Some benefits we ended up with: 1) less time to onboard new engineers, 2) faster iteration cycles for future feature development (we’re already seeing this while iterating on staff-graded open-ended assignments, another system that shares many features with peer review assignments), 3) a cleaner and more organized code base, and 4) standardized tooling for other devs working on similar migrations.
Here is a summary of some tasks we added on top of the migrations:
- Build and standardize tools for problems we solved during our migrations: In the spirit of solidarity, one of our senior team members shipped a data migration tool that made data migrations for our peer review system much safer, and that will also serve other devs taking on data migrations for their own systems in the near future.
- Redesign data models and APIs while migrating: Migrating our data models and APIs gave us the opportunity to redesign them completely, reusing much of the effort already spent on the migrations. In the spirit of boldness, we incorporated our past learnings and future extensibility needs into the designs. A few examples: 1) while migrating data models, we went through the painful process of making our model hierarchy more strongly typed, since the loosely typed hierarchy had taken a huge toll on devs working with the system, and 2) while migrating APIs, we redesigned the submission flow to make it more intuitive, drawing on lessons from similar workflows built since then.
- Codify non-intuitive context: While deep-diving into the system, we gathered a lot of context by talking to the original authors and by reading through complex code paths. In the spirit of betterment and solidarity, we codified this context in the business logic wherever possible by making the code base more intuitive, and documented the rest, making it easier for future devs to onboard.
- Deprecate older abstractions: We deprecated legacy abstractions that were well intended when they were created but, in hindsight, were scarcely used.
- Refactor and upgrade the code base: We tried to update whatever code we touched to the latest patterns and libraries, and simplified code paths that had grown complex over time.
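The data migration tool our teammate shipped isn’t described in detail here, so the following is only a hypothetical Scala sketch of safety checks such a tool commonly includes, not that tool: every record is transformed and validated before anything is written, failures are collected per key, runs default to a dry run, and nothing is written unless the dry run is turned off and every record passed. `SafeMigration`, `MigrationReport` and the `write` callback are illustrative names.

```scala
// Hypothetical sketch of common data-migration safety checks; not the tool described in this post.

// Outcome of a run: how many records passed, which failed (key -> reason), and whether anything was written.
final case class MigrationReport(valid: Int, failed: Map[String, String], written: Boolean)

object SafeMigration {
  def run[A, B](
      source: Map[String, A],
      transform: A => Either[String, B], // convert an old record to the new format
      validate: B => Option[String],     // Some(reason) if the converted record is invalid
      write: (String, B) => Unit,        // persist a converted record under its key
      dryRun: Boolean = true             // default to a read-only run
  ): MigrationReport = {
    // Phase 1: transform and validate every record without writing anything.
    val results: Seq[(String, Either[String, B])] =
      source.toSeq.sortBy(_._1).map { case (key, oldRecord) =>
        key -> transform(oldRecord).flatMap(newRecord => validate(newRecord).toLeft(newRecord))
      }
    val failed = results.collect { case (key, Left(reason)) => key -> reason }.toMap
    val converted = results.collect { case (key, Right(record)) => key -> record }

    // Phase 2: write only if this is not a dry run and every record passed.
    val shouldWrite = !dryRun && failed.isEmpty
    if (shouldWrite) converted.foreach { case (key, record) => write(key, record) }
    MigrationReport(converted.size, failed, shouldWrite)
  }
}
```

Defaulting to a dry run means the first execution against real data is read-only, and a single bad record stops the write phase before it starts. The write phase itself is not transactional in this sketch; a real tool would also need to handle failures partway through writing.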
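The idea of making the model hierarchy more strongly typed can be illustrated with a hypothetical Scala sketch; the rubric item types below are invented for illustration and are not our actual schema. A loosely typed record uses a `kind` string to decide which optional fields matter, while a sealed trait gives each variant exactly the fields it needs, lets the compiler check pattern matches for exhaustiveness, and turns inconsistent legacy records into explicit errors during conversion.

```scala
// Hypothetical sketch: rubric item types are invented to illustrate a more strongly typed hierarchy.

// Before: one loosely typed shape where `kind` decides which optional fields are meaningful.
final case class LooseRubricItem(
    kind: String,                        // "options", "yesNo" or "freeText"
    prompt: String,
    options: Option[Seq[(String, Int)]], // only meaningful when kind == "options"
    yesPoints: Option[Int]               // only meaningful when kind == "yesNo"
)

// After: a sealed hierarchy where each variant carries exactly the fields it needs.
sealed trait RubricItem { def prompt: String }
final case class OptionsItem(prompt: String, options: Seq[(String, Int)]) extends RubricItem
final case class YesNoItem(prompt: String, yesPoints: Int) extends RubricItem
final case class FreeTextItem(prompt: String) extends RubricItem

object RubricItems {
  // Because RubricItem is sealed, the compiler warns if this match misses a variant.
  def maxPoints(item: RubricItem): Int = item match {
    case OptionsItem(_, options) => if (options.isEmpty) 0 else options.map(_._2).max
    case YesNoItem(_, yesPoints) => yesPoints
    case FreeTextItem(_)         => 0
  }

  // Convert a legacy record, reporting inconsistent field combinations as errors instead of failing later.
  def fromLoose(item: LooseRubricItem): Either[String, RubricItem] = item match {
    case LooseRubricItem("options", prompt, Some(options), _) => Right(OptionsItem(prompt, options))
    case LooseRubricItem("yesNo", prompt, _, Some(points))    => Right(YesNoItem(prompt, points))
    case LooseRubricItem("freeText", prompt, _, _)            => Right(FreeTextItem(prompt))
    case other => Left(s"inconsistent rubric item of kind '${other.kind}'")
  }
}
```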
Upfront planning and timelines
Adding scope to an already big migration effort can be risky and hard to sell. Here are a few things we did to navigate this challenge: 1) convincing ourselves and our product leaders that the additional improvements were worth the cost, 2) making a detailed upfront plan, and 3) setting milestones and timelines that we held ourselves accountable to.
Is it worth the cost?
Here is my argument: 1) these opportunities have a really high ROI, as discussed in the previous section, and 2) they directly affect the team’s motivation and excitement, which is likely to bring large productivity gains.
When we communicated the increased scope, our product and engineering leaders boldly supported us, fostering our values of engineering excellence and a culture of betterment.
There is one big risk here, though: bloated scope.
We encountered some real challenges during the migration, on both the frontend and the backend. A few examples: 1) performing safe, tested data migrations, and 2) carefully switching the frontend to the new APIs, using Flow types and extensive unit tests. Combined with the additional improvements mentioned in the previous section, it can quickly become hard to strike a good balance between the two without bloating the scope to unacceptable levels.
What helped us here was detailed upfront scoping and deep honesty in our estimation and communication. Before diving into implementation, we tried to scope out as much as we could, with detailed design docs, tasks broken down at a very granular level, time estimates, and an assessment of possible risks to the timeline.
Milestones within the timeline helped us detect early on whether we were drifting, and helped us plan how to get back on track. We used Jira extensively to manage the project and monitor our progress. Jira epics, weekly reflections using Jira sprint and epic reports to learn and stay accountable, and upfront time estimates for granular tasks were some of the tools that really helped us stay on track and continuously build our estimation muscle.
Overall, working with a great team and great mentors, strong support from our leaders, owning the final product rather than ‘just migrations’, challenging ourselves to do work we could be proud of, and leveraging opportunities to deliver an even larger impact made for a very satisfying project.
Major thanks to Amory, Holly, David and Priyank for making the peer review technology migrations successful. Nikhil Garg is a backend developer on Coursera’s Learning Experience Team.
Originally published at building.coursera.org on March 3, 2017.