How we tackled technical debt at Wikipedia
Talking a bit helped us write off several years of technical debt
A big challenge for any software engineer is explaining an important technical change to an audience who may not necessarily have the appropriate context. “Technical debt” is a phrase that will perplex a product owner if not articulated correctly. If you work in any product-centric team, you’ll likely find many technical tasks may be brushed aside in exchange for more tangible, visible outputs. That said, technical debt is very real and needs to be addressed.
Back in January 2018, Wikimedia engineer Joaquin Hernandez and I pitched a one-year project with a scoped and measurable goal: increase the test coverage of our ageing but strategically important codebase and invest in its future. We did so in the belief that this would leave us with code we liked working with much more.
Developers, designers and product owners often speak different languages and have very different, sometimes conflicting, desires. Taking time out to write the project proposal and talk together about problems, solutions and benefits was well worth it and resulted in a shared understanding. I believe that this work, and the understanding that came from it, made the project easier to get onto our annual goals.
The problem statement
Our “stack” is a LAMP stack, with jQuery and some in-house (but open sourced) libraries you have likely never heard of. Given the age of our project, we still have several libraries that we can’t easily remove from our stack without breaking crucial and complicated tooling made by developers long gone that editors still use to keep Wikipedia running. For example, much of our site is built on jQuery UI which was released back in 2007. Our technical community has made good progress in pulling us away from these libraries by getting them off the critical path (JavaScript that is loaded without the user interacting with anything) and reducing redundancy.
Possibly the biggest example of success with our modernization of the front-end is the mobile website, which runs on a separate domain and loads 60% fewer bytes of JavaScript than the desktop site, despite being more reliant on JavaScript for many of its workflows, notably editing and searching.
Our past front-end decisions continue to haunt the developers that work on our codebase and are unlikely to disappear from our stack entirely any time soon. Any new library being introduced to our stack is quite rightly met with suspicion. Any attempt to adopt a modern JavaScript library such as Vue.js or React is, quite rightly, slowed by the scrutiny you’d expect from a decade-old web veteran with limited resources and time like Wikipedia, one that’s seen it all from its rocking chair on the porch¹. We’ve learned that bad technical decisions are hard to undo.
We’ve likely had every discussion every developer in every other organization has had, whether it’s been “Mustache VS Handlebars”, “Should we use TypeScript?” or “Which is better? Vue.js or React.js?”. What we’ve found is it’s really hard to get consensus on these big questions in an open source community with no benevolent dictator to make the decisions.
This aside, mobile usage is growing. We are interested in adopting technologies such as service workers which provide offline support and better availability, but enabling one in a codebase such as ours is risky business.
The solution
After discussion within our engineering team, we settled on a solution, and built proofs of concept for it: iteratively refactoring and revising the existing code. Our project proposal, which is public, explains why we feel strongly that we should invest in mobile’s front-end architecture.
We drew our proposal from a recent experiment with our page previews feature and a write-up of that experience. Before the project, our front-end assets were managed with a MediaWiki-specific system called ResourceLoader, and our proposal was to move off it and lean more on modern, widely used front-end tooling, for example (but not necessarily) Webpack. With this achieved, we would refactor, improve and modernize our neglected component library.
Inside the project proposal, we importantly made no specific promises; instead, we presented tangible problems and measurable outcomes, which after six months we have now partially achieved. Those include:
- Increased test coverage (our code coverage was less than 50% and more alarmingly, 45 of our 81 files had 0% coverage.)
- Fewer regressions (we’ve noticed every JavaScript regression and the majority have not been related to our refactor project)
- Code built with modern tooling (all our JavaScript code is now built via Webpack which has made it much easier to work with)
- Possible performance improvements (so far we’ve not seen any change here but we see the potential for change)
- More reusable standardized components (we’ve started making headway on this!)
And less measurable (but hopefully outcomes that could later be recognized):
- Quicker on-boarding of new hires
- Quicker development cycles and estimations for product work
- More future proof code
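The coverage figures quoted in the list above (an overall percentage, plus the number of files with 0% coverage) are simple to compute once per-file line counts are available. The following is a minimal sketch, not the tooling we actually used: the `FileCoverage` shape and the `summarizeCoverage` function are hypothetical names invented for illustration, and the input is assumed to come from whatever coverage reporter your test runner provides.

```typescript
// Hypothetical per-file coverage record; real reporters use their own formats.
interface FileCoverage {
  file: string;
  coveredLines: number;
  totalLines: number;
}

interface CoverageSummary {
  percent: number; // overall line coverage, 0 to 100
  untestedFiles: string[]; // files with code but no covered lines at all
}

// Reduce per-file counts to the two headline numbers a proposal can track.
function summarizeCoverage(files: FileCoverage[]): CoverageSummary {
  let covered = 0;
  let total = 0;
  const untestedFiles: string[] = [];

  for (const entry of files) {
    covered += entry.coveredLines;
    total += entry.totalLines;
    if (entry.totalLines > 0 && entry.coveredLines === 0) {
      untestedFiles.push(entry.file);
    }
  }

  // Avoid dividing by zero when there is nothing to measure.
  const percent = total === 0 ? 0 : (covered / total) * 100;
  return { percent, untestedFiles };
}
```

Tracking the list of untested files alongside the percentage is a deliberate choice in this sketch: a single overall number can hide the fact that many files have no tests at all.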
Essentially, for a project with limited time and resources (Wikipedia’s mobile site is maintained by a team of just 6 paid people), we pitched a refactor not a rewrite. Rather than replacing a complicated system, we would slowly and iteratively improve it. This was important as it promised iterative development while keeping the site up and running (and improving at all times). While this slowed down development it kept our work visible and kept us accountable to our product team, meaning it was impossible for our development team to hold our product owners hostage by telling them we couldn’t build new things until the refactor had completed. It also allowed us to work on new products in parallel to this important work (we shipped several projects during this time) as well as guaranteeing that we would achieve something!
If you are interested, there is a technical write up of what we have done and what we have achieved so far on our internal blog: Migrating code from ResourceLoader to Webpack.
Lessons learned
Halfway through this project, my team has made great headway, and I feel I am at a good point to reflect on what’s worked well for us.
1. Stop coding every now and again
If your engineering team is going from sprint to sprint without stopping, this is likely a problem. Our team found respite from coding during our company all-hands and used this valuable time to talk strategically and write up the project proposal over the course of three days. This activity wasn’t all talking and writing: we also built an important proof of concept!
Make sure your team is making time where everyone can stop coding, so that the team is free to think about and describe the problems they are facing. Talk to your team, find out what drives them mad, and make sure you rally around a single problem statement.
In my opinion, a team offsite, in a foreign city, is the best environment for this to happen.
2. Be problem-focused, not solution-driven
It really helped my team to keep the project problem-focused rather than solution-driven. While many of us were tempted to be more ambitious and say we wanted to use React.js or TypeScript, being problem-focused allowed us the flexibility to do whatever we felt was important for the project at any given time to be seen as a success.
One unexpected output of our work, which I wrote about recently, was exposing JavaScript errors. Before we began, the refactor had us a little nervous, and our infrastructure was lacking a system for capturing JavaScript errors. Thanks to being a problem-focused project, we had the flexibility to incorporate this into the project. After some discussion, we decided to build the smallest possible thing we could using existing infrastructure: a client-side error counter. While not ideal, since it does not tell us what the errors are, it gives us a sense of whether changes to the site are introducing bugs for users and provides us with a lovely graph to monitor with motivating numbers. It also gave us data for a future technical project to pitch to our product team (I can’t wait for our next offsite)!
In addition to this, while we’ve been forced to look at the code, we’ve been noticing ways to improve it and prepare it for a modern future. For instance, we’ve been reducing our reliance on jQuery. While we’re not removing jQuery from our stack just yet, we’ve found inspiration in similar efforts elsewhere, such as GitHub’s, that at least make this a real possibility.
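To make the idea of a client-side error counter concrete, here is a minimal sketch of what one could look like. It is an illustration under stated assumptions, not the code running on Wikipedia: `createErrorCounter`, `ErrorSource`, `ReportFn` and the metric name are all hypothetical, and how counts reach a graphing backend is left to the `report` callback.

```typescript
// Minimal shape of anything that can emit "error" events.
// In a browser, `window` satisfies this shape (an assumption of this sketch).
interface ErrorSource {
  addEventListener(type: string, listener: () => void): void;
}

// Hypothetical reporting hook, e.g. something that sends a metrics beacon.
type ReportFn = (metric: string, count: number) => void;

interface ErrorCounter {
  watch(source: ErrorSource): void;
  flush(): void;
}

function createErrorCounter(metric: string, report: ReportFn): ErrorCounter {
  let pending = 0;

  return {
    // Count every error event; the error details are deliberately ignored.
    watch(source: ErrorSource): void {
      source.addEventListener("error", () => {
        pending += 1;
      });
    },
    // Report the accumulated count, if any, and start again from zero.
    flush(): void {
      if (pending === 0) {
        return;
      }
      report(metric, pending);
      pending = 0;
    },
  };
}
```

In a browser one might call `watch(window)` once at startup and `flush()` on a timer or when the page is hidden. Counting rather than capturing keeps the payload tiny and avoids collecting stack traces, at the cost described above: you learn that errors are happening, not what they are.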
3. Consider refactoring rather than rewriting
Many rewrites have been proposed and attempted at Wikimedia, in particular for the mobile site. Proofs of concept exist in the form of a React mobile clone, Weekipedia, and project Marvin, but the results of these have never materialized in production.
A legacy system is a well battle-tested system. While sometimes a rewrite is necessary (bridges are a good example!), refactoring is a great way to ensure that something good comes of the work you do and that product sees the value of what you are working on.
We’ve been working on a refactor in a living, breathing codebase for six months now. That project is still running and, hopefully, you have noticed no difference. If I do say so myself, I think that’s a pretty remarkable achievement by my team.
4. Make talking a routine
The most important output for me was the conversations my development team had. Having a big technical project and the time and freedom to oversee it empowered the team to justify a dedicated hour every week to talk about the problem statement.
This project, while not giving us shiny technologies, has allowed us to have many conversations about the patterns popular libraries encourage, such as composition over inheritance, Higher Order Components and dumb components, to name a few. It allowed us to understand the history of the project and why things are the way they are, and to talk about what we like in other codebases and what we’d love to learn. It was essentially a focused brown-bag session that gave us all ideas about where we were heading and how we might get there.
If you work in a technical team and you are spending most of your time coding in some form, I highly encourage you to find at least one hour a week to talk about the higher-level goals of your work as a team. What one person is struggling with or excited about is likely a larger topic in need of a discussion.
Technical debt is a real problem — recognize it!
Much of Wikipedia’s code is not the most appealing of technology stacks for front-end developers. We don’t use any well-known frameworks. My experience has shown that new hires can take many months to become fully effective, and existing hires are susceptible to getting frustrated with it.
That said, I believe strongly that a decade from now, much of software engineering is going to be about fixing codebases that were built in this era by teams understandably building irresponsibly in order to hit deadlines or create that first minimum viable product. Getting exposure to mature codebases and finding ways to refactor them responsibly is therefore an enlightening and rewarding experience.
While hard, it’s our duty as engineers and product owners to explain and understand why cutting a corner is a bad idea and technical debt is a problem.
We should all talk more, both to our existing team members and to future team members. Good engineers should explain the problems they are solving before building (it might have already been solved!) and take time to document complicated or hacky code in great detail! We shouldn’t be afraid to write lengthy commit messages about what we are doing, no matter how obvious it might seem at the time.
We should all strive to ask each other for help and opinions in code reviews and outside code reviews. We should communicate about what we don’t like and what we do like. We should listen to the frustrated junior developers in the team and the conservative senior engineers that have seen it all before and find problems and solutions.
Someone is going to have to live in our code and suffer our exact same problems when we are long gone and ultimately it's the user who suffers from that. Let's remember that next time we decide to speed up a software delivery.
Footnotes
If you want to work with a codebase different from what you may be used to, consider cloning our codebase and submitting a patch or two.
If you want to learn more about what we did with Webpack and what we were doing it for, please read my other article, Migrating code from ResourceLoader to Webpack.
¹ [For the record, adoption of new libraries is not impossible (in fact a PHP Vue templating library has just been created with plans for adoption in Wikidata.org)!]