If you use Treeherder on repositories that are not Try, you might have used the backfill action. The backfill action takes a selected task and schedules it nine pushes back from the selected task. You can see the original work here and the follow up here.
In the screenshot above you can see that the task
mdaturned orange (implying that it failed). In the screenshot we can see that a Mozilla code sheriff has both retriggered the task four more times (you can see four more running tasks on the same push) and has backfilled the task on previous pushes. …
In Filter Treeherder jobs by test or manifest path I describe the feature. In this post I will explain how it came about.
I want to highlight the process between a conversation and a deployed feature. Many times, it is an unseen part of the development process that can be useful for contributors and junior developers who are trying to grow as developers.
Back in the Fall of 2019 I started inquiring into developers’ satisfaction with Treeherder. This is one of the reasons I used to go to the office once in a while. One of these casual face-to-face conversations led to this feature. …
At the beginning of this year we landed a new feature on Treeherder. This feature helps our users to filter jobs using test paths or manifest paths.
This feature is useful for developers and code sheriffs because it permits them to determine whether or not a test that fails in one platform configuration also fails in other ones. Previously, this was difficult because certain test suites are split into multiple tasks (aka “chunks”). In the screenshot below, you can see that the manifest path devtools/client/framework/browser-toolbox/test/browser.ini is executed in different chunks.
NOTE: A manifest is a file that defines various test files, thus, a manifest path defines a group of test paths. …
In the last few months I’ve worked with contributors who wanted to be selected to work on Treeherder during this year’s Google Summer of Code. The initial proposal was to improve various Treeherder developer ergonomics (read: make Treeherder development easier). I’ve had three very active contributors that have helped to make a big difference (in alphabetical order): Shubham, Shubhank and Suyash.
For over a decade Mozilla has been using IRC to publicly chat with anyone interested to join the community. Recently, we’ve launched a replacement for it by creating a Mozilla community Matrix instance. I will be focusing on simply documenting what the process looks like to join in as a community member (without an LDAP account/Mozilla email address). For the background of the process you can read it here. Follow along the photos and what each caption says.
If you have managed to get this far, Welcome to Mozilla’s Matrix! 😄
NOTE: If there’s an official page documenting the process I’m not aware of it. I will add it once it is published.
In June we discovered that Treeherder’s UI slowdowns were due to database slow downs (For full details you can read this post). After a couple of months of investigations, we did various changes to the RDS set up. The changes that made the most significant impact were doubling the DB size to double our IOPS cap and adding Heroku auto-scaling for web nodes. Alternatively, we could have used Provisioned IOPS instead of General SSD storage to double the IOPS but the cost was over $1,000/month more.
Looking back, we made the mistake of not involving AWS from the beginning (I didn’t know we could have used their help). The AWS support team would have looked at the database and would have likely recommended the parameter changes required for a write intensive workload (the changes they recommended during our November outage — see bug 1597136 for details). For the next four months we did not have any issues, however, their help would have saved a lot of time and it would have prevented the major outage we had in November. …
This post focuses on the work I accomplished as part of the Treeherder team during the last half of last year.
Some work I already blogged about:
Up until September we had been using code merges from the master branch to the production one to cause production deployments.
A merge to production would trigger few automatic steps:
If a regression was to be found on production we would either `git revert` a change out of all merged changes OR use Heroku’s rollback feature to the last known working state (without using Git).
Using `git revert` to get us back into a good state would be very slow since it would take 15–20 minutes to run through Travis, a Heroku build and a Heroku release. …
In bug 1566207 I added support for Heroku Review Apps (link to official docs). This feature allows creating a full Treeherder deployment (backend, frontend and data ingestion pipeline) for a pull request. This gives Treeherder engineers the ability to have their own deployment without having to compete over the Treeherder prototype app (a shared deployment). This is important as the number of engineers and contributors increases.
Once created you get a complete Heroku environment with add-ons and workers configured and the deployment for it.
Back in July and August, I was looking into a performance issue in Treeherder . Treeherder is a Django app running on Heroku with a MySql database via RDS. This post will cover some knowledge gained while investigating the performance issue and the solutions for it.
NOTE: Some details have been skipped to help the readability of this post. It’s a long read as it is!
Treeherder is a public site mainly used by Mozilla staff. It’s used to determine if engineers have introduced code regressions on Firefox and other products. The performance issue that I investigated would make the site unusable for a long period of time (a few minutes to 20 minutes) multiple times per week. An outage like this would require blocking engineers from pushing new code since it would be practially impossible to determine the health of the code tree during an outage. In other words, the outages would keep “the trees” closed for business. …