How Grammarly bashed our test site
(A little background: our official website is built with the MERN stack + Google Cloud Platform. This is her 😘: https://www.evestemptation.com/)
It was a Thursday four weeks ago, January 10th. One of our software engineers, Mengze, was ready to deploy master to the test site and prepare for next week’s release, which featured a new, more informative navigation bar.
She had worked on the new navigation bar for two weeks. Everything looked good locally (as always :P). All tests passed. The test site behaved the same way. Great, time for hallway testing: let everyone in the company play with the test site.
A couple of hours later, one of the marketing fellows DMed her, saying that the site wasn’t responding: it was so slow that the page stayed blank, and she was even having trouble switching to another tab.
Mengze visited the site on her laptop, and everything looked okay there. She went back over her pull request, looking for code that could be slow. The pull request used a new widget for a nice visual effect. Could something in this third-party widget be causing it? She tried a few tweaks with no luck. Widening the search to everything since the last working version didn’t shed any light, either.
She turned to RJ, our system architect. RJ has been here since day 1 of the website. Knowing the ins and outs of the site, RJ spotted something weird. He deployed other test versions without this PR and the problem was still happening, even though he was pretty sure other PRs had worked on the test site before. He could reproduce it on his laptop, and the CPU usage was very high.
It’s the end of Thursday now.
Maybe it wasn’t the frontend code, but the backend. That would explain why every frontend PR on the test site had this issue. On Friday, RJ looked into the backend diff and still had no clue. With DevTools open (Command + Option + I), the browser seemed to become unresponsive at a different request each time, yet the time from the start of page load until it froze felt about the same every time.
What did that mean? Maybe the root cause wasn’t on our end, but in the browser’s configuration. RJ opened a new incognito window and pasted the link. Enter. The site loaded instantly, as it should. What about another browser? Well, Safari worked fine.
So far we had isolated the cause to the browser and crossed some potential causes off the list (we’d love to use strikethrough, but Medium doesn’t allow it):
1. Mengze’s pull request
2. Other front-end pull requests
3. Back-end changes
Incognito mode means that a user’s cookies are not stored after the window is closed and her browsing is not tracked. RJ visited the site again in a regular Chrome window with all data cleared. That didn’t fix it. Safari should already have had all the session info. This led us to wonder what else could differ between the two, and whether it was the same thing that differed between Mengze’s browser and everyone else’s. One candidate: Chrome disables extensions in incognito windows by default.
RJ blocked all requests from extensions. The site was back. Great! We had moved a step closer to the truth:
He then blocked one extension at a time and, whoop, a big leap!
Grammarly was the one slowing down the site! We checked with Mengze. She had the extension installed but disabled. Grammarly is such a popular extension. Almost everyone in the office has this grammar-checking plugin installed. A lot of our customers may have it in their browsers as well. We MUST fix this. We turned on Chrome’s developer mode and found:
ERROR proxy loaded with error: The extensions gallery cannot be scripted.
Grammarly seemed to capture every interaction and score the text users type based on its grammar. RJ also opened the Network tab to monitor Grammarly’s requests.
Hmm, it seems they have a config.json that maps specific configurations to a whole list of domains.
We also tried setting data-gramm="false" on every text input of the home page to keep Grammarly from initializing. However, the problem occurred on every page, and we could hardly do this for all of our inputs. Back to searching for other possible solutions.
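For the curious, here is roughly what applying that attribute in bulk could look like. This is only a minimal sketch, not the code we ran: the function name disableGrammarly and the selector are illustrative assumptions, and the only piece taken from our experiment is the data-gramm="false" attribute itself.

```javascript
// Hypothetical helper (name and selector are assumptions for illustration):
// mark every editable field under `root` with data-gramm="false".
function disableGrammarly(root) {
  // Illustrative selector; adjust it to the fields your pages actually use.
  const fields = root.querySelectorAll('input, textarea, [contenteditable]');
  fields.forEach((field) => field.setAttribute('data-gramm', 'false'));
  return fields.length; // number of fields that were marked
}

// In a browser, run it once the initial DOM has been parsed.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', () => disableGrammarly(document));
}
```

Note that a one-off pass like this only covers fields that exist when it runs. Inputs rendered later (as in a React app) would need the attribute set where they are created, which is part of why this approach did not scale for us.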
Apparently, they use React as well, which their frontend job description confirms, and we are on the MERN stack. Plus, there was something wrong with the scripts. Maybe something on our side conflicted or was incompatible with what Grammarly uses, so:
Grammarly>incompatible React/node/… versions?
It was the weekend before the release, and we were in the version abyss. 🥶 (or not?)
We had a hectic weekend of revisiting the code and trying different versions. Still no luck. On Monday, RJ, along with other engineers, kept trying to resolve the issue. He tried reverting to one of the previously working versions, but that didn’t work.
It was almost 1 am by now, so we had entered Tuesday, exhausted and frustrated. RJ accidentally typed the test site URL wrong and pressed enter. The nonexistent site was not responding and behaved the same way as our broken test site. After 20 seconds, it loaded a Google 404 page.
Wait a second! The URL wasn’t right, so nothing should have been returned from our server. There was only Grammarly. Nothing to do with our frontend, backend, or Node/React version. Nothing to do with our code!
There went our “Eureka” moment, where every little clue connects and clicks. The answer had been there ALL THIS TIME! Right ON THE SCREEN: the URL! Remember the config.json file? Grammarly has a specific configuration for each domain listed. Our best guess is that the test site’s URL matches some regex they have, and that this triggers the problem.
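To make that guess concrete, here is a tiny sketch of how a per-domain rule list could catch a hostname it was never meant for. Everything in it is hypothetical: the rules, the hostnames (on reserved example/invalid domains) and the settingsFor function are made up for illustration and say nothing about what Grammarly’s config.json really contains.

```javascript
// Hypothetical per-domain rules; NOT Grammarly's real configuration.
const domainRules = [
  // Intended for one specific site and its subdomains.
  { pattern: /(^|\.)example\.com$/, settings: { enabled: true } },
  // Overly broad: matches ANY hostname that starts with "test.".
  { pattern: /^test\./, settings: { enabled: true, mode: 'special' } },
];

// Return the settings of the first rule whose pattern matches the hostname,
// or null when no rule applies.
function settingsFor(hostname) {
  const rule = domainRules.find((r) => r.pattern.test(hostname));
  return rule ? rule.settings : null;
}

console.log(settingsFor('www.shop.invalid'));  // no rule applies
console.log(settingsFor('test.shop.invalid')); // caught by the broad "test." rule
```

Under these assumptions, the same build served from two different hostnames gets treated differently by the extension, which would be consistent with what we saw.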
We tried a different URL this time with the same build, and all lookin’ good. We finally shipped the release. RJ sent an email to Grammarly support about the issue we had found. That night, we noticed that the site sometimes worked and at other times still lagged. (maybe they are debugging 😐)
The next day, RJ got an email back from them, saying they were aware of the issue but didn’t have a quick fix at the moment. Even as we publish this, we still see the issue happening sometimes. If you find this blog helpful, or interesting, please clap 👏 .