Editor 2.0, Fast and Slow

Vadzim Hlinsky
Published in PandaDoc Tech Blog
6 min read · Oct 4, 2023

Disclaimer

This is the first part in a series and it’s not entirely technical — maybe a little bit. It’s more about problems, high-level solutions (the good and some not-so-good ones), and our baby steps in this long journey of application optimization.

“Everything seems fast, so fast that thoughts of anything being wrong never arise. The feature works and there are no bugs — what could be more pleasant? And that’s how it goes until you go into production, and real users start using your product.”

Background

2019 was a happy year! We built a new version of PandaDoc’s document editor. Editor 2.0 boasted a scalable architecture, new features, and a friendly UX. However, we had overlooked one important aspect: performance.

The new editor was rewritten using the technologies considered cutting-edge at the time: React, Redux, and Immutable. The front-end community had established a set of de facto industry standards, encompassing both good and bad practices (we’ll discuss more of them in the next article).

When you implement a new feature and follow these practices, it seems like everything should be lightning-fast by default. And if you consciously go against them, there’s always the magical incantation W̶i̶n̶g̶a̶r̶d̶i̶u̶m̶ ̶L̶e̶v̶i̶o̶s̶a̶ “Premature Optimization Is the Root of All Evil”, which convinces everyone, because so many articles appeal to it.

Then, you test the feature (of course, in isolation) and everything seems fast, so fast that thoughts of anything being wrong never arise. The feature works and there are no bugs — what could be more pleasant? And that’s how it goes until you go into production, and real users start using your product.

Surprisingly (not really), it turns out that users don’t work with empty documents and isolated functionality. They use multiple features, create documents with a large number of pages, different blocks, and fields. You start receiving feedback from frustrated users: slow loading, sluggish editing, slow drag and drop. You try to reproduce it, and… it’s all true. Everything is very slow.

“Clients use your application differently compared to what the developers thought (if the developers even thought about it), and even if they use your features as expected, the amount of data they use clearly exceeds your checks during the development stage. And, of course, not all clients have a 6-core Mac.”

Investigation

And so we started to figure it out.

The best solution would be to cover all aspects of the application with metrics from the beginning, to understand what specifically is causing the slowdown in each case. It’s a big mystery whether we added metrics or not; I hope so, but if not, then don’t make the same mistake. After all, clients don’t provide specific feedback; they just say everything is slow.

Sometimes it’s possible to reach out to the client and have a discussion with them to identify their specific concerns. If there are no metrics yet, this can be a good starting point.

After that, we started profiling diligently, trying different hypotheses whenever possible, and concurrently creating a backlog of tasks to address the issues.

After creating the backlog, when you start thinking about solutions, a dilemma as old as time often arises: should you do it fast and locally, which may help, or should you delve deeper and potentially rework certain conceptual aspects? I’m not saying that doing things quickly and locally is bad, but it’s important to understand the consequences and the potential future development of the application. Sometimes it’s a good approach, while oftentimes it’s patching holes in a sinking ship.

If you have several different options, it’s best to discuss them with your manager and your colleagues. Perhaps, from a business perspective, the best solution is to address the issue locally and not with the most optimal approach. There is no definitive answer here; it always requires discussion, negotiation, and explaining the pros and cons of each solution. Maybe you can do it quickly now, and then you get more time for refactoring (although you probably won’t, or your manager will not give you the time!).

Naturally, we chose to do it FAST.

Cache, just cache it

A few small fixes, and a brilliant idea comes up — let’s cache everything whenever possible.

That’s where lodash memoize comes to the rescue. Of course, no one reads the nuances of its usage, but who needs that? The cache just works!
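To see why “the cache just works” is deceptive, here is a minimal sketch of what a lodash-style memoize does by default (illustrative, not lodash’s actual code; `layoutFor` is a made-up function):

```javascript
// Minimal sketch of lodash-style memoize and its default behavior.
function memoize(fn, resolver) {
  const cache = new Map();
  const memoized = (...args) => {
    // The cache key is the FIRST argument unless a resolver is given.
    const key = resolver ? resolver(...args) : args[0];
    if (!cache.has(key)) cache.set(key, fn(...args));
    return cache.get(key);
  };
  memoized.cache = cache; // lodash exposes the cache the same way
  return memoized;
}

// A made-up "expensive" computation keyed by block id.
const layoutFor = memoize((blockId) => ({ blockId, height: blockId * 16 }));

layoutFor(1);
layoutFor(2);
layoutFor(1); // cache hit, no recomputation

// Nothing is ever evicted: the cache only grows with distinct keys.
console.log(layoutFor.cache.size); // 2
```

Note the trap: since the key is the first argument, passing a fresh object literal on every call defeats the cache entirely while still growing it, the worst of both worlds.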

We start caching complex computations and even React components. It was fantastic and lightning-fast, although not for long. After a while, we learned that customers began to observe the grey screen of death (Out of Memory).

Fixing the fix: memory leaks and local cache

After another round of profiling and studying the lodash documentation, it turned out that a significant portion of memory was not being cleared (quite unexpectedly). The size of the cache in lodash is unlimited; whatever you put in, stays there. Considering that we have a single-page application (SPA) where the client can navigate back and forth indefinitely, the memory usage keeps growing, and items are never removed. Furthermore, there are memory leaks in both react-dom (now that was really unexpected) and reselect.

Optimizing `lodash memoize`

You can move part of the global cache down to the React component level so that when the component unmounts, its cache is cleared with it. For the remaining cases, lodash accepts an optional resolver argument that controls the cache key, and it exposes the cache on the memoized function itself, so you can implement a retention and invalidation strategy on top. One option is to clear the cache when its size exceeds a threshold value.
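As a self-contained sketch of that last option (not using lodash itself; `MAX_ENTRIES` is an arbitrary illustrative threshold), here is a memoize wrapper that clears its cache wholesale once it grows past a limit:

```javascript
// Sketch: memoize with a crude retention strategy, where the whole cache
// is cleared once it exceeds a threshold (names are illustrative).
const MAX_ENTRIES = 3;

function memoizeCapped(fn, resolver = (first) => first) {
  const cache = new Map();
  const memoized = (...args) => {
    const key = resolver(...args);
    if (!cache.has(key)) {
      if (cache.size >= MAX_ENTRIES) cache.clear(); // naive invalidation
      cache.set(key, fn(...args));
    }
    return cache.get(key);
  };
  memoized.cache = cache;
  return memoized;
}

const square = memoizeCapped((n) => n * n);
for (let n = 0; n < 10; n += 1) square(n);
console.log(square.cache.size <= MAX_ENTRIES); // true
```

Clearing everything is blunt but bounded; in real code an LRU policy, which evicts only the least recently used entries, is usually the better retention strategy.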

Containing `reselect` leaks

Every selector you create stores its cache and the arguments passed to it in a closure. We identified two problems:

  1. createSelector always retains the last state it was called with. When the application unmounts, a portion of your store remains live. This wasn’t our biggest issue, so we just acknowledged and noted it.
  2. structuredSelector. The scenario where you use connect directly, like `connect(structuredSelector({ a, b }))`, is more complex. Reselect retains all the arguments that come from connect, including not only `state` but also `props`, and when the props hold a reference to a component instance, your code will leak. To fix this, we rewrote our code to explicitly pass only the necessary parameters: `connect((state) => structuredSelector({ a, b }))`.
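To make the closure retention concrete, here is a minimal reselect-style selector (illustrative, not reselect’s actual code) showing exactly where the last arguments stay alive:

```javascript
// Minimal sketch of a reselect-style memoized selector. The closure
// variables below live as long as the selector itself does.
function createSelector(deps, resultFn) {
  let lastArgs = null; // retains the last (state, props) it was called with
  let lastInputs = null;
  let lastResult;
  return (...args) => {
    const inputs = deps.map((dep) => dep(...args));
    if (!lastInputs || inputs.some((v, i) => v !== lastInputs[i])) {
      lastResult = resultFn(...inputs);
      lastInputs = inputs;
    }
    lastArgs = args; // if props hold an instance reference, it leaks here
    return lastResult;
  };
}

// Leaky shape: connect hands the selector both state AND props.
const selectTitle = createSelector(
  [(state) => state.doc.title],
  (title) => title.toUpperCase()
);

// Safer shape: wrap the call so only state ever reaches the selector.
const mapStateToProps = (state) => ({ title: selectTitle(state) });
```

The wrapper changes nothing about memoization; it only narrows what ends up pinned in `lastArgs`.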

Now it seems these problems could have been fixed in another way, with connect and factory functions, but it looks like we missed that part of the documentation. Again? 🥲
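For reference, the factory form works roughly like this: when mapStateToProps is written as a factory, connect calls it once per component instance, so each instance gets its own memoized selector whose cache dies with it. A self-contained simulation (no real react-redux here; `makeMapStateToProps` is just the pattern’s conventional name):

```javascript
// Simulation of the react-redux factory form: each "instance" gets its
// own selector with its own memoization state, instead of sharing one
// global cache across the whole application.
const makeMapStateToProps = () => {
  let lastId;     // per-instance memoization state
  let lastResult;
  return (state, props) => {
    if (props.blockId !== lastId) {
      lastId = props.blockId;
      lastResult = { block: state.blocks[props.blockId] };
    }
    return lastResult;
  };
};

// Two "instances", as connect would create them:
const instanceA = makeMapStateToProps();
const instanceB = makeMapStateToProps();

const state = { blocks: { 1: { text: 'a' }, 2: { text: 'b' } } };
const first = instanceA(state, { blockId: 1 });
const again = instanceA(state, { blockId: 1 });
console.log(first === again); // true: instance A's own cache hit
instanceB(state, { blockId: 2 }); // instance B caches independently
```

With real react-redux, returning a function from mapStateToProps on its first call triggers exactly this per-instance behavior.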

Encountering react-dom issues

Unfortunately, we didn’t have time for a deep dive into the internals of React. We realized the leak was a well-known problem, and a fork with a small patch was already available, so we applied the patch.

All these changes provided temporary relief. We received less negative feedback and stopped encountering the dinosaur that accompanies crashes in Chrome. Not for long, though. As soon as we deployed the application to a larger user base and introduced new features, new challenges arose. (But that is for the next chapter.)

Words of wisdom

  • Study the tools you use, read the docs — read the docs twice 😄, and not just their signatures and basic functionality, but also the caveats of their usage.
  • Test your functionality with a large amount of real-world data, ideally considering other features and utilizing CPU throttling or using slow devices.
  • Approach the problem from different angles and propose different options. Sometimes a fix is just a fix, but other times it may require a global overhaul, both technically and from a UX perspective.

All the best!
