Why The Software Engineering Department Should Stop Thinking About Optimization
In the Software Engineering Department, a common mantra is to optimize. We optimize software performance, SLOs, development speed and so on. This, as you shall see later, is the wrong north star.
The Predisposition to Optimize
It is interesting to note that software engineers are conditioned to think about optimization, through no fault of their own. The inclination usually comes from peers and downstream pressure. Pick up a book on writing good code and you are nudged toward optimizing. Complaints from product and customers that the software is “slow” condition the engineer to want to make the software more performant.
This is further exacerbated by a general lack of interest among software engineers in understanding the larger context in which they build their software. Pause for a while to consider how many software engineers weave their activities with the needs of the CFO or the sales department. Again, this is not to demean. Rather, the thought it should invoke for the uninitiated is this: “tell me more.”
The Engineering Silo
It is common for departmental silos to emerge, again through no one’s fault. In a typical organization structure, the Product Team has its own vertical. Engineering has its own, as do Sales and Marketing, Corporate, Customer Support and others. While this is perhaps the best way to organize people, it comes at a cost: barriers to shared context and collaboration grow with the number of teams in an organization.
The engineer, in his own silo, is therefore largely unaware of matters such as corporate budget, staffing costs, business risks, and so forth. Corporate functions tend to cushion the blow by organizing company-wide all-hands meetings, with the CFO reporting the company’s financial health, HR reporting some new benefits, and various department heads giving their own high-level reports. Let’s be upfront here. If the topic of discussion does not directly impact the engineer, he’ll find it unhelpful. The same is of course true of other departments.
As information gets siloed within departments, the echo chamber grows louder. The engineer, instead of asking, “How can I use my skills to benefit the company?” now asks, “How can I use my skills to benefit my immediate scope of influence?”
Of course, the engineer then starts thinking about squeezing out that extra 5% of performance or that extra 0.01% of crash-free sessions. Why not? It makes my software better, RIGHT?
Pause for a bit though and try to answer this question in the context of the organization. “If the engineer is to squeeze in these extra gains, what opportunity is he leaving on the table?”
Every activity we say YES to also means we’re saying NO to everything else.
Stop Thinking About Optimization. Think About Value Creation.
Thinking about optimization forces the engineer to think within a very narrow cognitive radius. Thinking about value creation opens up the box in which engineers are conditioned to think. Instead of asking “what can we optimize?” the engineer should ask, “where can we generate the most value?”
EXHIBIT A: Should The Engineering Department Decrease Infrastructure Cost?
The immediate answer is of course, YES!
As you shall see later, however, things are not as black and white as they initially seem.
A SaaS Context
As of writing, I work in a SaaS company that uses a multi-tenant approach to serve our customers. We charge our customers either a one-off sum or a subscription. Of course, to charge properly, a company has to account for its costs. What is clear to the sales division is the cost of goods sold and the size of the customers. What is unclear, however, is the tenancy cost in our cloud infrastructure.
The solution, of course, is for the engineering team to provide this. To do that, we have to measure the compute and storage we incur and split it per customer. A naive approach is to divide the overall cost by the number of customers. That misattributes cost, of course: the tenancy cost of a company purchasing a thousand seats is not the same as that of a company with fifty.
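To make the difference concrete, here is a minimal sketch in Python comparing the naive even split with a usage-weighted split. The names (TenantUsage, naive_split, usage_weighted_split) and every figure are hypothetical, and it assumes per-tenant compute hours and storage are already being metered. It is an illustration of the idea, not the method we actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantUsage:
    """Metered usage for one customer (hypothetical fields)."""
    name: str
    compute_hours: float
    storage_gb: float


def naive_split(total_cost: float, tenants: list[TenantUsage]) -> dict[str, float]:
    """Divide the bill evenly, ignoring how much each tenant actually uses."""
    share = total_cost / len(tenants)
    return {t.name: share for t in tenants}


def usage_weighted_split(
    compute_cost: float, storage_cost: float, tenants: list[TenantUsage]
) -> dict[str, float]:
    """Attribute each cost pool in proportion to the tenant's share of usage."""
    total_compute = sum(t.compute_hours for t in tenants)
    total_storage = sum(t.storage_gb for t in tenants)
    costs: dict[str, float] = {}
    for t in tenants:
        compute_part = compute_cost * t.compute_hours / total_compute if total_compute else 0.0
        storage_part = storage_cost * t.storage_gb / total_storage if total_storage else 0.0
        costs[t.name] = compute_part + storage_part
    return costs


# Illustrative numbers only: one large tenant and one small tenant.
tenants = [
    TenantUsage("thousand_seats", compute_hours=900, storage_gb=80),
    TenantUsage("fifty_seats", compute_hours=100, storage_gb=20),
]
naive = naive_split(1000.0, tenants)
weighted = usage_weighted_split(compute_cost=900.0, storage_cost=100.0, tenants=tenants)
```

With these made-up figures the naive split charges both tenants the same amount, while the weighted split assigns most of the bill to the larger tenant. Real attribution also has to decide what to do with shared and idle capacity, which this sketch ignores.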
There is an entire blog written here on how to account for tenancy cost. Long story short, after making the computation, we found out that our infrastructure cost is negligible in comparison to the cost of goods sold.
Surely, we could have decreased our infrastructure cost and that would make vanity metrics look better. However, if the company trades a limb for a peanut, is it really worth it?
Going back to the original question, “Should the engineering team decrease the infrastructure cost?”, the answer is a RESOUNDING NO! We could spend that engineering effort elsewhere, such as on reducing the cost of goods sold!
EXHIBIT B: Should The Engineering Department stop collecting logs with no immediate use to save storage space?
The immediate answer is of course, YES! But you know where this is going, RIGHT?
I used to work in a company where we logged a tremendous amount of data. We used it to speed up debugging production issues and to understand customer behavior. What baffled me was that the way we collected data was vastly different from the literature available online. Online resources always focus on essential logs, whereas I tend to favor logging even data we do not yet know how to use.
At the back of my head, something bothered me. If I am logging too much and not using more than 99% of my logs, am I really doing the company a favor? I doubted myself, of course, and thought that perhaps logging only the essentials would do the trick. That way, there would be a reasonable tradeoff between storage cost and operational efficacy.
Then a friend asked me this question: What about the invisible cost around decision time and quality?
This was the moment the scales fell from my eyes. Making decisions is costly. Making bad decisions is doubly costly. What’s worse is that companies don’t even know they’re incurring these costs!
If collecting more logs costs a few hundred dollars extra and increases the quality of decisions by even 1%, then it’s likely worth it.² Course correction incurs significant staff cost. Coming up with a decision costs the time of key decision makers, time that could otherwise have been invested elsewhere. Since staff cost is typically the biggest expenditure in a company, it is essential to make sure this resource is invested in the highest-leverage work. Automating away time vampires is important.
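The arithmetic behind that claim can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function name logging_roi, the idea that a quality gain translates linearly into saved decision cost, and all of the dollar figures. It is a back-of-the-envelope model, not a measurement.

```python
def logging_roi(extra_log_cost: float, decision_cost: float, quality_gain: float) -> float:
    """Return ROI of extra logging as (benefit - cost) / cost.

    Assumes, simplistically, that a fractional gain in decision quality
    saves the same fraction of the staff cost spent on decisions.
    """
    if extra_log_cost <= 0:
        raise ValueError("extra_log_cost must be positive")
    benefit = decision_cost * quality_gain
    return (benefit - extra_log_cost) / extra_log_cost


# Hypothetical figures: $300 of extra storage, $100,000 of staff time
# spent on decisions and course correction, and a 1% quality improvement.
roi = logging_roi(extra_log_cost=300.0, decision_cost=100_000.0, quality_gain=0.01)
break_even_gain = 300.0 / 100_000.0  # gain at which benefit equals cost
```

Under these assumed numbers the extra logs pay for themselves as long as decision quality improves by more than 0.3%. The hard part in practice is estimating the quality gain at all, which is why the footnote calls this an oversimplification.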
Should The Engineering Department stop collecting logs with no immediate use to save storage space? The answer is a NO! If there’s data that one can collect responsibly, then by all means collect it. When the need for it arises, it significantly reduces the decision cost.
Think Value Creation
In general, it is unwise to invest engineering resources in matters that have zero or, worse, negative impact on the company. In other words, a threshold must be set such that every engineering investment seeks to answer the question:
“What’s the return on investment (ROI)?”
Remember, improving your crash-free sessions INCURS staff cost. The engineer must always think about value creation. This of course can involve talking to people in different departments. But more than that, an important organizational virtue needs to be established: data availability.
Different companies place different value on data transparency. Some companies, such as Buffer, are more than happy to share everyone’s salaries. At Netflix, everyone allegedly knows everything about the company. There are, of course, other companies where information is siloed. One can argue about the pros and cons for days. In Software Engineering, however, it is no surprise that the abundance of information is directly proportional to the department’s efficacy.
If the engineering team is not fed business operations data, it will eventually be reduced to thinking about optimization. A question therefore must be asked,
“Could we have generated better value had there been more data transparency?”
The answer is almost always yes. We have seen examples in No Rules Rules and How Google Works that need little repeating here.
____
¹ It must be noted, as you shall later see, that premature optimization comes at the peril of the business as well.
² Of course, this is an oversimplification.