I wish I knew this before Cloud/DevOps transformation (part 1)

Sergey Y.
8 min read · Jul 9, 2020

--

What has changed in software development since I studied it

I ran an experiment: after 15 years in management, I tried to write working code again. The experience prompted me to write this article. I learned what has changed dramatically in software development and why everything still takes longer than planned, even now, when ready-made components, cloud services, and APIs are everywhere.

Some aspects of the software development business appeared to me from new angles. Here, I will also share some thoughts accumulated over recent years of managing products and projects, mainly in the financial sector and telecom.

Good news

For those who want to repeat my experience, there is "good news". First of all, writing working code is a joy I had almost forgotten 🤘. Secondly, learning to code has become much easier. I have no idea how we ever lived without Stack Overflow. Languages ship with many more built-in libraries and functions, modern IDEs are a pleasure to use, and there are video tutorials for the truly stuck. But what is especially great is the huge number of off-the-shelf solutions: ready-made modules, open-source components, cloud API infrastructures, containerized services, and so on. You can easily mash these parts together, solve a large portion of the technical problems, and quickly switch to building business features. At least, it seems so…

So-so news

That is all the good news, and now the "rodeo" starts. All of the above eases the first steps and lets you quickly create working prototypes or proofs of concept. However, if the goal is to create software that is stable and maintainable in the long run, the euphoria over the wide possibilities of ready-made bricks soon fades, and you will have to solve the difficult dilemmas that come with the flip side of using third-party components, services, and APIs.

I will list some of the main hurdles that have transformed in recent years:

  • The complexity of validation. Everything here is much more complicated than it was 15 years ago. A modern web solution can comprise a set of multi-threaded microservices running in distributed environments across the globe (both on-premises and in clouds). This is no longer 2–3 monolithic blocks (previously everything often came down to a client-server-database trio running on a server remotely accessible over SSH), but a group of autonomous services, each with its own set of states (contexts) and execution environments. Therefore, many situations (such as the well-known race conditions) are impossible, or at least difficult, to simulate on a local workstation. A solution often consists of dozens of microservices and client components interacting asynchronously. All this creates additional interaction complexity, which has to be tested automatically. Continuous integration testing is now a mandatory element of development, and its complexity is often underestimated even by very experienced developers. In any case, full coverage of the logic with integration tests is almost impossible, so end-to-end testing is often possible only with a full deployment of the solution (sometimes including both the backends and the mobile app), which makes an automated daily build of the solution a necessary routine, not a luxury.
  • External dependencies. For almost any task you can find someone else's solution (a component, an open-source project, or a web service API). To some extent this has always been the case, but now it is much easier to find the right one and to evaluate how "alive" the product and the community behind it are. But every such shortcut increases the risk of external-dependency problems and the need to invest time in their upkeep. If you use someone else's API or framework, for example, you will have to carefully monitor that product's updates, keep up with installing them, and resolve the conflicts its changes cause. Falling behind threatens to grow a snowball of technical debt that can lead to a development collapse. Also, as your solution evolves, you may find that changes in the external component become vital. It is good if you are dealing with open source, where you can contribute your changes and hope they are accepted; otherwise you will have to maintain your own fork of the component, which you may deeply regret later.
  • Security. Some time ago it was acceptable to presume that applications running inside the corporate firewall were trusted, so they could interact freely or with basic (static) authentication, and a login/password pair was good enough for user authentication. Modern security expectations have moved much further, towards Zero Trust (where components do not trust each other). Developers now need to deal with the OIDC/OAuth 2.0 standards, role-based access to every microservice, multi-factor and/or biometric user authentication, risk-based (step-up) security, RASP (runtime application self-protection), and sometimes ML-based security measures that determine how access to services is protected.
  • Cloudification and DevOps. This is also a consequence of depending on "external" turnkey blocks, in this case the server infrastructure. Such infrastructure usually includes the hardware, operating system, application servers, network infrastructure, high-availability orchestration, "elasticity", and so on. The cost of these conveniences is two additional problems that have to be solved: a) automation of the build, integration, and deployment process, the so-called CI/CD pipeline; b) organization of a DevOps team.

Both of these items can be considered DevOps issues, but they address two different aspects. The first is purely technical: a new kind of tooling to master and additional code to write. The second is human: developers will have to be more like integrators and admins than before, while testers, admins, configuration managers, and integrators should become developers to some extent. Without such a fusion, if you try to keep the functions separate, the development, deployment, and modification process becomes incredibly long and painful. Worst of all, the teams will constantly point at each other, each saying, "we did our part."
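The race conditions mentioned above, which are so hard to reproduce on a local workstation, come down to a simple pattern: two asynchronous actors do a read-modify-write on shared state with I/O latency in between. A minimal sketch (the delay simulating a hypothetical database round-trip is invented for illustration):

```javascript
// Two concurrent withdrawals each read the balance, wait on simulated
// I/O, then write back a value computed from their stale read.
let balance = 100;

async function withdraw(amount) {
  const current = balance;                    // read
  await new Promise(r => setTimeout(r, 10));  // simulated I/O round-trip
  balance = current - amount;                 // write from a stale read
}

async function main() {
  await Promise.all([withdraw(30), withdraw(30)]);
  console.log(balance); // 70, not 40: one withdrawal was silently lost
}

main();
```

The timing here is forced, so the bug fires every run; in a real distributed system the same lost update appears only intermittently, which is exactly why it rarely shows up on a developer's laptop.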
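The technical half of the DevOps story, the CI/CD pipeline, is essentially a fail-fast sequence of stages: each one runs only if the previous one passed. A toy sketch (not any real CI tool; the stage names and no-op bodies are invented placeholders for real build, test, and deploy commands):

```javascript
// Minimal fail-fast pipeline runner: stages execute in order, and a
// failing stage stops everything after it, as a CI/CD pipeline would.
const stages = [
  { name: 'build',  run: () => true },  // e.g. compile / bundle the code
  { name: 'test',   run: () => true },  // e.g. run integration tests
  { name: 'deploy', run: () => true },  // e.g. push a container image
];

function runPipeline(stages) {
  for (const stage of stages) {
    const ok = stage.run();
    console.log(`${stage.name}: ${ok ? 'passed' : 'FAILED'}`);
    if (!ok) return false; // fail fast: skip all remaining stages
  }
  return true;
}

runPipeline(stages); // prints build/test/deploy: passed
```

Real pipelines add caching, parallel jobs, and environment promotion on top, but this skeleton is the contract every team member ends up depending on.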

From the observations listed above, I would like to state the following conclusion:

Conclusion 1. The "cost" of developing new features has decreased (it has become easier and faster) as a share of the total cost of product development, while the share of non-functional overhead (including security, code maintenance, validation, and CI/CD) has increased significantly.

What do I mean by code maintenance? Any line of code is a bug with a 3% probability (in my case). The time spent debugging has increased greatly compared to 15 years ago. Code has become much more "alive" and evolving: obsolete components, APIs, and frameworks need constant updating. Systems written in the 1960s (often in COBOL) are still running in critical infrastructure facilities and old banks; for modern software, such a lifespan is unthinkable.
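To see why that 3% figure matters (it is my personal estimate, not a general law), it is worth doing the back-of-the-envelope math, assuming for simplicity that lines go wrong independently of each other:

```javascript
// Expected bug count and probability of at least one bug, given a
// per-line bug probability of 3% (the author's personal estimate).
const pBugPerLine = 0.03;

function expectedBugs(lines) {
  return lines * pBugPerLine;
}

function probAtLeastOneBug(lines) {
  // 1 - P(every line is clean); assumes independence between lines
  return 1 - Math.pow(1 - pBugPerLine, lines);
}

console.log(expectedBugs(1000));                 // 30 expected bugs in 1000 lines
console.log(probAtLeastOneBug(100).toFixed(3));  // 0.952
```

Even a hundred-line change is almost certain to contain at least one bug under this model, which is why debugging and maintenance dominate the cost structure.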

Another consequence of growing external dependencies is an increase in unplanned work. It is very difficult to predict the effort required for upgrades forced by external dependencies, or for resolving bugs and problems caused by changes in external APIs.

Here is an example of an effort breakdown (a team of 12 people) collected over several months on a relatively complex SaaS product in the financial sector.

[Charts: split of work types; share of feature work; share of planned work]

Of course, each team will have its own statistics, which depend heavily on the technology, the subject matter, and the team's maturity level.

In the example above, 78% of the overhead is caused by high security and performance requirements. Such a large share of unplanned work (about 50%) may look like the result of an immature team unable to plan well. However, if management is not ready for this difficult reality, it often happens that "maturity" turns out to be the ability to inflate estimates or hide buffers in work breakdowns so that they absorb unplanned events.

In this study, it was important for me to collect the data of the "hard truth": to show planned vs. unplanned and functional vs. non-functional tasks, and to emphasize the general tendency.

The results explain why simplicity and minimalism defeat redundancy and wastefulness. As Steve Jobs put it, Apple's success lies not in the products it released but in the projects it decided to ignore. With such an abundance of "ready-to-use offers", it is extremely difficult to refrain from a rampant shopping spree, that is, from creating optional features only because there is a convenient building block for them.

Conclusion 2. With such a wide choice of possibilities, think less about "what can be done" and ask yourself more often "what should I avoid doing?"

Always remember that the cost difference between a working prototype (POC) and a minimum viable product (MVP) can be tenfold, depending on the availability criteria you set.

I noticed one more thing: it seems to me that at least 50% of the code being developed is never used, for various reasons, both technical and non-technical. The need disappeared, the need was an illusion, the problem was misstated, the possibilities were overestimated, the difficulty was underestimated, or it was done "for a future case, just not to sit idle". For validating such assumptions, the modern ability to create "dirty" but working code is just perfect. Getting quick feedback is very sobering: it lets you make sure there is a market fit and decide whether to commit to production-grade development with subsequent code support. It also allows a more conscious make-or-buy choice. Therefore…

Conclusion 3. Shit Code First

Make fast working prototypes (POCs) until you are convinced that the functionality is really needed in the form being developed. And be prepared for the fact that most of the "shit code" will have to be completely redone.

In Part 2 of this article, I will shift your attention from the technical to the interpersonal and psychological aspects of software development.

Nota bene: a few words about the context of my coding experiment. I wrote my last code in 2005, using C++, Delphi, PHP, SQL, and a bit of ASP.NET. Within a few weeks I managed to saddle JavaScript (Node.js), and over another three months of weekends and mornings I created a working bot for Messenger and Telegram. I used the Google Cloud stack (Functions, Firestore, Dialogflow). The result is a web service API and a chatbot as the client UI that can find partners for group sports in the user's city, organize participants' RSVPs, and fill empty spots in teams. For the curious who can read Russian, here are links to my bots in Messenger and Telegram.
