Decoding the ‘(Old) New Technical Debt’: The Risks of Unofficial & AI-Generated IaC

Alexandre Soares
15 min read · Aug 6, 2023

The proliferation of publicly available and AI-generated code has transformed the landscape of software development and infrastructure management using Infrastructure as Code. These resources present a world of possibilities, giving teams access to a vast range of pre-built solutions and the ability to generate code quickly.

In an environment where time-to-market is crucial, this provides an accelerated pathway to meet project goals and comply with deadlines. Furthermore, it offers a buffer against the ongoing talent crisis in tech, as even teams with limited expertise can leverage these resources to build complex systems. However, while these advancements paint a promising picture, they also introduce a new dimension of complexity and potential risk, opening the door to the “New Technical Debt.”

Incorporating third-party Infrastructure as Code (IaC) without fully comprehending its inner workings, whether because we found a ready-made Terraform module or Ansible Galaxy role or because AI suggested it, is akin to taking potent medication based solely on a friend’s advice without consulting a healthcare professional.

While the medication or the borrowed code might initially address specific symptoms or requirements, a lack of understanding of how it works can lead to unforeseen and potentially harmful consequences. In infrastructure, these can manifest as security vulnerabilities, performance issues, or difficulties in troubleshooting, much like adverse reactions to a medication. Hence, just as it’s crucial to understand a drug before consuming it, it’s vital to thoroughly understand any third-party IaC code before implementing it, to ensure your critical infrastructure’s well-being.

In this (long) article, I elaborate on this topic, giving my perspective on:

  • What Technical Debt is, from a Software Development and an Infrastructure Management perspective
  • How Infrastructure as Code attempts to address Technical Debt
  • The six most common pitfalls to avoid
  • How to prevent them, with general rule-of-thumb guidelines

Defining Technical Debt

Let’s put this in perspective and establish a shared understanding of technical debt. We can look at it from two angles: software development and infrastructure management.

Software Development Technical Debt

Technical debt in software development refers to the implied cost of additional rework caused by choosing the quick and easy solution instead of using a better approach that would take longer. This metaphor, coined by Ward Cunningham, likens the trade-offs in software development to financial debt.

For instance, when developers opt for a quick fix (like hardcoding a value or skipping a code review) to meet a deadline or a short-term goal, they introduce technical debt into the system.

While this allows for immediate progress, it leaves behind code that is harder to understand, maintain, or build upon, increasing the likelihood of errors and bugs.

Just like financial debt, if technical debt is not “repaid” promptly by refactoring the code or improving documentation, it can accumulate ‘interest’ — leading to even higher costs in terms of time, effort, and system reliability.

While I focus on infrastructure management in this article, it is worth realising that, in these days of Cloud and everything as code, infrastructure management is moving into the arena of the software development discipline.

Infrastructure Management Technical Debt

In infrastructure management, technical debt refers to the additional cost and complexity resulting from infrastructure design decisions or practices that may expedite short-term goals but compromise the system’s long-term integrity, flexibility, or maintainability. If left unchecked, this debt can accrue over time, making future infrastructure changes and management increasingly complex, thus requiring a significant investment of time and resources.

It is common to use the term “snowflake” to describe a server or system that is unique in its configuration and cannot be reproduced automatically or easily.

If an organisation has several snowflake servers, these unique configurations contribute to the technical debt. Managing, updating, and troubleshooting these servers require specialised knowledge and manual effort. The inconsistencies across servers can lead to unexpected behaviours, complicate disaster recovery, and slow down the software delivery process.

Mitigating the accumulation of technical debt involves addressing these snowflakes and adopting practices that promote automated, repeatable, and consistent configuration and deployment, such as IaC and GitOps.
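To make the snowflake idea concrete, here is a minimal Python sketch that flags servers whose configuration deviates from the most common values in a fleet. The inventory, host names, and settings are hypothetical; in practice, such facts would come from an inventory or fact-gathering tool rather than a hard-coded dictionary.

```python
from collections import Counter

# Hypothetical inventory: host name -> configuration facts.
FLEET = {
    "web-01": {"os": "ubuntu-22.04", "openssl": "3.0.2", "ntp": "enabled"},
    "web-02": {"os": "ubuntu-22.04", "openssl": "3.0.2", "ntp": "enabled"},
    "web-03": {"os": "ubuntu-22.04", "openssl": "3.0.2", "ntp": "enabled"},
    "web-04": {"os": "ubuntu-20.04", "openssl": "1.1.1", "ntp": "enabled"},
}


def baseline(fleet):
    """Return the most common value for every setting across the fleet."""
    keys = sorted({key for facts in fleet.values() for key in facts})
    result = {}
    for key in keys:
        counts = Counter(facts.get(key) for facts in fleet.values())
        result[key] = counts.most_common(1)[0][0]
    return result


def find_snowflakes(fleet):
    """Map each deviating host to (actual, expected) pairs per setting."""
    expected = baseline(fleet)
    snowflakes = {}
    for host, facts in fleet.items():
        diff = {
            key: (facts.get(key), want)
            for key, want in expected.items()
            if facts.get(key) != want
        }
        if diff:
            snowflakes[host] = diff
    return snowflakes


if __name__ == "__main__":
    for host, diff in find_snowflakes(FLEET).items():
        print(host, diff)
```

Treating the majority configuration as the baseline is a simplification made for this sketch; with IaC, the baseline would be the versioned definition itself.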

How Infrastructure as Code Mitigates Technical Debt

Infrastructure as Code (IaC) significantly mitigates technical debt in infrastructure management by enabling automation, consistency, repeatability, version control, and promoting clarity through declarative languages. The main characteristics of IaC are:

  1. Automation: IaC automates the infrastructure setup process, reducing the need for manual configuration. This reduces the risk of human error and the time-consuming nature of manual tasks, thus alleviating some aspects of technical debt.
  2. Consistency: IaC ensures that all environments are set up consistently, avoiding the issues associated with “snowflake” servers. This uniformity reduces the complexities and potential errors in managing unique configurations.
  3. Repeatability: IaC allows infrastructure to be set up and torn down repeatedly and with little effort. This repeatability reduces the resources and time spent managing infrastructure, reducing technical debt.
  4. Version Control: IaC “scripts” are stored in a version control system (like Git), allowing for easy tracking of changes, rollbacks if necessary, and clear visibility of the infrastructure’s evolution. This transparency and control over the infrastructure’s history directly helps manage and minimise technical debt.
  5. Documentation: IaC acts as a form of live documentation. Since the infrastructure’s state is codified, it gives teams an accurate and up-to-date representation of the infrastructure, eliminating the discrepancies and confusion that can lead to increased technical debt.
  6. Declarative Languages: Many IaC tools use declarative languages, specifying ‘what’ the infrastructure should look like rather than ‘how’ to achieve it. This improves the clarity of the code, making it easier to understand and maintain. It significantly reduces the chances of accumulating technical debt through misunderstood or ‘smart’ code. Declarative languages also simplify the process of making changes to the infrastructure, reducing the potential for errors and thus further mitigating technical debt.
  7. Testing and Validation: IaC allows testing infrastructure changes in isolated environments before deploying to production. This practice leads to the early detection of issues, thus preventing debt accumulation due to undetected errors or problems.
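As an illustration of the testing and validation point, the following Python sketch checks a declarative description of firewall rules against a simple policy before anything is deployed. The rule format, the rule names, and the policy itself are assumptions made for this example, not the behaviour of any specific IaC tool.

```python
# Hypothetical declarative description of firewall rules, as it might be
# loaded from a versioned YAML or JSON file.
DESIRED_RULES = [
    {"name": "https", "port": 443, "cidr": "0.0.0.0/0"},
    {"name": "ssh", "port": 22, "cidr": "0.0.0.0/0"},
    {"name": "db", "port": 5432, "cidr": "10.0.0.0/16"},
]

# Ports that this example policy allows to be open to the whole internet.
PUBLIC_PORTS = {80, 443}


def validate(rules):
    """Return a list of human-readable policy violations."""
    problems = []
    for rule in rules:
        if rule["cidr"] == "0.0.0.0/0" and rule["port"] not in PUBLIC_PORTS:
            problems.append(
                f"rule '{rule['name']}' exposes port {rule['port']} to the internet"
            )
    return problems


if __name__ == "__main__":
    for problem in validate(DESIRED_RULES):
        print("VIOLATION:", problem)
```

Because the infrastructure is described as data, a check like this can run in a pipeline on every change, long before production is touched.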

Pitfalls to Avoid: Practices That Foster Technical Debt

Let’s navigate through a series of practices that, while seemingly beneficial in the immediate term, can stealthily contribute to mounting technical debt. Being aware of these scenarios can equip us to sidestep them.

Pitfall 1 (Old): Using a programmatic approach to IaC

Using a programmatic approach in your Infrastructure as Code (IaC) language can contribute to technical debt due to its inherent complexity and lack of transparency compared to declarative approaches.

A programmatic approach specifies ‘how’ to achieve a desired state, including the sequence of operations required. This introduces procedural complexity into the IaC scripts, making them harder to read, understand, and maintain. As a result, teams might spend more time troubleshooting and updating these scripts, leading to increased technical debt.

In contrast, a declarative approach specifies ‘what’ the final desired state should be without detailing its steps. This makes the code easier to understand because you only need to comprehend the intended outcome, not the entire sequence of operations. It promotes transparency, simplicity, and maintainability — all crucial factors in mitigating technical debt.
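The difference can be sketched in a few lines of Python. Here a plain dictionary stands in for a cloud provider, and all resource names are hypothetical. The imperative function encodes a fixed sequence of steps and breaks when the starting point is not what it assumed, while the declarative function receives only the desired end state and works out the changes itself, which is the part an IaC tool would normally do for you.

```python
class ResourceExists(Exception):
    """Raised when an imperative step tries to create something twice."""


def imperative_setup(cloud):
    """'How': a fixed sequence of steps that assumes an empty starting point."""
    for name in ("network", "database", "webserver"):
        if name in cloud:
            raise ResourceExists(name)
        cloud[name] = {"size": "small"}


def declarative_apply(cloud, desired):
    """'What': converge the cloud towards the desired state, whatever it is now."""
    changes = []
    for name, spec in desired.items():
        if cloud.get(name) != spec:
            changes.append(("create" if name not in cloud else "update", name))
            cloud[name] = dict(spec)
    for name in list(cloud):
        if name not in desired:
            changes.append(("delete", name))
            del cloud[name]
    return changes


if __name__ == "__main__":
    desired = {"network": {"size": "small"}, "webserver": {"size": "large"}}
    cloud = {}
    print(declarative_apply(cloud, desired))  # first run makes changes
    print(declarative_apply(cloud, desired))  # second run has nothing to do
```

Running the declarative version twice is harmless, whereas running the imperative version twice raises an error. The reader of the declarative input only has to understand the `desired` dictionary, not the order of operations.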

When comparing programmatic versus declarative approaches in Infrastructure as Code (IaC), it’s evident that the required skill sets and maturity levels for team members differ considerably.

Programmatic Approach:

  1. Scripting Skills: A programmatic approach relies heavily on writing procedural code, which requires a strong background in scripting languages. Team members must have a good command of the chosen language’s syntax, constructs, and best practices.
  2. Understanding of Procedures: Since a programmatic approach details ‘how’ a system should reach its desired state, team members need a deep understanding of the processes and procedures involved. This includes knowledge about the order of operations, dependencies, error handling, and more.
  3. Debugging Skills: As programmatic scripts can be complex and lengthy, the ability to debug and troubleshoot effectively is essential. Team members must be skilled at identifying and resolving issues within the code.

Declarative Approach:

  1. System Design Skills: With a declarative approach, the focus shifts from ‘how’ to ‘what’. This requires a good understanding of system design and the ability to define the system’s desired state clearly. NOTE: This sounds simple, but remember not to overestimate your maturity level in technical design. I still find, far too often, high-level architectures where critical decisions are only made at the implementation stage!
  2. Abstraction Capability: Declarative IaC demands the ability to think about abstract outcomes rather than concrete steps. This is a skill that might need to be nurtured over time.
  3. Adaptability: Since declarative IaC tools often handle the procedural aspects automatically, team members must trust the tooling and adapt to a ‘hands-off’ approach regarding the implementation specifics.

Regarding team maturity, a programmatic approach often demands more technical expertise and procedural knowledge. In contrast, a declarative approach leans more towards design skills and strategic thinking, which might be more prevalent in a team with broader system-level experience and maturity. Both methods require ongoing learning and adaptability as tools and best practices evolve.

Pitfall 2 (Old): Watermelon Declarative

The “Watermelon Declarative Effect” refers to the phenomenon where code in a declarative language like Terraform or Ansible is essentially written to behave like a programmatic language. This is often due to introducing control structures like loops and conditional statements, which distort these languages’ inherent ‘declarative’ nature.

This effect contributes to technical debt in several ways:

  1. Complexity: It increases the complexity of the codebase, making it harder to understand and maintain. Declarative languages simplify infrastructure management by focusing on the ‘what’ rather than the ‘how’. Distorting this principle brings back the complexity that declarative languages aim to eliminate.
  2. Predictability: It reduces the predictability of the infrastructure state. Declarative languages are designed to ensure a predictable outcome. Introducing programmatic elements can lead to different results under different conditions, making it harder to guarantee a consistent infrastructure state.
  3. Maintainability: It can make the code harder to maintain and evolve. The ‘programmatic’ parts of the codebase might require a higher level of expertise to modify safely, increasing the risk of errors and the cost of changes.

By using a declarative language programmatically, teams effectively create a ‘watermelon’: it’s green (declarative) on the outside but red (programmatic) on the inside, adding to the technical debt pile.
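One rough way to spot a watermelon during code review is to count the control-flow constructs in a file. The Python sketch below does this for Terraform text using regular expressions; `count`, `for_each`, `dynamic` blocks, conditional expressions, and `for` expressions are genuine Terraform features, but the counting heuristic, the sample configuration, and its resource names are assumptions made for illustration only.

```python
import re

# Regular expressions for Terraform constructs that add control flow.
PATTERNS = {
    "count": re.compile(r"^[ \t]*count\s*=", re.MULTILINE),
    "for_each": re.compile(r"^[ \t]*for_each\s*=", re.MULTILINE),
    "dynamic": re.compile(r'^[ \t]*dynamic\s+"', re.MULTILINE),
    "conditional": re.compile(r"\?[^\n?]*:"),
    "for_expression": re.compile(r"\bfor\s+\w+(?:\s*,\s*\w+)?\s+in\b"),
}

# Hypothetical configuration used only to exercise the heuristic.
SAMPLE = '''
resource "aws_instance" "web" {
  count         = var.enabled ? var.replicas : 0
  instance_type = var.env == "prod" ? "m5.large" : "t3.micro"
  tags          = { for k, v in var.tags : k => upper(v) }
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs"
}
'''


def control_flow_counts(terraform_text):
    """Count occurrences of each control-flow construct in the text."""
    return {
        name: len(pattern.findall(terraform_text))
        for name, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    print(control_flow_counts(SAMPLE))
```

A high count is not proof of a problem, since these constructs have legitimate uses; it is only a hint that a file deserves a closer look for logic that hides the intended end state.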

Pitfall 3 (New): Naive Use of Borrowed Code

One common pitfall in managing infrastructure as code (IaC) is incorporating publicly available modules or roles (like Terraform modules or Ansible roles) into one’s codebase without thoroughly understanding their functionality or potential impact.

Publicly available modules and roles can substantially accelerate the development of IaC by reusing pre-built functionality. However, their adoption can become problematic for several reasons:

  1. Limited Understanding: The user may lack a deep understanding of the module or role. This can lead to unexpected behaviour or hidden side effects, potentially impacting the stability and security of the entire infrastructure.
  2. Dependency on Unsupported Code: These publicly available modules are often provided without any guarantee of maintenance or support. Relying on such modules can expose your infrastructure to risks as the IaC tooling or the target infrastructure services evolve.
  3. Lack of Customizability: These modules are often built for general use cases and may not align with the specific requirements of your project. Making the necessary customisations might be complex and require a deeper understanding of the module than initially assumed.
  4. Hidden Technical Debt: Over time, as your infrastructure needs grow and evolve, maintaining these modules could be challenging, especially if the modules are complex or poorly documented. This, in turn, increases technical debt.

In essence, while using publicly available modules or roles can speed up initial development, doing so without a complete understanding of their workings can lead to various issues in the long run, contributing to the accrual of technical debt.
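A first, very basic audit step is to make sure every borrowed module is pinned to a known version, so that what you reviewed is what you run. The Python sketch below scans Terraform text for `module` blocks that have neither a `version` argument nor a `?ref=` in their source. The sample configuration, module names, and URLs are hypothetical, and the regular-expression parsing is a simplification that assumes each block closes with a `}` at the start of a line.

```python
import re

MODULE_BLOCK = re.compile(
    r'module\s+"([^"]+)"\s*\{(.*?)^\}', re.DOTALL | re.MULTILINE
)
SOURCE = re.compile(r'^\s*source\s*=\s*"([^"]+)"', re.MULTILINE)
VERSION = re.compile(r'^\s*version\s*=\s*"[^"]+"', re.MULTILINE)

# Hypothetical configuration; names and URLs are placeholders.
SAMPLE = '''
module "network" {
  source  = "example-org/network/aws"
  version = "1.0.0"
}

module "bastion" {
  source = "git::https://example.com/infra/bastion.git"
}

module "dns" {
  source = "git::https://example.com/infra/dns.git?ref=v1.2.0"
}

module "local_helpers" {
  source = "./modules/helpers"
}
'''


def unpinned_modules(terraform_text):
    """Return names of external modules without a version or ref pin."""
    findings = []
    for name, body in MODULE_BLOCK.findall(terraform_text):
        match = SOURCE.search(body)
        source = match.group(1) if match else ""
        is_local = source.startswith(("./", "../"))
        is_pinned = bool(VERSION.search(body)) or "?ref=" in source
        if not is_local and not is_pinned:
            findings.append(name)
    return findings


if __name__ == "__main__":
    print(unpinned_modules(SAMPLE))
```

Pinning is only the starting point: it guarantees stability, not quality, so the pinned version still needs to be read and understood.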

Pitfall 4 (New): Double-Edged Sword AI-Generated Code

Artificial Intelligence has made significant strides in various fields, including software development and infrastructure management. AI-generated code, while promising increased productivity and reduced time-to-market, can be a double-edged sword, presenting several potential pitfalls.

  1. Limited Understanding: As with publicly available modules, developers using AI-generated code might lack a comprehensive understanding of the code produced. This could lead to unexpected consequences, especially when the code is complex or involves critical infrastructure.
  2. Unpredictability: While increasingly powerful, AI models can still generate code that behaves unexpectedly under certain conditions. Since the logic behind AI decision-making can be opaque, debugging and troubleshooting issues in AI-generated code can be challenging.
  3. Reliance on AI: Over-reliance on AI for generating code can lead to a decline in the team’s coding skills over time. This could increase technical debt, especially when modifications or debugging are needed and AI tools cannot provide a satisfactory solution.
  4. Alignment with Best Practices: AI-generated code might not always align with industry best practices or specific organisational coding guidelines. This could lead to a lower-quality codebase that is harder to maintain, and to increased technical debt.

AI-generated code, while an exciting capability, should be approached with a balanced view. Teams need to understand its limitations and potential for unforeseen consequences, ensuring that they do not inadvertently contribute to the accumulation of technical debt.
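One way to keep that balance is to record where each change came from and to let the merge rules depend on it. The Python sketch below is a hypothetical review gate: the `Change` fields, the origin labels, and the rule of one extra human review for code that was not written in-house are all assumptions made for this example, not features of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """A proposed IaC change. All fields are hypothetical metadata."""
    path: str
    origin: str  # "human", "ai", or "third-party"
    reviewers: list = field(default_factory=list)
    tests_passed: bool = False


def required_reviews(change):
    """Code that nobody on the team wrote gets one extra pair of eyes."""
    return 1 if change.origin == "human" else 2


def merge_blockers(change):
    """Return the reasons why this change cannot be merged yet."""
    blockers = []
    if not change.tests_passed:
        blockers.append("automated tests have not passed")
    missing = required_reviews(change) - len(set(change.reviewers))
    if missing > 0:
        blockers.append(f"{missing} more human review(s) needed")
    return blockers


if __name__ == "__main__":
    change = Change("network/main.tf", "ai", ["alice"], tests_passed=True)
    print(merge_blockers(change))
```

The point of the sketch is less the exact rule than the principle: the origin of the code is tracked explicitly, and someone on the team has to understand it before it reaches the infrastructure.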

Pitfall 5 (Old): Ignoring the Lifecycle

A misconception that often lurks in the adoption of Infrastructure as Code (IaC) is treating it as a ‘fire and forget’ solution. While IaC provides a more efficient and consistent way of provisioning infrastructure, it’s essential to understand that the real added value lies in the ability to manage day-2 operations, that is, everything that happens after the initial provisioning.

Provisioning-Only (Fire and Forget):

When IaC is used merely for provisioning and left unattended, it’s akin to incurring a form of ‘interest’ on your technical debt. Any modifications made to the infrastructure outside of the IaC context, whether for fixing issues or adjusting configurations, are not reflected in the IaC codebase. This discrepancy creates an ever-widening gap between the code and the actual infrastructure, which turns into a large, untracked technical debt over time. The debt becomes apparent when there’s a need to replicate the infrastructure or when a failure requires a rebuild. The IaC codebase will not accurately represent the functioning infrastructure, leading to unforeseen issues and longer recovery times.
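The gap between code and reality can be made visible with a drift report. The Python sketch below compares a declared state with an observed one and lists resources that are missing, unmanaged, or changed. Both dictionaries and their resource names are hypothetical stand-ins; real IaC tools offer their own mechanisms for this kind of comparison, such as the difference Terraform shows when planning a change.

```python
def detect_drift(declared, actual):
    """Compare the state described in code with what is really running."""
    report = {"missing": [], "unmanaged": [], "changed": {}}
    for name, spec in declared.items():
        if name not in actual:
            report["missing"].append(name)
            continue
        diff = {
            key: {"declared": spec.get(key), "actual": actual[name].get(key)}
            for key in sorted(set(spec) | set(actual[name]))
            if spec.get(key) != actual[name].get(key)
        }
        if diff:
            report["changed"][name] = diff
    report["unmanaged"] = sorted(set(actual) - set(declared))
    return report


if __name__ == "__main__":
    # Hypothetical states: someone resized "web" by hand, created "cache"
    # outside the code, and "db" was never (re)built.
    declared = {"web": {"size": "small", "port": 443}, "db": {"size": "large"}}
    actual = {"web": {"size": "medium", "port": 443}, "cache": {"size": "small"}}
    print(detect_drift(declared, actual))
```

Each entry in such a report is a piece of untracked technical debt: either the code has to catch up with reality, or reality has to be brought back in line with the code.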

Provisioning and Day-to-Day Management (GitOps example):

In the GitOps model, the entire infrastructure lifecycle is managed via IaC, which can significantly reduce technical debt, provided it’s done right.

Any change to the infrastructure is made by updating the externalised configuration consumed by the code (both are versioned and maintained in a git repository). Any change committed to the git repository triggers a pipeline that applies it automatically. This approach promotes code maintainability by separating logic from configuration. In principle, it also keeps the actual state of the infrastructure aligned with the defined state, with the git repository acting as the single source of truth for the infrastructure’s desired shape.

However, it also introduces new complexities, such as managing the automated application of changes and ensuring that rollbacks can be performed safely when needed. These challenges require a disciplined approach to code management and a good understanding of the GitOps processes and tooling to prevent technical debt from accruing.
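The apply-and-rollback flow can be sketched as a tiny in-memory pipeline in Python. Everything here is hypothetical: the class, the `on_commit` hook, and the health check are stand-ins for what a git server, a CI/CD system, and real monitoring would do.

```python
class GitOpsPipeline:
    """Toy pipeline: every commit is applied, unhealthy ones are rolled back."""

    def __init__(self, health_check):
        self.health_check = health_check  # callable: config -> bool
        self.last_good = None             # last configuration that was healthy
        self.live = None                  # configuration currently applied
        self.log = []                     # outcome of every commit

    def on_commit(self, config):
        """Apply a committed configuration and verify the result."""
        self.live = dict(config)
        if self.health_check(self.live):
            self.last_good = dict(config)
            self.log.append("applied")
        else:
            # Restore the previous healthy configuration, if there is one.
            self.live = dict(self.last_good) if self.last_good is not None else None
            self.log.append("rolled back")
        return self.log[-1]


if __name__ == "__main__":
    pipeline = GitOpsPipeline(lambda cfg: cfg.get("replicas", 0) >= 1)
    print(pipeline.on_commit({"replicas": 2}))
    print(pipeline.on_commit({"replicas": 0}))
    print(pipeline.live)
```

In a real setup, the rollback would itself normally be recorded as a change in the repository (for example, by reverting the offending commit), so that git remains the source of truth rather than the pipeline’s memory.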

Pitfall 6 (Old): Overlooking the Need for Ongoing Code Support and Maintenance

In Infrastructure as Code (IaC), it’s easy to overlook the need for ongoing support and maintenance. Like any other piece of software, IaC code is not immune to the effects of time, and neglecting its lifecycle can lead to the accumulation of technical debt.

  1. Evolution of Infrastructure: Infrastructure needs to change over time due to business demands, technology advancements, and regulatory requirements. The IaC code must be updated to reflect these changes. Failing to do so creates misalignments between the infrastructure’s desired state (as defined in the code) and its actual state, causing technical issues and inefficiencies.
  2. Changes in IaC Tools and Cloud Services: IaC tools and the cloud services they manage are regularly updated with new features, improved designs, and sometimes even breaking changes. These updates can make existing IaC code obsolete or less optimal. Regular maintenance of the IaC codebase is required to keep up with these changes; without it, technical debt accumulates.
  3. Code Quality: Over time, as more and more changes are made to the IaC codebase, code quality is likely to deteriorate. This could be due to a lack of proper documentation, inconsistent coding practices, or the absence of version control. Such practices make the code harder to understand and maintain and increase the likelihood of errors, contributing to technical debt.
  4. Security and Compliance: As cyber threats evolve and compliance regulations change, IaC code that was once deemed secure might no longer be (this may be caused by flaws in the code or in the architecture that the code implements). Regular audits and updates are required to ensure that the infrastructure remains safe and compliant, and this is another form of ongoing support and maintenance that should not be overlooked.

Treating IaC as a one-off rather than an ongoing commitment can lead to significant technical debt. Incorporating regular IaC code reviews, updates, and maintenance into your IT operations is essential to ensure alignment with the changing landscape of infrastructure needs, tooling, and best practices.
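A simple way to make this commitment visible is to track when each part of the IaC codebase was last reviewed and to flag what is overdue. The Python sketch below assumes a hypothetical inventory and a 180-day review interval; both are illustrative choices, not recommendations.

```python
from datetime import date

# Hypothetical inventory of IaC components and their last review dates.
INVENTORY = [
    {"name": "network", "last_reviewed": date(2023, 6, 1)},
    {"name": "legacy-vpn", "last_reviewed": date(2022, 11, 15)},
]


def overdue(inventory, today, max_age_days=180):
    """Return the names of components not reviewed within max_age_days."""
    return [
        item["name"]
        for item in inventory
        if (today - item["last_reviewed"]).days > max_age_days
    ]


if __name__ == "__main__":
    print(overdue(INVENTORY, today=date(2023, 8, 6)))
```

A report like this can feed the regular operations schedule, turning “we should maintain the code” into a concrete, recurring task.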

Managing the “New Technical Debt”

It is crucial to emphasise that when managing the “New Technical Debt,” best practices, knowledge sharing, and thorough understanding are essential.

Utilising AI-generated code can enhance efficiency, but developers must maintain a solid comprehension of the code they deploy. This involves understanding the AI system’s behaviour, the impact of changing input parameters, and the potential implications on infrastructure.

The same applies to the usage of borrowed code from publicly available sources.

In this context, automated testing and rigorous code reviews become even more critical, especially for mission-critical systems. Developers must approach AI-generated code with the same thoroughness and rigour as manually written code. Shortcuts must be avoided, and when facing aggressive deadlines, immediate needs must be weighed carefully against their long-term cost.

When considering hiring development services, it is crucial to keep in mind the future need to maintain the code. Vendor support typically covers only the specified requirements, so any change or adaptation to the code may leave it “out of support.”

Training and educational initiatives must be in place to help developers better understand the impacts of utilising 3rd party code, AI tools, and their output. By deepening their knowledge, they can reap the benefits of these tools without accruing new technical debt.

It is important to remember that no AI can adequately replace a knowledgeable developer or engineer. AI may increase their productivity, but it will not replace them. Relying only on a junior or less experienced person with AI support carries a considerable level of risk.

Best Practices Checklist

To maintain a clear view of your maturity and risks, it is vital to consider the following checklist:

· Embrace the Declarative Approach: Invest in training your teams to think abstractly and adapt to the ‘hands-off’ approach of declarative IaC. Foster an environment that nurtures these skills.

· Avoid Watermelon Declaratives: Ensure your declarative code truly encapsulates the ‘what’, not the ‘how’. Regular code reviews can help identify and avoid such inconsistencies.

· Audit Borrowed Code: Thoroughly audit any publicly available modules or roles before integrating them into your IaC codebase. Understand their functionality, possible impact, and potential security risks.

· Innovative Use of AI-Generated Code: Be cautious when adopting AI-generated code. It can enhance productivity, but only if it aligns with your standards and best practices. Always double-check AI-generated code for quality and make sure you understand its functionality.

· Implement Lifecycle Management: Adopt models like GitOps to manage the entire lifecycle of your infrastructure via IaC. This practice ensures you get the most value from your IaC and reduces technical debt.

· Maintain and Update IaC Code Regularly: Include IaC maintenance in your regular IT operations schedule. Update the code to reflect changes in infrastructure needs, IaC tools, cloud services, and best practices. Keep abreast of evolving industry standards and incorporate them into your operations.

· Documentation: Keeping thorough and updated documentation is an often overlooked but critical aspect of managing technical debt. It enables better understanding and troubleshooting and eases the onboarding process for new team members.

· Automate Testing and Integration: Implement automated testing for your IaC code. It will help catch potential issues early and prevent them from adding to technical debt. Use Continuous Integration/Continuous Deployment (CI/CD) pipelines for efficient testing and deployment.

· Design for Scalability and Flexibility: Keep your infrastructure design scalable and flexible to accommodate future changes easily. It will help minimise the effort to adapt to changes and thus reduce technical debt.

· Incorporate Regular Code Reviews: Regular code reviews ensure that your IaC adheres to best practices and avoids unnecessary complexity. They also provide an opportunity to share knowledge and improve team skills.

In conclusion, AI-generated and publicly available code presents a promising future for software development in general and specifically in the IaC world, but it also comes with challenges. By acknowledging and planning for the new (and old) technical debt, we can harness the power of AI/code reuse in software development while mitigating the risks.

That is all for now; I plan to dive deeper into each of the above best practices in future articles!

Please note that the opinions and views expressed in this article are solely my own and do not represent my employer’s official position or policies. This is a personal commentary based on my experiences and thoughts, and although I aim for accuracy, there may be errors or omissions in the content.

Alexandre Soares

30+ years in IT, serving various roles in global & consulting firms. Now an Enterprise Architect specializing in automation, IaC & cloud technologies.