Understanding “ilities”: Key Quality Needs in Platform Engineering

Vijay Chintha
Platform Engineering Unleashed
Aug 7, 2023


These “ilities” are the basic foundations of platform-engineered systems, ensuring that platforms are not only functional but also robust, reliable, and efficient.

Background:

It has become my routine to open my articles with some background from my past. I noticed this pattern when I looked back at the articles I have written recently.

Having worked as a performance and platform engineer throughout my career, I’ve seen applications measured against various quality standards: reliability, stability, availability, and operability, to name a few.

Although I’ve been familiar with these terms for over 20 years, I never noticed that most quality attributes end in “-ility,” or that the industry refers to them collectively as “ilities.” I first heard the collective term in 2018, when my mentor, a Distinguished Engineer, used it during one of our internal talks.

In another Medium article, I listed 61 different “ilities” that I gathered from Wikipedia. They apply equally well when discussing platforms, performance, applications, databases, and more.

This article has four sections:

· Introduction

· Common questions that might arise for platform engineering teams

· Platform quality metrics explained in detail

· Conclusion

Introduction:

The technology landscape is evolving rapidly. Because platforms play an essential role in this dynamic ecosystem, high-quality platform engineering is crucial.

In this context, the term “ilities” refers to the fundamental qualities that a platform should exemplify. These “ilities” are the basic foundations of platform-engineered systems, ensuring that platforms are not only functional but also robust, reliable, and efficient. Here are 21 “ilities” that are critical for platform engineering:

Adaptability, affordability, auditability, availability, compatibility, configurability, debuggability, deployability, durability, manageability, operability, portability, predictability, recoverability, reliability, reusability, scalability, stability, testability, traceability, and upgradability.

I call them “Platform’s Circle of Standards for its Quality.”

Platform’s Circle of Standards for its Quality, by Vijay Chintha

Let's dig deeper into each standard and illustrate it with suitable examples. Thanks to modern technology, I was able to use AI to help develop various industry-specific examples, which made the work much easier.

Common Questions:

Common Questions that might arise for platform engineering teams while reading this article

Why did I choose only 21 of the 61 “ilities” widely used in the industry to measure quality?

The simple answer is that I picked the metrics I consider essential, based on my experience with the various platforms, products, and applications I have worked on.

If your platform needs additional metrics to measure its quality, add them to your platform's circle of standards and make it your own.

Why should we measure the quality of platforms built for internal use within the organization?

Measuring the quality of platforms built for internal use within an organization is just as crucial as measuring those built for external customers. Here are some compelling reasons why internal platforms should undergo quality measurement:

· Operational Efficiency: Internal platforms usually support various organizational processes. A high-quality platform ensures seamless operations, reducing bottlenecks, minimizing downtimes, and ultimately enhancing productivity.

· Cost Savings: Identifying and fixing issues early can prevent costly fixes down the road. This is especially true for platforms that may integrate with other critical systems. A minor glitch can lead to extensive downtime or other operational disruptions.

· Data Integrity: Many internal platforms handle crucial business data. Ensuring the quality of these platforms is essential to maintain data accuracy, consistency, and security. Any compromise in data integrity can have severe ramifications for decision-making and compliance.

· Scalability: As the organization grows and evolves, its internal platforms should be able to accommodate increased loads, additional features, or new integrations. Quality measurements can assess and ensure the scalability of these platforms.

· Security: Just because a platform is used internally doesn’t mean it’s immune to security threats. Ensuring its quality includes safeguarding against potential internal vulnerabilities and unauthorized access.

· Standardization: Measuring and ensuring quality helps standardize platforms across the organization, making training, support, and maintenance more streamlined.

· Improved Decision Making: Quality metrics can provide insights into how platforms are performing, guiding improvements and helping leaders make informed decisions regarding IT investments.

· Support & Maintenance: A high-quality platform typically requires less frequent and less intensive support, reducing the burden on IT teams and allowing them to focus on other critical areas.

· Long-term Vision & Alignment: Internal platforms should align with the organization’s long-term vision and strategic goals. Quality assurance ensures that the platform remains relevant, adaptable, and in sync with the organization’s evolving needs.

· Compliance & Governance: Depending on the industry, there might be regulatory requirements related to data handling, privacy, or other operational aspects. Even internal platforms need to adhere to these standards to prevent compliance breaches.

· Change Management: Organizations often undergo change, whether it’s due to mergers, restructurings, or shifts in strategy. High-quality internal platforms can better adapt to these changes, ensuring continuity and stability.

In essence, the quality of internal platforms directly impacts the efficacy and smoothness of an organization’s operations. It’s an investment in the organization’s infrastructure that can yield dividends in the form of efficiency, employee satisfaction, and long-term adaptability.

Platform quality metrics explained in detail:

Platform quality metrics, by Vijay Chintha

Without further ado, let's dive into each item on the list above.

1. Adaptability

Definition: The ability of a system or platform to adjust itself efficiently and quickly to changes in its environment or requirements without requiring extensive modifications.

Example: Consider a cloud-based platform engineered to support e-commerce websites. If the platform is adaptable, it can adjust to new payment gateways, integrate emerging technologies like AR (augmented reality) for product previews, or handle varying traffic loads during sales or holidays. Such adaptability ensures that as the market changes and new requirements arise, the platform can accommodate them without a complete overhaul.

2. Affordability

Definition: The degree to which a system or platform is cost-effective, not just in its initial implementation but also throughout its lifecycle, offering good value for its cost.

Example: A startup company is looking to deploy its application but has limited financial resources. A cloud platform that offers a “pay-as-you-go” pricing model, rather than a substantial upfront cost, makes it affordable for the startup. Over time, as the startup grows, it can then opt for more dedicated resources or better service tiers. The platform’s affordable entry point encourages adoption and allows businesses to scale their costs as they grow.
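To make the trade-off concrete, here is a small sketch that compares the two pricing models and computes the usage level at which an upfront commitment starts to pay off. The function names are hypothetical and the prices are invented placeholders in cents, not any provider's actual rates.

```python
def pay_as_you_go_cost(hours_used: int, hourly_rate: int) -> int:
    """Total cost (cents) when paying only for the hours actually consumed."""
    return hours_used * hourly_rate


def upfront_cost(upfront_fee: int, hours_used: int, discounted_rate: int) -> int:
    """Total cost (cents) with a fixed upfront fee plus a lower hourly rate."""
    return upfront_fee + hours_used * discounted_rate


def break_even_hours(upfront_fee: int, hourly_rate: int, discounted_rate: int) -> float:
    """Usage level above which the upfront commitment becomes cheaper."""
    return upfront_fee / (hourly_rate - discounted_rate)
```

With a placeholder rate of 10 cents/hour versus a $500 fee plus 5 cents/hour, the commitment only pays off beyond 10,000 hours, so a small startup running 2,000 hours is better off paying as it goes.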

3. Auditability

Definition: The capability of a system or platform to provide transparent, clear, and traceable records of activities, especially those that have regulatory, security, or compliance implications.

Example: A healthcare company uses a platform to store patient data. Regulations require that all access to patient data be tracked and logged. An auditable platform will provide detailed logs about who accessed the data, when, from which IP address, and what operations they performed. Such audit trails are crucial for compliance, understanding security incidents, and maintaining trust with customers and regulators.
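The audit trail described above boils down to recording who, when, from where, and what for every access. The following is a minimal in-memory sketch; `AuditEvent` and `AuditLog` are hypothetical names, and a real platform would write these events to tamper-resistant, durable storage rather than a Python list.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEvent:
    actor: str       # who accessed the data
    timestamp: str   # when (UTC, ISO 8601)
    source_ip: str   # from which IP address
    action: str      # what operation was performed
    resource: str    # which record was touched


class AuditLog:
    """Append-only audit trail kept in memory (stand-in for durable storage)."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, actor: str, source_ip: str, action: str, resource: str) -> AuditEvent:
        event = AuditEvent(
            actor=actor,
            timestamp=datetime.now(timezone.utc).isoformat(),
            source_ip=source_ip,
            action=action,
            resource=resource,
        )
        self._events.append(event)
        return event

    def events_for(self, resource: str) -> List[AuditEvent]:
        """Everything that happened to one record, in the order it happened."""
        return [e for e in self._events if e.resource == resource]

    def export_jsonl(self) -> str:
        # One JSON object per line, a convenient format for shipping to a SIEM.
        return "\n".join(json.dumps(asdict(e)) for e in self._events)
```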

4. Availability

Definition: The degree to which a system or platform is operational and accessible when required, ensuring minimal downtime and maximizing uptime.

Example: Consider a global streaming service like Netflix. Customers expect content to be available 24/7 without disruptions. To ensure high availability, the underlying platform might be engineered to use multiple data centers across the world, with automatic failover. If one data center faces an issue (e.g., power outage, natural disaster), the platform can reroute user requests to another operational data center, ensuring uninterrupted service.
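The failover idea can be sketched in a few lines. This is a simplified illustration, not how any particular streaming service is built: `route_request` and the data-center names are hypothetical, and `is_healthy` stands in for a real health check. The second helper shows why multiple sites help: if each site is available a fraction a of the time, N sites together are available 1 - (1 - a)^N of the time, assuming failures are independent (real outages do not always respect that assumption).

```python
from typing import Callable, List


def route_request(preferred_order: List[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first healthy data center in preference order."""
    for data_center in preferred_order:
        if is_healthy(data_center):
            return data_center
    raise RuntimeError("no healthy data center available")


def combined_availability(single: float, sites: int) -> float:
    """Availability of N independent sites when any one of them can serve traffic."""
    return 1 - (1 - single) ** sites
```

For example, two independent sites at 99% each give roughly 99.99% combined availability.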

5. Compatibility

Definition: The ability of a system or platform to work harmoniously with other systems, devices, or components without conflict or the need for significant modifications.

Example: An IoT (Internet of Things) platform is designed to support various devices: smart fridges, thermostats, wearable devices, and more. Each of these devices might run on different operating systems, use different communication protocols, or have varied data formats. A well-engineered platform ensures compatibility with a wide range of devices, allowing manufacturers and developers to easily integrate their products without significant rework. This universal compatibility makes the platform more appealing to a broad audience.
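A common way to achieve this is the adapter pattern: each device family gets a small translator into one platform-wide data model, so the rest of the platform never deals with vendor formats. The two payload formats below (a JSON thermostat reporting Fahrenheit, a fridge sending `key=value;` text in Celsius) and all function names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Reading:
    """Common, platform-wide representation of a temperature reading."""
    device_id: str
    temperature_c: float


def from_thermostat(raw: bytes) -> Reading:
    # Hypothetical thermostat: JSON payload, temperature in Fahrenheit.
    data = json.loads(raw)
    return Reading(data["id"], round((data["temp_f"] - 32) * 5 / 9, 2))


def from_fridge(raw: bytes) -> Reading:
    # Hypothetical fridge: "key=value;key=value" text, temperature in Celsius.
    fields = dict(pair.split("=", 1) for pair in raw.decode().split(";"))
    return Reading(fields["id"], float(fields["temp_c"]))


# Supporting a new device family means adding one adapter here.
ADAPTERS: Dict[str, Callable[[bytes], Reading]] = {
    "thermostat": from_thermostat,
    "fridge": from_fridge,
}


def ingest(device_type: str, raw: bytes) -> Reading:
    return ADAPTERS[device_type](raw)
```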

6. Configurability

Definition: The ability of a system or platform to be tailored to different scenarios or requirements by adjusting settings or parameters without altering the core codebase.

Example: An e-commerce platform might allow merchants to configure tax settings based on their location, customize the appearance of their storefront, or set shipping rules. Merchants can adjust these settings to suit their unique needs without having to modify the platform’s core code.
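In code, configurability usually means the core logic reads behavior from per-tenant settings with sensible defaults. The sketch below assumes a hypothetical `MerchantConfig` and `order_total_cents`; the tax rate used in the usage note is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MerchantConfig:
    """Per-merchant settings; changing them never touches platform code."""
    tax_rate_by_region: Dict[str, float] = field(default_factory=dict)
    flat_shipping_cents: int = 500
    free_shipping_threshold_cents: int = 5000
    theme: str = "default"


def order_total_cents(config: MerchantConfig, region: str, subtotal_cents: int) -> int:
    """Core pricing logic, driven entirely by the merchant's configuration."""
    tax = round(subtotal_cents * config.tax_rate_by_region.get(region, 0.0))
    if subtotal_cents >= config.free_shipping_threshold_cents:
        shipping = 0
    else:
        shipping = config.flat_shipping_cents
    return subtotal_cents + tax + shipping
```

A merchant who sets `tax_rate_by_region={"CA": 0.0725}` and raises the free-shipping threshold to $100 gets different totals from the same code: a $40 order shipped to CA comes to $47.90.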

7. Debuggability

Definition: The capability of a system or platform to be easily diagnosed for errors or issues, enabling faster and more efficient troubleshooting.

Example: A cloud-based application platform might provide detailed error logs, real-time monitoring dashboards, and tools to recreate user sessions. These features help developers identify the root cause of issues quickly, without having to sift through mountains of data or guess at potential problems.
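Detailed error logs are far easier to search when they are structured and carry a correlation ID that follows a request across services. Here is a minimal sketch using Python's standard `logging` module; the `request_id` field name and the `make_logger` helper are assumptions for illustration.

```python
import json
import logging
from typing import Optional, TextIO


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so tools can filter by field."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Correlation ID supplied via `extra=`; None if the caller omitted it.
            "request_id": getattr(record, "request_id", None),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name: str, stream: Optional[TextIO] = None) -> logging.Logger:
    """Logger that writes JSON lines to `stream` (stderr when None)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger
```

Usage: `make_logger("checkout-service").error("payment failed", extra={"request_id": "req-123"})` produces a line that can be found by searching for `req-123` in every service's logs.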

8. Deployability

Definition: The ease with which a system, application, or platform can be transitioned from a development environment to a production environment, considering factors like automation, consistency, and risk.

Example: Consider a microservices architecture where each service is containerized using a tool like Docker. With container orchestration tools such as Kubernetes, new versions of services can be rolled out smoothly, tested in isolation, and scaled as needed. This architecture and toolset ensure high deployability.
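Kubernetes performs rolling updates for you through its Deployment controller; the following is only a conceptual model in plain Python of what "rolled out smoothly" means: replace instances in small batches, and stop and roll back if a batch turns out unhealthy. `rolling_update` and its health-check callback are hypothetical and do not mirror the Kubernetes API.

```python
from typing import Callable, List


def rolling_update(
    instances: List[str],
    new_version: str,
    batch_size: int,
    health_check: Callable[[str], bool],
) -> List[str]:
    """Upgrade instances batch by batch; roll back everything if a batch fails.

    Each entry in `instances` is the version that instance is running, and
    `health_check(version)` stands in for probing a freshly updated instance.
    """
    original = list(instances)
    current = list(instances)
    for start in range(0, len(current), batch_size):
        batch = range(start, min(start + batch_size, len(current)))
        for i in batch:
            current[i] = new_version
        # Only move on once the freshly updated batch reports healthy.
        if not all(health_check(current[i]) for i in batch):
            return original  # roll back to the last known-good state
    return current
```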

9. Durability

Definition: The ability of a system or platform to withstand wear, pressure, or damage, ensuring longevity and reliable performance over time.

Example: A data storage platform might utilize techniques like data replication across multiple geographic locations, checksum validation for data integrity, and regular backups. Such measures ensure the platform retains and protects data over the long term, even in the face of hardware failures or other unforeseen issues.
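Replication and checksum validation work together: replicas give you spare copies, and checksums tell you which copy is still correct. The toy store below (a hypothetical `ReplicatedStore`, with replicas modeled as dictionaries) writes every object to all replicas with a SHA-256 checksum, serves reads from the first copy that still matches, and repairs the others. It is a sketch only; for example, it keeps checksums in a single place, which a real system would also replicate.

```python
import hashlib
from typing import Dict, List


class ReplicatedStore:
    """Toy object store: every write goes to all replicas with a SHA-256 checksum."""

    def __init__(self, replica_names: List[str]) -> None:
        self.replicas: Dict[str, Dict[str, bytes]] = {name: {} for name in replica_names}
        self.checksums: Dict[str, str] = {}

    def put(self, key: str, data: bytes) -> None:
        self.checksums[key] = hashlib.sha256(data).hexdigest()
        for store in self.replicas.values():
            store[key] = data

    def get(self, key: str) -> bytes:
        expected = self.checksums[key]
        for store in self.replicas.values():
            data = store.get(key)
            if data is not None and hashlib.sha256(data).hexdigest() == expected:
                self._repair(key, data)  # heal corrupted or missing copies
                return data
        raise OSError(f"all replicas of {key!r} are missing or corrupted")

    def _repair(self, key: str, good: bytes) -> None:
        for store in self.replicas.values():
            store[key] = good
```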

10. Manageability

Definition: The capability of a system or platform to be efficiently administered, monitored, and maintained, often through user-friendly interfaces or automation tools.

Example: An IT infrastructure management platform might offer dashboards that display real-time system health metrics, automated alerts for potential issues, and tools for batch software updates or configurations. This allows IT teams to oversee vast infrastructures with efficiency and proactivity, rather than being reactive to issues.
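Automated alerting is, at its core, a comparison of live metrics against declared thresholds. The sketch below assumes hypothetical metric names, thresholds, and host names; real platforms typically add time windows and deduplication on top of this.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str
    max_value: float
    severity: str


def evaluate_alerts(metrics: Dict[str, Dict[str, float]], thresholds: List[Threshold]) -> List[str]:
    """Return one alert message per host/metric pair that breaches its threshold."""
    alerts = []
    for host, values in sorted(metrics.items()):
        for t in thresholds:
            value = values.get(t.metric)
            if value is not None and value > t.max_value:
                alerts.append(f"[{t.severity}] {host}: {t.metric}={value} > {t.max_value}")
    return alerts
```

Declaring thresholds as data, rather than hard-coding checks, is what lets one team manage alert policy for a large fleet from a single place.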

11. Operability

Definition: The ability of a system or platform to be easily and efficiently operated within its expected environment and conditions, with minimal intervention.

Example: An online content management system (CMS) might have a user-friendly interface, clear documentation, and built-in monitoring tools. These features allow content creators and website administrators to smoothly operate the CMS, publish content, monitor website health, and troubleshoot minor issues without deep technical expertise.

12. Portability

Definition: The capability of a system or platform to be transferred or adapted easily from one environment to another without significant changes.

Example: A software application written in a cross-platform development framework like React Native can be deployed to multiple operating systems (e.g., iOS, Android) with minimal changes. This portability allows developers to reach a broader audience without creating separate versions for each operating system.

13. Predictability

Definition: The degree to which a system or platform behaves consistently under given conditions, producing expected outcomes without surprises.

Example: A cloud-based storage service might guarantee a specific latency and throughput for file access. Clients of this service can then design their applications knowing these specifications will be met, ensuring predictable performance for end-users.
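Latency guarantees are usually expressed as percentiles (for example, "99% of requests complete within 100 ms"), so checking predictability means computing a percentile over measured latencies. Here is a minimal sketch using the nearest-rank definition; the function names and the 100 ms limit are illustrative assumptions.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_slo(samples_ms: Sequence[float], pct: float, limit_ms: float) -> bool:
    """True when the pct-th percentile latency is within the promised limit."""
    return percentile(samples_ms, pct) <= limit_ms
```

With latencies of 1 to 100 ms, the p99 is 99 ms, so a 100 ms promise holds while a 98 ms promise would not.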

14. Recoverability

Definition: The ability of a system or platform to return to a state of normalcy after a disruption or failure, often through backup, restoration, or failover processes.

Example: A database system might have automated backups taken every hour and stored in multiple geographical locations. If a data center faces an issue (e.g., a power outage or hardware failure), the database can be quickly restored from the most recent backup in another location, ensuring minimal data loss and downtime.
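The restore decision in this example can be sketched as: pick the newest backup that lives outside the failed location, and note how much data was written after it (the actual data loss). The `Backup` type, location names, and timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Backup:
    location: str
    taken_at: datetime


def choose_restore_point(
    backups: List[Backup], failed_locations: Set[str], failure_time: datetime
) -> Tuple[Backup, timedelta]:
    """Pick the newest usable backup and return it with the resulting data-loss window."""
    usable = [
        b for b in backups
        if b.location not in failed_locations and b.taken_at <= failure_time
    ]
    if not usable:
        raise RuntimeError("no usable backup; data cannot be recovered")
    latest = max(usable, key=lambda b: b.taken_at)
    return latest, failure_time - latest.taken_at
```

With hourly backups copied to two locations, an outage at 13:20 in one location is restored from the 13:00 copy in the other, losing 20 minutes of data; hourly backups bound that loss to at most about an hour.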

15. Reliability

Definition: The capability of a system or platform to perform its required functions accurately and consistently over time, even under adverse conditions.

Example: A global e-commerce platform needs to handle thousands of transactions per minute. The platform uses redundant server architectures, real-time data replication, and advanced error-handling mechanisms. As a result, even if individual components fail, the platform as a whole remains operational, ensuring customers can make purchases without interruptions.
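One of the most common error-handling mechanisms behind this kind of reliability is retrying transient failures with exponential backoff and jitter. The helper below is a minimal sketch; its name, defaults, and the choice of which exceptions count as transient are assumptions you would tune per dependency.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The upper bound on the wait doubles each attempt (0.1s, 0.2s, 0.4s, ...);
            # waiting a random time below it keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so tests can skip the real waits; in production the default `time.sleep` is used.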

16. Reusability

Definition: The capability of a system, component, or platform to be used in multiple scenarios or applications without significant modification.

Example: A software library developed for image processing can be integrated into various applications, such as photo editing software, augmented reality apps, and camera apps. The reusable nature of the library saves development time and ensures consistency across different products.

17. Scalability

Definition: The ability of a system or platform to handle growing amounts of work or to be enlarged to accommodate that growth.

Example: A social media platform initially designed for a few thousand users relies on cloud-based infrastructure and distributed databases. As its user base grows into the millions, it can dynamically allocate more resources and distribute traffic to handle the increase, demonstrating scalability.
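"Dynamically allocate more resources" usually means an autoscaler that adds or removes instances in proportion to load. The Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue); the sketch below implements just that proportional rule with min/max bounds. The real controller adds refinements such as a tolerance band and stabilization windows, which are omitted here, and the function name is hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 100,
) -> int:
    """Proportional scaling rule, clamped to [min_replicas, max_replicas]."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, 4 replicas running at 90% CPU against a 60% target scale out to 6, while 10 replicas at 30% scale in to 5.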

18. Stability

Definition: The ability of a system or platform to remain functional and unaltered in its intended purpose despite disturbances or disruptions.

Example: An online banking platform, even during peak transaction times or external cyber-attacks, maintains its performance, ensures data integrity, and doesn’t crash, showcasing its stability.
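One technique that helps a platform stay up under peak load is load shedding: rejecting excess requests early so the backend stays within its capacity instead of crashing. The sketch below uses a token bucket; the class name, rates, and injectable clock are illustrative assumptions, not a description of any bank's platform.

```python
import time
from typing import Callable


class TokenBucket:
    """Admit on average `rate` requests per second, with bursts up to `capacity`.

    Requests beyond that are rejected immediately (load shedding) rather than
    queued until the backend is overwhelmed.
    """

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A rejected request can be answered with a fast "try again later" response, which is far cheaper for the system than a timeout or a crash.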

19. Testability

Definition: The ease with which a system or platform can be tested to ensure it works as intended and to detect defects or issues.

Example: A software application is built using a modular architecture, and each module comes with a set of automated unit tests. This structure allows developers to easily verify the functionality of individual parts and of the system as a whole, ensuring high testability.
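Here is a small illustration of what "each module comes with a set of automated unit tests" can look like, using Python's built-in `unittest`. The `apply_discount` module is hypothetical; the point is that a small, pure function with no I/O is trivial to test in isolation.

```python
import unittest


# --- module under test: small, pure, and free of I/O, so it is easy to test ---
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price, rounded down to whole cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


# --- automated unit tests that ship alongside the module ---
class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self) -> None:
        self.assertEqual(apply_discount(1000, 15), 850)

    def test_rounds_down_to_whole_cents(self) -> None:
        self.assertEqual(apply_discount(999, 10), 899)

    def test_rejects_invalid_percent(self) -> None:
        with self.assertRaises(ValueError):
            apply_discount(1000, 120)


if __name__ == "__main__":
    unittest.main(argv=["apply_discount_tests"], exit=False)
```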

20. Traceability

Definition: The ability to chronologically interrelate uniquely identifiable stages, entities, or events in a process or system, allowing for transparency and historical assessment.

Example: In a software development project, every code commit is linked to specific requirements or bug tickets. This linkage ensures that any piece of code can be traced back to its origin or purpose, aiding in understanding changes, making future modifications, or investigating issues.
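Teams often enforce this commit-to-ticket linkage automatically, for example from a Git `commit-msg` hook or a CI check. The sketch below assumes a hypothetical ticket-key convention like `PAY-1423` (uppercase project key, dash, number); adjust the pattern to your tracker's format.

```python
import re
from typing import List

# Hypothetical convention: ticket keys look like "PAY-1423".
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def referenced_tickets(commit_message: str) -> List[str]:
    """Return every ticket key mentioned in a commit message, in order."""
    return TICKET_PATTERN.findall(commit_message)


def check_commit_message(commit_message: str) -> None:
    """Reject commits that cannot be traced back to a requirement or bug ticket."""
    if not referenced_tickets(commit_message):
        raise ValueError("commit message must reference a ticket, e.g. 'PAY-1423: fix rounding'")
```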

21. Upgradability

Definition: The ease with which a system or platform can be modified to enhance capabilities or to include new features.

Example: A smart home device has firmware that can be remotely updated by the manufacturer. As new features are developed or bugs are identified, the manufacturer pushes firmware updates, ensuring that users always have access to the latest capabilities without needing to purchase new hardware.
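Before installing a remote update, a device typically checks that the offered version is actually newer and that its current firmware can upgrade in place. The sketch below assumes plain dotted version numbers and a hypothetical `min_supported_from` field published with each update; note that versions must be compared numerically, because as text "2.10.1" sorts before "2.9.9".

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.10.1' into (2, 10, 1) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def should_update(installed: str, available: str, min_supported_from: str) -> bool:
    """Decide whether a device should accept a remote firmware update.

    `min_supported_from` is the oldest installed version this update can upgrade
    in place; older devices would need an intermediate update first.
    """
    return (
        parse_version(available) > parse_version(installed)
        and parse_version(installed) >= parse_version(min_supported_from)
    )
```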

Conclusion:

I referenced numerous terms from the examples set by distinguished engineers, visionaries, and thought leaders in the industry. They laid the groundwork for the next generation to fully utilize these concepts. If any specific term conflicts with a trademark, I am more than willing to replace it with an alternative.

Special thanks to Wikipedia and my mentor, a silent rockstar who prefers to remain anonymous.


Platforms, performance, scalability, SRE, DevOps, Kubernetes, AWS, and big data are my strengths. Technical management and strategic leadership are where I am heading.