From Systems Administrator to Platform Engineer — Roles Beyond Titles
Gone are the days when IT roles were compartmentalised into a few distinct categories. Today, as technology continues to expand and scale at an unprecedented rate, the spectrum of IT roles has broadened immensely. A Google search reveals over 250 distinct IT roles, each with unique responsibilities and expertise (see this example), as illustrated in the word cloud below.
If we narrow the scope to the IT Operations domain and set aside cloud- and technology-specific roles, I would like to focus on four key roles: System Administrators, Operators, Platform Engineers (including Cloud) and Site Reliability Engineers. Why these?
These roles are foundational to modern IT infrastructure. They directly interact with, maintain, and ensure the reliability of the systems upon which all applications run. By understanding these roles, we get a deep insight into the backbone of any technology-driven organisation: System Admins and Operators often represent the traditional IT approach, while Platform Engineers and SREs are more contemporary roles emerging with the DevOps culture. Examining these roles can effectively bridge the gap between legacy systems and new, agile methodologies.
Platform Engineers and SREs (Site Reliability Engineers) are roles born from the DevOps movement. They embody the principles of infrastructure as code, automation, continuous integration, and delivery, making them essential for a DevOps-centric discussion. SREs, in particular, focus on the reliability, uptime, and overall health of applications and services, aligning closely with the goals of DevOps to ensure high availability and quick recovery from failures.
Covering these roles offers a cohesive view of the IT operational landscape from a traditional and modern perspective, allowing for a comprehensive understanding of how different practices, tools, and philosophies have evolved and integrated over time.
This article sets out to clarify a misconception I often see (treating Systems Administrator and Systems Engineer as the same role) and, from there, to arrive at a definition of Platform Engineer (which also differs from Systems Engineer). While at first sight this may look like taxonomy or theorising around titles, it is vital to understand what each role exists for, what it aims to address, and in which context. The situation I see most often is the adoption of new names and designations without understanding their impact and, more importantly, what is required for them to succeed.
I had a manager who often used a fitting analogy to drive home this point: just because someone wears a Cristiano Ronaldo jersey and steps onto the field doesn’t mean they’ll play like him.
With everything as code and cloud-based, do I still need an Infrastructure Team? Why not have the Development Team run it?
Imagine a hospital where surgeons perform operations and also take on the roles of anaesthetists, nurses, and administrative staff. A surgeon might be exceptional at performing a procedure, but if they’re also responsible for administering anaesthesia, preparing surgical instruments, and managing the patient’s paperwork, there’s an increased chance of errors. A misjudged dosage or a poorly sterilised instrument can have dire consequences. Similarly, developers excel at writing and updating code. But when they’re also tasked with managing cloud infrastructure without the proper expertise, inadvertent mistakes can compromise the entire system’s stability and security, putting the digital health of a company at risk. Just as in medicine, specialised roles exist in IT for a reason: to ensure precision, safety, and efficiency.
Consider the following table describing a standard set of knowledge areas and how they are covered by infrastructure and development teams.
While there are overlaps between the areas, you can see that the focus, objective, scope of work and depth of knowledge vary, in some areas considerably.
For technology to address specific requirements, those requirements need to be on your radar in the first place. Several potential issues can therefore arise when development teams lead infrastructure management without appropriate checks and balances. Here’s a list of common flaws or problems associated with this approach (some are not exclusive to development teams running infrastructure, but they are more common in these situations):
- Lack of Operational Best Practices: Developers might prioritise features or speed over operational robustness, potentially leading to systems that work well in development or test environments but falter under real-world loads or scenarios.
- Inadequate Security: Developers may not always be trained in the nuances of infrastructure security, which can result in vulnerabilities being unintentionally introduced or overlooked.
- Scalability Issues: Development-led infrastructure might not anticipate growth or traffic spikes, leading to performance issues when scaling becomes necessary.
- Limited Monitoring and Logging: The focus might be on getting the application to work rather than ensuring it’s adequately monitored or logged, making troubleshooting more challenging.
- Lack of Redundancy: Without a proper infrastructure focus, insufficient redundancy might lead to potential single points of failure.
- Inefficient Resource Utilization: Developers might overprovision (leading to cost inefficiencies) or underprovision resources (leading to performance issues).
- Configuration Drift: Without standardised tools or practices for infrastructure as code, manual changes can lead to configuration drift, where infrastructure states diverge over time.
- Incomplete Documentation: Development teams might prioritise code documentation over infrastructure documentation, making onboarding or troubleshooting more challenging for new team members or outside consultants.
- Neglect of Backup and Disaster Recovery: Backup and disaster recovery might be an afterthought, putting data and system availability at risk.
- Short-term Focus: Infrastructure might be designed with short-term goals rather than long-term sustainability or flexibility.
- Vendor Lock-in: Developers might choose specific services or platforms based on familiarity rather than a broad evaluation, potentially leading to vendor lock-in.
- Inconsistent Environments: Without standardised provisioning and configuration, development, staging, and production environments might differ, leading to unexpected behaviours when deploying.
- Lack of Separation of Duties: Without clear role definitions, responsibilities can go unowned, leading to potential oversights.
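To make one of the items above more concrete, configuration drift can be thought of as the difference between what is declared and what is actually running. The following is only a minimal sketch of that idea: the setting names and values are hypothetical, and real tooling would read the declared state from version control and the live state from the provider rather than from hard-coded dictionaries.

```python
from typing import Any


def find_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (declared_value, live_value)} for every setting that differs.

    A setting missing on one side is reported with None for that side.
    """
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical declared state, as it might be kept in version control.
desired_state = {"instance_type": "m5.large", "min_replicas": 3, "tls": True}
# Hypothetical live state after someone changed settings by hand.
actual_state = {"instance_type": "m5.large", "min_replicas": 1, "tls": True, "debug": True}

drift_report = find_drift(desired_state, actual_state)
for key, (want, have) in drift_report.items():
    print(f"{key}: declared={want!r} live={have!r}")
```

An empty report means the environment still matches its declaration. Anything else is a manual change that was never written back, or a declaration that was never applied.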
While developers can/should undoubtedly be involved in and even lead infrastructure management, especially in DevOps cultures, it’s essential to be aware of these potential pitfalls and address them proactively. Ensuring collaboration between development and dedicated operations or infrastructure professionals can help mitigate these risks.
I have a great Systems Administrator. How does that differ from Systems Engineering? Can’t they become a Systems Engineer?
System administration involves overseeing and managing a computer system's hardware and software, ensuring its availability, performance, and security. On the other hand, system engineering applies engineering principles to design, develop, and maintain systems. While system administrators are often viewed as managers focused on the day-to-day upkeep of systems, system engineers take a broader perspective, encompassing the entire life cycle of a system, from planning to implementation.
As technology has advanced, the roles of system administrators and engineers have seen overlaps, leading to some confusion. With increasing responsibilities, system administrators have found themselves doing tasks typically reserved for engineers, like scripting and automation. Additionally, the DevOps movement has further blurred the lines, calling for tighter collaboration between development and operations.
However, it’s crucial to note that while the roles might intertwine, they each have distinct responsibilities. System administrators focus on ensuring systems run efficiently daily, handling software updates, backups, and user support. System engineers, meanwhile, dive deeper into system architecture, design, and integration, ensuring components work harmoniously. You cannot simply take a Systems Administrator and expect them to start performing as a Systems Engineer.
Transitioning from a Systems Administrator to a Systems Engineer entails facing multiple challenges and adjusting daily activities. Administrators specialise in operating and troubleshooting existing systems, whereas engineers need a comprehensive understanding of system design, integration, and stakeholder management. This change demands a skillset shift, a proactive mindset, enhanced interdepartmental communication, strategic resource allocation, and overcoming resistance to change. Regarding daily tasks, engineers shift from troubleshooting to designing systems and engage more in stakeholder meetings, project planning, risk assessments, and detailed documentation. Additionally, they focus on integration testing and regularly conduct cost analysis and budgeting, ensuring the financial viability of projects.
Even if a Systems Administrator performs engineering tasks in your organisation, understand that the core activities and concerns of the two roles are different.
If I have Systems Engineers, do I need Systems Administrators?
The requirement for both System Engineers (SEs) and System Administrators (SysAdmins) is contingent on an organisation’s size, complexity, and IT needs. SysAdmins excel in day-to-day IT operations: addressing immediate system issues, applying specialised software and hardware skills, providing end-user support, ensuring continuous resource monitoring, and maintaining security and compliance. Their expertise is paramount in larger organisations with intricate IT infrastructures, especially those with legacy systems or significant security and compliance needs.
Conversely, organisations with smaller, simpler IT infrastructures might forgo dedicated SysAdmins. Such decisions are common for companies primarily relying on cloud services, where many operational tasks are cloud-provider-managed. The rise of automation and the DevOps culture can also render some traditional SysAdmin roles redundant or merged with development duties. Similarly, organisations focusing on product development and using third-party IT solutions, or those with budgetary constraints, might have SEs filling both roles until dedicated SysAdmins become feasible.
In conclusion, while there’s a trend, especially in start-ups and tech-focused companies, to merge roles and embrace concepts like DevOps, the traditional Systems Administrator role is still crucial for many organisations. The decision to have both Systems Engineers and Administrators should be based on your IT environment's specific needs and scale. Ideally, having both balances innovation (SEs) and daily operational excellence (SysAdmins).
Understanding How to Scale Systems Engineers
As mentioned above, system engineering applies engineering principles to design, develop, and maintain systems. An enterprise runs multiple business applications, so assigning engineers per application or system quickly creates a need for many engineers. In most cases, each engineer ends up with a pile of workloads and not enough depth on any one of them.
On the other hand, you cannot keep adding more engineers and expect linear improvements in output or efficiency. As the number of teams and their size grow, communication, management, and coordination overhead increases, leading to diminishing returns.
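One simple way to picture this overhead is to count how many distinct pairs of people could need to talk to each other. The sketch below is a deliberate simplification of my own, not a measurement: it assumes everyone may need to coordinate with everyone else, which is the worst case.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct person-to-person pairs in a group of the given size."""
    return team_size * (team_size - 1) // 2


for size in (3, 6, 12, 24):
    print(f"{size:>2} engineers -> {communication_paths(size):>3} possible pairs")
```

Doubling the group from 12 to 24 people doubles the capacity on paper, but the number of possible pairs goes from 66 to 276, roughly four times as many.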
Well, this problem doesn’t affect only engineers, and Matthew Skelton and Manuel Pais, in their book “Team Topologies”, provide a strategic framework to address these challenges.
I strongly recommend you read the book, as I am only touching the surface of the description and characteristics of each of the concepts.
“Team Topologies” emphasises the importance of organising IT teams based on the flow of change and the cognitive load of the systems they’re responsible for. The book presents four fundamental team patterns (Stream-aligned, Enabling, Complicated Subsystem, and Platform teams) and three interaction modes (Collaboration, XaaS, and Facilitation) as a framework. The central conclusion is that by deliberately designing and evolving the structure and interactions of teams in response to the software systems’ needs, businesses can achieve faster flow, higher reliability, and improved adaptability in their software delivery, leading to a more effective organisation overall.
Even if not with the consistency stated by the authors of “Team Topologies”, the type of relationship that we see more often resembles the collaboration interaction pattern. Trying to rely on this type of interaction will lead to the scalability challenges mentioned.
Consider that XaaS interaction comes at a price: it assumes a level of standardisation that, in most organisations, does not exist by default and does not apply to everything (heterogeneous workloads/systems).
Platform Engineering Defined
It should be clear what Platform Engineering is by now. Merge the definition of System Engineers with the will to provide a specific platform/service as a service (XaaS Interaction) and you get:
Platform Engineering involves designing, implementing, and managing a technological foundation (or platform) that accelerates software delivery by providing easy-to-consume, standardised, and reusable solutions and tools. This platform typically abstracts infrastructure complexities, offers services, and embeds best practices, enabling development teams to focus on creating business-specific functionalities quickly and reliably. A platform engineer, therefore, not only ensures the technical robustness of the platform but also strives to enhance developer experience and productivity.
For successful platform engineering, prioritising standardisation is crucial while keenly addressing platform consumers' needs. A service can be highly scalable and easily maintainable, but its adoption will be limited if it doesn’t cater to the spectrum of consumer requirements.
Reflecting on the differences between sysadmins and engineers underscores engineers' need to evolve. They must move beyond traditional practices and controls, embrace a holistic, loosely coupled view of services rather than just deployment for a purpose, and adopt a ‘shift-left’ approach, as the “service” will need to exist and be maintained independently of the lifecycle of the teams and workloads that use it.
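As an illustration of what “easy-to-consume, standardised, and reusable” can look like, here is a minimal sketch of a platform interface. Everything in it is hypothetical (the `ServiceRequest` fields, the default values, the size names); the point is only the shape: the consumer supplies very little, and the platform fills in the agreed standards while allowing a limited set of overrides.

```python
from dataclasses import dataclass, field

# Hypothetical platform-wide standards, owned by the platform team.
PLATFORM_DEFAULTS = {
    "replicas": 2,
    "monitoring": True,
    "backups": True,
    "tls": True,
}
# Hypothetical size catalogue: size name -> vCPUs.
ALLOWED_SIZES = {"small": 1, "medium": 2, "large": 4}


@dataclass
class ServiceRequest:
    """What a development team has to supply: deliberately very little."""
    name: str
    size: str = "small"
    overrides: dict = field(default_factory=dict)


def render_spec(request: ServiceRequest) -> dict:
    """Expand a short request into a full, standardised deployment spec."""
    if request.size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    unknown = set(request.overrides) - set(PLATFORM_DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported overrides: {sorted(unknown)}")
    spec = {"name": request.name, "vcpus": ALLOWED_SIZES[request.size]}
    spec.update(PLATFORM_DEFAULTS)   # best practices applied by default
    spec.update(request.overrides)   # consumers may tune only known settings
    return spec


spec = render_spec(
    ServiceRequest(name="orders-api", size="medium", overrides={"replicas": 3})
)
print(spec)
```

The trade-off discussed above is visible in the two validation checks: each rejected size or override keeps the platform standard, but it is also a consumer need the platform does not yet serve.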
In the context of GitOps/DevOps, the “shift-left” approach refers to a methodology where teams aim to address and rectify issues earlier in the software development lifecycle (SDLC) rather than during the later stages. By shifting tasks, testing, and feedback mechanisms “to the left” (meaning earlier in the process), teams can identify and remedy defects, security vulnerabilities, and other concerns at a stage where they are easier and less costly to fix.
The shift-left strategy is especially pertinent in DevOps and GitOps environments because it emphasises continuous integration, continuous delivery, and rapid, iterative development. Implementing a shift-left approach often involves more thorough early-stage testing, increased developer responsibility for code quality, and early collaboration between development and operations teams. The overall goal is to improve software quality, speed up delivery cycles, reduce costs associated with late-stage defects, and ensure security and compliance from the outset.
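In practice, shifting left often means running checks against a proposed change before it is merged, rather than discovering the problem in production. The sketch below assumes a hypothetical set of rules and a hypothetical proposed configuration; a real pipeline would load the change from the pull request and fail the job when the exit code is non-zero.

```python
from typing import Callable, Optional

# Hypothetical rules a team might agree on.
# Each returns an error message, or None when the rule is satisfied.
def require_tls(config: dict) -> Optional[str]:
    return None if config.get("tls") is True else "tls must be enabled"


def require_backups(config: dict) -> Optional[str]:
    return None if config.get("backups") is True else "backups must be enabled"


def require_owner(config: dict) -> Optional[str]:
    return None if config.get("owner") else "an owner must be declared"


POLICIES: list[Callable[[dict], Optional[str]]] = [
    require_tls,
    require_backups,
    require_owner,
]


def validate(config: dict) -> list[str]:
    """Run every policy and collect the violations, in policy order."""
    return [msg for policy in POLICIES if (msg := policy(config)) is not None]


# Hypothetical proposed change, as it might appear in a pull request.
proposed = {"name": "orders-api", "tls": True, "backups": False}

violations = validate(proposed)
for message in violations:
    print("FAIL:", message)

# A CI job would typically exit with this code to block the merge.
exit_code = 1 if violations else 0
```

The same rules could be checked after deployment, but by then a missing backup policy is already a production risk instead of a review comment.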
Conclusion
In the evolving landscape of technology, understanding the distinctions and overlaps between roles such as Systems Administrator, Systems Engineer, and Platform Engineer is crucial. While intertwined, these roles offer unique value to an organisation.
The shift towards DevOps and cloud-based operations underlines the need for specialised knowledge in development and infrastructure realms.
Balancing innovation and daily operational excellence becomes pivotal. As the IT ecosystem continues to transform, roles must adapt, highlighting the importance of a cohesive team structure and the adoption of modern methodologies like GitOps. Grasping these nuances ensures businesses remain agile, efficient, and ahead of the curve.
Please note that the opinions and views expressed in this article are solely my own and do not represent my employer’s official position or policies. This is a personal commentary based on my experiences and thoughts, and although I aim for accuracy, there may be errors or omissions in the content.