AI Ops and NetOps Evolution

Jamie McGregor
Deloitte UK Engineering Blog
10 min read · Jan 25, 2024

An overview

Throughout the evolution of network operations, tools have consistently played a vital role. Network Operations (NetOps) is a term used to describe the operational activities involved in the lifecycle of network infrastructure. These activities include the management, monitoring and maintenance of networks, and this is where AI will have an impact. Traditionally, monitoring is done through dashboards, the “eyes” of network engineers watching network health. These dashboards combine real-time and historical data into a consumable overview. However, the emergence of AI technologies is reshaping this landscape.

Alongside the growth of intent-based networking, AI has emerged as a significant change in NetOps. By harnessing machine learning algorithms and predictive analytics, AI-driven NetOps enables companies to gain insights into their network behaviour and take a proactive approach to identifying and rectifying potential vulnerabilities, bottlenecks, and performance issues. This shift from reactive to proactive network management has not only enhanced operational agility but also translated into cost savings and improved operations.

As we delve into the trends that are shaping the AI and NetOps landscape, it becomes evident that the next phase will see even more advancements in AI for networking and cyber security. These advancements include predictive threat modelling, intelligent threat response and enhanced user authentication protocols. With technologies such as 5G, IoT (Internet of Things), and the ever-expanding digital ecosystem, the collaboration between AI, NetOps, and cyber security is poised to create a future where networks adapt and evolve to protect against emerging cyber threats.

AI and NetOps Transformation for Networks and Security

As we explore the potential, it is clear that AI and NetOps will keep progressing. The next stage may see more AI integrated into platforms, along with more flexible operating methodologies. Businesses will strive to interpret business objectives and implement measures to minimise manual intervention within their infrastructure. In this ever-changing scenario, AI’s contribution to network troubleshooting and management will play a significant role in propelling networks forward in an expanding digital world. This shift is already beginning to improve efficiency. One example is customer service chatbots, which help gather more troubleshooting details from customers on network support tickets. A more direct example is the growing number of vendors using AI and ML as part of their management or analytics engines, such as Aruba Networks Central for wireless management or Cisco AI Network Analytics. These efficiencies enable businesses to allocate more funds towards projects that boost innovation.

AI and NetOps Architecture for Networking

This new approach to AI and NetOps architecture poses challenges for companies, chiefly around the vast amounts of data and events being collected. Companies must centralise this information and act on these events. Once the data is centralised, it becomes possible to examine what is being collected, simplify the infrastructure, and learn what is important to surface for the network administrator to act on. Figure 1 gives a visual representation of network data being collected into a network/security platform. The workflow, from data collection through to the actions taken, is as follows:

1. Data Collection:

Network devices and sensors collect vast amounts of data related to traffic, performance, and security events.

2. Data Aggregation:

Collected data is aggregated and sent to a central platform.

3. Data Pre-processing:

Raw data is pre-processed to clean and format it for analysis. This step may involve data normalisation, transformation, and filtering.

4. Data Storage:

Processed data is stored in databases or data lakes for historical analysis and reference.

5. AI Analytics Layer:

Machine Learning Models: AI models, such as anomaly detection, pattern recognition, and predictive analytics, are applied to the data to identify trends, anomalies, and potential issues.

Training: Models are trained on historical data to improve accuracy and performance.

6. Decision Making:

Automated Actions: Based on AI analysis, automated actions can be taken to optimise network performance, mitigate threats, and resolve issues.

Alerts and Notifications: Notifications are sent to network administrators for critical situations that require human intervention.

7. Feedback Loop:

Data from automated actions and human interventions is collected to refine AI models over time, improving their accuracy and effectiveness.

8. Visualisation and Reporting:

Insights from AI analysis are visualised through dashboards and reports, providing a clear view of network health, performance, and security.

9. Network Administrator:

Network administrators work in tandem with AI, using the insights provided to make informed decisions and implement strategic changes.

Figure 1 — AI Process Model
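To make the workflow more concrete, here is a minimal sketch of steps 3 to 7 using only the Python standard library. It is an illustration under stated assumptions, not a description of any vendor platform: the latency figures, the three-standard-deviation threshold, the `auto_limit` cut-off and all function names are hypothetical, and a simple z-score check stands in for the machine learning models a real platform would use.

```python
from statistics import mean, stdev

def preprocess(raw_samples):
    """Step 3: drop missing readings and normalise everything to floats."""
    return [float(s) for s in raw_samples if s is not None]

def train_baseline(history):
    """Step 5 (training): learn the mean and standard deviation of past data."""
    return mean(history), stdev(history)

def detect_anomalies(samples, baseline, threshold=3.0):
    """Step 5 (model): flag samples more than `threshold` deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return []
    return [(i, x) for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]

def decide(anomalies, auto_limit=150.0):
    """Step 6: automate milder cases, alert an administrator for severe ones."""
    return [
        (i, x, "auto_remediate" if x <= auto_limit else "alert_admin")
        for i, x in anomalies
    ]

def feedback(history, samples, anomalies):
    """Step 7: fold samples judged normal back into the history for retraining."""
    flagged = {i for i, _ in anomalies}
    return history + [x for i, x in enumerate(samples) if i not in flagged]

# Hypothetical link latency readings in milliseconds (None = missed poll).
history = preprocess([20, 22, None, 19, 21, 23, 20, 22, 21, 20])
baseline = train_baseline(history)

live = preprocess([21, 20, 95, 22, None, 400])
anomalies = detect_anomalies(live, baseline)
actions = decide(anomalies)
history = feedback(history, live, anomalies)

for index, value, kind in actions:
    print(f"sample {index}: {value} ms -> {kind}")
```

The split between `auto_remediate` and `alert_admin` mirrors the two outcomes of the decision-making step: the milder spike is handled automatically, while the severe one is escalated to a person.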

Network Management

The trend in network management is shifting from managing platforms on-premises to deploying and managing services in the cloud. Cloud-based management offers many advantages, including quick and easy provisioning, effortless scalability, and flexible configuration changes. Examples can be found on every cloud platform, such as connectivity services on Azure (e.g. Azure DNS) or edge networking services on AWS (e.g. Amazon CloudFront). This approach reduces operational overhead and creates a more efficient model for network management.

Cloud management offers the advantage of centralised control, exemplified by a unified dashboard. Network administrators are empowered through this dashboard to configure, monitor and maintain network devices scattered across several geographical locations. This centralisation streamlines the sophisticated web of management tasks, provides greater visibility, and transforms troubleshooting processes.

The allure of cloud management is further elevated by the prospect of remote accessibility, an imperative facet in today’s era of widespread remote work arrangements. Cloud management, in its essence, empowers network administrators with the capability to remotely access and manage network resources. The ability to swiftly respond and adjust to network issues from virtually anywhere becomes important, leading to accelerated incident resolution and an overall augmentation of operational agility.

The increasingly popular subscription-based or pay-as-you-go pricing model dispenses with the need for upfront capital expenditure on hardware. The resulting reduction in maintenance costs and a more predictable cost landscape appeal to organisations seeking to maximise resource utilisation, provided that appropriate guardrails and systematic reviews are in place to ensure cost compliance. On-premises infrastructure still has benefits for some businesses: for example, you might want a fixed, confirmed monthly cost, although this could mean less flexibility in adopting new features.

The deployment of network services and updates is yet another factor where cloud-based solutions shine. This marks a departure from the era of manual device configuration, where the introduction of updates or alterations necessitated painstaking efforts. Cloud management streamlines the process by enabling administrators to cascade updates and modifications from a centralised vantage point, thereby ensuring consistency in configurations and the timely rollout of security patches.
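As a rough illustration of what "consistency in configurations" can mean in practice, the sketch below compares the settings a central platform intends each device to have against what each device reports. Everything here is hypothetical: the device names, the setting keys and the `find_drift` helper are invented for the example, and a real platform would pull this data from its own inventory and device APIs.

```python
def find_drift(intended, running):
    """Return {device: {setting: (intended, actual)}} for every mismatch."""
    drift = {}
    for device, wanted in intended.items():
        actual = running.get(device, {})
        diffs = {
            key: (value, actual.get(key))
            for key, value in wanted.items()
            if actual.get(key) != value
        }
        if diffs:
            drift[device] = diffs
    return drift

# Hypothetical template pushed from the central dashboard to every site.
template = {"ntp_server": "10.0.0.1", "firmware": "4.2.1"}
intended = {
    name: dict(template)
    for name in ("edge-lon-01", "edge-man-01", "edge-edi-01")
}

# Hypothetical state reported back by the devices.
running = {
    "edge-lon-01": {"ntp_server": "10.0.0.1", "firmware": "4.2.1"},
    "edge-man-01": {"ntp_server": "10.0.0.1", "firmware": "4.1.9"},
    "edge-edi-01": {"ntp_server": "10.9.9.9", "firmware": "4.2.1"},
}

drift = find_drift(intended, running)
for device, diffs in drift.items():
    for setting, (wanted, actual) in diffs.items():
        print(f"{device}: {setting} is {actual}, expected {wanted}")
```

A report like this is what lets an administrator see at a glance which sites missed a patch or were changed by hand.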

Cloud management’s appeal is further strengthened by its advanced analytics and reporting functionality. These insights, a direct outcome of centralised cloud management, give network administrators valuable data. They highlight ways to optimise network performance, uncover emerging trends, and shorten the path to more informed decisions. This centralised data can also be used to power AI-driven algorithms (e.g. Dynatrace), which play a pivotal role in predicting and mitigating potential network issues, dynamic network allocation, security and threat intelligence, and intent-based networking. This allows network administrators to allocate more time to innovation rather than maintenance.

With data and telemetry gathered and correlated, you can detect unusual network behaviour using a cloud-based network management platform. Centralised management gives you a single pane for your cyber security and network, bringing correlation and additional context. AI network/security management platforms reduce the need for manual digging, giving greater insight and a clearer picture of your security landscape. One of the key issues engineers face is the sheer number of tools needed to keep track of the health of the infrastructure. There has always been a push for a centralised single pane or command centre. However, the underlying foundations must be built first so that these systems can complement each other or be consolidated, minimising the risk of missing important information.
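The idea of correlation can be sketched in a few lines. The example below groups events from several monitoring tools by device and time window, then keeps only the groups where more than one tool reported something. The event fields, the 60-second window and the source names are assumptions made for illustration, and real platforms use far richer correlation logic than this.

```python
from collections import defaultdict

def correlate(events, window_seconds=60):
    """Group each device's events when they follow one another within the window,
    then keep only groups reported by more than one source."""
    by_device = defaultdict(list)
    for event in sorted(events, key=lambda e: e["ts"]):
        by_device[event["device"]].append(event)

    groups = []
    for device, items in by_device.items():
        current = [items[0]]
        for event in items[1:]:
            if event["ts"] - current[-1]["ts"] <= window_seconds:
                current.append(event)
            else:
                groups.append((device, current))
                current = [event]
        groups.append((device, current))

    return [
        (device, group)
        for device, group in groups
        if len({e["source"] for e in group}) > 1
    ]

# Hypothetical events from three separate tools (timestamps in seconds).
events = [
    {"ts": 100, "device": "fw-01", "source": "firewall", "msg": "port scan detected"},
    {"ts": 130, "device": "fw-01", "source": "netflow", "msg": "traffic spike"},
    {"ts": 145, "device": "fw-01", "source": "snmp", "msg": "cpu high"},
    {"ts": 900, "device": "fw-01", "source": "snmp", "msg": "cpu high"},
    {"ts": 120, "device": "sw-07", "source": "snmp", "msg": "interface flap"},
]

incidents = correlate(events)
for device, group in incidents:
    print(device, [e["msg"] for e in group])
```

Three separate alerts on three separate dashboards become one incident with context, which is the practical benefit a single pane is meant to deliver.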

AI and DevOps have a seamless union that serves to infuse newfound agility into network operations. Infrastructure alterations and adjustments are adapted to work with application deployment cycles, bringing in changes that are orchestrated in tandem, fostering an environment of iterative evolution.

As we navigate networking, cloud management is moving towards greater agility, efficiency, and adaptability. Cloud-based management empowers organisations to streamline practices, increase remote accessibility, and respond to network demands.

Network Transformation with Overlay Networks

The way we view the network is shifting. The concept of underlay and overlay networks has introduced a new method for network administrators to manage and optimise their networks. These concepts have changed the landscape of networking by offering more flexibility, scalability, and control over network resources. For example, in the good old days, changes to network routing would require manual configuration and sometimes site visits. Now overlay software-defined networks allow for centralised control and elastic bandwidth adjustments on an as-needed basis. These adjustments can be made by an engineer, but AI-enabled predictive analysis would be an equally good fit for this type of capacity management.
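As a simple stand-in for predictive capacity analysis, the sketch below fits a straight-line trend to daily peak utilisation and estimates how many days remain before a link crosses a chosen share of its capacity. The utilisation figures, the 80% threshold and the function names are hypothetical, and an ordinary least-squares line is used here in place of whatever model a production platform would apply.

```python
def fit_trend(values):
    """Ordinary least-squares line through the points (0, v0), (1, v1), ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def days_until_threshold(values, capacity, threshold=0.8):
    """Days after the last sample until the trend reaches threshold * capacity.

    Returns None when utilisation is flat or falling."""
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    crossing_day = (threshold * capacity - intercept) / slope
    return max(0.0, crossing_day - (len(values) - 1))

# Hypothetical daily peak utilisation (Mbps) on a 1000 Mbps overlay tunnel.
daily_peak = [500, 520, 540, 560, 580, 600, 620]
days = days_until_threshold(daily_peak, capacity=1000)
print(f"Projected to reach 80% of capacity in about {days:.0f} days")
```

An estimate like this could either prompt an engineer to raise the bandwidth or feed an automated adjustment through the overlay controller.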

The traditional approach uses a single physical network infrastructure with no logical abstraction. In the transformed model, the underlay network is the underlying physical infrastructure of the network and its links, while overlay networks are virtual networks created on top of the underlay using methods such as software-defined networking (SDN). Overlays act as abstraction layers, allowing for easier management and greater customisation of these virtual networks.

The impact of underlay and overlay networks can be seen in several key areas:

1. Flexibility and Scalability:

Underlay and overlay networks provide a more flexible and scalable network. Underlay networks can be optimised for physical connectivity, whereas overlays can be created, modified, and scaled when needed without significant changes to the underlay.

2. Virtualisation and Multi-Tenancy:

Overlay networks make segmentation possible, allowing applications to share the same physical infrastructure without interfering with each other. This isolation enhances security and simplifies network management. It can be seen as a newer take on the segmentation traditionally achieved with VLANs (Virtual Local Area Networks).

3. Centralised Management and Automation:

Overlay networks are often managed through centralised controllers, enabling network administrators to configure, monitor, and manage the entire network from a single interface. Automation can be leveraged to dynamically adjust overlay configurations based on network traffic and demands, leading to more efficient resource utilisation.

4. Traffic Optimisation:

Overlay networks can optimise traffic flows by dynamically rerouting traffic to avoid congestion or bottlenecks in the underlay network.

5. Disaster Recovery and Redundancy:

Overlay networks can be designed to provide redundancy and disaster recovery capabilities. In the event of a failure in the underlay network, overlay networks can automatically reroute traffic to ensure uninterrupted service.

6. Hybrid and Multi-Cloud Deployments:

Overlay networks are particularly valuable in hybrid and multi-cloud environments. They allow organisations to create consistent network architectures across different cloud providers and on-premises data centres, simplifying network management and ensuring seamless connectivity.
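The rerouting behaviour described under traffic optimisation and disaster recovery can be sketched as a shortest-path search over the underlay that skips failed links. The site names, link costs and the `shortest_path` helper below are hypothetical, and Dijkstra's algorithm is used purely as an illustration. Real overlay controllers apply their own path-selection logic.

```python
import heapq

def shortest_path(links, src, dst, failed=frozenset()):
    """Dijkstra over undirected underlay links, ignoring any failed ones.

    Returns (total_cost, path) or None when dst is unreachable."""
    graph = {}
    for (a, b), cost in links.items():
        if (a, b) in failed or (b, a) in failed:
            continue
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None

# Hypothetical underlay: four sites with a link cost (e.g. latency in ms).
underlay = {
    ("lon", "man"): 5,
    ("man", "edi"): 5,
    ("lon", "bhx"): 4,
    ("bhx", "edi"): 9,
}

primary = shortest_path(underlay, "lon", "edi")
backup = shortest_path(underlay, "lon", "edi", failed={("lon", "man")})
print("primary:", primary)
print("after lon-man failure:", backup)
```

The overlay endpoints ("lon" and "edi") stay the same in both cases. Only the path through the underlay changes, which is why services on the overlay can continue uninterrupted.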

In principle, the introduction of underlay and overlay networks has transformed network operations by decoupling the logical network from the physical infrastructure. This separation empowers organisations to achieve greater agility, scalability, and efficiency in managing their networks, contributing to improved overall performance and enhanced user experiences. By separating underlay from overlay, you can see patterns and potential breaches more clearly through the overlay. AI will take a larger role in the overlay: the underlay can be standardised through foundational work, while overlays can be monitored for routing or optimisation based on the business use case.

Conclusion

The combination of AI and NetOps has revolutionised networking, leading to an era of innovation, resilience, and efficiency. Networks are advancing drastically, and administration is simplified thanks to the insights AI provides on network ops and security. This convergence has changed how networks are managed, secured, and optimised. Industry leaders are investing heavily in integrating AI with networks, resulting in a refined landscape of cybersecurity and network performance. AI can process substantial amounts of data in real-time, predict potential issues, and automate responses, helping us adapt to evolving threats.

As the IT industry undergoes a shift, NetOps is adapting to this change by adopting agile methodologies. This is leading to a transformation of the NetOps role, which now includes expertise in security, DevOps, and networking. The abstraction of underlay vs. overlay is contributing to more innovation in the overlay space, enabling a more comprehensive approach to aligning networks with strategic goals, such as hybrid cloud adoption. Thanks to AI and NetOps, management is becoming more centralised, routine tasks are being automated, and proactive actions are being emphasised. This streamlining is helping companies become more efficient, resulting in greater user satisfaction. To prevent financial waste and maximise benefits for the business, it is important to ensure that customers select the appropriate subscriptions that meet their needs. This will help ensure that the business gets the features that will be most beneficial to its operations.

The integration of AI and NetOps has become increasingly important as networks become more complex. This convergence has allowed NetOps to expand beyond traditional limits, creating agile, secure, and responsible networks that can adapt to dynamic demands. Similar AI technologies used in medicine can also benefit the networking industry. The synthesis of technology and operational principles has improved cybersecurity and opened new avenues for growth, ensuring that networks remain resilient pillars of modern enterprises. The future holds promise for further innovations as AI and NetOps propel networking into uncharted territories, where possibilities are limited only by the horizon of imagination. When implementing AI within a network, it is crucial to consider the establishment of governance by the customers themselves. This will help ensure that AI is not misused and is directed towards enabling the user, rather than hindering their progress.

If you are interested in reading more about AI and networks, have a read of my colleagues’ recent experiments with LLMs and data networking.

Contributors: Anna Tsyganova, Nial Majeed, Tarab Shakeb

Note: This article speaks only to my personal views/experiences and is not published on behalf of Deloitte LLP and associated firms, and does not constitute professional or legal advice.

All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this website are for identification purposes only. Use of these names, logos, and brands does not imply endorsement.

Jamie McGregor
Deloitte UK Engineering Blog

Manager at Deloitte. Certified AWS Cloud Practitioner. Designs scalable and resilient network architectures.