Make Network an Autonomous Product

Min He
Jul 9, 2020

Art Work Credit: Khalid Saleh

Since 2018, “Autonomous Things” has been on Gartner’s Top 10 Strategic Technology Trends list. Autonomous things, which include drones, robots, ships, and appliances, exploit AI to perform tasks usually done by humans.

Here we examine a subset of “Autonomous Things”: software systems, and especially their self-optimization capability.

Traditionally, there are two approaches to continuously optimizing a software product:

  1. Closed-source Approach: The company devotes its own engineering resources to enhancing its product continuously.
  2. Open-source Approach: A community works together to improve the product.

Now, with machine learning (ML), a third approach has emerged: the Autonomous Approach. In this approach, a company or open-source community builds the initial product, and continuous improvement then comes from automatically mining the insights hidden in accumulated usage data. More data delivers better optimization.

Aidan Cunniffe, the CEO of Optic, defines the Autonomous Product in his blog “Autonomous Products”:

An Autonomous Product is a product that gets better on its own, without active, manual, participation from engineers. There’s always a place for engineers to write the initial code and hit ‘Run’ but these products will improve themselves much more effectively than any team of humans could.

The Autonomous Approach is exciting in two respects:

  1. A potentially enormous productivity gain. If you invest x resources in improving your product, the gain is linear in x with the Closed-source Approach. With the Open-source Approach, you may get 10x. With the Autonomous Approach, however, it could be exponential, exp(x), because the product improves automatically with minimal human involvement.
  2. The products can optimize themselves to a level that no human team can achieve.

It sounds like magic, but if you look around, many Autonomous Products already exist.

Examples of Autonomous Products

One extreme example of an Autonomous Product is AlphaGo Zero, a computer program developed by Google’s DeepMind to master the game of Go. Training starts from ‘tabula rasa’ (a blank slate), which in the case of Go means placing stones at random. The program then uses “self-play” to generate more data and optimizes its strategy automatically through reinforcement learning. The following chart shows the result. With 40 days of training, AlphaGo Zero became the best Go player in the world. Later, Google published AlphaZero, a generalized version of AlphaGo Zero that learns through “self-play” and became the best in the world at Chess and Shogi without human intervention. The more AlphaZero plays, the better it becomes. Right now, the highest Elo rating for a human Go player is 3765. It takes the system less than 2 days of self-training to become better than any human professional Go player.

AlphaGo Zero Skill Progressive Graph
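
To make the self-play idea concrete, here is a minimal sketch on a toy game rather than Go. It is not AlphaGo Zero’s method, which pairs deep neural networks with Monte Carlo tree search; it is plain tabular Q-learning on single-pile Nim (take 1 to 3 stones, and whoever takes the last stone wins). The game, the function names, and all hyperparameters are assumptions chosen for illustration. The point is only that one value table, playing both sides from a blank slate, can recover a winning strategy from game outcomes alone.

```python
import random


def train_self_play(max_pile=12, episodes=20000, alpha=0.5, epsilon=0.5, seed=0):
    """Learn single-pile Nim purely from self-play (toy example, assumed settings)."""
    rng = random.Random(seed)
    # q[(pile, take)] estimates the outcome (+1 win, -1 loss) for the player to move.
    # Every entry starts at 0.0: the "blank slate".
    q = {(p, a): 0.0 for p in range(1, max_pile + 1) for a in (1, 2, 3) if a <= p}

    def best_value(pile):
        return max(q[(pile, a)] for a in (1, 2, 3) if a <= pile)

    for _ in range(episodes):
        pile = rng.randint(1, max_pile)  # random starting position
        while pile > 0:
            legal = [a for a in (1, 2, 3) if a <= pile]
            if rng.random() < epsilon:
                take = rng.choice(legal)  # explore
            else:
                take = max(legal, key=lambda a: q[(pile, a)])  # exploit
            rest = pile - take
            # The same table plays both sides: the opponent's best outcome
            # from the remaining pile is our worst outcome.
            target = 1.0 if rest == 0 else -best_value(rest)
            q[(pile, take)] += alpha * (target - q[(pile, take)])
            pile = rest
    return q


def greedy_move(q, pile):
    """Pick the move the learned table rates highest."""
    return max((a for a in (1, 2, 3) if a <= pile), key=lambda a: q[(pile, a)])


q_table = train_self_play()
```

After training with these assumed settings, the greedy move from a pile of 10 is to take 2, leaving the opponent a multiple of 4, which is the known winning strategy for this game. Nobody coded that rule; it emerged from the accumulated game data.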

Another example of an Autonomous Product is Google Search. The initial PageRank algorithm that Brin and Page developed used the crowd to train its models. Every time someone linked to another page or clicked a link on Google, Google Search became smarter. Nowadays, Google Search is even more sophisticated, deciding the ranking of a page by aggregating hundreds of “Signals”. RankBrain is an ML/AI-based algorithm that generates Signals from historical data.
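
How links act as training signals can be sketched with the textbook PageRank power iteration. This is a simplified illustration, not Google’s production ranking: the tiny link graph, the function name `pagerank`, and the iteration count are assumptions for the example, and the damping factor of 0.85 is the value commonly quoted for the original algorithm.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page web: a and b link to c, and c links to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"]}
before = pagerank(web)

# Someone publishes a new page d that links to b.
web["d"] = ["b"]
after = pagerank(web)
```

In this toy graph, page b has no incoming links at first and so sits at the minimum rank. Once page d links to it, its rank rises without anyone editing the ranking code; the new usage data alone changed the output.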

Oracle Autonomous Database is yet another example of an Autonomous Product. It is a cloud database that uses machine learning to eliminate the human labor associated with database tuning, security, backups, updates, and other routine management tasks traditionally performed by database administrators (DBAs).

Most RDBMS vendors take one of two common approaches to the database tuning problem:

  1. Optimizing the database internals with rules or heuristics so that it simply runs faster while requiring less tuning. Methods such as in-memory columnar structures have helped in this regard but have not entirely addressed the problem.
  2. Using external software that analyzes the database structure and provides tuning guidance. This approach is nearly always based on known issues and problem patterns, and the software tends to be driven by a static set of rules. Also, even RDBMSs called “autonomic” usually require the DBA to actually take the tuning action because a measure of human judgment is generally required.

This approach has obvious limitations. According to the 2018 IDC (International Data Corporation) report “Oracle’s Autonomous Database: AI-Based Automation for Database Management and Operations”:

Adjustments based on rules and patterns are not sufficient, and the human intervention required to apply such maintenance would make this service too expensive to represent a significant improvement over the prevailing do-it-yourself model.

Oracle Autonomous Database takes a different approach. It uses ML instead of coded rules and heuristics, which has proved more productive, covers more complex scenarios, and delivers better-optimized tuning results. More importantly, the system becomes smarter as more data becomes available.

Can Network Be an Autonomous Product?

Should we build future networking software as Autonomous Products? After all, optimization problems exist in every part of the network O&M problem space, from day -1, day 0, and day 1 to day 2. The level of optimization has a significant impact on Capex and Opex and on the ability to provide innovative new services quickly. Self-optimization is not only highly desirable but also one of the critical requirements of the Autonomous Driving Network vision. So the answer is a definite YES.

We are familiar with the Self-Organizing Networks (SON) concept, which was added on top of the LTE system as a set of features addressing use cases for self-configuration, self-optimization, self-healing, and self-protection. The use cases and accompanying SON features have been gradually enhanced and added as part of the standard.

The “self-optimization” function in SON directly relates to the “Autonomous Product” concept here. In the mobile network, it includes functionalities like:

  • CCO (Coverage and Capacity Optimization) — the adaptation of transmission parameters (e.g., TxPower, antenna tilt and/or azimuth) to overcome coverage holes, improve the overall signal level and quality.
  • MLB (Mobility Load Balancing) — adjusts handover and cell reselection parameters to balance the traffic load among cells (including multi-layer, multi-RAT and multi-carrier options).
  • MRO (Mobility Robustness Optimization) — adjusts handover and cell reselection parameters to minimize handover failure rate and ping-pong rate.
  • SON for ICIC and eICIC (Inter-cell Interference Coordination) — coordination of cell-cluster spectrum usage for interference minimization in time and frequency dimensions, including macro and HetNet scenarios.
  • ESM (Energy Saving Management) — mechanisms for switching off cells during low traffic periods and adjusting neighboring cells transmission parameters for coverage assurance in the areas where the cells were switched off.
  • RACH Optimization — adjustments of the RACH parameters to improve the access probability and decrease access delays in different cells (especially significant at tracking areas borders).
  • SON for AAS (Adaptive Antenna Systems) — cell splitting/merging to achieve a balance between capacity improvements and interference.

SON brings automation to network operations. But SON functions would not necessarily qualify as an Autonomous Product if they are achieved through explicit rule-based coding or heuristics.

For the network to become an Autonomous Product, its SON optimization algorithms need to be replaced by ML-based algorithms.

Below is one example of LTE CCO (Coverage and Capacity Optimization) using the Fuzzy Q-Learning algorithm. Q-Learning (QL) is a particular type of Reinforcement Learning (RL) that can solve optimization problems when the system model is not available as a closed-form expression. The following charts demonstrate that ML-based self-optimization algorithms deliver good results.

Performance gain with ML-based LTE parameters self-optimization (Moazzam I. Tiwana, 2014)
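
For readers who have not met Q-learning before, here is a minimal sketch of the idea applied to a CCO-like toy problem. It uses plain tabular Q-learning, not the fuzzy variant from the cited study, and everything about the environment is an assumption: seven discrete antenna tilt settings, a hidden best setting, and a made-up noisy KPI function standing in for field measurements. The agent never sees the KPI formula; it only observes the reward after each adjustment.

```python
import random

TILTS = 7             # hypothetical discrete tilt settings, indices 0..6
ACTIONS = (-1, 0, 1)  # lower the index by one, keep it, raise it by one
BEST_TILT = 4         # hidden optimum, unknown to the learning agent


def measure_kpi(tilt, rng):
    """Stand-in for a field measurement: a noisy coverage/capacity score."""
    return -float((tilt - BEST_TILT) ** 2) + rng.gauss(0.0, 0.1)


def train(episodes=5000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(TILTS)]
    for _ in range(episodes):
        tilt = rng.randrange(TILTS)  # start each episode from a random setting
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[tilt][i])  # exploit
            new_tilt = min(TILTS - 1, max(0, tilt + ACTIONS[a]))
            reward = measure_kpi(new_tilt, rng)
            # Q-learning update: move the estimate toward the observed reward
            # plus the discounted value of the best action in the next state.
            q[tilt][a] += alpha * (reward + gamma * max(q[new_tilt]) - q[tilt][a])
            tilt = new_tilt
    return q


def greedy_policy(q):
    """For each tilt setting, the adjustment the learned table prefers."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]


policy = greedy_policy(train())
```

With these assumed settings, the learned policy steps toward tilt index 4 from either side and holds there, even though the location of the optimum was never written into the agent.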

We can find similar examples in SOON (Self-Optimizing Optical Network) as well.

  1. The chart below shows an ML model that anticipates traffic tidal waves and preemptively migrates traffic to reduce the network blocking probability. The ML-based migration strategy outperforms the simple shortest-first and low-load-first heuristic strategies.

ML-Based Traffic Pattern Prediction (YongLi Zhao, et al., 2018)

  2. The chart below shows ML-based IN-PWR-LOW alarm prediction. More training data delivers better prediction results.

ML-Based Alarm Prediction (YongLi Zhao, et al., 2018)

  3. The chart below shows routing and wavelength assignment optimization results. The ML-based algorithm delivers better-optimized results than the simple shortest first-fit heuristic algorithm when sufficient datasets are available for training the ML model.

ML-Based Routing and Wavelength Assignment (YongLi Zhao, et al., 2018)

Alongside the apparent benefits of building a product as an Autonomous Product, there are challenges and risks. Sometimes the cost of overcoming the obstacles and mitigating the risks is too high to justify the Autonomous Product approach, so careful analysis is required.

The Challenges and Risks of Building an Autonomous Product

Numerous obstacles stand in the way of making the network an Autonomous Product. Here are a few:

Lack of computing power

Machine learning requires a great deal of computing power. Normally, the ML model is developed and trained offline in the cloud, where abundant computing resources are available. The trained model is then used at runtime in the product. Models that require near-real-time updates need a special architecture. For network products, a well-architected Autonomous Driving Network infrastructure is essential to address the requirements of ML model development at the Network Element (NE) layer, the Network layer, and the Cloud layer.

Suboptimal results due to inadequate datasets

ML model training and verification need a large dataset to produce acceptably optimized output. It is very challenging to get a sufficiently large dataset with an adequate variety of data from different types of users and use cases, which helps broaden the algorithms’ applicability. The data accumulation process can take a very long time.

One possible approach is to run the rule-based or heuristic algorithm in production while continuously collecting data to train the ML models in the background, and to shift to the ML approach once its output is demonstrated, with high confidence, to be better than the heuristic’s.
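
One way to decide when that shift is justified is to score both algorithms on the same inputs and switch only when a lower confidence bound on the average improvement is above zero. The sketch below is one possible reading of “high confidence”, not a prescribed method: the function name, the minimum sample count, and the normal-approximation bound with z = 2.33 (roughly a one-sided 99% level) are all assumptions.

```python
import math
from statistics import mean, stdev


def ml_is_reliably_better(heuristic_scores, ml_scores, z=2.33, min_samples=30):
    """Return True once the ML output beats the heuristic with high confidence.

    Both lists hold scores for the same inputs in the same order, and a
    higher score means a better result.
    """
    if len(heuristic_scores) != len(ml_scores):
        raise ValueError("scores must be paired")
    diffs = [m - h for h, m in zip(heuristic_scores, ml_scores)]
    if len(diffs) < min_samples:
        return False  # keep using the heuristic until enough data has accumulated
    # Lower bound on the mean improvement (normal approximation).
    lower_bound = mean(diffs) - z * stdev(diffs) / math.sqrt(len(diffs))
    return lower_bound > 0.0


# Hypothetical shadow-mode results: the heuristic serves traffic while the
# ML model is scored on the same 40 decisions in the background.
heuristic = [1.0] * 40
ml_shadow = [1.0 + 0.1 * (i % 3) for i in range(40)]
switch_now = ml_is_reliably_better(heuristic, ml_shadow)
```

The `min_samples` guard reflects the point made above: with too little data the comparison itself is unreliable, so the rule-based path stays in charge.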

Source: Ge Wang, 2019

Another possible solution is so-called Human-In-The-Loop (HITL) learning. HITL leverages both human and machine intelligence to create machine learning models. A well-designed active learning system prompts bot trainers to verify outputs that have a low confidence score and validates those judgments before feeding them back into the model, which reduces the time needed to establish a good model.
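
The routing step at the heart of HITL can be sketched in a few lines. This is an illustrative outline only: the function names, the 0.8 confidence threshold, and the sample data are assumptions, and `ask_human` stands in for whatever review tool the bot trainers actually use.

```python
def split_by_confidence(predictions, threshold=0.8):
    """Split (item, label, confidence) triples into accepted and needs-review."""
    accepted, review = [], []
    for item, label, confidence in predictions:
        if confidence >= threshold:
            accepted.append((item, label))
        else:
            review.append((item, label))
    return accepted, review


def build_training_batch(predictions, ask_human, threshold=0.8):
    """Return labeled examples, with low-confidence labels verified by a human."""
    accepted, review = split_by_confidence(predictions, threshold)
    # Only the uncertain outputs cost human time; the reviewer confirms or corrects them.
    verified = [(item, ask_human(item, label)) for item, label in review]
    return accepted + verified


# Hypothetical model outputs: (sample id, predicted label, confidence score).
predictions = [("s1", "alarm", 0.97), ("s2", "normal", 0.55), ("s3", "alarm", 0.62)]
# Stand-in for a human reviewer who corrects s2 and confirms s3.
corrections = {"s2": "alarm"}
batch = build_training_batch(predictions, lambda item, label: corrections.get(item, label))
```

The resulting batch would then be fed back into model training, so human effort is spent only where the model is least sure.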

The exposed vulnerabilities

Self-learning from usage data opens the door for external forces to influence internal logic. If not handled well, it can have dire consequences.

Microsoft’s Tay Bot, a chatbot built to let Twitter users train it and to learn by itself, is a true Autonomous Product. But it was shut down only 16 hours after its launch, because users with malicious intent gamed it, using the ‘repeat after me’ function to retrain the bot with hate speech and extreme right-wing propaganda. Later on, Tay Bot was replaced by Zo Bot. But Zo Bot was also shut down two years later due to the same issues.

In his article published in Nature, “Why deep-learning AIs are so easy to fool,” Douglas Heaven explores the vulnerabilities of DNN-based deep learning in image recognition. He quotes the findings of Dan Hendrycks, a Ph.D. student in computer science at the University of California, Berkeley.

Like many scientists, he has come to see them as the most striking illustration that DNNs are fundamentally brittle: brilliant at what they do until, taken into unfamiliar territory, they break in unpredictable ways.

So when designing an Autonomous Product, don’t forget to assess the risk of data vulnerability and the impact if somebody is able to game the system. After all, “Deep learning AIs: with great power comes great fragility” (Douglas Heaven, 2019).

Takeaways

On the march to the Autonomous Driving Network (ADN), designing and building self-optimizing network functions is highly desirable and essential for dealing with ever-evolving, complex network environments. This is not merely for the sake of productivity gains; more importantly, it can optimize the network to a degree that was never possible with a traditional rule-based or heuristic approach.

ML-based algorithms raise the level of optimization through data. Human involvement in the process consists not of making code changes but of setting optimization goals, preparing data, and evaluating results.

In the transition to the Autonomous Driving Network (ADN), we need to be fully aware of the difficulties and new vulnerabilities it introduces. It is essential to keep developing more advanced ML algorithms and to always place a set of guardrails in the system to catch potential shortfalls.
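
As a small illustration of what such a guardrail might look like, the sketch below bounds an ML-proposed parameter change before it reaches the network. The function name, the parameter ranges, and the per-step limit are hypothetical; real guardrails would depend on the network element and the operator’s policies.

```python
import math


def apply_guardrails(current, proposed, lower, upper, max_step):
    """Limit an ML-proposed parameter value to a safe range and step size."""
    if math.isnan(proposed) or math.isinf(proposed):
        return current  # reject unusable output and keep the current setting
    # Never move more than max_step in a single adjustment.
    step = max(-max_step, min(max_step, proposed - current))
    # Never leave the range that engineers have approved.
    return max(lower, min(upper, current + step))


# Hypothetical example: a power setting currently at 10.0, allowed range 0 to 20,
# at most 2.0 units of change per adjustment. The model proposes 30.0.
safe_value = apply_guardrails(10.0, 30.0, lower=0.0, upper=20.0, max_step=2.0)
```

Here the model’s proposal is far outside the allowed range, so the guardrail permits only a 2.0-unit move to 12.0. The ML algorithm still drives the optimization, but a simple, auditable rule limits the damage if its output is wrong or has been gamed.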

Ready or not, more and more Autonomous Products are coming to us. How do I know? Because

“The world is one big data problem.” ~Andrew McAfee.

One More Thing

Autonomous Products can potentially trigger new business ideas just because the data can influence their output.

The Search Engine Optimization (SEO) consultant is an example. An SEO consultant is an expert on search engine optimization who is paid by businesses and site owners for advice on how to get higher rankings, more targeted traffic, and ultimately more profit from their websites. It is a job that requires expertise. Below is the famous Periodic Table of SEO Factors 2020. It describes the factors that can generate positive or negative “signals” for Google Search page ranking.

I wonder if one day a Network Service Optimization (NSO) consultant may emerge, whose job is to help clients get high-quality but low-cost traffic routes by gaming an Autonomous Product for network traffic optimization.

Not sure, but maybe.

References

Aidan Cunniffe, 2017 Autonomous Products

Oracle, 2019 Autonomous Database for Dummies

RCRWireless News, 2016 Tech leaders agree self-organizing network technology will be key to managing the complexity of next-generation 5G mobile networks

YongLi Zhao, et al., 2018 SOON: self-optimizing optical networks with machine learning

Marcin Dryjanski,2018 Self-Organizing Networks — current features and evolution

Ge Wang, October 20, 2019 Humans in the Loop: The Design of Interactive AI Systems

Douglas Heaven, 2019 Why deep-learning AIs are so easy to fool

Moazzam I. Tiwana, 2014 Self Organizing Networks: A Reinforcement Learning approach for self-optimization of LTE Mobility parameters

IDC Report, 2018 Oracle’s Autonomous Database: AI-Based Automation for Database Management and Operations
