
What is the AI ‘State of the Art’?

Artificial Intelligence is evolving at a rapid pace. In this chapter, we attempt to present the latest and greatest technologies, models, and methods available in the broader AI space.

George Krasadakis
Published in 60 Leaders · 16 min read · Oct 30, 2023


We asked various thought leaders to summarize the state of the art of AI across areas such as Computer Vision, Natural Language Understanding, Content Understanding, Decision-making, and Robotics. They were asked to describe the latest advances, list the most important open-source AI frameworks and models, and explain how Quantum computing is expected to boost the power of current AI technologies.

Eva Agapaki, Dima Turchyn, Emma Duckworth, Netanel Eliav, and Mike Tamir share their insights.

Eva Agapaki

Artificial Intelligence Assistant Professor — University of Florida • USA

AI equips machines with various types of human capabilities, such as the ability to sense, see, make decisions, and react. AI has seen tremendous hype and investment in both academia and industry, becoming a research hotspot across multiple disciplines. The most visible of these are technology, finance, marketing, and autonomous vehicles, but AI is also gaining traction and rapidly emerging in healthcare, law, and design.

AI is not a new concept; Warren McCulloch and Walter Pitts invented threshold logic in 1943 by creating a computational model for neural networks based on mathematical concepts and algorithms[1]. The enabling drivers of AI technologies are the large amounts of high-dimensional data and advanced machine learning algorithms that automatically recognize patterns in order to make informed decisions.

The next breakthroughs will give machines the possibility of surpassing human senses for the better of humanity. - Dr. Eva Agapaki

There are four categories that machine learning algorithms fall into: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning (RL). Supervised learning algorithms rely on large amounts of manually annotated data to learn patterns in order to make predictions on new, unseen data. The most common supervised learning tasks are classification and regression.

Classification algorithms predict discrete outputs, such as judging whether a photo depicts a dog or a cat, whereas regression algorithms predict continuous outputs, for example, house prices based on historical data. Unsupervised algorithms, on the other hand, do not require labelled data: data points can be clustered based on their characteristics, for example, customer segmentation based on preferences. Semi-supervised algorithms need only a few labelled training samples and are currently a hot research area. Some examples include self-training, active learning, and graph-based semi-supervised learning.
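
To make these categories concrete, the sketch below uses scikit-learn (our library choice for illustration; the chapter does not prescribe one) to fit a toy classifier, regressor, and clustering model on synthetic data.

```python
# Minimal illustrations of classification, regression, and clustering with scikit-learn.
from sklearn.datasets import make_classification, make_regression, make_blobs
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

# Supervised classification: discrete labels (e.g. cat vs. dog).
X_cls, y_cls = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X_cls, y_cls)
print("classification accuracy:", clf.score(X_cls, y_cls))

# Supervised regression: continuous targets (e.g. house prices).
X_reg, y_reg = make_regression(n_samples=200, n_features=5, random_state=0)
reg = LinearRegression().fit(X_reg, y_reg)
print("regression R^2:", reg.score(X_reg, y_reg))

# Unsupervised clustering: no labels, group by similarity (e.g. customer segments).
X_blobs, _ = make_blobs(n_samples=200, centers=3, random_state=0)
segments = KMeans(n_clusters=3, n_init=10).fit_predict(X_blobs)
```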

RL algorithms rely on the reward hypothesis: they select actions that maximize the expected total future reward. Those actions may have long-term consequences, and the reward is not always immediate. Examples include financial investments and scheduling optimization of capital projects. OpenAI Gym[2] is an open-source platform for developing RL algorithms.
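
The following is a minimal agent-environment loop using the Gym API mentioned above; the random policy is purely illustrative, and the exact reset/step signatures vary across Gym versions (newer releases return an info dict from reset and split done into terminated/truncated).

```python
# A minimal Gym interaction loop with a random policy, shown only to illustrate
# the observation/action/reward interface (no learning happens here).
# Note: written against the classic Gym API; newer Gym/Gymnasium versions return
# (obs, info) from reset() and a 5-tuple from step().
import gym

env = gym.make("CartPole-v1")
observation = env.reset()
episode_return = 0.0

for _ in range(200):
    action = env.action_space.sample()                 # an RL agent would pick actions to maximize future reward
    observation, reward, done, info = env.step(action)
    episode_return += reward
    if done:                                           # episode ended; start a new one
        observation = env.reset()
        episode_return = 0.0

env.close()
```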

One of the most prevalent sub-fields of AI is Natural Language Processing (NLP), where computers recognize and understand human language in text form. There are multiple NLP applications such as sentiment analysis, information extraction, and machine translation.

Transformers (such as GPT-3[3] and BERT[4]) have been widely used. Rather than processing tokens one by one through a recurrent loop, Transformers use a self-attention mechanism that lets every position in a sequence attend to every other position, so long-range context is retained while computation can be parallelized. This makes them well suited to sequence tasks such as speech recognition, language translation, and stock prediction. State-of-the-art (SOTA) transformers for NLP tasks can be found in multiple open-source GitHub repositories[5].
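
As a quick illustration of how accessible these models have become, the sketch below loads a pretrained Transformer through the Hugging Face transformers library referenced in [5]; the pipeline downloads a default checkpoint on first use, and the two task names shown are just examples of the many available.

```python
# Applying pretrained Transformers via the `transformers` library[5].
from transformers import pipeline

# Sentiment analysis with the pipeline's default pretrained checkpoint.
classifier = pipeline("sentiment-analysis")
print(classifier("The new model architecture works remarkably well."))

# Machine translation follows the same pattern (weights are fetched on first use).
translator = pipeline("translation_en_to_fr")
print(translator("Transformers rely on self-attention rather than recurrence."))
```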

Another AI sub-field is Computer Vision, often referred to as Machine Perception. The goal of these algorithms is to enable computers to interpret the visual world as humans do. The most widely adopted Computer Vision problem is object recognition in 2D and 3D data. Image recognition involves multiple tasks such as image classification, segmentation, and object detection. Some of the most commonly used Convolutional Neural Networks for object detection are YOLOv3[6] and deep residual networks[7], with the SOTA being Vision Transformers[8].
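
For completeness, here is a small inference sketch with a pretrained deep residual network[7] from torchvision; the weights API shown assumes a recent torchvision release, and example.jpg is a hypothetical local image.

```python
# Image classification with a pretrained ResNet-50 from torchvision.
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet50_Weights.DEFAULT           # pretrained ImageNet weights (torchvision >= 0.13)
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()                   # the resize/crop/normalize pipeline the model expects

image = Image.open("example.jpg")                   # hypothetical input image
batch = preprocess(image).unsqueeze(0)              # shape: (1, 3, H, W)

with torch.no_grad():
    logits = model(batch)
predicted = weights.meta["categories"][logits.argmax().item()]
print("predicted class:", predicted)
```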

Decision-making is another complex task that involves data analysis and merging information from disparate sources while weighing the importance of each piece of information. AI can solve competitive human-level tasks and even beat humans, for example, when DeepMind's AlphaGo defeated the world Go champion using RL algorithms.

Despite the advances in every AI sub-field, there are significant challenges to overcome. Some of these include explainability of the developed models (technical challenge), algorithmic bias (technical and societal challenge), and transparency in usage (societal, political, and legal challenge). Quantum computing can assist in mitigating some of these obstacles. It can be used to rapidly train and generate optimized ML algorithms (using superposition and entanglement).

A recent open-source library for quantum ML is TensorFlow Quantum (TFQ)[9], which combines a suite of quantum modelling and Machine Learning tools. Some contributions of Quantum AI are: quick and optimal weight selection for neural networks, faster encryption based on quantum search, and quantum algorithms based on Hamiltonian time evolution that represent problems with optimal decision trees faster than random walks. A summary of AI models, papers, datasets, and open-source libraries can be found at Stateoftheart.ai[10].
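
As a flavour of what TFQ looks like in practice, the sketch below wraps a single-qubit parameterized circuit as a trainable Keras layer, following the pattern of TFQ's introductory tutorials; the circuit and readout are illustrative assumptions rather than a meaningful model.

```python
# A minimal TensorFlow Quantum sketch: a one-qubit parameterized circuit exposed
# as a Keras layer whose output is the expectation value of a Z measurement.
import cirq
import sympy
import tensorflow as tf
import tensorflow_quantum as tfq

qubit = cirq.GridQubit(0, 0)
theta = sympy.Symbol("theta")                          # trainable rotation angle
model_circuit = cirq.Circuit(cirq.rx(theta)(qubit))
readout = cirq.Z(qubit)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(), dtype=tf.string),  # inputs are circuits serialized as strings
    tfq.layers.PQC(model_circuit, readout),            # parameterized quantum circuit layer
])

# "Data" here is a batch of empty circuits; a real model would encode features as gates.
circuits = tfq.convert_to_tensor([cirq.Circuit(), cirq.Circuit()])
print(model(circuits))                                 # expectation values in [-1, 1]
```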

The evolution of AI across disciplines has been inspired by advances in biology, cognition, and systems theory. The next breakthroughs will not only give machines more logical reasoning capabilities but also the possibility of surpassing human senses for the better of humanity.

Dr. Eva Agapaki is the Director of the Digital Twins research lab at the University of Florida with experience in applied machine learning projects in academia and industry.

Dima Turchyn

Artificial Intelligence Product Marketing Lead, CEE Region — Microsoft • Czechia

Over the last several years, I have witnessed various interesting trends in AI systems available on the market that solve real-life use cases. A new generation of models, especially transformer-based models, has re-shaped the foundation of what is possible in areas like Natural Language Processing (NLP). Then, as a bigger number of parameters generally improved model performance, a competition over whose model has more billions of parameters emerged, and it did, in fact, work. As an example, models like the Megatron-Turing NLG model with 530B parameters not only push SOTA for some tasks but do so across a broad set of tasks, including completion prediction, reading comprehension, commonsense reasoning, natural language interfaces, and others.

The cycle from new model introduction to implementation significantly shortens. - Dima Turchyn

And while this does not mean we are anywhere close to general AI systems, these large models do extract a higher level of generalization than smaller, task-specific models. They come at a cost, though, since they require a large pool of resources and are only becoming practical thanks to the availability of powerful optimization and distributed learning algorithms.

For example, the abovementioned Megatron-Turing NLG model leverages the DeepSpeed library, which enables pipeline parallelism to scale model training across nodes. Incidentally, the cost (including the environmental cost) of training such large-scale models means that effective re-use of those models across use cases is key to gaining positive net value. At the same time, those same powerful libraries, next-generation algorithms, and cloud resources are available to literally anyone: as an example, using those same libraries and the cloud ML platform allowed one of our customers, the University of Pecs in Hungary, to train a Hungarian language model in just several months from idea to production, with a total resource cost of around $1,000.
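
To give a sense of what this looks like in code, here is a rough, hedged sketch of wrapping a model for pipeline-parallel training with DeepSpeed; the layer list, stage count, and config values are illustrative assumptions (not the Megatron-Turing recipe), and running it requires launching across multiple processes with the deepspeed launcher.

```python
# Illustrative DeepSpeed pipeline-parallelism setup (not the Megatron-Turing configuration).
# Run with the `deepspeed` launcher so that at least 2 processes are available for the 2 stages.
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# Stand-in layers; a real model would list its transformer blocks here.
layers = [nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)]
model = PipelineModule(layers=layers, num_stages=2)     # split the layers across 2 pipeline stages

ds_config = {
    "train_batch_size": 32,
    "train_micro_batch_size_per_gpu": 4,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config=ds_config,
)
# engine.train_batch(data_iter) would then execute one pipeline-parallel training step.
```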

Another interesting observation is the emergence of models which are multi-task or multi-language, or models that are built for one domain of AI tasks and then successfully applied to another. For example, applying attention models initially developed for NLP to image recognition tasks shows impressive results. Even more so, new models are emerging which learn simultaneously from different types of modalities, known as multimodal architectures.

Most prominent are probably recent architectures that use a combination of language and image data — they learn from both the objects in an image and the corresponding text — leading to knowledge that can be applied to a range of tasks, from classification to generating image description or even translation or image generation.

This paradigm of combining narrow learning tasks into a more general model which learns on many tasks simultaneously is also leveraged in the new generation of Language Models. As an example, Z-code models take advantage of shared linguistic elements across multiple languages to improve the quality of machine translation and other language understanding tasks. Those models take advantage of both transfer learning and multitask learning from monolingual and multilingual data to create a language model, improving tasks like machine translation by a significant margin across languages.

This same approach is used in the Florence 1.0 model, which uses XYZ-code, a joint representation of three cognitive attributes: monolingual text (X), audio or visual sensory signals (Y), and multilingual data (Z). This representation has enabled advances in multi-task, multi-lingual Computer Vision services for tasks like zero-shot image classification, image/text retrieval, object detection, and question answering.

There are of course many other models and developments which push the boundaries of what is possible with ML models. Working with customers and partners on their AI projects, what I am always looking for is how we can apply all of that state-of-the-art research to real-life customer projects. Using a large-scale model has its challenges, from its size to inference costs, and many approaches are emerging to build sparser, more efficient models.

As an example, the abovementioned Z-code models use a 'mixture of experts' approach, which means only a portion of the model is engaged to complete a given task (a toy sketch of this routing idea follows below). As a result, customers can make use of these powerful developments almost immediately after their introduction. Customers can today build applications leveraging Z-code models, use multilingual language models with Cognitive Services APIs, or even apply powerful large-scale models like OpenAI's as a managed cloud service.
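
The routing idea behind 'mixture of experts' can be sketched in a few lines: a gating network scores a set of expert sub-networks and only the top-k experts are executed per input, so most parameters stay idle for any given example. This is a toy illustration, not the Z-code implementation.

```python
# Toy top-k mixture-of-experts routing in PyTorch (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)    # scores each expert for a given input
        self.top_k = top_k

    def forward(self, x):                          # x: (batch, dim)
        scores = self.gate(x)                      # (batch, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):                 # run only the selected experts per example
            for slot in range(self.top_k):
                expert = self.experts[idx[b, slot].item()]
                out[b] += weights[b, slot] * expert(x[b])
        return out

moe = TinyMoE()
y = moe(torch.randn(4, 64))                        # only 2 of 8 experts are used per example
```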

In general, this is probably the most impactful observation I have recorded for myself over the last couple of years: not only do new models emerge, but this new generation of models is also significantly shortening the cycle from introduction to actual, applied implementation. This has obvious benefits, but it also imposes significant risks.

For example, the latest speech generation services are available to anyone as easy-to-use APIs, along with many other AI services now reaching human parity. This general availability increases the need to take responsibility for how those services are applied. This is a separate topic in itself, as approaches to address those risks span cultural, technological, and policy dimensions, such as gating some of the services and reviewing the use case each time they are deployed. Such measures help ensure that these powerful advancements are applied not only where they can be used, but where they actually should be.

Dima Turchyn has been working with analytical technologies and Machine Learning for 20+ years and has a broad background in Business Development, IT, and Marketing. Most recently, he leads Microsoft's AI product marketing for the CEE region, covering 30+ countries.

Emma Duckworth

Director of Data Science — GSK Consumer Healthcare • UK

When thinking about advanced, real-world applications of AI, I'm most excited by technology that is making a tangible impact on decisions or processes, particularly technology that makes sophisticated recommendations to a degree that hasn't historically been possible for people or other methods.

One such cutting-edge application is the use of AI to optimally orchestrate large complex systems, in particular global supply chains. Supply chains are systems that are not just complicated, but complex as they have many individual legs or components with contradictory incentives. To achieve overall system optimisation, we require AI at every stage of the solution, from process mining that aims to understand the system, to running our final end-to-end simulations and optimisations.

Quantum computing holds great potential and could cause a real step-change. - Emma Duckworth

This innovative AI is moving the dial in the way we run supply chains and find solutions to important global challenges such as supply chain resilience and sustainability goals. Such AI models are generally bespoke compilations of elements that are themselves state-of-the-art. Much of the active innovation being worked on here is still in development, often featuring collaboration between academia and industry. This is due to the scale of the problem, the uniqueness of supply chains, and the required access to data and computing resources.

An example of a component where advancements are being made is process mining. Here we may use commercial tools to understand how the system is configured, as a starting point for our model. We may then draw on another active area of AI, time-series forecasting. ML models are starting to outperform statistical methods in accuracy and performance in real-world applications. The M5 forecasting competition provided compelling evidence of this, with LightGBM (a decision-tree-based machine-learning model) being widely adopted by most leading entries. Additionally, in complex systems we often consider multiple related forecasts, so solutions such as LightGBM that support hierarchical forecasting are beneficial.
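
As a concrete flavour of this approach, the sketch below trains a LightGBM regressor on simple lag features of a synthetic demand series; the features, horizon, and hyperparameters are illustrative assumptions rather than an M5-winning recipe.

```python
# A toy LightGBM demand forecaster built on lag features (illustrative only).
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(0)
n = 500
demand = 50 + 10 * np.sin(np.arange(n) / 7.0) + rng.normal(0, 3, n)   # synthetic series with a weekly-ish cycle

df = pd.DataFrame({"demand": demand})
for lag in (1, 7, 14):                                                 # lagged demand as predictive features
    df[f"lag_{lag}"] = df["demand"].shift(lag)
df = df.dropna()

X, y = df.drop(columns="demand"), df["demand"]
X_train, X_test = X.iloc[:-28], X.iloc[-28:]                           # hold out the last 28 steps
y_train, y_test = y.iloc[:-28], y.iloc[-28:]

model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("MAE on holdout:", np.mean(np.abs(pred - y_test.values)))
```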

Other AI technologies whose outputs may be included in complex systems are Deep Learning applications such as Computer Vision or NLP. Here we see rapid innovation on a different trajectory. The scaling of models and training sets made widely available by technology companies such as Google and open research efforts such as ImageNet has led to the commoditisation of Computer Vision applications.

There are now numerous off-the-shelf solutions that can be applied to real-world problems with relatively small amounts of customisation. This is feasible via Transfer Learning, that is, the process of using a pre-trained model as the starting point for a model for a new task. For example, your starting model may be able to identify dogs in a photo; you can then additionally train it to identify your pet among others, similar to how Apple and Meta can quickly start to recognise your friends in photos. Likewise, say we want to use Computer Vision to measure quality on the manufacturing line: we can very quickly build a good proof of concept using offerings such as Microsoft's Cognitive Services or Google's AutoML.
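
A minimal sketch of that transfer-learning pattern, assuming a PyTorch/torchvision setup: freeze a pretrained backbone and train only a new, task-specific classification head (say, 'defect' vs 'no defect' on a production line).

```python
# Transfer learning: reuse a pretrained backbone, retrain only a small new head.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():                      # freeze the pretrained backbone
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 2)         # new 2-class head, trained from scratch
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# A standard training loop over the small task-specific dataset would follow;
# only the head's weights are updated, so relatively little labelled data is needed.
```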

We then need to use optimisers to build our scenarios and solve the problem. Here again, there are many different solutions available. Searching the problem space and optimising for solutions in complex systems is a big problem and can quickly become computationally expensive. It’s an area where Quantum computing holds great potential and could cause a real step-change, in particular Quantum Annealing. This refers to an optimisation process for finding the global minimum of an objective function, particularly effective when there are many local minima, such as across our complex supply chain.
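
To illustrate the kind of formulation annealers consume, the sketch below encodes a tiny route-selection choice as a QUBO and solves it with D-Wave's open-source simulated-annealing sampler standing in for quantum hardware; the costs and penalty are made-up numbers.

```python
# A toy QUBO: choose shipping routes x0/x1 (binary), each with a benefit, plus a
# penalty if both are selected. Simulated annealing stands in for a quantum annealer.
import dimod
import neal

Q = {
    ("x0", "x0"): -1.0,    # benefit of selecting route x0
    ("x1", "x1"): -1.5,    # benefit of selecting route x1
    ("x0", "x1"): 2.0,     # penalty for selecting both (they conflict)
}
bqm = dimod.BinaryQuadraticModel.from_qubo(Q)

sampler = neal.SimulatedAnnealingSampler()
result = sampler.sample(bqm, num_reads=100)
print(result.first.sample, result.first.energy)        # lowest-energy assignment found
```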

Finally, supply chains are also areas where developments in the governance of AI, accountability of model recommendations, and model transparency and explainability are very important. Unlike many digital/web applications of AI, we are faced with the challenge of how to 'start small' when innovating and testing a model that may, for example, deploy a shipping container full of products.

Supply chain scenarios require high confidence in models when applying recommendations to a real-world system for the first time. The implications of errors can be significant, particularly in highly regulated supply chains where there are safety requirements on the quality of the output. It is therefore critical that academia and industry work closely to develop frameworks so that businesses and regulators are in step with cutting-edge algorithms. Collaboration is a massively important aspect of state-of-the-art AI innovation. Only by working with regulators can we deploy state-of-the-art AI and realise the potential of this incredible technology.

Emma Duckworth leads the global Data Science team at GSK Consumer Healthcare. She is passionate about AI ethics and diversity in data science, and uses AI to solve big, strategic problems such as the accessibility and sustainability of everyday healthcare. Excited by innovation, she draws on her startup experience to build and scale AI products.

Netanel Eliav

Chief Executive Officer & Technology Development Specialist — SightBit LTD • Israel

The answer is not straightforward. First, we need to define what we mean by state-of-the-art (SOTA). In simple terms, SOTA refers to AI at its best: the highest level of performance and capability achieved so far. The definition of AI SOTA changes over time and with the advancement of technology. For example, in 1997, IBM's Deep Blue was considered AI SOTA.

AI on Quantum computers will make breakthroughs in solving some of the world’s most pressing problems. - Netanel Eliav

But today, a computer beating a human player at chess would not be considered AI SOTA anymore, because computers and the AI field have since surpassed that level.

There are many AI technologies in the market today, but the most advanced ones are based on Deep Learning and Machine Learning algorithms. Deep learning has been around for a few decades, but only recently has it had a tremendous breakthrough in accuracy and precision. One of the most fascinating aspects of Deep Learning is that it can process data in a more human-like way. The most advanced Deep Learning networks available today are:

- Convolutional Neural Networks (CNNs) are algorithmic architectures used throughout our lives, from facial recognition to image classification, and even at the backend of the simple object-tracking technologies found in our phones. Companies like Google and Facebook use them in many products to provide users (and themselves) more value.

- Generative Adversarial Networks (GANs) are algorithmic architectures that use two Neural Networks, pitting them against each other in order to generate new, synthetic instances of data that can pass for real data. The most popular example of their use is deepfakes, which can manipulate videos so that they look real and are hard to distinguish from reality.

- Recurrent Neural Networks (RNNs) and their sub-type, Long Short-Term Memory networks (LSTMs), are a special type of artificial Neural Network adapted to work with time-series data or data that involves sequences. These architectures add the dimension of time that CNNs lack and retain the data's history. The most popular example of their use is in autonomous cars: since these rely on feeds of sensor data to navigate through traffic and avoid obstacles on the road, object detection by itself is not enough (a minimal sequence-model sketch follows this list).
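
The sketch below is a minimal example of such a sequence model: an LSTM that consumes a window of multi-channel sensor readings and predicts the next reading. The dimensions are illustrative assumptions.

```python
# A tiny LSTM that maps a window of sensor readings to a prediction of the next reading.
import torch
import torch.nn as nn

class SensorLSTM(nn.Module):
    def __init__(self, n_features=6, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):                 # x: (batch, time_steps, n_features)
        out, _ = self.lstm(x)             # hidden state at every time step
        return self.head(out[:, -1])      # predict the next reading from the last step

model = SensorLSTM()
window = torch.randn(8, 20, 6)            # 8 sequences of 20 time steps, 6 sensor channels
next_reading = model(window)              # shape: (8, 6)
```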

Startups and big companies fine-tune those networks to build more specific models to be used by many sectors, such as the military, autonomous cars, smart cities, and more. Many previously unsolved tasks in the fields of Natural Language Understanding, Computer Vision, and Robotics are now solved by those algorithms.

Those technologies have been around for a while, and are available for free to anyone who needs them — thanks to open-source libraries and frameworks like:

- PyTorch — created by Facebook and available on GitHub.

- Caffe — developed by the Berkeley Vision and Learning Center (BVLC).

- TensorFlow — developed by Google.

- Keras — developed as part of the ONEIROS research project and maintained by François Chollet.

- Detectron2 — developed by Facebook with the help of the tech community, including the author of these lines.

The main limitation of AI today is on the hardware side: researching and developing new models takes a great deal of processing power from GPUs and CPUs, and even specialized accelerators such as TPUs have their limitations.

There is new progress in Quantum Computing that may bring good news, and it will have an enormous impact on the future of AI and other fields. The development of quantum computing will also help to advance Artificial Intelligence because it will allow more complex simulations and algorithms to be run. Soon, quantum computers and AI will be used together to make breakthroughs in solving some of the world's most pressing problems.

Technology is advancing at a rapid pace. The advancements in AI are making it possible for machines to learn, perceive, and understand the world around them. The future of technology will be amazing.

Netanel Eliav is a CEO and Founder at SightBit — An Artificial Intelligence Startup Using Deep Learning and Computer Vision to Save Lives. He is a former Product Manager and ex-Technology Tech Lead Specialist at the Office of the Prime Minister.

Mike Tamir

Chief ML Scientist, Head of ML/AI — Susquehanna International Group • USA

I am consistently impressed with how far we have come in natural language understanding, and in natural language processing more generally. Language is fundamental to certain kinds of understanding, and while the mega-sized language models developed in recent years are (in the end) language models and do not necessarily show a true understanding of the text, we have come a long way from the naive picture of machines only being able to process language through rules and heuristics, as caricatured by philosophers like Searle and his 'Chinese [translation] room.'[11]

The biggest advancements in language have come from a shift from the direct embedding of text tokens to more inductive methods of embedding text in context. Over the past several years, similar advancements have been made in understanding data that can be encoded as graphs, relating different entities (nodes) to other entities. The coevolution of these parallel application areas has not yet been fully explored and heralds a very productive line of growth that we can expect in the future. While these research advances, such as those in protein folding and genetics, have yet to come to fruition in significant practical applications in medicine, they have the potential to one day make this a reality.
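
As a small illustration of embedding graph-structured data, the sketch below applies a single graph-convolution layer from PyTorch Geometric (our library choice; the tiny graph and feature sizes are illustrative) to produce a context-aware embedding for every node.

```python
# Embedding nodes of a small graph with one graph-convolution layer.
import torch
from torch_geometric.nn import GCNConv

# 4 nodes with 8 input features each, connected by 3 undirected edges.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]], dtype=torch.long)

conv = GCNConv(in_channels=8, out_channels=16)
node_embeddings = conv(x, edge_index)     # contextual embedding for every node
print(node_embeddings.shape)              # torch.Size([4, 16])
```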

Another area of remarkable improvement is Reinforcement Learning (RL). While a lot of the fundamental optimization paradigms in RL have remained unchanged, our ability to build estimators that guide how RL agents navigate, understand, encode and then evaluate their behavior in an environment has dramatically improved with the benefit of Deep Learning research over the past several years. This research has the potential to solve the major hurdles that still exist in practical applications, ranging from adaptive safety tests for self-driving cars (which operate in very complex high stakes environments) to tactically responding to fake news threats, and more.

Mike Tamir, PhD is a data science leader, specializing in deep learning, NLP, and distributed scalable machine learning. Mike is experienced in delivering data products for use cases including text comprehension, image recognition, recommender systems, targeted advertising, forecasting, user understanding, and customer analytics. He is a pioneer in developing training programs in industry-focused machine learning and data science techniques.

[1] McCulloch, W.S. and Pitts, W., 1943. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4), pp.115–133.

[2] Gym (openai.com)

[3] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A. and Agarwal, S., 2020. Language models are few-shot learners. Advances in neural information processing systems, 33, pp.1877–1901.

[4] Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

[5] For example: https://github.com/huggingface/transformers

[6] Redmon, J. and Farhadi, A., 2018. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767.

[7] He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).

[8] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. and Uszkoreit, J., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

[9] TensorFlow Quantum

[10] Stateoftheart AI

[11] Chinese room — Wikipedia

Excerpt from 60 Leaders on AI (2022) — the book that brings together unique insights on the topic of Artificial Intelligence — 230 pages presenting the latest technological advances along with business, societal, and ethical aspects of AI. Created and distributed on principles of open collaboration and knowledge sharing: Created by many, offered to all; at no cost.


George Krasadakis
Technology & Product Director - Corporate Innovation - Data & Artificial Intelligence. Author of https://theinnovationmode.com/. Opinions and views are my own.