The immediate future of AI: Action Transformers

Jesus Templado González
Published in ROMPANTE
5 min read · Dec 26, 2023

Over recent years, the spread of Transformer models has led tons of people to learn about the possibilities of AI through Generative models (yes, ChatGPT again!).

However, in this article we discuss an advancement that could reshape the business world: Action Transformer models.

Action Transformers will transform how humans interact with devices. Picture/artwork generated with Midjourney by Mr Newq.

Automation, a key milestone in the pursuit of General Intelligence

As the next wave of models aims for General Intelligence, Action Transformers could play a key role by enabling General-Purpose Automation.

Automating software tasks used to be a challenge, but one that has now largely been overcome. You may already be thinking of out-of-the-box software solutions that support digital automation by letting users program repetitive actions through a simple interface. There’s not much intelligence in this procedure, since the software robots simply follow predefined processes.

Software landscape of “single” purpose automation tools. Credit: Workfellow.ai
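To illustrate how little intelligence such predefined automation involves, here is a minimal Python sketch of a rule-based bot. The invoice form, the column names and the `run_predefined_bot` helper are hypothetical and not taken from any real product; the bot simply replays a fixed list of steps:

```python
# A hypothetical "single-purpose" bot: it replays a fixed, predefined sequence
# of steps and has no way to adapt when the input does not match its script.
INVOICE_STEPS = [
    ("copy", "invoice_id", "ID"),
    ("copy", "supplier", "Supplier"),
    ("copy", "amount", "Amount (EUR)"),
]


def run_predefined_bot(form: dict) -> dict:
    """Copy fields from a form into a spreadsheet-style row, step by step."""
    row = {}
    for action, source_field, target_column in INVOICE_STEPS:
        if action == "copy":
            # No reasoning here: a renamed field simply breaks the run.
            row[target_column] = form[source_field]
    return row


print(run_predefined_bot({"invoice_id": "INV-001", "supplier": "Acme", "amount": 120.0}))
# A form that labels the supplier as "vendor" makes the bot fail with KeyError.
```

Because every step is hard-coded, any change in the input, such as a renamed field, stops the process. Coping with that kind of variation is what General-Purpose Automation aims for.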

The next milestone, however, is General-Purpose Automation, which many expect will take us closer to general intelligence capabilities in real business scenarios.

Action Transformers in General-Purpose automation

Have you heard of a start-up called Adept? It came out of stealth in early 2022 with the aim of developing “useful general intelligence,” and later that year it unveiled ACT-1. The company started with an all-star scientific and technical team that included former members of top-tier AI labs such as Google Brain and OpenAI. This team focused on building intelligent agents through a new interface that acts as an intermediary between humans and software.

This is how ACT-1, the best-known large-scale Action Transformer, was born.

It was designed and trained as a digital agent that interacts with a wide range of computer programs and software applications. In essence, it enhances Human-Computer Interaction (HCI) by placing a natural-language front end between people and their devices, much like the chat interfaces many of us already use today.

Highly simplified view of a Human-Computer Interaction (HCI)

In simple terms, this model can handle a variety of tasks that involve following multiple steps, using different tools, and visiting different websites, at varying levels of complexity. It can work with a range of software tools at different stages of a process and can even incorporate user feedback to refine its performance. Users, however, first need to know how to communicate what they want.

Where these models stand out is in automatically executing software actions that may be beyond the user’s own expertise, while managing tasks across several applications.

How would Action Transformers work in a real business scenario?

Just as a user writes a command, prompt, or query to ChatGPT, with Action Transformers like ACT-1 people will be able to use written or spoken natural language to ask the model for a desired outcome.

In this process, Action Transformers use Natural Language Processing (NLP) and Machine Learning (ML) to understand the user’s command and translate it into a series of actions that a computer can execute to accomplish the task. The system then converts these steps into API calls or equivalent executable actions.
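To make this two-stage pipeline concrete, here is a minimal Python sketch. The model’s role is played by a hypothetical keyword-based `plan` function, and the “API calls” are plain Python functions standing in for real spreadsheet endpoints. None of these names come from ACT-1 or Adept; they are illustrative assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One executable step: the name of a tool plus its arguments."""
    tool: str
    args: dict = field(default_factory=dict)


def plan(command: str) -> list[Action]:
    """Stand-in for the model: map a natural-language command to steps.

    A real Action Transformer would generate this plan with a learned model;
    here a couple of hard-coded keyword rules are used purely for illustration.
    """
    text = command.lower()
    steps: list[Action] = []
    if "spreadsheet" in text:
        steps.append(Action("create_sheet", {"title": "Summary"}))
    if "opex" in text:
        steps.append(Action("fetch_expenses", {"category": "OPEX"}))
        steps.append(Action("write_rows", {"sheet": "Summary"}))
    return steps


# Hypothetical tool registry: each function stands in for a real API call.
def create_sheet(title: str) -> str:
    return f"created sheet '{title}'"


def fetch_expenses(category: str) -> str:
    return f"fetched {category} records"


def write_rows(sheet: str) -> str:
    return f"wrote rows to '{sheet}'"


TOOLS = {
    "create_sheet": create_sheet,
    "fetch_expenses": fetch_expenses,
    "write_rows": write_rows,
}


def execute(actions: list[Action]) -> list[str]:
    """Dispatch each planned step to its tool, in order, and collect results."""
    return [TOOLS[a.tool](**a.args) for a in actions]


log = execute(plan("Create a spreadsheet summarising my OPEX for this specific period"))
for line in log:
    print(line)
```

In a real system the plan would come from the model rather than from keyword rules, and each tool would wrap an actual application API. Keeping planning separate from execution is simply one convenient way to show the two stages described above.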

Three examples of how this would work in real life, at different levels of complexity:

  • A financial controller wants a custom spreadsheet that summarises OPEX. Given a simple command such as “Create a spreadsheet summarising my OPEX for this specific period”, the model would use its knowledge of available spreadsheet software (e.g. Google Sheets or any other) to create the spreadsheet and populate it with the right data, which may also be gathered from other scattered documents.
  • Similarly, a marketing executive may need to measure and report the return on investment of the weekend’s campaign after buying a specific media inventory. Given a simple prompt such as “Provide me with the conversion KPIs that the Friday-Sunday campaign has achieved and include a chart to ease comprehension”, the Action Transformer would use its understanding of activation platforms, media buying platforms, and spreadsheets to execute the necessary steps and prepare the requested information.
  • For tasks that users may not know how to do themselves, such as performing simple or complex analyses (e.g. coding in Visual Studio) or building interactive dashboards (e.g. in Power BI), these models may infer what the user means from context. For example, if a user types “standard deviation”, the Action Transformer can infer that they want to measure how dispersed the data is around the mean of a specific metric such as “sales”, or it can ask for more information. This ability to understand context and infer user intent can help users perform tasks more quickly and accurately.
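The “standard deviation” example can be sketched as a toy intent-inference step in Python. The sales figures, the column names and the `interpret` function are hypothetical, and a simple keyword match stands in for the model’s understanding of context. The sketch also assumes that “standard deviation” means the population standard deviation, computed with Python’s `statistics.pstdev`:

```python
import statistics

# Hypothetical table of daily figures the user is looking at.
DATA = {
    "sales": [120.0, 135.0, 128.0, 150.0, 142.0],
    "visits": [900, 950, 910, 1020, 980],
}


def interpret(request: str, data: dict) -> str:
    """Toy intent inference: map a short request to a concrete computation.

    Assumes "standard deviation" means the population standard deviation of
    whichever known column the request mentions, and asks for clarification
    when exactly one column cannot be inferred.
    """
    text = request.lower()
    if "standard deviation" not in text:
        return "Sorry, I don't know how to do that yet."
    columns = [name for name in data if name in text]
    if len(columns) != 1:
        # Ambiguous request: ask the user instead of guessing.
        return "Which metric should I use: " + ", ".join(data) + "?"
    column = columns[0]
    mean = statistics.fmean(data[column])
    spread = statistics.pstdev(data[column])
    return f"{column}: mean = {mean:.2f}, standard deviation = {spread:.2f}"


print(interpret("standard deviation of sales", DATA))
# sales: mean = 135.00, standard deviation = 10.47
print(interpret("standard deviation", DATA))
# Which metric should I use: sales, visits?
```

When the metric is ambiguous, the sketch asks for clarification instead of guessing, mirroring the option of requesting more information mentioned above.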

As you can see, by building on LLMs, Action Transformers are designed to interact seamlessly with diverse user interfaces, applications, software solutions and even websites, and they can do so without task-specific training.

There are many potential uses for ACT-1 or similar models. This example shows adding a new lead to a CRM.

In a future business scenario, ACT-1 could spare a financial controller or a data analyst from having to learn software such as Excel or Power BI. Instead, they could assign tasks directly to ACT-1 and focus on more intellectually demanding work such as creativity or strategy.

Weaknesses and Way Forward

While ACT-1 and similar models may soon be able to handle even tasks that are complex for humans, there are currently several caveats:

  • Business users struggle to judge the models’ reliability, especially for tasks beyond their own technical skills and knowledge, e.g. merging datasets using Python.
  • The successful output of a request is highly dependent on how the user has written or communicated the query.
  • Although they are intended for a broader automation purpose than GPT-4, Action Transformers’ effectiveness is also somewhat limited by the amount of data they are trained on and by their reliance on predicting the “next action” from previous ones.
  • They still lack common sense, the ability to express intent, and a deep understanding of how the real world works.
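The dependence on training data and on “next action” prediction can be illustrated with a deliberately tiny Python sketch: a bigram model that predicts the next action purely from counts of what followed the previous action in recorded sessions. Real Action Transformers use large neural networks rather than counts, and the action names and session logs below are hypothetical. The sketch exaggerates the limitation, but it shows why predictions depend on what the training data covers:

```python
from collections import Counter, defaultdict

# Hypothetical logs of UI actions recorded from users (the "training data").
TRAINING_SESSIONS = [
    ["open_crm", "click_new_lead", "fill_form", "save"],
    ["open_crm", "click_new_lead", "fill_form", "save"],
    ["open_crm", "search_contact", "edit_contact", "save"],
]


def train(sessions):
    """Count how often each action is followed by another (a bigram model)."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, previous_action):
    """Return the most frequent follow-up action, or None if never seen."""
    followers = transitions.get(previous_action)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train(TRAINING_SESSIONS)
print(predict_next(model, "open_crm"))        # click_new_lead (2 of 3 sessions)
print(predict_next(model, "open_invoicing"))  # None: outside the training data
```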

Conclusion

Action Transformers (like ACT-1) mark a major advancement in the realm of Human-Computer Interaction, opening new avenues for creative and progressive innovations.

These models break business users’ commands down into smaller, manageable actions or steps and automate them. They can even improvise new steps and help users approach and execute tasks that are unfamiliar or complex.

Although this field of development may be a significant step towards general intelligence and automation, there is still a long way to go.


I advise companies on how to leverage DataTech solutions (Rompante.eu) and I write easy-to-digest articles on Data Science & AI and its business applications