AI Developments to Watch in 2024

Published in ReadyAI.org
8 min read · Jan 10, 2024

By: Rooz Aliabadi, Ph.D.

In 2024, the landscape of generative AI is set to transform dramatically, especially for the everyday user. Having invested significantly in generative AI, major tech corporations face the challenge of monetizing their advancements. Google and OpenAI are leading the charge, focusing on accessibility: they’re creating platforms that empower people to tailor language models to their specific needs without programming expertise. These web-based tools open the door for anyone to become a generative AI app creator.

This shift means generative AI is poised to become genuinely helpful to the average person. This year, I expect a surge in individuals experimenting with diverse AI models. Cutting-edge models like GPT-4 and Gemini are now multimodal: they can process not only text but also images and, in some cases, video. This versatility paves the way for innovative applications. For instance, a real estate agent could customize a model with text from old listings, upload photos and videos of a new property, and let the AI draft the listing description.

However, the success of this approach is contingent on the reliability and ethical performance of these models. Language models are notorious for fabricating information, and generative models often reproduce biases. There’s also the risk of hacking, particularly if these models can access the internet. These are significant hurdles that tech companies have yet to overcome. As the initial excitement fades, it will be crucial for these companies to offer solutions to these issues to maintain their customer base.

Personalized Chatbot Revolution

2024 marks a pivotal moment for tech enterprises deeply invested in generative AI as they face pressure to show that their innovations can turn a profit. Giants like Google and OpenAI are leading the way with a ‘less is more’ strategy: they’re rolling out intuitive platforms that let individuals build tailor-made chatbots. These platforms require no coding proficiency and simplify the work of personalizing powerful language models to meet unique user requirements. Both companies have introduced web-based interfaces that allow anyone to create generative AI applications.

This development is set to make generative AI genuinely accessible and helpful to the everyday, non-technical individual. I anticipate a surge in engagement with a vast array of AI models. Advanced models such as GPT-4 and Gemini, which are multimodal and can process images and video as well as text, are at the forefront of this change. Their multifaceted capabilities could lead to a variety of novel applications. Take, for example, a real estate agent who uploads text from past listings to customize a model that writes similar copy, then uploads photos and videos of a new property and asks the AI to craft a detailed description.

However, the effectiveness of this innovative approach largely depends on the reliability and ethical integrity of these models. Language models are prone to fabricating information, and generative models tend to reproduce biases. Additionally, the vulnerability to hacking becomes a concern, particularly when these models have internet access. These are significant challenges that remain unresolved by the tech companies. As the initial fascination with these advancements diminishes, it will become imperative for these companies to address these issues and offer viable solutions to their users.

Video Revolution in Generative AI

The rapid evolution of generative AI is transforming the extraordinary into the everyday. In 2022, the first wave of this technology brought photorealistic images to the mainstream, with tools like DALL-E, Stable Diffusion, and Adobe’s Firefly creating many stunning visuals ranging from surreal to sublime. Yet, amidst the creative explosion, there were instances of derivative art and problematic content.

Now, the spotlight turns to text-to-video, poised to amplify the capabilities and challenges of text-to-image on a grander scale.

Just a year ago, early models that stitched together still images into brief video clips showed promise, albeit with limitations in fluidity and realism. However, advancements in this domain have been swift and impressive. Runway, a generative video startup that co-created Stable Diffusion, is frequently updating its generative video models. Its latest, Gen-2, produces short videos of remarkable quality; at their best, its clips come close to what a studio like Pixar might release.

Runway’s commitment to this field is further exemplified by its annual AI film festival, which showcases AI-generated movies. This year’s festival offers a $60,000 prize pool, and the top 10 films will be screened in New York and Los Angeles.

Major film studios, including Paramount and Disney, are also exploring generative AI for various production aspects. This technology is used for tasks like lip-syncing in multiple languages and enhancing special effects. A notable example is the use of a de-aged deepfake Harrison Ford in “Indiana Jones and the Dial of Destiny,” signaling just the beginning of this trend.

Beyond the entertainment industry, deepfake technology is gaining traction in marketing and training. Synthesia, for example, makes tools that create deepfake avatars from a single performance, and the company says that 44% of Fortune 100 companies now use its technology.

This burgeoning capability to create high-quality content with minimal input raises significant concerns in the acting community, as evidenced by last year’s SAG-AFTRA strike. The ethical implications and the changing landscape of filmmaking are at the forefront of discussions, as noted by filmmaker and creative technology consultant Souki Mehdaoui. The evolution of generative AI in video is not just a technological leap; it’s reshaping the essence of storytelling and content creation.

Unchecked AI-Generated Election Misinformation in 2024

The upcoming elections in 2024 are poised to face an unprecedented challenge with the rise of AI-generated disinformation and deepfakes. Drawing lessons from recent political events, it’s evident that such technologies are becoming a significant issue. For instance, in Argentina, presidential candidates have already employed AI to create false images and videos to disparage their rivals. Slovakia witnessed the rapid spread of deepfakes targeting a liberal party leader with damaging and false narratives. In the United States, certain groups, receiving indirect encouragement from various political figures, are using AI to propagate racist and sexist memes.

The impact of these manipulations on election outcomes is difficult to quantify, but their increasing presence is alarming. Creating convincing deepfakes, once a task requiring substantial technical expertise, is now simple and accessible thanks to advancements in generative AI. This raises the stakes in discerning reality in the digital realm, especially in an already tense and divided political landscape.

The realism of AI-generated content is reaching a point where even trusted sources can be deceived. A notable example is the influx of AI-created images falsely representing the Israel-Gaza conflict into stock image libraries like Adobe’s.

As we enter this critical year, efforts to combat the spread of AI-generated misinformation are still nascent. Solutions like watermarking, such as Google DeepMind’s SynthID, are mostly voluntary and only partially effective. Additionally, social media platforms are often slow to address misinformation. The year ahead is crucial and will serve as an extensive, real-world test of our ability to identify and counter AI-generated falsehoods. This process will be pivotal in maintaining the integrity of democratic processes.

The Rise of Versatile Robots

The burgeoning field of generative AI is ushering in a new era of robotics characterized by more versatile, multitasking machines. This shift is inspired by recent trends in AI, moving from multiple specialized models to single, comprehensive models capable of handling various tasks.

Historically, AI focused on dedicated models for tasks like image recognition, drawing, or captioning. However, the advent of large-scale models such as OpenAI’s GPT-3 and multimodal models like GPT-4 and Google DeepMind’s Gemini has changed the landscape. With some fine-tuning, these models can tackle diverse tasks, from coding to creative writing and visual challenges.

This paradigm is now influencing robotics. Instead of designing robots for singular tasks (one for flipping pancakes, another for opening doors), a universal model could equip robots with multitasking abilities. This concept saw significant progress in 2023.

DeepMind’s RoboCat, an evolution of the previous year’s Gato, exemplifies this. RoboCat generates its own training data through trial and error and learns to operate many different robot arms, a departure from the norm of training a model for one specific arm. Another milestone was the introduction of RT-X, a general-purpose robotic model, alongside a substantial new training dataset developed in collaboration with 33 university labs. Research teams like RAIL at UC Berkeley are exploring similar technologies.

A significant challenge in this field is data scarcity. Generative AI benefits from vast internet-scale datasets of text and images, while the data available for robot learning is comparatively limited. Lerrel Pinto and his team at New York University tackle this by developing techniques that let robots learn through trial and error, creating their own training data as they go. In a more grassroots project, Pinto has recruited volunteers to gather video data in their homes using iPhones attached to trash pickers. Large companies like Meta have also begun releasing extensive datasets, like Ego4D, for robot training.

This approach has parallels in the development of autonomous vehicles. Companies such as Wayve, Waabi, and Ghost are at the forefront of using single, large-scale models for vehicle control, diverging from the traditional method of employing multiple smaller models for specific driving functions. This innovation has allowed smaller entities to compete with industry leaders like Cruise and Waymo. Wayve, for instance, is already testing its technology on London’s challenging streets.

The principles of generative AI are shaping the future of robotics, paving the way for robots capable of handling many tasks and transforming their roles in both industrial and domestic settings.

This article was written by Rooz Aliabadi, Ph.D. (rooz@readyai.org). Rooz is the CEO (Chief Troublemaker) at ReadyAI.org.

To learn more about ReadyAI, visit www.readyai.org or email us at info@readyai.org.

ReadyAI is the first comprehensive K-12 AI education company to create a complete program to teach AI and empower students to use AI to change the world.