AI Weekly Buzz: Innovations, Insights, and Industry Trends July 28th — August 2nd

Olamide T. Iselowo
Artificial Synapse Media
10 min read · Aug 3, 2024

Happy new month! This week has been quite eventful in the AI/tech space: interesting updates in generative models, AI tooling for programmers with huge potential to spark many more innovations, and some sweet gist from the AI automobile sphere. Here is a breakdown of AI developments from July 28 to August 2, 2024.

LLM and Search Updates First

A. Gemini 1.5 Pro (Experimental 0801) tops LLM Chart

Gemini 1.5 Pro is a mid-sized multimodal model designed for scalability across various tasks, achieving performance comparable to 1.0 Ultra, Google DeepMind’s most advanced model to date. It also features an innovative experimental capability for understanding long contexts: the model ships with a standard 128,000-token context window, but a select group of developers and enterprise clients can access a context window of up to 1 million tokens through AI Studio and Vertex AI in a private preview. As Google DeepMind rolls out the full 1 million-token context window, they are actively optimizing to improve latency, lower computational demands, and enhance the user experience.
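To put those context-window sizes in perspective, here is a rough back-of-envelope estimate. It assumes ~4 characters per token for English prose — a common heuristic, not Gemini’s actual tokenizer — and the page/word constants are illustrative:

```python
# Back-of-envelope: what fits in Gemini 1.5 Pro's context windows?
# Assumes ~4 characters per token for English text (a rough heuristic,
# not Gemini's actual tokenizer).

CHARS_PER_TOKEN = 4   # rough heuristic for English prose
WORDS_PER_PAGE = 500  # a fairly dense page
CHARS_PER_WORD = 6    # average word length incl. trailing space

def approx_pages(context_tokens: int) -> int:
    """Estimate how many pages of prose fit in a context window."""
    chars = context_tokens * CHARS_PER_TOKEN
    words = chars / CHARS_PER_WORD
    return round(words / WORDS_PER_PAGE)

print(f"128K tokens ~ {approx_pages(128_000)} pages")    # ~171 pages
print(f"1M tokens   ~ {approx_pages(1_000_000)} pages")  # ~1333 pages
```

By this estimate, the 1 million-token window holds on the order of a thousand pages of text — enough for book-length documents or sizeable codebases in a single prompt.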

The model itself was released earlier in the year, so the news is not its release. Gemini 1.5 Pro trended this week because an experimental version (0801) was made accessible to the public for testing and feedback and, in the process, took first place on the LMSYS Chatbot Arena leaderboard. LMSYS Chatbot Arena is a crowdsourced open platform for LLM evaluations; 123 models had been compared across about 1,575,901 votes as of the time of writing. Gemini 1.5 Pro ranked #1 overall and scored highly in mathematics and multilingual capabilities; it was not #1 in coding and hard prompts but still performed well in both.

Well done to Google DeepMind!

B. Gemma Family Updates

Google DeepMind also celebrated new releases this week. The Gemma 2 family welcomed new members.

Gemma 2 2B

In June, they launched Gemma 2, their latest top-tier open models available in two sizes: 27 billion (27B) and 9 billion (9B) parameters.

Performance
Since its introduction, the 27B model has rapidly ascended the ranks, becoming one of the top-performing open models on the LMSYS Chatbot Arena leaderboard. It has even outperformed many well-known models that are more than twice its size in actual conversational use.

Safety and Accessibility
The goal of the new 2B model was not just performance. To enhance both safety and accessibility, they released a new version of the 2 billion (2B) parameter model this week; it incorporates advanced safety features while delivering an impressive balance of performance and efficiency.

ShieldGemma

They developed a suite of advanced safety classifier models, based on Gemma 2, to filter harmful content in both the input and output of AI models. These state-of-the-art classifiers aim to protect users by targeting hate speech, harassment, sexually explicit material, and other harmful content.

Gemma Scope

They introduced a set of tools designed to help researchers understand how Gemma 2 makes decisions. This comprehensive, open suite of sparse autoencoders — specialized neural networks — provides detailed insights into the model’s inner workings, enhancing interpretability.

C. Open AI’s SearchGPT

They introduced a prototype of new search features that combine the capabilities of their AI models with web information to provide fast, timely answers with clear and relevant sources. This launch is limited to a small group of users and publishers for feedback. There’s speculation that this development might challenge Google Search. It will be interesting to see the outcome.

To try it out, join the waitlist: SearchGPT Prototype

How about Images and Videos?

A. OpenAI Advanced Voice Mode

OpenAI made Advanced Voice Mode available to some ChatGPT Plus users. The rollout serves as a testing stage for the product, and the results have been very interesting. Here is the feedback from users so far:

1. Speaks like a human in many languages (45 languages, according to OpenAI)

2. Real-time translation

3. Casual storytelling

4. Conversational back-and-forth

5. Adds sound effects to suit the subject of discussion

6. Fun and fast learning

B. Midjourney V6.1

V6.1 offers significant improvements in image quality, coherence, and text accuracy, along with new upscaling and personalization models. The updates make it smarter, faster, clearer, and more aesthetically pleasing.

Key Features in V6.1:
- Enhanced image coherence, improving the accuracy of arms, legs, hands, bodies, plants, animals, and more.
- Superior image quality with fewer pixel artifacts and enhanced textures, skin tones, and retro 8-bit styles.
- Greater precision and detail in small image features like eyes, small faces, and distant hands.
- New upscalers that deliver significantly better image and texture quality.
- Approximately 25% faster processing for standard image tasks.
- Improved accuracy in rendering text when using quotations in prompts.
- A new personalization model that offers better nuance, surprise, and accuracy.
- Personalization code versioning, allowing the use of any personalization code from previous tasks with the corresponding model and data.
- Introduction of a `-q 2` mode, which takes 25% longer to process but may add more texture at the cost of reduced image coherence.
- Overall enhancement in the visual appeal across the board.

C. Runway Gen-3 Alpha

Gen-3 Alpha represents a new frontier in high-fidelity, fast, and controllable video generation. This advanced tool can produce highly detailed videos featuring complex scene transitions, a variety of cinematic styles, and intricate art direction. An impressive result relies heavily on a descriptive yet clear prompt, and users can also supply an input image to serve as the first frame of the video. Currently, Gen-3 Alpha supports 16:9 aspect ratios; input images that do not fit this ratio can be cropped after selection.

D. Rendernet Narrator

Transform your visual storytelling with the ability to create character-driven images and videos. Upload a video, add a script, and effortlessly sync the character’s lip movements to your words. Narrator enables you to bring your characters to life with ease. Any video featuring a person can be lip-synced, especially if the person is facing the camera and speaking. Discover Narrator, your go-to AI platform for crafting lifelike human character images and videos, and let your creativity shine!

E. Meta Segment Anything Model 2 (SAM 2)

https://github.com/facebookresearch/segment-anything-2

Presenting Meta Segment Anything Model 2 (SAM 2) — the pioneering unified model for real-time, prompt-enabled object segmentation in both images and videos. SAM 2 is released under the Apache 2.0 license, allowing anyone to utilize it for developing their own applications.
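SAM 2’s headline workflow — give the model a point prompt, get back a segmentation mask — can be illustrated with a toy region-growing segmenter. To be clear, this is a conceptual sketch using a plain flood fill, not SAM 2’s actual API (the real model uses learned image features and tracks objects across video; see the GitHub repo above for proper usage):

```python
from collections import deque

def segment_from_point(image, seed, tolerance=10):
    """Toy point-prompted segmentation: grow a mask outward from the
    clicked seed pixel, including 4-connected neighbours whose intensity
    is within `tolerance` of the seed. SAM 2 does vastly more, but the
    interface is analogous: a prompt point in, a binary mask out."""
    h, w = len(image), len(image[0])
    sy, sx = seed
    seed_val = image[sy][sx]
    mask = [[False] * w for _ in range(h)]
    mask[sy][sx] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
                    and abs(image[ny][nx] - seed_val) <= tolerance):
                mask[ny][nx] = True
                queue.append((ny, nx))
    return mask

# A tiny "image": a bright 2x2 object on a dark background.
img = [
    [0,   0,   0, 0],
    [0, 200, 205, 0],
    [0, 198, 202, 0],
    [0,   0,   0, 0],
]
mask = segment_from_point(img, seed=(1, 1))
print(sum(v for row in mask for v in row))  # prints 4: the bright object
```

Swapping this toy out for the real thing means loading a SAM 2 checkpoint and passing your click coordinates as the prompt; the mask-out shape of the workflow stays the same.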

F. Black Forest Labs FLUX.1 Models

Black Forest Labs, founded by Robin Rombach and Patrick Esser, has introduced FLUX.1, a text-to-image model family offered in Pro, Dev, and Schnell variants, with open weights for the latter two. FLUX.1 is renowned for its quality and speed, with some users favorably comparing it to DALL-E. The model is accessible on platforms such as Replicate and Glif and is considered a significant advancement in multimodal AI.

FLUX is a new open-weights image generator comparable to Midjourney. While Midjourney excels in aesthetics and skin texture, FLUX is superior in text rendering and anatomy. Users can access FLUX via FAL or Replicate, with image generation costs ranging from $0.003 to $0.05 per image depending on the model size, and processing times between 1 and 6 seconds.

Key Features:
- A suite of text-to-image models available in various variants.
- FLUX.1 Pro is noted for its cutting-edge capabilities in image detail, prompt adherence, and style diversity.
- Models are accessible through an API and on GitHub.
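The per-image price range quoted above makes batch costs easy to estimate. A minimal sketch — the $0.003–$0.05 figures come from the hosting platforms mentioned earlier; everything else here is illustrative:

```python
# Quick cost estimate for a FLUX batch, using the per-image price
# range quoted above ($0.003-$0.05 depending on model size).
PRICE_RANGE = (0.003, 0.05)  # USD per image: smallest vs largest model

def batch_cost(n_images: int) -> tuple:
    """Return (cheapest, priciest) cost in USD for a batch of images."""
    lo, hi = PRICE_RANGE
    return (round(n_images * lo, 2), round(n_images * hi, 2))

print(batch_cost(1000))  # (3.0, 50.0)
```

So a thousand images costs anywhere from a few dollars on the small model to around fifty on the largest — cheap enough to iterate freely.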

G. Stability AI introduced Stable Fast 3D

Stable Fast 3D is Stability AI’s latest innovation in 3D asset generation technology. This cutting-edge model can convert a single input image into a detailed 3D asset in just 0.5 seconds, representing a significant advancement in both speed and quality for 3D reconstruction.

The release of the Stable Fast 3D model and its accompanying technical report, which details the approach to achieving rapid inference speeds with optimized illumination and material parameters, has been met with enthusiasm and acclaim from the AI community.

Applications:
Pre-production Experimentation: Utilize the quick inference time to experiment efficiently during the pre-production phase.
Static Game Assets: Ideal for creating background objects, clutter, and furniture in gaming environments.
E-commerce: Perfect for generating 3D models for online retail platforms.
AR/VR: Facilitates the rapid creation of models for augmented and virtual reality applications.

Stable Fast 3D sets a new benchmark in 3D asset generation, combining unparalleled speed with high-quality output.

Something for Programmers — “Every developer can be an AI engineer”

A. GitHub Models

GitHub Models grants developers access to premier AI models like Llama 3.1, GPT-4o, GPT-4o mini, Phi 3, and Mistral Large 2. These models are available in the GitHub Marketplace, allowing easy testing and integration into your code and simplifying AI-driven development. By bringing state-of-the-art models into the familiar GitHub ecosystem, GitHub Models lets you enhance your project’s capabilities or explore new AI technologies without leaving your existing workflow.

Exciting Partnerships

A. Hugging Face and NVIDIA

Hugging Face has established itself as a leading platform for AI models and is now the preferred destination for AI developers, enhancing the accessibility of AI technology. Developers can leverage the power of seamless deployment with NVIDIA NIM, starting with models like Llama 3 8B and Llama 3 70B, on their preferred cloud service providers, all accessible directly from Hugging Face.

NVIDIA NIM, in collaboration with Hugging Face, offers superior throughput and near-100% utilization with multiple concurrent requests, enabling enterprises to generate text three times faster. In generative AI applications, token processing is a critical performance metric, and increased token throughput directly translates to higher revenue for enterprises.

B. Canva X Leonardo.Ai

Since Canva’s launch in 2013, its mission has been to empower everyone to design. As of July 30, Leonardo.Ai, a generative AI company, has joined Canva to build the world’s leading design AI technology. With Leonardo.Ai’s advanced foundational model and a team of 120 top-tier researchers, engineers, and designers, the acquisition strengthens Canva’s expanding suite of AI products, and continued investment in research and innovation aims to unlock the future of visual AI.

AI in Automobiles

Tesla’s FSD Expansion in China

Tesla is making notable progress in its Full Self-Driving (FSD) technology, with recent updates and user feedback showing increased system capabilities. The company has also initiated steps to establish an insurance brokerage in China, which is viewed as a critical move for the potential introduction of FSD in the region. This effort aligns with Tesla’s broader strategy to globalize its FSD technology, with speculation that China may follow the US as the next area to see FSD implementation. These advancements have been well-received by Tesla enthusiasts and investors, who view FSD as a transformative development for both the company and the automotive industry.

Beta Gist

iOS 18.1 Beta: iPhone, iPad and Mac Apple Intelligence available to a chosen few

The beta testing for Apple Intelligence is currently restricted to U.S. users. Apple specifies that both the device and Siri languages must be set to U.S. English, and the device region must be the United States. Apple Intelligence is not available in the EU or China.

Initially set to launch this fall with iOS 18, iPadOS 18, and macOS Sequoia, the full release of Apple Intelligence has been delayed. According to Bloomberg, the new iPhone 16 will debut in September with iOS 18, while Apple Intelligence will be released a few weeks later with iOS 18.1.

Apple Intelligence introduces several new features focused on writing assistance, image creation and editing, and enhancing Siri’s capabilities. Key features include:

1. Writing Tools
2. Genmoji (Generated Emoji)
3. Image Playground
4. Supercharged Siri — Natural Language Enabled:
a. Personal Contacts
b. Improved Conversation via Typing to Siri
5. Enhanced Priority Notification
6. Generated Memory Movie
7. Prioritize Privacy

For iPadOS users with an Apple Pencil, additional features are available. Smart Script in Notes will tidy up and smooth out handwritten text, while the new Math Notes calculator will solve equations and create interactive graphs with a single tap.

That’s it for the week; we hope you enjoyed the read!
Leave us a follow on our social handles if you’d like to stay informed of the latest cutting-edge developments in AI.
See you next week!


Olamide T. Iselowo
Artificial Synapse Media

Data Scientist, AI Practitioner, Lover of Mathematics and Logic, Fitness Enthusiast