AI Weekly Buzz: Innovations, Insights, and Industry Trends August 11th — 17th

Published in

Artificial Synapse Media

6 min read · Aug 17, 2024

Welcome to another edition of AI Weekly Buzz. From groundbreaking developments in AI technologies to significant regulatory moves, here’s a detailed look at the key happenings from August 11 to August 16, 2024.

New Releases

xAI’s Grok 2 Welcomed New Family Members

The Grok-2 language model got two beta releases this week:

Grok-2 is a state-of-the-art AI assistant with advanced text and vision understanding capabilities, integrating real-time information from the 𝕏 platform, accessible through the Grok tab in the 𝕏 app.
Grok-2 mini is a small but capable model that balances speed and answer quality.

Evaluation of the model focused on two key areas: following instructions and providing accurate information. Grok-2 has shown significant improvements in reasoning with retrieved content and in its tool-use capabilities, such as correctly identifying missing information, reasoning through sequences of events, and discarding irrelevant posts.

Both Grok-2 and Grok-2 mini demonstrate significant improvements over the previous Grok-1.5 model. They achieve performance levels that are competitive with other frontier models in areas such as graduate-level science knowledge (GPQA), general knowledge (MMLU, MMLU-Pro), and math competition problems (MATH). Additionally, Grok-2 excels in vision-based tasks, delivering state-of-the-art performance in visual math reasoning (MathVista) and document-based question answering (DocVQA).

If you are a Premium or Premium+ subscriber, update to the latest version of the 𝕏 app to beta test Grok-2. Grok-2 and Grok-2 mini will also be available to developers through xAI’s enterprise API platform later this month.

Google NeuralGCM

NeuralGCM is a model of Earth’s atmosphere that combines physics-based simulation with neural networks. Its capabilities include:

Simulating 70,000 days of atmosphere in 24 hours on a single TPU, whereas a regular high-resolution physics-based model simulates only 19 days in 24 hours with 13,000 CPUs
Better precision for current and future climate predictions.
In tests, NeuralGCM beat traditional models in 95% of 2–15 day forecasts.
More speed: NeuralGCM produces forecasts with 15–50% less error than state-of-the-art models in a matter of minutes, whereas traditional methods require a few days.
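
To put the throughput figures in perspective, here is a small back-of-the-envelope calculation. It is only a sketch: it reads the numbers above as simulated days per 24 hours of wall-clock time (70,000 on one TPU for NeuralGCM, 19 on 13,000 CPUs for the physics-based model), and the function names are our own.

```python
# Throughput figures from the list above, read as simulated days
# per 24 hours of wall-clock time (an assumption about the units).
NEURALGCM_DAYS_PER_DAY = 70_000  # single TPU
PHYSICS_DAYS_PER_DAY = 19        # 13,000 CPUs


def speedup(fast: float, slow: float) -> float:
    """Ratio of simulated days produced per wall-clock day."""
    return fast / slow


def wall_clock_minutes(simulated_days: float, days_per_day: float) -> float:
    """Wall-clock minutes needed to simulate `simulated_days`."""
    return simulated_days / days_per_day * 24 * 60


ratio = speedup(NEURALGCM_DAYS_PER_DAY, PHYSICS_DAYS_PER_DAY)
year_fast = wall_clock_minutes(365, NEURALGCM_DAYS_PER_DAY)
year_slow = wall_clock_minutes(365, PHYSICS_DAYS_PER_DAY)

print(f"Speedup: about {ratio:,.0f}x")
print(f"One simulated year, NeuralGCM: {year_fast:.1f} minutes")
print(f"One simulated year, physics-based: {year_slow / (24 * 60):.1f} days")
```

On these figures the gap is roughly 3,700x in simulated days per wall-clock day. The comparison ignores the hardware difference (one TPU versus 13,000 CPUs), so it should not be read as a like-for-like efficiency number.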

A New Research Scientist

Sakana AI recently introduced “The AI Scientist”, a system that automates the entire research process, enabling models such as large language models (LLMs) to perform research independently. According to their report:

Sakana AI introduced an automated peer review process to evaluate generated papers, write feedback, and improve results. The automated reviewer can evaluate generated papers with near-human accuracy.
The automated scientific discovery process is repeated to iteratively develop ideas in an open-ended fashion and add them to a growing archive of knowledge, thus imitating the human scientific community.
In this first demonstration, “The AI Scientist” conducts research in diverse subfields of machine learning, discovering novel contributions in popular areas such as diffusion models, transformers, and grokking.

The AI Scientist automates the entire research lifecycle, from generating novel research ideas, writing any necessary code, and executing experiments to summarizing experimental results, visualizing them, and presenting its findings in a complete scientific manuscript.

OpenAI Introduces SWE-bench Verified

Evaluation benchmarks are just as important as the models they measure. SWE-bench Verified is a subset of SWE-bench that evaluates models’ ability to solve real-world software issues. The benchmark gives agents a code repository and an issue description and challenges them to generate a patch that resolves the problem described by the issue. OpenAI uses SWE-bench as a preparedness evaluation, and its version improves the benchmark by reducing the potential to underestimate or overestimate model performance. It addresses the following problems with the original benchmark:

The unit tests used to evaluate the correctness of a solution are often overly specific and, in some cases, unrelated to the issue. This can potentially cause correct solutions to be rejected.
Many samples have an issue description that is underspecified, leading to ambiguity about the problem and how it should be solved.
It is sometimes tricky to reliably set up the SWE-bench development environments for the agents, inadvertently causing unit tests to fail regardless of the solution. In such cases, perfectly valid solutions might be graded as incorrect.
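
To make the grading step concrete, here is a minimal sketch of how a patch might be scored against a sample’s unit tests. Everything in it is an illustrative assumption: `Sample`, `grade`, the fake test runners, and the pass/fail rule are hypothetical names and logic, not OpenAI’s or SWE-bench’s actual harness. It only shows how an over-specific test or a broken environment can mark a valid patch as incorrect.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    """Hypothetical benchmark sample: an issue plus the tests used for grading."""
    issue: str
    tests: List[str]


# Hypothetical runner: takes a patch and returns {test_name: passed}.
# It raises RuntimeError when the environment cannot be set up.
TestRunner = Callable[[str], Dict[str, bool]]


def grade(sample: Sample, patch: str, run_tests: TestRunner) -> str:
    """Count a patch as resolved only if every grading test passes."""
    try:
        results = run_tests(patch)
    except RuntimeError:
        # Setup failed, so the patch is rejected regardless of its quality.
        return "unresolved (environment error)"
    if all(results.get(name, False) for name in sample.tests):
        return "resolved"
    return "unresolved"


sample = Sample(
    issue="Crash on empty input",
    tests=["test_empty_input", "test_exact_error_message"],
)


def overly_specific_runner(patch: str) -> Dict[str, bool]:
    # The patch fixes the crash, but one test checks wording unrelated to the issue.
    return {"test_empty_input": True, "test_exact_error_message": False}


def broken_env_runner(patch: str) -> Dict[str, bool]:
    raise RuntimeError("dependency install failed")


print(grade(sample, "fix.diff", overly_specific_runner))  # unresolved
print(grade(sample, "fix.diff", broken_env_runner))       # unresolved (environment error)
```

In both calls the hypothetical patch fixes the reported crash, yet neither run is graded as resolved, which is the kind of underestimation the verified subset is meant to reduce.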

The dataset used for this benchmark is available on Hugging Face.

AI in Mobile Devices

New Pixel Phones, Pixel Watch 3 and Pixel Buds Pro 2

Pixel phones now feature Gemini, an AI assistant that helps with note-taking, setting reminders, and answering questions.
New AI-powered camera features on Pixel phones include “Add Me,” which allows you to insert yourself into group photos, and “Super Res Zoom Video,” enabling you to zoom in on videos without compromising quality.
The Pixel Watch 3 can automatically detect when you’re sleeping and activate Bedtime Mode, conserving battery and silencing notifications.
Pixel Buds Pro 2 are equipped with a new chip that drives next-generation Active Noise Cancellation, doubling the noise reduction compared to the previous model.

Oppo Mobile Devices

Earlier in the week, Oppo confirmed that its F27 Pro+ mid-ranger is getting four generative AI features on August 22 via a software update. A vanilla mid-ranger, the Oppo F27 5G, is also in the works. The AI features to expect are:

OTA update with AI Eraser 2.0, which removes background objects in the F27 Pro+
AI Smart Image Matting 2.0, which crops objects as stickers
AI Studio, which generates images out of drawings
AI LinkBoost to enhance call quality and optimize network usage.

Partnerships

Palantir x Microsoft

This is a first-of-its-kind collaboration between the two technology companies in support of national security agencies. The partnership involves:

Bringing some of the most sophisticated and secure cloud, AI, and analytics capabilities to the U.S. Defense and Intelligence Community
Deploying Palantir’s suite of products (Foundry, Gotham, Apollo, and AIP) in Microsoft Azure Government and in the Azure Government Secret (DoD Impact Level 6) and Top Secret clouds
Providing the Defense and Intelligence communities with boot camp experiences to try out the technology

“This expanded partnership between Microsoft and Palantir will help accelerate the safe, secure, and responsible deployment of advanced AI capabilities for the US government,” said Deb Cupp, President of Microsoft Americas.

Word on the Street

Language AI

Over seven thousand languages are spoken around the world. Some are gradually going extinct, and many people cannot fully understand one another because of language barriers. There is a need for language inclusion, and that is what Google aims to offer with Language AI.

Their research includes

…

That’s it for the week. We hope you enjoyed the read!
Please follow us on our social media handles to stay informed of the latest cutting-edge developments in AI.
See you next week!