AI Weekly Buzz: Innovations, Insights, and Industry Trends July 14th — 20th
Welcome to another edition of AI Weekly Buzz. This week brought new releases, updates, and plenty of discussion. From groundbreaking developments in AI technologies to significant regulatory moves, here’s a detailed look at the key happenings from July 14 to July 19, 2024.
Let’s start with some new tools.
Tools of The Week
A. GPT-4o mini
Generative Pre-trained Transformers (GPTs) have progressed steadily since their inception, and on Thursday OpenAI released its most cost-efficient small model yet. It supports the same range of languages as GPT-4o, and the aim is to make AI in applications more accessible and affordable.
Capabilities of GPT-4o mini
- The small model currently supports text and vision. OpenAI promises that it will support inputs and outputs in the form of text, images, video, and audio in the future.
- It offers low latency and a large context window. This means faster responses and the ability to handle large inputs, e.g. an entire code base.
- GPT-4o mini can handle non-English text, thanks to the GPT-4o capabilities it inherits.
Model Evaluation
OpenAI partnered with Ramp, Superhuman and other companies to understand the possible applications and limitations of the model.
GPT-4o mini was compared to Gemini Flash, Claude Haiku, GPT-3.5 Turbo and GPT-4o on reasoning tasks, multimodal reasoning, and math and coding proficiency. The diagram below gives details.
Let’s break down the Evaluation Benchmarks …
MMLU: Massive Multitask Language Understanding (MMLU) is a benchmark for evaluating the capabilities of language models. It consists of about 16,000 multiple-choice questions spanning 57 academic subjects including mathematics, philosophy, law, and medicine. It is one of the most commonly used benchmarks for comparing the capabilities of large language models.
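To make the multiple-choice setup concrete, here is a minimal sketch of how accuracy on an MMLU-style benchmark could be tallied overall and per subject. The record keys (`subject`, `answer`, `prediction`) and the toy data are assumptions made for illustration, not the benchmark's official format or scoring harness.

```python
from collections import defaultdict


def multiple_choice_accuracy(records):
    """Return (overall accuracy, per-subject accuracy) for multiple-choice records.

    Each record is a dict with hypothetical keys:
    'subject', 'answer' (gold letter) and 'prediction' (model's letter).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["subject"]] += 1
        if record["prediction"] == record["answer"]:
            correct[record["subject"]] += 1
    per_subject = {subject: correct[subject] / total[subject] for subject in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_subject


# Toy records, invented for this example.
records = [
    {"subject": "law", "answer": "B", "prediction": "B"},
    {"subject": "law", "answer": "D", "prediction": "A"},
    {"subject": "medicine", "answer": "C", "prediction": "C"},
    {"subject": "medicine", "answer": "A", "prediction": "A"},
]

overall, per_subject = multiple_choice_accuracy(records)
print(overall)       # fraction of all questions answered correctly
print(per_subject)   # same fraction, broken down by subject
```

Reporting the per-subject breakdown alongside the overall number helps show whether a model is uniformly strong or only strong in a few of the subjects.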
GPQA: GPQA stands for Graduate-Level Google-Proof Q&A Benchmark. It’s a challenging dataset designed to evaluate the capabilities of Large Language Models (LLMs) and scalable oversight mechanisms. GPQA consists of 448 multiple-choice questions meticulously crafted by domain experts in biology, physics, and chemistry. These questions are intentionally designed to be high-quality and extremely difficult.
DROP: Discrete Reasoning Over the content of Paragraphs (DROP) is a crowdsourced, adversarially-created, 55k-question benchmark. Systems must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs, as they remove the paraphrase-and-entity-typing shortcuts available in prior datasets.
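The discrete operations mentioned above (addition, counting, sorting) can be illustrated with a small sketch. The passage, the regular expression, and the questions below are invented for illustration; a real DROP system has to resolve references with a language model rather than a hand-written pattern.

```python
import re

# Hypothetical passage in the style of a sports summary.
passage = (
    "In the first quarter the home team kicked a 23-yard field goal. "
    "In the second quarter they added a 41-yard field goal and a 35-yard field goal."
)

# Step 1: resolve the references in the question to spans in the passage.
# A simple pattern stands in for that step here.
yards = [int(n) for n in re.findall(r"(\d+)-yard field goal", passage)]

# Step 2: perform discrete operations over the resolved values.
how_many_field_goals = len(yards)   # counting
total_yards = sum(yards)            # addition
shortest_to_longest = sorted(yards) # sorting
longest = max(yards)                # comparison

print(how_many_field_goals, total_yards, shortest_to_longest, longest)
```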
MGSM: Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems. The same 250 problems from GSM8K are each translated by human annotators into 10 languages. GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
MATH: MATH is a dataset of 12,500 challenging competition mathematics problems used to measure the mathematical problem-solving ability of LLMs. It covers a wide range of topics and difficulties, providing a comprehensive test of mathematical problem-solving skills.
HumanEval: OpenAI developed HumanEval and released it in 2021 with the Codex model. The HumanEval benchmark is a dataset designed to evaluate the code generation capabilities of large language models (LLMs). It consists of 164 hand-crafted programming challenges, each including a function signature, docstring, body, and several unit tests, averaging 7.7 tests per problem. These challenges assess a model’s understanding of language, algorithms, and simple mathematics, and are comparable to simple software interview questions.
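As a rough sketch of how unit-test-based evaluation works, the snippet below runs generated code against a handful of tests and reports the fraction of problems solved. The two problems, the candidate solutions, and the helper name `passes_all_tests` are all hypothetical; this is not the official HumanEval harness, which also sandboxes execution and applies timeouts.

```python
def passes_all_tests(candidate_source, entry_point, tests):
    """Execute generated code and check it against (args, expected) unit tests."""
    namespace = {}
    try:
        # A real harness would sandbox this step; exec is used only for illustration.
        exec(candidate_source, namespace)
        func = namespace[entry_point]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Any crash (syntax error, missing function, runtime error) counts as a failure.
        return False


# Hypothetical problems with model-generated candidate solutions.
problems = [
    {
        "entry_point": "add",
        "candidate": "def add(a, b):\n    return a + b\n",
        "tests": [((1, 2), 3), ((-1, 1), 0)],
    },
    {
        "entry_point": "is_even",
        # Deliberately buggy candidate: it returns True for odd numbers.
        "candidate": "def is_even(n):\n    return n % 2 == 1\n",
        "tests": [((2,), True), ((3,), False)],
    },
]

results = [
    passes_all_tests(p["candidate"], p["entry_point"], p["tests"]) for p in problems
]
pass_rate = sum(results) / len(results)
print(results, pass_rate)
```

A solution only counts if every test passes, which is why this style of benchmark measures functional correctness rather than textual similarity to a reference answer.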
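MMMU: The MMMU (Massive Multi-discipline Multimodal Understanding and Reasoning) benchmark evaluates multimodal models on college-level questions that combine text with images such as charts, diagrams, and tables. It tests both subject knowledge and deliberate reasoning across a diverse set of disciplines.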
For more info check out this link: OPENAI GPT-4o mini
B. Mistral NeMo
Mistral AI, in collaboration with NVIDIA, released a 12B small model with a 128k context length under the Apache 2.0 license.
Capabilities of Mistral NeMo
- It is designed for multilingual applications and is strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
- Its new tokenizer, Tekken, compresses natural language text and source code more efficiently than the SentencePiece tokenizer used in previous Mistral models.
- It is much better than Mistral 7B at following precise instructions, reasoning, handling multi-turn conversations, and generating code.
For more info: Mistral NeMo
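One common way to compare how efficiently tokenizers compress text is to count characters per token: the higher the number, the fewer tokens a given text consumes. The sketch below shows the idea with two toy tokenizers written for this example. Neither is the Mistral NeMo tokenizer nor SentencePiece, and the function names are hypothetical.

```python
import re


def chars_per_token(text, tokenize):
    """Average number of characters covered by each token (higher = better compression)."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("tokenizer produced no tokens")
    return len(text) / len(tokens)


def char_tokenizer(text):
    """Toy baseline: one token per character."""
    return list(text)


def word_piece_tokenizer(text):
    """Toy alternative: each token is a word plus its leading whitespace."""
    return re.findall(r"\s*\S+", text)


sample = "hello world"
baseline_ratio = chars_per_token(sample, char_tokenizer)
word_ratio = chars_per_token(sample, word_piece_tokenizer)
print(baseline_ratio, word_ratio)
```

With a real tokenizer, the same measurement can be run over natural language and source code samples in each target language to see where compression improves.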
C. Sage
Cropin launched Sage, a real-time, AI-infused agricultural platform powered by Google Gemini, this week. The company aims to transform the agricultural space with the following benefits:
1. Informed Decision-making: Imagine querying critical crop data (lifecycles, climate, soil) and receiving clear insights in seconds — how would this transform decision-making for stakeholders?
2. Global Visibility for Procurement: Could pinpointing ideal locations for crops anywhere in seconds, with real-time production data, help mitigate risks associated with fluctuating yields warranting last-minute adjustments?
3. Optimized Crop Selection: By analyzing crop lifecycles and weather dependencies, what if identifying the best regions for cultivation (reducing trial periods) and predicting future yields (for governments to ensure food security) was as easy as asking a question?
4. Predictive Insights: What if you could predict underperforming crops, understand edge cases, and even identify potential problem regions — empowering proactive responses?
To read more: Cropin Sage
D. Stability AI’s Stable Diffusion 3 Medium (SD3)
Stable Diffusion 3 Medium is Stability AI’s most advanced text-to-image open model yet. It is now available on TensorArt, where users can train models and earn $70-$100 weekly.
For more information: Stability AI’s SD3
E. FireBench: Using high-performance computing to advance machine learning and wildfire research
FireBench is a high-resolution simulation dataset designed to advance wildfire research. It enables investigations of wildfire spread behaviour and of the coupling between atmospheric hydrodynamics and fire physics by going beyond fire states to include a comprehensive list of flow field variables in three dimensions. It also supports the development of robust and interpretable ML models by capturing the underlying dependencies between relevant variables.
For more information: FireBench
…
AI in the Geopolitical Arena
- European Union
In March 2024, the European Parliament approved the Artificial Intelligence Act, which takes a risk-based approach to ensure that products comply with the law before they are made available to the public. The Act sorts AI systems into risk tiers ranging from minimal risk through high risk to unacceptable risk. EU lawmakers say this is important for reducing the spread of deepfakes and other misleading AI-generated content, especially because such content could mislead voters during elections.
The effects of the Act are already visible: this week Meta announced that it will not launch its upcoming multimodal AI model in the European Union, citing regulatory concerns. The move is similar to Apple’s decision to exclude the EU from its Apple Intelligence rollout. Meta’s Llama 3 model will still be launched, but only a text version will be available in the EU.
…
Word on the streets of AI
1. Meta is in talks for a 5% stake in Ray-Ban maker EssilorLuxottica, intensifying its smart glasses goals
For more information about the smart glasses: Meta Smart Glasses
2. OpenAI is in talks with Broadcom to develop a new custom AI chip
OpenAI has been in talks with semiconductor designers including Broadcom about developing a new chip, as the artificial intelligence company looks to ease its reliance on Nvidia and bolster its supply chain.
3. Open-source GPU Kernel Modules by Nvidia in the upcoming R560 driver release
Capabilities
1. Heterogeneous memory management (HMM) support
2. Confidential computing
3. The coherent memory architectures of Nvidia’s Grace platforms
For more information: Nvidia Open-Source GPU Kernel Modules
4. Apple to launch new Immersive Video series, films, and concerts on Apple Vision Pro beginning July 18
The projects include:
- Boundless
- Wild Life
- Elevated
- Submerged
- 2024 NBA All-Star Weekend
- An Immersive Experience from The Weeknd
- Red Bull: Big-Wave Surfing
- Empowering Filmmakers with New Tools for Apple Immersive Video
For more information: Apple Vision Pro Series
…
The world of AI keeps getting more exciting each week, with more capabilities and better implementations. I cannot imagine what next week will bring, but my hopes are high.
Keep an eye out for next week’s edition of AI Weekly Buzz, where more innovations, insights and trends will be delivered. Keep exploring till then.