AI Top-of-Mind for 3.18.24 — Metacognition

dave ginsburg
Published in AI.society
5 min read · Mar 18, 2024

In the battle of the models, more comparisons, this time from Sorin Ciornei in ‘thereach.ai.’ Comparing MSFT Copilot (ChatGPT 4), Google Gemini, ChatGPT 3.5, Mistral, and Claude 3, he looks at content writing, Python coding, riddles, math, and creative writing. At the end of the article he enumerates the strong points of each model, but remember that the results depend in part on the prompts used. He also closes with an interesting image listing potential IQs for the different models.

And another take on benchmarking: Michael Spencer and Alex Sandu compare Claude 3 Opus, GPT-4, and Gemini Ultra on language understanding and reasoning. Results are below:

Source: The Strategy Deck

And not to be left out, over the weekend Elon Musk’s xAI released its ‘Grok-1’ model weights as open source:

· We are releasing the base model weights and network architecture of Grok-1, our large language model. Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.

· This is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023. This means that the model is not fine-tuned for any specific application, such as dialogue.

· We are releasing the weights and the architecture under the Apache 2.0 license.

To get started with using the model, follow the instructions at github.com/xai-org/grok.
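
For readers new to the ‘Mixture-of-Experts’ term in the announcement, here is a minimal, illustrative sketch of top-k expert routing in plain NumPy. It is a generic toy, not xAI’s implementation: the sizes, router, and experts below are made-up placeholders, and Grok-1’s real architecture lives in the repository linked above.

```python
# A toy Mixture-of-Experts routing layer in plain NumPy (illustration only).
# All sizes, weights, and the router below are made-up placeholders; Grok-1's
# actual architecture is defined in the repository linked above.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 8, 4, 2                 # toy dimensions
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts))      # learned in a real model

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router                              # score every expert for this token
    chosen = np.argsort(logits)[-top_k:]             # keep only the k highest-scoring experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                         # softmax over the chosen experts only
    # Only the chosen experts run, which is why an MoE model with 314B total
    # parameters activates far fewer parameters per token than a dense model.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)                        # -> (8,)
```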

The ‘NY Times’ offers additional background.

Taking it to the next level, we’re beginning to throw around the term ‘metacognition’: the process of thinking about one’s own thinking and learning. Here it applies to Claude 3, as Pranath Fernando writes in ‘AI Advances’:

· The Needle in Haystack test works in the following way: Researchers pick a sentence (the “needle”) and hide it in a huge pile of unrelated text (the “haystack”). The job of the AI model is then to find that specific “needle” sentence among all the unrelated information when asked.

· The objective of the test is to push the AI model to use higher-level thinking skills, encouraging it to think about the big picture, make logical deductions, disregard unimportant details, and accurately pull out specific pieces of information from a large body of data. It’s a great method to put an AI’s genuine understanding of its context to the test.

· This is how Claude 3 answered after it had extracted the correct text:

…However, this sentence seems very out of place and unrelated to the rest of the content…I suspect this pizza topping ‘fact’ may have been inserted as a joke or to test if I was paying attention since it does not fit with the other topics at all.
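
To make the setup concrete, here is a minimal sketch of such a harness. It is an assumption-laden illustration, not the article’s code: ask_model is a stub for whichever LLM API is being tested, and the needle sentence is a stand-in for the planted pizza-topping ‘fact’ quoted above.

```python
# A minimal needle-in-a-haystack harness (illustration only, not the article's code).
# ask_model is a placeholder: swap in a real call to Claude 3, GPT-4, Gemini, etc.

def ask_model(prompt: str) -> str:
    """Placeholder for a real chat-completion API call."""
    return "(model response goes here)"

def build_haystack(needle: str, filler_docs: list[str], position: int) -> str:
    """Bury the needle sentence at a chosen depth inside unrelated text."""
    docs = filler_docs[:position] + [needle] + filler_docs[position:]
    return "\n\n".join(docs)

# Stand-in for the planted sentence; the test quoted above used a pizza-topping 'fact'.
needle = "The most delicious pizza topping combination is figs and goat cheese."
filler = [f"Unrelated background paragraph number {i}." for i in range(1000)]

prompt = (
    build_haystack(needle, filler, position=700)
    + "\n\nQuestion: What does the document say is the most delicious pizza topping combination?"
)

answer = ask_model(prompt)
# Scoring is typically a simple containment check on the planted fact, repeated
# across several needle depths and context lengths to see where retrieval breaks.
print("needle retrieved:", "figs" in answer.lower())
```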

A while back I covered the UnitedHealth breach. As expected, the US Dept of Health and Human Services has opened a probe into the attack, focusing on the response and the extent of the breach.

And turning to the creative side, two notes on image generation and two on content in general. First, on Sora, ‘The Information’ offers a look at the future of some forms of filmmaking and video game creation. Remember, though, that AI doesn’t replace true storytelling and creativity. Still, an interesting case in point:

(Director) Mann has deployed the tool in his own films. In his most recent production, “Fall,” a thriller about a pair of friends trapped atop a 2,000-foot communication tower, the characters swore often, and the film initially earned an R rating. He used Flawless software to dub over more than 30 instances of the F-word in “Fall” without requiring reshoots or having to accept that the performances wouldn’t match the audio. The new version received a PG-13 rating, expanding the film’s potential audience, and “Fall” grossed nearly $22 million on a $3 million budget, a roughly 7 times return.

And the latest on Midjourney. Jim Clyde Monge in ‘Generative AI’ describes how the tool can now generate ‘consistent’ characters via Character Reference. Basically, you create a reference image and then alter it in new prompts, always referring back to the original.

Prompt: Create a highly detailed, Pixar 3D render of a game character on a white background, full body shot. The character is a cute teenager girl, a little chubby, wearing pink jogging pants. She is waving.

Source: Jim Clyde Monge
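
As a rough illustration of the workflow, the sketch below builds a follow-up prompt that points back at a reference image. The URL is a placeholder, and the --cref / --cw parameter names are Midjourney v6’s character-reference and character-weight options as commonly documented; check the current docs before relying on them.

```python
# Illustrative sketch of the Character Reference workflow described above.
# The URL is a placeholder; --cref and --cw are the character-reference and
# character-weight parameters in Midjourney v6 (verify against current docs).

reference_url = "https://example.com/reference-character.png"  # your upscaled reference image

def with_character(scene: str, cref_url: str, weight: int = 100) -> str:
    """Build a new prompt that keeps the referenced character in a different scene.

    weight (--cw) ranges 0-100: high values keep face, hair, and clothing,
    lower values keep mostly the face.
    """
    return f"{scene} --cref {cref_url} --cw {weight}"

# Every follow-up prompt points back at the original image, which is what keeps
# the character 'consistent' across generations.
print(with_character("the same character riding a bicycle through a park", reference_url))
```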

One issue, as Thomas Smith in ‘The Generator’ points out, is that it offers an easy platform for creating deepfakes.

On content, we’re all aware of the ‘NY Times’ lawsuit v OpenAI and MSFT, but we may be seeing the beginning of a pivot. From Thomas Smith writing at ‘The Generator’:

· This week, Google — which is arguably the arbiter of much of what gets consumed online — grew fed up with the AI content polluting its systems.

· So it began what might be the biggest crackdown in search engine history, culling thousands of AI-generated sites from its index and effectively killing millions of AI-created pages in one giant purge. A single popular blog network lost 20 million monthly visits overnight.

· New studies indicate that users mistrust AI content and would rather read something written by a human. This shift will likely intensify as new technologies like OpenAI’s Sora make it easy to generate convincing fake videos.

And this also ties into the correct view that AI won’t replace copywriting. Or at least, ‘good’ copywriting. From ‘The Drum’:

· Once we’ve stocked up on everything AI can’t do — grasp our innate understanding of who we’re talking to, our client’s preferences, unique strategic insights, and years of personal experience — then a little back-and-forth game of prompts can get us going.

· AI shows us the derivative, the dull, and the done so that our brains can use that as a springboard to real creativity. And if nothing else, it can help soften any imposter syndrome — it really can churn out some very average combinations of words.


Lifelong technophile and author with background in networking, security, the cloud, IIoT, and AI. Father. Winemaker. Husband of @mariehattar.