Reading Digest, July #16

Daniel Chen
Journey Into AI with Aili
19 min read · 4 days ago

--

Hey there, my fantastic readers! I hope you’re ready for another exciting edition of my daily reading digest. If you’re new here, prepare to be amazed by the fascinating content I’ve curated just for you. And if you’re a regular, thank you for your continued support — it means the world to me!

Today’s digest is a true feast for the curious mind, ranging from the recovery of business travel spending in all regions except Asia and Europe to the bizarre call Biden made to Harris headquarters hours after dropping out of the race. We’ll explore the latest developments in AI, including California’s new AI bill, the rise of emotional divestment, and why AIs need to stop and think before they answer.

But that’s not all — we’ve got some juicy pieces on the world of tech and business, such as Elon Musk’s X testing letting users request Community Notes on bad posts, how a software update from cyber firm CrowdStrike caused one of the world’s biggest IT blackouts, and Netflix’s focus on ad tech. We’ll also take a closer look at the fight over California’s new AI bill and the concerns about passkeys.

For the curious minds out there, we’ll dive into the world of remarketing statistics, how PPC and SEO can work together, and why management is so male. We’ll also explore the rise of emotional divestment, the cultivation of cults, and what happens when SaaS companies stay private for longer.

But that’s just the tip of the iceberg, my friends. From the United Nations’ AI policy grab to the latest research on ternary, quantized, and FP16 language models, this digest has something for everyone. We’ll even take a closer look at the ghost of Jeffrey Epstein endorsing Trump at the RNC and the potential for a conflict-borne superbug to bring on our next pandemic.

So, grab your favorite beverage, get comfortable, and join me on this thrilling journey through the world of online content. I can’t wait to hear your thoughts and reactions in the comments below!

Happy reading, my incredible friends!

Business travel spending recovers in all regions but Asia and Europe

The article discusses the recovery of global business travel spending in 2023, with some regions reaching pre-pandemic levels while others still lag behind. Key points:

  • Global business travel spending in 2023 increased 30% to $1.34 trillion, but remains 7% below pre-pandemic levels.
  • North America, Latin America, the Middle East and Africa have recovered to pre-pandemic spending levels.
  • The Asia-Pacific region saw the fastest growth, led by South Korea and India, but China’s recovery is still lagging.
  • Western Europe’s spending increased 33% in 2023 but is still 6% below pre-pandemic levels.
  • The industry association predicts spending will reach a new record of $1.48 trillion by the end of 2024.

Biden makes bizarre call in to Harris headquarters hours after dropping out of race

The article discusses President Biden’s withdrawal from the presidential race and his endorsement of Vice President Harris, who is expected to become the Democratic Party’s nominee.

Takeaways from the congressional grilling of Secret Service Director Cheatle on the Trump assassination attempt | CNN Politics

The article discusses the testimony of US Secret Service Director Kimberly Cheatle before the House Oversight Committee regarding the security failures that led to the recent assassination attempt against former President Donald Trump. The key points covered include:

  • Cheatle acknowledged the incident was the “most significant operational failure at the Secret Service in decades” and the worst since the 1981 assassination attempt on President Reagan.
  • Lawmakers from both parties called for Cheatle’s resignation, but she insisted she would remain in her role.
  • Cheatle provided carefully worded answers and non-answers to questions about the security breakdown, citing the ongoing FBI investigation.
  • The article also discusses the use of an AR-15-style weapon in the attack and the broader issue of gun violence in the US.

Harris has support of enough Democratic delegates to become party’s presidential nominee: AP survey

The article discusses Vice President Kamala Harris securing the support of enough Democratic delegates to become her party’s nominee against Republican Donald Trump, after President Joe Biden’s decision to drop his bid for reelection. The article covers the quick coalescing behind Harris by the Democratic party, her campaign’s record-breaking fundraising, and her plans to unite the party and win the election.

Elon Musk’s X tests letting users request Community Notes on bad posts

The article discusses the expansion of Twitter’s fact-checking service, now called Community Notes, to allow users on Elon Musk’s X platform to request fact-checking on problematic posts.

Inside the fight over California’s new AI bill

The article discusses a bill introduced by California state Senator Scott Wiener (D-San Francisco) called the “Safe and Secure Innovation for Frontier Artificial Intelligence Models” (SB 1047). The bill requires companies training “frontier models” that cost more than $100 million to do safety testing and be able to shut off their models in the event of a safety incident. The article covers the challenges and criticisms of the bill from the tech industry, as well as Wiener’s responses.

Concerns about passkeys

The article discusses concerns about the implementation and implications of passkeys, a new authentication technology. It highlights issues around the lack of user control and migration options, the potential for discrimination against certain passkey apps, and the centralization of power that passkeys could enable.

How a software update from cyber firm CrowdStrike caused one of the world’s biggest IT blackouts

The article discusses a global IT outage caused by a software update from the cybersecurity firm CrowdStrike, which led to a cascade effect across various industries, including banking, healthcare, and air travel.

🍿 Netflix: Ad Tech Focus

The article provides an overview of Netflix’s Q2 FY24 earnings report, covering key metrics, financial performance, and strategic initiatives such as the introduction of an ad-supported tier, expansion into live sports, and the growth of its gaming business. It also discusses Netflix’s viewership trends, content strategy, and competitive positioning in the streaming market.

15 Remarketing Statistics to Prove How Valuable It Is

The article discusses the effectiveness of remarketing, a digital marketing strategy where businesses target ads towards people who have previously visited their website. It presents 15 statistics that demonstrate the benefits and impact of remarketing, covering areas such as customer acquisition, brand awareness, click-through rates, and campaign performance.

4 Ways PPC and SEO Can Work Together (And When They Can’t)

The article discusses the relationship between search engine optimization (SEO) and pay-per-click (PPC) advertising, and how they can work together to improve marketing efforts.

The Rise of Emotional Divestment

The article explores widespread, insidious shifts in how we experience one another face-to-face, shifts accelerated by the pandemic and felt not just in our most intimate relationships but also in our communities and public spaces. It argues that facing dark realities through deep connection has given way to a divestment from communal experiences of sadness and from external emotional sources generally, leaving behind a merciless sterility and an impatience with in-person interaction.

Why is Management So Male?

The article discusses the gender gap in senior management positions, particularly in the context of a large European manufacturing firm. It examines the drivers behind the lower application rates of women for promotion and leadership roles, despite their higher success rates when they do apply.

Cultivating Cults

The article discusses the role of storytelling and belief in building and investing in tech companies. It argues that every company, movement, and organization is essentially a “cult,” and that cults can be both powerful and dangerous. It covers the role of the “Chief Evangelist” in amplifying a company’s story; the power of controversy, capital, competence, and charisma in storytelling; and the importance of “getting away with it” as a storyteller. Finally, it considers what separates good cults from bad ones, the role of truth and criticism in building a successful cult, and the importance of carefully choosing what to worship.

Eschewing IPOs: What happens when SaaS companies stay private for longer

The article discusses the recent trends in the SaaS IPO market, including the maturity of SaaS companies going public, the implications for employees, VCs, and public market investors, as well as the debate around the optimal timing for companies to go public.

AI paid for by Ads — the gpt-4o mini inflection point

The article discusses the potential of using AI-generated content, specifically the GPT-4o mini model, to create dynamic, ad-supported blog posts. It explores the cost of generating AI content compared to the potential revenue from ad impressions, and whether this could become a viable business model for the future of the internet.
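
The economics here come down to simple arithmetic. Here’s a back-of-envelope sketch in Python; the per-token prices and ad RPM below are my own illustrative assumptions, not figures quoted from the article:

```python
# Back-of-envelope: cost to generate one AI-written post vs. ad revenue per view.
# Pricing and RPM figures are assumptions for illustration only.

def generation_cost(input_tokens, output_tokens,
                    price_in_per_m=0.15, price_out_per_m=0.60):
    """Dollar cost of one generation, given per-million-token prices."""
    return (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m

def ad_revenue(page_views, rpm=5.0):
    """Dollar revenue from ad impressions at an assumed RPM ($ per 1000 views)."""
    return (page_views / 1000) * rpm

cost = generation_cost(input_tokens=500, output_tokens=1500)  # one ~1500-token post
revenue = ad_revenue(page_views=1)                            # a single reader

print(f"cost per post: ${cost:.6f}, revenue per view: ${revenue:.4f}")
```

Under these assumptions, a single ad-monetized page view more than pays for generating the page, which is the inflection point the article is pointing at.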

Working with AI (Part 2): Code Conversion

https://aili.app/share/2uvnnzC1LJ6KkFrgrJqmwX

The article discusses how Mantle, a software development company, leveraged large language models (LLMs) to streamline the process of converting a prototype project into a production-ready codebase. The key points covered include:

  • The common challenge of converting a prototype project into a production-ready codebase, and the reasons why organizations often undertake such efforts.
  • Mantle’s approach of using an LLM to translate the prototype code written in R into their standard production tech stack of Golang and ReactJS.
  • The techniques used to provide the LLM with the necessary context, including inserting the existing prototype codebase, summarizing the target architecture and libraries, and incorporating screenshots of the existing application.
  • The iterative process of generating code file-by-file, starting from backend to frontend and leaf node files, and the challenges around output token limitations.
  • The benefits of this approach, including saving two-thirds of the time needed for the code conversion task.
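
The file-by-file loop above can be sketched in a few lines. This is a minimal illustration of the workflow, not Mantle’s actual code: the LLM call is a stub, and all names are hypothetical.

```python
# Sketch of the iterative file-by-file conversion loop described above.
# `call_llm` is a stub standing in for a real LLM API; all names are hypothetical.

def call_llm(prompt: str) -> str:
    return f"// generated Go/ReactJS code for prompt of {len(prompt)} chars"

def build_prompt(source_file, source_code, target_stack, converted_so_far):
    """Pack the context the model needs: target stack, prior output, one source file."""
    context = "\n\n".join(f"--- {name} ---\n{code}"
                          for name, code in converted_so_far.items())
    return (f"Target stack: {target_stack}\n"
            f"Already converted files:\n{context}\n"
            f"Convert this R source file to the target stack:\n"
            f"--- {source_file} ---\n{source_code}")

def convert_project(files, target_stack="Golang backend, ReactJS frontend"):
    """Convert files one at a time, assuming they arrive in dependency order
    (backend before frontend, leaf-node files first)."""
    converted = {}
    for name, code in files:
        prompt = build_prompt(name, code, target_stack, converted)
        converted[name] = call_llm(prompt)
    return converted

out = convert_project([("db.R", "conn <- dbConnect(...)"),
                       ("api.R", "handler <- function(req) {...}")])
```

Generating one file per call, with earlier outputs fed back in as context, is also how you stay under output token limits.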

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models

The article discusses the task interference problem in multimodal large language models (MLLMs) and proposes a Mixture of Multimodal Experts (MoME) architecture to mitigate it. The key components of MoME are:

  • Mixture of Vision Experts (MoVE): Adaptively aggregates visual features from various vision encoders using an Adaptive Deformable Transformation (ADT) module and an instance-level soft router.
  • Mixture of Language Experts (MoLE): Incorporates sparsely gated experts into the language model to achieve performance gains with minimal computational overhead.

The authors demonstrate that MoME can effectively adapt to task differences in both vision and language modalities, leading to significant performance improvements across various vision-language tasks compared to generalist MLLMs.
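
The instance-level soft router at the heart of MoVE is easy to picture: a gate scores each expert for the current input, and the output is a softmax-weighted sum of the experts’ features. Here’s a toy sketch on plain vectors (the real module operates on encoder feature maps):

```python
import math

# Toy instance-level soft router: softmax the gate scores, then take the
# weighted sum of per-expert feature vectors.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_route(expert_features, gate_scores):
    """Aggregate expert feature vectors with softmax gate weights."""
    weights = softmax(gate_scores)
    dim = len(expert_features[0])
    return [sum(w * feat[i] for w, feat in zip(weights, expert_features))
            for i in range(dim)]

experts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # features from three vision experts
fused = soft_route(experts, gate_scores=[2.0, 0.1, 0.1])
```

Because the routing is soft, every expert contributes a little, but the gate lets the expert best suited to this particular input dominate the fused features.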

Inside the United Nations’ AI policy grab

The article discusses a forthcoming UN report that proposes the creation of a global AI governance forum to address the proliferation of AI-related initiatives and the lack of representation from the global South.

Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models

The article presents a comprehensive study of ternary, quantized, and FP16 language models, called the Spectra LLM suite. It includes 54 language models ranging from 99M to 3.9B parameters, trained on 300B tokens. The suite includes FloatLMs, post-training quantized QuantLMs (3, 4, 6, and 8 bits), and ternary LLMs (TriLMs) — an improved architecture for ternary language modeling that outperforms previously proposed ternary models. The article evaluates the performance of these models across various benchmarks related to commonsense, reasoning, knowledge, and toxicity, and provides insights into their training dynamics and scaling trends.
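
To make “ternary” concrete: each weight is constrained to a scale times {-1, 0, +1}. The toy post-training quantizer below mirrors that general idea; note it is not the TriLM recipe itself, since TriLMs are trained ternary from scratch rather than quantized afterwards, and the threshold rule here is my own illustrative choice.

```python
# Toy post-training ternary quantization: map each weight to scale * {-1, 0, +1}.
# Illustrative only; not the TriLM training recipe from the paper.

def ternarize(weights, threshold_ratio=0.7):
    """Quantize a list of floats to (ternary codes, per-tensor scale)."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_ratio * mean_abs
    ternary = [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in weights]
    # Scale = mean magnitude of the surviving (nonzero) weights.
    nonzero = [abs(w) for w, t in zip(weights, ternary) if t != 0]
    scale = sum(nonzero) / len(nonzero) if nonzero else 0.0
    return ternary, scale

ternary, scale = ternarize([0.9, -0.8, 0.05, -0.02, 1.1])
```

Storing ~1.58 bits per weight instead of 16 is what makes the memory and bandwidth savings in the paper possible.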

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

The article investigates the impact of vocabulary size on the scaling of large language models (LLMs). It proposes three complementary approaches to predict the compute-optimal vocabulary size, considering the trade-off between model complexity and computational constraints. The key findings are:

  1. Vocabulary parameters should be scaled slower than non-vocabulary parameters as models become more computationally intensive.
  2. Most existing LLMs use vocabulary sizes that are smaller than the optimal, as predicted by the proposed approaches.
  3. Adopting the predicted optimal vocabulary size consistently improves downstream performance over commonly used vocabulary sizes.
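
Finding 1 is easy to visualize: if both parameter groups grow as power laws of compute but the vocabulary exponent is smaller, the vocabulary’s share of the model shrinks as compute grows. The exponents below are purely hypothetical, chosen for illustration rather than taken from the paper’s fits:

```python
# Purely illustrative: with a smaller power-law exponent for vocabulary
# parameters (0.4 vs 0.5 here, hypothetical values), the vocabulary's share
# of total parameters falls as the compute budget grows.

def vocab_share(compute, nonvocab_exp=0.5, vocab_exp=0.4):
    nonvocab = compute ** nonvocab_exp
    vocab = compute ** vocab_exp
    return vocab / (vocab + nonvocab)

small_budget = vocab_share(1e18)
large_budget = vocab_share(1e24)
```

So “scale vocabulary slower” still means the optimal vocabulary keeps growing in absolute terms; it just claims a smaller fraction of the parameter budget at larger scales.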

Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together

The article discusses the problem of optimizing the performance of natural language processing (NLP) systems that are built as multi-stage pipelines involving multiple distinct language models (LMs) and prompting strategies. The key points are:

  • NLP systems are increasingly taking the form of multi-stage pipelines with multiple LMs and prompting strategies.
  • The authors address the challenge of how to fine-tune such systems to improve their performance.
  • They cast this as a problem of optimizing the underlying LM weights and the prompting strategies together.
  • They consider a realistic scenario where there are no gold labels for any intermediate stages in the pipeline.
  • They evaluate approximate optimization strategies where they bootstrap training labels for all pipeline stages and use these to optimize the pipeline’s prompts and fine-tune its weights alternatingly.
  • Experiments on multi-hop QA, mathematical reasoning, and feature-based classification show that optimizing prompts and weights together outperforms optimizing just prompts or just weights.
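
The alternating scheme in the bullets above can be sketched as a short loop. Everything below is a stub sketch under my own assumptions: the real work (prompt compilation, actual fine-tuning) hides behind the placeholder functions.

```python
# Sketch of the alternating optimization described above: bootstrap labels for
# every pipeline stage from the pipeline's own successful runs, then alternate
# prompt optimization and weight fine-tuning. All functions are illustrative stubs.

def bootstrap_labels(pipeline, unlabeled_inputs):
    """Keep traces only from runs whose final answer checks out."""
    traces = [pipeline(x) for x in unlabeled_inputs]
    return [t for t in traces if t["final_correct"]]

def optimize_prompts(prompts, labels):
    return {stage: p + " (tuned)" for stage, p in prompts.items()}

def finetune_weights(weights, labels):
    return {k: w + 0.1 for k, w in weights.items()}

def alternate(pipeline, inputs, prompts, weights, rounds=2):
    for _ in range(rounds):
        labels = bootstrap_labels(pipeline, inputs)
        prompts = optimize_prompts(prompts, labels)   # weights frozen this step
        weights = finetune_weights(weights, labels)   # prompts frozen this step
    return prompts, weights

pipeline = lambda x: {"final_correct": x % 2 == 0}
prompts, weights = alternate(pipeline, [1, 2, 3, 4], {"qa": "Answer:"}, {"w": 0.0})
```

Filtering traces by final-answer correctness is what lets the method work without gold labels for the intermediate stages.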

Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation

The article introduces FLAMe, a family of foundational autorater models that can perform various quality assessment tasks. FLAMe is trained on a large and diverse collection of curated and standardized human evaluations derived exclusively from permissively licensed datasets. The article demonstrates FLAMe’s strong zero-shot generalization abilities, outperforming models trained on proprietary data like GPT-4 and Claude-3 on many held-out tasks. FLAMe can also effectively serve as a powerful starting point for further downstream fine-tuning, as shown in the case of reward modeling evaluation. The article also presents a computationally efficient approach to optimize the FLAMe multitask mixture for targeted distributions. Overall, FLAMe variants outperform popular proprietary LLM-as-a-Judge models across various autorater evaluation benchmarks, while exhibiting lower bias and effectively identifying high-quality responses for code generation.

Mindful-RAG: A Study of Points of Failure in Retrieval Augmented Generation

The paper investigates the challenges faced by Large Language Models (LLMs) in addressing knowledge-intensive queries and factual question-answering tasks, despite their proficiency in generating coherent text. To mitigate this, the authors explore Retrieval-Augmented Generation (RAG) systems that incorporate external knowledge sources, such as structured knowledge graphs (KGs). However, the authors observe that LLMs often struggle to produce accurate answers despite access to KG-extracted information containing necessary facts.

The study analyzes error patterns in existing KG-based RAG methods and identifies eight critical failure points, categorized into Reasoning Failures and KG Topology Challenges. The authors find that these errors predominantly occur due to insufficient focus on discerning the question’s intent and adequately gathering relevant context from the knowledge graph facts.

Drawing on this analysis, the authors propose the Mindful-RAG approach, a framework designed for intent-based and contextually aligned knowledge retrieval. This method explicitly targets the identified failures and offers improvements in the correctness and relevance of responses provided by LLMs, representing a significant step forward from existing methods.

Qwen2 Technical Report

This report introduces the Qwen2 series, the latest addition to the Qwen family of large language models and large multimodal models. The Qwen2 series includes a comprehensive suite of foundational and instruction-tuned language models, ranging from 0.5 to 72 billion parameters, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and exhibits competitive performance relative to proprietary models across diverse benchmarks on language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning.

Weak-to-Strong Reasoning

The paper introduces a weak-to-strong learning framework that enables a strong language model to autonomously refine its training data and enhance its reasoning capabilities, without requiring input from a more advanced model or human-annotated data. The framework consists of two stages: (1) supervised fine-tuning on a selective small but high-quality dataset, and (2) preference optimization on contrastive samples identified by the strong model itself. Experiments on the GSM8K and MATH datasets demonstrate significant improvements in the reasoning capabilities of Llama2-70b using three separate weak models. The method is further validated on the challenging OlympicArena dataset, where Llama3-8b-instruct effectively supervises Llama3-70b.

Prover-Verifier Games Improve Legibility of LLM Outputs

The paper studies how to make the outputs of large language models (LLMs) more legible to humans. It proposes a training algorithm inspired by the Prover-Verifier Game, where a “helpful” prover is trained to produce correct and convincing solutions, while a “sneaky” prover is trained to produce incorrect but convincing solutions. The goal is to train a robust verifier that can distinguish between the helpful and sneaky provers’ outputs. The paper finds that this training approach can improve the legibility of the helpful prover’s solutions to both the verifier and to human evaluators, while also making the verifier more robust to adversarial attacks.

I watched my first and last horror film aged 12 and I’ve not been the same since

The article is a personal reflection by the author on their fear of horror movies, particularly the 2005 film “Hide and Seek”, and how it has impacted their life and career.

Could a Conflict-Borne Superbug Bring on Our Next Pandemic?

The article discusses the growing threat of drug-resistant bacteria, or “superbugs,” that are spreading in conflict zones and the potential for these superbugs to cause a global pandemic. It examines how the devastation of war creates an environment that allows these superbugs to thrive and spread, and how the issue has been largely overlooked by governments and the medical community.

Why AIs Need to Stop and Think Before They Answer

The article discusses the concept of “chain of thought prompting” — a method for getting the most out of AI assistants like ChatGPT. It explores how this technique can lead to more sophisticated and accurate results compared to simply asking the AI to perform a task without any planning or reasoning steps.
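
The technique itself fits in a few lines: instead of asking for the answer directly, you ask the model to lay out its reasoning first. A minimal illustration (the wording of the prompt is my own, not the article’s):

```python
# Minimal illustration of chain-of-thought prompting: the second prompt asks
# the model to reason step by step before committing to an answer.

def direct_prompt(question: str) -> str:
    return f"Q: {question}\nA:"

def chain_of_thought_prompt(question: str) -> str:
    return (f"Q: {question}\n"
            "A: Let's think step by step. First break the problem into parts, "
            "work through each one, and only then state the final answer.")

q = "A train leaves at 3pm travelling 60 mph. How far has it gone by 5:30pm?"
print(chain_of_thought_prompt(q))
```

The extra reasoning tokens give the model room to “stop and think,” which is exactly the effect the article describes.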

At the End of Empire

The article discusses the assassination attempt on former US President Donald Trump, and how it fits into the broader context of America’s changing political landscape. It explores the questions surrounding the incident, the role of government agencies and security forces, and the implications for the country’s future.

RNC Ends on High Note as Ghost of Jeffrey Epstein Endorses Trump

The article reports on a surprise appearance by the ghost of Jeffrey Epstein at the Republican National Convention, where he endorsed Donald Trump for president.

Don’t Rock the Vote

The article discusses the shifting political landscape in the United States, where high voter turnout is now seen to benefit the Republican party rather than the Democratic party. It examines polling data, demographic trends, and election results that suggest Democrats perform better in low-turnout environments, contrary to the conventional wisdom that increasing voter turnout helps Democrats.

The New Pornographers — THE BITTER SOUTHERNER

The article discusses the diverse and often strange content that can be found on the social media platform TikTok, including:

  • “Viral” dances created by Black creators
  • Niche communities like housewives who obsessively organize their homes and document it
  • Unusual or entertaining content creators like attractive farriers, tree-chopping men, and masked chefs
  • Sick or disabled people sharing their daily routines
  • An endless stream of meme-like videos where people mimic or riff on popular trends
  • The author’s father, a technologically competent older man, introducing the author to Haitian TikTok content

The article also explores the power of TikTok’s algorithm in surfacing content tailored to individual users, and how this has created a new digital economy where some creators achieve fame and success while many others struggle.

Project Strawberry, OpenAI’s Leaked Breakthrough

The article discusses a new AI model called “Strawberry” developed by OpenAI, which is claimed to have significant improvements in reasoning capabilities compared to current large language models (LLMs). The article provides insights into the technical details of Strawberry, including how it enhances the model’s System 1 (fast, intuitive) and System 2 (slow, deliberate) thinking capabilities. It also discusses the potential implications of Strawberry for OpenAI’s future and the broader AI industry.

Don’t Google It

The article discusses the deterioration of Google’s search engine, which was once a “glorious feature” that allowed users to find almost any information, but has now been compromised by SEO tactics, ads, and Google’s own manipulation of search results. The article also covers the recent launch of Google’s AI Overviews feature, which aims to provide AI-generated summaries of search results, but has been plagued by inaccurate and misleading information pulled from questionable sources like The Onion and Reddit. The article argues that this move by Google is reckless and could be dangerous, as many users trust Google as the primary source of information.

Reaching Peak Meeting Efficiency

The article discusses the importance of meetings in building a high-performance team with shared values, and provides insights into common meeting dysfunctions and best practices for effective meetings.

Is This Our Best Bet at Conquering AI Reasoning?

The article discusses the limitations of current large language models (LLMs) in terms of their reasoning capabilities, and proposes methods to improve their performance through data augmentation and training techniques.

Marc Andreessen’s Guide to Finding Product-Market Fit (PMF)!

The article discusses the concept of product-market fit (PMF) and provides a comprehensive guide on how to identify and achieve it. It covers both qualitative and quantitative aspects of PMF, including definitions, metrics, and real-world examples.

‘Slop’ and ‘Content’

The article discusses the concept of “slop” in the context of modern media and entertainment, exploring how the abundance of content and the prioritization of scale and popularity over quality have led to a proliferation of forgettable and artless “slop” across various platforms and genres.

Slop is the new name for unwanted AI-generated content

The article discusses the emergence of the term “slop” as a new term to describe unwanted AI-generated content, similar to how “spam” became the term for unwanted emails. The author is a proponent of using large language models (LLMs) for personal productivity and building applications, but believes that sharing unreviewed, artificially generated content with others is rude. The author proposes “slop” as the ideal name for this anti-pattern and suggests that “slom” could be a good name for the equivalent subset of spam that was generated with AI tools.

After years of uncertainty, Google says it won’t be ‘deprecating third-party cookies’ in Chrome

The article discusses Google’s decision to keep third-party cookies in its Chrome browser, despite previous plans to phase them out.

Our website: https://aili.app

Follow us on X (Twitter): https://x.com/aili_app

Join our discord channel: https://discord.gg/CQtysdQfDM
