Reading Digest, September #2

Daniel Chen
Journey Into AI with Aili
13 min read · Sep 4, 2024


Hey there, my amazing readers! I hope you’re ready for another exciting edition of my daily reading digest. If you’re new here, get ready for a wild ride through the fascinating world of online content. And if you’re a regular, thank you for your continued support — it means the world to me!

Today’s digest is a true buffet of captivating topics, ranging from the growing threat to OpenAI to the idea that not everything is physics. We’ll explore how to change a chatbot’s mind and dive into a thought-provoking thread by @martin_casado on Thread Reader App.

But that’s not all — we’ve got some intriguing pieces on the latest developments in AI and tech. From Amazon’s launch of a new AI-powered Alexa using Anthropic’s Claude to what a great investor can do for you, this digest has something for everyone. We’ll even explore the startup world outside Silicon Valley and China’s robot makers chasing Tesla to deliver humanoid workers.

For the tech enthusiasts among us, we’ve got articles on the release of Re-LAION 5B, a transparent iteration on LAION-5B with additional safety fixes, and a critical look at the “debt-trap diplomacy” narrative. We’ll also take a closer look at the great stagnation in machine learning and why the loneliness epidemic is so hard to cure.

But that’s just the tip of the iceberg, my friends. From the truth about ‘close door’ elevator buttons to AI and the decline of human intelligence, and the superpower of boredom that you’re still afraid to use, this digest covers a wide range of topics that are sure to pique your interest. We’ll even explore the law of funnels, AI as a collaborator rather than a replacer, and why relying on clowns churning out guides written by ChatGPT won’t help you learn how to spot AI.

So, grab your favorite beverage, get comfortable, and join me on this thrilling journey through the world of online content. I can’t wait to hear your thoughts and reactions in the comments below!

Happy reading, my incredible friends!

The Ontological Shock of AI

The article explores the concept of “ontological shock” — the disorientation and confusion that arises when encountering something that subverts our basic assumptions about reality. It discusses how the emergence of AI represents a similar challenge to our deeply held views of how the world is structured, and how much of the discussion around AI is devoted to preserving our ontological security rather than understanding AI as it is.

No, Elon, you can’t make a “WeChat of the West”

The article discusses Elon Musk’s plans to transform Twitter, which he acquired in 2022, into an “everything app” or “WeChat of the West.” It explores the challenges and skepticism surrounding Musk’s vision, as well as the cultural and competitive differences between the Chinese super-app WeChat and the potential for a similar platform in the West.

If you want to learn how to spot AI, don’t rely on clowns churning out guides written by ChatGPT.

The article discusses the issue of AI-generated “how to spot AI” guides that provide misleading information. The author argues that these guides are often created by inexperienced or malicious actors, and that readers should be wary of such content. The article provides guidance on how to identify genuine, reliable sources of information on AI detection.

AI as a Collaborator, not a Replacer

The article discusses the impact of generative AI tools on the creative industries, and how artists, writers, musicians, and other creators can adapt to and leverage these new technologies.

The Law of Funnels

The article discusses the “Law of Funnels” — the principle that the more steps a person has to go through to complete a task, the less likely they are to actually complete it. The author explains how this law applies across various domains, from product development to customer acquisition, and provides three rules to combat the negative effects of the Law of Funnels:

  • Reduce all user flows to the fewest number of possible actions
  • Have low-friction steps occur before high-friction steps
  • Optimize funnels for the most valuable cohort of users, not just the ones most likely to convert
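The compounding effect behind these rules can be sketched numerically: a funnel's overall completion rate is the product of its per-step conversion rates, so every extra step multiplies the loss. The rates below are illustrative numbers, not figures from the article:

```python
def funnel_completion(step_rates):
    """Overall completion rate of a funnel: the product of per-step
    conversion rates, so each additional step compounds the drop-off."""
    total = 1.0
    for rate in step_rates:
        total *= rate
    return total

# A hypothetical signup flow where each step retains 80% of users:
five_steps = funnel_completion([0.8] * 5)   # 0.8^5 ≈ 0.33
three_steps = funnel_completion([0.8] * 3)  # 0.8^3 ≈ 0.51
```

Removing two steps from this hypothetical flow raises completion from roughly a third of users to over half, which is why the first rule pushes flows toward the fewest possible actions.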

Boredom: The Superpower You’re Still Afraid to Use

The article discusses the importance of boredom and the negative consequences of constantly seeking stimulation and avoiding moments of boredom. It argues that boredom serves important functions, such as sparking creativity, self-reflection, and personal growth, and that learning to embrace boredom can lead to positive outcomes.

AI And The Decline Of Human Intelligence

The article discusses the potential threat of AI making us dumber by enabling us to outsource our skills and cognitive abilities to AI systems, leading to a decline in our own skills and knowledge over time. It proposes using AI as a knowledge partner and tool to enhance our own learning and critical thinking, rather than just offloading tasks to it.

The Truth About ‘Close Door’ Elevator Buttons

The article discusses the functionality of the “close door” button in elevators, explaining that it is often a placebo button that does not actually close the doors faster. This is due to regulations from the Americans with Disabilities Act (ADA) that require elevators to remain open for a minimum amount of time to allow people with disabilities to enter.

Why Is the Loneliness Epidemic so Hard to Cure?

The article explores the modern phenomenon of loneliness, its causes, effects, and potential solutions. It delves into the historical context, the scientific research on the physiological impacts of loneliness, and the societal changes that have contributed to the current “epidemic” of loneliness.

Machine Learning: The Great Stagnation

The article discusses the stagnation in the field of machine learning research, where incremental work and “SOTA chasing” have become the norm, leading to a lack of true innovation. It also highlights some promising developments and approaches that could help revitalize the field.

Against the “debt-trap diplomacy” narrative

The article discusses China’s “Going Out” strategy, which has led to a substantial increase in overseas lending by China, making it the world’s largest official creditor. It examines the concept of “debt-trap diplomacy” and provides a critical analysis of this hypothesis, using examples such as the Mombasa–Nairobi Standard Gauge Railway in Kenya and the Hambantota Port in Sri Lanka. The article also explores the fragmented nature of China’s development financing system and the relative autonomy of state-owned enterprises, which may undermine the notion of a well-coordinated plot to create debt traps.

Diffusion Models Are Real-Time Game Engines

The paper presents GameNGen, a neural model that can interactively simulate the classic game DOOM at over 20 frames per second. GameNGen is trained in two phases: (1) an RL agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. The paper demonstrates that GameNGen can achieve a visual quality comparable to the original game, with human raters only slightly better than random chance at distinguishing short clips of the game from the simulation.

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

The article explores the trade-offs between generating synthetic data using a stronger but more expensive (SE) language model versus a weaker but cheaper (WC) language model for improving the reasoning performance of language models. It evaluates the generated data across three key metrics — coverage, diversity, and false positive rate — and finds that data from WC models may have higher coverage and diversity, but also exhibit higher false positive rates. The article then finetunes language models on data from SE and WC models in different settings — knowledge distillation, self-improvement, and a novel weak-to-strong improvement setup. The results show that models finetuned on WC-generated data consistently outperform those trained on SE-generated data across multiple benchmarks, challenging the prevailing practice of relying on SE models for synthetic data generation.
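The coverage advantage of a cheaper model under a fixed compute budget can be illustrated with a toy simulation. The accuracies, cost ratio, and problem counts below are hypothetical stand-ins, and each sample is treated as an independent coin flip rather than real model output:

```python
import random

random.seed(0)

def coverage(n_problems, samples_per_problem, p_correct):
    """Fraction of problems solved at least once, treating each sampled
    solution as an independent success with probability p_correct."""
    solved = sum(
        any(random.random() < p_correct for _ in range(samples_per_problem))
        for _ in range(n_problems)
    )
    return solved / n_problems

# Fixed compute budget: if the weak-but-cheap model costs a third as much
# per sample, it gets 3 attempts per problem to the strong model's 1.
strong_coverage = coverage(2000, samples_per_problem=1, p_correct=0.6)
weak_coverage = coverage(2000, samples_per_problem=3, p_correct=0.4)
# More, cheaper samples can cover more problems despite lower per-sample accuracy.
```

This mirrors the paper's coverage finding in spirit: at matched compute, the weaker model's extra attempts solve more distinct problems, even though each individual sample is less likely to be correct (and, as the summary notes, more likely to be a false positive).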

Law of Vision Representation in MLLMs

The article presents the “Law of Vision Representation” in multimodal large language models (MLLMs), which reveals a strong correlation between the combination of cross-modal alignment, correspondence in vision representation, and MLLM performance. The authors quantify these two factors using the cross-modal Alignment and Correspondence (AC) score, and find that the AC score is linearly correlated to model performance. By leveraging this relationship, they are able to identify and train the optimal vision representation without requiring finetuning the language model every time, resulting in a 99.7% reduction in computational cost.

Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

The article proposes a diffusion-based approach for Text-to-Image (T2I) generation with interactive 3D layout control. The key points are:

  • Existing layout control approaches are limited to 2D layouts, require static layouts beforehand, and fail to preserve generated images under layout changes, making them unsuitable for applications that require 3D object-wise control and iterative refinements.
  • The proposed approach leverages depth-conditioned T2I models and introduces a novel approach for interactive 3D layout control, replacing 2D boxes with 3D boxes and revamping the T2I task as a multi-stage generation process.
  • The approach uses a Dynamic Self-Attention (DSA) module and a consistent 3D object translation strategy to seamlessly add objects to the scene while preserving existing contents.
  • Experiments show the approach can generate complicated scenes based on 3D layouts, outperforming depth-conditioned T2I methods and other layout control methods in object generation success rate and preserving objects under layout changes.

LLM Agents, Part 6 — State Management

The article discusses the role of State Management in improving the performance and reliability of multi-agent systems, building on the foundations of Service-Oriented Architecture (SOA) and Event-Driven Architecture (EDA).

Exclusive: Workers at Google DeepMind Push Company to Drop Military Contracts

The article discusses a dispute within Google between workers in its AI division, DeepMind, and its Cloud business over the company’s contracts with military organizations. Nearly 200 DeepMind workers signed a letter calling on Google to drop these military contracts, citing concerns that the technology is being used for military purposes, which they say violates Google’s own AI principles.

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

The article discusses the problem of load imbalance in Mixture-of-Experts (MoE) models, which can lead to routing collapse or increased computational overhead. Existing methods commonly employ an auxiliary loss to encourage load balance, but this introduces undesired gradients that can impair model performance. The paper proposes a novel approach called “Loss-Free Balancing” that controls load balance without introducing interference gradients.
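A rough sketch of the idea, assuming a sign-based bias update applied only to expert selection (the score distribution, skew, step size, and iteration count below are illustrative choices, not the paper's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, n_experts, top_k, step_size = 4096, 8, 2, 0.01

bias = np.zeros(n_experts)  # routing-only bias; carries no gradient
for _ in range(200):
    # Router affinity scores with a built-in skew toward low-index experts.
    scores = rng.normal(size=(n_tokens, n_experts)) + np.linspace(1.0, 0.0, n_experts)
    # The bias enters only the top-k expert selection, not the loss, so it
    # steers routing without injecting auxiliary-loss interference gradients.
    chosen = np.argsort(scores + bias, axis=1)[:, -top_k:]
    load = np.bincount(chosen.ravel(), minlength=n_experts)
    # Nudge overloaded experts' bias down and underloaded experts' bias up.
    bias -= step_size * np.sign(load - load.mean())

# After adaptation, per-expert load is close to uniform despite the skew.
```

The key design point the sketch tries to capture is that balance is enforced by a small non-differentiable correction to routing decisions rather than by an extra loss term competing with the language-modeling objective.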

Towards Real-world Event-guided Low-light Video Enhancement and Deblurring

The paper addresses the novel research problem of event-guided low-light video enhancement and deblurring. The key contributions are:

  • Designing a hybrid camera system using beam splitters and constructing the RELED dataset containing low-light blurry images, normal sharp images, and event streams.
  • Developing a tailored framework for the task, consisting of two key modules:
  1. Event-guided Deformable Temporal Feature Alignment (ED-TFA) module to effectively utilize event information for temporal alignment.
  2. Spectral Filtering-based Cross-Modal Feature Enhancement (SFCM-FE) module to enhance structural details while reducing noise in low-light conditions.
  • Achieving significant performance improvement on the RELED dataset, surpassing both event-guided and frame-based methods.

Releasing Re-LAION 5B: transparent iteration on LAION-5B with additional safety fixes | LAION

The article discusses the release of an updated version of the LAION-5B dataset, called Re-LAION-5B, which has been thoroughly cleaned of known links to suspected child sexual abuse material (CSAM). It highlights the importance of open and transparent datasets for reproducible machine learning research, and the challenges in ensuring the legal compliance of such large-scale datasets gathered from the public web. The article outlines the steps taken by LAION to partner with organizations like the Internet Watch Foundation (IWF) and the Canadian Centre for Child Protection (C3P) to identify and remove links to suspected CSAM, as well as the removal of other sensitive data in cooperation with Human Rights Watch (HRW).

China’s robot makers chase Tesla to deliver humanoid workers

The article discusses China’s push into the emerging industry of building battery-powered humanoid robots, which are expected to replace human workers in assembling electric vehicles (EVs) on assembly lines. It highlights how China is leveraging its strengths in supply chain integration, mass production capabilities, and government support to drive the development of this technology, drawing from the formula behind its initial EV drive more than a decade ago.

The startup world outside Silicon Valley

The article discusses the concept of “camel” startups, which are companies that strive to be profitable from the first day, in contrast to the traditional “unicorn” startups that focus on rapid growth and funding. It highlights the challenges faced by startups in regions outside the major tech hubs and how the camel approach can be more suitable in these environments.

What a great investor can do for you. 📈

The article discusses how Paul Graham, the co-founder of Y Combinator, helped Airbnb in its early days when the founders were struggling. It highlights how Graham believed in the founders’ potential even when he was unsure about the idea, and how he actively supported them by introducing them to investors and encouraging them to take a chance on Airbnb.

Amazon to Launch New AI-Powered Alexa Using Anthropic’s Claude

The article discusses Amazon’s plans to release a revamped version of its Alexa voice assistant powered primarily by Anthropic’s Claude models, rather than Amazon’s in-house AI technology.

Thread by @martin_casado on Thread Reader App

The article discusses the current state of large language models (LLMs) and the challenges in achieving significant scale increases between model versions, such as from GPT-3 to GPT-4.

How Do You Change a Chatbot’s Mind?

The article discusses the author’s efforts to improve his reputation with AI chatbots, which have been critical of him in the past. It explores the various techniques he tries in order to steer the chatbots’ responses, including:

  • Enlisting the help of a startup called Profound that specializes in “AI optimization” to analyze how chatbots view the author
  • Inserting “strategic text sequences” and invisible white text on his website to steer chatbots’ responses
  • Experimenting with these techniques on AI models like Llama 3 and finding that they can indeed influence the chatbots’ opinions of him

The article also discusses the broader implications of AI chatbots being so easily manipulated, questioning whether we can trust them with important tasks if they are so gullible.

Not everything is physics — Inverted Passion

The article discusses the author’s journey of exploring fundamental questions about reality, physics, and the limitations of reductionist thinking. It covers the author’s perspectives on the primacy of intuition in mathematics, the existence of multiple “worlds” beyond the physical universe, and the contextual nature of truth. The article argues against the idea of a single, unified theory that can explain everything and advocates for embracing the richness and diversity of different domains of knowledge.

The Threat to OpenAI Is Growing

The article discusses the growing competition faced by OpenAI, the maker of ChatGPT, from startups and tech giants like Meta and Google that are offering open-source AI models. It explores the advantages of open-source AI, such as lower costs and better customization for specific tasks, and how this is challenging OpenAI’s closed-source approach. The article also touches on the debate around transparency and safety concerns between closed and open-source AI systems.

Our website: https://aili.app
Notion Site: https://ailiapp.notion.site/
Follow us on X (Twitter): https://x.com/aili_app
Join our discord channel: https://discord.gg/CQtysdQfDM
