Weekly AI and NLP News — October 23rd 2023
Attention in near-linear time, Stack Overflow lays off staff, and GPT-4V visual prompt injection
Here are your weekly articles, guides, and news about NLP and AI chosen for you by NLPlanet!
😎 News From The Web
- After ChatGPT disruption, Stack Overflow lays off 28 percent of staff. Stack Overflow is letting go of 28% of its workforce as AI coding assistants like ChatGPT disrupt its business. These chatbots offer efficient coding help yet rely heavily on content from sites like Stack Overflow, raising the question of how sustainable it is for chatbots to consume data without benefiting their sources.
- Fuyu-8B: A Multimodal Architecture for AI Agents. Adept has introduced Fuyu-8B, a powerful open-source vision-language model designed to understand and answer questions about images, charts, diagrams, and documents.
- AI Could Help Brain Surgeons Diagnose Tumors in Real Time. “Sturgeon” is a deep-learning model that analyzes nanopore sequencing data to classify brain tumors while surgery is underway, returning an accurate diagnosis within about 40 minutes.
- OpenAI halted the development of the Arrakis model. OpenAI has halted work on Arrakis, a model intended to reduce the compute costs of running applications like ChatGPT. Despite the setback, the company’s growth momentum continues, with projected annual revenue of $1.3 billion, though it faces competition from Google’s upcoming Gemini model and scrutiny at an upcoming AI safety summit.
- Baidu to integrate ERNIE 4.0, which ‘rivals’ GPT-4, into Search. Baidu has unveiled ERNIE 4.0, positioned as a competitor to OpenAI’s GPT-4, and plans to use it to transform its search engine by offering personalized, flexible answers instead of traditional links and results. The exact launch date has not yet been revealed.
- OpenAI has quietly changed its ‘core values’. OpenAI has recently updated its core values, now listing a focus on Artificial General Intelligence (AGI) first. The revised values also include being intense and scrappy, scaling, creating products people love, and fostering team spirit.
📚 Guides From The Web
- Why LLaVa-1.5 is a Great Victory for Open-Source AI. LLaVa-1.5, a smaller yet capable alternative to OpenAI’s GPT-4 Vision, shows that open-source Large Multimodal Models (LMMs) are viable, underscoring the importance of multimodality in AI and countering doubts about the feasibility of open-source approaches.
- Transformer Math 101. A collection of the key numbers and equations for working with large language models (LLMs), covering compute requirements, compute-optimal scaling, minimum dataset size, minimum hardware performance, and memory requirements for inference (see the back-of-the-envelope memory sketch after this list).
- GPT-4 Vision Prompt Injection. Vision prompt injection is a vulnerability in which attackers embed malicious instructions inside images, which GPT-4V then follows as if they were part of the prompt. This poses a security risk, since attackers can trigger unauthorized actions or exfiltrate data, and defending against it is difficult without degrading the model’s usability.
- Advanced Data Analysis with GPT4: Mapping European Tourism Trends. GPT-4’s advanced data analysis capabilities, particularly in data visualization, have been showcased using European tourism data. The process involved exploring the dataset and creating detailed visualizations, demonstrating the efficiency and speed of GPT-4.
- GPT-4 is Getting Faster. GPT-4 is rapidly improving its response speed, particularly in the 99th percentile where latencies have decreased. Both GPT-4 and GPT-3.5 maintain a low latency-to-token ratio, indicating efficient performance.
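To make the memory numbers from Transformer Math 101 concrete, here is a back-of-the-envelope sketch of inference memory for a decoder-only model, assuming fp16 weights and an fp16 key/value cache. The model shape in the example is illustrative, not taken from the article.

```python
def inference_memory_gb(n_params: float, n_layers: int, d_model: int,
                        seq_len: int, batch_size: int = 1,
                        bytes_per_value: int = 2) -> float:
    """Rough lower bound: weights + KV cache, ignoring activations and framework overhead."""
    weights = n_params * bytes_per_value
    # Each layer caches one key and one value vector of size d_model per token.
    kv_cache = 2 * n_layers * d_model * seq_len * batch_size * bytes_per_value
    return (weights + kv_cache) / 1e9

# Illustrative example: a 7B-parameter model (32 layers, d_model=4096) with a 4k-token context.
print(f"{inference_memory_gb(7e9, 32, 4096, 4096):.1f} GB")  # ~16 GB
```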
🔬 Interesting Papers and Repositories
- HyperAttention: Long-context Attention in Near-Linear Time. HyperAttention tackles the computational cost of long contexts in language models. By using Locality Sensitive Hashing (LSH) to approximate attention, it achieves considerable speedups over existing methods, making inference on long-context datasets faster while keeping perplexity reasonable (a toy illustration of the LSH-bucketing idea follows this list).
- DeepSparse: Enabling GPU-Level Inference on Your CPU. DeepSparse is an inference runtime that accelerates deep learning on CPUs through sparse kernels, quantization, pruning, and caching of attention keys/values. It reaches GPU-like performance on commodity CPUs, enabling efficient model deployment without dedicated accelerators, and supports popular models including LLMs, BERT, ViT, and ResNet across hardware setups from cloud to edge.
- Self-RAG: Learning to Retrieve, Generate and Critique through Self-Reflection. Self-RAG improves on Retrieval Augmented Generation (RAG) by training a language model to decide when to retrieve and to critique retrieved passages and its own outputs using special “reflection tokens”. This leads to better responses on knowledge-intensive tasks such as QA, reasoning, and fact verification, outperforming leading LLMs and retrieval-augmented baselines including ChatGPT and Llama2-chat (a sketch of the retrieve-generate-critique loop follows this list).
- BitNet: Scaling 1-bit Transformers for Large Language Models. BitNet is a 1-bit Transformer architecture designed to cut memory footprint and energy consumption in large language models (LLMs). It is competitive with 8-bit quantization methods and FP16 baselines while being far cheaper to run, and shows promising scaling behavior toward even larger LLMs (a simplified weight-binarization sketch follows this list).
- PaLI-3 Vision Language Models: Smaller, Faster, Stronger. PaLI-3, a smaller, faster, and stronger Vision Language Model (VLM), outperforms models 10 times its size. It utilizes a ViT model trained with contrastive objectives, which allows it to excel in multimodal benchmarks.
- Octopus: Embodied Vision-Language Programmer from Environmental Feedback. Octopus is an embodied Vision-Language Model (VLM) that solves tasks in simulated environments, including a GTA-like game, by interpreting both visual and textual input. It generates detailed action sequences and executable code, handling tasks of varying complexity.
- MemGPT: Towards LLMs as Operating Systems. MemGPT is a system that gives LLMs virtual context management inspired by the hierarchical memory of traditional operating systems, paging information between the limited context window and external storage. This enables conversational agents that remember and adapt in real time over prolonged interactions (a minimal sketch of the idea follows this list).
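To give a feel for how Locality Sensitive Hashing can cut the cost of attention, the idea HyperAttention builds on, here is a toy NumPy sketch that hashes queries and keys with random hyperplanes and computes softmax attention only within matching buckets. It is a conceptual illustration, not the HyperAttention algorithm itself.

```python
import numpy as np

def lsh_buckets(X, n_planes=8, seed=0):
    """Hash each row of X to a bucket id using random hyperplanes."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X.shape[-1], n_planes))
    bits = (X @ planes > 0).astype(int)
    return bits @ (1 << np.arange(n_planes))

def bucketed_attention(Q, K, V, n_planes=8):
    """Softmax attention restricted to query/key pairs that share an LSH bucket."""
    out = np.zeros((len(Q), V.shape[-1]))
    qb = lsh_buckets(Q, n_planes)  # same seed, so queries and keys share hyperplanes
    kb = lsh_buckets(K, n_planes)
    for b in np.unique(qb):
        qi, ki = np.where(qb == b)[0], np.where(kb == b)[0]
        if len(ki) == 0:
            continue  # no keys landed in this bucket
        scores = Q[qi] @ K[ki].T / np.sqrt(Q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        out[qi] = (weights / weights.sum(axis=-1, keepdims=True)) @ V[ki]
    return out
```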
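The retrieve-generate-critique loop behind Self-RAG can be sketched as below. The retrieve, generate, and critique helpers are hypothetical placeholders standing in for the retriever and the trained model, and the bracketed reflection-token names are illustrative rather than the paper’s exact tokens.

```python
def self_rag_answer(question, retrieve, generate, critique, top_k=3):
    """Sketch of Self-RAG-style inference driven by reflection signals."""
    # 1. Decide whether retrieval is needed at all.
    if not critique(f"[Retrieve?] {question}"):
        return generate(question)

    # 2. Retrieve passages and generate one candidate answer per passage.
    candidates = []
    for passage in retrieve(question, top_k=top_k):
        answer = generate(question, context=passage)
        # 3. Critique the passage's relevance and how well it supports the answer.
        score = (critique(f"[IsRelevant] {question} | {passage}")
                 + critique(f"[IsSupported] {passage} -> {answer}"))
        candidates.append((score, answer))

    # 4. Return the highest-scoring candidate.
    return max(candidates, key=lambda c: c[0])[1]
```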
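The core trick in BitNet is replacing full-precision weight matrices with 1-bit (±1) weights plus a scaling factor. Below is a simplified NumPy sketch of that weight binarization; the real BitLinear layer also quantizes activations and keeps other components, such as optimizer states, in higher precision.

```python
import numpy as np

def binarize_weights(W):
    """Binarize a weight matrix to {-1, +1} with a per-matrix scaling factor."""
    W_centered = W - W.mean()
    beta = np.abs(W_centered).mean()   # scale that preserves the average magnitude
    W_bin = np.sign(W_centered)        # 1-bit weights
    return W_bin, beta

def bit_linear(x, W):
    """Approximate x @ W.T using the binarized weights."""
    W_bin, beta = binarize_weights(W)
    return (x @ W_bin.T) * beta

# Quick sanity check that the approximation is in the right ballpark.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(256, 512))
x = rng.normal(size=(4, 512))
print(np.abs(bit_linear(x, W) - x @ W.T).mean())
```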
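The “LLM as operating system” analogy in MemGPT comes down to paging conversation history between a bounded main context and external storage the model can query. The class below is an illustrative data structure under that analogy, not MemGPT’s actual API; the token counting and archive search are deliberately naive.

```python
from collections import deque

class VirtualContext:
    """Toy main-context / external-storage split in the spirit of MemGPT."""

    def __init__(self, max_main_tokens=2048):
        self.max_main_tokens = max_main_tokens
        self.main_context = deque()   # messages that go into the prompt ("RAM")
        self.archive = []             # evicted messages in external storage ("disk")

    def _tokens(self, text):
        return len(text.split())      # crude stand-in for a real tokenizer

    def add(self, message):
        """Append a message, paging the oldest ones out when the context overflows."""
        self.main_context.append(message)
        while sum(map(self._tokens, self.main_context)) > self.max_main_tokens:
            self.archive.append(self.main_context.popleft())

    def search_archive(self, query, k=3):
        """Naive keyword recall standing in for a real retrieval call."""
        hits = [m for m in self.archive if query.lower() in m.lower()]
        return hits[:k]

    def prompt(self):
        return "\n".join(self.main_context)
```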
Thank you for reading! If you want to learn more about NLP, remember to follow NLPlanet. You can find us on LinkedIn, Twitter, Medium, and our Discord server!