[Jun] ML Community — Highlights and Achievements

Nari Yoon
Google Developer Experts
10 min read · Jul 18, 2024

Let’s explore highlights and accomplishments of the vast Google Machine Learning communities over the month. We appreciate all the activities and commitment by the community members. Without further ado, here are the key highlights!

Featured Stories

ML Training Campaigns Summary

The ML Training Campaigns (ML Study Jams, ML Paper Reading Clubs, ML Math Clubs, ML Writing Clubs) help community members get started with ML theory and applications and dive deeper on their own. Below are some of the stories and achievements from H1 2024.

ML Paper Reading Clubs by AI/ML GDE Aye and Women in AI Myanmar

ML Paper Reading Clubs | 39 events by 15 communities
AI/ML GDE Aye Hninn Khine (Thailand), the organizer of Women in AI Myanmar, hosted virtual Paper Reading Clubs to upskill Burmese developers displaced around the world. 90% of the presenters and 80% of the participants were women from Myanmar actively working in the ML industry (Playlist: 16 videos with 1000+ views).

ML Math Clubs by Hasanul & Machine Learning, AI, Deep Learning & NLP Community — Bangladesh

ML Math Clubs | 43 events by 13 communities
Zarin Saima Roza (Bangladesh) and Hasanul Banna Himel (Bangladesh) participated in presenting ML Math Clubs with Machine Learning, AI, Deep Learning & NLP Community — Bangladesh. Zarin led sessions on statistics (Playlist: 5 videos with 1000+ views) and Hasanul led sessions on calculus, probability, and linear algebra (Playlist: 13 videos with 500+ views).

ML Paper Writing Clubs | 8 projects led by 5 communities
AI/ML GDE Martin Andrews and Sam Witteveen’s paper, Proving that Cryptic Crossword Clue Answers are Correct, which uses Gemini-Pro-1.0, has been accepted to the ICML’24 Workshop on LLMs and Cognition and has also been published on arXiv.

AI/ML GDE Rabimba Karanjai’s paper, SolMover: Feasibility of Using LLMs for Translating Smart Contracts, has been accepted at FSE 2024 and AIware ’24. It has also been published on arXiv.

ML Study Jams | 185 events organized by 129 communities
TFUG Islamabad (Playlist) and TFUG Durg (Playlist) hosted a series of ML Study Jams with videos covering topics from introduction to ML to advanced applications.

ML Developer Journey

AI/ML GDE Aritra Roy Gosthipaty (India) contributed to KerasNLP to push its integration with Hugging Face further. Users can now load Transformers models directly into KerasNLP from the Hugging Face Hub. The integration already supports Gemma and Llama3 and will be opened up to other models. To learn more, check out his PR on GitHub and his article on Hugging Face, Announcing New Hugging Face and Keras NLP integration. He used Colab to develop and test this work.

Activities by ML Products

Gemini

Building a generative AI-based GeoGuesser (repository) by AI/ML GDE Dimitre Oliveira (Brazil) explains how he built a model that generates hints for the game GeoGuesser, in which you are placed at a random location on Google Maps and have to guess where you are. He used Gemma to generate three kinds of hints: audio, text, and image.

Use Gemini Flash to Analyze the Video by AI/ML GDE Yucheng Wang (China) introduced the features of Gemini 1.5 Flash and showed, step by step, how to use it to analyze YouTube videos.

Multimodality with Gemini-1.5-Flash: Technical Details and Use Cases by AI/ML GDE Rubens Zimbres (Brazil) provides technical details about Gemini 1.5 Flash, a benchmark comparison, and the advantages of its multimodal capability, with image, audio, and video examples. He also explains the concept of knowledge distillation.
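For readers new to knowledge distillation, here is a minimal sketch of the classic formulation: a student model is trained to match a teacher's temperature-softened output distribution. The function names, the example logits, and the temperature value are illustrative assumptions. They are not taken from the article, and they say nothing about how Gemini 1.5 Flash itself was trained.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return kl * temperature ** 2


# Hypothetical logits over three classes.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 4.0, 1.0]

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
```

A student whose predictions are closer to the teacher's gets a smaller loss. In practice this term is usually combined with the ordinary cross-entropy loss on the true labels.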

Hands-On with Gemini 1.5 Flash: Build a Python chat app in minutes! by AI/ML GDE Tomasz Porozynski (Poland) explained how to start building your first application with the Gemini Flash API and how to choose between AI Studio and Vertex AI Studio for your project.

Gemini Flash 1.5 Multimodal App — Streamlit by AI/ML GDE Rubens Zimbres (Brazil) shows the Gen AI app he built with Streamlit and deployed in a Cloud Run container. The app runs a Gemini 1.5 Flash model and presents multimodality features where you can generate a report via function calling and analyze a PDF file making math calculations, a price table in an image, audio from Apple’s earnings report, and also a marketing video.

Spatial storytelling with Gemini Flash (repository) by AI/ML GDE Vikram Tiwari (US) presents an example of multimodal storytelling with Gemini. He uploads a video of the Golden Gate Park and the model generates a story about the objects present in the image, including a Golden Gate Bridge, hills, trees, and so on. The model not only generates a textual description but also identifies the objects in the image and gives you coordinates to put bounding boxes around them.

Enhancing Accessibility: GDSC-Hong Kong Institute of Information Technology (HKIIT) aids visually impaired individuals to see all web images using Gemini Pro Vision by AI/ML GDE Cyrus Wong (Hong Kong) shared his experience as a mentor of the GDSC group that developed a solution combining the open-source Chrome extension ChromeVox with Gemini Pro Vision to describe images without alt texts for visually impaired people.

You didn’t know this about Google Gemini! Improve your Productivity for FREE Using Extensions (Spanish) by AI/ML GDE Carlos Alarcon (Colombia) shows how you can improve your productivity using Gemini extensions in Google Workspace.

At Nvidia’s AI Summit, AI/ML GDE Jerry Wu (Taiwan) gave a talk, Dynamically allocating composition of experts with transformer-based language models in finance service, based on Gemini and Gemma.

RAG for Music Generation — Making Contextual Music, No Band Practice Needed by AI/ML GDE Bao Dai (Singapore) provided insights into generating music using AI without the need for traditional band setups. He introduced Gemini 1.5 Pro, a RAG pipeline with LlamaIndex and Milvus, and explained how to use Gemini with RAG integrated to generate music.

Exploring Multimodal Use Cases With Gemini on Vertex by AI/ML GDE Sara EL-ATEIF (Morocco) discussed how developers can make the most of Gemini’s multimodality through the Vertex AI API. She covered several use cases with code in a Colab notebook.

Gemma

Ask my PDF by AI/ML GDE Nico Martin

Ask my PDF: RAG in the browser (video) by AI/ML GDE Nico Martin (Switzerland) is a web app that builds a RAG solution directly in the browser, without depending on an AI cloud provider. The app interacts with a PDF you provide and answers questions based on the sources.
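To make the retrieval half of RAG concrete, here is a toy sketch. It is not the app's actual implementation: it stands in word counts for a real embedding model, and the helper names and sample text are hypothetical. It only illustrates the idea of picking the most relevant chunks of a document and placing them in the prompt.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, chunks, k=2):
    """Assemble a prompt that asks the model to answer from the retrieved sources."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


# Hypothetical chunks extracted from a PDF.
chunks = [
    "The warranty lasts two years from the purchase date.",
    "Returns are accepted within 30 days.",
    "The device ships with a USB-C cable.",
]
top = retrieve("How long does the warranty last?", chunks, k=1)
prompt = build_prompt("How long does the warranty last?", chunks, k=1)
```

The resulting prompt would then be passed to a language model. A real system would replace `embed` with a learned embedding model and typically store the vectors in an index.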

Zero-shot object detection and referring expression segmentation in videos using PaliGemma (repository) by AI/ML GDE Nitin Tiwari (India) shows examples of using PaliGemma for tasks such as object detection, segmentation, image captioning, etc.

Fine-tuning Google’s Gemma LLM for GST FAQs by AI/ML GDE Yogesh Kulkarni (India) discussed the background of fine-tuning LLMs and demonstrated how to use the Ludwig framework to fine-tune Gemma on a corpus of FAQs related to India’s Goods and Services Tax.

Hands-on Gemma Open Models (Generative AI): A Practical Approach (written tutorial | notebook) by AI/ML GDE Nathaly Alarcon (Bolivia) guides you through the process to harness the power of Gemma and explore its capabilities. She explained how to create an email generator, translator, travel assistant, and a code assistant.

Online Knowledge Distillation: Advancing LLMs like Gemma 2 through Dynamic Learning by AI/ML GDE Rishiraj Acharya (India) explores the intricacies of online knowledge distillation, its implementation, and its implications for the future of LLM development.

Applying LLMs to Recommender Systems: Building a RAG using Google Gemma and MongoDB by AI/ML GDE Ashmi Banerjee (Germany) delved into the intersection of LLMs and recommender systems, best practices for evaluation, and real-world case studies.

QLoRA Finetuning & DPO Aligning Google’s Gemma with Hugging Face by AI/ML GDE Rishiraj Acharya (India) explored the process of fine-tuning Gemma with QLoRA and aligning it with DPO using Hugging Face tools. He discussed how QLoRA can be applied to Gemma to tailor it for specific tasks while minimizing the computational resources required.

RecurrentGemma + Griffin, Many-Shot In-Context Learning and [DeepMind SIMA] Scaling Instructable Agents Across Many Simulated Worlds by AI/ML GDE Grigory Sapunov (UK) are reviews of papers from DeepMind.

Transform Your AI Strategy with Google Gemma hosted by Machine Learning, AI, Deep Learning & NLP Community — Bangladesh covered an in-depth exploration of Gemma and how to implement it using Hugging Face. The speaker Jay Thakkar (India) shared his insights on real-world applications and best practices for integrating Gemma into your projects.

Colab

Min P Sampling Explained notebook by AI/ML GDE Pedro Lourenço

Min P Sampling Explained by AI/ML GDE Pedro Lourenço (Brazil) is a Colab Notebook explaining a brand-new decoding strategy called Min P Sampling.
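As a rough illustration of the idea (not the notebook's code), min-p sampling keeps only the tokens whose probability is at least `min_p` times the probability of the most likely token, renormalizes what is left, and samples from it. The function names and the example distribution below are assumptions made for this sketch.

```python
import random


def min_p_filter(probs, min_p=0.1):
    """Keep tokens with prob >= min_p * max prob, then renormalize."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def sample_min_p(probs, min_p=0.1, rng=random):
    """Sample one token from the min-p filtered distribution."""
    filtered = min_p_filter(probs, min_p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution.
next_token_probs = {"cat": 0.5, "dog": 0.3, "car": 0.15, "xylophone": 0.05}

# With min_p=0.2 the cutoff is 0.2 * 0.5 = 0.1, so "xylophone" is dropped.
filtered = min_p_filter(next_token_probs, min_p=0.2)
token = sample_min_p(next_token_probs, min_p=0.2)
```

Because the cutoff scales with the top token's probability, the filter is strict when the model is confident and permissive when the distribution is flat.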

Chat with custom data source using Gemini API (slides | Colab Notebook) by AI/ML GDE Tarun R Jain (India) focused on building an effective RAG system and its evaluation pipeline using BeyondLLM, with Gemini as the default embedding model and LLM.

Prompt Engineering Techniques for Language Models (slides | Colab Notebook) by AI/ML GDE Pedro Lourenço (Brazil) was a hands-on session on prompt engineering for language models and how to optimize model responses through Few-shot, Zero-shot, and reasoning techniques.

Gemini Explorations — Get Going & Gemini Explorations — Blazing Breeze (Colab Notebook) by AI/ML GDE Vikram Tiwari (US) explores various ways to use the Gemini API and digs deeper into AI Studio and the various options available there. He also explores tokenization for text, images, and audio and shows you how to generate responses based on text & audio inputs.

Fastest way to prototype with Gemini? Google Colab! by AI/ML GDE Tomasz Porozynski (Poland) is a quick tutorial to help you get started with trying out Gemini by setting it up with Colab!

JAX

flash-nanoGPT by AI/ML GDE Azzeddine CHENINE (Algeria) is a JAX/Flax re-write of NanoGPT using some of the common JAX libraries/features (shmap, pallas, jmp, optax, orbax).

Back to Basics NNX A new high level API for JAX and TPU v5 by AI/ML GDE David Cardozo

At the event Diffusion Models and NNX: A new high level API for JAX, hosted by Inteligencia artificial Montevideo, AI/ML GDE David Cardozo (Canada) introduced NNX, a high-level API from DeepMind that is the backbone of many vision and multimodal models at Google and in open source (slides). He gave a historical review with examples, with the goal of launching a small TPU v5 training job using Kubeflow and Vertex.

Dive into Gemma with Keras and JAX by AI/ML GDE Joan Santoso (Indonesia) introduced Gemma and showed how to use the model with Keras and JAX through several use cases.

ODML

Use Gemini and context caching inside android by AI/ML GDE George Soloupis (Greece) demonstrated how to implement conversation caching in an Android application, allowing users to resume their conversations with the Gemini API from where they left off.

Firebase Genkit With Ollama, use GenAI in you local Machine and deploy to the cloud by AI/ML GDE Xavier Portilla Edo (Spain) explores a Firebase project that uses the Gen AI Kit with Gemma using Ollama and explains how to test it locally with the Firebase emulator and the Gen UI Kit.

GenAI using Firebase by AI/ML GDE Pankaj Rai (India) was a session on how Firebase can help in adding Gen AI to Android apps with Vertex AI for Firebase and Genkit. He also gave a speech on Gemini in Android apps and Android Studio at KotlinConfDelhi for participants who want to learn how to accelerate app development using Gemini in Android Studio.

Integrating Gemini with Flutter: A Live Demo (video) hosted by TFUG Islamabad covered how to use Gemini with Flutter. The speaker, Flutter & Dart GDE Suesi Tran (Australia), provided a comprehensive guide on how to seamlessly integrate Gemini into a Flutter project and a live demo of the integrated app in action.

Cloud

AI/ML GDE Sungmin Han (Korea) at Innovators Hive ’24 Seoul

Distributed training of models across multiple TPUs using TPUStrategy by AI/ML GDE Sungmin Han (Korea) covered how to train a model in a distributed fashion across multiple TPUs with TPUStrategy. His other talk, Enhancing Product Value with Gemini’s Multimodal Capabilities, delved into Gemini and its multimodal capabilities in detail. He explored the intricate structure of the LMM and discussed how the multimodality offered by the model brings unprecedented user experiences to products.

Awesome New! Batching Requests to Gemini API with Vertex Pro vs Flash by AI/ML GDE Linda Lawton (US) explained the concept of batch predictions and shared her repository, vertex_batch_predictions, for Cloud Run.

Build an end to end GenAI solution using Gemini & GCP services by AI/ML GDE Yannick Serge Obam (Cameroon) was a workshop on building out a code pipeline and backend for a customer service application powered by Gen AI, specifically leveraging the capabilities of the Gemini API in Vertex AI.

Use Gemma with Vertex AI & GKE by AI/ML GDE Jun Jiang (China) explained how Vertex AI simplifies AI lifecycle management from experimentation to deployment, streamlining workflows with a unified platform and repeatable processes for handling data, endpoints, and API deployments.

BigQuery powered by Vertex AI Gemini by AI/ML GDE Sungmin Han (Korea) discussed the integration of Gemini with BigQuery using BQML. He explored how to bring the productivity of LLMs into information systems and examined the practical advantages, disadvantages, and expectations of this technology from a professional point of view.

Unlocking Claude 3.5 Sonnet in Google Cloud: Your First AI App in minutes! (Vertex AI Quick-start) by AI/ML GDE Tomasz Porozynski (Poland) showed how to quickly get started with Claude 3.5 on Vertex AI, including enabling Claude 3.5 Sonnet in your Google Cloud project, running your first queries, and building a basic chatbot using Python.
