Gemini + Streamlit — End To End LLM And Large Image Model Application Using Gemini Pro

Published in The Streamlit Teacher

5 min read · Dec 26, 2023

Introduction: Unveiling Google’s Gemini Model — Pushing NLP Boundaries

In the realm of natural language processing (NLP), Google’s Gemini model stands as a groundbreaking achievement, revolutionizing the way AI comprehends and responds to human language. This innovative model marks a pivotal moment, showcasing the limitless potential of AI to effectively communicate and engage with users in a more natural and intuitive manner.

Gemini stands apart from its predecessors by being natively multimodal: it is built on Transformer decoders and trained jointly on text, images, audio, and video. This lets the model understand the nuances of language, extract meaningful context, and generate responses that align with human expectations, paving the way for more seamless and engaging interactions between humans and AI systems.

As we delve into the intricacies of the Gemini model, we’ll explore its transformative impact on the NLP landscape, unlocking new possibilities for conversational AI, language translation, abstractive summarization, and beyond. Discover how Gemini’s exceptional language understanding and response generation capabilities are shaping the future of AI-powered communication.

Interactive AI Exploration with Streamlit and Google’s Generative AI: Unleashing Gemini Pro and Gemini Pro Vision Models

This is a Streamlit web application that uses Google’s Generative AI models to generate responses from user input, given either as a text prompt or as an uploaded image. The application provides an interactive interface for experimenting with the Gemini Pro and Gemini Pro Vision models.

Get your API key

Before you can use the Gemini API, you must first obtain an API key. If you don’t already have one, create a key with one click in Google AI Studio.

1. Importing Necessary Libraries

from dotenv import load_dotenv
import streamlit as st
import os
import pathlib
import textwrap
from PIL import Image
import google.generativeai as genai

dotenv: Library for loading environment variables from a file.
streamlit: A Python library for creating web applications with minimal code.
os: Provides a way to interact with the operating system.
pathlib: Offers a convenient interface for working with file and directory paths.
textwrap: Module for formatting and wrapping text.
PIL: The import name of Pillow, the maintained fork of the Python Imaging Library, used for working with images.
google.generativeai: Google's Python SDK for calling the Gemini API.

2. Loading Environment Variables

load_dotenv()

Loads environment variables from a .env file, which here holds sensitive values such as the API key (GOOGLE_API_KEY).
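To make the mechanism concrete, the sketch below mimics what happens to a file of KEY=VALUE lines. It is a simplified stand-in, not python-dotenv's actual implementation: the function name load_env_lines and the sample key value are hypothetical, and the real library handles many more cases (quoting rules, variable expansion, file discovery).

```python
import os
from typing import Iterable, MutableMapping


def load_env_lines(
    lines: Iterable[str],
    environ: MutableMapping[str, str] = os.environ,
) -> None:
    """Parse KEY=VALUE lines into `environ` (simplified, illustrative only)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        value = value.strip().strip('"').strip("'")
        # Keep variables that are already set, as load_dotenv does by default
        environ.setdefault(key.strip(), value)


# Example with a throwaway dict instead of the real process environment
demo_env: dict = {}
load_env_lines(["# secrets for the app", "GOOGLE_API_KEY=your-key-here"], demo_env)
print(demo_env)
```

Keeping the key in a .env file (and out of version control) means the source code never contains the secret itself.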

3. Configuring Google Generative AI

# Read the key loaded from .env and hand it to the SDK
api_key = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=api_key)

Retrieves the Google API key from environment variables and configures the Google Generative AI library with the key.
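Because os.getenv returns None when a variable is missing, a forgotten .env file may only show up later as a confusing API error. One possible safeguard is to fail early with a clear message. The helper below is a minimal sketch; the name require_env is hypothetical and not part of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Missing or empty: stop here instead of failing later inside the SDK
        raise RuntimeError(f"{name} is not set. Add it to your .env file.")
    return value
```

In the app, the result of require_env("GOOGLE_API_KEY") could then be passed as the api_key argument to genai.configure.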

4. Markdown Formatting Function

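def to_markdown(text):
    # Turn bullet characters into Markdown list markers
    text = text.replace('•', '  *')
    # Prefix every line with '> ' so the text renders as a blockquote
    return textwrap.indent(text, '> ', predicate=lambda _: True)

Defines a helper that converts model output to Markdown: it replaces ‘•’ with a ‘*’ list marker and prefixes every line with ‘> ’ so the text renders as a blockquote. It returns a plain string suitable for st.markdown. The app itself does not call this helper.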
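To see what this transformation produces, here is a small standalone sketch. It applies the same two steps (bullet replacement, then line prefixing) in a function that returns a plain string; the name to_blockquote and the sample text are ours, chosen for illustration.

```python
import textwrap


def to_blockquote(text: str) -> str:
    """Replace bullet characters and prefix every line with '> '."""
    text = text.replace("•", "  *")
    # predicate=lambda _: True also prefixes empty lines
    return textwrap.indent(text, "> ", predicate=lambda _: True)


sample = "Key points:\n• fast\n• simple"
quoted = to_blockquote(sample)
print(quoted)
# > Key points:
# >   * fast
# >   * simple
```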

5. Function to Get Gemini Response

def get_gemini_response(question):
    model = genai.GenerativeModel('gemini-pro')
    response = model.generate_content(question)
    return response.text

Defines a function to get a response from the Gemini Pro model based on a given question.

6. Function to Get Gemini Response with Image

def get_gemini_response_image(prompt, image):
    model = genai.GenerativeModel('gemini-pro-vision')
    # Send the prompt together with the image when a prompt was entered
    if prompt != "":
        response = model.generate_content([prompt, image])
    else:
        # Otherwise send the image on its own
        response = model.generate_content(image)
    return response.text

Defines a function to get a response from the Gemini Pro Vision model based on an image and an optional text prompt.
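The prompt-or-no-prompt decision can also be separated from the API call, which makes it easy to check on its own. The sketch below is one possible variant, not the author's code: the name build_vision_parts is hypothetical, it always returns a list (we assume generate_content accepts a one-element list as well as a bare image), and it treats a whitespace-only prompt as empty.

```python
from typing import Any, List


def build_vision_parts(prompt: str, image: Any) -> List[Any]:
    """Return the content parts to send: [prompt, image] or just [image]."""
    cleaned = prompt.strip()
    if cleaned:
        return [cleaned, image]
    # No usable prompt: send the image alone
    return [image]


placeholder_image = object()  # stands in for a PIL image in this example
print(len(build_vision_parts("Describe this picture", placeholder_image)))
print(len(build_vision_parts("   ", placeholder_image)))
```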

7. Streamlit Sidebar Interface

with st.sidebar:
    # Option 1: text-only prompt
    st.header("Text as input")
    text_input_prompt = st.text_input("Enter the prompt: ", key="input")
    st.markdown("<h1 style='text-align: center;'>(or)</h1>", unsafe_allow_html=True)
    # Option 2: image with an optional prompt
    img_input_prompt = st.text_input("Enter the prompt: ", key="input1")
    uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
    image = None  # set once an image is uploaded
    submit = st.button("Generate response")

Sets up the Streamlit sidebar with input options for text, image, and a button to generate a response.

8. Handling User Input and Generating Response

if submit:
    if text_input_prompt:
        # A text prompt takes priority over an uploaded image
        response = get_gemini_response(text_input_prompt)
        st.subheader("Generated response:")
        st.write(response)
    elif uploaded_file:
        # Open and display the upload, then send it to the vision model
        image = Image.open(uploaded_file)
        st.image(image, caption="Uploaded Image.", use_column_width=True)
        st.subheader("Generated response:")
        response = get_gemini_response_image(img_input_prompt, image)
        st.write(response)

When the button is pressed, the app checks the inputs in order: a text prompt takes priority and is sent to Gemini Pro; if it is empty, the uploaded image (with its optional prompt) is sent to Gemini Pro Vision.
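The branching above can be factored into a small pure function, which also makes the third case visible: when neither a text prompt nor an image is provided, the app does nothing. This is a sketch under our own naming; choose_mode and its return values are hypothetical.

```python
from typing import Optional


def choose_mode(text_prompt: str, uploaded_file: Optional[object]) -> str:
    """Decide which model path the app should take for the given inputs."""
    if text_prompt:
        return "text"   # text prompt wins, even if an image was uploaded
    if uploaded_file is not None:
        return "image"  # no text prompt, but an image is available
    return "none"       # nothing to send; the app could show a hint here


print(choose_mode("Tell me a joke", None))
print(choose_mode("", object()))
print(choose_mode("", None))
```

With a helper like this, the "none" case could be used to show a message such as a warning asking the user to enter a prompt or upload an image.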

Entire Code:

from dotenv import load_dotenv

load_dotenv()

import streamlit as st
import os
import pathlib
import textwrap
from PIL import Image
import google.generativeai as genai

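def to_markdown(text):
    # Turn bullets into Markdown list markers and blockquote every line
    text = text.replace('•', '  *')
    return textwrap.indent(text, '> ', predicate=lambda _: True)

# Read the key loaded from .env and hand it to the SDK
api_key = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=api_key)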


def get_gemini_response(question):
    model = genai.GenerativeModel('gemini-pro')
    response = model.generate_content(question)
    return response.text

def get_gemini_response_image(prompt, image):
    model = genai.GenerativeModel('gemini-pro-vision')
    # Send the prompt with the image if one was entered, else the image alone
    if prompt != "":
        response = model.generate_content([prompt, image])
    else:
        response = model.generate_content(image)
    return response.text

with st.sidebar:
    st.header("Text as input")
    text_input_prompt = st.text_input("Enter the prompt: ", key="input")
    st.markdown("<h1 style='text-align: center;'>(or)</h1>", unsafe_allow_html=True)
    img_input_prompt = st.text_input("Enter the prompt: ", key="input1")
    uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
    image = None  # set once an image is uploaded
    submit = st.button("Generate response")


if submit:
    if text_input_prompt:
        # A text prompt takes priority over an uploaded image
        response = get_gemini_response(text_input_prompt)
        st.subheader("Generated response:")
        st.write(response)
    elif uploaded_file:
        # Open and display the upload, then send it to the vision model
        image = Image.open(uploaded_file)
        st.image(image, caption="Uploaded Image.", use_column_width=True)
        st.subheader("Generated response:")
        response = get_gemini_response_image(img_input_prompt, image)
        st.write(response)

Gemini vs OpenAI

1. Model Architecture:

Gemini: Gemini is a family of Transformer-based multimodal models developed by Google DeepMind. Its larger variants have billions of parameters, enabling them to handle a wide range of natural language processing tasks.
OpenAI: OpenAI’s GPT models, developed by the organization of the same name, are also Transformer-based neural networks. Neither company has published parameter counts for Gemini Pro or GPT-4, so their relative size and speed cannot be compared directly.

2. Training Data:

Gemini: Gemini was trained on a vast corpus spanning various domains and languages, including text, code, images, audio, and video. This extensive training dataset contributes to its comprehensive understanding of natural language.
OpenAI: OpenAI’s GPT models were trained on a diverse collection of text, code, and web data. This diverse training regime enables them to excel in various tasks, including question answering, code generation, and dialogue generation.

3. Natural Language Processing Tasks:

Gemini: Gemini showcases impressive performance in various NLP tasks, including text summarization, machine translation, and named entity recognition. Its ability to grasp the context and generate coherent text makes it suitable for tasks that require high-quality natural language generation.
OpenAI: OpenAI’s models excel in tasks that involve generating creative content, such as writing stories, poems, and even computer code. Their versatility extends to dialogue generation, where they can hold conversations with human-like responses.

4. Applications:

Gemini: Gemini’s capabilities lend themselves to applications ranging from customer service chatbots to automated text summarization tools. Its proficiency in understanding and generating natural language makes it valuable in domains that require accurate and fluent communication.
OpenAI: OpenAI’s models find applications in creative writing assistance, language translation, and question answering systems. Their ability to generate diverse and engaging content makes them a valuable tool for content creators and marketers.

5. Ethical Considerations:

Gemini: As with any powerful technology, the ethical implications of using Gemini must be carefully considered. Its potential for generating biased or harmful content warrants responsible usage and monitoring.
OpenAI: OpenAI’s models present similar ethical challenges, particularly in the context of deepfake generation and the spread of misinformation. Their capabilities necessitate thoughtful guidelines and responsible deployment.

6. Future Potential:

Gemini: Gemini’s ongoing development by Google DeepMind promises further advances in its capabilities, enabling it to take on more sophisticated natural language processing tasks.
OpenAI: OpenAI’s focus on developing general-purpose AI suggests that its potential extends beyond language modeling. It holds the promise of tackling complex problems that require reasoning, planning, and decision-making.

Closing the chapter on Gemini Pro and Gemini Pro Vision — a transformative duo in AI exploration. Gemini Pro unleashes textual creativity, while Pro Vision seamlessly blends text and images. With Streamlit as our guide, we’ve made AI interaction intuitive. As we look forward, these models democratize AI, making it a creative companion for all. The future of transformative AI is here, and Gemini leads the way!