Gemini: Reshaping the NLP Task of Extracting Knowledge from Text

JOAN SANTOSO
Feb 18, 2024 · 7 min read


Gemini is a new family of language models developed by Google. Unlike many other chatbots, Gemini was designed to be multimodal, which means it accepts multiple kinds of input: text, images, audio, video, and programming code. Gemini also aims to be informative while still making conversations enjoyable.

Gemini Logo

Being able to stay up to date by accessing and processing real-time information is also quite important for a chatbot, and Gemini can do that too. On top of that, Gemini learns and evolves based on user interactions and feedback, which means it can improve over time. One of the main benefits of Gemini is that when you need factual information, it can access Google Search to serve you the most relevant information on the web.

Before we continue to a tutorial on how to use Gemini to analyze natural language input, let's discuss LLMs and prompts. The field of Natural Language Processing (NLP) has witnessed a significant revolution with the emergence of Large Language Models (LLMs). These language giants are not just pushing the boundaries of technology; the prompt we give them acts as a set of instructions that guides the model towards the desired output.

This has ushered in an era of Generative AI across various tasks, including how we analyze natural language data. One important aspect of working with an LLM is how we instruct the model using a prompt. The prompt serves as the crucial bridge between you and the vast knowledge and capabilities hidden within the LLM.

Example Prompting in Google AI Studio

The prompt provides the context, specifies the task, and influences the output of the language giant we know as the LLM. To understand how an LLM works, we can use an analogy: the LLM is an orchestra, and we, as prompt engineers, are its conductor. The orchestra has the potential to play a good melody, but as the conductor we provide the instructions that tell it which music to play. Those instructions are what we call the prompt.

Now, let's move to our tutorial. In this tutorial we will work on an information extraction task: extracting entities and the relations between those entities. This is an important task, especially for obtaining knowledge from text. Long ago, before the LLM era, performing this task required a lot of effort: dataset building, model creation, and finally model deployment. Now, with Gemini, we can easily set up the task using the Gemini API. In this example, we will use Python as the client to analyze the input and produce the entities and relations.

Before we start, let's obtain an API key from Google AI Studio.

1. Click Get API Key in Google AI Studio. Once clicked, it will bring us to a page like this.

2. Click Get API Key and we will be shown the information about the key generated for us. Save your API key for later use in this project when making requests to the Gemini API.

3. We will use Python as the client. To get access to the API, we need to install the google-generativeai Python package in our Python environment.

pip install -q -U google-generativeai

This installs the library into our Python environment. We can then import the library using import google.generativeai as genai.

4. Access the model using the API key that we obtained before. For this task, we use the Gemini Pro model. If we want multimodal analysis, we can use the Gemini Pro Vision model instead. Look at the code below to define the model.

import google.generativeai as genai

API_KEY = "provide your API KEY"              # the key obtained from Google AI Studio
genai.configure(api_key=API_KEY)              # authenticate the client with that key
model = genai.GenerativeModel('gemini-pro')   # text model; use 'gemini-pro-vision' for multimodal input
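
If you are not sure which model names are available to your API key, the google-generativeai package can also list them. This is an optional check rather than part of the original steps; a minimal sketch looks like this.

import google.generativeai as genai

genai.configure(api_key=API_KEY)

# Print every model that supports text generation via the generateContent method.
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)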

5. Now create the prompt that helps the orchestra (the LLM) produce the output we want. This is the example I used as the prompt template. For the first example, I will use only the NER task; we will move to a more complex task later.

prompt='''
1. You are a Named Entity Recognition model for the Indonesian language.
2. Do some analysis to extract the entities from the text for some categories, i.e., Person, Organization, Location, Date/Time, and others as Miscellaneous.
3. Output the Person category as PER, the Organization category as ORG, the Location category as LOC, the Date/Time category as DT, and the Miscellaneous category as MISC.
4. Return the result as JSON, listing each entity with its character offset.
Analyze the sentence as follows: "
'''

Let's look at the prompt in this example. First, we define that Gemini should act as a Named Entity Recognition (NER) model for the Indonesian language. Second, we instruct Gemini to extract entities for the categories mentioned in the prompt. Third, we define a label for each category to simplify the output classes. The last part of the prompt asks for the result in JSON format, listing each entity together with its character offset, and instructs the model to analyze the sentence given as input.

6. Now we are ready to test the model. We want the result as JSON, so we concatenate the prompt template from the previous step with the query input. For this example we use the sentence “ISTTS sebagai perguruan tinggi di Indonesia terletak di Surabaya” (“ISTTS, as a university in Indonesia, is located in Surabaya”). We call the API using the following code and print the result.

query = "ISTTS sebagai perguruan tinggi di Indonesia terletak di Surabaya"
response = model.generate_content(prompt+query +'"')
print(response.text)

Since we asked Gemini to output the result in JSON, we obtain a response like this.

[
  {
    "category": "ORG",
    "entity": "ISTTS",
    "offset": 0,
    "length": 5
  },
  {
    "category": "LOC",
    "entity": "Indonesia",
    "offset": 16,
    "length": 8
  },
  {
    "category": "LOC",
    "entity": "Surabaya",
    "offset": 30,
    "length": 8
  }
]

The output produces the list of entities, with the character offset of each entity obtained from the input.
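
Because the response arrives as plain text, we still need to parse it before using the entities programmatically. The helper below is only a sketch: parse_entities is a hypothetical name, and the fence-stripping step is an assumption, since Gemini sometimes wraps JSON answers in markdown code fences.

import json

def parse_entities(raw_text):
    # Parse the model's reply into a Python list, assuming it is valid JSON.
    cleaned = raw_text.strip()
    # Strip the ```json ... ``` fences the model may add around its answer.
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    return json.loads(cleaned)

entities = parse_entities(response.text)
for ent in entities:
    print(ent["category"], ent["entity"], ent["offset"], ent["length"])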

Gemini has now helped us do the NER task. Let's move to the second part of this tutorial.

How about relation extraction?

Is Gemini able to do this analysis?

Let's look at the prompt again. In the previous example, we gave Gemini a prompt for the NER task only. Let's modify the prompt to handle the second task, like this.

prompt='''
1. Do some analysis to extract the entities from the text for some categories, i.e., Person, Organization, Location, Date/Time, and others as Miscellaneous.
2. Output the Person category as PER, the Organization category as ORG, the Location category as LOC, the Date/Time category as DT, and the Miscellaneous category as MISC.
3. Provide relation extraction over those entities to build a knowledge graph.
4. Provide the NER result with the character offset of each entity, and the relation extraction result as an edge list.
Analyze the input and produce the output as JSON as follows: "
'''

In this second version of the prompt, we split the instructions into several tasks. First, we define the entity extraction task, just like in the previous prompt. Second, we simplify the output categories into short labels. Then we add the new analysis task: the third instruction is relation extraction, where we have Gemini extract relations for building a knowledge graph. Finally, we define how to output the results of both the NER and the relation extraction.

To show how this prompt works, let's provide a longer input in Indonesian: “Indonesia dengan nama resmi Republik Indonesia adalah sebuah negara kepulauan di Asia Tenggara yang dilintasi garis khatulistiwa dan berada di antara daratan benua Asia dan Oseania sehingga dikenal sebagai negara lintas benua, serta antara Samudra Pasifik dan Samudra Hindia.” (“Indonesia, officially the Republic of Indonesia, is an archipelagic country in Southeast Asia crossed by the equator, lying between the continents of Asia and Oceania, which makes it known as a transcontinental country, and between the Pacific Ocean and the Indian Ocean.”)

We concatenate the prompt template with the query input and ask Gemini for the result using the following code.

query = 'Indonesia dengan nama resmi Republik Indonesia adalah sebuah negara kepulauan di Asia Tenggara yang dilintasi garis khatulistiwa dan berada di antara daratan benua Asia dan Oseania sehingga dikenal sebagai negara lintas benua, serta antara Samudra Pasifik dan Samudra Hindia.'
response = model.generate_content(prompt + query + '"')
print(response.text)

After we run the API request, we get the following result for this query.

{
  "entities": [
    {
      "text": "Indonesia",
      "offset": 0,
      "type": "LOCATION"
    },
    {
      "text": "Republik Indonesia",
      "offset": 11,
      "type": "ORG"
    },
    {
      "text": "Asia Tenggara",
      "offset": 36,
      "type": "LOCATION"
    },
    {
      "text": "daratan benua Asia",
      "offset": 59,
      "type": "LOCATION"
    },
    {
      "text": "Oseania",
      "offset": 72,
      "type": "LOCATION"
    },
    {
      "text": "Samudra Pasifik",
      "offset": 92,
      "type": "LOCATION"
    },
    {
      "text": "Samudra India",
      "offset": 106,
      "type": "LOCATION"
    }
  ],
  "relations": [
    {
      "subject": "Indonesia",
      "object": "Republik Indonesia",
      "type": "IS_A"
    },
    {
      "subject": "Indonesia",
      "object": "Asia Tenggara",
      "type": "LOCATED_IN"
    },
    {
      "subject": "Indonesia",
      "object": "daratan benua Asia",
      "type": "LOCATED_IN"
    },
    {
      "subject": "Indonesia",
      "object": "Oseania",
      "type": "LOCATED_IN"
    },
    {
      "subject": "Indonesia",
      "object": "Samudra Pasifik",
      "type": "BORDERED_BY"
    },
    {
      "subject": "Indonesia",
      "object": "Samudra India",
      "type": "BORDERED_BY"
    }
  ]
}

The JSON result provides the entity list with the character offset of each entity, and the relations together with the type of each relation obtained from the input. This tutorial shows how Gemini can perform various tasks, including advanced tasks like information extraction. The example code can be found on my GitHub.
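
To turn the edge list into an actual knowledge graph, the parsed JSON can be loaded into a graph library. The sketch below is just one possible way to do it: it assumes the reply parses as valid JSON with the entities and relations keys shown above, and it uses networkx purely as an illustration, not as part of the original tutorial.

import json
import networkx as nx

# Assume response.text holds the JSON shown above (strip markdown fences first if needed).
result = json.loads(response.text)

graph = nx.DiGraph()
for ent in result["entities"]:
    graph.add_node(ent["text"], type=ent["type"])
for rel in result["relations"]:
    graph.add_edge(rel["subject"], rel["object"], type=rel["type"])

# Each edge now carries its relation type, ready for knowledge graph use.
print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")
for subject, obj, data in graph.edges(data=True):
    print(f"{subject} -[{data['type']}]-> {obj}")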

For more information regarding Gemini, please look at the resources below:

  1. Getting Started with Gemini
  2. Gemini API Tutorial
  3. Generative AI Learning Path

This is my first tutorial on Gemini. The next one in this series will cover how to embed Gemini into a Keras model. Stay tuned for the next tutorial, and thank you for reading.
