Google Developer Experts

Experts on various Google products talking tech.

Automating Hoax Classification with Vertex AI and Cloud Functions

4 min read · Feb 28, 2025

Misinformation and hoaxes spread rapidly across the internet, making it crucial to have an automated classification system. This article will guide you through implementing a hoax classification system with Vertex AI.

We’ll break the implementation down step by step. By the end of this guide, you will be able to deploy a Vertex AI-based service that classifies claims as hoaxes or verified information, and optionally as disinformation or hate speech.

Prerequisites

Before starting, ensure you have the following installed:

  • Python 3.8+
  • Google Cloud SDK
  • Required Python packages:
    pip install "functions-framework==3.*" vertexai

Step 1: Import Libraries

To begin, import the necessary libraries.

import json

import functions_framework
import vertexai
from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
    SafetySetting,
    Tool,
    grounding,
)

The functions_framework library facilitates the building and deployment of an HTTP Cloud Function, providing a simple interface for handling requests and responses. The vertexai library provides direct access to Google’s Generative AI models.

Step 2: Configure the Generative AI Model

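vertexai.init(project="YOUR_PROJECT_ID", location="us-central1")

# Base generation settings shared by both models in Step 3.
generation_config_json = {
    "temperature": 0,
    "top_k": 1,
    "max_output_tokens": 1024,
    "response_mime_type": "text/plain",
}
generation_config = GenerationConfig(**generation_config_json)

# Ground answers in Google Search results.
# generate_content expects a list of tools, so wrap the Tool in a list.
tools = [
    Tool.from_google_search_retrieval(
        grounding.GoogleSearchRetrieval(
            # Optional: For Dynamic Retrieval
            dynamic_retrieval_config=grounding.DynamicRetrievalConfig(
                dynamic_threshold=0.06,
            )
        )
    )
]

# Step 3 passes safety_config to both models, but the original listing never
# defines it. The categories and threshold below are an assumption; adjust them.
safety_config = [
    SafetySetting(category=category, threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH)
    for category in (
        HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        HarmCategory.HARM_CATEGORY_HARASSMENT,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
    )
]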

Calling vertexai.init sets the project and region for all later Vertex AI calls; authentication comes from the Google Cloud credentials available in your environment. The generation_config, built from the generation_config_json dictionary, sets the model’s output parameters: temperature controls randomness (0 makes the output as deterministic as possible), top_k limits how many candidate tokens are considered at each step, and max_output_tokens caps the response length. The tools object enables grounding with Google Search so the model can retrieve web sources, and dynamic_threshold controls how readily a prompt triggers a search (a lower value means more prompts are grounded).
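To make the effect of temperature and top_k concrete, here is a toy sketch of top-k sampling with temperature over a hand-written score table. It only illustrates the idea and is not how Vertex AI implements decoding; sample_token and toy_logits are hypothetical names invented for this example.

```python
import math
import random


def sample_token(logits, temperature, top_k, rng):
    """Pick one token from a {token: score} table (toy illustration)."""
    # Keep only the top_k highest-scoring candidates.
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # With temperature 0 or a single candidate, the choice is deterministic.
    if temperature == 0 or len(ranked) == 1:
        return ranked[0][0]
    # Otherwise apply a temperature-scaled softmax and sample from it.
    scaled = [score / temperature for _, score in ranked]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    tokens = [token for token, _ in ranked]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up scores, purely for illustration.
toy_logits = {"Hoax": 2.0, "Verified": 1.5, "Disinformation": 0.3}
rng = random.Random(0)

greedy = {sample_token(toy_logits, 0, 1, rng) for _ in range(50)}
varied = {sample_token(toy_logits, 1.0, 3, rng) for _ in range(200)}
print(greedy)  # always the single highest-scoring token
print(sorted(varied))
```

With temperature 0 and top_k 1, every call returns the same token, which is why this configuration suits a classifier that should give repeatable answers.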

Step 3: Define the Hoax Classification Function

def classify_claim(claim, article_content=None, classify_disinformation=False, classify_hate_speech=False):
    # Build the prompt from the claim and any supporting article text.
    prompt = f"Claim: {claim}"
    if article_content:
        prompt += f"\n\nArticle about the claim: {article_content}"

    # Base labels, extended by the optional flags.
    classify_targets = ["Hoax", "Verified"]
    if classify_disinformation:
        classify_targets += ["Disinformation"]
    if classify_hate_speech:
        classify_targets += ["Hate Speech"]

    # The mapper returns JSON whose classification must be one of the targets.
    map_generation_config_json = {
        **generation_config_json,
        "response_schema": {
            "type": "object",
            "properties": {
                "classification": {
                    "type": "string",
                    "enum": classify_targets,
                },
                "justification": {
                    "type": "string",
                },
            },
        },
        "response_mime_type": "application/json",
    }
    map_generation_config = GenerationConfig(**map_generation_config_json)

    # First model: searches Google and writes a free-text justification.
    searcher_model = GenerativeModel(
        model_name="gemini-1.5-pro",
        generation_config=generation_config,
        system_instruction=(
            f"The user will give you a claim, your task is to generate a detailed justification based on the found sources from google and specify whether the claim is either one of these {classify_targets}, make your point clear on what the most probable one is between all of those choices.\n"
            "make a query using the language used in the claim because it might help.\n"
            # Not an f-string: {claim} is a literal placeholder shown to the model.
            "a generally helpful guideline is to search 'is {claim} a hoax?'\n"
        ),
    )
    # Second model: maps the free-text conclusion into the JSON schema above.
    mapper_model = GenerativeModel(
        model_name="gemini-1.5-flash",
        generation_config=map_generation_config,
        system_instruction=(
            "You are a mapper system that will map the output conclusion of a hoax classifier system into json format, map the provided conclusions into json"
        ),
    )

    search_result = searcher_model.generate_content(
        prompt,
        tools=tools,
        safety_settings=safety_config,
    )
    response = mapper_model.generate_content(
        search_result.text,
        safety_settings=safety_config,
    )

    result = json.loads(response.text)
    # Attach the web sources the searcher model was grounded on.
    chunks = search_result.candidates[0].grounding_metadata.grounding_chunks
    result["links"] = [dict(uri=chunk.web.uri, site=chunk.web.title) for chunk in chunks]
    return result

The classify_claim function accepts a textual claim plus optional parameters: article_content and boolean flags for disinformation and hate speech classification. It builds a prompt from the claim and any article text, then decides which categories (Hoax, Verified, and optionally Disinformation and Hate Speech) are in play. Two models do the work. searcher_model (gemini-1.5-pro) uses Google Search grounding to write a detailed justification and name the most probable category. mapper_model (gemini-1.5-flash) then converts that free-text conclusion into JSON, using a response schema whose enum restricts classification to the selected categories. The returned dictionary contains the classification, the justification, and a links list of source URIs and titles taken from the searcher’s grounding metadata.
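Because the final result depends on the mapper returning well-formed JSON, it can be worth validating that output before trusting it. The sketch below is one possible check using only the standard library; parse_mapper_output is a hypothetical helper that is not part of the original code, and the error messages are made up for illustration.

```python
import json


def parse_mapper_output(raw_text, allowed_labels):
    """Parse the mapper's JSON text and check it against the allowed labels.

    Returns (result, error); exactly one of the two is None.
    """
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, f"mapper returned invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "mapper output is not a JSON object"
    label = data.get("classification")
    if label not in allowed_labels:
        return None, f"unexpected classification: {label!r}"
    # The schema does not mark justification as required, so default it.
    data.setdefault("justification", "")
    return data, None


result, error = parse_mapper_output(
    '{"classification": "Hoax", "justification": "No credible source supports it."}',
    ["Hoax", "Verified"],
)
print(result, error)
```

Inside classify_claim, a check like this could replace the bare json.loads call, returning an error response instead of raising when the model output is malformed.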

Step 4: Define the Cloud Function

Create the HTTP entry point that handles hoax classification requests.

@functions_framework.http
def classify(request):
    # Prefer a JSON body; fall back to form data.
    request_json = request.get_json(silent=True)
    if not request_json:
        request_json = dict(request.form)
        # Form values arrive as strings, so convert the flags to booleans.
        for flag in ("classify_disinformation", "classify_hate_speech"):
            if flag in request_json:
                request_json[flag] = request_json[flag] == "true"

    if request_json and "claim" in request_json:
        # Returning a dict lets the framework serialize the result as JSON.
        return classify_claim(
            request_json["claim"],
            article_content=request_json.get("article_content"),
            classify_disinformation=request_json.get("classify_disinformation", False),
            classify_hate_speech=request_json.get("classify_hate_speech", False),
        ), 200

    return 'Required parameter "claim" is missing in the request body', 400

This function, classify, is defined as an HTTP-triggered Google Cloud Function. It first tries to parse a JSON body and falls back to form data, converting the "true"/"false" strings that forms send into booleans. It looks for a required claim key, as well as optional values for article_content, classify_disinformation, and classify_hate_speech. These values are passed to the classify_claim function, and the output is returned as JSON with a 200 status code. If no claim is present, the function responds with an error message and an HTTP 400 status code.
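The flag handling described above can also be pulled out into a small helper that is easy to unit-test without a running server. This is a minimal sketch using only the standard library; coerce_flag and normalize_payload are hypothetical names, and accepting "1" as true is an assumption beyond the original, which only accepts the string "true".

```python
def coerce_flag(value, default=False):
    """Turn a JSON boolean or a form string into a Python bool."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        # Assumption: "1" is also accepted; the original only checks "true".
        return value.strip().lower() in {"true", "1"}
    return bool(value)


def normalize_payload(payload):
    """Return keyword arguments for classify_claim, or None if claim is missing."""
    if not payload or "claim" not in payload:
        return None
    return {
        "claim": payload["claim"],
        "article_content": payload.get("article_content"),
        "classify_disinformation": coerce_flag(payload.get("classify_disinformation")),
        "classify_hate_speech": coerce_flag(payload.get("classify_hate_speech")),
    }


form_style = normalize_payload({"claim": "The Earth is flat.", "classify_disinformation": "true"})
json_style = normalize_payload({"claim": "The Earth is flat.", "classify_hate_speech": True})
print(form_style)
print(json_style)
```

Keeping the parsing separate from the Cloud Function means both JSON and form inputs go through the same path, so a JSON true and a form "true" behave identically.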

Step 5: Deployment and Testing

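Save the code as main.py next to a requirements.txt that lists the packages above, then deploy the function to Google Cloud from that directory. Once it is live, use a tool such as curl or Postman to send HTTP requests with JSON data.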

gcloud functions deploy classify \
    --runtime python310 \
    --trigger-http \
    --allow-unauthenticated

curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"claim": "The Earth is flat."}' \
    https://REGION-PROJECT_ID.cloudfunctions.net/classify
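The same request can be sent from Python. The sketch below uses only the standard library; build_classify_request and classify_remote are hypothetical helper names, the URL is the same placeholder as in the curl command, and the 120-second timeout is an arbitrary assumption.

```python
import json
import urllib.request


def build_classify_request(url, claim, article_content=None,
                           classify_disinformation=False, classify_hate_speech=False):
    """Build a POST request carrying the same JSON body the curl example sends."""
    payload = {
        "claim": claim,
        "classify_disinformation": classify_disinformation,
        "classify_hate_speech": classify_hate_speech,
    }
    if article_content:
        payload["article_content"] = article_content
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_remote(url, claim, timeout=120, **options):
    """Send the request and decode the JSON response (performs a network call)."""
    request = build_classify_request(url, claim, **options)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder URL: replace REGION and PROJECT_ID with your own values.
request = build_classify_request(
    "https://REGION-PROJECT_ID.cloudfunctions.net/classify",
    "The Earth is flat.",
    classify_disinformation=True,
)
print(request.get_method(), request.full_url)
```

Calling classify_remote with your deployed URL should return the dictionary produced by classify_claim. To try the function before deploying, running functions-framework --target=classify in the project directory should serve it locally.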

Conclusion

You can automate hoax detection at scale with Google Vertex AI. This system can process text and article content to provide detailed justifications and evidence.

Now deploy your hoax classification function and fight misinformation with AI! 🚀

Google Cloud credits are provided for this project. Special thanks to Nicholas Arsa for collaborating on this project.
