Getting Started with Generative AI in Angular

Gerard Sans
Google Cloud - Community
12 min read · Aug 14, 2023

Vertex AI is Google Cloud's platform for state-of-the-art Generative AI, including PaLM 2

IMPORTANT: Google Gemini, a new model, has become the default, and a new JavaScript client is available from Google AI. For an updated article, check here.

In this post, you are going to learn how to access the recently released PaLM APIs to create the next generation of AI-enabled applications using Angular.

If you are based in the US, you can skip to the annex at the end and use an API key from MakerSuite. This tutorial covers accessing Vertex AI with Google Cloud credentials for countries where MakerSuite is not available yet.

For Firebase-based projects, you can use Call PaLM API Securely, a Firebase extension. Find more details about how to set it up here.

We’ll use Angular to build a simple application that prompts the PaLM for Text model. Once you get this working, you are set to use any of the other models.

This tutorial will help you get set up with Vertex AI so you can start adding AI-enabled features using models like PaLM to your applications. Let’s get started!

Introduction to Generative AI

Vertex AI provides pre-trained models for tasks like text generation, chat, image editing, code, speech and more. You can access these via API requests. Alternatively, if you need managed services to create models from scratch or fine-tune existing ones, Vertex AI can also create, host and manage your models for you.

Vertex AI: Foundational Models. Embeddings available for text and images.

What is Generative AI?

Generative AI is an exciting new area at the intersection of Artificial Intelligence (AI), Machine Learning (ML) and Neural Networks. The Transformer architecture, together with the attention mechanism, released by Google back in 2017, is what powers the new wave of Generative AI and Foundational models.

Large language models are pre-trained on massive amounts of text with the goal of predicting the next word in a sequence. Once trained, they can generate new text relevant to a given prompt, whether that is an open-ended paragraph in a novel, a list of ingredients for a recipe, or an answer to a general-knowledge question.

The training data includes public data available on the Internet, books and code repositories such as GitHub.

How are people using Generative AI today?

Models based on the Transformer have surpassed many traditional ML approaches. Most recently, research has focused on multi-modal architectures that can be trained on, and combine, inputs and outputs not only as text but also as images, video and audio in a single model. This is the case for Gemini, a model being developed by DeepMind, the creators of AlphaGo.

Some use cases for Generative AI sorted by output.

Some examples of how people are using Generative AI today:

  • Creative text generation
  • Chatbots and virtual assistants
  • Summarisation of text
  • Image generation and editing

Responsible AI

While Generative AI enables exciting new applications, these powerful models also introduce important ethical questions we must grapple with as developers. We have a responsibility to build AI thoughtfully by considering potential harms, setting appropriate constraints, and monitoring for misuse.

As we integrate these models into applications, we need to implement strategies like human review processes and sensitivity analysis to prevent abusive or dangerous uses of the technology.

Some questions you should ask before releasing any new AI features:

  • Auditing and traceability – How will we facilitate tracing back to the source of abusive or questionable AI usage? What logging and auditing capabilities should we implement?
  • Prompt framing and grounding – How will we ensure our application provides proper context for AI-generated content? Without context, misinterpretation can easily occur.
  • Outputs monitoring – If incorrect output does occur, how will it be corrected? What feedback mechanisms should exist for users?
  • Bias mitigation – How might the AI output negatively impact or unfairly characterise underrepresented groups? How will we proactively test for bias?

For more information, read Google’s approach to responsible AI.

Now that we’ve covered the key concepts, let’s walk through how to access Vertex AI and make requests from an Angular application.

Getting API Credentials for Vertex AI

In the previous section, we introduced Vertex AI and discussed the models and capabilities it provides. These are the steps to get credentials for accessing Vertex AI.

General overview of the architecture.

⚠️ Note that this approach is only temporary while no official Vertex AI client for web is available. If you are developing for the server side, do use the available clients for Java, Node.js, Python and Go.

Setup for API access via Google Cloud

To secure the API access, you need to create an account and get the credentials for your application so only you can access it. Here are the steps:

  1. Sign up for a Google Cloud account and enable billing — this gives you access to Vertex AI.
  2. Create a new project in the Cloud Console. Make note of the project ID.
  3. Enable the Vertex AI API for your project.
  4. Install the gcloud CLI and run gcloud auth print-access-token. Save the printed access token - you’ll use this for authentication.

Once you have the project ID and access token, we’re ready to move on to the Angular app. To verify everything is set up correctly, you can try these curl commands.

Make sure that your access token remains private so you do not incur expenses from unauthorised access.

Creating the Angular Application

We’ll use the Angular CLI to generate a new application:

ng new vertex-ai-palm2-angular

This scaffolds a new project with the latest Angular version.

To make API requests, we need the HttpClient module:

// app.module.ts
import { HttpClientModule } from '@angular/common/http';

@NgModule({
  imports: [
    HttpClientModule
  ]
})

With this import, we can inject HttpClient into any component or service to make web requests.
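If your project uses standalone components instead of an NgModule, a roughly equivalent setup is possible with provideHttpClient (available from Angular 15 onward). This is an optional variant, not part of the original setup, and it assumes the default CLI file layout:

// main.ts: optional variant for standalone apps (Angular 15+)
import { bootstrapApplication } from '@angular/platform-browser';
import { provideHttpClient } from '@angular/common/http';
import { AppComponent } from './app/app.component';

bootstrapApplication(AppComponent, {
  providers: [provideHttpClient()]
});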

Making the prediction request

Let’s create a component to run our prompt:

ng generate component predict

In predict.component.ts, we’ll use the code below to set up the HttpClient and set the stage for the API request:

import { Component, OnInit } from '@angular/core';
import { HttpClient, HttpHeaders } from '@angular/common/http';
import { createPrompt, TextRequest, TextResponse } from '../models/vertex-ai';

@Component(...)
export class PredictComponent implements OnInit {
  endpoint: string = "";
  headers: HttpHeaders | undefined;
  prompt: TextRequest = createPrompt("What is the largest number with a name?");

  constructor(public http: HttpClient) {
  }

  ngOnInit(): void {
    this.TestVertexAIWithoutApiKey();
  }
}

To make the call, we’ll break this down into a few steps.

Creating the prompt

To create the prompt, we use a helper function that returns the request object we will send with our prediction.

/// src/app/models/vertex-ai.ts

export function createPrompt(
  prompt: string = "What is the largest number with a name?",
  temperature: number = 0.7,
  maxOutputTokens: number = 100,
  topP: number = 0.95,
  topK: number = 40
): TextRequest {
  const request: TextRequest = {
    "instances": [
      {
        "prompt": `${prompt}`
      }
    ],
    "parameters": {
      "temperature": temperature,
      "maxOutputTokens": maxOutputTokens,
      "topP": topP,
      "topK": topK
    }
  };
  return request;
}
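For completeness, here is a minimal sketch of what the TextRequest and TextResponse types imported from src/app/models/vertex-ai.ts could look like, based purely on the request and response shapes shown in this post (the file in the repository may declare additional fields, such as citation and safety metadata):

// src/app/models/vertex-ai.ts (sketch; field names follow the JSON used in this post)
export interface TextRequest {
  instances: { prompt: string }[];
  parameters: {
    temperature: number;
    maxOutputTokens: number;
    topP: number;
    topK: number;
  };
}

export interface TextResponse {
  predictions: { content: string }[];
}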

Building the Endpoint URL

The endpoint requires consolidating the base URL, API version, your project ID, Google Cloud region, publisher, the model name, and the specific action. We concatenate these together to form the full prediction endpoint URL.

buildEndpointUrl(projectId: string) {
  const BASE_URL = "https://us-central1-aiplatform.googleapis.com/";
  const API_VERSION = 'v1'; // may be different at this time
  const MODEL = 'text-bison';

  let url = BASE_URL;              // base url
  url += API_VERSION;              // api version
  url += "/projects/" + projectId; // project id
  url += "/locations/us-central1"; // google cloud region
  url += "/publishers/google";     // publisher
  url += "/models/" + MODEL;       // model
  url += ":predict";               // action

  this.endpoint = url;
}
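With the values above, the resulting endpoint looks like this (with your own project ID in place of <YOUR_PROJECT_ID>):

https://us-central1-aiplatform.googleapis.com/v1/projects/<YOUR_PROJECT_ID>/locations/us-central1/publishers/google/models/text-bison:predict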

This endpoint in particular calls the predict API on text-bison. If you want to try any of the other available APIs (chat, code, images, speech) find more details here. Just be aware that each API has different endpoints, requests and responses.

Authentication

Next, we add the access token retrieved earlier to the HTTP headers to authenticate the request:

getAuthHeaders(accessToken: string) {
  this.headers = new HttpHeaders()
    .set('Authorization', `Bearer ${accessToken}`);
}

This Authorization header is what allows the request to pass the API’s security checks.

The access token is temporary, so you may need to renew it by running gcloud auth print-access-token again.

Making the request

Finally, we use all the previous steps to make a POST request to the prediction endpoint. To handle the response, we subscribe and print the prediction content to the console. This content is part of the response object, as seen in the code below.

TestVertexAIWithoutApiKey() {
  const PROJECT_ID = '<YOUR_PROJECT_ID>';
  const GCLOUD_AUTH_PRINT_ACCESS_TOKEN = '<YOUR_GCLOUD_AUTH_PRINT_ACCESS_TOKEN>';

  this.buildEndpointUrl(PROJECT_ID);
  this.getAuthHeaders(GCLOUD_AUTH_PRINT_ACCESS_TOKEN);

  this.http.post<TextResponse>(this.endpoint, this.prompt, { headers: this.headers })
    .subscribe(response => {
      console.log(response.predictions[0].content);
    });
}

Be sure to substitute your actual project ID and the access token you retrieved in the credentials steps.
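Since the access token expires, you may also want a basic error branch. The following is only a sketch, not part of the original component; it uses the RxJS observer object syntax instead of the single callback above:

this.http.post<TextResponse>(this.endpoint, this.prompt, { headers: this.headers })
  .subscribe({
    next: (response) => console.log(response.predictions[0].content),
    // A 401/403 response usually means the access token has expired;
    // run `gcloud auth print-access-token` again and update it.
    error: (err) => console.error('Vertex AI request failed', err)
  });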

Handling the Prediction Response

To run the application, execute this command in the terminal and navigate to localhost:4200:

ng serve

Verifying the response

To verify the response from the prediction call, you can quickly check the Console output in your browser.

console.log(response.predictions[0].content);

{
  "predictions": [
    {
      "content": "The largest number with a name is a googolplex. A googolplex is a 1 followed by 100 zeroes."
    }
  ]
}

Congratulations! You now have access to Vertex AI APIs. And that’s really all you need to integrate PaLM for Text predictions into an Angular application. The complete code is available on GitHub.

Conclusion

By completing this tutorial, you learned:

  • How to obtain credentials and set up access to the Vertex AI APIs
  • How to create an Angular application using the CLI
  • How to make a prediction request to a generative model
  • How to handle the response and output the generated text

You now have the foundation to start building AI-powered features like advanced text generation into your own Angular apps using Vertex AI.

Thanks for reading!

Have you got any questions? Feel free to leave your comments below or reach out on Twitter at @gerardsans.

Credits: AI Assistants

This post was co-created with the help of a few AI assistants. Claude collaborated with me throughout the writing process — from structuring ideas to drafting content. Their tireless efforts as my virtual writing buddies were instrumental in crafting this tutorial. Later, Bard and ChatGPT helped me out at specific points in getting to the final version.

Try Claude for FREE here: https://claude.ai

Annex 1: Using API key from MakerSuite? Join the waitlist

In this annex, you will learn how to use your API key to access the PaLM API. First, make sure you have a valid API key from MakerSuite, or request to join the waitlist here if you live outside the US. Once you have it, you can quickly verify it with a curl request.

Once you receive your API key follow these steps:

Changes to the Prediction Component

Now the code to make the prediction call becomes:

TestVertexAIWithApiKey() {
  const API_KEY = '<YOUR_API_KEY>';

  this.buildEndpointUrlApiKey(API_KEY);

  this.http.post<TextResponse>(this.endpoint, this.prompt)
    .subscribe(response => {
      console.log(response.predictions[0].content);
    });
}

Replace API_KEY with yours, pass the prompt text in the request body, and make a POST request to the prediction endpoint. Let’s quickly go through the remaining code. I broke buildEndpointUrlApiKey down so you can see how each piece connects together.

buildEndpointUrlApiKey(apikey: string) {
  const BASE_URL = "https://generativelanguage.googleapis.com/";
  const API_VERSION = 'v1beta2'; // may be different at this time Eg: v1, v2, etc
  const MODEL = 'text-bison-001'; // may be different at this time

  let url = BASE_URL;        // base url
  url += API_VERSION;        // api version
  url += "/models/" + MODEL; // model
  url += ":generateText";    // action
  url += "?key=" + apikey;   // api key

  this.endpoint = url;
}

This endpoint in particular calls the generateText API on text-bison-001. If you want to try any of the other available APIs (chat, code, images, speech) find the details here. Just be aware that each API has different endpoints, requests and responses.

Notice how in this version of the code the API key is part of the query string.
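With the values above, the resulting endpoint looks like this (with your own key in place of <YOUR_API_KEY>):

https://generativelanguage.googleapis.com/v1beta2/models/text-bison-001:generateText?key=<YOUR_API_KEY>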

Full code is available on GitHub.

Annex 2: Understanding prompt basics: temperature, topK and topP

Prompt parameters are basic arguments that we can tweak to influence the output for a given prompt. For example, they can be used to get less common and more creative outputs.

Tokens vs words

While we submit prompts to the PaLM API in the form of text, the API converts these strings into chunks called tokens. A token is approximately four characters, and 100 tokens correspond to roughly 60 to 80 words.

The response metadata reports the token counts for the input and the output:

{
  "predictions": [
    {
      "content": "The largest number with a name is a googolplex. A googolplex is a 1 followed by 100 zeroes."
    }
  ],
  "metadata": {
    "tokenMetadata": {
      "inputTokenCount": {
        "totalTokens":
      },
      "outputTokenCount": {
        "totalTokens":
      }
    }
  }
}
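As a rough illustration of the four-characters-per-token rule of thumb, here is a quick, purely illustrative estimate in TypeScript (not an official tokenizer; real token counts come back in the response metadata above):

// Very rough token estimate based on the ~4 characters per token rule of thumb.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

estimateTokens("What is the largest number with a name?"); // ≈ 10 tokens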

Some tasks may require you to review the default settings for these parameters to get better results.

Diagrams by Cohere. How prompt parameters and temperature influence the output.

Let’s look at these in more detail:

  • prompt. This parameter becomes the text input, e.g. What is the largest number with a name? Feel free to change this prompt; it is only meant to test access to the API. Mandatory.
  • temperature. This parameter allows you to introduce less common options during the decoding strategy, raising the chances of more creative or less repetitive responses. When set to 0, a greedy strategy is used and the most likely option is picked. Recommended.
  • maxOutputTokens. This parameter limits the total number of tokens generated for this prompt. Note that output can be truncated, leaving part of the answer missing. Be aware of costs while testing. The maximum value is 1024. Recommended.

Most of the time, the default values will be enough.
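If you do want to experiment, the createPrompt helper from earlier already exposes these parameters. A sketch of two possible configurations (the prompts and values are just examples):

// createPrompt(prompt, temperature, maxOutputTokens, topP, topK)

// More deterministic: greedy decoding (temperature 0) and a short answer
const factual = createPrompt("What is the largest number with a name?", 0, 64);

// More creative: higher temperature and room for a longer output
const creative = createPrompt("Write a short poem about very large numbers.", 0.9, 256);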

Extending the reach of your responses: topK and topP

Sometimes you will be testing a prompt and looking for different responses that better fit your expectations or are a better match for the task at hand. In these cases you can use topK and topP to extend the scope for more and different responses.

Note that these parameters require a deeper understanding of the model and how probability distributions work for a given prompt.

As shown in the diagram below, given a certain temperature, you can introduce more variability by increasing topP and topK respectively. The final result will depend on the probability distribution present for that specific prompt.

Diagrams by Cohere. Outputs change as topP and topK vary for different temperatures.

As a review:

  • topK. This parameter limits sampling to the k most likely tokens. The default value is 40. Optional.
  • topP. This parameter limits sampling to the tokens whose aggregated probabilities add up to p. The default value is 0.95; values range between 0 and 1. Optional.

When both topK and topP are provided, topK is applied first, before topP. Finally, a random selection is made within the sampled tokens. The probability of each token is fixed once training for that model ends.

Common strategies when testing your prompts:

  • Limit responses to the most common tokens seen in training. Set temperature to zero and topP to 0.2. This respects the probabilities from the model and chooses a token that sits within the top 20% of the probability mass learned by the model during training. It also limits the variability for long-tail distributions, if that is the case for your prompt.
  • Allow less common tokens and more creative responses. Set temperature to 0.7 and topK to 40. This raises the chances for less common options to show up in the output. Note that this strategy depends on each prompt and how much relevant training data was present during training. Both strategies are sketched in code below.
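Using the createPrompt helper, the two strategies above could look like this (the prompt text and maxOutputTokens values are placeholders):

// Strategy 1: temperature 0 and topP 0.2, stick to the most common tokens
const conservative = createPrompt("What is the largest number with a name?", 0, 100, 0.2);

// Strategy 2: temperature 0.7 and topK 40, allow less common tokens
const exploratory = createPrompt("What is the largest number with a name?", 0.7, 100, 0.95, 40);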

Annex 3: Response Metadata: Citations, Safety Attributes and Billing

As part of the response, you will also get access to some interesting metadata around citations, safety and billing. See below the whole structure for reference. For more details, see Responsible AI.

{
  "predictions": [
    {
      "content": "The largest number with a name is a googolplex. A googolplex is a 1 followed by 100 zeroes.",
      "citationMetadata": {
        "citations": []
      },
      "safetyAttributes": {
        "scores": [],
        "blocked": false,
        "categories": []
      }
    }
  ],
  "metadata": {
    "tokenMetadata": {
      "inputTokenCount": {
        "totalBillableCharacters": ,
        "totalTokens":
      },
      "outputTokenCount": {
        "totalBillableCharacters": ,
        "totalTokens":
      }
    }
  }
}

Safety

Generative AI is still very new, so while adding AI-enabled features to your apps be aware of the current PaLM API limitations, including:

  • Model Hallucinations, Grounding, and Factuality
  • Biases and Limited Domain Expertise

Learn how you can mitigate these by following the recommendations found in Responsible AI.

Vertex AI Billing

As a reference, 500 requests and the corresponding 500 responses will cost you (per 1,000 characters):

  • PaLM for Text: $1 (used in this post)
  • PaLM for Chat or Codey: $0.5
  • Embeddings for text or images: $0.1

Prices may vary over time. Note that input and output are charged separately. See the latest pricing.
