Deploy Hugging Face LLMs on Teradata VantageCloud with NVIDIA GPU Acceleration
Organizations are under increasing pressure to quickly deploy valuable GenAI applications and demonstrate a strong Return on Investment (ROI). However, deploying GenAI presents significant challenges for developers and data professionals, including selecting and using the right large language model (LLM) efficiently while maintaining security, privacy, and cost-effectiveness.
With the Bring Your Own LLM (BYOLLM) capability of Teradata VantageCloud on AWS — and coming to Google Cloud and Azure — companies can now easily experiment with, deploy, and run inference on small to medium-sized open-source LLMs within the database. This capability reduces complexity and cost while giving businesses, developers, and data professionals full control over their data in a single, flexible, secure, and trusted environment.
As a simple example, we’ll walk through deploying one of DeepSeek’s models for basic in-database inference within Teradata VantageCloud using BYOLLM. You’ll learn:
- How to set up VantageCloud with an Analytic GPU compute group
- How to connect to VantageCloud using teradataml
- How to download LLMs from Hugging Face using the transformers package
- How to execute batch inference jobs using the APPLY function
By the end, you’ll have the tools you need to build a streamlined LLM inference pipeline running within VantageCloud, helping you extract insights from vast repositories of unstructured data such as text, emails, PDFs, customer communications, call center transcripts, and more. You can select any small to medium-sized LLM or deep learning model from Hugging Face and perform natural language processing tasks.
Table of Contents
- Why Run LLMs on Teradata VantageCloud?
- Step-by-Step: Deploying a Hugging Face Model Using BYOLLM on Teradata
- Explore More Real-World NLP Demos with Task-Specific Models for In-Database Processing
- Conclusion & Next Steps
- BYOLLM Coming Soon to Teradata’s Free Online Learning Site
Why Run LLMs on Teradata VantageCloud?
The BYOLLM capability enables developers and data scientists to select their preferred open-source LLMs from Hugging Face and run inference tasks within VantageCloud without moving data outside their secure Teradata environment. Additional benefits include:
- Maximize Security, Privacy and Trust: Keep sensitive data within your trusted Teradata VantageCloud analytics platform rather than sending it to external LLM APIs.
- Flexible Compute Options: Choose CPU or NVIDIA GPU-accelerated inference based on model complexity and performance needs.
- Ephemeral On-demand Compute: Optimize costs by using resources only when needed — spin up a CPU or GPU cluster for the duration of the task.
- Optimized Model Selection: Run task-specific or general-purpose models tailored to your use case, whether that’s entity recognition, text classification, sentiment analysis, document summarization, language detection, masking PII entities, or other NLP tasks.
- Scalable AI Workloads: Execute batch jobs by reading input data directly from database tables and storing results back into Teradata tables.
- Reduce Vendor Lock-in: Quickly experiment with open-source models to maintain flexibility, control costs, and avoid dependency on proprietary APIs.
Let’s get started!
Step-by-Step: Deploying a Hugging Face Model Using BYOLLM on Teradata
Prerequisites:
Set up VantageCloud with Analytic GPU Compute Cluster
- Ensure that your organization administrator has configured an environment and set up the appropriate environment connection type. If these steps have not been completed, refer to this quickstart guide for instructions: Getting Started with VantageCloud.
- To run LLM workloads, your admin will also need to configure an Analytic GPU compute cluster, set up its profile, and grant your user access. These clusters come pre-configured with NVIDIA GPU-accelerated containers optimized for deep learning and large language model inference.
Admin Steps for Creating GPU Compute Clusters
1. Log in to the VantageCloud console using Org Admin credentials.
2. Authenticate into the environment with DBC or a user that has access to TD_COMPUTE_CLUSTER_ADMIN.
3. Select Compute groups from the environment.
4. Select the (+) icon to open the Create group window.
5. Enter a Name and Description for the new compute group.
6. Select Analytic GPU as the group type and select Create.
7. Go to Manage Access > Users.
8. Create new or edit existing users to grant access to the new Analytic GPU compute group.
9. Grant the user or user groups permission to execute the APPLY statement:
GRANT EXECUTE ON FUNCTION TD_SYSFNLIB.APPLY TO {username};
10. Grant permission to create and manage the user environment using Open Analytics Framework APIs:
GRANT TD_DATA_SCIENTIST TO {username};
11. Enable console login for the user to manage environment services:
GRANT LOGON ON ALL TO {username} WITH NULL PASSWORD;
Step 1: Start a JupyterLab Docker container with Teradata Jupyter Extensions
Make sure Docker Desktop is up and running, then open a terminal and run the following command. It starts a JupyterLab Docker container with the Teradata Jupyter Extensions and binds a local directory on your machine to a directory inside the container.
The Teradata Jupyter Extensions for Docker provide the SQL IPython kernel and utilities for managing connections to Teradata.
docker run --platform linux/amd64 -e "accept_license=Y" \
-p 127.0.0.1:8888:8888 \
-v ~/container_data:/home/jovyan/JupyterLabRoot \
    teradata/jupyterlab-extensions
Once the container is running, open the provided URL in your browser to access the JupyterLab environment.
Step 2: Install additional packages and import dependencies
Once in JupyterLab:
1. Create a new Python 3 notebook and name it.
2. Install the necessary packages and import dependencies. For this tutorial, use teradataml 20.00.00.03 or greater (supports BYOLLM).
Run the following in the notebook:
!pip install torch transformers==4.47.1 teradataml==20.00.00.03 numpy pandas sentence-transformers sentencepiece dill "accelerate>=0.26.0" bitsandbytes
import os
import sys
import getpass
import teradataml
import pandas as pd
from collections import OrderedDict
from teradataml import *
from teradatasqlalchemy.types import *
from IPython.display import display as ipydisplay
from os.path import expanduser
Step 3: Create a Connection to VantageCloud
Connect to VantageCloud using the teradataml Python library. Input your connection details, including the host, username, password and compute group name.
print("Creating the context...")
host = "<HOST>"
username = "<USER>"
password = getpass.getpass("Password: ")  # prompt instead of hard-coding the password
compute_group = "<COMPUTE_GROUP>"
eng = create_context(host=host, username=username, password=password)
print("Connected to Teradata:", eng)
Step 4: Configure User Environment Service (UES) Authentication
UES authentication is necessary to manage Python or R environments within Teradata.
To do this, we need the UES URL, a personal access token (PAT), and a key.
Retrieve your UES URL from the VantageCloud console under the “Connect your tools” section.
Generate a personal access token (PAT) and key:
1. Log in to the VantageCloud console.
2. Click your profile icon and select Account Settings.
3. Scroll to the Access Tokens and Keys section and create a new access token and key.
4. Save the PAT token and download the key file.
ues_url = "<UES_URL>"
#personal access token and key
pat_token = "<PAT_TOKEN>"
pem_file = os.path.expanduser("<PEM_FILE.PEM>")  # key must be provided as a .pem file
configure.ues_url = ues_url
if set_auth_token(ues_url=ues_url, username=username, pat_token=pat_token, pem_file=pem_file):
print("UES Authentication successful")
else:
print("UES Authentication failed. Check credentials.")
sys.exit(1)Step 5: Set Up the Environment in Teradata VantageCloud for Model Deployment
Now that UES authentication has been set, we can begin using the APIs to manage user environments. Let’s start by viewing the available user environments and creating a new one.
env_list = list_user_envs()
print("Available Environments:")
ipydisplay(env_list)
Create a new environment that matches your existing Python version.
# get the current python version
python_version = str(sys.version_info[0]) + '.' + str(sys.version_info[1])
print(f'Using Python version {python_version} for user environment')
Use the create_env API to give the environment a name, base Python version, and description.
#Create a new Python user environment for Python 3.10
demo_env = create_env(env_name = input("Env Name:"),
base_env = f'python_{python_version}',
                      desc = 'BYOLLM demo env')
Set demo_env to your newly created environment.
demo_env = get_env(input("Env Name:"))
Install the required libraries into the user environment.
demo_env.install_lib(["transformers==4.47.1", "torch", "sentencepiece", "numpy", "pandas", "sentence-transformers", "dill", "accelerate", "bitsandbytes"])
print("All libs installed") demo_env.libsStep 6: Download and Archive deepseek-coder-1.3b-instruct from Hugging Face
You can download Hugging Face LLMs in either native format or streamlined format. The APPLY Python script that loads the language model must match the format.
Here we use the native Hugging Face format to download the model.
#download model into a local directory first
from transformers import AutoModelForCausalLM, AutoTokenizer #AutoConfig
model_name = "deepseek-ai/deepseek-coder-1.3b-instruct"
cache_dir = "/home/jovyan/JupyterLabRoot/cache"
print("Downloading model...")
AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, cache_dir=cache_dir)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, cache_dir=cache_dir)
print("Model downloaded!")
Archive the model.
import shutil
model_base_path = "/home/jovyan/JupyterLabRoot/cache/models--deepseek-ai--deepseek-coder-1.3b-instruct"
shutil.make_archive("models--deepseek-ai--deepseek-coder-1.3b-instruct", "zip", model_base_path)
print("Full model archived successfully!") After the LLM directories are compressed into zip files, you can use user environment model APIs to install, uninstall, and list LLMs. Here we use the install_model API.
demo_env.install_model(model_path="models--deepseek-ai--deepseek-coder-1.3b-instruct.zip", asynchronous=False)
print("Model uploaded to OAF!") Confirm the model has loaded.
ipydisplay(demo_env.models)
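demo_env.models lists the LLMs installed in the environment. If you later need to remove a model, the user environment also exposes an uninstall API, as noted above. The call below is a sketch — check help(demo_env.uninstall_model) in your teradataml version for the exact name and signature.
# Sketch: remove a previously installed model, then confirm the list is updated
demo_env.uninstall_model("models--deepseek-ai--deepseek-coder-1.3b-instruct")
ipydisplay(demo_env.models)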
Step 7: Create a sample dataset and run inference using the APPLY function
Create a sample dataset to test the model.
# Sample data
data = {
'text': [
"How do I write a SQL query to find duplicate records in a table?",
"What is the difference between a left join and an inner join in SQL?",
"How can I optimize a Python script to run faster?",
"What are the key differences between Python lists and tuples?",
"How do I write a function to reverse a string in Python?",
"How can I find the second largest element in an array using SQL?",
"What is the difference between a stack and a queue in data structures?",
"How do I write a recursive function to compute the factorial of a number?",
"What are the benefits of using indexes in a SQL database?"
]
}
# Create DataFrame
df = pd.DataFrame(data)
#Define column types
column_types = OrderedDict({
"text": VARCHAR(1000)
})
# Copy to Teradata, replacing existing table
copy_to_sql(df=df, table_name="coding_questions", if_exists="replace", types=column_types)
# Load data as a Teradata DataFrame
coding_questions = DataFrame("coding_questions")
# Display DataFrame to verify
print(coding_questions)
Create the Python script that reads text input from standard input (sys.stdin), generates responses using the DeepSeek model, and writes the results in a structured format to standard output (sys.stdout). The script uses NVIDIA GPUs for inference when available.
%%writefile deepseek_inference.py
#!/usr/bin/env python3
import glob
import os
import sys, csv
import torch
import warnings
import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer
warnings.simplefilter("ignore")
# Model info
MODEL = "deepseek-ai/deepseek-coder-1.3b-instruct"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
# Load model & tokenizer once
def load_model():
global model, tokenizer
if "model" not in globals() or "tokenizer" not in globals():
model = AutoModelForCausalLM.from_pretrained(MODEL, trust_remote_code=True, cache_dir = './models').to(DEVICE) # Move to GPU explicitly
tokenizer = AutoTokenizer.from_pretrained(MODEL, local_files_only=True, cache_dir = './models')
# print("Model loaded successfully on", DEVICE)
load_model()
def generate_response(input_text):
if not input_text or input_text.strip() == "":
return "EMPTY INPUT"
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(DEVICE)
attention_mask = torch.ones_like(input_ids).to(DEVICE)
# Generate response
output = model.generate(
input_ids,
attention_mask=attention_mask,
max_length=512,
temperature=0.7,
top_p=0.9,
do_sample=True,
pad_token_id=tokenizer.eos_token_id
)
return tokenizer.decode(output[:, input_ids.shape[-1]:][0], skip_special_tokens=True)
colNames = ["user_input"]
d = csv.DictReader(sys.stdin.readlines(), fieldnames=colNames)
df = pd.DataFrame(d, columns=colNames)
df["response"] = df["user_input"].apply(generate_response)
df.to_csv(sys.stdout, index=False, header=False)
Install the Python script file to the environment.
demo_env.install_file('deepseek_inference.py', replace=True)
Set the session to the Analytic GPU compute group. The cluster must be running before you execute the APPLY statement.
execute_sql(f"SET SESSION COMPUTE GROUP {compute_group};")
print(f"Compute group set to {compute_group}")Now use the APPLY operator to execute our inference script on the dataset stored in Teradata VantageCloud.
returns_dict = OrderedDict({
# "query_id": VARCHAR(50),
"user_input": VARCHAR(1000),
"response": VARCHAR(4000)
})
# Run inference using Apply
apply_obj = Apply(
data=coding_questions,
apply_command="python deepseek_inference.py",
returns=returns_dict,
env_name=demo_env
)
import time
start = time.time()
df_result = apply_obj.execute_script()
print(f"Execution Time: {time.time() - start} seconds")
df_result
The APPLY operator executes the Python script within the user environment. It processes the text questions from every row in the table, generates responses using the DeepSeek model on NVIDIA GPUs, and returns the results as a DataFrame within seconds.
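To persist the generated responses for downstream analysis, you can write the result DataFrame back to a Teradata table, just as the input data was loaded with copy_to_sql earlier. A minimal sketch follows; the table name llm_responses is illustrative.
# Sketch: store the inference results in a permanent table (table name is illustrative)
copy_to_sql(df=df_result, table_name="llm_responses", if_exists="replace")
print(DataFrame("llm_responses"))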
Explore More Real-World NLP Demos with Task-Specific Models for In-Database Processing
After successfully deploying your first LLM, explore real-world natural language processing (NLP) demos directly in-database using task-specific models.
You can download the following demos from the LLM_Use_Cases.zip file, available in the attachments section of the Teradata VantageCloud documentation (left sidebar). These demos include:
- Text Classification — facebook/bart-large-mnli: Classifies text into predefined categories.
- Language Detection — papluca/xlm-roberta-base-language-detection: Detects the language of a text.
- Generating Embeddings — sentence-transformers/all-mpnet-base-v2: Converts text into vector representations for similarity searches.
- Named Entity Recognition — tner/roberta-large-ontonotes5: Identifies and categorizes named entities within unstructured text.
- Extracting Key Phrases — ml6team/keyphrase-extraction-kbir-kpcrowd: Extracts key phrases from a document to quickly understand the content.
- Grammar Correction — pszemraj/flan-t5-large-grammar-synthesis: Fixes grammatical errors in text.
- Masking PII Entities — ab-ai/pii_model: Masks personally identifiable information (PII) entities.
- Sentiment Analysis — distilbert-base-uncased-finetuned-sst-2-english: Determines the emotional tone of a sentence (positive, negative, neutral).
- Sentence Similarity — sentence-transformers/all-MiniLM-L6-v2: Measures how similar two sentences are.
- Summarization — facebook/bart-large-cnn: Generates concise summaries of long documents.
- Translation — Helsinki-NLP/opus-mt-en-fr: Translates text from English to French.
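To give a flavor of how these task-specific demos work, here is a minimal sketch of an in-database sentiment analysis script following the same stdin/stdout pattern as deepseek_inference.py above. It assumes the distilbert-base-uncased-finetuned-sst-2-english model has been installed into the user environment and is cached under ./models; the column names and output layout are illustrative.
#!/usr/bin/env python3
# Sketch: in-database sentiment analysis with a task-specific model (illustrative)
import sys, csv
import pandas as pd
import torch
from transformers import pipeline
DEVICE = 0 if torch.cuda.is_available() else -1  # pipeline expects a device index
# Load the task-specific model once from the environment's model cache
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=DEVICE,
    model_kwargs={"cache_dir": "./models"},
)
# Read rows from stdin, score each text, and write CSV rows back to stdout
rows = csv.DictReader(sys.stdin.readlines(), fieldnames=["user_input"])
df = pd.DataFrame(rows, columns=["user_input"])
results = classifier(df["user_input"].tolist(), truncation=True)
df["label"] = [r["label"] for r in results]
df["score"] = [round(r["score"], 4) for r in results]
df.to_csv(sys.stdout, index=False, header=False)
You would install such a script with demo_env.install_file and invoke it through Apply exactly as in Step 7, with a returns dictionary matching the output columns.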
Conclusion & Next Steps
Businesses and data professionals now have an efficient, secure, and cost-effective solution for building powerful Generative AI applications with open-source LLMs directly within Teradata VantageCloud. With Teradata’s BYOLLM capability, teams can quickly scale their NLP and AI workflows without compromising on security, privacy, or flexibility.
If your company is looking to classify text, detect languages, perform sentiment analysis, mask sensitive data, or tackle other NLP tasks, BYOLLM lets you quickly and securely deploy task-specific open-source models. Start experimenting today — download practical NLP use cases from the LLM_Use_Cases.zip file available in the Teradata VantageCloud documentation.
BYOLLM Coming Soon to Teradata’s Free Online Learning Site
Stay tuned for the BYOLLM functionality on Teradata’s ClearScape Analytics Experience site, available upon request. ClearScape Analytics Experience is Teradata’s free online learning site providing a fully interactive environment to analyze data, build AI/ML models, and develop GenAI applications. You can create your free account here and sign in.
About the author
Janeth Graziani is a Developer Advocate at Teradata who enjoys leveraging her expertise in technical writing to help developers navigate and incorporate new tools into their tech stacks. When Janeth is not exploring a new tool for her stack or writing about it, she is enjoying the San Diego weather and beaches with her family. Before joining Teradata, Janeth was a Developer Advocate at Autocode where she helped build a thriving developer community. Connect with Janeth on LinkedIn!

