Creation of LLM Program

Viswanatha Reddy Allugunti, Ph.D
2 min read · Jul 30, 2023

If you missed my previous blog posts on LLMs and prompt engineering, please read them at the link below.

https://www.linkedin.com/pulse/prompt-engineering-llm-viswanatha-allugunti-phd%3FtrackingId=mG80E65FRiSKtwPmlBFVFw%253D%253D/?trackingId=mG80E65FRiSKtwPmlBFVFw%3D%3D

Creating a language model from scratch is a complex task, but here's a simplified example using Python with the NLTK library. In this case, we'll create a bigram model, a type of language model that predicts the next word from the single word that precedes it.
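Before the NLTK version, it may help to see the core idea in plain Python. The sketch below counts how often each word follows another and turns those counts into a maximum-likelihood estimate of P(word | previous word). The function names and the toy sentences are my own illustrations, not part of NLTK.

```python
from collections import Counter, defaultdict

def train_bigram_counts(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def bigram_probability(counts, prev, word):
    """Maximum-likelihood estimate: count(prev, word) / count(prev, anything)."""
    total = sum(counts[prev].values())
    if total == 0:
        return 0.0  # the previous word was never seen with a follower
    return counts[prev][word] / total

# Hypothetical, already tokenized toy data
toy_sentences = [
    ["openai", "is", "an", "ai", "research", "organization"],
    ["chatgpt", "is", "a", "language", "model"],
]
counts = train_bigram_counts(toy_sentences)
print(bigram_probability(counts, "is", "an"))  # 0.5: "is" is followed once by "an", once by "a"
```

The NLTK code that follows does the same kind of counting through library helpers.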

import nltk
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline
# The tokenizers below need the Punkt data; newer NLTK releases may ask for 'punkt_tab' instead
nltk.download('punkt')
# Assume this is our small text dataset
data = "ChatGPT is a language model created by OpenAI. It can generate human-like text. OpenAI is an AI research organization."
# Tokenize the data: split into sentences, then into lowercased words
tokenized_text = [list(map(str.lower, nltk.tokenize.word_tokenize(sent)))
                  for sent in nltk.sent_tokenize(data)]
# Preprocess the tokenized text for bigram language modelling
n = 2  # We're creating a bigram model
train_data, padded_sents = padded_everygram_pipeline(n, tokenized_text)
# Build the language model
model = MLE(n)  # Maximum Likelihood Estimator
model.fit(train_data, padded_sents)
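The preprocessing step above pads each sentence with boundary markers and produces every n-gram up to order n. As a rough illustration of what that means for n = 2, here is a plain-Python sketch. The helpers `pad_sentence` and `all_ngrams` are hypothetical stand-ins I wrote for this post, and the order in which NLTK yields its n-grams may differ.

```python
def pad_sentence(tokens, n):
    """Add n-1 start markers and n-1 end markers around a tokenized sentence."""
    return ["<s>"] * (n - 1) + list(tokens) + ["</s>"] * (n - 1)

def all_ngrams(tokens, max_len):
    """Return every n-gram of length 1..max_len as a tuple."""
    grams = []
    for size in range(1, max_len + 1):
        for start in range(len(tokens) - size + 1):
            grams.append(tuple(tokens[start:start + size]))
    return grams

padded = pad_sentence(["openai", "is"], n=2)
print(padded)                  # ['<s>', 'openai', 'is', '</s>']
print(all_ngrams(padded, 2))   # unigrams first, then bigrams such as ('<s>', 'openai')
```

The padding lets the model learn which words tend to start and end a sentence, because `<s>` and `</s>` take part in bigrams like any other token.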

Now you have a simple bigram language model trained on your small text dataset. You can use this model to generate text as follows:

print(model.generate(10, random_seed=7))  # Generate a sequence of 10 words
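To make the generation step less of a black box, here is a minimal plain-Python sketch of sampling from bigram counts: start at the sentence-start marker, then repeatedly pick the next word in proportion to how often it followed the current one. The function names, the toy data, and the choice to stop at the end marker are my own assumptions for illustration; this is not how NLTK implements `generate` internally.

```python
import random
from collections import Counter, defaultdict

def build_bigram_counts(sentences):
    """Count followers of each word, including sentence boundary markers."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, max_words, seed=None):
    """Sample up to max_words words, each chosen from the followers of the last one."""
    rng = random.Random(seed)
    word, output = "<s>", []
    for _ in range(max_words):
        followers = counts[word]
        if not followers:
            break  # nothing was ever seen after this word
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if word == "</s>":
            break  # simplification: stop when the end marker is sampled
        output.append(word)
    return output

# Hypothetical, already tokenized toy data
toy_sentences = [
    ["openai", "is", "an", "ai", "research", "organization"],
    ["chatgpt", "is", "a", "language", "model"],
]
bigram_counts = build_bigram_counts(toy_sentences)
print(generate(bigram_counts, 10, seed=7))
```

With so little training data the output mostly replays fragments of the input sentences, which is also what you should expect from the NLTK model above.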

Please note that this is a very basic language model. Real-world language models like GPT-3 are far more complex and are based on neural networks. Building them requires vast computational resources and extensive knowledge of machine learning and natural language processing. This simplified example only illustrates the basic concept of a language model.

Written by Viswanatha Reddy Allugunti, Ph.D

Solutions Architect - AI | ML | IoT | Blockchain | PhD | Author | Forbes Technology Council Member | UN Youth Assembly Delegate | Keynote Speaker