Stories by Bhavana Gollapudi on Medium

Understanding AGENTIC AI

Bhavana Gollapudi — Sun, 18 May 2025 14:49:57 GMT

In an era where artificial intelligence (AI) is rapidly transforming industries and shaping the way we interact with technology, the concept of agentic AI has gained significant traction. Unlike traditional AI systems that are often designed to perform specific tasks with no autonomy or higher-level decision-making capabilities, agentic AI focuses on creating intelligent systems that exhibit autonomy, proactivity, and intentionality in their actions. But what exactly is agentic AI, and why is it important for the future of technology? Let’s dive in.

1. GENERATIVE AI vs AGENTIC AI

Generative AI refers to AI systems designed to create new content, such as text, images, videos, audio, or even code. These systems use algorithms (often based on machine learning models like neural networks) to analyze existing data patterns and generate outputs that resemble the input data.

Agentic AI, on the other hand, refers to AI systems that exhibit autonomy, goal-directed behavior, and proactivity in their operations. These systems are capable of taking independent actions, making decisions, and adapting to changes in their environment without the need for constant human input or micromanagement.

In simple terms, generative AI is a cognitive engine that generates content by reasoning over LLM prompts, and it relies on external tools for any further action (such as publishing results to dashboards or Excel sheets). Agentic AI also uses LLMs to generate content with reasoning, but it additionally acts on the results itself. This combination of reasoning and action is known as ReAct (Reasoning and Acting).

2. HOW AGENTIC AI WORKS?

Agentic AI builds on the basic generative AI process: it uses pre-trained large language models for reasoning, and it also performs further actions by accessing memory (such as creating reports, publishing results to a dashboard, raising requests in a tool, or triggering alerts).

Agentic AI systems are designed to function autonomously, making decisions and taking actions in dynamic environments without requiring constant human supervision. To understand how such systems operate, we can break down their workflow into a series of well-defined steps or components. These steps collectively enable an agentic AI to perceive its surroundings, plan its actions, execute those plans, and adapt to changes — a cycle that mimics human-like decision-making but in a computational framework.

3. AI AGENTS

Agentic AI is designed so that human input is not needed at each step of content generation. An AI agent is an intelligent computational entity that perceives its environment, makes decisions based on that input, and takes actions to achieve specific goals.

Unlike traditional algorithms that require explicit instructions for every task, AI agents are designed to operate with a degree of autonomy, using a continuous feedback loop to adapt and improve their behavior.

A combination of AI agents forms an AI workflow.

(A) Perception

Agentic AI systems must first gather information about their environment in real time. This step involves perceiving the external world through sensors, data inputs, or external APIs. The system processes this data to create an “understanding” of its current conditions.
Example: A financial agent gathers a stock's historical data, related articles, market trends, open challenges, and so on.

(B) ReAct

Once an agent perceives its environment, it processes the information to take appropriate actions. It analyzes the gathered data to understand what is going on, decides what to do based on that understanding, and then improves and adapts over time by learning from feedback and experience. LLMs are used to achieve this.
Example: A financial AI agent evaluates market trends to determine whether to buy, hold, or sell a stock.

(C) Feedback and Learning

The agent evaluates the results of its actions. If the outcome is unsatisfactory or an unexpected error occurs, the agent uses this feedback to improve its future decisions. Advanced agents may leverage machine learning for continuous self-improvement.
Example: A financial AI agent constantly refines its strategies and adapts to market dynamics, which helps make finance smarter, faster, and more personalized.
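The perception, ReAct, and feedback steps above can be sketched as a small loop. This is only a minimal illustration: the `ToyStockAgent` class, the price list, and the simple price-change rule that stands in for LLM reasoning are all assumptions made for this example and do not come from any real agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStockAgent:
    """Hypothetical agent: perceive prices, decide, then learn from feedback."""
    threshold: float = 0.02          # minimum relative move that triggers a trade
    history: list = field(default_factory=list)

    def perceive(self, price: float) -> None:
        # (A) Perception: store the latest observation from the environment.
        self.history.append(price)

    def decide(self) -> str:
        # (B) Reason and act: a simple rule stands in for LLM reasoning here.
        if len(self.history) < 2:
            return "hold"
        change = (self.history[-1] - self.history[-2]) / self.history[-2]
        if change > self.threshold:
            return "buy"
        if change < -self.threshold:
            return "sell"
        return "hold"

    def learn(self, reward: float) -> None:
        # (C) Feedback: after a bad outcome, become more cautious.
        if reward < 0:
            self.threshold *= 1.5


agent = ToyStockAgent()
actions = []
for price in [100.0, 103.0, 103.5, 99.0]:
    agent.perceive(price)
    actions.append(agent.decide())
agent.learn(reward=-1.0)   # pretend the last trade lost money
print(actions, round(agent.threshold, 3))
```

In a real agentic system the `decide` step would typically call an LLM and external tools, and the feedback step would be far richer than adjusting a single threshold.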

Multi AI Agents:

To achieve multiple actions, we might need more than one AI agent to perform the tasks. For example, if we want to meet a person at X cafe, AI Agent-1 will find the best route to the destination. But if we also want to know the weather at the agreed meeting time, AI Agent-1 is not enough: we need one more agent to get the weather information. Multiple AI agents thus combine together to achieve the outcome.
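The cafe example can be sketched with two small functions playing the role of the two agents. Everything here is hypothetical: `route_agent`, `weather_agent`, and `plan_meeting` are names invented for this sketch, and the route and forecast are hard-coded where a real system would call a maps or weather service.

```python
def route_agent(destination: str) -> dict:
    # Hypothetical Agent-1: returns a hard-coded route instead of calling a maps API.
    return {"destination": destination, "route": "Main St -> 5th Ave", "minutes": 18}


def weather_agent(time_slot: str) -> dict:
    # Hypothetical Agent-2: returns a hard-coded forecast instead of calling a weather API.
    forecasts = {"17:00": "rain", "10:00": "clear"}
    return {"time": time_slot, "forecast": forecasts.get(time_slot, "unknown")}


def plan_meeting(destination: str, time_slot: str) -> str:
    # A simple coordinator combines the outputs of both agents into one answer.
    route = route_agent(destination)
    weather = weather_agent(time_slot)
    advice = "carry an umbrella" if weather["forecast"] == "rain" else "no umbrella needed"
    return (f"Take {route['route']} to {route['destination']} "
            f"({route['minutes']} min); forecast at {weather['time']}: "
            f"{weather['forecast']}, {advice}.")


print(plan_meeting("X cafe", "17:00"))
```

Neither agent knows about the other; the coordinator is what turns two independent results into a single outcome.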

4. TYPES OF AI AGENTS

Not all AI agents are the same. They vary in complexity and capability, depending on the scope of their tasks and the intelligence required. Here are the main types of AI agents:

(A) Simple Reflex Agents:

Also called reactive agents, these operate purely on the current state of their environment. They do not store past data or plan for the future. These agents follow simple rules: if a specific condition occurs, they take an appropriate action.
Example: A thermostat that adjusts temperature based on the current room conditions.
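As a minimal sketch of the thermostat example, the function below maps the current reading directly to an action and keeps no memory of earlier readings. The target temperature and the tolerance band are assumed values chosen for illustration.

```python
def thermostat_agent(current_temp: float, target: float = 22.0, band: float = 1.0) -> str:
    """Simple reflex agent: the action depends only on the current reading."""
    if current_temp < target - band:
        return "heat"
    if current_temp > target + band:
        return "cool"
    return "idle"


for reading in (18.5, 22.3, 25.0):
    print(reading, "->", thermostat_agent(reading))
```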

(B) Model Based Reflex Agents:

These agents go a step beyond reactive agents by maintaining an internal model of their environment. They can update their behavior by considering how their actions will affect the world.
Example: A chess AI that calculates potential moves and predicts opponents’ responses based on a model of the game.

(C) Goal Based Agent:

These agents are designed to achieve specific goals. They don’t just react to their environment — they actively evaluate and choose actions based on how those actions will contribute to achieving their goals.
Example: A self-driving car navigating to a destination while avoiding traffic and hazards.

(D) Utility Based Agent:

These are an advanced type of goal-based agent that goes beyond achieving "any goal" by trying to achieve the best possible outcome. They measure the "utility" (or value) of each decision and take the action expected to yield the most favorable outcome.
Example: A robotic shopper that determines which store provides the best combination of price and convenience.
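The shopper example can be sketched as a utility function over price and travel time. The weights, store names, and numbers below are made up for illustration; a real agent would have to learn or be given its own utility function.

```python
def utility(price: float, travel_minutes: float,
            price_weight: float = 1.0, time_weight: float = 0.5) -> float:
    # Higher utility is better, so cost and travel time both count as penalties.
    return -(price_weight * price + time_weight * travel_minutes)


def choose_store(stores: dict) -> str:
    # Pick the store whose (price, travel time) pair has the highest utility.
    return max(stores, key=lambda name: utility(*stores[name]))


stores = {
    "Store A": (50.0, 30.0),   # (price, travel minutes)
    "Store B": (55.0, 10.0),
    "Store C": (48.0, 45.0),
}
print(choose_store(stores))
```

With these weights the cheapest store is not the one chosen, because the extra travel time costs more utility than the price difference saves.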

(E) Learning Based Agent:

These agents enhance their performance through learning. Starting with a basic set of instructions or objectives, they improve over time through feedback from their environment. This is done via machine learning techniques like reinforcement learning.
Example: A virtual assistant like Alexa that personalizes responses based on user preferences and past interactions.

CONCLUSION

As AI agents become more capable and autonomous, their applications will only grow, transforming industries, enhancing productivity, and opening new opportunities for innovation.

Optimization using PyGAD

Bhavana Gollapudi — Sun, 28 Jan 2024 13:45:48 GMT

A genetic algorithm (GA) is modeled on the theory of natural evolution. It is a heuristic evolutionary algorithm used for optimization problems. It solves complex problems by mimicking the evolution process to iteratively improve a population of potential solutions.

In simple terms, the word genetic means inheriting properties from one's parents. Similarly, the algorithm uses selection, crossover, and mutation of properties to produce new offspring, which are in turn used in the process of solving the given problem.
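To make selection, crossover, and mutation concrete, here is a minimal from-scratch sketch using only the standard library. The objective (counting 1-bits in a chromosome), the population settings, and the function names are assumptions chosen to keep the example small; it is not the PyGAD implementation.

```python
import random


def fitness(chromosome):
    # Toy objective (an assumption for this sketch): count of 1-bits.
    return sum(chromosome)


def evolve(num_genes=20, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(num_genes)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        best_per_generation.append(fitness(parents[0]))
        # Elitism: the best solution survives unchanged.
        children = [parents[0][:]]
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Crossover: single-point mix of the two parents' genes.
            point = rng.randint(1, num_genes - 1)
            child = mother[:point] + father[point:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness), best_per_generation


best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the best parent is copied into every new generation, the best fitness recorded in `history` can never decrease from one generation to the next.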

Brief flowchart of how the algorithm works.

Image from Internet

The fitness function is used to calculate the fitness value for each solution in the population. For example, when the goal is to reproduce a target chromosome, the fitness value can be calculated from the sum of absolute differences between the gene values in the original and reproduced chromosomes.

A genetic algorithm can be implemented in Python from scratch or by using libraries like PyGAD and GeneAl.

Let's take a tour of PyGAD and walk through a short implementation.

PyGAD is an open-source, easy-to-use Python library for implementing the genetic algorithm, and it supports a wide range of parameters and methods.

Image from Internet

Each step, such as selection, crossover, and mutation, is further broken down into multiple parameters and methods, and the complete life cycle of the PyGAD library is available on the website below:

PyGAD

The initial step is to install the library:

pip install pygad

PyGAD is a general-purpose optimization library that lets you customize the fitness function. Using it mainly consists of 3 steps:

1. Build the fitness function.
2. Create an instance of the pygad.GA class.
3. Call the run() method of that instance.

Let's see the implementation of GA using the PyGAD library:

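def compute_fitness(solution):
    # Hypothetical placeholder: use a customized method or specific operation to get the value.
    return None


def fitness_func(ga_instance, solution, solution_idx):
    fitness = compute_fitness(solution)

    # Return 0 when no fitness value could be computed.
    if fitness is None:
        return 0
    return fitness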

Below is the link to the code:

https://medium.com/media/c2b3c8af37289f856f4621d5f71dba45/href

The complete breakdown and workflow are described in the code itself. In this way, PyGAD can be brought in to solve optimization problems with a heuristic approach, giving control over its parameters and allowing its methods, especially the fitness function, to be customized. PyGAD also provides ways to visualize and load the output.

PyGAD can also be integrated efficiently with deep learning techniques.

The Python code can also be implemented using GeneAl, which is described well here:

Introducing GeneAl: a Genetic Algorithm Python Library

Reinforcement Learning with Human Feedback — Avails in ChatGPT

Bhavana Gollapudi — Mon, 24 Jul 2023 09:45:07 GMT

In recent days, generative AI has taken a key role in the IT industry. The main goal of generative AI is to generate new data using a model trained on a huge amount of existing data. Its best-known forms are language models and image models.

What is ChatGPT?

ChatGPT is an AI-powered language model developed by OpenAI and launched on November 30, 2022. It uses natural language processing to generate human-like conversational dialogue. The language model can respond to questions and compose various written content, including articles, social media posts, essays, code, and emails.

ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.

What is its workflow?

ChatGPT is trained to perform its tasks the way a person would in a conversation. The model is trained in multiple stages.

Let's go deeper into how it is trained:

Workflow of ChatGPT

Input of the Model:

The internet has a vast amount of data that can serve as a training dataset for a model. In short, the internet data used for training is called the internet corpus, and it is the input here.

Generative Pre-training:

Here the base model is trained on the input corpus data using the Transformer architecture.

Reinforcement Learning through Human Feedback:

RLHF is primarily used in natural language processing (NLP) for AI agent understanding in applications such as chatbots and conversational agents, text to speech and summarization. In regular reinforcement learning, AI agents learn from their actions through a reward function. RLHF integrates human guidance to accelerate learning.

RLHF has sub-stages that lead to the final tuned model.

(A) Supervised Fine Tuning | SFT

(B) Reward Model | RM

(C) Proximal Policy Optimization | PPO

Figure adapted from OpenAI

Supervised Fine Tuning

Human AI trainers write conversations in which they play both the user and the AI assistant. These conversations form the SFT training corpus, which consists of the conversation history as input and the ideal next response as output, and the base model is fine-tuned on it with the stochastic gradient descent algorithm.

Stochastic gradient descent is an optimization algorithm for machine learning that keeps adjusting the parameters of a model to minimize the cost function. At each step, the parameters are updated based on the gradient computed from a random subset of the training data.
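The update rule is easier to see on a tiny problem than on a language model. The sketch below fits a straight line with mini-batch stochastic gradient descent; the data, learning rate, and batch size are assumptions picked for illustration and have nothing to do with how ChatGPT itself was tuned.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.05, epochs=200, batch_size=2, seed=0):
    """Fit y = w * x + b by minimizing mean squared error with mini-batch SGD."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the squared error, computed on the random subset only.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Step against the gradient to reduce the cost.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]       # generated from y = 2x + 1
w, b = sgd_linear_fit(xs, ys)
print(round(w, 2), round(b, 2))
```

Each update looks at only two of the four points, yet the parameters still drift toward the line that generated the data.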

For example:

Prompt: Generate text on mountain

Output: Landform that rises prominently above its surroundings, generally exhibiting steep slopes, a relatively confined summit area, and considerable local relief

The expert policy simply acts as a rule book, and the model gets tuned accordingly. But this is not enough!

Conversations that fall outside the rule book create a distributional shift, which causes problems for the SFT output model.

Reward Model

In this stage, the model is optimized by training it against a reward model (RM). The RM provides additional feedback to the agent: it assigns a value to different states or actions based on their desirability, and the agent learns to maximize the cumulative reward signal it receives. This is how human preferences are integrated into the system.

The underlying goal is to get a model that takes in a sequence of text, and returns a scalar reward which should numerically represent the human preference. The system can be an end-to-end LM, or a modular system outputting a reward (e.g. a model ranks outputs, and the ranking is converted to reward). The output being a scalar reward is crucial for existing RL algorithms being integrated seamlessly later in the RLHF process.

Human annotators are used to rank the generated text outputs from the LM. One may initially think that humans should apply a scalar score directly to each piece of text in order to generate a reward model, but this is difficult to do in practice. The differing values of humans cause these scores to be uncalibrated and noisy. Instead, rankings are used to compare the outputs of multiple models and create a much better regularized dataset.

For example:

Prompt: Generate text on mountain

Output A: Landform that rises prominently above its surroundings, generally exhibiting steep slopes, a relatively confined summit area, and considerable local relief

Output B: A mountain is an elevated portion of the Earth’s crust, generally with steep sides that show significant exposed bedrock. Although definitions vary, a mountain may differ from a plateau in having a limited summit area, and is usually higher than a hill.

Output C: A mountain is an elevated portion of the Earth’s crust, generally with steep sides that show significant exposed bedrock.

An input prompt and several model outputs are sampled, a human labeler ranks the outputs from best to worst, and this ranking data is used to train the reward model.
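One way to turn such a ranking into a training signal is to split it into pairs and penalize the reward model whenever the preferred output does not score higher. The sketch below uses a commonly used pairwise loss, -log(sigmoid(r_preferred - r_rejected)); the text above does not say which loss is used, so treat it as an assumption. The ranking and the reward scores are made up.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    # A ranking from best to worst yields one (preferred, rejected) pair per combination.
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(reward_preferred: float, reward_rejected: float) -> float:
    # -log(sigmoid(r_preferred - r_rejected)): small when the preferred output scores higher.
    return math.log(1.0 + math.exp(-(reward_preferred - reward_rejected)))


ranking = ["Output B", "Output A", "Output C"]          # hypothetical labeler ranking
pairs = ranking_to_pairs(ranking)
toy_rewards = {"Output A": 0.4, "Output B": 1.2, "Output C": -0.3}   # made-up scores
mean_loss = sum(pairwise_loss(toy_rewards[win], toy_rewards[lose])
                for win, lose in pairs) / len(pairs)
print(pairs)
print(round(mean_loss, 3))
```

Training the reward model would mean adjusting its parameters so that this mean loss goes down across many ranked prompts.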

Proximal Policy Optimization

PPO helps the model decide what is a good response and what is not. The reward model passes reward scores to PPO, which updates the policy function in small, proximal steps so that it gets better at choosing the best response, guided by the advantage function.

The advantage function measures how much better a response is than the expected (average) response.

PPO is a trust region optimization algorithm that uses constraints on the gradient to ensure the update step does not destabilize the learning process.
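A common way PPO keeps each update small is the clipped surrogate objective, sketched below for a single sample. The text above does not say which PPO variant is used, so this is an illustration under that assumption; the value 0.2 for epsilon is a typical choice, not a figure taken from this article.

```python
def clipped_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """PPO clipped objective for one sample.

    ratio is pi_new(action) / pi_old(action); epsilon bounds how far it may move.
    """
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


print(clipped_surrogate(1.5, advantage=1.0))    # ratio too high for a good response
print(clipped_surrogate(0.5, advantage=-1.0))   # ratio too low for a bad response
print(clipped_surrogate(1.05, advantage=1.0))   # inside the range: unchanged
```

Once the ratio leaves the range [1 - epsilon, 1 + epsilon] in the direction the advantage favors, the objective stops growing, so the policy gains nothing from moving further in one step.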

But this isn't enough: over-optimizing against the reward model can make the policy diverge, so the KL divergence is checked before obtaining the final output model.

KL Divergence

The Kullback–Leibler (KL) divergence measures the difference between two probability distributions. Put simply, it tells how much information is lost when one distribution is used to approximate the other.
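For two discrete distributions P and Q, the KL divergence is the sum over all outcomes i of P(i) * log(P(i) / Q(i)). A minimal sketch with made-up distributions:

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions given as lists of probabilities."""
    # Terms with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))            # identical distributions
print(round(kl_divergence(p, q), 4))
print(round(kl_divergence(q, p), 4))  # KL is not symmetric
```

The divergence is zero only when the two distributions match, and swapping the arguments generally gives a different value.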

In addition, per-token probability distributions from the RL policy are compared to the ones from the initial model to compute a penalty on the difference between them. This penalty has been designed as a scaled version of the KL divergence between these sequences of distributions over tokens. The KL divergence term penalizes the RL policy from moving substantially away from the initial pretrained model with each training batch, which can be useful to make sure the model outputs reasonably coherent text snippets. Without this penalty the optimization can start to generate text that is gibberish but fools the reward model to give a high reward.

Let’s formulate this now:

The reward function is where the system combines all of the models we have discussed into one RLHF process. Given a prompt, x, from the dataset, the text y is generated by the current iteration of the fine-tuned policy. Concatenated with the original prompt, that text is passed to the reward model, which returns a scalar notion of preferability, r_θ.

In practice, the KL divergence r_KL is approximated via sampling from both distributions.

The final reward sent to the RL update rule is r = r_θ − λ r_KL.
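The final reward formula above can be sketched in a few lines. The sample-based KL estimate used here (the sum over generated tokens of the difference in log-probabilities between the RL policy and the initial model) is one common approximation and is an assumption of this sketch, as are the log-probabilities, the reward score, and the value of lambda.

```python
def kl_penalty(policy_logprobs, reference_logprobs):
    # Sample-based estimate: sum over generated tokens of log pi_RL(token) - log pi_init(token).
    return sum(lp - ref for lp, ref in zip(policy_logprobs, reference_logprobs))


def final_reward(reward_model_score, policy_logprobs, reference_logprobs, lam=0.1):
    # r = r_theta - lambda * r_KL
    return reward_model_score - lam * kl_penalty(policy_logprobs, reference_logprobs)


# Made-up per-token log-probabilities for one generated response.
policy_logprobs = [-0.2, -0.5, -0.1]
reference_logprobs = [-0.4, -0.9, -0.3]
print(round(final_reward(1.5, policy_logprobs, reference_logprobs), 3))
```

When the policy assigns the same probabilities as the initial model, the penalty is zero and the final reward equals the reward model's score.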

Output of the Model:

The final output obtained after these stages is the ChatGPT model, trained and tuned across multiple parameters and factors.

Conclusion

RLHF's most recent success was its use in ChatGPT. This is how ChatGPT makes use of RLHF to generate the desired output!

Family of AI

Bhavana Gollapudi — Thu, 20 Jul 2023 14:35:24 GMT

We often hear that AI is everything and everywhere!

What actually is it?

AI is a family tree of techniques that can do tasks without human intervention; it is the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.

What are its types?

The artificial intelligence family has various children:

Reactive Machines:

AI that acts on well-trained instructions, such as self-driving cars.

Generative AI:

This breed is trained on a large amount of data and is designed to generate new content as its primary output, such as predicting the next word in a sentence, chatbots, and ChatGPT.

Limited Memory:

This type works on past data, for tasks like forecasting the weather.

Theory of Mind:

AI that is trained to behave the way humans do, such as virtual assistants and conversational bots.

Narrow AI:

This sort of AI generates customized product suggestions, as we notice in our day-to-day routine.

Supervised Learning:

This type deals with identifying objects, classifying them, and working with the distribution of data.

Unsupervised Learning:

This type focuses on detecting abnormalities and clustering data into similar and dissimilar groups.

Reinforcement Learning:

This type learns from feedback, including human feedback, and adjusts its behavior accordingly, such as when teaching a machine how to play chess.

Conclusion

Although AI has various breeds, its core purpose is to minimize human effort and produce effective responses to the inputs provided, so those inputs and their conditions should be clear.