TrendsGPT: How to set up an AI agent for automated Data Analysis and Market Research

Published in Bootcamp

9 min read · Nov 1, 2023

July’s entry in my year-long experiment of using GenAI for hobby-related activities: using AI to automate the market research behind a product for an Etsy storefront, working from up-to-date data.

Can I create an AI agent that automatically performs data analysis and market research and gives me starter ideas for a product I want to design?

Can this agent follow a multi-step process, looking at different pieces of information and making decisions based on qualitative and quantitative data?

Can this agent run this process automatically on a schedule of your choosing and return the results in a consumable format?

I would call this project a success. See the entire code here:

theailifestyle/trendsGPT: 📈 TrendsGPT: An AI-powered tool harnessing OpenAI’s GPT-4 to automate market research and data analysis. Fetches trending topics from Reddit, analyzes sentiment, extracts insights, fact checks on google trends and recommends creative content. (github.com)

Let’s dive into the results first.

Check out the store for yourself: HauteTopicByAI on Etsy.

This was the result of running the AI agent successfully for the experiment above.

The product: a T-shirt design to sell on our Etsy store.
Market research: finding topics that are popular and appropriate for a T-shirt design.
Data analysis: finding relevant topics and quantitatively determining the best idea for the design.
Ideation: getting some starter ideas, then iterating on and developing the design.
Publishing: releasing the design on our Etsy store.

The genesis of my experiment:

Let’s start by laying out the steps I took to do the above manually.
Over the last few weeks, my feeds (Google News, Reddit, and other news sources) have been buzzing with news about Taylor Swift. I follow the topic closely enough to recognize it as popular, but I should back that up with some quantitative data. Let’s start with Taylor Swift and take a look at her Google Trends data.

The Google Trends data showed that much of the interest was concentrated in Kansas. A quick internet search reveals why: the news around Taylor Swift and Travis Kelce. Travis Kelce plays for the Kansas City Chiefs, and red is the team's signature color. With this in mind, and after checking what designs already existed around the topic, I designed and published the Traylor shirt and the 1387 shirt.
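The manual check above, spotting that interest is concentrated in one state, can be expressed as a small function. This is a hypothetical helper, not part of the TrendsGPT repo: it assumes the regional scores have already been pulled into a plain dict (for example, one column of pytrends' `interest_by_region()` output), and it flags regions whose score is well above the average.

```python
from statistics import mean


def concentrated_regions(region_scores: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return regions whose interest is at least `factor` times the mean, highest first.

    Hypothetical helper: `region_scores` maps a region name to a Google Trends
    interest score (0-100).
    """
    if not region_scores:
        return []
    threshold = factor * mean(region_scores.values())
    hot = [region for region, score in region_scores.items() if score >= threshold]
    # Sort the flagged regions by score, highest first
    return sorted(hot, key=lambda region: region_scores[region], reverse=True)


# Illustrative numbers only, not real Trends data
scores = {"Kansas": 100, "Missouri": 62, "Texas": 21, "Ohio": 18, "Oregon": 15}
print(concentrated_regions(scores))  # ['Kansas']
```

Lowering `factor` widens the net; with real data you would likely tune it by eye.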

Now let’s take these steps, automate them, and build an AI agent.

Market Research:

Market research has often been compared to constructing a puzzle, where every piece represents an insight into consumer behavior, industry trends, and competitive landscape. Traditional methods of piecing this puzzle together involve data scientists sifting through mounds of data, analysts laboring over spreadsheets, and the inevitable hours spent in meeting rooms discussing strategies based on this research.

However, in our ever-evolving digital age, we face a new challenge: the sheer velocity, variety, and volume of available information. As data grows, the need for a streamlined, efficient, and comprehensive approach to market research becomes paramount. This is where AI, specifically large language models (LLMs) like OpenAI’s GPT-4, comes into the spotlight. But there’s a catch!

LLMs are incredibly powerful. Their natural language understanding and generation capabilities can churn out human-like text, understand context, and even offer creative solutions. Yet, they also have limitations.

The two biggest limitations so far are:

Lack of recent, up-to-date data.
Instances of hallucination.

Our AI agent will try to address both of these limitations.

The Need for a Multi-Step Approach

Addressing these limitations requires a multi-step approach.

Gathering Up-to-date data:


First, we get recent, up-to-date data by using the Reddit API (through the PRAW library) to fetch the latest posts from a subreddit of our choice.

import praw

# --- Set up API keys and the Reddit client ---
CLIENT_ID = 'Xg_XXXXXXXXXX'  # use your own client ID
CLIENT_SECRET = 'XXXXXXXXX'  # use your own client secret
USER_AGENT = 'trending_topic_fetcher'

reddit = praw.Reddit(client_id=CLIENT_ID, client_secret=CLIENT_SECRET, user_agent=USER_AGENT)

# Fetch the week's top posts from r/news (raise `limit` to process more headlines)
top_week_posts = reddit.subreddit('news').top(time_filter='week', limit=1)

# Filled later with the headlines and URLs that pass the sentiment filter
memeworthy_headlines = []
memeworthy_urls = []

This snippet covers the first step: sourcing data from Reddit so that the LLM works with up-to-date information.

To set up your own Reddit API credentials, follow the link in the README.md.
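Rather than hard-coding the client ID and secret as in the snippet above, one option is to read them from environment variables so they never end up in version control. This is a minimal sketch; the variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, and `REDDIT_USER_AGENT` are assumptions chosen for illustration, not names the project or PRAW requires.

```python
import os


def load_reddit_credentials(env=None):
    """Read Reddit API credentials from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    Raises a clear error instead of letting the API call fail later.
    """
    env = os.environ if env is None else env
    creds = {
        "client_id": env.get("REDDIT_CLIENT_ID", ""),
        "client_secret": env.get("REDDIT_CLIENT_SECRET", ""),
        "user_agent": env.get("REDDIT_USER_AGENT", "trending_topic_fetcher"),
    }
    missing = [key for key in ("client_id", "client_secret") if not creds[key]]
    if missing:
        raise RuntimeError(f"Missing Reddit credentials: {', '.join(missing)}")
    return creds


# Usage with PRAW: reddit = praw.Reddit(**load_reddit_credentials())
```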

Sentiment Analysis: The Filtering Mechanism

Every piece of content has an emotional undertone. Sentiment analysis lets us keep only the content that fits our objective, which here is an upbeat T-shirt design.

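import openai  # legacy openai<1.0 SDK; openai.Completion was removed in openai>=1.0


def get_sentiment(headline):
    # Ask the model to label the headline relative to the goal of an upbeat T-shirt design
    response = openai.Completion.create(
        engine="text-davinci-003",  # legacy completion model; it has since been deprecated
        prompt=f"What is the sentiment of this headline? Positive, neutral, or negative? I am trying to make an upbeat t-shirt design; positive is anything that fits the objective. If you are unsure, put it under neutral, and if you feel it is not a good fit, make it negative: \"{headline}\"",
        max_tokens=1000
    )
    # The answer is free text, e.g. "Positive" -> "positive"
    sentiment = response.choices[0].text.strip().lower()
    return sentiment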

This function uses an OpenAI completion model (text-davinci-003 in this snippet, rather than GPT-4) to gauge the sentiment of a given headline.
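The model's answer is free text, so it can come back as "Positive", "neutral." (note the trailing period in the NBA sample run later in this article), or a full sentence. The sketch below is a hypothetical way to normalize that answer and then fill `memeworthy_headlines` and `memeworthy_urls` with only the positive posts. It takes the classifier as a parameter so it can be tried without calling OpenAI; in the pipeline you would pass `get_sentiment`, and the `(title, url)` pairs would come from PRAW submissions (`post.title`, `post.url`). It simply picks the first label word it finds, so an answer like "not positive" would still be misread.

```python
import re


def normalize_sentiment(raw: str) -> str:
    """Map free-text model output to one label; default to 'neutral' if unclear."""
    match = re.search(r"\b(positive|neutral|negative)\b", raw.lower())
    return match.group(1) if match else "neutral"


def filter_positive(posts, classify):
    """Keep only (title, url) pairs whose headline is classified as positive.

    `classify` is any function returning the model's raw answer for a headline,
    e.g. get_sentiment. Returns (headlines, urls) lists.
    """
    headlines, urls = [], []
    for title, url in posts:
        sentiment = normalize_sentiment(classify(title))
        print(f"Headline: {title} | Sentiment: {sentiment}")
        if sentiment == "positive":
            headlines.append(title)
            urls.append(url)
    return headlines, urls


# Usage in the pipeline (requires the Reddit client from earlier):
# posts = ((post.title, post.url) for post in top_week_posts)
# memeworthy_headlines, memeworthy_urls = filter_positive(posts, get_sentiment)
```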

Data Extraction through web scraping

Once the topics are categorized, we can use tools like BeautifulSoup to scrape the article or subject in question. The LLM can then do powerful comprehension on the scraped data.

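import requests
from bs4 import BeautifulSoup


def fetch_article_content(url, headline):
    """Scrape the paragraph text of an article, falling back to the headline on failure."""
    try:
        # Browser-like User-Agent, since some news sites block the default requests UA
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
        page = requests.get(url, headers=headers, timeout=10)
        page.raise_for_status()  # treat 4xx/5xx responses as errors too
        soup = BeautifulSoup(page.content, 'html.parser')
        # Join the text of every <p> tag into one string
        article_text = ' '.join(p.get_text() for p in soup.find_all('p'))
        return article_text or headline  # fall back if the page had no paragraphs
    except requests.RequestException:
        print(f"Error fetching article for URL: {url}. Using headline as the content.")
        return headline  # return the headline if there's an error fetching the article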

This function should give reasonably robust results for most article URLs, falling back to the headline when a request fails.
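Scraped articles can be long, and the full text is later sent in prompts that also reserve up to 2,000 tokens for the answer, while text-davinci-003's context window is 4,097 tokens in total. A simple safeguard is to trim the text before building the prompt. This sketch is an assumption, not part of the original code: it uses a rough rule of thumb of about four characters per token for English text instead of a real tokenizer (a tokenizer such as tiktoken would be more accurate), and it cuts at a word boundary.

```python
def truncate_for_prompt(text: str, max_tokens: int = 1500, chars_per_token: int = 4) -> str:
    """Trim text to an approximate token budget, cutting at a word boundary.

    Approximation only: assumes ~4 characters per token for English text.
    """
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Back up to the last space so a word is not split in half
    last_space = cut.rfind(" ")
    if last_space > 0:
        cut = cut[:last_space]
    return cut.rstrip()


# Usage: article_content = truncate_for_prompt(fetch_article_content(url, headline))
```

The default budget of 1,500 tokens leaves room for the 2,000-token answer plus the prompt's own instructions.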

Diving Deep with Data Analysis

Once our data source is established and filtered, we engage the LLMs as our primary data analysts.

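import openai  # legacy openai<1.0 SDK (openai.Completion)
from pytrends.request import TrendReq


def get_keywords_from_article(article):
    """Ask the model for five comma-separated keywords describing the article."""
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=f"We are identifying keywords from the below news article. Extract five keywords from the below article in the format of Word1,word2,word3,word4,word5 \n\narticle:{article}",
        max_tokens=2000
    )
    # The model returns free text such as "Biden, housing, developers, transit, funding"
    text_cleaned = response.choices[0].text.replace("\n", "").strip()
    # Split on commas, trim whitespace, and drop empty entries (e.g. from a trailing comma)
    keywords = [kw.strip() for kw in text_cleaned.split(',') if kw.strip()]
    return keywords


def check_google_trends(keyword):
    """Return the mean Google Trends interest for a keyword across US states (0 on error)."""
    pytrends = TrendReq(hl='en-US', tz=360)
    keyword = keyword.strip()
    kw_list = [keyword]
    try:
        # geo='US' with resolution='REGION' returns one row per US state
        pytrends.build_payload(kw_list, geo='US')
        data_region = pytrends.interest_by_region(resolution='REGION')
        mean_region_scores = data_region.mean()
        return mean_region_scores[keyword]
    except Exception:
        # Any failure, including Google rate limiting, is scored as 0
        return 0


# Extract keywords (article_content comes from fetch_article_content)
keywords = get_keywords_from_article(article_content)

# Get keyword scores
keyword_scores = [f"{kw}: {check_google_trends(kw)}" for kw in keywords]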

Here, the code illustrates how we extract crucial keywords from the articles and cross-reference their popularity using Google Trends.

We take the mean interest across all US states and associate each keyword with that score.

These scored keywords are then used to generate the ideas.
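The trend scores reach the prompt as one string. A small, hypothetical refinement (not in the original code) is to drop empty or zero-scored keywords and sort the rest by score, so the model sees the strongest signals first. In this pipeline a score of 0 can also mean the Trends request failed, so dropping zeros is a judgment call, hence the `drop_zero` switch.

```python
def rank_keyword_scores(scores: dict[str, float], drop_zero: bool = True) -> str:
    """Format keyword scores as 'kw: score' pairs, highest score first.

    `scores` maps keyword -> mean Google Trends interest, e.g. values returned
    by check_google_trends. Empty keywords are always removed.
    """
    items = [(kw.strip(), s) for kw, s in scores.items() if kw.strip()]
    if drop_zero:
        items = [(kw, s) for kw, s in items if s > 0]
    items.sort(key=lambda pair: pair[1], reverse=True)
    return ", ".join(f"{kw}: {s:g}" for kw, s in items)


# Scores taken from the sample run later in this article
print(rank_keyword_scores({"Biden": 0, "housing": 3.356, "developers": 3.432,
                           "transit": 3.124, "funding": 0, "": 0}))
# developers: 3.432, housing: 3.356, transit: 3.124
```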

Translating Analysis into Actionable Insights

One of TrendsGPT’s standout features is its ability to turn data insights into tangible business ideas, or ideas of any kind.

Here is how the prompt is built with the keyword data and article text included:

import openai  # legacy openai<1.0 SDK


def get_tshirt_ideas(keywords, article_text):
    # `keywords` is a string of "keyword: trend score" pairs, e.g. "housing: 3.356, developers: 3.432"
    prompt = f"Using the keyword data which has the trending score next to the keyword: {keywords} and the article text: \"{article_text}\", suggest 2 memeworthy t-shirt ideas and also suggest an appropriate picture to go with it"

    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=1500
    )
    idea = response.choices[0].text.strip()
    return idea


# Generate T-shirt ideas (keyword_data_str and article_content come from the earlier steps)
tshirt_idea = get_tshirt_ideas(keyword_data_str, article_content)

This segment turns the market research into potential T-shirt design ideas, showing the agent's ability to run a full data analysis and market research pass on real data and convert it into actionable results.
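The model returns its ideas as one block of free text, and the labels vary between runs ("T-shirt Idea 1:" in one run, "T-SHIRT 1:" in another, as the sample output later in this article shows). If you want one idea per CSV cell rather than a single blob, one hedged option is to split on those labels with a regular expression. This helper is hypothetical and only handles the label patterns seen in the sample output.

```python
import re

# Matches labels like "T-shirt Idea 1:" or "T-SHIRT 2:" (case-insensitive)
IDEA_LABEL = re.compile(r"t-shirt(?:\s+idea)?\s*\d+\s*:", re.IGNORECASE)


def split_ideas(text: str) -> list[str]:
    """Split the model's answer into individual ideas, dropping the labels."""
    parts = IDEA_LABEL.split(text)
    # parts[0] is any preamble before the first label; keep only labelled ideas
    ideas = [part.strip() for part in parts[1:] if part.strip()]
    # If no labels were found, treat the whole answer as a single idea
    return ideas or ([text.strip()] if text.strip() else [])
```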

Finally, build it out and output the results to a CSV file, or send them as an email using step functions.

Loop through each article and repeat the steps above.

import time

import pandas as pd

# One row per processed article
df = pd.DataFrame(columns=['headline', 'url', 'summary', 'keywords', 'keyword_scores', 'tshirt_idea'])

# Loop through each memeworthy (positive-sentiment) article
for headline, url in zip(memeworthy_headlines, memeworthy_urls):
    # Fetch article content (falls back to the headline on errors)
    article_content = fetch_article_content(url, headline)

    # Get article summary (get_article_summary is not shown in this article)
    article_summary = get_article_summary(article_content)

    # Extract keywords
    keywords = get_keywords_from_article(article_content)

    # Score each keyword with Google Trends and join the results into one string for the prompt
    keyword_scores = [f"{kw}: {check_google_trends(kw)}" for kw in keywords]
    keyword_data_str = ', '.join(keyword_scores)

    # Generate T-shirt ideas
    tshirt_idea = get_tshirt_ideas(keyword_data_str, article_content)

    # Append a row to the DataFrame
    df.loc[len(df)] = [headline, url, article_summary, ', '.join(keywords), keyword_data_str, tshirt_idea]

    time.sleep(2)  # Avoid hitting rate limits

# Save to CSV (or Excel if preferred)
df.to_csv('memeworthy_articles.csv', index=False)
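Besides saving a CSV, the results can also go out as an email. As a minimal sketch using only Python's standard library, the CSV can be attached to an `email.message.EmailMessage`; the subject, body text, and addresses are placeholders, and actually sending the message (for example with `smtplib`, or from a scheduled job) depends on your mail setup, so that part is only shown as a comment.

```python
from email.message import EmailMessage
from pathlib import Path


def build_results_email(csv_path: str, sender: str, recipient: str) -> EmailMessage:
    """Build an email with the results CSV attached (does not send it)."""
    path = Path(csv_path)
    msg = EmailMessage()
    msg["Subject"] = "TrendsGPT results"  # placeholder subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Attached are this run's memeworthy articles and T-shirt ideas.")
    # Attach the CSV file as text/csv
    msg.add_attachment(path.read_bytes(), maintype="text", subtype="csv", filename=path.name)
    return msg


# Sending is environment-specific, e.g.:
# with smtplib.SMTP("smtp.example.com", 587) as server:
#     server.starttls(); server.login(user, password); server.send_message(msg)
```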

The Human Element.

Here is how the output will look for a sample run.

Subreddit: News

Fetching articles from Reddit...
[]
Headline: Matthew Perry, star of ‘Friends,’ dies after apparent drowning, TMZ reports | Sentiment: negative
Headline: Disabled man drags himself off plane after Air Canada fails to offer wheelchair | Sentiment: negative
Headline: Second person to receive experimental pig heart transplant dies nearly six weeks after procedure | CNN | Sentiment: negative
Headline: Suspect at large after 2 active shooter incidents in area of Lewiston, Maine, sheriff says | Sentiment: negative
Headline: White House opens $45 billion in federal funds to developers to covert offices to homes | Sentiment: positive
Headline: Maine shooting suspect Robert Card found dead, officials say | Sentiment: neutral

Only processing positive articles
Processing article: White House opens $45 billion in federal funds to developers to covert offices to homes
Keywords
['Biden', 'housing', 'developers', 'transit', 'funding', '']
Keyword Trend Data
Biden: 0, housing: 3.356, developers: 3.432, transit: 3.124, funding: 0, : 0
: 

T-shirt Idea 1: "Making our cities great again! Biden, housing, developers & transit" with a picture of a city skyline featuring modern residential buildings adjacent to transit hubs. 

T-shirt Idea 2: "Funding solutions for affordable housing" with a picture of a financial advisor calculating the budget for housing development.

Subreddit: NBA

Fetching articles from Reddit...
[]
Headline: Wemby arriving at the arena on Halloween | Sentiment: neutral.
Headline: [Highlight] Steph Curry cooks Dillon Brooks for his 4th straight 3 | Sentiment: positive
Headline: [Wojnarowski] BREAKING: The Philadelphia 76ers have agreed on a trade to send guard James Harden to the Los Angeles Clippers, sources tell ESPN. | Sentiment: neutral
Headline: [Highlight] 🚨 Luka Doncic Are You Kidding Me? 🚨 He Has Hit 4 Straight Clutch Triples To Reach 49 Points. | Sentiment: positive
Headline: Chris Broussard asks if James Harden is “developmentally disabled” | Sentiment: negative
Headline: [Highlight] Andre Drummomd welcomes Chet to the league, by breaking his ankles and gets the dunk | Sentiment: positive

Only processing positive articles

Processing article: [Highlight] Steph Curry cooks Dillon Brooks for his 4th straight 3
Keywords
['Keywords:views', '267', '696', '00', '00']
Keyword Trend Data
Keywords:views: 0.4, 267: 11.436, 696: 5.392, 00: 0, 00: 4.38
. 

T-shirt Idea 1: "Are you seeing 267? 696 views!"
Picture: An image of a pair of eyes in the shape of the numbers 267 and 696. 

T-shirt Idea 2: "Let's hit those double zer00s!"
Picture: An image of two zeroes, side by side.
Processing article: [Highlight] 🚨 Luka Doncic Are You Kidding Me? 🚨 He Has Hit 4 Straight Clutch Triples To Reach 49 Points.
Keywords
['Luka Doncic', 'video', 'watch', 'archived', 'views']
Keyword Trend Data
Luka Doncic: 1.496, video: 0, watch: 8.836, archived: 8.12, views: 6.972
T-SHIRT 1: "LUKA DONCIC ARE YOU KIDDING ME?"  with a picture of surprise in his face

T-SHIRT 2: "198,617 VIEWS"  with a picture of him doing a celebratory dance.

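Not every idea is usable as-is. In the NBA run, for example, the keyword step picked up page residue such as 'Keywords:views', '267', and '696', which led to weak ideas, so a human still needs to review the output. By taking the stronger starter ideas and using tools like Midjourney and Canva, you can refine an idea and turn it into a final product.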

Big Steps for Automated Agents and Their Impact on Data-Driven Analysis

The fusion of AI and data analysis, as showcased in TrendsGPT, represents what could be the future of market research. Pairing a multi-step methodology with working code keeps the approach grounded in practice rather than theory.

In a world where staying ahead of the curve sets a well-researched product apart, automated data analysis tools like TrendsGPT, backed by the right code, can shorten your time to completion. They don’t just tell us where the market is; they hint at where it’s heading. And in a competitive world, that speed can make all the difference.

Originally published at https://www.linkedin.com.


Written by Sre Chakra Yeddula