GenAI armchair quarterbacks don’t know anything

Gib Bassett
4 min read · May 9, 2024


The title of this post may come off as a bit harsh, but let me explain. Over my 20-plus years in the analytics space, nothing has birthed a swarm of so-called experts like generative AI. While there are exceptions, like Sam Altman from OpenAI, many proclaim expertise without genuine experience.

From a marketer’s perspective — yes, that’s my day job — I started exploring ChatGPT in early 2023. It quickly became an intriguing tool that complemented my work at Alteryx: generating use case ideas and even attempting to edit the XML of Alteryx project files to reflect new use cases. It showed how instantly these tools could transform work processes.

Driven by curiosity about how it all works, I delved deeper. Despite hearing about pilot projects, few generative AI initiatives seem to advance beyond the trial phase. Meanwhile, pundits galore, who have never managed such projects, offer advice on deploying generative AI solutions that leverage Large Language Models (LLMs). The discourse is often high-level and oversimplifies specifics. The reality I found is that this domain — today — truly belongs to data science experts and others who understand the particulars of LLMs and are able to direct initial forays with the least risk.

Say what you will from your seat, but I would not stand behind a production system before fully understanding how it works, where it is likely to fail, and what must be watched closely for tuning.

So, I embarked on creating a chatbot. Remember, I’m not a technical solutions engineer or a data scientist. My background is in marketing: understanding and applying the best practices of analytics leaders to help everyone achieve outstanding results.

What I discovered first was intuitive: chatbots and LLMs are distinct. Chatbots were around long before ChatGPT, and while their experiences might have been subpar, they could recognize natural language inputs and provide contextually appropriate responses — a foundation many companies had already established.

Google’s Dialogflow CX, a chatbot platform that interoperates with other cloud services (BigQuery for data storage, and webhooks for wiring those services to a chat “agent”), demonstrated how relatively easy it is to set up the framework for a chatbot.
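To make the webhook piece concrete: a Dialogflow CX webhook is just an HTTPS endpoint that receives a JSON request and returns a JSON body in the WebhookResponse shape, whose nested messages list carries the text shown to the user. This is a minimal sketch (not the author's actual code) of building that response body in Python:

```python
def build_fulfillment_response(reply_text):
    """Build the JSON body a Dialogflow CX webhook returns.

    Dialogflow CX reads the reply from
    fulfillmentResponse.messages[].text.text[] in the webhook's
    JSON response; everything else here is optional.
    """
    return {
        "fulfillmentResponse": {
            "messages": [
                {"text": {"text": [reply_text]}}
            ]
        }
    }
```

In a real deployment this dictionary would be serialized and returned by whatever HTTP framework hosts the webhook (Cloud Functions, Flask, etc.), with the reply text typically looked up from a backing store such as BigQuery.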

Using these tools, I built the chatbot with ChatGPT assisting on the coding; this was before ChatGPT supported chatting directly about code. It actually worked, though it took some trial and error. My chatbot interface was simply an HTML page, and setting up “intents”, which define the questions the chatbot should respond to, was straightforward.

However, the system struggled to interpret a wider range of questions about the underlying data (actors’ profiles, filmographies, and backgrounds, sourced from the web). This led me to integrate an LLM via a Llama 2 API license from Meta, to enhance the chatbot’s ability to provide detailed information about actors however users phrased their questions.

The concept of “prompt engineering” became crucial here. It’s not immediately obvious how to frame questions to an LLM to elicit the expected responses. The variations seemed endless, and the responses varied significantly and were often inaccurate. This was hallucination in its rawest form. Questions arose: what’s the right balance between answering from my data and answering from the data the LLM was trained on? Good questions, right?
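One common prompt-engineering pattern for keeping answers anchored to your own data, rather than whatever the model absorbed in training, is to wrap the user's question in an instruction block that supplies the relevant context and tells the model to answer only from it. A hedged sketch of that pattern (the wording is illustrative, not a recipe the author used):

```python
def build_grounded_prompt(question, context):
    """Frame a question so the model answers from supplied data.

    The instruction asks the model to use ONLY the provided context
    and to admit when the answer is not there, which is one widely
    used tactic for reducing hallucination.
    """
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The `context` string here would be the actor profile or filmography pulled from the database; how well any given model obeys the instruction still has to be tested empirically, which is exactly the trial and error described above.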

How did I do it? With ChatGPT directing the Python code, which I ran from the console on my Windows PC, I submitted various prompts (questions) to Llama 2 and got back immediate answers. That is how I landed on the code I needed to insert into my Dialogflow CX webhook.

Results were disappointing until I discovered that Meta offers a separate chatbot API to structure these interactions more accurately. Much like Dialogflow CX, its chatbot intent logic is separate from the LLM, and there was less risk of hallucination; but the scope of the chatbot’s “intent settings” was too limited to achieve the desired accuracy.

Ideally, I would have integrated this enhanced capability into my Dialogflow CX chatbot: a user query would have been processed through the Llama 2 API, retrieving information and presenting a natural language response to the user. While the cost was minimal during testing, the financial implications of a high-volume production system calling the API are worth considering.

Are there more roads to generative AI success than this? Of course, and they are improving every day. By sharing this experience, I wanted to highlight the absurdity of people laying out prescriptions who have no clue how these solutions actually get built. There are pitfalls aplenty, and change is a constant in this space. I have no doubt that, for many, failures will far outnumber successes.

The takeaway here is you can’t oversimplify what appears to be so simple at first glance and apply it to your business or situation. You must become a student of this stuff lest you make mistakes and look foolish.
