ChatGPT Ate My Side Project!

Jon White
6 min read · Jul 6, 2023

--

TL;DR: A single “native” ChatGPT prompt replaced the core functionality of a Python application that used mostly NLP-based pattern-matching techniques to annotate sermon transcripts with Bible verses and return time-based deep links to the original YouTube video.

Developers have some serious refactoring to do. I’m no different from the majority of programmers who currently use LLMs to generate, check, and debug code, but I’m experimenting with going a step further. I‘m curious about “native” prompting, which intentionally structures prompts to use the transformer-based architecture as a processing engine to solve problems. With a “native” prompt, an LLM user can pass JSON in alongside the prompt, and the returned text will be well-formed JSON. In other words, the LLM functions like a web service, but without the backend code.

The Example Problem

Some people want to hear a preacher teach on a specific Bible verse they are studying, or simply to hear different perspectives on that verse from a variety of speakers. YouTube search doesn’t index Bible verse mentions inside videos, so if you search “John 3:16”, YouTube will only return videos with that string (or close variations) in the title or description metadata. There are hundreds of thousands of video sermons on YouTube, but no easy way to query those sermons by a specific Bible verse. A practical application of this functionality would be a mobile app that lets all members of a congregation read a Bible plan together, with links to their pastor teaching on each verse.

The Pre-GPT Solution

To create a comprehensive index of Bible verses and their corresponding video mentions, I wrote a Python script that processes a YouTube transcript line by line. Each line is compared against a database of valid Bible books and verses, along with the different variations of how a verse may be referenced by the speaker, e.g. “Let’s turn to verse 16 of the 3rd chapter in the Gospel of John” == John 3:16. The script currently accounts for ~30 different variations. It works like a regex on steroids that can find these variations across a wide variety of mentions. Once a hit is encountered, the verse is recorded along with a deep link to the time in the YouTube video where it is mentioned. From there, a downstream process indexes the verses and allows for efficient searching by verse (initial indexing done in Redis). Here is an example output CSV of the data that is indexed.
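To give a feel for the approach, here is a heavily simplified sketch of the rule-based pass. It only matches the literal “Book Chapter:Verse” form against a tiny hard-coded book list (the real script handles ~30 spoken variations and the full canon via the pythonbible library); the function and variable names are mine, not the original script’s.

```python
import re

# Tiny stand-in for the real book database (hypothetical, abbreviated list).
BOOKS = ["Genesis", "Psalms", "John", "Romans"]
PATTERN = re.compile(rf"\b({'|'.join(BOOKS)})\s+(\d+):(\d+)\b")

def annotate(lines, video_id):
    """Scan (start_seconds, text) transcript lines for verse mentions and
    record each hit with a timestamped YouTube deep link."""
    hits = []
    for start_seconds, text in lines:
        for book, chapter, verse in PATTERN.findall(text):
            hits.append({
                "verse": f"{book} {chapter}:{verse}",
                "link": f"https://youtu.be/{video_id}?t={start_seconds}",
            })
    return hits
```

A call like `annotate([(120, "Let's read John 3:16 together")], "abc123")` yields one hit linking to `https://youtu.be/abc123?t=120` — and it is exactly this kind of pattern that balloons into ~250 lines once spoken variations are covered.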

The Good: The script returns the most important verses, doesn’t miss the obvious and it is fast… by Pythonic standards ;-)

The Bad: Code size: ~250 lines of code, not counting the imported words2num, num2words, and pythonbible libraries. And, as with any rule-based system, it does not handle edge cases well.

The Ugly: Regexes, pattern matching, hard coded Bible book names, custom enumerations and other brute force techniques are implemented.

The Hideous: spaCy’s Matcher, or even an earlier LLM, probably would have been a much more elegant and better-performing solution than implementing all of this from scratch.

Post-GPT Solution

After going through a prompt engineering course, I got the bright (or crazy) idea to create a well-crafted prompt that natively (without generating intermediate code) parses a JSON transcript and returns a JSON list of all annotated verses in that transcript, each with a YouTube deep link. The prompt was crafted to be descriptive and detailed, establish an expert persona, provide examples, and specify constraints.

The Prompt

Sample code with prompt and text-davinci api call
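For readers without the screenshot, the shape of the code looks roughly like the sketch below. The prompt wording is illustrative, not the exact prompt from the article, but it follows the same recipe: expert persona, explicit constraints, JSON in / JSON out, with the transcript substituted into an `{input_json}` slot.

```python
import json

# Illustrative template — the author's real prompt is longer and more detailed.
PROMPT_TEMPLATE = """You are an expert Bible scholar annotating sermon transcripts.
Given the transcript lines below as a JSON list, return ONLY a well-formed JSON
list of objects with keys "verse", "mention_type", and "link", where "link" is
a time-based deep link into the YouTube video.

Transcript:
{input_json}
"""

def build_prompt(transcript_lines, video_id):
    """Render the templated prompt from (start_seconds, text) lines."""
    payload = [
        {"t": t, "text": text, "video_id": video_id}
        for t, text in transcript_lines
    ]
    return PROMPT_TEMPLATE.format(input_json=json.dumps(payload))

def annotate_with_llm(client, transcript_lines, video_id):
    """`client` is the openai module (pre-1.0 SDK, as used with text-davinci-003)."""
    resp = client.Completion.create(
        model="text-davinci-003",
        prompt=build_prompt(transcript_lines, video_id),
        temperature=0,
        max_tokens=1024,
    )
    return json.loads(resp["choices"][0]["text"])
```

All the verse-recognition logic lives in the prompt; the Python around it is just templating and JSON parsing.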

YouTube Video Processed as Example

{input_json}

The Output JSON returned from text-davinci-003 WITHOUT additional post processing

Verses Returned from Both Solutions

text-davinci-003 finds significantly more verses with minimal false positives

A Couple More Output Examples

Excellent annotation that picks up on a mention of the actual verse in the transcript
This is a bad false positive, but it carries a “paraphrase” mention_type, which can be filtered out.

Also of note: even though I specified that I only wanted correctly formed JSON, the model did return malformed JSON a couple of times. I just caught the exception and continued. More prompt refinement may correct this.
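The catch-and-continue handling amounts to a few lines; this is a minimal sketch (function name is mine) of failing soft so a bad chunk can simply be skipped:

```python
import json

def parse_llm_json(raw_text):
    """Parse the model's response, returning None instead of raising when
    the JSON comes back malformed, so the caller can skip that chunk."""
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        return None
```

A sturdier version might retry the chunk or attempt a repair pass before giving up.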

The Awesome: There was no need to augment any logic with patterns or Bible libraries. The only custom code is a json.loads() of the response from OpenAI into a dictionary for printing. The prompt does all the work!

The Good: The code, including the OpenAI API call and the templated prompt, is about 75 lines, and it annotates more verses in a wider variety of cases. For example, when a verse has already been mentioned and a later line says “let’s go back to verse one”, the model knows it refers to the previously mentioned book. It can also annotate paraphrases of verses as well as direct mentions.

The Bad: It’s slooooow…. You also have to chunk the requests because of token size limits, and each chunk can take minutes to process. In some rare cases the JSON comes back malformed; in those cases I throw the results away. This could probably be fixed by adding logic to detect and repair the JSON, or by strengthening the prompt to prevent it from happening.
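The chunking itself can be done crudely by character count, as in this sketch (the 6,000-character limit is a made-up stand-in — a real implementation would count tokens, e.g. with tiktoken):

```python
def chunk_lines(lines, max_chars=6000):
    """Greedily group transcript lines into chunks that stay under a rough
    character budget, as a cheap proxy for the model's token limit."""
    chunks, current, size = [], [], 0
    for line in lines:
        if size + len(line) > max_chars and current:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then becomes one prompt/completion round trip, which is exactly where the minutes-per-chunk latency adds up.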

Initial Conclusion

While this method is too inefficient and expensive to run at scale (I currently have tens of thousands of sermon transcripts), it was a fun experiment. I like the concept of native prompting in certain situations. One unfair advantage this use case has is that text-davinci-003 clearly has a wealth of Biblical knowledge and a comprehensive Biblical index across all mainline versions. In cases where false positives are not critical and mildly inconsistent results are acceptable, this could be useful. Trying to get a prompt to do all the work also helps in discovering the limits of LLMs. Because of all the options available in prompting, especially the ability to define your own markup/template, I plan to continue attempting to formulate single prompts that solve complete programming tasks. Shoot me a message if you thought this was awesome, or an extreme waste of time, or… both. Till next time ;-)


Jon White

Jon is a principal architect at Red Alpha LLC who enjoys learning and sharing about technology and engineering.