Pre-Experiment 1: Setting up OpenAI Plumbing

David Greenfield
ChatGPT Experiments
8 min read · Jan 5, 2023

To prepare for running the three cover letter scenarios, I wanted to write a post on setting up the OpenAI API and training a custom model, to make that process more approachable for those who may want to run their own tests. I haven’t written much code in the last 3–4 years, so bear with me on syntax and best practices.

This post will primarily be useful for those with less technical background, as most of the basics of calling the API are standard REST and neatly packaged in an OpenAI Python library. I will include some fun examples and briefly touch on some of the hyperparameter options for fine-tuning.

Code examples used here are included in a public GitHub repo: https://github.com/dgreenfi/OpenAIResumeTest

Before Getting Started: For those with non-coding backgrounds who want to run your own tests: I remember, many years ago, painfully spending days figuring out how to get an environment configured and set up just to run a “Hello World” script. Thankfully it’s much easier these days with developer tools like PyCharm that pull all the pieces together for you and help get you going. I’ve always enjoyed PyCharm, and this link will get you going with the setup I used to create and run these examples.

Getting Started: Get Credentials

I find the OpenAI documentation to be pretty robust and easy to use given that the APIs are still in beta, so I won’t try to recreate any of it here, just point to the right spots. To get access to the APIs you’ll need to create two things:

  • Create an OpenAI account: this is pretty straightforward to do by going to https://beta.openai.com/ and selecting new account in the login prompt
  • Create an API key: go to your account profile and select View API keys, or click here.
[Image: Account menu]

Once you have your key, you should save it to a key store or local file (I use OAI.env, a blank text file you can create in PyCharm) in the format below, replacing the example numbers with your key. The Python library expects a variable named OPENAI_API_KEY, so don’t get creative and change it.

OPENAI_API_KEY=1234567

Once you have those created, you are ready for the OpenAI quickstart.

They currently give $18 in free API calls, enough for some decent testing given the pricing here: https://openai.com/api/pricing/. How did they decide on $18? We’ll never know, but let’s assume $15 would not have given you enough bandwidth to test and $20 would have immediately bankrupted OpenAI.

The most expensive model, Davinci, costs $0.02 per 1,000 tokens. Tokens are roughly word components, and you can see a more precise example here: https://beta.openai.com/tokenizer
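As a rough way to estimate token counts (and therefore cost) locally, you can use OpenAI’s tiktoken package. This is my own addition, a minimal sketch assuming the “gpt2” encoding approximates the GPT-3 tokenizer and the Davinci rate above:

import tiktoken

# the "gpt2" byte-pair encoding approximates the GPT-3 tokenizer
enc = tiktoken.get_encoding("gpt2")
tokens = enc.encode("write a poem about trying the Open AI API")
print(len(tokens))                # token count for this prompt
print(len(tokens) * 0.02 / 1000)  # rough Davinci cost in dollars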

Step 1: Run a Generic AI Prompt in Python

If you’ve never create a Python Script — I recommend starting a trial version of PyCharm and following the instructions here.

Now’s where the fun begins. One feature of the OpenAI API that I was excited to try out was the fact that the AI can actually generate code itself, meaning that in theory, you don’t have to rely completely on the documentation examples.

This is a pretty amazing benefit, but buyer beware: it quickly exposes one of the drawbacks of LLMs in coding. While most human languages rarely deprecate words, programming languages, and especially APIs in beta, frequently change allowable parameters and syntax. For instance, the example it generated for me used text-davinci-002 instead of the latest version, text-davinci-003. I also ran into some examples where the AI suggested parameters that were obsolete and rejected by the library. While a lot of people note the functionality gap on recent data, the inability to deprecate no-longer-valid training data is also a limitation.

Overall, while enticing, I would stick to the maintained documentation at https://beta.openai.com/docs/introduction where possible.

I ended up with an example that is slightly different from the auto-generated one: I stored the key in a file so I could share the code on GitHub, and I left in the PyCharm base method. If you clone my GitHub examples, you’ll need to create your own key file.

# important packages to load the key and call the API
import os
import openai
from dotenv import load_dotenv
from pathlib import Path

def example_query():
    # where my key is stored
    dotenv_path = Path('OAI.env')
    load_dotenv(dotenv_path=dotenv_path)
    openai.api_key = os.getenv('OPENAI_API_KEY')
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt="write a poem about trying the Open AI API",
        max_tokens=128,
        temperature=0.5)
    print(resp)

# Press the green button in the gutter to run the script.
if __name__ == '__main__':
    example_query()

Example Prompt:

To make sure our API connection is working and the request parameters are sufficient, we’ll run the following prompt, which should produce some entertaining and inspiring prose about API testing. Keep in mind there is probably a reason the great poets of our day don’t often choose to write about API testing.

prompt="write a poem about trying the Open AI API"

In the response, you’ll notice there is a bit more depth than if you use the ChatGPT test interface. Rather than a single response, you’ll get an array of “choices” as well as some metadata about each choice and the request. The key field is the “text” within each item in the choices array.

"text": 
"A challenge I accepted with glee,
To try the Open AI API,
With eagerness I gave it a go,
To see if I could make it flow.
I read the documentation with care,
And followed the instructions there,
To make sure I got the gist,
I read it all, line by line, and didn't miss.
I coded the script in my own way,
Making sure I knew what to say,
I tested it out, and with a grin,
I was happy with the results that I had seen.
The Open AI API was"
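If you just want the generated text rather than the full JSON, you can index into the response object. A minimal sketch, assuming resp is the response from example_query above:

# the response behaves like a dict: take the text of the first choice
completion_text = resp["choices"][0]["text"]
print(completion_text.strip())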

Step 2: Fine-Tuning a Model Using a Custom Dataset

To introduce new data in our experiments, we are going to want to customize the model with a new training set. OpenAI calls this fine-tuning: https://beta.openai.com/docs/guides/fine-tuning.

To fine-tune the model, you’ll need to create a JSONL file with prompts and completions relevant to your topic.

To get going on our cover letter example, I created hypothetical prompts and paired them with real bullets from my resume, such as:

{"prompt": "tell me about David Greenfield's key contributions while working at Veeva", "completion": " Led cross-team effort to redesign big-data, multi-partner analytics pipeline to support parallel processing, scalable cloud computing resources and improved automation. Transitioned infrastructure to support 10x growth without service interruption."}
{"prompt": "tell me about David Greenfield's key contributions while working at Veeva", "completion": " Led cross-team effort to redesign and transition over 100 clients to centralized website tag."}

OpenAI recommends at least 100 records for training; I will look a bit more into structural best practices as we get to the fine-tuning-based experiments.

Once you have that file, you can use the command line interface (CLI) to run the model training. It’s also possible to use the Python library, but given that it’s a longer-running process (mine took a couple of hours; I was not monitoring too closely) that we will run once, the CLI was easier and is also recommended by OpenAI for fine-tuning. More instructions on installation can be found here: https://beta.openai.com/docs/guides/fine-tuning

openai tools fine_tunes.prepare_data -f resume_train.jsonl
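Note that prepare_data only validates and reformats your training file (it writes out a cleaned copy, typically with a _prepared suffix). Per the fine-tuning guide, the training run itself is then kicked off with a separate command along the lines of:

openai api fine_tunes.create -t resume_train_prepared.jsonl -m davinci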

After a number of status updates you should get a custom model ID that looks like “davinci:ft-personal-2022-12-22-19-32-33”. Save this, as it’s what you’ll need to run prompts against your custom model.
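The job runs server-side, so if your terminal disconnects during those couple of hours you don’t lose the run; the CLI can re-attach to the event stream or list your jobs (the ID below is a placeholder that the create command prints for you):

openai api fine_tunes.follow -i <YOUR_FINE_TUNE_JOB_ID>
openai api fine_tunes.list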

Step 3: Running your custom model

Now you can run prompts using your custom model, swapping out only the “model” variable for your custom model ID.

def basequery():
    # load the key the same way as before
    dotenv_path = Path('OAI.env')
    load_dotenv(dotenv_path=dotenv_path)
    openai.api_key = os.getenv('OPENAI_API_KEY')
    resp = openai.Completion.create(
        model="davinci:ft-personal-2022-12-22-19-32-33",
        prompt="write a cover letter for David Greenfield's highlighting his work at Veeva and Komodo",
        max_tokens=128,
        temperature=0.5)
    print(resp)

So how did the AI do with a terrible fine-tuning data set? (I had 12 entries.) Without even bothering to compute the proposed metrics, the performance was expectedly terrible, but it was interesting to test to help understand where to adjust hyperparameters (mainly temperature) to accommodate a minimal fine-tuning set. The results were not good in terms of either language cohesiveness or factual correctness; I’m certain all of the experiments will yield better outputs. This does provide a stark example that the AI does not “understand” but models language based on its exposure, and it will struggle on topics with minimal training exposure.
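To compare settings, you can sweep the temperature parameter in a simple loop. A minimal sketch, assuming the imports and key loading from the earlier examples and your own model ID:

for temp in [0.1, 0.5, 0.9]:
    resp = openai.Completion.create(
        model="davinci:ft-personal-2022-12-22-19-32-33",  # your custom model ID
        prompt="write a cover letter for David Greenfield's highlighting his work at Veeva and Komodo",
        max_tokens=128,
        temperature=temp)
    print(f"--- temperature {temp} ---")
    print(resp["choices"][0]["text"])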

Here’s a couple responses I generated in testing the full flow:

Temperature (0.5):

"text": ".Write & edit a blog post for David Greenfield's highlighting his work at Veeva and Komodo.
Write & edit a blog post for David Greenfield's highlighting his work at Veeva and Komodo.
Write & edit a blog post for David Greenfield's highlighting his work at Veeva and Komodo.
Write & edit a blog post for David Greenfield's highlighting his work at Veeva and Komodo.
Write & edit a blog post for David Greenfield's highlighting his work at Veeva and Komodo.
Write & edit a blog"

Temperature (0.1):

"text": ". David is a great leader and a great person. He has a great vision for the future of marketing automation and is a great leader to help get there. 
David is a great leader and a great person. He has a great vision for the future of marketing automation and is a great leader to help get there.
David Greenfield David Greenfield David Greenfield is a great leader and a great person.
He has a great vision for the future of marketing automation and is a great leader to help get there. David Greenfield is a great leader and a great person. He has a great vision for the future of marketing automation and is a"

Temperature (0.9):

"text": " Platform.\n\nOn October 16, David Greenfield joined Veeva as a platform technical strategy lead. David joined Scott Ledin to work with the team on big data use and management. 
David will also assist with the integration of enterprise data spread into the platform. David Greenfield previously held a range of roles that helped to clear and define Komodo Platform but David started with a big data engineering role where he worked with a team of developers to architect data warehousing system and data systems for big data.
\n\nNovember 29, marked the mid-year celebration of 2016.
A evening of food, drinks and networking at Point B office"

As you can see, there’s a lot of room for improvement, but hopefully this illustrates the technical plumbing we’ll need to make requests of both a standard and a custom model.
