From Transcripts to AI Chat: An Experiment with the Lex Fridman Podcast

QLoRA fine-tuning, OpenLLaMa, podcast transcripts

26 min read · Jul 22, 2023

TL;DR: I fine-tuned OpenLLaMA 7B using QLoRA on transcripts of the Lex Fridman podcast to generate a fictional conversation between Lex and Barack Obama. The result is not a model ready for use; this article is a step-by-step walkthrough of the process, driven by the desire to learn more about training large language models at home.

As I continue to deepen my understanding of Large Language Models (LLMs), I’ve sought to push beyond my comfort zone, transitioning from merely using existing models for inference to training a model myself. While considering possible applications, the idea of training a virtual representation of Lex Fridman came to mind.

Lex Fridman is a podcast host. What began as the “AI Podcast” has evolved into the “Lex Fridman Podcast,” a platform where he engages with guests who are experts on a multitude of subjects. His interviews go on for hours. He is an insightful inquirer who poses thought-provoking and sometimes painful questions, which can lead to pretty emotional conversations. To illustrate:

Episode 267 with Mark Zuckerberg:

“There’s another question I’d like to ask you on this. I was looking at various polls and saw that you’re one of the most disliked tech leaders today. 54% unfavorable rating. Elon Musk is 23%. It’s basically everybody has a very high unfavorable rating that are tech leaders.
Maybe you can help me understand that. Why do you think so many people dislike you? Some even hate you. And how do you regain their trust and support?
Given everything you’ve just said, why are you losing the battle in explaining to people what actual impact social media has on society?”

Episode 332 with Kanye West:

“Is there somebody in your life close to you that you trust enough to call you out on your bullshit? We’re all full of shit sometimes.”

Often I like the questions more than the answers, and I wondered what would happen if I trained an LLM on the podcast transcripts. So I did exactly that, and I share the process and my findings here.

Before walking through the steps of training the model, here is the final result, an interview that only happened inside my GPU.

The trained model was prompted with the following sentence:

Lex: This is my conversation with Barack Obama, an American politician and attorney who served as the 44th President of the United States from 2009 to 2017. Known for his eloquent oratory and progressive policies, he made history as the first African American to hold the presidency. During his two-term tenure, Obama championed healthcare reform with the Affordable Care Act and led initiatives on climate change, notably signing the Paris Agreement. His presidency, characterized by a focus on inclusivity and diplomacy, has been influential in shaping contemporary American politics.
Mr. Obama, welcome and thank you for being here. Let me start by asking a provocative question: Why ..

The rest of the conversation below was entirely generated by the model:

.. do you think Donald Trump won the 2016 presidential election?
Obama: I’m going to give you a very honest answer that may not be politically correct because it’s important for people to understand why we ended up where we are today. The reason that Mr. Trump was able to win despite his manifest unfitness for office was that there were millions of Americans who voted for him while knowing that he wasn’t fit for office. They did so either because they didn’t care about what he said or what he tweeted or whether he would tell lies every day, or they genuinely believed that he would somehow make America great again. And those people felt left behind economically. They had lost their jobs in part because of automation and globalization and didn’t feel like anybody was listening to them. And they saw this billionaire businessman who was gonna shake things up and do something different. Now, having said all that, if I had to guess what might have happened differently, If somebody like Jeb Bush or Marco Rubio or John Kasich had gotten elected instead of Donald Trump, even though they weren’t exactly inspiring, at least they seemed more likely than Mr. Trump to surround themselves with competent people and actually govern effectively.
Lex: But isn’t that still an issue of character?
Obama: No doubt. It’s an issue of character. Character matters. That’s why I wrote two books about it. So yes, character mattered. But these other candidates also didn’t go around saying whatever pops into their head and then doubling down when they get called out on stuff. You know, there’s a real difference between making mistakes and changing your mind versus just lying about stuff over and over and then denying when confronted with evidence that you’ve lied. There’s a big difference between someone who says mean things about immigrants but then goes to visit a detention center and listens to the conditions inside and ends up feeling bad about himself afterwards, versus someone whose whole worldview is based on hate.
Lex: What do you think is the biggest mistake that Hillary Clinton made during her campaign against Donald Trump?
Obama: Well, there were several mistakes, but one of them was that she thought she could take him seriously as a candidate and therefore needed to treat him as a serious opponent rather than the joke that he was. And so she adopted some of his positions herself. She went after his core supporters, she talked about building walls, she gave a speech in which she basically said that she was going to put coal miners out of work. So it’s not surprising that those people decided to vote for him instead of her. If you want to beat somebody like that, you shouldn’t try to outdo them. Just simply say, “I’m better than you.” Beat them soundly.
Lex: Okay, let’s talk briefly about Russia and Vladimir Putin. He’s arguably the most powerful man in the world. In your new book, A Promised Land, you write that you believe Putin wanted to undermine confidence in our democracy. To what extent does his actions contribute to the sense of distrust that exists among Americans?
Obama: Well, I don’t think that his actions alone explain it. But they certainly didn’t help. Because even before the Russian interference, we already had a pretty strong anti-Russian bias. We’d just finished eight years under a president who spent most of his time obsessing about the evils of Russia and trying to build a wall between us and Mexico. So when we got hit with this barrage of Russian propaganda during the campaign, it probably reinforced people’s existing views.
Lex: Is there anything positive that can come from the fact that so much of our government officials now believe Vladimir Putin? Does it make us stronger? Does it make us weaker?
Obama: I think it makes us weaker because it means that we’re giving way too much credence to a foreign leader.

If this has sparked your interest, read on to find out more about how I obtained and formatted the data and fine-tuned OpenLLaMA 7B using QLoRA.

A step-by-step breakdown of the process

Transcription of podcast episodes
Data preparation
Choosing a base model
Fine-tuning test-run: Reproducing Guanaco
Fine-tuning
The resulting model
Thoughts on how to improve
Key points

Transcription

Once I had settled on the idea of training a model on this podcast, I looked for transcripts. There are plenty (here, here, and here); however, they were either not diarised (one flow of text without distinction of who said what) or of low quality. I therefore decided to transcribe the episodes again using WhisperX. The pain of going through that process would deserve an article of its own, but here is the polished, final version.

WhisperX outputs VTT files, for example:

00:59.954 --> 01:05.177
[SPEAKER_01]: And now, dear friends, here's John Carmack.

01:05.177 --> 01:07.819
[SPEAKER_01]: What was the first program you've ever written?

01:07.819 --> 01:08.239
[SPEAKER_01]: Do you remember?

01:09.168 --> 01:09.668
[SPEAKER_00]: Yeah, I do.

01:09.668 --> 01:20.593
[SPEAKER_00]: So I remember being in a radio shack, going up to the TRS-80 computers, and learning just enough to be able to do 10 print John Carmack.

The exact WhisperX parameters I used were the following:

whisperx --model_dir models_whisper --model large-v2 --hf_token TOKEN --diarize --batch_size 32 audio_file.mp3 --min_speakers 2 --max_speakers 2 --output_format vtt

Assuming the first speaker is always Lex and the second is the guest, I transformed this into a more processing-friendly JSON file:

{
  "filename": "Lex_Fridman_Podcast_-__309__John_Carmack_Doom_Quake_VR_AGI_Programming_Video_Games_and_Rockets.vtt",
  "conversation": [
    {
      "from": "Lex",
      "text": "The following is a conversation with John Carmack, widely considered to be one of the greatest programmers ever. He was the co-founder of id Software and the lead programmer on several games that revolutionized the technology, the experience, and the role of gaming in our society, including Commander Keen, Wolfenstein 3D, Doom, and Quake. He spent many years as the CTO of Oculus VR, helping to create portals into virtual worlds and to define the technological path to the metaverse and meta. And now he has been shifting some of his attention to the problem of artificial general intelligence. This was the longest conversation on this podcast at over five hours. And still, I could talk to John many, many more times. And we hope to do just that. This is the Lex Friedman podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's John Carmack. What was the first program you've ever written? Do you remember?"
    },
    {
      "from": "Guest",
      "text": "Yeah, I do. So I remember being in a radio shack, going up to the TRS-80 computers, and learning just enough to be able to do 10 print John Carmack. And it's kind of interesting how, of course, you know, Carnegie and Ritchie kind of standardized Hello World as the first thing that you do in every computer programming language and every computer. But not having any interaction with the cultures of Unix or any other standardized things, it was just like, well, what am I going to say? I'm going to say my name. And then you learn how to do GOTO 10 and have it scroll all off the screen. And that was definitely the first thing that I wound up doing on a computer."
    },
...
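The conversion from VTT to JSON could look roughly like the sketch below. It is not the script I actually used but a minimal illustration: the function name vtt_to_conversation and the regular expression are made up for this example, and it assumes that every cue line carries a [SPEAKER_xx] label and that whoever speaks first is Lex.

```python
import json
import re

# Matches a cue text line such as "[SPEAKER_01]: Do you remember?"
CUE = re.compile(r"^\[(SPEAKER_\d+)\]:\s*(.*)$")


def vtt_to_conversation(vtt_text, filename):
    """Merge consecutive cues of the same speaker into one turn each."""
    turns = []
    first_speaker = None
    for line in vtt_text.splitlines():
        match = CUE.match(line.strip())
        if not match:
            continue  # timestamps, blank lines and headers are skipped
        speaker, text = match.groups()
        if first_speaker is None:
            first_speaker = speaker  # assumption: Lex always speaks first
        name = "Lex" if speaker == first_speaker else "Guest"
        if turns and turns[-1]["from"] == name:
            turns[-1]["text"] += " " + text  # same speaker keeps talking
        else:
            turns.append({"from": name, "text": text})
    return {"filename": filename, "conversation": turns}


sample = """00:59.954 --> 01:05.177
[SPEAKER_01]: And now, dear friends, here's John Carmack.

01:05.177 --> 01:07.819
[SPEAKER_01]: What was the first program you've ever written?

01:09.168 --> 01:09.668
[SPEAKER_00]: Yeah, I do.
"""

episode = vtt_to_conversation(sample, "example.vtt")
print(json.dumps(episode, indent=2))
```

Mapping by order of appearance rather than by label matters here: in the sample above Lex is SPEAKER_01, not SPEAKER_00.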

Make sure to use this patch by TheNiggler when using WhisperX on large audio files to speed up the process.

This led to 388 JSON files, one for each episode, totalling 52 MB of raw text. I then filtered episodes by the ratio of text characters assigned to Lex versus the guest. A ratio >1 (Lex talks more than the guest) suggests messed-up diarisation, so these episodes (24 in total) were removed.
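The ratio filter can be sketched in a few lines. The helper names lex_to_guest_ratio and keep_episode are hypothetical, and the two toy episodes are invented purely to show the behaviour; only the threshold of 1 comes from the description above.

```python
def lex_to_guest_ratio(episode):
    """Characters spoken by Lex divided by characters spoken by the guest."""
    turns = episode["conversation"]
    lex = sum(len(t["text"]) for t in turns if t["from"] == "Lex")
    guest = sum(len(t["text"]) for t in turns if t["from"] == "Guest")
    # An episode without any guest text is treated as maximally suspicious.
    return lex / guest if guest else float("inf")


def keep_episode(episode, max_ratio=1.0):
    """Keep an episode only if Lex does not talk more than the guest."""
    return lex_to_guest_ratio(episode) <= max_ratio


# Two invented toy episodes: one plausible, one with suspicious diarisation.
plausible = {"conversation": [
    {"from": "Lex", "text": "Why?"},
    {"from": "Guest", "text": "Because it was the obvious thing to try."},
]}
suspicious = {"conversation": [
    {"from": "Lex", "text": "A very long monologue that never seems to end."},
    {"from": "Guest", "text": "Yes."},
]}

kept = [ep for ep in (plausible, suspicious) if keep_episode(ep)]
print(len(kept), "of 2 episodes kept")
```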

Data preparation

A few weeks ago the QLoRA paper was released. To showcase their new method the authors fine-tuned various LLaMA models up to 65B on the best ranked chat conversations in the Open Assistant 1 dataset. The resulting chatbot, called Guanaco, ranks (very) high on the Open LLM Leaderboard at the time of writing.

What surprised me is how simply their training data is structured: it is basically fine-tuning on raw text. This is what the first row of the Guanaco training dataset looks like:

### Human: Can you write a short introduction about the relevance of the term “monopsony” in economics? Please use examples related to potential monopsonies in the labour market and cite relevant research.### Assistant: “Monopsony” refers to a market structure where there is only one buyer for a particular good or service. In economics, this term is particularly relevant in the labor market, where a monopsony employer has significant power over the wages and working conditions of their employees. The presence of a monopsony can result in lower wages and reduced employment opportunities for workers, as the employer has little incentive to increase wages or provide better working conditions. Recent research has identified potential monopsonies in industries such as retail and fast food, where a few large companies control a significant portion of the market (Bivens & Mishel, 2013). In these industries, workers often face low wages, limited benefits, and reduced bargaining power, leading to a situation where they are dependent on the employer for their livelihood. This dependence can result in further suppression of wages and a decline in working conditions. Overall, the concept of monopsony is essential to understanding the dynamics of labor markets and the impact of market power on workers. Further research is needed to understand the extent and impact of monopsonies on the economy and to develop policies to address this issue. References: Bivens, J., & Mishel, L. (2013). The Pay of Corporate Executives and Financial Professionals as Evidence of Rents in Top 1 Percent Incomes. Journal of Economic Perspectives, 27(3), 57–78.### Human: Now explain it to a dog”

Several things are noticeable here:

It is not separated into input/output, meaning the model is not trained only to produce an answer given a question; it also learns to generate the question.
The sample shown contains not just one question and answer but also a second question (or instruction) without any answer.

Since input with as little structure as the above yields a good chatbot, I decided to format my data in exactly the same way. I kept the turn format of Guanaco: the guest's turns are prefixed with ### Human, and Lex's turns (his questions or replies) are prefixed with ### Assistant. Additionally, I close each turn with the EOS token </s> to teach the model how to signal the end of a response.

Here is an example of what the transcript data looks like when formatted in this way. It is part of the conversation from episode 309 with John Carmack:
### Human is John Carmack, ### Assistant is Lex Fridman.

### Human: It would really be with Diet Coke. There still is that sense of, you know, drop the can down, crack open the can of Diet Coke. All right, now I mean business. We’re getting to work here.</s>
### Assistant: Still, to this day, Diet Coke is still part of it.</s>
### Human: Yeah, probably eight or nine a day.</s>
### Assistant: Nice. Okay, what about your setup? How many screens? What kind of keyboard? Is there something interesting? What kind of IDE, Emacs, Vim, or something modern? Linux, what operating system, laptop, or any interesting thing that brings you joy?</s>
### Human: So I kind of migrated cultures, where early on through all of game dev, there was sort of one culture there, which was really quite distinct from the Silicon Valley venture culture for things. They’re different groups, and they have pretty different mores in the way they think about things. And I still do think a lot of the big companies can learn things from the hardcore game development side of things, where it still boggles my mind how How hostile to debuggers and ideas that so much of them the kind of big money get billions of dollars silicon valley venture backed funds are all this interesting so you’re saying like like a big companies a google matter are hostile to. They are not big on debuggers and IDEs, like so much of it is like Emacs, Vim for things. And we just assume that debuggers don’t work most of the time for the systems. And a lot of this comes from a sort of Linux bias on a lot of things where I did come up through the personal computers and then the DOS, and then I am, you know, Windows and it was Borland tools and then Visual Studio.</s>
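A minimal sketch of this formatting step is shown below. The names PREFIX and format_turn are invented for illustration, and the input is assumed to be the per-episode JSON structure from the transcription step ("from" is either "Lex" or "Guest").

```python
EOS = "</s>"

# The guest plays the "Human" role, Lex plays the "Assistant" role.
PREFIX = {"Guest": "### Human", "Lex": "### Assistant"}


def format_turn(turn):
    """Render one turn as '### Role: text</s>'."""
    return f"{PREFIX[turn['from']]}: {turn['text']}{EOS}"


conversation = [
    {"from": "Guest", "text": "Yeah, probably eight or nine a day."},
    {"from": "Lex", "text": "Nice. Okay, what about your setup?"},
]

formatted = [format_turn(turn) for turn in conversation]
print("\n".join(formatted))
```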

qlora.py expects the transcript data, formatted as above, to be packed into a JSONL file containing one JSON object per line. For Guanaco, the format looks like this:

{"text": "### Human: Question?### Assistant: Answer."}
{"text": "### Human: Question?### Assistant: Answer."}
{"text": "### Human: Question?### Assistant: Answer."}

To respect the context size, I packed as many question-answer pairs as possible into a single JSON object, until the value of the text key reached 2048 tokens. If a single episode does not fit into 2048 tokens (and it never does), I split it into several JSON objects. A single JSON object, however, must not contain data from multiple conversations, to avoid confusing the LLM by suddenly changing the topic.

Here is an illustration of the process described above:
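In code, the packing could be sketched as follows. This is a simplified illustration, not my original script: count_tokens is a whitespace-based stand-in for the real tokenizer, joining turns with a newline is an assumption, and pack_episode and to_jsonl are hypothetical names.

```python
import json


def count_tokens(text):
    """Stand-in token counter; swap in the model's tokenizer for real use."""
    return len(text.split())


def pack_episode(formatted_turns, max_tokens=2048, count=count_tokens):
    """Greedily pack the turns of ONE episode into chunks of <= max_tokens."""
    chunks, current, used = [], [], 0
    for turn in formatted_turns:
        n = count(turn)
        if current and used + n > max_tokens:
            chunks.append("\n".join(current))  # close the full chunk
            current, used = [], 0
        # Note: a single turn longer than max_tokens still becomes its own
        # (oversized) chunk; it would be truncated later during training.
        current.append(turn)
        used += n
    if current:
        chunks.append("\n".join(current))
    return chunks


def to_jsonl(episodes, max_tokens=2048):
    """Pack each episode separately so no chunk mixes two conversations."""
    lines = []
    for turns in episodes:
        for chunk in pack_episode(turns, max_tokens):
            lines.append(json.dumps({"text": chunk}))
    return "\n".join(lines)


# Toy data with a tiny budget of 5 "tokens" to make the splitting visible.
episodes = [["a b c", "d e", "f g h i"], ["x y"]]
jsonl = to_jsonl(episodes, max_tokens=5)
print(jsonl)
```

Because the loop in to_jsonl starts a fresh chunk for every episode, the last chunk of an episode is usually shorter than the budget, which is the price for never mixing two conversations.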

Base model

When I started this endeavour, OpenLLaMA 7B and 13B had just finished training on 1 trillion tokens. This base model is intended as a reproduction of the Facebook LLaMA models for unrestricted use. To shorten training time, I went for the smaller 7B model (7 billion parameters).

The recently released Falcon models with 7 and 40 billion parameters would have been a good alternative since they also come with unrestricted licenses. Inference of the Falcon models however was very slow in my hands so I went for what worked well for me at the time.

Fine-tuning test run

I started off fine-tuning by using the script provided in the QLoRA repository for fine-tuning LLaMa 7B on the Open Assistant data to reproduce Guanaco, except that I used OpenLLaMa 7B and two GPUs.

export WANDB_PROJECT=openllama-7b-OASST

python qlora.py \
    --model_name_or_path models/open_llama_7b \
    --output_dir ./output/guanaco-7b \
    --logging_steps 10 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 500 \
    --save_total_limit 40 \
    --evaluation_strategy steps \
    --eval_dataset_size 1000 \
    --per_device_eval_batch_size 16 \
    --max_new_tokens 32 \
    --dataloader_num_workers 3 \
    --group_by_length \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --do_eval \
    --do_mmlu_eval False \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --dataset data/openassistant-guanaco/openassistant_best_replies_train.jsonl  \
    --dataset_format oasst1 \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --max_steps 1875 \
    --eval_steps 180 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.1 \
    --weight_decay 0.0 \
    --seed 0 \
    --report_to wandb

Training on 2x 3090 GPUs, I noticed the obvious: increasing per_device_train_batch_size makes things faster. According to the authors of the repository, per_device_train_batch_size * gradient_accumulation_steps should always be 16 (qlora/Readme.md: “Make sure to adjust `per_device_train_batch_size` and `gradient_accumulation_steps` so that their product is 16 and training fits on your GPUs.”). This is why I set per_device_train_batch_size to the maximum possible value of 16 and gradient_accumulation_steps to 1.

Only afterwards did I find out that the batch size also influences model quality: larger batches speed up training, whereas smaller batches make it slower but sometimes lead to better generalisation and therefore a higher-quality model.

The model was fine-tuned for 1875 steps, corresponding to ~3.8 epochs which took 4.5 hours. Here is the Weights & Biases report of this fine-tuning run:

I did a quick sanity check of whether this model is alive or something went terribly wrong.

Proud Guanaco. Interface: text-generation-webui

Works!

Things to notice in the data above:

Training loss is not a smooth curve but rather spiky. After doing a bit of reading, this seems to be caused by the parameter group_by_length, which is passed to qlora.py. According to the documentation, setting this flag leads to grouping of “sequences into batches with same length. Saves memory and speeds up training considerably.” Technically this makes sense; however, the samples in a batch are then grouped by length rather than drawn at random, which apparently is the reason for the spikes in training loss.
Evaluation loss goes up instead of down. Evaluation (and training) was done on both questions and answers. Even with good generalization, training on some questions will not predict the other questions asked in the evaluation set. This loss behaviour is therefore expected and has been observed by other people too (one, two), but so far the QLoRA repository maintainers have not commented on it.
dataset_format is set to oasst1. This basically tells qlora.py to train on the entire text field as specified in the input JSON file, corresponding to training on raw text.
target_max_len is set to 512 tokens, meaning that at most this number of tokens per sample is used for training. If a string (or rather its tokens) exceeds 512 tokens, the tokens beyond that limit are truncated and ignored during training.
source_max_len is 16. This might be confusing. However, as described in the previous bullet point, we are training only on raw text, there is no input or source, the model will train on both answer and question. See the source code part of qlora.py where this is specified.
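The interplay of the last three points can be illustrated with a toy sketch. This is not the code from qlora.py but a simplified model of my understanding of it: the function build_training_tokens is hypothetical, whitespace splitting stands in for the real tokenizer, and special tokens are ignored.

```python
def build_training_tokens(text, source="", source_max_len=16, target_max_len=512):
    """Toy model of raw-text training: empty source, truncated target."""
    source_tokens = source.split()[:source_max_len]
    all_target_tokens = text.split()
    target_tokens = all_target_tokens[:target_max_len]  # truncation
    dropped = len(all_target_tokens) - len(target_tokens)
    return source_tokens + target_tokens, dropped


# A sample of 600 toy tokens: the whole text field is the target.
sample_text = " ".join(["word"] * 600)
tokens, dropped = build_training_tokens(sample_text)
print(len(tokens), "tokens used,", dropped, "tokens dropped")
```

With an empty source, the value of source_max_len has no practical effect, whereas target_max_len decides how much of each sample the model actually sees.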

Fine-tuning

With the lessons learned above I used the following parameters to fine-tune OpenLLaMA 7B on the podcast transcripts.

export WANDB_PROJECT=open_llama_7b_lexfpodcast

python qlora.py \
    --model_name_or_path /home/g/models/open_llama_7b \
    --output_dir ./output/open_llama_7b_lexfpodcast \
    --logging_steps 10 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 100 \
    --save_total_limit 40 \
    --evaluation_strategy steps \
    --eval_dataset_size 1000 \
    --per_device_eval_batch_size 2 \
    --max_new_tokens 32 \
    --dataloader_num_workers 3 \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --do_eval \
    --do_mmlu_eval False \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --dataset data_v0.3.jsonl  \
    --dataset_format oasst1 \
    --source_max_len 16 \
    --target_max_len 2048 \
    --group_by_length False \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --max_steps 2000 \
    --eval_steps 100 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.1 \
    --weight_decay 0.0 \
    --seed 0 \
    --report_to wandb \
    --max_memory_MB 23000

The model was fine-tuned for 2000 steps, corresponding to ~4.4 epochs. Each epoch took around 4 hours, for a total training time of ~18 hours.
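As a back-of-the-envelope check, the relation between steps, batch size and epochs can be worked out from the parameters above. The sketch assumes that one optimizer step consumes per_device_train_batch_size * gradient_accumulation_steps sequences, i.e. that both GPUs hold parts of a single model copy and do not multiply the batch size; the resulting number of training sequences is an estimate derived from the rounded epoch count, not a measured value.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
max_steps = 2000
reported_epochs = 4.4   # rounded value from the training run
reported_hours = 18     # rounded total training time

# Sequences consumed per optimizer step (assumption: no data parallelism).
sequences_per_step = per_device_train_batch_size * gradient_accumulation_steps

# Sequences consumed over the whole run.
sequences_seen = sequences_per_step * max_steps

# Implied size of the training set (estimate only).
approx_training_sequences = sequences_seen / reported_epochs

hours_per_epoch = reported_hours / reported_epochs

print(sequences_per_step)                # 16
print(sequences_seen)                    # 32000
print(round(approx_training_sequences))  # 7273
print(round(hours_per_epoch, 1))         # 4.1
```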

Here is the Weights & Biases report:

The model was trained on a relatively simple desktop running Ubuntu with 2x RTX 3090. For people interested in multi-GPU homebrew training, this is how my GPUs handled the task:

Not sure what the dip in the first half corresponds to, maybe (for whatever reason) I closed the windows after 10 hours

Model behaviour

Prompting the resulting model with simple questions is not a lot of fun: it is incredibly nosy and keeps on asking questions, digging deeper and deeper, without any reaction to my answers. So using it like a simple chatbot is not very entertaining. I was curious so I tried, but to be fair, this is not how it was trained.

Confronting the model with suitable prompts, such as the beginning of a podcast episode, yielded better results.

The following text was generated using checkpoint 1000, i.e. after 1k training steps, corresponding to 3.3 epochs and a training loss of 1.8 (see figures above). People usually train for 3–5 epochs and aim for a training loss of ~1.5, but in general there seems to be no gold standard for how long to train.

This is the prompt I used to start a conversation:

### Assistant: This is my conversation with Barack Obama, an American politician and attorney who served as the 44th President of the United States from 2009 to 2017. Known for his eloquent oratory and progressive policies, he made history as the first African American to hold the presidency. During his two-term tenure, Obama championed healthcare reform with the Affordable Care Act and led initiatives on climate change, notably signing the Paris Agreement. His presidency, characterized by a focus on inclusivity and diplomacy, has been influential in shaping contemporary American politics.
Mr. Obama, welcome and thank you for being here. Let me start by asking a provocative question: Why

And this is what the model generated. Please note that both the assistant and the human responses (in this case virtual Lex and virtual Obama) were generated by the model (the special token </s> was removed).

Virtual podcast episode with Barack Obama, take 1

### Assistant: This is my conversation with Barack Obama, an American politician and attorney who served as the 44th President of the United States from 2009 to 2017. Known for his eloquent oratory and progressive policies, he made history as the first African American to hold the presidency. During his two-term tenure, Obama championed healthcare reform with the Affordable Care Act and led initiatives on climate change, notably signing the Paris Agreement. His presidency, characterized by a focus on inclusivity and diplomacy, has been influential in shaping contemporary American politics.
Mr. Obama, welcome and thank you for being here. Let me start by asking a provocative question: Why do you think Donald Trump won the 2016 presidential election?
### Human: I’m going to give you a very honest answer that may not be politically correct because it’s important for people to understand why we ended up where we are today. The reason that Mr. Trump was able to win despite his manifest unfitness for office was that there were millions of Americans who voted for him while knowing that he wasn’t fit for office. They did so either because they didn’t care about what he said or what he tweeted or whether he would tell lies every day, or they genuinely believed that he would somehow make America great again. And those people felt left behind economically. They had lost their jobs in part because of automation and globalization and didn’t feel like anybody was listening to them. And they saw this billionaire businessman who was gonna shake things up and do something different. Now, having said all that, if I had to guess what might have happened differently, If somebody like Jeb Bush or Marco Rubio or John Kasich had gotten elected instead of Donald Trump, even though they weren’t exactly inspiring, at least they seemed more likely than Mr. Trump to surround themselves with competent people and actually govern effectively.
### Assistant: But isn’t that still an issue of character?
### Human: No doubt. It’s an issue of character. Character matters. That’s why I wrote two books about it. So yes, character mattered. But these other candidates also didn’t go around saying whatever pops into their head and then doubling down when they get called out on stuff. You know, there’s a real difference between making mistakes and changing your mind versus just lying about stuff over and over and then denying when confronted with evidence that you’ve lied. There’s a big difference between someone who says mean things about immigrants but then goes to visit a detention center and listens to the conditions inside and ends up feeling bad about himself afterwards, versus someone whose whole worldview is based on hate.
### Assistant: What do you think is the biggest mistake that Hillary Clinton made during her campaign against Donald Trump?
### Human: Well, there were several mistakes, but one of them was that she thought she could take him seriously as a candidate and therefore needed to treat him as a serious opponent rather than the joke that he was. And so she adopted some of his positions herself. She went after his core supporters, she talked about building walls, she gave a speech in which she basically said that she was going to put coal miners out of work. So it’s not surprising that those people decided to vote for him instead of her. If you want to beat somebody like that, you shouldn’t try to outdo them. Just simply say, “I’m better than you.” Beat them soundly.
### Assistant: Okay, let’s talk briefly about Russia and Vladimir Putin. He’s arguably the most powerful man in the world. In your new book, A Promised Land, you write that you believe Putin wanted to undermine confidence in our democracy. To what extent does his actions contribute to the sense of distrust that exists among Americans?
### Human: Well, I don’t think that his actions alone explain it. But they certainly didn’t help. Because even before the Russian interference, we already had a pretty strong anti-Russian bias. We’d just finished eight years under a president who spent most of his time obsessing about the evils of Russia and trying to build a wall between us and Mexico. So when we got hit with this barrage of Russian propaganda during the campaign, it probably reinforced people’s existing views.
### Assistant: Is there anything positive that can come from the fact that so much of our government officials now believe Vladimir Putin? Does it make us stronger? Does it make us weaker?
### Human: I think it makes us weaker because it means that we’re giving way too much credence to a foreign leader.
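Raw output like the above can be turned into the speaker-labelled version shown at the beginning of this article with a small post-processing step. The sketch below is only an illustration: the function relabel and its default speaker names are made up for this example.

```python
def relabel(raw, host="Lex", guest="Obama"):
    """Replace the role prefixes with speaker names and drop EOS tokens."""
    lines = []
    for line in raw.replace("</s>", "").splitlines():
        if line.startswith("### Assistant:"):
            line = host + line[len("### Assistant"):]
        elif line.startswith("### Human:"):
            line = guest + line[len("### Human"):]
        lines.append(line)  # lines without a prefix are kept unchanged
    return "\n".join(lines)


raw_output = (
    "### Assistant: But isn’t that still an issue of character?</s>\n"
    "### Human: No doubt. It’s an issue of character.</s>"
)
print(relabel(raw_output))
```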

Virtual podcast episode with Barack Obama, take 2

### Assistant: This is my conversation with Barack Obama, an American politician and attorney who served as the 44th President of the United States from 2009 to 2017. Known for his eloquent oratory and progressive policies, he made history as the first African American to hold the presidency. During his two-term tenure, Obama championed healthcare reform with the Affordable Care Act and led initiatives on climate change, notably signing the Paris Agreement. His presidency, characterized by a focus on inclusivity and diplomacy, has been influential in shaping contemporary American politics.
Mr. Obama, welcome and thank you for being here. Let me start by asking a provocative question: Why do you think that Donald Trump was elected president? I don’t mean politically speaking, but why do you think he won?
### Human: Well, there are many reasons. And some of them had nothing to do with him, right? So it’s worth understanding how this happened. I spent eight years trying to get people to vote. We got record turnouts in ’08 and ’12, which were great. Turnout fell back down in ’16. And so while we can argue about what percentage of that dropoff was due to voter suppression efforts, whether it was Russian interference, whether it was just apathy, you know, there were legitimate reasons why people didn’t show up at the polls. But then in 2020, you know, despite everything else going on, you know, including a pandemic, you know, a significant percentage of people turned out again. So if you look at all those factors together, we had a lot of people who were eligible to vote who weren’t voting because they couldn’t be bothered. They felt like their votes didn’t matter. Maybe they believed the falsehoods spread by folks like Mr. Trump suggesting that our elections were rigged. Then you add to that mix somebody who comes along and says, “I alone can fix” whatever problem you want to talk about. “Everything is terrible.” That message resonated even though most Americans thought that the economy was doing pretty well. Most Americans thought that our foreign policy wasn’t working. Most Americans said that we needed to deal with health care. And yet somehow
### Assistant: The singular man, the one person could solve everything.
### Human: Exactly. Even though everyone knew that nobody’s one person can solve anything. You need to have coalitions, you need compromises. Somebody who’s willing to say that they’re going to ignore the Constitution when it suits their purposes. He said things during the campaign that he still hasn’t admitted publicly, although others have come forward to tell us exactly what he said and meant behind closed doors. It doesn’t surprise me that a number of suburban women voted against him because they heard him saying these things. Whether it was about his treatment of women, whether it was about his comments about African Americans or Latinos, and now we hear reports about his own taxes. None of it surprises me. What does surprise me is how hard so many of our fellow citizens are working to normalize what this guy is doing. How often they give him credit for good economic news. I remember during one of the debates, someone asked me about something that I had done in office, and he interrupted before I even answered and said, “You didn’t create jobs, I did.” As if it was his job to create jobs. Then he goes on to brag about how many cars he sold or how many houses he built. Meanwhile, millions of people lost their jobs during that same period. Millions of families lost their homes during that same period. Yet somehow he gets the credit because his businesses were doing better than other people’s businesses. Or he gets the credit for
### Assistant: because the stock market went up.
### Human: Because the stock market went up, which means very little unless you happen to be invested in stocks yourself. And I suspect that many of his voters aren’t.
### Assistant: If you’re listening to this interview, please do not call into any of the shows where I’m appearing and say that I said that the stock market went up because of Barack Obama. I never once in my life talked about the stock market and its relation to anything except actual performance. I am not responsible for the rise of the Dow Jones Industrial Average.
### Human: There’s no doubt that he used every tool available to him to try to boost the economy during his time in office. There’s also no doubt that he inherited an economy that was already showing signs of improvement when he came into office. In fact, unemployment was falling before he took over. When you look at the totality of what he did, especially compared to what George W. Bush did, which was basically throw gas on the fire after 9/11, and then go ahead and take us to war in Iraq without bothering to find out whether we should or shouldn’t. That was a disaster economically and otherwise. I’m not excusing myself. I’m just saying that I tried to do positive things for the country. I focused on education. I focused on expanding access to affordable healthcare. I worked to make sure that we were investing more in infrastructure. And I kept telling everybody that it was going to cost money. And guess what? People wanted lower taxes. So if you want to spend money, you have to pay for it. Either you tax people less, which would mean they wouldn’t have enough money left over to buy stuff, so therefore, demand would fall. Or you raise revenue through higher taxes, which means that some people might decide they don’t want to buy your product anymore, but hopefully you’ve diversified your customer base and you’ll be okay. But anyway, that was the challenge. But what
### Assistant: was the attitude towards government spending during that time? Was there a Republican Congress?
### Human: Yes. Yes, there was. And throughout most of my time in public service, whenever anyone would propose making investments in infrastructure or research and development or early childhood education programs or things that will help lift people up, instead of coming back and saying, “We support that,” what they’d say is, “Show me the money!” And the reason they would say that is because historically, Republicans and Democrats alike, until very recently, understood that we live in a society in which we all benefit when somebody else succeeds. Nowadays, too many people seem to believe that if someone else is getting something for free, it must necessarily be bad for them. That’s just simply not true. And so I think part of what contributed to this environment of rising inequality and stagnant wages is that so much of what we saw happening was justified based on this idea that the pie is shrinking. Whereas in reality, there were plenty of opportunities to grow the pie. Just as we had an opportunity during my time in office, but also prior to my time in office, to invest in things that would actually generate new wealth rather than just redistribute existing wealth.

Not terribly good but a good start. More takes of this conversation using all the checkpoints and the same “Obama”-prompt can be found here.

Thoughts on how to improve

Things move so quickly in the field of open source LLMs that several innovations were released while I was working on this.

Bigger and better models. OpenLLaMA provides a bigger model, 13B, and recently released versions 2 and 3 of their foundation models which have been pre-trained on additional data.
Facebook research also recently released a version 2 of their popular LLaMa models, this time licensed for both researchers and commercial entities.
Increased context size. The OpenLLaMA model I used here is restricted to a context size of 2048 tokens, which is an obvious limitation when training a model to generate long-format conversations. Recently, several people have discovered ways of increasing the context size several-fold. See this and this, and many others which I probably missed. The subreddit LocalLLaMA is a good place to start looking for hot stuff. Also, Llama 2 provides 4k context out of the box.
Only train on target. The approach above is a fine-tune on raw text including both question and answer. I would assume that separating the text into input and response and training only on the response would improve the model’s quality. Guanaco was trained the same way as described here, simply because the raw-text approach apparently already yielded a good enough chatbot, so there was no incentive to put more effort into it.
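
To make the "only train on target" idea concrete, here is a minimal sketch of what it means at the label level. It assumes the common PyTorch convention that label positions set to -100 are skipped by the cross-entropy loss. The `build_example` helper, its `train_on_source` parameter, the whitespace tokenizer, and the sample sentences are all hypothetical stand-ins for illustration and not the code I used for training.

```python
from typing import Callable, Dict, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_example(
    source: str,
    target: str,
    tokenize: Callable[[str], List[int]],
    train_on_source: bool = False,
) -> Dict[str, List[int]]:
    """Concatenate source and target tokens and build matching labels.

    With train_on_source=False, the source positions are masked so that
    only the target (the response) contributes to the loss.
    """
    source_ids = tokenize(source)
    target_ids = tokenize(target)
    input_ids = source_ids + target_ids
    if train_on_source:
        labels = list(input_ids)
    else:
        labels = [IGNORE_INDEX] * len(source_ids) + target_ids
    return {"input_ids": input_ids, "labels": labels}


# Hypothetical toy tokenizer: one id per whitespace-separated word.
vocab: Dict[str, int] = {}


def toy_tokenize(text: str) -> List[int]:
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


# Made-up turn pair in the "### Human:" / "### Assistant:" format.
example = build_example(
    source="### Human: We got record turnouts. ### Assistant:",
    target="What changed after that?",
    tokenize=toy_tokenize,
)
masked = sum(label == IGNORE_INDEX for label in example["labels"])
print(f"{masked} of {len(example['labels'])} label positions are masked")
```

With a real tokenizer the same masking applies per token rather than per word, and in a multi-turn conversation every response would need its own unmasked span.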

Key points

QLoRA provides a way to fine-tune available foundation models at home in reasonable amounts of time
Data preparation is key but might be laborious (depending on the use case). Look at the raw data to make sure it is what you think it is (otherwise: shit in, shit out)
Standard training hyperparameters may work but yield suboptimal results; understand all the wheels and knobs, change them, and see what happens
Observe the training process, using wandb.ai for example, and see if the numbers and plots make sense
Learn by doing
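
As one way to act on the data-preparation point above, here is a small sketch that checks whether a conversation in the `### Human:` / `### Assistant:` format strictly alternates speakers and has no empty turns. The `split_turns` and `check_turns` helpers and the sample strings are made up for illustration; real transcripts will likely need additional checks.

```python
import re
from typing import List, Tuple

TURN_PATTERN = re.compile(r"### (Human|Assistant): ?")


def split_turns(text: str) -> List[Tuple[str, str]]:
    """Split a conversation into (speaker, utterance) pairs."""
    parts = TURN_PATTERN.split(text)
    # parts = [preamble, speaker1, utterance1, speaker2, utterance2, ...]
    speakers, utterances = parts[1::2], parts[2::2]
    return [(s, u.strip()) for s, u in zip(speakers, utterances)]


def check_turns(text: str) -> List[str]:
    """Return a list of human-readable problems found in one conversation."""
    problems = []
    turns = split_turns(text)
    if not turns:
        problems.append("no speaker markers found")
    for i, (speaker, utterance) in enumerate(turns):
        if not utterance:
            problems.append(f"turn {i}: empty utterance")
        if i > 0 and speaker == turns[i - 1][0]:
            problems.append(f"turn {i}: {speaker} speaks twice in a row")
    return problems


good = "### Human: Hello.\n### Assistant: Welcome to the show."
bad = "### Human: Hello.\n### Human: Still me.\n### Assistant:"
print(check_turns(good))
print(check_turns(bad))
```

Running a check like this over every record before training is cheap and catches formatting slips that are hard to spot by skimming.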

I learned a lot in the process and it is my hope that you, too, gained some insights from this tutorial.

The code and transcript data (in JSON format) are available as a GitHub fork of the QLoRA repository.

If you liked this story, have additional ideas or questions, or wonder why in hell anyone would spend time doing this, please leave a comment here or on Twitter.