Artificial intelligence has made significant progress in recent years, surpassing humans in a range of tasks, from playing games such as chess and Go to practical problems such as predicting protein structures and finding faster ways to multiply matrices. Large language models, in particular, have benefited from these advances, powering sophisticated information and dialogue systems. A prominent example is ChatGPT, a language model that performs remarkably well at composing documents, conversing with humans, and answering questions.
However, another area that has piqued researchers' interest is whether artificial intelligence can write philosophical essays that are innovative and clever. Expert-level professional philosophy has long been thought to require a depth of competence and knowledge that current AI models lack. But can large language models be trained to write philosophical texts that are indistinguishable from those written by actual philosophers?
Researchers from the University of California, Riverside, the École Normale Supérieure (ENS) in Paris, and Ludwig-Maximilians-Universität München set out to address this question. They built a large language model that responds to philosophical queries in the style of a specific philosopher, fine-tuning OpenAI's GPT-3 on the work of philosopher Daniel C. Dennett. The researchers found that the model could produce responses that closely mirror the human philosopher's answers.
The third-generation Generative Pre-trained Transformer (GPT-3) is an autoregressive language model that uses deep learning to generate text. Trained on a massive corpus of text, it predicts the next word in a sentence from the preceding context. The researchers fine-tuned GPT-3 on Dennett's earlier writings so that his characteristic word-usage patterns carry more weight when the model predicts the next word in a sentence.
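The study does not publish its fine-tuning scripts, but the general workflow with OpenAI's legacy fine-tuning API looks roughly like the sketch below: excerpts from Dennett's writings are arranged as prompt-completion pairs, uploaded as a JSONL file, and used to fine-tune a GPT-3 base model. The file names, the `davinci` base model, and the hyperparameters here are illustrative assumptions, not details from the study.

```python
# Sketch of fine-tuning GPT-3 on a philosopher's writings using the
# legacy OpenAI Python client (openai < 1.0). File names, base model,
# and hyperparameters are illustrative assumptions.
import json
import time
import openai

openai.api_key = "sk-..."  # your API key

# 1. Prepare training data as prompt/completion pairs drawn from the
#    philosopher's texts (e.g., a question and a published answer).
examples = [
    {"prompt": "Interviewer: What is consciousness?\nDennett:",
     "completion": " Consciousness is not a single, unified phenomenon ..."},
    # ... many more pairs ...
]
with open("dennett_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# 2. Upload the file and start a fine-tuning job on a GPT-3 base model.
training_file = openai.File.create(
    file=open("dennett_finetune.jsonl", "rb"), purpose="fine-tune"
)
job = openai.FineTune.create(
    training_file=training_file.id,
    model="davinci",  # GPT-3 base model (assumed)
    n_epochs=4,       # illustrative hyperparameter
)

# 3. Fine-tuning runs asynchronously; poll until the model is ready.
while True:
    job = openai.FineTune.retrieve(job.id)
    if job.status == "succeeded":
        break
    time.sleep(60)

# 4. Query the fine-tuned model using the same prompt format.
response = openai.Completion.create(
    model=job.fine_tuned_model,
    prompt="Interviewer: Do human beings have free will?\nDennett:",
    max_tokens=200,
    temperature=0.7,
)
print(response.choices[0].text)
```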
To evaluate the fine-tuned model, the researchers asked Dennett ten philosophical questions and then posed the same questions to their language model, collecting four machine-generated responses per question without cherry-picking. They then asked 425 human participants whether they could tell Dennett's answers apart from the machine's. Expert philosophers and readers of philosophy blogs identified Dennett's responses correctly roughly 50% of the time, whereas ordinary participants with little or no philosophical background did so only about 20% of the time, roughly what blind guessing would achieve when one genuine answer is presented alongside four machine-generated ones. These findings suggest that a fine-tuned GPT-3 model can come surprisingly close to speaking in the voice of a particular philosopher.
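Because each of Dennett's answers appears among the four machine-generated answers to the same question, blind guessing corresponds to one-in-five odds. The short sketch below makes that baseline and the reported hit rates concrete; the per-group trial counts are assumptions for illustration, not figures from the study, and this is not the authors' own analysis.

```python
# Comparing observed identification rates with the chance baseline for
# a five-way choice (Dennett's answer plus four GPT-3 answers).
# Trial counts per group are illustrative assumptions.
from scipy.stats import binomtest

chance = 1 / 5  # one genuine answer among five options -> 20%

groups = {
    # group name: (assumed number of question trials, observed hit rate)
    "experts_and_blog_readers": (250, 0.50),
    "ordinary_participants": (1000, 0.20),
}

for name, (n_trials, rate) in groups.items():
    hits = round(n_trials * rate)
    test = binomtest(hits, n_trials, chance, alternative="greater")
    print(f"{name}: {rate:.0%} vs. {chance:.0%} chance, p = {test.pvalue:.3g}")
```

Under these assumed counts, the expert rate sits far above the 20% baseline, while the ordinary participants' rate is indistinguishable from guessing.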
While the language model delivered impressive results, there is still room for improvement. The team plans to develop the model further and apply it to more real-world scenarios, and is exploring how it might be turned into a tool that would be genuinely useful to philosophers and historians.