Probable Outcomes: Generating Custom Code with ChatGPT

Published in

Metropolitan Archivist

July 10, 2024

by Anton Sherin
Associate Archivist, Solomon R. Guggenheim Museum

Large language models (LLMs) such as ChatGPT (OpenAI, 2024), which are complex algorithms capable of generating natural language, have been hyped as a technology that will fundamentally change everything. Some of my peers are hesitant to engage with the interface until more is understood, and regulated, about how LLMs function. Though I closely follow coverage of the highly problematic aspects of LLMs, their promise of crowd-sourced ready reference prompted me to become an early adopter and explore how this tool could help me manage an overwhelming number of tasks. Many readers are probably, like me, archivists with non-technical backgrounds seeking creative solutions within highly constrained resources. The following account of my experience developing scripts with ChatGPT is intended to help fellow information professionals assess its possibilities and limitations in the context of their own work.

The feature I use most in ChatGPT is its code block (figure 1). When prompted, it returns a box containing syntax-highlighted code with a “Copy code” button in the upper right. The code usually includes helpful comments and is followed by a brief explanation. The first use I found for this box was quickly generating or troubleshooting Excel formulas that would otherwise have been prohibitively time-consuming to construct.

Figure 2: ChatGPT generating a SUMIF formula template in Excel
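For readers who have not used it, SUMIF adds up the values in one column only for the rows where another column meets a condition; its syntax is `SUMIF(range, criteria, [sum_range])`. In a hypothetical box list with series names in column A and linear feet in column C, `=SUMIF(A2:A100, "Series 1", C2:C100)` would total the linear feet for Series 1. The short JavaScript sketch below only illustrates what such a formula computes; it is not the formula shown in Figure 2, the data are made up, and it handles exact-match criteria only, leaving out Excel's comparison operators (such as ">5") and wildcards.

```javascript
// Illustration of the logic behind Excel's SUMIF(range, criteria, [sum_range]).
// Simplification: exact-match criteria only (no ">5"-style operators or wildcards).
function sumIf(range, criteria, sumRange = range) {
  let total = 0;
  range.forEach((cell, i) => {
    // Excel compares text without regard to case, so this does too.
    const matches = String(cell).toLowerCase() === String(criteria).toLowerCase();
    const value = sumRange[i];
    // Like Excel, skip cells in the sum range that do not hold numbers.
    if (matches && typeof value === "number") total += value;
  });
  return total;
}

// Hypothetical box list: series in one column, linear feet in another.
const series = ["Series 1", "Series 2", "series 1", "Series 3"];
const linearFeet = [2.5, 1.0, 0.5, 4.0];
console.log(sumIf(series, "Series 1", linearFeet)); // 3
```

As in Excel, leaving out the third argument sums the matching cells of the first range itself.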

ChatGPT generates text by predicting a statistically likely sequence of text tokens, based on patterns learned from an enormous volume of training data drawn from much of what is freely available, rather than behind a paywall, on the internet (Computerphile, 2023). This includes blogs and forums containing a plethora of explanations of how to code in virtually every programming language.

From Excel formulas, I branched out to more involved scripting, such as VBA macros, Power Automate flows, and SQL reports in Koha. For context, VBA (Visual Basic for Applications) is the programming language Microsoft provides for creating custom functions and automations in and across its applications (Spreadsheet Planet, 2024). Similarly, Microsoft Power Automate is a cloud-based service that automates workflows, which Microsoft calls “flows,” across a broad range of applications and services (Pandey, 2023). Finally, Koha is a web-based, open-source library automation package that runs on a database queried with Structured Query Language (SQL) (Koha, n.d.).

Figure 3: ChatGPT generating VBA for creating digital folder hierarchies from a spreadsheet
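The VBA in Figure 3 is not reproduced here, but the underlying idea, one spreadsheet row per folder with each column one level deeper in the hierarchy, can be sketched in JavaScript for Node.js. This is a minimal sketch under hypothetical assumptions: the spreadsheet has been saved as CSV with a header row, the columns run from the top level down (for example `Series,Subseries,Folder`), and no cell contains commas or quotation marks.

```javascript
// Sketch: create nested folders from a spreadsheet saved as CSV (Node.js).
// ASSUMPTIONS (hypothetical): a header row, one row per folder, columns
// ordered from top level down (e.g. Series,Subseries,Folder), and no
// commas or quotes inside cell values.
const fs = require("fs");
const path = require("path");

// Replace characters that Windows does not allow in folder names.
function safeName(name) {
  return name.trim().replace(/[<>:"\/\\|?*]/g, "_");
}

function createHierarchy(csvText, rootDir) {
  const rows = csvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  const created = [];
  // Skip the header row; every remaining row becomes one nested path.
  for (const row of rows.slice(1)) {
    // Blank cells are dropped, so a row without a subseries nests one level shallower.
    const parts = row.split(",").map(safeName).filter((part) => part !== "");
    if (parts.length === 0) continue;
    const dir = path.join(rootDir, ...parts);
    // recursive: true creates missing parent folders and does not fail
    // if the folder already exists.
    fs.mkdirSync(dir, { recursive: true });
    created.push(dir);
  }
  return created;
}
```

A call such as `createHierarchy(fs.readFileSync("folders.csv", "utf8"), "Processed")` would build the tree under a `Processed` folder; both names are placeholders. Because `recursive: true` tolerates folders that already exist, running it a second time does no harm.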

I was inspired to experiment with code snippets, or reusable blocks of code (Delange, 2024), by YouTube videos touting that ChatGPT could code entire websites. I found that JavaScript snippets executed in a browser console could perform robotic process automation (RPA), the automation of repetitive tasks such as extracting data or filling in forms (IBM, 2024), within the ArchivesSpace web interface. My institution does not currently have the IT support to run batch operations through the ArchivesSpace Application Programming Interface (API). The primitive apps that emerged from my tinkering sidestep the API by iterating through the URLs of a group of ArchivesSpace records and simulating mouse clicks and other actions on the HTML elements of their form fields. This is not a particularly elegant or robust solution, but as a one-person team, I still find it preferable to editing hundreds of records by hand.
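The pattern described above can be sketched as a browser-console snippet. This is a minimal sketch of one possible approach, not the snippets shown in the figures. It assumes that the console is open on the ArchivesSpace staff interface, and it loads each record's page in a hidden iframe on the same site, which will not work if the site forbids framing. The record URLs and CSS selectors are hypothetical placeholders that have to be replaced with real ones found by inspecting the form. Because it writes to live records, it is worth trying on a single test record first.

```javascript
// Sketch of a browser-console RPA loop for web forms such as ArchivesSpace's.
// ASSUMPTIONS: record URLs and CSS selectors are hypothetical placeholders;
// find the real ones with your browser's Inspect tool and test on one record.

// Pause so each page has time to save before moving on.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Set a field's value and fire the events that typing would trigger,
// so scripts on the page register the change.
function setField(doc, selector, value) {
  const field = doc.querySelector(selector);
  if (!field) throw new Error(`Field not found: ${selector}`);
  field.value = value;
  field.dispatchEvent(new Event("input", { bubbles: true }));
  field.dispatchEvent(new Event("change", { bubbles: true }));
}

// Simulate a mouse click on a button or link.
function clickElement(doc, selector) {
  const element = doc.querySelector(selector);
  if (!element) throw new Error(`Element not found: ${selector}`);
  element.click();
}

// Load a same-site URL in a hidden iframe; resolve once it has loaded.
function loadInFrame(url) {
  return new Promise((resolve) => {
    const frame = document.createElement("iframe");
    frame.style.display = "none";
    frame.onload = () => resolve(frame);
    frame.src = url;
    document.body.appendChild(frame);
  });
}

// Visit each record in turn and apply editRecord to its page. Failures are
// logged and collected rather than stopping the whole run.
async function processRecords(urls, editRecord, { open = loadInFrame, delayMs = 3000 } = {}) {
  const failures = [];
  for (const url of urls) {
    let frame;
    try {
      frame = await open(url);
      await editRecord(frame.contentDocument);
      await sleep(delayMs);
    } catch (error) {
      console.error(`Failed on ${url}:`, error);
      failures.push({ url, error: error.message });
    } finally {
      if (frame && frame.remove) frame.remove();
    }
  }
  return failures;
}

// Hypothetical usage (placeholder URLs and selectors):
// processRecords(["<record URL 1>", "<record URL 2>"], (doc) => {
//   setField(doc, "#placeholder_field_id", "new value");
//   clickElement(doc, "#placeholder_save_button");
// }).then((failures) => console.table(failures));
```

Collecting failures instead of stopping means one malformed record does not halt a batch of hundreds, and the resulting list shows exactly which records still need attention.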

Figures 4 and 5: Container indicator input box from an RPA snippet and the resulting output tab displaying locations.
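The input and output ends of a snippet like the one in Figures 4 and 5 could look something like the sketch below. It is hypothetical throughout: the input format (commas and hyphenated ranges), the function names, and the table layout are assumptions, and the step that actually looks up each container's location in ArchivesSpace is left out.

```javascript
// Hypothetical sketch of the input and output ends of an RPA snippet.

// Turn input such as "1-3, 7, 10" into ["1", "2", "3", "7", "10"].
// Non-numeric indicators (e.g. "A") are kept as typed.
function parseIndicators(input) {
  const result = [];
  for (const piece of input.split(",").map((s) => s.trim()).filter(Boolean)) {
    const range = piece.match(/^(\d+)\s*-\s*(\d+)$/);
    if (range) {
      const start = Number(range[1]);
      const end = Number(range[2]);
      for (let n = start; n <= end; n++) result.push(String(n));
    } else {
      result.push(piece);
    }
  }
  return result;
}

// Escape text before placing it in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a simple HTML table from rows like { indicator, location }.
function renderTable(rows) {
  const body = rows
    .map((r) => `<tr><td>${escapeHtml(r.indicator)}</td><td>${escapeHtml(r.location)}</td></tr>`)
    .join("");
  return `<table border="1"><tr><th>Container</th><th>Location</th></tr>${body}</table>`;
}

// Browser only: show the HTML in a new tab (pop-up blockers may interfere).
function showInNewTab(html) {
  const tab = window.open("", "_blank");
  tab.document.write(html);
  tab.document.close();
}

// Hypothetical usage in the console:
// const indicators = parseIndicators(prompt("Container indicators (e.g. 1-3, 7):"));
// (look up each container's location here, then:)
// showInNewTab(renderTable(indicators.map((i) => ({ indicator: i, location: "TBD" }))));
```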

The automations I created are saving me time, but because I have no formal training in programming, generating code snippets with ChatGPT is a highly iterative process. ChatGPT generates text only from the information I provide, and it can easily miss important elements of what I intended. It has trained me to describe the desired functionality and the necessary sequence of conditional operations very carefully. Asking ChatGPT to explain what each part of a script does quickly reveals where my instructions were unclear. With more use, my programming literacy is expanding, which in turn is speeding up the coding process.

A project typically begins with asking ChatGPT whether a specific script functionality is possible. If it says yes, it explains the recommended methods and languages and often provides an example code block. I have learned, however, that it is not always right about whether something is possible or about the best method to achieve it, and errors of this kind become more likely as requests grow more complex. I find it most effective to develop the small components of a complex function first and then slowly build them up into a complete script. Developing each component in a separate chat avoids cross-contamination between them before they are integrated in a master chat.

The bulk of the time spent generating a script goes to iteratively testing what ChatGPT spits out, replying with any console errors or descriptions of what went wrong, and then testing the modified code block it returns.
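That feedback loop goes faster when a failing snippet reports exactly where and how it broke. The helper below is a hypothetical example, not part of any snippet shown here: it runs one named step and, if the step throws, logs the step name, error type, message, and the first few lines of the stack trace in a block that can be pasted straight back into the chat.

```javascript
// Hypothetical helper: run one named step and, if it fails, log a short
// report that can be pasted back into ChatGPT.
function formatErrorReport(stepName, error, stackLines = 5) {
  const stack = (error.stack || "").split("\n").slice(0, stackLines).join("\n");
  return [
    `Step: ${stepName}`,
    `Error: ${error.name}: ${error.message}`,
    "Stack:",
    stack,
  ].join("\n");
}

async function runStep(stepName, fn) {
  try {
    return await fn();
  } catch (error) {
    console.error(formatErrorReport(stepName, error));
    throw error; // stop here so later steps do not run on bad data
  }
}
```

Wrapping each component in its own named step also mirrors the practice of building a script from small pieces: the report names the piece that failed.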

Figure 6: Correcting errors during code-generation iterations

Glitches occur frequently, such as a code block mysteriously returned in a different programming language from the previous iteration. Pointing out the error can sometimes produce a correct version, especially when I provide context for what went wrong. I also find it effective to go back to an earlier point in the chat and edit my prompt, though more often I begin a new chat rather than overwrite the old one. Despite all this, there are times when ChatGPT repeatedly gives me an incorrect solution. I suspect that either I am reaching some limitation of the data it was trained on, or the information I provided earlier is exerting too much influence. When I am caught in one of these cul-de-sacs, starting again in a new chat may change its approach to the problem.

While learning to interact effectively with ChatGPT has required a significant amount of energy, the cumulative return on that investment has left me with more time for other activities, like writing grants to hire processing staff. I do not believe that LLMs as they currently exist can solve all, or even most, of an archivist’s problems. There are some things I still prefer to do manually, like writing this article. I am wary of sharing personal and institutional information, or of generating text for professional purposes without knowing its sources. With code, I can quickly test the validity of the output without any need for attribution. For me, ChatGPT’s potential lies in its ability to help users build custom tools. The author Ted Chiang astutely quipped that “ChatGPT is a blurry JPEG of the web” (2023). But as with many tools, a determined user can work around deficits in its performance. I encourage those who are curious to take some time to explore how LLMs can help preserve bandwidth for the cognitive tasks that require your archival expertise.

Bibliography

Chiang, T. (2023, February 9). ChatGPT Is a Blurry JPEG of the Web. The New Yorker. https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web

Computerphile. (2023, March 7). Glitch Tokens [Video]. YouTube. https://youtu.be/WO2X3oZEJOA?si=11IuK1pi-xO15Vs_

Delange, J. (2024). What is a Code Snippet? Codiga. https://www.codiga.io/blog/code-snippet-definition/

IBM. (2024). What is RPA? https://www.ibm.com/topics/rpa

Koha. (n.d.). About. https://koha-community.org/about/

OpenAI. (2024). ChatGPT. https://openai.com/chatgpt

Pandey, S. (2023, March 15). How to Use Power Automate Workflows [Tutorial Guide for Beginners]. BrightWork Blog. https://www.brightwork.com/blog/get-started-with-power-automate-workflows

Spreadsheet Planet. (2024). What is VBA in Excel? https://spreadsheetplanet.com/excel-vba/