The Accessibility Illusion of Generative AI in Research

Sixth blog in the #AI4BetterScience series

Quentin Loisel
5 min read · Jul 30, 2024

Our previous blog explored why anthropomorphising generative AI technologies can hinder their practical use in research. Now, let's turn to another crucial aspect: the perceived accessibility of these technologies. While GPT-3.5 was the expected next step after GPT-3, the true revolution came with its chat interface, open to the public. This new access put the technology in everyone's hands, letting people test these tools in their own fields and realise their potential. However, the perceived simplicity often masks the sophisticated mechanisms behind these models, leading to suboptimal usage. This blog explores three facets of this illusion: the naïveté of initial interactions, the flexibility-usability trade-off, and the need to become a machine cooperation expert.

Just Asking for It Isn’t Enough

For many users, the “chat” is their only interface for interacting with large language models (LLMs) and other generative AI. Its popularity is largely due to its accessibility, which invites users to interact as if conversing with another human. This interface creates a partial illusion of simplicity: users ask the model what they want in a straightforward question and expect precise, insightful responses. However, this naive prompting frequently leads to disappointment, causing users to conclude, mistakenly, that the technology is inefficient or flawed. The real issue lies not in the model's capabilities but in the approach induced by the apparent simplicity of the chat interface. Effective use of LLMs requires understanding the model's capabilities, providing relevant data, and mastering prompt engineering.

Figure: The chat interface invites users into a naive, human-like conversation. Complex usages remain accessible in different ways, but quality, relevant outputs come only from understanding the model's characteristics, providing appropriate data, and mastering prompt engineering.
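
To see the gap in practice, here is a minimal sketch of naive versus engineered prompting. Both prompts are invented for illustration and the topic is arbitrary; the point is the structure, not the specific wording.

```python
# Naive prompting: a bare request with no role, context, data, or output format.
naive_prompt = "Write a literature review about exercise and cognition."

# Engineered prompting: role, scope, source material, and output format are
# all explicit. "{abstracts}" is a template slot to be filled with real data.
engineered_prompt = """You are an exercise-science researcher.
Using only the five abstracts pasted below, write a 300-word mini-review
on the effects of aerobic exercise on working memory in older adults.
Structure: one paragraph of findings, one paragraph of limitations.
Cite each abstract as [1] to [5].

Abstracts:
{abstracts}"""
```

The first prompt leaves the model free to invent its own sources and scope; the second constrains it to the data you provide and the shape you need.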

However, even this recipe is not settled: optimising AI interactions is an evolving field, and we regularly encounter surprises. For example, a community of researchers, engineers, and enthusiasts has been trying to make LLMs play chess accurately. Various prompting strategies have been tried, from asking the model to embody famous champions to restating the entire history of moves at each turn. Until recently, the best level reached by LLMs was below that of an average player. Then a counterintuitive prompt was discovered. It is simple: the opening header lines of a standardised chess record format called Portable Game Notation (PGN), which include metadata such as the date and the players' names. From this minimal information, the model GPT-3.5-turbo-instruct can now beat most of us!
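
For the curious, here is a minimal sketch of this PGN strategy using OpenAI's Python client and the legacy completions endpoint (the trick reportedly works with the completion model, not the chat models). The header values are invented for illustration, and model availability may change over time.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A bare PGN header plus the first move. Framed this way, the model tends to
# continue the move list like a game record rather than chat about chess.
pgn_prefix = """[Event "FIDE World Championship"]
[Date "2024.07.30"]
[White "Magnus Carlsen"]
[Black "GPT"]
[Result "*"]

1. e4 """

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # a completion model, not a chat model
    prompt=pgn_prefix,
    max_tokens=8,      # just enough for the next move or two
    temperature=0.0,   # deterministic play
)
print(response.choices[0].text.strip())  # e.g. "e5 2. Nf3"
```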

This example highlights that intuition alone is not enough to use these technologies optimally. Users need to move beyond surface-level interactions and engage with the underlying complexities. The situation reflects a kind of Dunning-Kruger effect, where the perceived simplicity and accessibility of these tools lead users to overestimate their proficiency.

Flexibility-Usability Trade-Off

While you use everyday language to interact, using these models effectively is closer to programming than to casual speech. Yet a prompt is not exactly a command either. It is an input that conditions a model generating text through the aggregation of word-by-word predictions. This type of interaction must be distinguished from programming in the strict sense, but it should not be assimilated into human communication either. It is a new way of interacting with the machine, one that has yet to be fully studied. As you progress in mastering this new machine cooperation, you understand that the liberty to ask anything comes with the complexity of asking it correctly. The more specific the request, the more complex the cooperation with the machine becomes.

This introduces a trade-off that computer and user experience (UX) engineers know well: the balance between flexibility and usability. To make a highly usable tool, you must constrain its capacity to a specific goal, reducing flexibility. Conversely, if you want a very flexible technology, you must keep the range of possible commands open, which increases complexity.

For example, some tools, such as Consensus for literature exploration, are designed for specific purposes. They constrain the model to particular usages, making them more straightforward to use. However, you will quickly hit limits if you try to use such a tool for something else, such as generating code. Conversely, if you want an output that no existing specific tool can provide, you will need to start from more general access to the model and use a more complex cooperation strategy, as the sketch below illustrates.
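
In code, the two ends of the spectrum can be pictured as a narrow wrapper versus raw model access. This sketch is purely illustrative (the function names, model, and prompts are assumptions, not any real tool's API): the wrapper is easy to use but does exactly one thing, while the raw call can do anything, at the price of taking the prompt-engineering burden on yourself.

```python
from openai import OpenAI

client = OpenAI()

def summarise_abstract(abstract: str) -> str:
    """High usability, low flexibility: one fixed task, no prompt skills needed."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Summarise the given scientific abstract in two sentences."},
            {"role": "user", "content": abstract},
        ],
    )
    return response.choices[0].message.content

def ask_model(prompt: str) -> str:
    """High flexibility, low usability: anything is possible, but the output
    quality now depends entirely on how well the prompt is engineered."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```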

Figure: The spectrum of generative AI technologies along the flexibility-usability balance.

The flexibility-usability trade-off is well known and not inherently problematic. Confusion arises with tools that sit in the middle of this spectrum. For example, using ChatGPT or Copilot might give the impression of direct access to the foundational model. In reality, users interact with a partially constrained model, fine-tuned with specific instructions. Most users don't realise that a hidden pre-instruction prompt frames the interaction and influences the outputs, as sketched below.
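
As a hedged illustration, this snippet contrasts what the user believes they send with what a ChatGPT-style product actually sends. The pre-instruction text is invented here; real products keep theirs undisclosed and far longer.

```python
# What the user believes they send:
user_turn = {"role": "user", "content": "Summarise this paper for me."}

# What the product actually sends: a hidden system message (invented here
# for illustration) silently frames every answer the model gives.
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. Be concise, refuse unsafe "
                   "requests, and always answer in the user's language. "
                   "Current date: 2024-07-30.",
    },
    user_turn,
]
```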

Becoming a Machine Cooperation Expert

To fully harness GenAI technologies, researchers need to develop expertise in cooperating with them. This involves skills and knowledge beyond a few tricks: a basic education in what these models are, learning prompt engineering, leading relevant iterations, supplying helpful data, and selecting the appropriate tool for each usage.

Indeed, researchers must choose the appropriate level of flexibility and usability based on their needs and their willingness to engage deeply. This means deciding between basic querying, prompt engineering, or even coding. Beyond selecting the right approach, researchers must continuously adapt their methods and understand how their interactions with LLMs evolve. It is essential, for instance, to distinguish irrelevant outputs caused by the model's limits from those caused by constraints designed to increase usability. This understanding ensures researchers get the best out of the technology while maintaining scientific integrity.

Figure: An adaptation of the Dunning-Kruger effect, relating confidence in one's representation of generative AI's capacities to the development of machine cooperation expertise.

Conclusion

In conclusion, the journey to effectively harnessing generative AI in research is more intricate than it first appears. While the open interface of models like GPT-3.5 provides an illusion of simplicity and universal accessibility, true proficiency demands deeper engagement with the technology. The three facets discussed here (naïveté in user interactions, the flexibility-usability trade-off, and the necessity of becoming a machine cooperation expert) highlight the complex reality behind using these advanced tools effectively. Researchers must move beyond surface-level usage to embrace the detailed art of prompt engineering and iterative interaction. This approach not only enhances the effectiveness of their work but also mitigates the risk of misinterpreting the capabilities of generative AI. Only by acknowledging and navigating these complexities can researchers leverage GenAI to its full potential, ensuring that its use is as transformative in practice as it is revolutionary in promise.

Thank you for reading!

Keep going with the previous issue: Do Not Humanise Artificial Intelligence (in research, at least)

