Harnessing the Power of LLMs: How AI Transformed My Work in 2023

Why 2023 marked an explosive turning point for artificial intelligence and me

Tânia Frazão, M.C.S., D.V.M.
The ABCs of AI
12 min read · Mar 14, 2024


First-person view of a stage with a wood floor and big, bright red curtains.
Photo by Rob Laughter on Unsplash

Throughout 2023, artificial intelligence captured the fascination of technologists and the general public alike, thanks largely to the meteoric rise of large language models (LLMs).

As someone who has witnessed firsthand the rapid evolution of LLMs over the last 14 months, I feel compelled to chronicle this technology’s incredible journey as well as candidly address the strides still required.

What are LLMs?

Definition

Large Language Models, or LLMs for short, are a type of artificial intelligence (AI) that have been trained on a vast amount of data and designed to understand existing content and generate new, original content.

These models are trained on a massive trove of articles, Wikipedia entries, books, internet-based resources, and other inputs to produce human-like responses to natural language queries (prompts).

They process natural language inputs and predict the next word based on what they’ve already “learned”. Then they predict the next word, and the next word, and so on until their answer is complete.
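As a toy illustration of that loop (using a tiny, hand-written probability table I made up for this sketch, not a real neural model), next-word prediction can be pictured as repeatedly picking the most likely continuation:

```python
# Toy sketch of autoregressive generation. Real LLMs condition on the whole
# context with a neural network; this hypothetical table only looks at the
# previous word, which is enough to show the "predict, append, repeat" idea.

TOY_MODEL = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "<end>": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"<end>": 1.0},
    "dog": {"barked": 1.0},
    "barked": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}

def generate(start: str = "<start>", max_words: int = 10) -> list[str]:
    """Greedily pick the most probable next word until <end> is predicted."""
    words = []
    current = start
    for _ in range(max_words):
        candidates = TOY_MODEL.get(current, {})
        if not candidates:
            break
        # choose the highest-probability next word (greedy decoding)
        current = max(candidates, key=candidates.get)
        if current == "<end>":
            break
        words.append(current)
    return words

print(" ".join(generate()))  # prints "the cat sat"
```

Real models also sample from the probability distribution instead of always taking the top word, which is why the same prompt can produce different answers.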

In simpler terms, if you’ve ever used a chatbot (ChatGPT, Bard, etc.) or a tool that can summarize an email, translate text, or even generate creative ideas, you’ve likely interacted with an LLM.

They’re the technology behind many of the AI tools we use today.

Examples of LLM models

Some popular LLMs that I have personally used, and that I consider mainstream general-purpose and multimodal base models (read the article Compound AI Systems: The Future of Smarter, Team-Based AI), are:

  • OpenAI’s GPT-3 and GPT-4,
  • Google’s LaMDA and PaLM (the chatbot Bard), now rebranded as Gemini, with the models:
Gemini’s three model sizes, from Google AI, deepmind.google.com, accessed on 14 March 2024.
  • and Anthropic’s Claude 3, …

There are many, many more, like Hugging Face’s BLOOM and XLM-RoBERTa, Nvidia’s NeMo LLM, XLNet, Cohere’s models, and GLM-130B…

And many others are specialized in one particular task, such as generating images (for example, the Stable Diffusion model) or generating code.

Latest model advancements

These models have transformed over the past months into powerful AI tools with a diverse range of capabilities. LLMs can now communicate not only through text, but also by listening, speaking, and even analyzing visual information. A few of the latest advancements that I recall:

Gemini can analyze YouTube videos and images.

You can upload a document of roughly 500 pages to Claude 3 for analysis, versus roughly 300 pages for GPT-4 Turbo.

There are numerous use cases and workflows. My advice is to think about what your input is and what you want to generate.

To make things simpler, I made an infographic illustrating the diverse capabilities of large language models (LLMs) across various output formats and tasks. The top row shows icons representing different output types, such as text, code, audio, images, video, and APIs.

The colored dots correspond to three specific LLMs: GPT-4 (ChatGPT) represented by blue dots, Claude 3 (Claude.ai) represented by green dots, and Gemini (Google) represented by pink dots.

The grid visually displays the relative strengths and weaknesses of each LLM across the various output types considering the different inputs, allowing for a comparative analysis of their capabilities. This infographic provides an intuitive overview of the diverse applications and potential of LLMs in handling multimodal data formats.

Comparison matrix of different large language model inputs and outputs, including text, code, audio, images, and an API.

Key discussion points:

  1. All models support text input and output, as indicated by the filled-in circles in those columns.
  2. GPT-4 and Claude 3 are the only models shown that provide an API for developers to access the model’s capabilities programmatically. While the API is not an output format per se, having one enables integrating these models into other applications. Only the API supports all the input types listed (text, code, voice, images, video, and web content).
  3. Gemini, Google’s model, does not appear to offer an API. This suggests it may be more of a closed or experimental system at this stage.
  4. The chart does not provide any information on the relative performance, speed, size, or other differentiating factors between the models. Input/output support is just one dimension for comparison.
  5. As language models continue to evolve rapidly, this landscape may change significantly soon, with new capabilities and players emerging.
  6. Gemini can only search for videos on YouTube and analyze them; it does not generate any. I decided to include it in the discussion points because of this unique capability.
  7. GPT-4 Turbo can analyze videos. Link: https://community.openai.com/t/reading-videos-with-gpt4v/523568
  8. GPT-4 can read text aloud and listen to you while you speak, but only in the phone application, not the desktop version.
  9. Gemini can generate images, but only for some users at the moment, depending on your location.
  10. Only in Google AI Studio (https://aistudio.google.com/) can users upload documents for Gemini to analyze; this option is not available in the other user interfaces of Gemini and Gemini Advanced (https://gemini.google.com/).

What drew me to start utilizing this technology?

I first stumbled upon chatbots and their rapid development back in January. I was reading a general tech newsletter that mentioned ChatGPT in an article, and I was immediately curious to experiment with it.

Here is my first interaction with generative AI:

Screenshot of my very first personal conversation with ChatGPT; you can verify the month, “January,” in the top right corner. I immediately started using ChatGPT naively, as a friend from whom I sought advice, and it felt good to get something off my chest. I didn’t know then how this technology worked. We have to keep in mind that LLMs don’t yet know the actual meaning of words or feel them; they are just generating the next probable word. One may seek orientation and guidance, but for our own safety we need to know how these tools work under the hood. That’s good advice for any tool we use.

What personal experience do I have using them?

Using LLMs to Aid My Work

LLMs have become invaluable assistants in my work developing code, learning, and writing about AI technology and other subjects. Although I don’t want these models to do all the heavy lifting for me (because when they do I feel lazy and dumb), they provide useful guidance that enhances my productivity.

For example, when I’m designing code, I’ll describe the overall task I want to accomplish to an LLM. It will then suggest functions I may need, potential code frameworks, or specific libraries that could be helpful. With this advice, I’m able to code more efficiently while still ensuring I thoroughly understand each line by writing it myself. For computer science students and coders in general, what I usually recommend is cs50.ai.

cs50.ai website landing page. It has a green log-in button and a photo of a rubber duck, with a city with a bridge and a river as the background.
cs50.ai website screenshot by the author on November 28th, 2023.

I also rely on LLMs to proofread and strengthen my articles about artificial intelligence before they get published. I put myself in the reader’s shoes, carefully considering which topics would be most interesting and useful to cover. Then I ask an LLM to review my draft text. It catches typos or grammatical issues, validates the content flow, and recommends additions if certain points could be expanded on.

I also use LLMs for other small things: generating images, reading and summarizing email automatically, and now music, just for fun. Music makes you more productive, and I enjoy mixing different styles.

By becoming a core part of my workflow, LLMs give me a thoroughness and thoughtfulness I couldn’t achieve alone under tight deadlines. They’ve allowed me to improve the quality of both my coding projects and my writing.

The Hype Year 2023

I’m proud to say I’ve witnessed firsthand the rapid transformation of chatbots into advanced LLMs over the past year. The progress has been truly remarkable — it seems every month brought new capabilities and performance milestones. Even my knowledge of these tools grew exponentially. I am still and will be learning every day.

In January, the mainstream models GPT and PaLM 2 could only communicate through text. Today, leading models have evolved from merely mimicking conversation to demonstrating comprehension of diverse inputs, as I mentioned before, and with extensions and plug-ins their capabilities have been augmented further, for example, with extensions that allow LLMs to do mathematical calculations.

Specialized models can now synthesize striking images (DALL-E, Stable Diffusion), produce coherent audio and music samples (Google DeepMind’s Lyria), generate videos (Stable Video Diffusion), and more.

Use cases

Additionally, LLMs are beginning to excel across a widening range of use cases, no longer limited to conversational abilities. You can use them for a simple document summary; for example, LLMs like GPT and Claude allow document uploads. More complex use cases may involve additional technical skills: you can have a microphone recording a sports commentator in real time and have the LLM write, or even speak, about the game.

Individuals and businesses now have tools that they can use to automate their workflow.

If your work involves a lot of text, and if you copy and paste repeatedly or perform any other action numerous times, this may be a sign to start automating with LLMs.
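As a minimal sketch of what that kind of automation might look like, the snippet below walks a folder of text files and "summarizes" each one. The `summarize` function here is a stub standing in for a real LLM call; in practice you would replace its body with a request to whichever model API you use.

```python
from pathlib import Path

def summarize(text: str) -> str:
    """Stub standing in for an LLM call. A real implementation would send
    the text to a model API with a prompt like 'Summarize this document.'
    Here we just echo the first line, truncated, to keep the sketch runnable."""
    first_line = text.strip().splitlines()[0] if text.strip() else ""
    return f"Summary: {first_line[:60]}"

def summarize_folder(folder: str) -> dict[str, str]:
    """Replace repeated copy-paste: summarize every .txt file in a folder."""
    summaries = {}
    for path in sorted(Path(folder).glob("*.txt")):
        summaries[path.name] = summarize(path.read_text(encoding="utf-8"))
    return summaries
```

The point of the sketch is the shape of the loop: once the repetitive action lives in a function, swapping the stub for a real model call automates the whole batch at once.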

Rather than testing every niche AI application you find online promising the golden ticket to your problem, I recommend going directly to free tools (chatbots) like Bing Chat (GPT-4), ChatGPT (free GPT-3.5), and Claude.ai (Claude 3) to brainstorm what best fits your needs. Their flexible interfaces allow you to prompt in a more natural, human way and build on that, prompt by prompt, to match very specific use cases or workflows. Some of the models also have access to the web, so you can also ask the chatbot directly whether there are any AI tools available for what you need.

In my experience, I have gone through trial and error a couple of times, experimenting with AI tools, many of which offer trials, but I keep things pretty basic and low-cost.

If you’re tackling a particularly unique problem, search online communities on Reddit or elsewhere to find out whether someone has shared solutions to similar AI modeling challenges.

AI Startups

The thousands of AI apps you see online are likely built on top of underlying AI models that provide APIs (in other words, a way to interact with a model programmatically instead of via a chatbot) through platforms like OpenAI and Anthropic, which comes at a cost. This is still more affordable than developing proprietary models in-house, which requires substantial data, computing resources, and expense.

Others rely on free, open-source models available through Hugging Face and other model repositories.

Today’s tech companies, researchers, and innovators are in a relentless pursuit of the next big breakthrough in AI. The current AI landscape resembles a gold rush. The “gold” in this context is the immense potential that AI holds — from revolutionizing industries to solving complex problems that were once thought to be the exclusive domain of human intelligence. Some will keep going and attract users, others will be left behind. But my main advice here has already been said.

Brainstorm ideas first using the conversational models and then check AI tools afterward.

Sometimes it’s frustrating…

You can lose a lot of time trying to find the right prompts to get an LLM to generate what you want. Sometimes it feels easier to just do the work manually rather than spend time prompting.

However, mastering effective prompt engineering makes a huge difference in getting useful output from LLMs. Each model is trained differently, so you need to learn prompt engineering specific to that model through guides and experimentation.
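In my experience, a little structure in a prompt (a role, a task, explicit constraints, then the input) often helps, whichever model you use. The template below is my own illustration of that habit, not a rule from any model’s documentation:

```python
def build_prompt(role: str, task: str, constraints: list[str], text: str) -> str:
    """Assemble a structured prompt: role, task, explicit constraints, input."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"You are {role}.\n"
        f"Task: {task}\n"
        f"Constraints:\n{constraint_lines}\n"
        f"Input:\n{text}"
    )

prompt = build_prompt(
    role="a technical editor",
    task="Summarize the text below in two sentences",
    constraints=["Keep technical terms unchanged", "Do not add new facts"],
    text="Large language models predict the next word...",
)
print(prompt)
```

Keeping the template in one place also makes experimentation cheaper: you tweak one constraint line and rerun, instead of retyping the whole prompt each time.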

It’s also critical to always verify the accuracy of the LLM’s output. I found the generated text was often more wrong than right early on. The quality improved with practice but remains far from perfect. This means the LLM’s output typically mixes correct information with false information. If users aren’t careful to validate the text, they risk misleading themselves and anyone they share it with.

There is still a lot of work needed to improve LLMs. For now, they require extensive human guidance and verification to produce reliable, high-quality results.

Looking to the Future — Artificial General Intelligence (AGI)

The rapid pace of innovation in LLMs this past year excites me about their future potential.

I hope to see progress continue toward developing multi-modal models that combine strengths across audio, visual, and conversational capabilities.

Seamless integration of inputs and outputs could enable interactions with LLMs to become ever more natural and intuitive over time.

Advancement in the quest for AGI also holds enormous promise. As models become able to demonstrate more generalized “common sense” and human-like reasoning, they can take on more responsibilities currently reserved for people. Though current LLMs still have limitations, their evolution toward AGI could greatly expand how we integrate them into our work and lives.

Using Local Assistants — Microsoft Copilot and Google Duet

Local AI assistants like Microsoft Copilot and Google Duet offer some advantages over non-local models. With local assistants, you don’t have to constantly copy and paste information into the user prompt; the AI can directly read what you’re typing in real time. This makes the interaction faster and more seamless. Local AI assistants show promise for integrating AI writing assistance into daily workflows: for example, helping you write an email in Gmail, summarizing a webpage you have open in your browser, opening an app, and so on.

I have experienced both, and they are great for increasing your productivity even further.

Microsoft Copilot, from Microsoft.
Duet AI can reduce the burden of work by generating a summary from your relevant source documents and automatically building a presentation in Slides. From Google.

The Emergence of Powerful Open-Source Models

In the dynamic world of AI, a significant shift has been observed toward open-source models like Llama 2 and Mistral. These models aren’t mere academic feats; they have been setting new standards in application and operational efficiency, challenging the dominance of closed-source counterparts from OpenAI and Anthropic. As we move into 2024, this competition is expected to escalate further.

Yann LeCun of Meta, often considered a leading figure in the open-source AI ecosystem, offers valuable insights into this evolution. He highlights the pivotal role of open source in shaping the internet and software engineering. According to LeCun, the principles that led to the success of open source in those areas are likely to propel AI forward. LeCun emphasizes the benefits of open platforms: accelerated progress, enhanced security, and superior performance. He addresses a common concern in the closed-source community regarding the potential misuse of AI, arguing that open-source models enable faster identification and resolution of security flaws. I agree.

Centralization vs. Distribution in AI

The recent upheaval at OpenAI, including Sam Altman’s departure and subsequent return, underscores the risks of centralizing AI power. Echoing Nassim Taleb’s theories on fragile and antifragile systems, LeCun suggests that open-source AI, with its distributed and networked nature, offers a more resilient and robust alternative to centralized models.

Sam Altman from Saltwire.

The 2024 Outlook: A Hybrid AI Ecosystem

The trend for 2024 points toward organizations embracing a hybrid approach, integrating both open and closed-source models. This strategy not only balances risks but also capitalizes on the strengths of each paradigm, leading to a more diverse and competitive AI landscape.

In the end…

My LLM Winner in 2023 Award Goes To…

OpenAI’s ChatGPT (GPT-4) marked a significant milestone in 2023, as evidenced by its widespread adoption:

  • Over 100 million active users and two million developers have engaged with ChatGPT, showcasing its popularity and versatility.
  • 92% of Fortune 500 companies implemented the model, indicating its substantial corporate influence.
  • Most Fortune 1000 companies are still exploring and testing its potential, highlighting the ongoing evolution of AI integration in the workplace.
  • The ability to do data analysis, to save documents in formats such as .doc, .xlsx, and .ppt, and to let users create their own custom GPTs gives this tool great importance and utility.

This underscores GPT-4’s role as a transformative tool in both the technology and business sectors.

At the moment, in my view, the Claude 3 Opus model, released on 4 March 2024, is exceptional compared to the other models, GPT-4 and Gemini. I have used it a couple of times. However, it may not offer the practicality most users need, in my opinion. When it comes to the quality of generated text, logical reasoning, and accuracy, it outperforms GPT-4. This isn’t just my personal belief; it is also supported by the following table:

Conclusion

While challenges remain, such as the need for improved accuracy and the potential for misuse, the future of LLMs looks incredibly promising. The emergence of powerful open-source models and the trend toward a hybrid AI ecosystem in 2024 signal a new era of innovation and collaboration. As we continue to push the boundaries of what’s possible with AI, it’s crucial to approach these developments with a mix of excitement and caution, ensuring that we harness the power of LLMs for the greater good. I look forward to seeing how these technologies will continue to evolve and shape our world in the years to come.

Follow my ABCs of AI publication and subscribe for great AI content. No more FOMO, no more AI bewilderment.

Find me on social media platforms and say hi!

❤️Thank you for helping me share AI literacy with the world.

Copyright © MissGorgeousTech / ABCs of AI, 2024. Feel free to share with attribution and a link back to the original.

#artificialintelligence #guide #tutorial #LLM #copilot #gemini #ChatGPT #GPT4 #Claude3


Tânia Frazão, M.C.S., D.V.M.
The ABCs of AI

Computer scientist & vet (DVM) passionate about animals, the potential of generative AI, and Python. Shares insights on pet health and tech.