Explorations in AI Tooling: Initial Thoughts

Marissa Biesecker
9 min read · Nov 14, 2023


The only constant is change. — Heraclitus

Technological innovation has moved at hyper-speed for my entire life. First personal computers and the internet, then “high-level” programming languages and frameworks, then smartphones, with a boom all along of people using and building on these platforms. Now, we enter the era of Artificial Intelligence.

It seems to follow naturally, then, that my profession, programming, is fast approaching a revolution of its own: AI-driven development. Others have already written about it in numerous articles. I also have colleagues who have already integrated AI tools like GitHub’s Copilot into their workflow, experimented with Tabnine, or used ChatGPT to generate code. I’ve been trying to explore the field more and have been listening to podcasts like Practical AI and The AI Podcast, where I’ve heard about even more options, like Codeium, and other tools, apps, and models on Hugging Face.

If it wasn’t already obvious, I’ve been going down quite a rabbit hole to learn more about AI and Large Language Models (LLMs). I don’t want to be scared of this; quite the opposite, I want to be excited about it, learn to use it, and help build solutions to the world’s problems with it. But I also have reservations. I grew up reading Isaac Asimov’s I, Robot, I’ve been reading the news about artists’ and striking writers’ concerns, and I’ve seen the movies coming out with AI as, or AI-wielding, protagonists. If the technologies themselves aren’t the moral and philosophical concern, the concern is how humans will use them, or how we (programmers) might get them wrong.

I need to understand the technology better. I am far less nervous about anything when I feel like I understand it. So, I started experimenting and researching more. Like most people, I started with ChatGPT, the free 3.5 version, which was trained on a dataset that ended in September 2021. I started writing some prompts, trying to emulate the style in the AI Driven Development article. The chat environment contained in the browser was a low barrier to entry and felt like an easy and safer place to start. I enjoyed this little experiment and this tool. I especially liked how it felt like it was helping me become a better prompter, which is a good skill for working with other humans, too. But it wasn’t the experience I was expecting or desired. I wanted to try one of the tools that would integrate into my usual development workflow, not just one that added to it.
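As an aside, the same kind of prompting can be done programmatically rather than in the browser. Here’s a minimal sketch using OpenAI’s Python library (the pre-1.0 interface that was current when I was experimenting; the model name, prompt, and key are placeholders, so treat this as illustrative, not gospel):

```python
import openai  # pip install "openai<1.0" for this interface

openai.api_key = "sk-..."  # your own API key; never commit this to a repo

# The same request I was typing into the browser chat, expressed as an API call.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful senior Python developer."},
        {"role": "user", "content": "Write a function that deduplicates a list while preserving order."},
    ],
    temperature=0.2,  # lower temperature keeps generated code more predictable
)

print(response.choices[0].message.content)
```

The browser chat is essentially a UI wrapped around this loop, which is part of why it felt like such a low-stakes place to start.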

However, when I started looking into the IDE (integrated development environment) tooling out there, like GitHub’s Copilot, Tabnine, and Codeium, I started feeling more and more uncomfortable about it. Then it hit me, just like an early 2000s song that’ll probably pop into every millennial’s head:

Just as Fat Joe and Ashanti said…

It’s about trust

I didn’t trust them enough to start using them right away. But I wanted to take the time and space to explore and articulate why.

Because of the world we live in, every great new advancement that can bring so much good can also bring an equal amount of bad, or at the very least pain. We saw this with dynamite: invented as a safer tool for construction and mining, it has unfortunately been used even more as a primary tool of war and killing. We’ve seen negative consequences more recently with the internet and social media. While they are tools to easily share and search for information and stay connected to friends and loved ones, they have also contributed to reducing people to data points for profit, to cyberbullying, and to declining mental health (to name just a few of the emerging side effects).

As we enter yet another technological revolution with AI, we can’t predict exactly what will happen, but we can analyze similarities and patterns from past events to learn and theorize about what is likely to happen. I realized that I’m not as naive and readily trusting of these new technologies as I was when Facebook and Reddit entered the scene. I’m nervous about what they are going to do with the data I need to provide, and about what data I might be inputting into the tools. Another important consideration is a desire to know whether I’m putting in additional (free) labor to help train them, without benefit or compensation for this work, only for the result to possibly become inaccessible (thanks, Reddit) or sold back to me at an increased cost later.

As a result, I’ve been asking myself: how can I have enough trust in these tools to use them? To try to answer this question, I’ve been doing some deep thinking and having some great discussions with friends and colleagues. Those have led me to some additional questions and issues.

1. Equity for contributions.

I went on a trip recently, and my friend kept asking ChatGPT for travel and itinerary recommendations. I found myself cringing at this while at the same time being excited and fascinated. There was no way the individuals behind the information ChatGPT provided could be compensated, other than by the service itself. If you haven’t used it yet, it’s like a more complete and advanced Google search without the pile of ads and junk results you still have to click through and extract information from; it gives only a direct response to the question asked.

However, I was feeling guilty, and wanted to find comparatively imperfect human recommendations, to give credit to an individual contributor who actually did the work of visiting the area and writing about the experience. The same experiences and articles that likely trained ChatGPT in the first place. If you ask it how it was trained, the response is something like: “I was trained on a mixture of licensed data, data created by human trainers, and publicly available data.” It is certainly uncomfortable to feel like we are moving further and further away from human responsibility and contribution, into a murky area of machine-made content that can’t really credit the people it learned from, all the while sending profits to a few already hugely profitable tech companies.

I am feeling even more sensitive to this issue as a programmer. These tools will likely reduce the number of programmers needed, or at least change the jobs as we know them and have trained for them. Because I am human, I naturally fear this. I wonder: if I use these tools, am I training them to put me out of a job? Or am I training myself to be ahead of the curve with these technologies and in a better position for a future job? I fear this not only for myself, but for those who come after me, especially those with less experience. Just like the Industrial Revolution, there will certainly be a period of painful transition, with lots of uncertainty for me and my colleagues.

GIF by Rebecca Hendin

On this topic, I’ve concluded that, for myself, I will try to simply trust what the makers of these tools say about training, contributions, and usage. For now, it is almost impossible to know for sure whether it is true, or will remain true. Besides, I may have already helped, in part, to train these tools. I save most of my personal projects publicly on GitHub, and GitHub has already said that Copilot was trained on public repositories, as were many other AI tools. I also like to share information publicly on my website and in articles like this one, which OpenAI could be feeding into ChatGPT, or any other company into their model. I have reasoned that it is far more important for me to train with these tools and learn to use them, for myself and my future, than to fear the uncertainty. However, I still have a related concern.

2. Data privacy.

We have been shocked to learn in recent history just how much information we have given away on the internet, about ourselves, our knowledge, and our experiences, and how the utopia of freely shared, accessible knowledge has been abused (ahem, Cambridge Analytica). Now I often think about how these AI tools will only make it easier for a few powerful people to farm information, find its statistically relevant patterns, and continue to abuse and manipulate other people.

However, is the damage already done? Have we already put so much of ourselves and our knowledge online that it is too late, and any AI will be able to learn and grow from all that is already out there, and from what people will continue to add, even despite better efforts at privacy? Am I already using, and implicitly trusting, tools that are actually violating my privacy and abusing the data I wish to keep private?

Somewhat unfortunately, I’ve come to the conclusion that the answer to these questions is yes. Answering in the affirmative took me by surprise. I came to this conclusion after asking a colleague if they trusted GitHub’s Copilot. The response was that they already use GitHub and trust that it ignores the files it is supposed to and honors privacy concerns, so why wouldn’t they trust this new product? I envy that quick and confident response; I have struggled to be so trusting. I feel like if I use Copilot, I give up control over the privacy of my code and rely on trust alone. Right or wrong, at least for now, I feel like I control what I commit and send to GitHub from my code editor. If I start using Copilot, it will see everything I type in my editor, not just what I decide to show it. I am still exploring how it actually works to confirm whether this is a correct mental model, as it does feel a bit alarmist. My current understanding creates a very real difference in trust between my current use of GitHub and using Copilot, one that makes my distrust harder to shake. But, as my friend also asked, is any of my work really so precious that I need to be very concerned about this? Probably not.
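That said, there are knobs that give back a little of that control. As a sketch (assuming VS Code and the Copilot extension; the exact setting keys are worth verifying against the current docs), you can disable Copilot for the file types most likely to hold secrets or private notes in settings.json:

```jsonc
{
  // VS Code settings.json: enable Copilot by default,
  // but keep it out of languages/file types I'd rather it never read.
  "github.copilot.enable": {
    "*": true,
    "plaintext": false,
    "markdown": false,
    "yaml": false
  }
}
```

This doesn’t change what the extension can see in files where it stays enabled, but it does shrink the surface area I’m asked to take on trust.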

Thinking so deeply about these concepts has also started to make me wonder whether I’m too trusting of the other technology I use. Or is it just lack of understanding, the fear of others, and a perceived lack of control around AI tools that fuels my distrust of these new tools? Perhaps I am a bit too trusting of the current technology and tools I use, but I’ve trusted both social and legal contracts thus far to mitigate most of that risk, and covered the rest with deeper knowledge and understanding on my part. Because legal contracts certainly lag reality in the case of AI, and social contracts are only as strong as the number of people supporting and upholding them (for now, far fewer than for other tech and tools), I need to mitigate risk and build trust mainly by thinking through my fears and understanding the tech and tools.

This is taking time. Time that I am currently privileged to have. Now it’s time to just choose a tool and take the plunge. Should I use GitHub’s Copilot because I’m already a GitHub user? Or should I take a chance on one of the newer tools and companies, like Tabnine or Codeium? A large part of me wants to move away from Microsoft (a major investor in OpenAI, ChatGPT’s maker, and owner of GitHub), take a chance on and support one of these smaller companies, and spread more of the technology risk around. I think I will write a follow-up article on the tools I’ve mentioned, and a few more I’ve been finding, sharing more about them and how they work, which I will ultimately use to decide which tool I want to use.

Going down the rabbit hole, just like Alice in Wonderland
To be continued…

Update: I chose one! Read about it here
