As an engineer, are you aware of the implications and risks of using generative AI?

Federico Dionisi
Published in Think-it GmbH
11 min read · May 22, 2023
A pencil drawing of a software engineer robot working at the computer by DALL-E

We hardly need explicit references to say that, as of today, the use of generative AI and large language models (LLMs) is becoming commonplace. Every day, new products are announced (Roth, 2023; Vincent, 2023), new research is published (Zhu et al., 2023), and AI-enabled software is released (‘Open-Assistant’, 2023), all exploring these artificial intelligence paradigms.

As the world embraces generative AI, we too are asking ourselves the same question: what is Think-it’s position towards this disruptive technology? More generally, what should the position of anyone doing contract engineering work on behalf of another organization be? What are the legal, intellectual, and ethical risks of using AI code assistants to create value for contracted partners?

In the next sections, we share insights and takeaways from ongoing research at Think-it on generative AI and its usage.

LLMs at Think-it

Currently, the use of LLMs within the Think-it collective is largely limited to writing and documentation tasks. The Growth team has been using them for some time to tailor messages to specific users. Our Ops team has been taking advantage of these generative tools to produce high-quality content that helps them complete their tasks. Our Technical Owners have used machine learning models to expand and rephrase content, becoming more productive overall.

Within the Engineering domain, we have started exploring how suitable LLMs are as an omnipresent coach supporting personal and professional growth. Meanwhile, our Data Engineering, Data Science and AI Chapter is diving deeper into the implications of these algorithms in our environment, with the goal of drafting more formal usage policies and opening new perspectives on Software Quality Assurance (SQA): automation tasks, documentation tasks, and code reviews. Finally, some of our SDEs got curious as well and asked reasoning-capable AI models for quick help with engineering questions and tedious transactional tasks.
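To make the SQA angle concrete, here is a minimal sketch of the kind of experiment the Chapter runs: asking an LLM to draft a first-pass code review comment. This is illustrative, not Think-it’s actual tooling; it assumes the official openai Python package (v1+) with an OPENAI_API_KEY in the environment, and the diff, prompt, and model choice are placeholders.

```python
# A minimal sketch (not Think-it's actual tooling) of LLM-assisted code review.
# Assumes the openai package (v1+) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

diff = """\
-def total(prices):
-    return sum(prices)
+def total(prices, tax=0.0):
+    return sum(prices) * (1 + tax)
"""  # placeholder diff

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model choice
    messages=[
        {"role": "system", "content": "You are a concise, careful code reviewer."},
        {"role": "user", "content": f"Review this diff and flag any risks:\n{diff}"},
    ],
)
# The draft comment still needs to be vetted by a human reviewer.
print(response.choices[0].message.content)
```

Even in such a small experiment, the questions discussed later in this article apply: the diff leaves our machines, so it must never contain partner code without explicit agreement.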

It’s already very interesting to see our collective members taking proactive initiative in exploring potential applications of machine learning in our day-to-day work. It’s important to understand the capabilities of ChatGPT and friends early on, as many argue they will enhance and evolve industries close to us; customer service, project management, and sales will all face the revolution that OpenAI unleashed (Daugherty, Wilson and Narain, 2023; Nieto-Rodriguez and Vargas, 2023; Sinha, Shastri and Lorimer, 2023).

Surprisingly, though, very few of our engineers are using generative AI for what developers worldwide use it for: AI-powered code assistants. In a quick internal poll, only 3% of Think-iteers reported using code-enhancing tools such as Copilot.

AI code assistants

In the past year or so, AI code assistants have boomed. GitHub Copilot’s release to the general public (Dohmke, 2022) took the industry by storm. Around the same time, Amazon announced CodeWhisperer (AWS announces Amazon CodeWhisperer (Preview), 2022), and Google revealed that Googlers benefit from code assistant software developed in-house (ML-Enhanced Code Completion Improves Developer Productivity, 2022). But it’s not only the big three wondering about pair programming with a machine learning algorithm; long before the big names announced their initiatives, there were already a few visionaries. Tabnine, one of the first AI code assistants on the market, has survived to this day (AI Assistant for software developers, no date). Founded in 2013, it predates by far even Think-it’s own Predictive Coding initiative, and it has now raised a total of $32.1m (Tabnine — Crunchbase Company Profile & Funding, no date), highlighting the vivid interest in this market.

After all, the benefits are palpable. In a marketing effort, GitHub conducted research (Ziegler, 2022) on their Copilot tool. The study explored how AI-enabled code completion systems can improve developer productivity by generating helpful code suggestions based on the context of the developer’s current activity; the outcome suggests that the quality of suggestions is more important than their correctness (Ziegler et al., 2022). This result is interesting indeed: developers are more productive doing “small code reviews” of suggestions than writing all the code on their own.

In a later blog post, Kalliamvakou (2022) went deeper into the initial research, focusing on what GitHub means by developer productivity and on Copilot’s impact on users. Overall, survey respondents reported less frustration doing their job, the ability to focus on more important (and fun) tasks, and an increase in speed.

Biases aside, the research highlights the benefits of AI code assistants for both developers’ productivity and job satisfaction. GitHub’s conclusion is strengthened by Accenture’s experience with Amazon CodeWhisperer: Accenture reports that CodeWhisperer reduced its development effort by 30%, improving developer productivity, accelerating onboarding for novice developers, detecting security threats early in the development process, and empowering developers to responsibly build secure and syntactically correct applications (Viswanathan et al., 2023).

But nothing is as simple as some results in a chart. At Think-it, we often work on multiple, diverse partner projects, where we handle intellectual property for the partner’s benefit. We are the medium for our partners’ success, and we’re good at it. To keep up with the times, it may feel obvious to aim even higher by subscribing our SDEs to generative AI tools and running with them. But it’s not as easy as it seems.

A pencil drawing of a robot helping a software engineer working at the computer by DALL-E

Legal, intellectual and ethical issues

Current LLMs are trained in the wild. As days pass, the field evolves, and with it the legal questions. The most striking news came out recently; ‘ChatGPT banned in Italy over privacy concerns’ (McCallum, 2023) is the title of a BBC article that read like an April Fools’ joke, but it wasn’t. The OpenAI chatbot has since been allowed to operate again on the Italian network (Mukherjee and Vagnoni, 2023), but the event still highlights the sensitivity of the problem. Privacy concerns are real, as it is currently impossible to make the OpenAI chat assistant forget anything it was told. An example of this is Samsung’s recent disaster (Lewis, 2023), in which employees leaked company secrets to the bot; those secrets are now accessible to the world with no recourse.

This is already enough to make us stop and reflect: what does this mean for contracted engineers working with partners? As it stands, only one engineer in the Think-it collective is using tools such as Copilot, which is powered by OpenAI’s Codex model (GitHub Copilot · Your AI pair programmer, no date), and only on internal projects. While we can agree this is less consequential than working on a partner project, it would still be detrimental if that code were leaked to the general public. This reason alone would call for a clear policy agreed among our engineers and our partners, but there’s more.

Copilot and other generative AI software have a bigger, wider issue: they may infringe on copyright without the user even knowing (Appel, Neelbauer and Schweidel, 2023). GitHub’s Copilot specifically is a liability, as the company has not yet revealed the material used to train the model (Gingerich and Kuhn, 2022). This makes the problem even bigger: even if a partner agreed to our using tools like Copilot, the code we produce could be classified as plagiarism, and the partner could be liable in case of an audit. Sure, it’s rare to hear about software being audited for plagiarism, but given the above concerns, it may become more common in the future.

Even if the final decision to allow a generative AI tool rested with the partner, our brand would still take a hit. Moreover, ethically speaking, simply washing our hands of it wouldn’t deliver the best experience to our partners; it would be a selfish act going against our values of integrity and collectivism.

How to safely explore the possibilities

From an individual perspective, exploring AI is fairly straightforward. As previously suggested, different service providers offer proprietary solutions; once you agree to a third party’s terms of use and privacy policy, you are all set. Moreover, the open-source community is growing quickly, providing a vast range of tools, ideas, and proofs of concept. From an organization’s perspective, however, the exploratory investigation becomes more complicated. Hugging Face (Hugging Face — The AI community building the future., no date) is a great place to discover and keep up with the latest machine learning trends. Unfortunately, model licensing is not always transparent, and not all available models can be used commercially (Large Language Models for Commercial Use, 2023). For an individual performing work on behalf of an organization, those aspects can be troublesome.
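One lightweight safeguard is to check a model’s license programmatically before pulling it into a commercial project. Below is a minimal sketch using the huggingface_hub client; the repo id and the license allow-list are illustrative assumptions, and any real allow-list should come from your legal counsel, not from this snippet.

```python
# A rough sketch: look up a Hugging Face model's license before commercial use.
# The allow-list below is an illustrative assumption, not legal advice.
from huggingface_hub import HfApi

ALLOWED_LICENSES = {"apache-2.0", "mit", "bsd-3-clause"}  # adapt with legal guidance

def model_license(repo_id: str) -> str | None:
    """Return the license id advertised on the model card, if any."""
    info = HfApi().model_info(repo_id)
    # Licenses are usually exposed as "license:<id>" tags on the hub.
    for tag in info.tags or []:
        if tag.startswith("license:"):
            return tag.split(":", 1)[1]
    return None

license_id = model_license("bigscience/bloom")  # illustrative repo id
verdict = "usable" if license_id in ALLOWED_LICENSES else "needs legal review"
print(f"{license_id}: {verdict}")
```

A check like this is only a first filter; a permissive-looking tag still doesn’t tell you how the model was trained, which is exactly the kind of question discussed below.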

AI is snowballing, and with it comes a pressing need for responsible AI (RAI). Renieris, Kiron and Mills (2022) express the same feeling but also highlight the difficulties of implementation; pursuing RAI isn’t a predefined process to be followed, but must be an integral part of an organization’s beliefs and behaviors. It includes top-management involvement, training where necessary, and considering society as a stakeholder. It takes investment, but it gives the organization a more holistic view of the technology and its broad impact (Renieris, Kiron and Mills, 2022).

As we start to explore AI and its use, it’s therefore imperative that we keep the conversation alive among all stakeholders. Our decisions must be well aligned, and every study we conduct should reinforce our understanding of what it means for us to use generative AI responsibly. We must let these initiatives guide us according to our values and the impact we want to have on wider society.

At the same time, we cannot ignore the benefits of AI code assistants shown earlier (Kalliamvakou, 2022; Ziegler, 2022; Ziegler et al., 2022); but at what cost? How was the model trained? What are the terms of use? Is the developer’s code used to re-train the model? Tabnine (AI Assistant for software developers, no date) and Amazon CodeWhisperer (AI Code Generator — Amazon CodeWhisperer, no date) appear to address these concerns better, but understanding their position goes beyond the purpose of this research. Furthermore, while answering these questions, we as engineers must ensure that any work done on a partner’s behalf is done with the partner’s awareness of, and agreement to, the use of these tools. Failing to do so may put people in the unfortunate position of having to justify themselves and could have bigger legal consequences down the line.

Different communities and groups now compare the emergence of generative AI models to historical disruptive inventions such as the printing press, with the potential to initiate a new industrial revolution (Candelon et al., 2023; Cremer, Bianzino and Falk, 2023; Devlin, 2023). As time goes by and the use of generative AI only increases, we as engineers need to keep reflecting on what responsible AI really means. At Think-it we’re taking a proactive position towards this new technology, and we’ll continue sharing our insights as we go.

References
