Unlocking AI Assisted Development Safely: From Idea to GA
Sam Wang | Sr. Technical Program Manager; Joe Gordon | Sr. Staff Software Engineer
At Pinterest we are continuously looking for ways to improve our developer experience, and we have recently shipped AI-assisted development for everyone while balancing safety, security, and cost. In this blog post, we share our journey of unlocking AI-assisted development, from the initial idea to the General Availability (GA) stage. Join us as we delve into the opportunities, challenges, and successes we encountered along the way.
Like many companies, we initially disallowed the use of Large Language Models (LLMs) until we thoroughly evaluated their legal and security implications. During that time, many engineers expressed interest in adopting AI-assisted development and began using it for personal projects on the side, making them eager to use it at work as well.
To determine the true potential of AI-assisted development, we needed to evaluate the impact and benefits while also identifying and addressing any significant risks and concerns associated with its implementation.
The first decision we had to make was build versus buy. While Pinterest possesses extensive in-house AI expertise and builds many of our developer tools, we recognized that developing everything from scratch was not essential to our core business. Opting to buy a vendor solution allowed us to expedite this process and provide our engineers with a polished experience and strong Integrated Development Environment (IDE) integrations. After careful consideration, we chose GitHub Copilot due to its feature set, robust LLM, and fit with our existing tooling ecosystem.
As with any new technology, the adoption of AI-assisted development comes with its fair share of risks and concerns. Addressing them required working cross-functionally with numerous teams throughout the company. The agility of the Pinterest Engineering team was really on display as we were able to scrappily pull together engineers from multiple teams outside of a regular planning cycle to execute. During every planning process we set aside some time for unplanned items, as we have learned that things can move quickly and we cannot plan for everything in advance.
We conducted a trial program to gather both qualitative and quantitative feedback on the usefulness of Copilot. While many companies ran trials of fewer than 30 people over just a few weeks, we decided to run our trial with around 200 developers over a longer duration. This was done to include developers in the journey and give folks an opportunity to try something cutting edge even if we ended up going in a different direction. This larger cohort also allowed us to ensure we had significant populations across various developer personas. Running the trial over a longer duration helped us control for the novelty effect and other measurement issues. Of the 200 or so participants, about 50% used VS Code, with many team members using JetBrains IDEs as well. The breadth of supported IDEs accelerated Copilot adoption.
To evaluate the trial, we applied our prior work on how to think about and measure engineering productivity. We looked at both qualitative and quantitative data, and spent time sampling real-time user feedback. Qualitative sentiment feedback was collected weekly through a short Slack-bot-based survey; we had previously noticed that Slack-based surveys have higher completion rates than email-based surveys, so we wanted to meet developers where they spend more time and reduce the friction of sharing feedback. Getting good quantitative measurements was slightly more complex. Our approach was to compare the relative change over time for the trial cohort against a control from before the Copilot trial. Running the trial for longer than just a few weeks helped us isolate external temporal influences such as holidays.
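As a rough illustration of the "relative change versus control" comparison described above, here is a minimal Python sketch. It is not our actual analysis pipeline: the function names, the choice of a single weekly metric, and all of the numbers are hypothetical and exist only to show the arithmetic.

```python
from statistics import mean


def relative_change(before, after):
    """Fractional change in the mean of a metric between two periods."""
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    return (mean(after) - baseline) / baseline


def lift_over_control(trial_before, trial_after, control_before, control_after):
    """Relative change of the trial cohort minus that of the control.

    Subtracting the control's change removes effects that hit everyone
    in the same period (holidays, release freezes, and so on).
    """
    return (relative_change(trial_before, trial_after)
            - relative_change(control_before, control_after))


# Made-up weekly values of some productivity metric (illustrative only).
trial_before = [10, 10, 10, 10]
trial_after = [12, 13, 11, 12]
control_before = [10, 10, 10, 10]
control_after = [10, 11, 10, 9]

lift = lift_over_control(trial_before, trial_after, control_before, control_after)
print(f"Lift over control: {lift:.1%}")
```

A longer trial simply means more weekly data points in each list, which makes the averages less sensitive to any one unusual week.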
In close collaboration with our legal team, we ensured that our usage of AI-assisted development adhered to all relevant licensing terms and regulations. Furthermore, in partnership with our security team, we conducted a thorough assessment of the security implications posed by AI-assisted development. We aimed to ensure that the code produced by Copilot remained within our control and was not used to train future LLMs.
Additionally, we placed high priority on preventing vulnerabilities in our codebase. Our security team leveraged vulnerability scanning tools to continuously audit all code introduced by both Copilot participants and non-participants. This comprehensive approach enabled us to effectively mitigate potential risks to our robust security posture arising from AI-assisted development practices among our engineers.
Expanding Towards General Availability:
Qualitatively, we used a short Net Promoter Score (NPS) survey to gather feedback. Early NPS results were really positive (NPS of 75), and we watched these increase as the trial continued. Our quantitative data was equally impressive, supporting the feedback we heard that Copilot was helping our teams be more productive. This overwhelmingly positive feedback included comments such as “Over time, Copilot has been giving better suggestions according to the work I am doing.” and “Copilot was particularly useful when I had to make a change in Scala, a language I am not familiar with. Being familiar enough with general language concepts, I could let Copilot take care of the syntax and still feel confident that I understood its suggestions.” Based on this positive feedback, we made the decision to expand access to Copilot to all of engineering in advance of our annual Pinterest Makeathon, which of course was very AI-focused this year. Since moving to General Availability, to increase Copilot adoption we have run training sessions, streamlined the process of getting access to Copilot through integration into our access control and provisioning systems, and partnered with our platform teams to help folks understand how best to take advantage of Copilot in different domains such as web, API, and mobile.
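For readers unfamiliar with how an NPS figure is derived, the sketch below uses the standard definition: on a 0 to 10 scale, the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6). The function name and the sample responses are hypothetical; the responses were chosen only to show one way a score of 75 can arise and are not our survey data.

```python
def net_promoter_score(ratings):
    """Standard NPS: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be on a 0-10 scale")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical responses: 16 promoters, 3 passives, 1 detractor.
sample = [10] * 16 + [8] * 3 + [5] * 1
print(net_promoter_score(sample))  # 75.0
```

Passives (7 or 8) count toward the total number of responses but toward neither group, which is why NPS can range from -100 to 100.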
The impact of our efforts spoke for itself. We unlocked AI-assisted development safely, going from idea to scaled availability in less than 6 months, and increased user adoption by 150% in 2 months, with 35% of our total developer population now using Copilot regularly. According to the technology adoption lifecycle, this puts us well into the early majority phase of adoption.
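The early majority claim can be checked against the commonly cited segment sizes of the technology adoption lifecycle (innovators 2.5%, early adopters 13.5%, early majority 34%, late majority 34%, laggards 16%). The small Python sketch below maps a cumulative adoption percentage to a phase; the function and constant names are hypothetical and used only for illustration.

```python
# Cumulative upper bounds (in percent) for each adoption phase, using the
# commonly cited segment sizes: 2.5 / 13.5 / 34 / 34 / 16.
PHASES = [
    (2.5, "innovators"),
    (16.0, "early adopters"),
    (50.0, "early majority"),
    (84.0, "late majority"),
    (100.0, "laggards"),
]


def adoption_phase(percent_adopted):
    """Return the lifecycle phase that a cumulative adoption level falls in."""
    if not 0 <= percent_adopted <= 100:
        raise ValueError("percent_adopted must be between 0 and 100")
    for upper_bound, name in PHASES:
        if percent_adopted <= upper_bound:
            return name


print(adoption_phase(35))  # early majority
```

At 35%, adoption sits between the 16% and 50% boundaries, which is the early majority band.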
Moving forward, we are dedicated to further improving the quality of Copilot suggestions by incorporating fine-tuning with our Pinterest source code, and to ensuring that as our teams leverage these technologies to go faster, they also do so safely, without introducing more bugs or incidents. We also know that this is just the beginning. With the rapid development of AI-assisted developer tools, we are constantly evaluating new opportunities to build, buy, and incorporate new technologies that improve our developer experience and increase developer productivity, in service of our goal of enabling every developer at Pinterest to do their best work.
Acknowledgements:
This work would not have been possible without a huge group of people working together over the past few months. We’d like to thank Shriman Gurram, Scott Hebert, Mark Molinaro, Amine Kamel, Andre Ruegg, Nichelle Carr, Roger Lim, Brandon Black, Kalpesh Dharwadkar, Orna Toolan, and Anthony Suarez.
Additionally, we’d like to thank all our trial participants for their support and feedback.
To learn more about engineering at Pinterest, check out the rest of our Engineering Blog and visit our Pinterest Labs site. To explore and apply to open roles, visit our Careers page.