My AI alignment project: fixing open source issues

Anthony Duong
Jun 11, 2024

For the last 12 weeks, I’ve participated in the AI Safety Fundamentals AI Alignment Course, which involved a 4-week project and producing a publicly shareable output. My project was fixing open issues in TransformerLens, and my publicly shareable output is this: a blog post about why I chose this project and what my experience was like contributing to open source for the first time.

Why I chose this project

Some high-level reasons

When I first brought up the project idea to my cohort, I got a mix of reactions, from confusion, to curiosity, to excitement. I usually take this as a good sign: that I have an idea that not a lot of people have thought of and which could be very useful. I also hadn’t heard of anyone in the course choosing a project like this, and so I thought this could allow me to distinguish myself, and find a unique way to provide value. Finally, I’m a software engineer, so engineering is what I’m good at and enjoy most, and I was naturally excited about the project.

More specific reasons

These were the most important reasons, because my goal was to make it as easy as possible for myself to just get started on building in public and learning core skills, rather than to do something novel.

I could actually finish it

Since we had three weeks to develop our project, I asked myself, “Realistically, if I dedicate the next three weekends to this project, what could I get done?” At work, my instinct is to estimate how long something will take by how many PRs I’d have to get merged, so naturally, I thought, “On a new project, conservatively, one PR per weekend.” So my goal became three PRs, at which point I realized that “fixing some issues in an open source repo like TransformerLens” could be a very doable project.

It could easily be broken down into smaller pieces

Or rather, the project was built up from smaller pieces (issues) that had already been broken (opened). Not only would this make it easy to feel a sense of progress (for example, by saying that two of my PRs had been merged in the last two weeks), but it would also be less stressful to know that even if I managed to get only a single PR merged, that would still be valuable for me and others.

I could focus on doing

The course stated that “some participants spent most of their project time planning and researching what to do, rather than executing”. I wouldn’t need to spend any time on planning and researching, since all I’d have to do is set up the repo and pick some issues before I could start working on them. Figuring out what to do had mostly been abstracted away. The most creative part of the project was that I didn’t try to be creative at all.

I could learn by doing

I find that trying to learn everything from courses and textbooks before doing real things is a suboptimal way to learn. Nothing forces you to understand an idea quite like having to use it to achieve some goal in the real world. I also find that when the time between learning something and applying it is long, the learning is unmotivating, and by the time I need to apply it, I’ve forgotten much of it. Instead, I’d get my hands dirty, gaining constant exposure to things like PyTorch and transformers, and be able to reverse engineer a lot of what I hadn’t learned. I could learn about these in parallel, or even afterwards, to fill in gaps.

I could actually provide value

Fixing open issues in TransformerLens as a project was an idea I got from Neel Nanda. Someone found it funny that one of the 200 Concrete Open Problems in Mechanistic Interpretability was, as he put it, “cleaning up my codebase”, but I seriously think there’s truth to this. Software engineers know things like this matter, and I believe that if you want to provide value, you should do the things no one wants to do. TransformerLens was used by ~160 people, and people went through the effort of opening the issues, so my project would be useful to them, at least. Doing useful things has intrinsic value, but for me, it also has instrumental value. I find it much more motivating to work on projects that people use than on toy/personal projects.

I wanted the course to force me to start

Contributing to open source was at the top of the list of things I wanted to be doing in my life, but I’d been putting it off. With a deadline, I’d have to just start. And since I’d meet weekly with a cohort, I’d have to make progress every week. I also knew that with this project, my publicly shareable output would need to be a blog post, and writing blog posts was also at the top of my list, so this course would force me to do multiple things I wanted to do but had been putting off.

What my experience was like

An unexpected benefit has been the opportunity to share my experience and encourage others who’ve also long wanted to contribute to open source but found it daunting to get started.

It is a bit daunting. At work, you’re contributing to private repos, and so only your organization will see your work. This is a small subset of all the people on the internet, and you typically know them well. With open source, anybody on the internet can see it.

When you do start, it’s relatively challenging. Like at work, you’re contributing to a large codebase. Unlike at work, you’re working with strangers on the internet, and so you don’t have the same access or comfort level when it comes to asking questions. You need to “just figure it out” more, when it comes to setting up your environment and working with the codebase.

But when you do successfully contribute, it’s really rewarding. It was such a great feeling to get the first email from GitHub saying a maintainer had merged my PR. It was the confidence that without any help, I could do all the setup work, navigate a large codebase, and start contributing. I’d always envisioned a life for myself where I was working on open source projects outside of work, further building my skills, building more useful things for people, and showing what I’m able to do in public. This was the feeling of knowing that that life had just started.

I’ll definitely be contributing (and writing blog posts) more in the future. I’ll let you know how it goes!
