Unhack all the code!
We are a tiny startup on a mission to make the world’s code safe. Code safety requires constant vigilance, knowledge of obscure corner cases, and a lot of time and patience. It is tedious and error-prone. Our key observation is that many different developers find similar bugs and perform similar fixes in different codebases. Unhack breaks this repetition by generalizing one developer’s fix to similar bugs in other codebases.
As a developer, I strive to write code that I am proud of, and that other people enjoy building upon. But, even after many years of programming, writing good code fast is still a struggle.
At each moment, I not only think about the actual problem I am solving, but I also have an extra thought thread that checks for the many ways in which my code could go wrong, or could be improved:
- Is there a better API for this?
- Could this be a security vulnerability?
- What is the right idiom in this language?
- Am I following the team/project’s conventions?
- Is it dangerous to touch this configuration?
- Am I hurting performance?
I answer each of these questions by mentally pattern matching against a long list of things I learned over time. But the list of mental code patterns is always too short: there are always things that I do not know about, or things that I simply forgot. Furthermore, this extra thought thread slows me down and distracts me from the problem I set out to solve in the first place. And slowing down is not usually an option.
“Move fast and break things.”
The motto that gives quick results. And pain. It trades short-term gains for long-term fragile code, which slows down development, hampers performance, and causes security vulnerabilities. Facebook finally realized this, changed their slogan to “Move fast with stable infra”, and threw many developers at the problem. Smaller companies aim for the same but this is hard to achieve with limited resources.
So, with limited resources, what can we do? One could argue that we should accept the status quo. That it is natural for human developers to spend time pattern matching and fixing code, that this is simply what a dev does, that there is no way around it. I always found this hard to accept.
If you look carefully at many of these patterns, you come to realize that:
- There is a long tail. The extremely common problems are handled by linters and refactoring tools. But, from fixing bugs, to upgrading to the latest library or framework, to issues that pop up during code review, there is a long tail of code problems that are not addressed by any tool.
- Most patterns are quite simple. They can be explained in a few words. The fact we have humans check for simple code problems is almost an insult to human intelligence.
The solution is to teach machines to fix our code. There are still technical, process, and mindset challenges to overcome. But comprehensively checking a long list of patterns is precisely what computers are good at. We should do it!
This drove me to create Unhack. I wanted something that takes care of the details and allows me to focus on what’s important.
What is Unhack?
Unhack is a platform that makes it easy and affordable to improve code safety by automating away repetitive code fixes. It is composed of:
- a new code transformation language, based on matching/rewriting logic, specialized for fixing code in programming languages
- a serverless, scalable cloud runtime that analyzes and learns from code
- a GitHub App with a web interface for monitoring a repo and fixing code
This article focuses on the GitHub App and the UI/UX. I will follow up with articles detailing the underlying language and our infrastructure.
How is Unhack different?
Code analysis, linting, refactoring, etc. have been around for ages. A few tools have gained traction, but their use is still limited to enthusiasts and devs with enough experience to care about code quality, and enough time to go through convoluted setup processes. Extending the tools to solve project- or team-specific problems is even harder, often requiring deep expertise in parsing and compilers. So, how is Unhack different?
We started from first principles with the aim of making code fixes painless. We strip away accidental complexity from all layers: setup takes 1 minute, applying a fix is a single click, and extending the tool does not require any parsing or compiler expertise.
Technology: Rewriting Logic and AI
We developed our underlying technology with support from the National Science Foundation, and it builds upon cutting-edge research on:
- program verification and rewriting, from the FSL lab at UIUC,
- code translation, with people from IBM T.J. Watson, and
- program transformation inference, with people from Samsung Research America.
User Experience: Obsession with simplicity and flow
Flow is essential for productivity and, well, happiness. So Unhack aims to never break it. This is the reason we chose our first integration to be a web interface as opposed to, say, a code editor plugin. Having the editor highlight fixes while writing code distracts attention and breaks flow. Once code is committed, the dev is ready to switch context to code improvement. We will integrate with code editors once we figure out how to not break flow.
Common actions should be fast and easy. If something can be done in one step, it should be done in one step. If something requires the dev to remember to check something, or remember a sequence of steps, then we can do better. All of this means that we spend an unhealthy amount of time obsessing over UX and UI choices. We want the dev’s interaction with the tool to be smooth and flawless. We’re not there yet, so your criticism is more than welcome.
Extensibility: Everyone is welcome
Extending traditional code analysis and transformation tools is the exclusive realm of experts, of devs who spent months to years tinkering and learning how program analysis works, and the API and quirks of particular tools. But it does not have to be this way. Most of that effort is just translating common programming concepts into their equivalent program analysis concepts (or the five different names of the same concept — we’re good at coming up with names). Most of the complexity is accidental.
Unhack avoids the accidental complexity and allows non-expert devs to create their first automatic code fix in under one hour. The solution is a new language built from the ground up for making program transformation easy, together with an AI system that “speaks” the language.
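To give a feel for what "matching and rewriting" means, here is a minimal toy sketch in Python. It is not the Unhack language or its implementation: the `Rule` class, the `$X` metavariable syntax, and the `apply_rule` helper are all hypothetical, and the matcher is a plain regular expression rather than a real parser. The example rule adds an explicit radix to JavaScript `parseInt` calls.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical match/rewrite rule; $NAME marks a metavariable."""
    match: str
    rewrite: str


def _compile(pattern: str) -> "re.Pattern[str]":
    # Literal text is escaped; each $NAME becomes a named group that captures
    # a simple identifier, number, or dotted name (a deliberate toy limit).
    regex, seen = "", set()
    for part in re.split(r"(\$[A-Z]+)", pattern):
        if re.fullmatch(r"\$[A-Z]+", part):
            name = part[1:]
            # A repeated metavariable must match the same text again.
            regex += f"(?P={name})" if name in seen else f"(?P<{name}>[\\w.]+)"
            seen.add(name)
        else:
            regex += re.escape(part)
    return re.compile(regex)


def apply_rule(rule: Rule, source: str) -> str:
    """Rewrite every occurrence of rule.match in source."""
    def substitute(m: "re.Match[str]") -> str:
        # Fill each $NAME in the rewrite template with the captured text.
        return re.sub(r"\$([A-Z]+)", lambda v: m.group(v.group(1)), rule.rewrite)

    return _compile(rule.match).sub(substitute, source)


# Example rule: always pass an explicit radix to parseInt.
radix_rule = Rule(match="parseInt($X)", rewrite="parseInt($X, 10)")
fixed = apply_rule(radix_rule, "const n = parseInt(value);")
print(fixed)
```

The point of the sketch is the shape of the rule: the problem and its fix are both declared as small code templates, with no parser or compiler API in sight. A text-based matcher like this one breaks down on nested expressions, comments, and strings, which is where a purpose-built language earns its keep.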
Compared to CodeQL
GitHub recently acquired Semmle, along with its CodeQL analysis engine. So, how is Unhack different?
Unhack and CodeQL are actually similar in several ways:
- Both aim to improve code quality using static analysis.
- Both allow querying code in a declarative way.
- Both are available as a service.
The main differences are:
- Unhack also allows declaring concrete code fixes for the identified problems. CodeQL only allows querying.
- Unhack is more concise. Unhack has been built from the ground up as a tool for querying and fixing code. Unhack builds upon rewriting and matching logic, which have proven very successful for software verification. CodeQL builds upon Datalog.
- CodeQL is much more mature and has many more rules. Unhack is just starting up.
Compared to ESLint
First, hats off to ESLint. We use it as well. It is designed well and has matured into a comprehensive tool with many rules and a great ecosystem of plugins.
There is some overlap between the tools, but also several differences:
- ESLint is a library. Unhack is a platform.
- Unhack is focused on more complex fixes, and less on formatting or style.
- Unhack was created with fixes in mind. It aims to provide fixes in most, if not all, cases.
- Unhack fixes are declarative and more concise.
- ESLint is open-source (Unhack will be too), it is more mature, and has a larger rule set.
There is a lot of room for collaboration. In particular, we plan to:
- Make the Unhack service available as a plugin for ESLint.
- Show ESLint --fix results in the Unhack GitHub app web interface.
How does Unhack work?
The GitHub integration is similar to a CI tool. It uses a status check to inform the developer when it has found code improvements. The authentication is done via GitHub Apps, which allow fine-grained security permissions. Unhack only requests the minimal security permissions necessary to do its job. More precisely, it asks for the user’s email address and read/write access to the specific repositories it is installed on.
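For readers unfamiliar with status checks, the sketch below shows roughly what reporting one looks like, using GitHub's REST endpoint for creating a commit status (`POST /repos/{owner}/{repo}/statuses/{sha}`). This is an illustration under assumptions, not Unhack's actual code: the `build_status_request` helper, the "unhack" context label, the choice of state, and the sample repository and commit are all made up, and Unhack may well use a different GitHub API for the same purpose. The function only builds the request; it does not send anything.

```python
GITHUB_API = "https://api.github.com"


def build_status_request(owner: str, repo: str, sha: str,
                         fixes_found: int, details_url: str) -> dict:
    """Build the URL and JSON payload for a commit status (nothing is sent)."""
    # Assumption: pending fixes are reported as "failure". A real integration
    # might prefer "success" with an informational description instead.
    state = "failure" if fixes_found else "success"
    description = (f"{fixes_found} suggested fix(es)" if fixes_found
                   else "No fixes suggested")
    return {
        "url": f"{GITHUB_API}/repos/{owner}/{repo}/statuses/{sha}",
        "payload": {
            "state": state,             # error, failure, pending, or success
            "target_url": details_url,  # where the developer reviews the fixes
            "description": description,
            "context": "unhack",        # hypothetical label shown on the commit
        },
    }


# Placeholder repository and commit, for illustration only.
request = build_status_request("octocat", "hello-world", "abc123",
                               fixes_found=2,
                               details_url="https://app.unhack.ai")
print(request["url"])
```

The status appears next to the commit or pull request, so the developer sees it at the same point where CI results show up, after the code is committed.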
It takes 30s to get up and running. Go to app.unhack.ai and authenticate via GitHub. This leads you to the GitHub App page, where you can choose which repos to install Unhack on. Once installed, select a branch that you want analyzed and Unhack kicks into high gear.
Over time, while developing the platform, we used Unhack to make small contributions to open-source projects, and many of them have been accepted. The code was generated with earlier, work-in-progress versions of our language, so it sometimes contains formatting imperfections.
Free for open-source
Unhack is free for open-source projects, and we plan to keep it this way.
For commercial products, Unhack is currently completely free. We will transition to a freemium model in the next few months.
Unhack itself is closed-source for now, but we plan to gradually open-source the language, execution kernel, and rule set over the next year.
Try it out: app.unhack.ai
Feedback is welcome! In particular, tell us what you want fixed in your code and we might add it the same day. Our current rule set is small but growing quickly.
This post focuses on the UI/UX of Unhack. In the next few weeks I will follow up with articles detailing our infrastructure, showing other features, and revealing the underlying new language.