How to kill a dragon: Rewriting your app to Golang
Let’s imagine that your application is written in some scripting language (e.g. Ruby) and you want to rewrite it in Go. You may ask a reasonable question: what is the point of rewriting a program that is up and running?
Why?
Firstly, let’s assume that your program is connected to a certain ecosystem. In our case, the ecosystem includes Docker and Kubernetes. The entire infrastructure of these projects is written in Golang. By rewriting our program in Go, we get access to libraries used by Docker, Kubernetes and other projects. By basing your product on the existing infrastructure of the main projects, you make it easier to support, develop, and refine. You get immediate access to all new features without having to reimplement them in another language. In our particular situation, this reason alone was enough both to decide on a rewrite and to choose the language. Besides, there are some other advantages…
Secondly, we admire the simplicity of installing Go applications. There is no need to install RVM, Ruby, a set of gems, etc. You just download a single binary file and run it.
Thirdly, Golang programs tend to work faster. And I am not talking about the significant speedup you can get in any language by using an appropriate architecture and algorithms. It is about the boost that you can literally see and feel when running your program in your shell. For example, running --help in Ruby takes around 0.8 sec. The same command in Go is executed in 0.02 sec. That is a noticeable improvement in user experience.
Main challenges
Okay, you can just write new code from scratch, making it completely isolated from the old one. This way you immediately face some restrictions and difficulties in terms of the time and resources allocated to this development:
1. The current version of the program in Ruby is in constant need of improvements and corrections:
- Bugs occur all the time, and they have to be fixed promptly;
- You can’t stop implementing new features since they are often demanded by the users/customers.
2. Maintaining 2 codebases at the same time is difficult and expensive:
- A team of 2–3 developers is insufficient, given the existence of other projects (in addition to our existing application in Ruby).
3. Additional requirements for the new version:
- There should be no significant degradation in functionality;
- Ideally, the transition should be seamless and hassle-free.
So, transitioning must be continuous and smooth. But how can you do this given that the Golang version of your app is being developed as a standalone, independent program?
Simultaneous development in two languages
What about the bottom-up approach? You can start with the basic, low-level components, and then proceed to higher-level abstractions.
Imagine that your program consists of the following components:
lib/
config.rb
build/
image.rb
git_repo/
base.rb
local.rb
remote.rb
docker_registry.rb
builder/
base.rb
shell.rb
ansible.rb
stage/
base.rb
from.rb
before_install.rb
git.rb
install.rb
before_setup.rb
setup.rb
deploy/
kubernetes/
client.rb
manager/
base.rb
job.rb
deployment.rb
pod.rb
Porting a component with functions
Let’s examine a simple case: an existing component, e.g. config (lib/config.rb), that is rather isolated from the rest of the bunch. This component has a single function, Config::parse. It takes the path to the config file as an input, reads it, and returns a complete structure. In this case, we will implement this function as a separate Go binary (config) with a corresponding config package:
cmd/
  config/
    main.go
pkg/
  config/
    config.go
The Golang binary reads its arguments from one JSON file and writes the result to another JSON file:
config -args-from-file args.json -res-to-file res.json
The config binary can output messages to stdout/stderr (our Ruby app always outputs to stdout/stderr, so there are no additional output parameters).
Calling the config binary is equivalent to calling some function of the config component. The name of the function and its parameters are specified in the args.json file as arguments. The result of the function is written to res.json. If the function returns an object of some class, then the data of this object is returned in a serialized JSON format.
For example, let’s use the following args.json to call the Config::parse function:
{
  "command": "Parse",
  "configPath": "path-to-config.yaml"
}
We’ll get the following result in res.json:
{
  "config": {
    "Images": [{"Name": "nginx"}, {"Name": "rails"}],
    "From": "ubuntu:16.04"
  }
}
The config field contains the state of the Config::Config object in a serialized JSON format. Now we have to construct a Config::Config object from this state on the caller (Ruby) side.
In case of some foreseen error, the binary might return this JSON:
{
  "error": "no such file path-to-config.yaml"
}
The error field must be handled by the calling side.
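To make the protocol more tangible, here is a minimal sketch of what cmd/config/main.go could look like. It is not the actual werf code: the Args, Config, Image, and Result types, the dispatch and run helpers, and the stubbed parseConfig are assumptions made for illustration, and only the flag names and the JSON field names are taken from the examples above.

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"os"
)

// Args mirrors args.json from the example above.
type Args struct {
	Command    string `json:"command"`
	ConfigPath string `json:"configPath"`
}

// Config and Image are hypothetical stand-ins for the real parsed structure.
type Image struct{ Name string }

type Config struct {
	Images []Image
	From   string
}

// Result mirrors res.json: either "config" or "error" is set.
type Result struct {
	Config *Config `json:"config,omitempty"`
	Error  string  `json:"error,omitempty"`
}

// parseConfig is a stub; the ported pkg/config logic would live here.
func parseConfig(path string) (*Config, error) {
	if _, err := os.Stat(path); err != nil {
		return nil, fmt.Errorf("no such file %s", path)
	}
	return &Config{From: "ubuntu:16.04", Images: []Image{{Name: "nginx"}}}, nil
}

// dispatch maps the "command" field to a function of the component.
func dispatch(args Args) Result {
	switch args.Command {
	case "Parse":
		cfg, err := parseConfig(args.ConfigPath)
		if err != nil {
			return Result{Error: err.Error()}
		}
		return Result{Config: cfg}
	default:
		return Result{Error: "unknown command " + args.Command}
	}
}

// run reads the arguments file, calls the function, and writes the result file.
func run(argsFile, resFile string) error {
	data, err := os.ReadFile(argsFile)
	if err != nil {
		return err
	}
	var args Args
	if err := json.Unmarshal(data, &args); err != nil {
		return err
	}
	out, err := json.Marshal(dispatch(args))
	if err != nil {
		return err
	}
	return os.WriteFile(resFile, out, 0o644)
}

func main() {
	argsFile := flag.String("args-from-file", "", "JSON file with the call arguments")
	resFile := flag.String("res-to-file", "", "JSON file to write the result to")
	flag.Parse()
	if *argsFile == "" || *resFile == "" {
		flag.Usage() // nothing to do without both files
		return
	}
	if err := run(*argsFile, *resFile); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // unexpected failure: the caller sees a non-zero exit code
	}
}
```

In this sketch, foreseen errors travel through the error field with a zero exit code, while unexpected failures (e.g. an unreadable args file) surface as a non-zero exit code. That split is one possible design choice, not something the protocol requires.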
Calling Golang from Ruby
We have to turn the Config::parse(config_path) function into a wrapper on the Ruby side. It calls our config binary, gets the result, and handles all possible errors. Here’s an example of Ruby pseudo-code with some simplifications:
require "json"

module Config
  # get_random_number, get_tmp_dir and ParseError are assumed to be defined elsewhere.
  def self.parse(config_path)
    call_id = get_random_number
    args_file = "#{get_tmp_dir}/args.#{call_id}.json"
    res_file = "#{get_tmp_dir}/res.#{call_id}.json"

    # Pass the function name and its parameters to the Go binary.
    File.write(args_file, JSON.dump({
      "command" => "Parse",
      "configPath" => config_path,
    }))

    system("config", "-args-from-file", args_file, "-res-to-file", res_file)
    raise "config failed with unknown error" if $?.exitstatus != 0

    res = JSON.load_file(res_file)
    raise ParseError, res["error"] if res["error"]

    # Rebuild the Config::Config object from the serialized state.
    Config.new_from_state(res["config"])
  end
end
The binary may crash with an unexpected non-zero exit code, or it may exit with one of the predefined codes. In the latter case, we check res.json for the error and config fields and build the Config::Config object from the serialized config field.
For the user, nothing changes in the Config::parse function.
Porting a component class
Let’s use the class hierarchy in lib/git_repo. There are two classes, GitRepo::Local and GitRepo::Remote. It makes sense to combine them into a single git_repo binary and to create a corresponding git_repo package in Golang:
cmd/
  git_repo/
    main.go
pkg/
  git_repo/
    base.go
    local.go
    remote.go
A call to the git_repo binary corresponds to a call to some method of a GitRepo::Local or GitRepo::Remote object. The object has a state that can change after a method is called. Therefore, we pass its current state in the arguments in a JSON format. We always get the new object state in the output (also in a JSON format).
For example, to call the local_repo.commit_exists?(commit) method, let’s specify the following args.json:
{
  "localGitRepo": {
    "name": "my_local_git_repo",
    "path": "path/to/git"
  },
  "method": "IsCommitExists",
  "commit": "e43b1336d37478282693419e2c3f2d03a482c578"
}
We’ll get the following res.json:
{
  "localGitRepo": {
    "name": "my_local_git_repo",
    "path": "path/to/git"
  },
  "result": true
}
The localGitRepo field contains the new object state (which may remain the same). We should assign this state to the current local_git_repo object in Ruby.
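On the Go side, such a method call could be handled roughly as follows. This is only a sketch under stated assumptions: the callArgs and callResult types and the call function are hypothetical names, and the body of IsCommitExists is a placeholder (a real port would inspect the repository). The JSON field names follow the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LocalGitRepo holds the serialized state that travels between Ruby and Go.
type LocalGitRepo struct {
	Name string `json:"name"`
	Path string `json:"path"`
}

// IsCommitExists is a placeholder: it only checks the length of the hash.
func (r *LocalGitRepo) IsCommitExists(commit string) (bool, error) {
	return len(commit) == 40, nil
}

// callArgs mirrors args.json: object state, method name, method parameters.
type callArgs struct {
	LocalGitRepo *LocalGitRepo `json:"localGitRepo"`
	Method       string        `json:"method"`
	Commit       string        `json:"commit"`
}

// callResult mirrors res.json: new object state plus the method result.
type callResult struct {
	LocalGitRepo *LocalGitRepo `json:"localGitRepo,omitempty"`
	Result       interface{}   `json:"result,omitempty"`
	Error        string        `json:"error,omitempty"`
}

// call restores the object from its state, invokes the requested method,
// and serializes the (possibly changed) state together with the result.
func call(rawArgs []byte) ([]byte, error) {
	var args callArgs
	if err := json.Unmarshal(rawArgs, &args); err != nil {
		return nil, err
	}
	if args.LocalGitRepo == nil {
		return json.Marshal(callResult{Error: "localGitRepo state is missing"})
	}

	res := callResult{LocalGitRepo: args.LocalGitRepo}
	switch args.Method {
	case "IsCommitExists":
		exists, err := args.LocalGitRepo.IsCommitExists(args.Commit)
		if err != nil {
			res.Error = err.Error()
		} else {
			res.Result = exists
		}
	default:
		res.Error = "unknown method " + args.Method
	}
	return json.Marshal(res)
}

func main() {
	out, err := call([]byte(`{
		"localGitRepo": {"name": "my_local_git_repo", "path": "path/to/git"},
		"method": "IsCommitExists",
		"commit": "e43b1336d37478282693419e2c3f2d03a482c578"
	}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

The state is echoed back even when the method does not modify it, so the Ruby wrapper can always overwrite its object state without special cases.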
Calling Golang from Ruby
So, we have turned every method of GitRepo::Base, GitRepo::Local, and GitRepo::Remote into a wrapper. Each wrapper calls our git_repo binary, gets the result, and sets the new state of the GitRepo::Local or GitRepo::Remote object.
The rest is similar to calling an ordinary function.
Polymorphism and base classes
We prefer not to implement polymorphism on the Golang side. Calls to the git_repo binary explicitly address a specific implementation. E.g., if the arguments contain localGitRepo, the call originated from an object of the GitRepo::Local class; accordingly, remoteGitRepo corresponds to the GitRepo::Remote class. This way, we only have to copy some boilerplate code to cmd, and that’s all. Anyway, this code will be deleted after our migration to Golang is complete.
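The explicit addressing described above might boil down to a switch on which state key is present in the arguments. The sketch below is an assumption about how such boilerplate in cmd could look: the envelope type, the route function, and the url field of the remote repository are hypothetical, and route only reports which implementation would be called instead of calling it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// State structs for both implementations; the "url" field is a made-up example.
type LocalGitRepo struct {
	Name string `json:"name"`
	Path string `json:"path"`
}

type RemoteGitRepo struct {
	Name string `json:"name"`
	URL  string `json:"url"`
}

// envelope captures only the keys needed to pick an implementation.
type envelope struct {
	LocalGitRepo  *LocalGitRepo  `json:"localGitRepo"`
	RemoteGitRepo *RemoteGitRepo `json:"remoteGitRepo"`
	Method        string         `json:"method"`
}

// route picks the implementation by the state key found in the arguments.
// No interface or base type is involved: each branch is handled explicitly.
func route(raw []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	switch {
	case env.LocalGitRepo != nil:
		return "local." + env.Method, nil
	case env.RemoteGitRepo != nil:
		return "remote." + env.Method, nil
	default:
		return "", errors.New("neither localGitRepo nor remoteGitRepo is present")
	}
}

func main() {
	target, err := route([]byte(`{
		"remoteGitRepo": {"name": "origin", "url": "https://example.com/repo.git"},
		"method": "IsCommitExists"
	}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(target)
}
```

The duplication between the branches is deliberate here: the code is temporary, so a small amount of copy-paste is cheaper than designing a class hierarchy on the Go side.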
How to change the state of a related object
There are situations when an object receives another object as a parameter and calls a method that implicitly changes the state of this second object.
In this case:
- When calling a binary, we need to pass the serialized state of the object whose method we are calling as well as the serialized state of all parameter objects.
- After the call, we have to update the state of the object whose method we called as well as the state of all objects that were passed as parameters.
Apart from that, everything is the same.
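As an illustration of the two steps above, here is a sketch with purely hypothetical objects: an Image whose Push method implicitly changes the DockerRegistry passed to it. None of these names, fields, or methods come from the real code base; the point is only that both states go in and both states come back.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Image and DockerRegistry are hypothetical objects used for illustration.
type Image struct {
	Name   string `json:"name"`
	Pushed bool   `json:"pushed"`
}

type DockerRegistry struct {
	Repo string   `json:"repo"`
	Tags []string `json:"tags"`
}

// Push changes the image and, implicitly, the registry passed as a parameter.
func (img *Image) Push(reg *DockerRegistry) {
	reg.Tags = append(reg.Tags, img.Name)
	img.Pushed = true
}

// payload carries the state of the receiver and of the parameter object,
// both in the arguments and in the result.
type payload struct {
	Image          *Image          `json:"image"`
	DockerRegistry *DockerRegistry `json:"dockerRegistry"`
	Method         string          `json:"method,omitempty"`
}

// call restores both objects, runs the method, and returns both new states.
func call(raw []byte) ([]byte, error) {
	var p payload
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, err
	}
	if p.Image == nil || p.DockerRegistry == nil {
		return nil, fmt.Errorf("both image and dockerRegistry states are required")
	}
	if p.Method != "Push" {
		return nil, fmt.Errorf("unknown method %q", p.Method)
	}
	p.Image.Push(p.DockerRegistry)
	p.Method = "" // the method name is not part of the result
	return json.Marshal(p)
}

func main() {
	out, err := call([]byte(`{
		"image": {"name": "nginx"},
		"dockerRegistry": {"repo": "example/app", "tags": []},
		"method": "Push"
	}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

On the Ruby side, the wrapper would then assign the returned image state to the receiver and the returned dockerRegistry state to the object it passed in.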
So, what do we have?
The basic process is to take some component, port it to Golang, and release a new version.
When all underlying components are ported, and we transfer some higher-level component which uses them, this component might “absorb” all these lower-level components. In this case, you can delete the redundant binaries.
This process continues until we reach the very top layer that unites all underlying abstractions. This concludes the first porting stage. The top layer is a CLI. We can leave it in Ruby for a while before moving to Golang completely.
How to distribute such a behemoth?
Fine, now we have a way to port all the components gradually. However, an important question arises: how do you distribute such “bilingual” software?
The Ruby part is distributed as a Gem package. To call a binary, it downloads the required dependency from a hard-coded URL and caches it locally (somewhere in the service files).
While preparing a release of our “bilingual” app, we have to:
- build and upload all binary dependencies to some host;
- create a new version of our Ruby Gem.
We build separate binaries for every version of the app, even if a component has not changed. You could version the dependent binaries separately and avoid rebuilding them for each release. In our case, we decided to keep things as simple as possible and not to optimize temporary code, so we build specific binaries for each version of the app at the cost of disk space and download time.
Disadvantages of our approach
Obviously, repeatedly calling external programs via system/exec brings some difficulties.
Global caching is difficult to implement on the Go side: every call starts a new process, so all in-memory data (e.g., package-level variables) is created when a method is called and is gone once the call completes (you have to keep that in mind at all times). Yet caching is still possible at the class instance level or by explicitly passing parameters to an external component.
Also, you have to pass the state of an object(s) to Golang and correctly restore it after the call.
Binary dependencies written in Golang take up a lot of disk space. For example, the size of a single Go binary might be 30 MB. But what if we ported, say, 10 components weighing 30 MB each? Their total size would amount to 300 MB for every version! Because of this, you can quickly run out of disk space in the binary repository and on the host where your app is being run and regularly updated. However, you can solve this problem by periodically deleting the old versions’ files.
Also, note that downloading binary dependencies will take some time with each update of the app.
Advantages of our approach
Despite the mentioned cons, this approach allows you to organize a continuous process of porting to another language using the resources of a relatively small development team.
The most important advantage is the ability to get instant feedback on the new code, easily test and stabilize it.
And, in the process, you can add new features to your program and fix bugs in the current version.
The final transition to Golang
After you have successfully rewritten all the main components to Golang and tested them in a production environment, the time has come to proceed to the top-level interface (CLI) of your program. Now you can convert it to Go and finally abandon the old Ruby code.
At this stage, you have to solve some possible compatibility problems between your old and new CLI.
Hooray, comrades! We did it!
How we have ported werf to Golang
werf (previously known as dapp) is a tool developed by Flant that helps to organize CI/CD processes. Originally, it was written in Ruby for several deep-rooted reasons:
- We had extensive experience in developing programs in Ruby.
- We were using Chef at that time (its cookbooks are written in Ruby).
- We were sluggish and unwilling to use a somewhat new and unfamiliar language for serious projects.
Then we applied the approach described in this article to rewrite werf in Golang. The graph below shows the timeline of the battle between the “good” (Go, blue) and the “evil” (Ruby, red):
Amount of Ruby code vs. Golang code in the dapp/werf project over time
An additional upside is that we used the described migration and the integration with the Kubernetes ecosystem as a starting point for another project of ours, kubedog. Using this approach, we were able to move the code that watches K8s resources into a separate project. It can be used in werf as well as in other projects.
Since there are other solutions dealing with the same task (including kubespy, kubetail and others), we would not be able to compete with them (in terms of popularity) without Go as the basis of our tool.
This article was written by our system developer Timofey Kirillov.