File transporting service in Go — part 1

We’ve noticed a trend when building applications that integrate external systems as data sources: companies often have an existing solution for transferring data, and it rarely involves HTTP. Data is transferred as XML or JSON files over SFTP, FTP or, god forbid, AS2. Creating a separate solution for each project got tiring after the second one, not to mention the horrible hacks that were implemented to monitor the upload folders and the security concerns of giving a third party SFTP access to a production server, even if it’s chrooted.

The idea of a generic service that takes in files via any number of transport protocols and forwards them over HTTP was born. It would act as a bridge between the external systems and our applications. Our applications wouldn’t have to worry about anything other than processing data uploaded in an HTTP request, and the external systems would only have access to the non-critical “bridge” server.

All internal generic services should have a catchy name. Since the service transports files from other protocols to HTTP endpoints, it should be called Shuttle! 🚀

Choosing a language

The service should be robust. It should require little to no maintenance even when running for extended periods of time, handle error conditions gracefully and offer some compile-time safety guarantees. Performance is not critical but it never hurts.
Due to the nature of the service, the language should have easy access to the file system and good libraries for things such as inotify. Concurrency should be easy to implement since files can be uploaded to the service concurrently and they should be forwarded concurrently, i.e. one large file shouldn’t block the transfer of ten small files.

We’ve mainly used three server-side languages at Taito United: Node.js, Python, and Go. Out of these three, Go seemed like the best fit. We had previously used it to create a background worker that handled jobs such as downloading files and storing them in S3. Go is generally very stable and performant, and it is statically typed. It’s also much lower level than the other two languages and provides amazing concurrency support. The only downside is that I haven’t been able to convince coworkers to pick it up yet.

System design

The first iteration should support multiple users, uploading files over SFTP and FTP, and forwarding the contents of those files using a POST request to a per-user configurable endpoint. For the first iteration I decided to leverage the underlying operating system as much as possible:

  • Shuttle users are created as Unix users by running Ansible playbooks.
  • Endpoints are configured by having a file named .endpoint in the user directory with the endpoint URL as the contents. These files and the directory structure are also created by Ansible.
  • SFTP and FTP are provided by the underlying system, independent of Shuttle. Shuttle simply monitors the user directories for new files and uploads them.

Adding an SFTP service was simple enough since we already have the OpenSSH server installed for SSH connections and it supports SFTP out of the box. For the FTP service I chose ProFTPD since it seems to be the most configurable FTP server as well as being the one most similar to the SFTP server regarding chroots.

Users are chrooted to their respective directories and, because of the way chroots work, the root folder of the chroot must not be owned by or writable by the chrooted user. That forces our hand regarding the directory structure, and I ended up with the following:

/chroot/
└── [user]
    ├── .endpoint
    └── files
        ├── [file 1]
        └── [file 2]

The chroot path is set as /chroot/[user]/ and the user’s home directory is set as /files, which works out perfectly when connecting using an FTP or SFTP client: the user starts off in their own files directory, which is where we want the files to go.

Moving on to the program itself.

Shuttle design

Shuttle uses the awesome fsnotify library to monitor the user file directories for changes. However, fsnotify does not support the IN_CLOSE_WRITE event, which is crucial for knowing when a transfer has finished, because the event comes from Linux’s inotify and has no cross-platform equivalent. Since we only run Shuttle on Linux, the solution was to fork fsnotify and add support for IN_CLOSE_WRITE.

On start the program goes through folders in the chroot folder, looking for a .endpoint file and a files folder. If both are found, the path to the files folder and the contents of the endpoint file are added as a Route.
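
The discovery step could look roughly like the sketch below. It is not Shuttle’s actual code: the Route field names, the discoverRoutes function and the demoRoutes helper are assumptions made for illustration, and the endpoint URL is a placeholder.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Route pairs a watched upload directory with the URL its files are sent to.
// The field names are assumptions; the post only names the Route concept.
type Route struct {
	Dir      string // path to [chroot]/[user]/files
	Endpoint string // trimmed contents of [chroot]/[user]/.endpoint
}

// discoverRoutes scans the directories directly under root and returns a
// Route for each one that has both a .endpoint file and a files directory.
func discoverRoutes(root string) ([]Route, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, err
	}
	var routes []Route
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		userDir := filepath.Join(root, e.Name())
		raw, err := os.ReadFile(filepath.Join(userDir, ".endpoint"))
		if err != nil {
			continue // no readable .endpoint file: skip this directory
		}
		filesDir := filepath.Join(userDir, "files")
		info, err := os.Stat(filesDir)
		if err != nil || !info.IsDir() {
			continue // no files directory: skip this directory
		}
		endpoint := strings.TrimSpace(string(raw))
		if endpoint == "" {
			continue // an empty endpoint cannot be used as a destination
		}
		routes = append(routes, Route{Dir: filesDir, Endpoint: endpoint})
	}
	return routes, nil
}

// demoRoutes builds a throwaway chroot tree with one complete user ("alice")
// and one user without a .endpoint file ("bob"), then scans it.
func demoRoutes() ([]Route, error) {
	root, err := os.MkdirTemp("", "chroot")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	for _, user := range []string{"alice", "bob"} {
		if err := os.MkdirAll(filepath.Join(root, user, "files"), 0o755); err != nil {
			return nil, err
		}
	}
	endpointFile := filepath.Join(root, "alice", ".endpoint")
	if err := os.WriteFile(endpointFile, []byte("https://example.com/upload\n"), 0o644); err != nil {
		return nil, err
	}
	return discoverRoutes(root)
}

func main() {
	routes, err := demoRoutes()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, r := range routes {
		fmt.Printf("%s -> %s\n", r.Dir, r.Endpoint)
	}
}
```

Directories that are missing either piece are silently skipped in this sketch; whether the real service logs or rejects them is not something the design above specifies.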

After discovering the routes, an fsnotify watcher is added for each of the directories contained in the routes. An fsnotify event contains the type of the event and the path to the file (or directory) that triggered the event. Whenever an IN_CLOSE_WRITE event is fired, the program finds the appropriate route for the event’s path, creates a Shuttle with the route and the path of the file, and pushes it onto the Queue.
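
Here is a rough sketch of that event-to-queue step. Because the IN_CLOSE_WRITE support lives in our fork of fsnotify, the sketch does not import it; closeWriteEvent is a hypothetical stand-in for the real event type, and the Shuttle fields, the routeFor and dispatch functions and the idea of keying routes by directory are likewise assumptions.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Route and Shuttle mirror the concepts from the post; field names are assumed.
type Route struct {
	Dir      string
	Endpoint string
}

type Shuttle struct {
	Route Route
	Path  string
}

// closeWriteEvent is a hypothetical stand-in for the event a watcher would
// deliver once a file opened for writing has been closed.
type closeWriteEvent struct {
	Path string
}

// routeFor returns the route whose directory directly contains path.
func routeFor(routes map[string]Route, path string) (Route, bool) {
	r, ok := routes[filepath.Dir(path)]
	return r, ok
}

// dispatch reads events until the channel is closed and pushes a Shuttle onto
// the queue for every event that matches a known route.
func dispatch(events <-chan closeWriteEvent, routes map[string]Route, queue chan<- Shuttle) {
	for ev := range events {
		route, ok := routeFor(routes, ev.Path)
		if !ok {
			continue // the event came from a directory we do not route
		}
		queue <- Shuttle{Route: route, Path: ev.Path}
	}
}

// demoDispatch feeds one routable and one unroutable event through dispatch
// and returns the shuttles that ended up on the queue.
func demoDispatch() []Shuttle {
	aliceDir := filepath.Join("/chroot", "alice", "files")
	routes := map[string]Route{
		aliceDir: {Dir: aliceDir, Endpoint: "https://example.com/upload"},
	}

	events := make(chan closeWriteEvent, 2)
	events <- closeWriteEvent{Path: filepath.Join(aliceDir, "orders.xml")}
	events <- closeWriteEvent{Path: filepath.Join("/chroot", "mallory", "files", "x.xml")}
	close(events)

	queue := make(chan Shuttle, 2)
	dispatch(events, routes, queue)
	close(queue)

	var shuttles []Shuttle
	for s := range queue {
		shuttles = append(shuttles, s)
	}
	return shuttles
}

func main() {
	for _, s := range demoDispatch() {
		fmt.Printf("%s -> %s\n", s.Path, s.Route.Endpoint)
	}
}
```

Keying the routes by directory makes the lookup a single map access, which works here because uploads land directly in the files directory rather than in nested folders.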

Because we are only monitoring for file changes, a crash in the application or the server itself would leave ghost files in the user file directories that the application would never see again. This is why, whenever a shuttle is added or removed, the list of all shuttles that have not yet reached their destination is saved to a file using encoding/gob and loaded on start. That is, once the application has noticed an uploaded file, the file is guaranteed to eventually reach its destination.
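
A minimal sketch of that persistence with encoding/gob might look like this. The savePending and loadPending names and the Shuttle fields are assumptions, and writing to a temporary file before renaming it is my own addition to avoid a half-written state file, not something the description above prescribes.

```go
package main

import (
	"encoding/gob"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// Route and Shuttle mirror the concepts from the post; field names are assumed.
type Route struct {
	Dir      string
	Endpoint string
}

type Shuttle struct {
	Route Route
	Path  string
}

// savePending gob-encodes the undelivered shuttles into a temporary file and
// then renames it over path, so a crash mid-write leaves the old state intact.
func savePending(path string, pending []Shuttle) error {
	tmp := path + ".tmp"
	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if err := gob.NewEncoder(f).Encode(pending); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// loadPending reads the shuttles saved by savePending. A missing state file
// is treated as "nothing pending", which is the case on the very first start.
func loadPending(path string) ([]Shuttle, error) {
	f, err := os.Open(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var pending []Shuttle
	if err := gob.NewDecoder(f).Decode(&pending); err != nil {
		return nil, err
	}
	return pending, nil
}

// demoPersist saves two shuttles to a throwaway state file and loads them back.
func demoPersist() ([]Shuttle, error) {
	dir, err := os.MkdirTemp("", "shuttle")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	route := Route{Dir: "/chroot/alice/files", Endpoint: "https://example.com/upload"}
	pending := []Shuttle{
		{Route: route, Path: "/chroot/alice/files/a.xml"},
		{Route: route, Path: "/chroot/alice/files/b.json"},
	}
	state := filepath.Join(dir, "pending.gob")
	if err := savePending(state, pending); err != nil {
		return nil, err
	}
	return loadPending(state)
}

func main() {
	restored, err := demoPersist()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, s := range restored {
		fmt.Println("pending:", s.Path)
	}
}
```

On start, everything returned by loadPending would simply be pushed back onto the queue before the watchers are added.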

Shuttle launches a configurable number of workers on start. These workers listen to the queue and grab shuttles off of it. A worker creates an HTTP request with the file contents as a multipart/form-data body and fires it off. If the request fails or returns a non-200 status code, the shuttle is pushed back onto the queue after 5 seconds to be retried.
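
The worker loop can be sketched as follows. Treat it as an illustration under assumptions rather than the real implementation: the send and worker functions, the form field name "file" and the flattened Shuttle struct are made up for the example, the file is buffered in memory for brevity, and the demo uses a 10 millisecond retry delay against a local test server instead of the 5 seconds mentioned above.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"sync/atomic"
	"time"
)

// Shuttle is one file on its way to an endpoint (field names are assumptions).
type Shuttle struct {
	Endpoint string
	Path     string
}

// send posts the file as multipart/form-data and treats anything but a 200
// response as a failure. The form field name "file" is an assumption.
func send(client *http.Client, s Shuttle) error {
	data, err := os.ReadFile(s.Path)
	if err != nil {
		return err
	}
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("file", filepath.Base(s.Path))
	if err != nil {
		return err
	}
	if _, err := part.Write(data); err != nil {
		return err
	}
	if err := w.Close(); err != nil {
		return err
	}
	resp, err := client.Post(s.Endpoint, w.FormDataContentType(), &body)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	return nil
}

// worker grabs shuttles off the queue. A failed shuttle is pushed back after
// retryDelay from a timer goroutine, so the worker itself never sleeps.
func worker(client *http.Client, queue chan Shuttle, retryDelay time.Duration, delivered func(Shuttle)) {
	for s := range queue {
		if err := send(client, s); err != nil {
			failed := s
			time.AfterFunc(retryDelay, func() { queue <- failed })
			continue
		}
		delivered(s)
	}
}

// demoWorkers runs two workers against a test server that rejects the first
// request, and returns how many requests the server saw in total.
func demoWorkers() (int, error) {
	var attempts int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		if atomic.AddInt32(&attempts, 1) == 1 {
			w.WriteHeader(http.StatusInternalServerError)
		} // later requests get the implicit 200 response
	}))
	defer srv.Close()

	path := filepath.Join(os.TempDir(), "shuttle-demo.xml")
	if err := os.WriteFile(path, []byte("<order/>"), 0o600); err != nil {
		return 0, err
	}
	defer os.Remove(path)

	queue := make(chan Shuttle, 8)
	done := make(chan struct{})
	for i := 0; i < 2; i++ {
		go worker(srv.Client(), queue, 10*time.Millisecond, func(Shuttle) { close(done) })
	}
	queue <- Shuttle{Endpoint: srv.URL, Path: path}
	<-done
	return int(atomic.LoadInt32(&attempts)), nil
}

func main() {
	attempts, err := demoWorkers()
	if err != nil {
		panic(err)
	}
	fmt.Println("delivered after", attempts, "attempts")
}
```

Scheduling the retry with a timer instead of sleeping inside the worker keeps one failing endpoint from holding up the other shuttles on the queue.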

Conclusion

Go was a very good choice for the application, especially with its simple concurrency model and awesome standard library (shoutout to path/filepath!).

Currently we are trying out Shuttle in our newest project where we hope that it will make the process of file transfers much simpler while keeping the security aspects in mind.

Finally, stay tuned for the next part, where we make Shuttle completely self-contained by implementing the FTP and SFTP servers in the application itself!
