A traditional digital audio workstation (DAW) is a monolithic piece of software. You download and install a single application that provides everything you need to produce music — from the user interface all the way down to the audio engine. Internally that application will, of course, be well structured so that different parts of it deal with different jobs: the user interface code draws buttons and labels, while the audio engine loads and transforms audio. However, the user only ever runs it as a single whole, with every part working together at once.
That all-together-ness makes it hard to stick to rigid boundaries between components. It’s quite easy for parts of the running system to know a lot about other parts. For example, is drawing a peak meter a job for the user interface or the audio engine? The data belongs to the audio engine, but the drawing belongs to the user interface. It’s highly likely that the two components will communicate by sharing a piece of memory. In other words, they probably share state.
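To make that concrete, here’s a minimal sketch in Go (the names `PeakMeter` and `audioEngine`, and the fake signal, are purely illustrative rather than taken from any real DAW): the audio engine writes the latest peak level into a piece of shared memory, and the user interface simply reads it whenever it wants to redraw.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
	"time"
)

// PeakMeter is a piece of state shared between the audio engine and the UI.
// The engine writes the latest peak level; the UI reads it whenever it redraws.
type PeakMeter struct {
	bits atomic.Uint64 // the level stored as float64 bits so updates are atomic
}

func (m *PeakMeter) Set(level float64) { m.bits.Store(math.Float64bits(level)) }
func (m *PeakMeter) Get() float64      { return math.Float64frombits(m.bits.Load()) }

// audioEngine stands in for the real-time audio thread: it measures the
// signal and pokes the result straight into the shared meter.
func audioEngine(meter *PeakMeter) {
	for i := 0; ; i++ {
		level := math.Abs(math.Sin(float64(i) / 10)) // fake signal
		meter.Set(level)
		time.Sleep(10 * time.Millisecond)
	}
}

func main() {
	meter := &PeakMeter{}
	go audioEngine(meter)

	// The "UI": it knows exactly where the engine keeps its data and reads it directly.
	for i := 0; i < 5; i++ {
		time.Sleep(100 * time.Millisecond)
		fmt.Printf("peak: %.2f\n", meter.Get())
	}
}
```

It works precisely because both components live in the same process and address space.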
In a distributed piece of software, several components communicate with each other to solve a problem — in our case, recording audio. In this model, sharing memory is impossible because the components are completely separate; they may not even be running on the same machine. Instead, we have to send messages to communicate everything needed to work together.
To adapt a common saying: “we don’t communicate by sharing; we share by communicating.”
This difference, though perhaps subtle, has some profound implications. Firstly, it adds a whole bunch of complexity. In the traditional case, letting one component look directly at another’s state is both fast and simple. In the distributed case, packaging data up into messages for everyone else and transferring those messages around is plainly more complicated.
In our peak meter example, something with access to the audio has to process that raw data into some intermediate form, then make it available to the user interface via some communication channel. The user interface has to know where its data is coming from and ask for the right bits. That could work like downloading map tiles in Google Maps — the client knows what view it is showing and asks for the appropriate tiles at the appropriate zoom level.
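Here’s one way that could look, sketched in Go with an in-process channel standing in for whatever transport a real system would use (every type and field name here is invented for illustration): the engine reduces its raw samples into peak summaries on request, and the user interface asks only for the range and resolution it is currently displaying.

```go
package main

import (
	"fmt"
	"math"
)

// PeakRequest is the UI asking for peak data covering a particular view,
// analogous to requesting the map tiles for a visible region and zoom level.
type PeakRequest struct {
	StartFrame, EndFrame int
	Resolution           int // how many peak values to summarise the range into
	Reply                chan PeakResponse
}

// PeakResponse carries the pre-reduced data back to whoever asked for it.
type PeakResponse struct {
	Peaks []float64
}

// audioEngine owns the raw samples and answers peak requests over a channel.
// Nothing outside this function ever touches its memory directly.
func audioEngine(samples []float64, requests <-chan PeakRequest) {
	for req := range requests {
		chunk := (req.EndFrame - req.StartFrame) / req.Resolution
		peaks := make([]float64, 0, req.Resolution)
		for i := 0; i < req.Resolution; i++ {
			peak := 0.0
			for j := req.StartFrame + i*chunk; j < req.StartFrame+(i+1)*chunk && j < len(samples); j++ {
				if v := math.Abs(samples[j]); v > peak {
					peak = v
				}
			}
			peaks = append(peaks, peak)
		}
		req.Reply <- PeakResponse{Peaks: peaks}
	}
}

func main() {
	// Fake audio data: a ramp from 0 up to 1.
	samples := make([]float64, 48000)
	for i := range samples {
		samples[i] = float64(i) / float64(len(samples))
	}

	requests := make(chan PeakRequest)
	go audioEngine(samples, requests)

	// The UI asks for an 8-value summary of the first second of audio.
	reply := make(chan PeakResponse)
	requests <- PeakRequest{StartFrame: 0, EndFrame: 48000, Resolution: 8, Reply: reply}
	fmt.Printf("peaks: %v\n", (<-reply).Peaks)
}
```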
So, what do we gain for all this added communication complexity? If we’ve structured our distributed application well, it means we can connect any number of laptops, phones, tablets or even bespoke hardware like foot pedals to the same audio session. Each device can offer the same functionality — there’s no privileged master user interface with special access to what the audio engine is doing.
What’s more, with a distributed system we can peer together audio engines to synchronise data between them. This means we can share sessions across the internet in a very natural and cohesive way. Instead of adding a sharing layer on top of a user-interface-centric, monolithic application, we build a system that has sharing by communicating at its very core.
Lastly, because everything is handled by streams of messages going back and forth between components, we can start to do something else quite neat: we can use the same kind of messages to save the state of our sessions as we do to share live updates. This blurs the boundary between real-time session sharing and distributed version control, and lets us share changes with other people who aren’t in our session right now.
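As a rough sketch of that idea, again in Go and again with every name invented for illustration: a change to the session is described as a message, that message gets a single encoding, and the same bytes are both appended to the session log on disk and handed to whatever transport connects the peers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Change is one edit to a session, described as data rather than as a
// mutation of shared state.
type Change struct {
	Seq    int    `json:"seq"`    // position in the session's history
	Author string `json:"author"` // which peer made the edit
	Kind   string `json:"kind"`   // e.g. "record-take", "move-clip"
	Track  string `json:"track"`
}

// sendToPeers stands in for whatever transport connects the peers.
func sendToPeers(payload []byte) {
	// A real system would write to a socket; here we just print.
	fmt.Println("broadcast:", string(payload))
}

func main() {
	change := Change{Seq: 42, Author: "alice", Kind: "record-take", Track: "lead-vocal"}

	// One encoding for every purpose.
	payload, err := json.Marshal(change)
	if err != nil {
		panic(err)
	}

	// 1. "Save" the session: append the encoded change to a log file.
	logFile, err := os.CreateTemp("", "session-*.log")
	if err != nil {
		panic(err)
	}
	defer os.Remove(logFile.Name())
	logFile.Write(append(payload, '\n'))
	logFile.Close()

	// 2. "Share" the change live: hand exactly the same bytes to the network layer.
	sendToPeers(payload)
}
```

Saving a session and streaming it to a collaborator become the same operation performed at different times.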
As a simple example, imagine I’m offline whilst you re-record some vocals on a project we’re collaborating on. You can share those changes and I can apply them to my copy of the project to hear how they sound when I’m next online. Information can flow between our audio engines asynchronously.
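Continuing the sketch (with the same caveat that every name here is made up): when I reconnect, my copy of the project simply replays the changes you recorded while I was away. The sequence-number bookkeeping below is deliberately naive; the point is the asynchronous flow of information, not the conflict handling.

```go
package main

import "fmt"

// Change mirrors the message type used for live updates and saved sessions.
type Change struct {
	Seq   int
	Kind  string
	Track string
	Take  string
}

// Project is my local copy of the session state.
type Project struct {
	AppliedUpTo int
	Takes       map[string]string // track name -> current take
}

// Apply replays changes that arrived while I was offline, skipping any
// the project has already seen.
func (p *Project) Apply(changes []Change) {
	for _, c := range changes {
		if c.Seq <= p.AppliedUpTo {
			continue // already have this one
		}
		if c.Kind == "record-take" {
			p.Takes[c.Track] = c.Take
		}
		p.AppliedUpTo = c.Seq
	}
}

func main() {
	mine := &Project{AppliedUpTo: 41, Takes: map[string]string{"lead-vocal": "take-3"}}

	// The changes you made while I was offline, fetched when I reconnect.
	yours := []Change{
		{Seq: 42, Kind: "record-take", Track: "lead-vocal", Take: "take-4"},
		{Seq: 43, Kind: "record-take", Track: "lead-vocal", Take: "take-5"},
	}

	mine.Apply(yours)
	fmt.Println(mine.Takes["lead-vocal"]) // take-5
}
```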
These three features — connecting multiple devices, sharing sessions across the internet, and version control — all need to be backed by a well-structured distributed system. And that’s what we’re trying to build — a distributed audio workstation.