Native Messaging as a bridge between web and desktop

Daniel Belz
Published in fme DevOps Stories
5 min read · Jan 15, 2022

During my latest customer project we had the requirement to communicate bidirectionally between a browser extension and a native application running on the user’s machine.
The first idea that came to mind was to build a WebSocket server that runs only locally and responds to the extension’s requests.
It turned out that this wasn’t a feasible solution due to the customer’s security concerns.
In need of another solution, we stumbled upon the term “Native Messaging”. This article is therefore about my adventures with it and how we implemented it for Microsoft Edge on Windows.

Note: Parts of the implementation differ between operating systems and/or browsers, so please take a look at the respective documentation as well. I will try to mark the parts that differ.

So what is “Native Messaging”?

Native Messaging is a web-to-app communication mechanism supported in all modern browsers (Firefox, Chrome, Edge) for exchanging UTF-8-encoded JSON messages between a browser extension and a native host application.

You may wonder: Why should I want to do that?

Well, because the host executable runs outside of the browser sandbox and is therefore capable of things that an extension simply can’t do. Things like:

  • showing a native user interface
  • making network connections
  • reading and writing files
  • calling operating system APIs

In our case we wanted to implement a check-in and check-out feature and also had the requirement to call and automate some external applications like Microsoft Outlook via COM Interop.
Because normal web pages can’t communicate directly with a registered Native Messaging host, they must use the message passing API to instruct the extension to communicate with it instead. This restriction adds complexity, but also increases security.

Registration process for a Native Messaging host

To register a new Native Messaging host we first need a JSON manifest file. A typical manifest could look like this:

{
  "name": "com.company.product",
  "description": "product description",
  "path": "absolute executable path",
  "type": "stdio",
  "allowed_origins": [
    "chrome-extension://extension_id/",
    "chrome-extension://extension_id/"
  ]
}

As you can see, the manifest contains a property called “path”¹ which holds the path to the executable that should be launched.
Another important property is “allowed_origins”² which contains the list of extensions that are allowed to call this host.

Note ¹: On Windows a relative path can be used (it is resolved relative to the directory containing the manifest file). On macOS and Linux you can only use an absolute path.
Note ²: The layout of the manifest differs between Chrome and Firefox (Firefox, for example, uses “allowed_extensions” instead of “allowed_origins”). Please take a look at the respective documentation.
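
If you would rather generate the manifest during installation than maintain it by hand, a small script can do it. The following is only a minimal sketch in Python, not part of our actual implementation; the function names build_host_manifest and write_host_manifest as well as the file name <name>.json are my own assumptions.

```python
import json
from pathlib import Path


def build_host_manifest(name, description, exe_path, extension_ids):
    """Return a Chromium-style Native Messaging host manifest as a dict."""
    return {
        "name": name,
        "description": description,
        "path": str(exe_path),
        "type": "stdio",
        # Same origin format as in the manifest shown above.
        "allowed_origins": [
            f"chrome-extension://{ext_id}/" for ext_id in extension_ids
        ],
    }


def write_host_manifest(manifest, target_dir):
    """Write the manifest as <name>.json (assumed naming) and return its path."""
    target = Path(target_dir) / f"{manifest['name']}.json"
    target.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return target
```

The returned path is the value that the registry key described in the next step has to point to.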

The next step is to create a registry key under \Software\Microsoft\Edge\NativeMessagingHosts\ that is named after the host (e.g. com.company.product) and whose default value points to the previously created manifest file. You can create it under either the HKCU or the HKLM hive. A key under the HKLM hive requires admin privileges and is normally used in enterprise scenarios (e.g. together with a policy that disallows user-level Native Messaging hosts). A key defined in the HKCU hive takes priority over a key defined in HKLM.

To add a key under the HKCU hive, open a cmd session and execute the following command after adjusting it to your needs:

REG ADD "HKCU\Software\Microsoft\Edge\NativeMessagingHosts\com.company.product" /ve /t REG_SZ /d "path_to_manifest_file" /f
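
The same registration can also be scripted, for example from an installer. The following Python sketch uses the Windows-only standard library module winreg and writes to the Edge key named in the text; the helper names host_registry_key and register_host are hypothetical, and the snippet is an illustration rather than what we shipped.

```python
def host_registry_key(host_name, browser_key="Software\\Microsoft\\Edge"):
    """Build the registry sub key for a Native Messaging host."""
    return "\\".join([browser_key, "NativeMessagingHosts", host_name])


def register_host(host_name, manifest_path):
    """Point the key's default value at the manifest (HKCU, Windows only)."""
    import winreg  # imported lazily because the module only exists on Windows

    sub_key = host_registry_key(host_name)
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, sub_key) as key:
        # An empty value name addresses the key's default value (like /ve).
        winreg.SetValueEx(key, "", 0, winreg.REG_SZ, str(manifest_path))
```

Calling register_host("com.company.product", "path_to_manifest_file") should then have the same effect as the REG ADD command.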

After that, restart your browser and the “registration” process is finished.

The Native Messaging host

So now let’s look at what the workflow looks like when the Native Messaging host is launched. Please take a look at the diagram:

[Diagram: launch workflow of the Native Messaging host]

¹ Triggered via a call to chrome.runtime.connectNative or chrome.runtime.sendNativeMessage

As you can see in the diagram, the extension is the part that initiates the launch of our host, via a call to either chrome.runtime.connectNative or chrome.runtime.sendNativeMessage.
The difference between these two calls is that connectNative is used when we want a persistent connection because we need to exchange multiple messages.
sendNativeMessage, on the other hand, can be used for one-time requests. Both methods take the name of the host (the “name” from the host manifest, e.g. com.company.product) to tell the browser which host we want to launch. The browser then verifies that the extension has the “nativeMessaging” permission in its manifest, that a host with this name is registered, and that the extension is listed in the host’s “allowed_origins”.
If these checks pass, the browser spawns the host as a separate process and passes two command-line arguments to it (the origin of the calling extension and, on Windows, a handle to the calling browser window).
A communication channel between the extension and the host is now established and can be used to exchange messages.
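
On the host side, the two command-line arguments can be read right at startup. The sketch below assumes the Chromium convention that the first argument is the caller’s origin and that the window handle arrives as --parent-window=<decimal value>; parse_host_args is a hypothetical helper name, and the code is meant as an illustration only.

```python
def parse_host_args(argv):
    """Return (origin, parent_window) from the host's command line.

    argv[0] is the executable itself, argv[1] the origin of the calling
    extension. The window handle is optional and only passed on Windows.
    """
    origin = argv[1] if len(argv) > 1 else None
    parent_window = None
    for arg in argv[2:]:
        if arg.startswith("--parent-window="):
            value = arg.split("=", 1)[1]
            # Ignore anything that is not a plain decimal number.
            parent_window = int(value) if value.isdigit() else None
    return origin, parent_window
```

A host would typically call parse_host_args(sys.argv) and could refuse to continue if the origin is not one it expects.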

The next important piece of the Native Messaging puzzle is to understand how the protocol works and which restrictions apply.
From the host’s point of view, messages are received via standard input (stdin) and sent via standard output (stdout).

Messages are sent as UTF-8-encoded JSON preceded by a 32-bit unsigned length in native byte order.
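
To make the framing more tangible, here is a minimal Python sketch of reading and writing a single message. The function names are my own, and a real host would pass sys.stdin.buffer and sys.stdout.buffer as the streams; treat it as an illustration under these assumptions, not as our production code.

```python
import json
import struct


def encode_message(payload):
    """Serialize payload as UTF-8 JSON with a 32-bit length prefix."""
    body = json.dumps(payload).encode("utf-8")
    # "=I" means: unsigned 32-bit integer in native byte order.
    return struct.pack("=I", len(body)) + body


def read_message(stream):
    """Read one framed message from a binary stream, or None at end of input."""
    header = stream.read(4)
    if len(header) < 4:
        return None  # the browser closed the pipe
    (length,) = struct.unpack("=I", header)
    body = stream.read(length)
    if len(body) < length:
        return None  # truncated message
    return json.loads(body.decode("utf-8"))


def write_message(stream, payload):
    """Write one framed message and flush so the browser sees it immediately."""
    stream.write(encode_message(payload))
    stream.flush()
```

Flushing after every message matters because stdout is usually buffered, and nothing else may be written to stdout, since the browser would try to interpret it as a message.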

In our application the message that initializes the checkout process looks like this:

{
  "id": "unique request/response id",
  "command": "checkout",
  "transfer": {
    "id": "unique file id",
    "filename": "filename.extension",
    "chunk": "part of the file content as base64",
    "status": "sending"
  }
}

Once the file has been created and the chunk appended, we need to return a result to the extension. Our answer to this request looks like this:

{
  "id": "unique request/response id",
  "status": "success"
}

Or, in case of an error, like this:

{
  "id": "unique request/response id",
  "status": "error",
  "errors": [
    "unique_error_key",
    "unique_error_key"
  ]
}
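
To show how such a request could be turned into one of these two answers, here is a simplified Python sketch of a handler. The function name handle_checkout, the target_dir parameter and the error key "checkout_failed" are hypothetical; our real host is more involved.

```python
import base64
from pathlib import Path


def handle_checkout(message, target_dir):
    """Append the transferred chunk to its file and build the response."""
    try:
        transfer = message["transfer"]
        # Keep only the file name so the write stays inside target_dir.
        filename = Path(transfer["filename"]).name
        chunk = base64.b64decode(transfer["chunk"], validate=True)
        with open(Path(target_dir) / filename, "ab") as handle:
            handle.write(chunk)
    except (KeyError, TypeError, ValueError, OSError):
        # "checkout_failed" is a made-up error key for this sketch.
        return {
            "id": message.get("id"),
            "status": "error",
            "errors": ["checkout_failed"],
        }
    return {"id": message.get("id"), "status": "success"}
```

The response reuses the “id” of the request so that the extension can match answers to pending requests.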

As you can see above, we send the file in chunks. This is because the size of a single message is limited: a message sent by the host is capped at 1 MB, while the extension is able to send messages of up to 4 GB.
Because the concept is not bound to a specific platform, you can implement hosts in any programming language and on any operating system that supports stdin and stdout. You can find several samples on GitHub.
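
The chunking itself is straightforward. The following Python sketch splits a byte string into messages shaped like the checkout request above; in our project this happens in the extension (in JavaScript), so the code only illustrates the idea. The helper name, the request id scheme and the default chunk size of 256 KiB are assumptions. Keep in mind that base64 inflates the data by roughly a third, so the raw chunk size has to stay well below the message limit.

```python
import base64


def build_transfer_messages(file_id, filename, data, chunk_size=256 * 1024):
    """Yield one checkout message per chunk_size-byte slice of data."""
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        yield {
            # Hypothetical id scheme: file id plus running chunk number.
            "id": f"{file_id}-{index}",
            "command": "checkout",
            "transfer": {
                "id": file_id,
                "filename": filename,
                "chunk": base64.b64encode(chunk).decode("ascii"),
                "status": "sending",
            },
        }
```

The receiver appends the decoded chunks in the order in which they arrive, which restores the original file content.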

The Native Messaging extension

Every packaged browser extension contains a manifest file with properties that describe the extension, such as name, version, description, permissions, etc. In order to be able to invoke a Native Messaging host, the extension needs the “nativeMessaging” permission:

{
  "name": "extension name",
  "version": "0.0.0.1",
  "manifest_version": 2,
  "description": "extension description",
  "icons": {
    "32": "icons/logo-32x32.png",
    "64": "icons/logo-64x64.png",
    "128": "icons/logo-128x128.png"
  },
  "background": {
    "scripts": [
      "js/background.js"
    ]
  },
  "permissions": [
    "nativeMessaging"
  ]
}

As pointed out before, the host launch is triggered via a call to either chrome.runtime.connectNative or chrome.runtime.sendNativeMessage. A call to chrome.runtime.connectNative returns a port object which offers events you can subscribe to (onMessage and onDisconnect) as well as a postMessage method to send a message to the other end of the pipe.

Security concerns

Because Native Messaging only works within the bounds of the local machine, the security risk is minimal as long as the host is carefully implemented.
An attacker already has to be on the machine to be able to abuse it.

The Native Messaging API is a great method of bridging the gap between the web and desktop world. It is a quite powerful but also pretty primitive way to communicate bidirectionally.
Depending on the functionality you need, you may have to implement a custom protocol on top of it.

I hope you liked my introduction post about Native Messaging which was also my first post on medium.com 😊!
