Controlling my media center by voice

Dec 21, 2016

Amazon Echo devices are popular, and I’ve spent a few dozen hours getting up to speed on how I can use them. I did the first thing everyone does when they get one and hooked it up to Spotify and commanded Spice Girls into my living room. I then connected my blinds and was able to summon darkness at will. But, uh… that was pretty much it.

Home automation is pretty much a money pit. I knew this, but the main reason I had held off was that I didn’t want to control my appliances with some terribly designed tablet app with a screen of buttons, e.g.:

An image search for “home automation controller”. Ick.

I didn’t want the Echo because I didn’t need the speaker it comes with. The Echo Dot, meanwhile, was a better fit because I could hook it right into my stereo system. Err, sort of. It’s worth it to describe my setup so that you can understand my constraints:

A shiny new Samsung TV acting as a hub for all of my devices
An aging HDMI audio receiver, connected via ARC to the TV
A HTPC running Windows 10 and Kodi (media center software)
A PS4 [not relevant to this post, but ultimately also voice controlled as a TV input]

I don’t have a Fire TV or any other peripherals — my apps all run on that HTPC, which I’ve named Peter. Peter runs Windows, which wouldn’t normally be my first choice but I’ve had buttery smooth media playback with it where previous Linux (and hackintosh) builds have failed me. It unfortunately makes projects like this more difficult.

Goals

Turn my TV on / off by voice. Bonus: switch HDMI inputs by voice
Control Kodi by voice for watching movies and TV shows
Use my speakers so I can hear the Echo while simultaneously playing audio from the HTPC

I tackled this by pulling information from various Reddit and blog posts, but unfortunately I didn’t record most of them to give credit. No single post met my needs, and in each case I ended up getting a little, uh, creative.

That’s a useful warning for you as well. Neither voice control goal was trivial to complete. I expect that even with this post and others as a resource, getting this to work for yourself may take a good amount of tenacity and cause quite some frustration.

You can read through the rest of this post, but if you just want the payoff, I’ve got a video of me turning on my TV and kicking off a TV episode:

Controlling the TV

Controlling my TV was by far the simplest operation. Steps involved here include access to the TV via its EX-Link (serial) port, using EventGhost to send serial commands to the TV, and using a Philips Hue emulator to fool the Echo into controlling the TV. [Note: see the update at the bottom of this section for a different solution]

I bought a Samsung KS8000, which is a 2016 UHD TV. That may be important, but I gather any moderately recent Samsung TV will work as I’ve set it up. These TVs come with a nifty extra port, which uses a standard 3.5 mm headphone connection. Far from driving headphones, it’s for EX-Link, which is used for remote control or possibly debugging. Sniffing around Amazon, I didn’t identify any consumer devices that operate a TV via EX-Link, so I’m guessing it’s more for debugging. It ends up just being a serial connection, which had no issues communicating with my HTPC using one of these cables. If you don’t have a serial port on your computer, or if you don’t want to risk the motherboard, you can always buy a USB<->serial adapter for dirt cheap.

Next, it’s useful to talk about EventGhost. It’s a glue application to execute arbitrary behavior given certain triggers. It has a lot of useful applications, but for our purposes ends up being great for controlling Samsung TVs via EX-Link. The EX-Link protocol itself is somewhat inscrutable, but luckily EventGhost ships with a plugin for exactly this purpose:

From here, you can fool around with a bunch of behaviors for controlling the TV. I’ve only engaged power and HDMI input switching, but you’ve got volume control and a bunch of honestly pretty scary controls for picture adjustment. Imagine that… “Alexa, dim my TV”. Here’s where I ended up with my EventGhost configuration for the TV:

You can see that I’ve got a couple HTTP triggers, which drive between one and three actions. The most interesting one is “Power On”, which turns on the TV, switches the input to the receiver, then immediately to the HTPC. I had to do this because the power on behavior that EX-Link uses is very different from what happens when pressing the power button on the TV remote. There’s no animation on the screen and the TV doesn’t automatically wake devices up via HDMI-CEC. Activating the HDMI input achieves the same effect without any noticeable delays. Note: this should be a hint. EX-Link is likely triggering behaviors that may void your warranty or worse actually damage your TV. I really have no way of knowing, but I’m betting on its behavior being fairly benign.
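To make the trigger-to-actions idea concrete, here is a rough Python sketch of the same structure: one trigger name mapped to an ordered list of actions. The action names and the "PowerOff" entry are made up for illustration (the real actions come from the EventGhost plugin), and the `send` callback stands in for whatever actually talks to the TV.

```python
from typing import Callable, Dict, List

# Hypothetical action names for illustration only; the real EventGhost
# plugin exposes its own actions for power and input switching.
MACROS: Dict[str, List[str]] = {
    # Power on, then hop to the receiver input and straight back to the
    # HTPC input so the connected devices get woken up.
    "PowerOn": ["tv_power_on", "input_receiver", "input_htpc"],
    # Assumed single-action trigger.
    "PowerOff": ["tv_power_off"],
}


def run_macro(trigger: str, send: Callable[[str], None]) -> int:
    """Run every action registered for a trigger, in order.

    Returns the number of actions sent (0 for an unknown trigger).
    """
    actions = MACROS.get(trigger, [])
    for action in actions:
        send(action)
    return len(actions)
```

Calling `run_macro("PowerOn", print)` would print the three actions in order. The point is only that ordering matters: the input switch has to come after the power-on.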

The HTTP triggers work because EventGhost can run an internal HTTP server that fires events based on the requested URL. That’s useful for the next part. EventGhost can run on startup and in general just sits in the tray.

Probably the most vital part of this is the Hue emulator (titled the Amazon Echo Bridge by the creator). I had some trouble with the most recent versions of the repo, but had decent luck with v0.2.1. There’s no issue with the jar running on Windows, and I daemonized it by making it a service with NSSM. I run a few services on the HTPC so I had to mess around with the arguments when calling it. Port 8080 is otherwise in use, and unfortunately the “server.port” argument alone wasn’t enough to prevent the webserver from trying to bind on it.

Pasted because I couldn’t figure out the double dash in Medium posting…

Visiting the web server at /configurator.html presents a page like this:

For clarity, I have EventGhost running on port 6666. Both of these are “unsafe” if visiting in Chrome, so I had to use a different browser. You can see that pretty much any label can follow as a parameter in the URL to EventGhost. Once EventGhost gets called, you can drag and drop the event in the event log on the sidebar of EventGhost to the appropriate event that it should trigger. Pretty easy.
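If you want to poke EventGhost without going through the Echo, something like this Python sketch should do it. It assumes the label is passed as the bare query string (`/?Label`) and that EventGhost is on the same machine; both are assumptions, so adjust them to match whatever URLs you entered in the configurator. The function names are mine.

```python
from urllib.parse import quote
from urllib.request import urlopen

EVENTGHOST_HOST = "127.0.0.1"  # assumption: EventGhost runs on this machine
EVENTGHOST_PORT = 6666         # the port used in this post


def trigger_url(label: str,
                host: str = EVENTGHOST_HOST,
                port: int = EVENTGHOST_PORT) -> str:
    """Build the URL that should fire an EventGhost HTTP event.

    Assumes the label goes in as the bare query string; spaces and other
    special characters are percent-encoded.
    """
    return f"http://{host}:{port}/?{quote(label)}"


def fire(label: str, timeout: float = 2.0) -> int:
    """Request the trigger URL and return the HTTP status code."""
    with urlopen(trigger_url(label), timeout=timeout) as resp:
        return resp.status
```

After calling `fire("PowerOn")`, the event should show up in the EventGhost log, ready to be dragged onto a macro.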

After I got that set up, I had my Echo look for additional devices and it spotted these three “devices”. I control each of them by saying something like “Alexa, turn on Samsung” or “Alexa, turn off Playstation”. It pretty much just works. I wasn’t able to use the name “TV” for some reason, which is why I labelled it “Samsung”.

Notes if you don’t have a Samsung TV or EX-Link

When I linked to this post from Reddit, /u/sidoh mentioned a couple strategies that could work. The first is fairly simple: you may be able to use Wake-on-LAN (WoL) to turn on your TV. You won’t be able to control it other than that, but it could be useful if TV power is a focus for the rest of your home theater setup. EventGhost (naturally) has support for WoL, which you can use in the same fashion as I described above. In the same comment, sidoh described controlling a Sony TV via UPnP:

I think a lot of them respond to WOL packets (my Sony does). At least with my TV, turning off is trickier, but still possible via network calls. There’s some shitty auth procedure, but once that’s done (it’s permanent) I can turn it off via a UPnP service.

That’s an intriguing notion, since you could use these methods to control many more devices than just a TV. Googling a bit, I see that there’s a tool called easy_upnp which among other things allows UPnP control over HTTP. This means that you can use the Hue emulator to control these devices as well.
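For the Wake-on-LAN route, the packet itself is simple enough to build by hand: six 0xFF bytes followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. Here is a minimal Python sketch. The MAC address, the broadcast address, and port 9 are placeholders or common defaults, not values from my setup, and whether a given TV responds is something you’d have to test.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    # bytes.fromhex raises ValueError if the string is not valid hex.
    return b"\xff" * 6 + bytes.fromhex(hex_digits) * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

Usage would be `wake("00:11:22:33:44:55")` with your TV’s real MAC address.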

Update: SmartThings

I’m not sure why I didn’t spot this the first time, but you can simplify a lot of this setup. If you have SmartThings already (say, to control your lights), there’s a decent chance that you can turn your TV on and off with SmartThings + Echo. Just a warning that some of the more intricate controls like volume and input selection may not work via voice control.

Controlling the media server

Kodi has a built-in webserver for remote control, which is the basis for this section. A single tool, kodi-alexa, does the heavy lifting, and it has a really great README. It’s also seeing active development at the moment so I won’t transcribe the setup here. In essence, you expose the port for Kodi’s webserver to the internet and host a service to accept requests from a custom Alexa skill. I’ll mention a couple issues that I ran into.
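Before wiring up the skill, it can help to confirm that Kodi’s webserver answers at all. Kodi speaks JSON-RPC 2.0 over HTTP at the `/jsonrpc` path, so a bare-bones Python check might look like the sketch below. The host and port are placeholders, the helper names are mine, and I’ve left out HTTP basic auth, which you’d need to add if you set a username and password in Kodi.

```python
import json
from typing import Any, Optional
from urllib.request import Request, urlopen


def jsonrpc_request(host: str, port: int, method: str,
                    params: Optional[dict] = None,
                    request_id: int = 1) -> Request:
    """Build a JSON-RPC 2.0 POST request for Kodi's webserver."""
    payload = {"jsonrpc": "2.0", "method": method, "id": request_id}
    if params is not None:
        payload["params"] = params
    return Request(
        f"http://{host}:{port}/jsonrpc",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def call(req: Request, timeout: float = 5.0) -> Any:
    """Send the request and return the decoded JSON response."""
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Something like `call(jsonrpc_request("127.0.0.1", 8080, "JSONRPC.Ping"))` is a harmless first test; swap in whatever port your Kodi webserver actually listens on.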

Accessing Kodi while NATted

First, this process requires exposing Kodi’s webserver to the internet. I imagine you could partly mitigate this by running the kodi-alexa service locally as well, but then you’d have to expose that so the AWS Alexa skill can reach it. I wasn’t happy about this requirement, but I figured it to be relatively low risk. To my dismay, my ISP (Webpass) doesn’t allocate IPv4 public addresses so I’m effectively stuck behind a giant NAT.

Luckily, from really sad work experiences I’m familiar with remote tunnels via SSH. Using Putty (specifically plink), trusty NSSM, and a VPS that costs $5 / year, I managed to expose Kodi via another host. I’ll paste the plink command here for reference. Here, 5051 is the Kodi port — the result of yet another port conflict.

plink -R *:5051:127.0.0.1:5051 <vps host> -l <remote user> -pw <remote pass> -T -N

If you don’t have a VPS to use here, there are regularly deals on lowendbox. For purposes of a tunnel, the cheapest possible options are more than plenty. If you haven’t burned your AWS free tier credits, a t1.micro instance is also more than sufficient. The only thing to keep in mind is that in the SSH config on the host, “GatewayPorts yes” will have to be added in order for the port to be publicly exposed.
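A quick way to check that the tunnel and the GatewayPorts setting are doing their job is to attempt a plain TCP connection to the VPS from somewhere outside your network. This is a minimal Python sketch; the host name in the usage note is a placeholder.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `port_reachable("vps.example.com", 5051)` returning False while the plink service is running is a decent hint that the port is only bound to localhost on the VPS, which is what happens without “GatewayPorts yes”.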

Actually deploying kodi-alexa

I was hoping to kill two birds with one stone and use this crappy VPS to host the kodi-alexa service, which would have been super since then I wouldn’t need to expose my HTPC’s Kodi instance directly to the internet. Unfortunately my VPS instance only has 128 MB of memory and literally can’t install the Python dependencies for that project. I then hoped to use the Docker install and ship a completed image to the instance, but of course OpenVZ instances have custom kernels that can’t run Docker. I messed around a little bit with it but ended up just using the Lambda deploy suggested in the kodi-alexa README. I couldn’t actually deploy from the HTPC (because Windows) or from the VPS (because memory) but it was really trivial to execute the deploy from a MacBook.

Note: it may be a good move regardless of the above two issues to attempt to expose Kodi only to the server running kodi-alexa or to not expose it at all if they’re running on the same box. I’m thinking it’s a lot less dangerous if kodi-alexa is attacked than if Kodi is attacked directly.

Using home theater speakers to hear Echo

Mentioned earlier, one of my goals was to use my speakers to hear my Echo. The Echo Dot actually has a reasonably loud internal speaker, but it’s not stereo and it has pretty poor quality for music. I want to use my more expensive speakers to listen to music while controlling playback by voice, which the Echo is already really good at.

Unfortunately the issue I ran into here is that my receiver (like most receivers) doesn’t support mixing playback from multiple devices simultaneously. Newer receivers have Bluetooth support and I suspect could override playback of other content to play what’s coming over Bluetooth, but it’s just a guess. There are mixers that one can buy, but my TV is connected to my receiver via HDMI, like I imagine most people have it connected. My TV supports optical output, but a mixer that supports both optical and a regular line in seems pretty expensive, and I’d lose the HDMI-CEC features like changing the volume via my TV remote (and my Echo!). Without any more innovative solutions, I was reduced to switching inputs on my receiver to hear either my Echo or anything else.

The next obvious move was to use my HTPC to mix the Echo’s audio into its own. Windows is traditionally pretty good at mixing audio from multiple sources simultaneously. However, my HTPC is actually a server (a Dell T130) that doesn’t come with a built-in sound card. I can only use it as a HTPC because sound is transmitted via the video card that I installed. The situation is almost unthinkable given the proliferation of onboard sound for at least a decade. PCIe sound cards cost at least $30 and worse I don’t have any spare PCIe slots due to said video card. Well…

I already had an old USB sound card, one of the cheap $7 devices on Amazon. It had worked alright to replace a broken jack on an older netbook, with similar quality to most onboard sound. However, the mic input is a different story: I found that unlike most onboard sound, these cards really treat mic level differently from line level. Microphones and line devices (like the Echo) ultimately have very different voltages, and the Echo sounded really distorted using the mic input, not to mention that the input is not stereo. Derp.
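To put rough numbers on the mic versus line mismatch: consumer line level is nominally -10 dBV (about 0.316 V RMS), while microphone signals are far weaker. The -50 dBV mic figure below is just a ballpark I’m assuming for illustration, and I haven’t measured what the Echo actually puts out, so treat this Python sketch as back-of-the-envelope arithmetic only.

```python
import math

LINE_LEVEL_DBV = -10.0  # nominal consumer line level
MIC_LEVEL_DBV = -50.0   # assumed ballpark for a microphone signal


def dbv_to_volts(dbv: float) -> float:
    """Convert a level in dBV (decibels relative to 1 V RMS) to volts RMS."""
    return 10 ** (dbv / 20)


def level_gap_db(v_a: float, v_b: float) -> float:
    """Return how many dB stronger voltage v_a is than voltage v_b."""
    return 20 * math.log10(v_a / v_b)


line_volts = dbv_to_volts(LINE_LEVEL_DBV)
mic_volts = dbv_to_volts(MIC_LEVEL_DBV)
gap_db = level_gap_db(line_volts, mic_volts)
```

Under those assumptions the line signal is about 40 dB hotter, roughly 100 times the voltage the mic preamp expects, which fits with the clipping and distortion I heard.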

Next, I investigated using Bluetooth to connect the Echo to my HTPC, since the Echo supports connecting to external speakers and I figured I could have the HTPC pretend to be speakers. I happened to have an Orinco Bluetooth USB adapter kicking around, so it was easy to try. This operation has the PC acting as an A2DP sink for the Bluetooth profile. Earlier versions of Windows apparently worked well in this capacity, but Windows 10 actually butchered it with the default drivers. I tried using BlueSoleil, which has an alternate set of drivers. BlueSoleil might be the most hideous software that I’ve used, but it did successfully set up an A2DP sink and I was able to play music with my speakers. Unfortunately the quality was subpar — I’m not sure how to quantify it, but there was definitely some loss. BlueSoleil isn’t cheap, so this left me pretty disgruntled. Later I tripped over Broadcom’s WIDCOMM stack, which apparently is compatible with many (most?) Bluetooth chipsets. This integrated much more cleanly with Windows and had decent quality from the get-go.

Still, I wasn’t very satisfied. Neither BlueSoleil nor the WIDCOMM drivers seamlessly presented the HTPC as an A2DP sink for the Echo, and I had to reconnect more or less manually each time the connection was lost. In addition, the Bluetooth device didn’t behave like a normal input device from the perspective of Windows sound devices, which stops me from controlling it in line with other applications.

Luckily I had already ordered a $16 5.1 USB sound card. This sports an actual line in (with stereo input, no less!). Plugging the Echo into this sound card worked immediately and sound quality was back to what I expected. Of course, I still can’t hear my Echo when using my PS4, but I’m pretty sure that was never in the cards.

Conclusions?

For me, the effort of all 3 of these projects was worth it. First, I seem like an absolute badass during parties, but it also makes a lot of media operations more trivial. I can find out which episodes of TV shows I haven’t seen yet, control playback while eating dinner, and switch between gaming and TV watching seamlessly.

Of course, the journey was probably worth more than the destination. Dozens of hours don’t just evaporate!