Focusrite Scarlett Solo and Audio-technica AT2020 for VoIP: a review

Ismo Puustinen
Aug 15, 2015

This review covers the Focusrite Scarlett Solo USB audio interface and the Audio-technica AT2020 microphone. I’m not a musician, so I concentrate on how well these devices suit voice-over-IP (VoIP) chats using Teamspeak or similar software.

Some history

I like to play computer games online with my friends, and I also have a job that requires me to join a VoIP teleconference from home every now and then. I was previously using a Sennheiser headset (PC360) for VoIP, but my friends complained about the way I talked into the headset microphone. I was either too quiet (keeping the microphone boom too far below my chin or too far to the side) or making Darth Vader noises by breathing into the mic. I also wanted to buy new headphones without an integrated mic, so I started looking into external microphones. I eventually settled on a Samson Go Mic, which is a small, inexpensive USB microphone. I put it on top of my display and everything appeared to be good. However, I could not get the recording level high enough for my liking. My friends said that my voice seemed quiet over VoIP, and if I was leaning back in my chair, it was especially difficult to make out what I was saying. I learnt that the spoken voice is actually pretty quiet compared to many other microphone use cases, such as recording instruments or singing.

An obvious solution to the problem was to get the microphone closer to my mouth. I put the mic on the edge of the table next to the keyboard, but then the noise of the key clicks was overwhelming. I then bought a microphone gooseneck, which I screwed into a shelf. This allowed me to position the Samson mic much closer: it sat beside the display, above the mouse pad, about 50 centimeters from my mouth. I set the Windows recording volume level to 100% and got quite a lot of static hiss, but Teamspeak’s background noise reduction feature was pretty good at removing it.

I was content with this setup for quite a while. The complaints about my voice quality had stopped. However, I wasn’t completely happy with it. Maybe because I had the recording volume set so high, the microphone was prone to clipping. Clipping means that the signal is louder than the maximum level that can be recorded, so the peaks of the waveform are cut off and the sound distorts. The Samson mic had an LED that turned from green to red when the sound was clipping, and that happened pretty often. If I leaned forward in my chair or raised my voice, there was clipping. Also, in my opinion, my recorded voice did not sound very good: it was somewhat unnatural and robotic.
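To make clipping concrete, here is a minimal sketch in Python. It assumes a recorder that can only store values between -1.0 and 1.0 (a common convention for normalized audio, not something specific to the Samson mic); the function name and the eight-sample sine wave are made up for illustration.

```python
import math

def record(amplitude: float, samples: int = 8, full_scale: float = 1.0) -> list:
    """Sample one cycle of a sine wave and clip it to the recordable range."""
    out = []
    for n in range(samples):
        x = amplitude * math.sin(2 * math.pi * n / samples)
        # Anything beyond +/- full_scale cannot be stored and is flattened.
        out.append(max(-full_scale, min(full_scale, x)))
    return out

quiet = record(0.5)  # fits inside the range, so the wave shape is preserved
loud = record(2.0)   # too loud: the peaks are cut off at +/- 1.0

print([round(s, 2) for s in quiet])
print([round(s, 2) for s in loud])
```

The loud signal comes out with flat tops instead of rounded peaks, and that is what is heard as distortion. Lowering the gain or moving away from the mic shrinks the amplitude back into the recordable range.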

I originally started looking at external sound cards because I wanted to have a physical volume control for my headphones. As I was investigating the options, I started to look into the more advanced sound cards (called audio interfaces) that musicians and audio hobbyists use. These audio interfaces contain preamplifiers for microphones and are typically able to provide the 48V “phantom power” that condenser microphones require. They also feature XLR connectors for big microphones. I decided to get rid of the Samson mic and use a single audio device with both my headphones and my microphone connected to it. Some googling later I had settled on the Focusrite Scarlett Solo audio interface and the Audio-technica AT2020 condenser microphone. They cost a bit below 100 euros each, and I had them shipped to me with a pop filter and a microphone cable for about 200 euros.

Taking Solo and AT2020 into use

My first impressions of the devices were good. The audio interface and the microphone both have a metal chassis and are quite heavy. The design is understated and practical-looking for both products. I think the computer peripherals industry might learn something from the audio industry here.

Audio-technica AT2020 next to the display. The gooseneck attachment to the shelf is visible at the back right, while the audio cable goes to the back of the display on the left.

The AT2020 came with a stand connector that could be screwed directly onto the microphone gooseneck that I already had. This made the microphone installation trivial, and it went in the same place beside the display where the Samson Go Mic had been. The AT2020 looked quite a lot bigger looming there, but I got used to it surprisingly quickly. Connecting the cable to the microphone and to the audio interface was simple and could not be done wrong. The XLR connectors are really heavy-duty and solid. The Solo is connected to the computer with a regular USB cable.

The Solo needed Windows drivers to be downloaded and installed, which was a chore. When I upgraded to Windows 10, I also had to go to Focusrite’s website and download beta drivers for Windows 10, which luckily worked fine. Oddly, Linux and OS X come with suitable drivers pre-installed, and only Windows needs to have them downloaded. Having to install drivers was actually a problem for my other planned use case, which was connecting the audio interface to my work laptop for teleconferencing. Many companies do not allow installing drivers from the Internet on their computers. Using a Mac or a Linux computer of course solves this, but the selection of VoIP software compatible with corporate VoIP solutions might be limited on those platforms.

Scarlett Solo usability and microphone sound quality

After the driver installation everything just worked. I didn’t bother installing any of the application software that came with the audio interface, since I don’t plan to record or process any audio. The key thing is that for Teamspeak (and other VoIP) purposes the hardware could hardly be better. The audio interface is easy to use, the required microphone volume level is easy to reach, and the headphone volume is adjusted with a big control knob. There are only two improvements to the Solo that I would like to see:

  1. There could be a separate volume control for microphone direct monitoring. When using closed headphones you typically don’t hear yourself talking, which is very annoying. With direct monitoring, sound from the microphone is routed directly to the headphones, so you hear yourself talking without any latency. However, the balance between the direct monitoring volume and the overall volume cannot be adjusted from the hardware. The only way to control it appears to be setting the mic and playback levels in software on the computer.
  2. There is a button that controls the 48V phantom power that is fed to the mic. The button has a red LED which is lit whenever the 48V supply is enabled. Since the AT2020 doesn’t work without phantom power, the button has an unintended secondary function as a “mute” button. However, the button is small and awkwardly placed, and thus very difficult to push quickly. A much bigger and easier-to-press button with the same functionality would have helped a lot.

Focusrite Scarlett Solo sitting below the display. The cable on the left goes to the AT2020 and the cable on the right is for my headphones. Note the red 48V phantom power button and the direct monitor switch.

The audio quality of the microphone is really good. I don’t have any experience with other “real microphones”, but the recorded audio sounds the way I want it to sound. Of course, the VoIP codecs compress the audio and cause some quality loss, but I still feel that the voice comes through clear and with a lot of presence. The AT2020 has a cardioid pickup pattern, meaning that it picks up more sound from in front of the microphone than from the sides or the back. Still, I wish the effect was more pronounced. The microphone picks up some fan noise from the computer, which sits directly to one side of the microphone. I use the “remove background noise” setting in Teamspeak, which does a good job of removing the fan noise and possibly also other noise outside the frequencies of human speech.
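Why the side rejection feels weak can be illustrated with the textbook cardioid formula, where sensitivity at angle θ is 0.5 · (1 + cos θ). This is only a sketch of an ideal cardioid; the real AT2020 pattern varies with frequency and will not match these numbers exactly. The function name is made up for illustration.

```python
import math

def cardioid_gain_db(angle_deg: float) -> float:
    """Relative sensitivity of an ideal cardioid in dB, versus on-axis (0 degrees)."""
    sensitivity = 0.5 * (1.0 + math.cos(math.radians(angle_deg)))
    if sensitivity <= 1e-12:
        return float("-inf")  # the null directly behind the microphone
    return 20.0 * math.log10(sensitivity)

# 0 = straight in front, 90 = directly to the side, 180 = directly behind
for angle in (0, 45, 90, 135, 180):
    print(angle, round(cardioid_gain_db(angle), 1))
```

For an ideal cardioid, a source directly to the side is only about 6 dB quieter than the same source in front, so a computer sitting at 90 degrees is still picked up quite well. The deep rejection is directly behind the microphone, which suggests pointing the back of the mic towards the noise source where that is possible.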

There doesn’t appear to be much audible static when the microphone gain is turned up, which I guess is due to the good quality of the microphone and the audio interface preamplifiers. The microphone and audio interface together have easily enough gain to handle spoken word from 40–50 centimeters away, and I don’t have to keep the physical or software recording gain at maximum.
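How much the distance costs in level can be estimated with the inverse-distance law. The sketch below assumes an idealized point source in a free field with no room reflections, which a real room is not, so treat the numbers as rough guidance. The 25 cm and 10 cm comparison distances are made up for illustration.

```python
import math

def level_change_db(old_distance_cm: float, new_distance_cm: float) -> float:
    """Change in sound level at the mic when it moves from old to new distance."""
    if old_distance_cm <= 0 or new_distance_cm <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(old_distance_cm / new_distance_cm)

# Moving the mic closer, starting from 50 cm
print(round(level_change_db(50, 25), 1))  # halving the distance
print(round(level_change_db(50, 10), 1))  # roughly headset-boom distance
```

Halving the distance gains about 6 dB under these assumptions. In other words, a mic at 50 centimeters needs noticeably more preamplifier gain than a headset boom does, which is why clean gain matters in this kind of setup.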

The microphone buying guides I read said that a pop filter is essential for any microphone use. A pop filter is a piece of thin cloth (think of nylon stockings) that is stretched in front of the microphone and makes ‘P’ and ‘S’ sounds sound better. I got one in a bundle deal with the microphone. I quickly found that a typical circular pop filter is impossible to use in my setup, because it would block part of the display. Also, since the microphone is so far from my mouth, ‘P’s and ‘S’s sound normal. I think the issue mostly affects people who talk less than 20 centimeters from the microphone, and even then I would recommend trying to find a curved pop filter that does not cover a large part of the display.

The driver software for the Solo is not perfect: I managed to cause some audio lockups when I was playing around with the recording sample rate. Even though the Solo supports pretty high sample rates, I found that 48 kHz is plenty for my application, since I don’t plan to edit the audio or do any fancy tricks with it. The recording volume level can be set in software in addition to the hardware knob. However, the documentation doesn’t say what the “native” level is at which the audio interface records, or how much (if any) software volume boost is applied to the audio coming from the interface. In practice this doesn’t matter much, but it’s a bit annoying that there is no guide for setting the relationship between software and hardware gain to the optimal values.
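The claim that 48 kHz is plenty can be checked with simple arithmetic: a given recording frequency (sample rate) can represent sound up to half of that rate, which is known as the Nyquist limit. A minimal sketch, assuming the commonly quoted 20 kHz upper limit of human hearing; the list of rates is illustrative and not taken from the Solo’s specifications.

```python
HEARING_LIMIT_HZ = 20_000  # assumption: commonly quoted upper limit of human hearing

def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2

for rate in (44_100, 48_000, 96_000):  # illustrative rates only
    limit = nyquist_hz(rate)
    print(rate, limit, limit >= HEARING_LIMIT_HZ)
```

At 48 kHz the limit is 24 kHz, already above what people can hear, so higher rates mainly matter for editing and processing and not for a voice chat.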

Overall, I think that my audio setup is now as good as it gets for Teamspeak or Skype use. The audio is pretty crisp and clear, and using my microphone setup doesn’t require any extra effort from me. The limiting factor in sound quality is now the audio codecs that VoIP services use. I can warmly recommend both the audio interface and the microphone. Of course, the 200 euro price is quite high, but having good VoIP audio quality really matters. Especially in corporate teleconferences, there are those folks who call in from a car or talk into their laptop’s integrated microphone. I find following their speech difficult, because it’s often hard to make out what they are saying, and it’s easy to “zone out” while they talk. Having good VoIP quality gives a certain presence to what you are saying.

Audio quality samples

The following audio sample was recorded from Teamspeak using the Scarlett Solo + AT2020 setup after connecting to a typefrag.com Teamspeak server. The recording was made on a different computer connected to the same server, so it shows the actual audio quality that other people are going to hear from you. The Typefrag.com documentation says that they are using “full quality” for Teamspeak audio, which probably means that they have set the Teamspeak server audio quality to the highest value. The “remove background noise” feature is enabled in the Teamspeak settings.

Teamspeak recording with noise reduction

For comparison, I recorded another sample directly, without Teamspeak. I applied some noise reduction in Audacity, because noise reduction was also used in the Teamspeak sample. In a side-by-side comparison there is a clear difference between the two samples: the compression in the audio codecs that Teamspeak uses makes the sound a bit muddier and more “metallic”. Overall, the difference isn’t that big. The audio codecs have been steadily improving, and as network bandwidth increases, I expect VoIP sound quality to keep getting closer to the original recording quality.

Direct local recording, noise reduction in Audacity

Just to make things interesting, I added another Teamspeak recording made with the same method as the first one. This time Teamspeak’s “remove background noise” was off. You can hear a clear increase in audio quality at the price of some hiss. If a game had been running, fan noise from the computer might have been audible in the recording.

Teamspeak recording, no noise reduction
