In-Depth Guide on How to Greatly Boost Video Chat Quality

Van Nguyen
Oct 7 · 12 min read
Figure 1 — Example of a boosted video chat set up with blurry bokeh background, RGB lighting, well lit subject, no visible AV equipment.

This guide was originally posted on Notion (here) on June 26, 2020.

Overview

In this document, I’ll go over the elements required to make it an experience to be on video chat with you. As some may have noticed, a good setup can make you stand out (I’ve seen people get distracted by the video & stare way too long…), it can speak for you (by reflecting your personality & interests), and it can leave quite an impression. You may be surprised to know that even a few minor, relatively inexpensive, non-technical tweaks can greatly enhance your own experience. However, all of the top-of-the-line upgrades will unfortunately require 💸 money.

Now, there are three primary issues to contend with in order to create a high quality video experience. I will go through each one in detail, explain how to address it, and cover potential upgrades for multiple budgets.

The 3 issues, sorted from the easiest (and cheapest) to fix to the hardest (and most expensive) to fix, are:

  • ⏳ Latency
  • 🎙️ Audio Quality
  • 🎬 Image Quality

Then, at the end, I’ll share lists of different configurations, again, at different budgets. After that, we’ll open up to Q&A and consultation. Feel free to interrupt at any time; all of this information will be available online after the session.

⏳ Latency

⏲️ Baseline

Regardless of your internet connection bandwidth, Zoom adds a baseline latency of about 500ms. This means that after you talk, it takes 500ms PLUS travel time to the other attendees. For a call from San Francisco to NYC, this travel time is about 80ms.

This means… with no other latencies involved, after you start talking, the other attendees won’t hear you for another half second and change. Let’s calculate what happens when you add more latency.

📡 Wifi

You may be surprised to learn that wireless internet is a terrible choice for video calls. Wired ethernet adds only 0.3 ms of delay even over long cable runs. That delay is imperceptible.

At best, a typical wifi connection will add an extra 3 ms for every router or repeater your traffic hops through. At worst, it will cause your visual frames to freeze and your audio to be interrupted every few hundred milliseconds, causing an overall delay of 500 ms. This delay is incredibly perceptible.

Using wifi for video chat is like trying to talk to your date at a restaurant where everyone around you is yelling at the top of their lungs. For you to be heard, you have to keep yelling at the top of your lungs repeatedly until your date receives your message. Try to have a conversation like that and see how you like it.

So, switch to a wired network connection (.3ms latency).

📶 Bluetooth

Similarly, Bluetooth is a terrible choice for video calls. One of the more popular options is the Apple AirPods Pro, which has an at-best latency of 144 ms. Assuming you’re both on wifi and both use AirPods Pro, the total delay is just shy of 1.4 seconds each way.

So, switch to a wired audio connection (<0.1 ms latency).

🥇 Best case scenario

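Switching to all wired connections gives you a latency of about 580 ms for the same San Francisco to NYC call: the 500 ms Zoom baseline plus 80 ms of travel time, with well under 1 ms added by the wired links.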

This can mean the difference between a comfortable, but not ideal, conversation (roughly a 0.5 second delay) and one where you are constantly interrupting each other (a 1.4 second delay).
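The two totals in this section can be reproduced by adding up the figures quoted above. The sketch below is only an illustration: the constant and function names are made up for this example, and it assumes the worst-case wifi penalty is counted once per call while the audio device delay is counted once per participant, which is the combination that lands just under 1.4 seconds. The wired audio value uses the 0.1 ms upper bound.

```python
# One-way latency budget in milliseconds, built from the figures quoted in this guide.
ZOOM_BASELINE_MS = 500.0      # Zoom's baseline latency
SF_TO_NYC_TRAVEL_MS = 80.0    # travel time, San Francisco to NYC
WIFI_WORST_CASE_MS = 500.0    # worst-case wifi delay
WIRED_ETHERNET_MS = 0.3       # wired ethernet delay
AIRPODS_PRO_MS = 144.0        # at-best AirPods Pro latency
WIRED_AUDIO_MS = 0.1          # upper bound for a wired audio connection


def one_way_latency_ms(network_ms, audio_ms, participants=2):
    """Sum the delays between one person speaking and the other hearing it.

    Assumption: the network penalty is counted once per call, while the
    audio device delay is counted once per participant.
    """
    return ZOOM_BASELINE_MS + SF_TO_NYC_TRAVEL_MS + network_ms + audio_ms * participants


wireless_ms = one_way_latency_ms(WIFI_WORST_CASE_MS, AIRPODS_PRO_MS)
wired_ms = one_way_latency_ms(WIRED_ETHERNET_MS, WIRED_AUDIO_MS)

print(f"wifi + Bluetooth: {wireless_ms:.0f} ms")
print(f"all wired:        {wired_ms:.1f} ms")
```

Under those assumptions the wireless case comes to 1368 ms and the wired case to roughly 580 ms, so nearly all of the avoidable delay comes from the wireless links.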

🎙️ Audio Quality

🎧 Audio Processing

If you weren’t already convinced about the value of switching to wired audio, ponder this: if you and your attendees sit in noisy rooms using speakers and built-in microphones, you are all going to send a noisy signal. You’ll be sending over the internet more than JUST the sound of your voice. Your computer will do the best it can to filter out frequencies outside the normal vocal range, but many noises overlap with that range. Then, as the signal reaches another speaker’s computer, that computer will also try its best to filter out noise. But it won’t be perfect, and as they speak, they will also be including your noise with their noise (which may also include your voice). And as the number of guests grows, this problem becomes untenable.

So, use headphones.

I recommend in-ear monitors (IEMs) which provide studio grade audio with sound isolation (passive noise reduction) and if you buy clear ones, they give a very subtle, nondescript appearance.

Shot illustrating the subtle nature of in-ear monitor headphones.

If you can, use a dedicated microphone connected to an audio interface that supports studio-grade audio processing. This includes compressors (which keep your voice from ever getting too loud), noise gates (which cut a lot of noise before it reaches Zoom or another tool), and boosts (which amplify certain frequencies to increase audible clarity).
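To make the first two of those processors a little more concrete, here is a deliberately simplified sketch that works on a list of samples in the range -1.0 to 1.0. The function names, thresholds, and ratio are assumptions chosen for illustration. Real gates and compressors track a smoothed signal level with attack and release times rather than acting on each sample in isolation.

```python
import math


def noise_gate(samples, threshold=0.05):
    """Silence every sample whose magnitude falls below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def compress(samples, threshold=0.5, ratio=4.0):
    """Scale down the part of each sample that exceeds the threshold."""
    compressed = []
    for s in samples:
        magnitude = abs(s)
        if magnitude > threshold:
            # Only the overshoot above the threshold is divided by the ratio.
            magnitude = threshold + (magnitude - threshold) / ratio
        compressed.append(math.copysign(magnitude, s))
    return compressed


# Quiet room hiss (0.01, -0.02), normal speech (0.3), and a loud burst (0.9, -0.9).
raw = [0.01, -0.02, 0.3, 0.9, -0.9]
processed = compress(noise_gate(raw))
print(processed)
```

The hiss is zeroed by the gate, normal speech passes through untouched, and the loud burst is pulled down toward the threshold.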

🎶 Acoustic Treatment

You can eliminate a lot of noise before it even hits your microphone by acoustically treating your room. This can mean acoustic foam on walls, wood/fiberglass sound diffusers, or acoustic blankets. These are meant to prevent noise from bouncing around in your room, prevent your voice from echoing and reverberating through the room, and to muffle stray sounds.

🎙️ Cardioid Microphones

If you are in a room that may have loud noise intermittently, use a dedicated microphone with a super-cardioid pickup pattern. This just means that the microphone picks up sound from a narrow range of angles and therefore rejects most sound that comes in at the wrong angle. This dramatically cuts the amount of noise introduced before it even reaches the audio interface.

If you are in a soundproofed studio, you are free to use a cardioid pickup microphone which ignores sound from the rear but picks up everything with high sensitivity in the front.
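Pickup patterns are often described with an idealized first-order model in which sensitivity at angle θ from the front of the microphone is a + (1 - a)·cos θ. The sketch below uses the commonly quoted textbook coefficients for each pattern. Treat it as an approximation, since real microphones only roughly follow these curves and vary with frequency. The names in the code are made up for this example.

```python
import math

# Idealized first-order coefficients; real microphones only approximate them.
PATTERNS = {
    "omnidirectional": 1.0,
    "cardioid": 0.5,
    "super-cardioid": 0.37,
    "hyper-cardioid": 0.25,
}


def sensitivity(pattern, angle_degrees):
    """Relative pickup (1.0 = full sensitivity) at an angle from the mic's front."""
    a = PATTERNS[pattern]
    return abs(a + (1.0 - a) * math.cos(math.radians(angle_degrees)))


for name in PATTERNS:
    side = sensitivity(name, 90)
    rear = sensitivity(name, 180)
    print(f"{name:16s} side={side:.2f} rear={rear:.2f}")
```

In this model the cardioid pattern has its null directly behind the microphone, while the super-cardioid pattern picks up less from the sides at the cost of a small lobe of sensitivity at the rear.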


I don’t recommend a hyper-cardioid shotgun microphone or an omnidirectional microphone for indoor use, as either is very likely to pick up reflections even in a perfectly acoustically treated room.

🎬 Image Quality

A high image quality experience consists of three things:

  • composition & style
  • sufficient network bandwidth for your desired resolution
  • proper exposure & lighting

⛰️ Composition & Style

This is where you can take things up a notch. Well-framed composition and visual styling can be a game changer.

🖼️ Composition & Framing

Ideally, you’ll want to position yourself in the center of the frame. Your face should be close enough to clearly be the focus of the frame, yet not so close that your guests can see your nose hairs and not so far away that they can’t make out your facial expressions & body language.

Additionally, if you place yourself off to one side, you’ll create a large area of negative space, which can draw viewers’ eyes away from the intended focal target: you. Only do this if your intent IS to draw your viewers’ eyes to something in that negative space, like a brand logo, informative picture, or comic element.

To create the best possible effect, I suggest investing in a teleprompter for your camera. With my setup, I have my camera behind the teleprompter and an iPad mirroring my computer’s screen. The iPad’s screen is therefore reflected through the half-reflective teleprompter into my eyes. As I am looking directly into the camera lens, I am seeing the other person’s face giving the impression of eye contact from the guest’s point of view. This way, the guest isn’t staring at my forehead the entire time watching me constantly look down below the camera.

⬅️ Don’t Neglect the Background

The background can be incredibly useful when video chatting with someone. It gives viewers’ eyes a place to rest from focusing on you and your facial expressions, and a place for them to explore. For my setup, I have deliberately arranged the camera to showcase my actual home office setup as well as interesting bits and pieces that might be on my desk for that week. Some have noticed the oscilloscope, others notice the additional camera, but most notice the massive screens with a weekly changing wallpaper.

🎨 Color Correction & Color Grading

No matter what kind of camera you decide to use, its image sensor is not perfect. Sensors only loosely approximate what the human eye can see, and the color spectrum overlap is often… poor. Sony alpha cameras in particular are known to emphasize yellow too much and improperly handle reds.

Example of color grading using a LUT. Right is before grading, left is after grading. Performed using an Atomos Ninja V with custom LUT.

Therefore, it is helpful to color correct the image. Some color correction is possible in-camera: popular camera-specific LUT packages like Leeming LUT Pro come with camera-specific settings that compensate for sensor idiosyncrasies.
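For readers who have not met the term, a LUT (lookup table) simply maps each input color value to an output value. The sketch below shows the idea with a tiny per-channel 1D table and linear interpolation. It is an illustration only: the curves are hypothetical rather than a real correction for any camera, and grading LUTs of the kind mentioned above are typically 3D tables that map whole RGB triples.

```python
def apply_1d_lut(value, lut):
    """Map a channel value in [0, 1] through a LUT using linear interpolation."""
    value = min(max(value, 0.0), 1.0)
    position = value * (len(lut) - 1)
    low = int(position)
    high = min(low + 1, len(lut) - 1)
    fraction = position - low
    return lut[low] * (1.0 - fraction) + lut[high] * fraction


# Hypothetical curves: red and blue pass through unchanged, green is pulled down a little.
IDENTITY = [0.0, 0.5, 1.0]
SOFTER_GREEN = [0.0, 0.45, 0.95]


def grade_pixel(rgb):
    """Apply one LUT per channel to an (r, g, b) tuple of values in [0, 1]."""
    r, g, b = rgb
    return (
        apply_1d_lut(r, IDENTITY),
        apply_1d_lut(g, SOFTER_GREEN),
        apply_1d_lut(b, IDENTITY),
    )


print(grade_pixel((0.8, 0.8, 0.2)))
```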

🚄 Resolution & Bandwidth

It is very simple. More resolution packs more pixels into the screen which means more details are visible. Zoom has a maximum supported streaming resolution of 1080p; Hangouts, 720p. With those constraints, it may seem like overkill to use a higher resolution camera, but your chat guests will notice the difference between 1080p and 720p. However, unless your chat system supports 2160p or higher, your guests are unlikely to notice the difference between an expensive 8K camera and a 4K camera.
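A quick pixel count makes the point. The sketch below uses the standard dimensions for each resolution. The helper names are made up for this example, and the `visible_pixels` function assumes the chat tool simply caps the stream at its maximum resolution.

```python
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "2160p (4K)": (3840, 2160),
    "4320p (8K)": (7680, 4320),
}


def pixel_count(name):
    """Total number of pixels in one frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def visible_pixels(camera, stream_limit):
    """Pixels that survive once the chat tool caps the stream resolution."""
    return min(pixel_count(camera), pixel_count(stream_limit))


ratio = pixel_count("1080p") / pixel_count("720p")
print(f"1080p carries {ratio:.2f}x the pixels of 720p")
print(visible_pixels("4320p (8K)", "1080p") == visible_pixels("2160p (4K)", "1080p"))
```

A 1080p frame carries 2.25 times the pixels of a 720p frame, while an 8K and a 4K camera deliver the same number of pixels to guests once the stream is capped at 1080p.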

Many cameras, like webcams built into laptops and the dedicated ones in smartphones, are built to be incredibly small. They are so small, in fact, that they hit the limits of optical physics a long time ago. Therefore, to get good imagery, they must use tricks in software to compensate.

You may notice that all dedicated cameras like DSLRs & mirrorless cameras (as well as professional film & ENG cameras) are quite large by comparison for what may seem like the same job. This is on purpose. With that, you get: high resolution directly out of the camera (no processing, no tricks) as well as high creative control of the optics themselves (and therefore control over the resulting video).

Unfortunately, laptops, even recently released ones, typically come with a non-removable webcam with a maximum resolution of 720p. This is true of the latest 2019 MacBook Pro. Recently released phones frequently have a 1080p camera (or even 4K, like the latest iPhone 11), so if that is an option for you and it is higher than your laptop’s camera resolution, you may want to consider using your phone as a webcam.

You also need enough bandwidth to send your video (and to receive others’). Uploading 1080p video requires ~5 Mbps; 4K video requires >24 Mbps. If your connection cannot sustain this, uninterrupted, you will not be able to stream your video regardless of how amazing your video quality is.
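If you have run a speed test, a check like the following tells you what your upload speed can carry. It is a minimal sketch using only the two figures quoted above. The function name is hypothetical, and it assumes you want strictly more upload than the quoted requirement so there is at least a little headroom.

```python
# Upload requirements quoted in this guide, in megabits per second,
# listed from the most demanding resolution to the least.
REQUIRED_UPLOAD_MBPS = [("4K", 24.0), ("1080p", 5.0)]


def best_streamable_resolution(upload_mbps):
    """Return the highest listed resolution the connection can sustain, or None."""
    for name, required in REQUIRED_UPLOAD_MBPS:
        if upload_mbps > required:
            return name
    return None


for measured in (3.0, 10.0, 30.0):
    print(f"{measured:>5.1f} Mbps upload -> {best_streamable_resolution(measured)}")
```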

💡 Exposure & Lighting

This, by itself, will probably give you the most bang for your buck visually — lighting! Cameras and lenses, given a specific set of aperture + shutter speed + ISO, will operate best at specific exposure or brightness ranges. Even a 720p laptop webcam will benefit greatly from being properly exposed to the sensor’s ideal brightness range. So, if you’re using a webcam and see a lot of pixelation (a lot of grainy noise or blocky patches of different colors that show up especially in darker regions of your video), you need to add more light to your scene and that will dramatically boost your image quality.
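One way to see how aperture, shutter speed, and ISO combine is the standard exposure value formula, EV = log2(N² / t) - log2(ISO / 100), where N is the f-number and t is the shutter time in seconds. The sketch below is only an illustration with made-up example settings. A lower result means the settings are suited to a darker scene, which a camera typically reaches by raising ISO and accepting more noise.

```python
import math


def exposure_value_iso100(f_number, shutter_seconds, iso=100):
    """Exposure value referenced to ISO 100 for a set of camera settings."""
    return math.log2(f_number ** 2 / shutter_seconds) - math.log2(iso / 100)


# Example settings (assumptions for illustration, not recommendations).
bright_room = exposure_value_iso100(2.8, 1 / 60, iso=100)
dim_room = exposure_value_iso100(2.8, 1 / 60, iso=800)

print(f"ISO 100: EV {bright_room:.1f}")
print(f"ISO 800: EV {dim_room:.1f}")
```

Going from ISO 100 to ISO 800 lowers the result by exactly three stops, meaning the camera is compensating for a scene with one eighth of the light. Adding light lets it drop back to a cleaner ISO.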

For our purposes, we want a well-lit, properly exposed scene, so there are three things you need to worry about: the background, the foreground, and, if you wear them, your glasses.

THE BACKGROUND

So, some people here will typically work with their back facing a window. On even an overcast day (let alone a sunny one), the sunlight from the window will almost always overpower anything else in your scene — causing your face to be, at best, a dark silhouette. To compensate for this, you would need an equally bright light focused on your face. High quality lights bright enough to match the sun “properly” are upwards of $4000 each. But even your basic desk lamp will technically be an improvement over the silhouette. Or you could, you know, move your desk. Facing the window would balance the light on you with the room behind you for upwards of $0 and you get the benefit of a typically soft, flattering light on your face. And you probably get a nicer view. Win-win.

THE FOREGROUND

Once you have your background set so that it does not overpower the foreground, you’ll want to make sure your foreground is well lit. Ideally, this will be with a soft, diffuse light which casts equally soft, flattering shadows. This light is referred to as the “key” light. A hard light, like your typical desk lamp or flashlight, will cast very sharp shadows on your face, which can give a dramatic look but is typically distracting for video calls because the lit and shadowed portions of your face will be exposed differently.

To separate the background and the foreground, one of the best ways is to use a backlight, also referred to as a rim light. This should be a bright light, but dimmer than the key light, placed at approximately a 45 degree angle behind you to either your left or your right. It should ideally be 45 degrees above you as well. Backlights add specular highlights to your hair and shoulders, which create a distinct separation between you and whatever is behind you. This is especially useful for green screen chroma keying.

If you can improve the lighting of your scene using any of the above techniques, you can greatly improve your output even with a weak 720p camera.

DEALING WITH EYEGLASSES

Lighting angles and placement change drastically if you wear glasses. You’ll want to take advantage of the angle of incidence to ensure the light is not visibly reflected in your glasses. This usually means placing it in front of you but above your head. This will, however, unavoidably create some harsher shadows.


The Startup

Medium's largest active publication, followed by +734K people. Follow to join our community.

Written by Van Nguyen

Designer / Engineer. ODF4. Building modern video editing tools at https://vidbase.co
