Investigating a WebRTC voice call that could not get through

Wan Xiao
20 min read · Jul 19, 2022


My company’s internal IM has a voice call feature built on WebRTC. Recently, the backend team has been migrating the WebRTC-related servers. After the migration, the following behavior was observed in the test environment:

  • If one Android phone is connected to the company’s intranet and the other Android phone is connected to the company’s intranet and also uses a United States VPN (US VPN for short), the two phones cannot make a voice call. After the accept button is tapped, the call stays in the connecting state and never connects.
  • If the call is between iOS and Desktop, it gets through.
  • If the iOS device is connected to the intranet and uses the US VPN, and the Android device is connected to the intranet directly, the call gets through.

So the investigation of this issue was handed over to the Android team.

Preliminary conclusions of others’ investigations

I am not familiar with WebRTC, so I asked colleagues who are familiar with it to investigate. Their preliminary conclusions were as follows:

The STUN binding requests sent by Android on both sides of the call received no response, so the connection failed.

They also found that the Android device could not ping the STUN server.

They suspected that, with the US VPN in use, the Android phone could not connect to the STUN server, which caused ICE to fail.

There was no clear conclusion and solution for the time being.

Later, it turned out that the STUN server could not be pinged because it deliberately does not respond to ping. A telnet to the corresponding port connects without trouble.

My colleagues had spent nearly two days on this problem without finding a lead. They still had product requirements to deliver against deadlines, so they could not spend much more time on it. Other colleagues had no relevant experience and had their own tasks. If the problem was not resolved, the migration would be affected, so I put down my own work and concentrated on it.

In my opinion, there are still a lot of doubts about the results of the preliminary investigation. Since it is now up to me to continue the investigation, the first thing to do is to raise my doubts.

My questions about the preliminary investigation

  • The preliminary investigation suspected that, with the US VPN in use, the Android phone could not connect to the STUN server and so the call could not get through. Then why do the STUN binding requests on both sides of the call get no response? In theory, whether the other party uses a VPN does not affect the connectivity between my side and a public network server.
  • Between iOS and Desktop, the call gets through when either end uses the VPN and the other end does not. How do they get through?
  • Does the VPN software used in the test environment have global proxy mode turned on?

I don’t know much about WebRTC, but I know that in the case of a two-person call, the two parties to the call can be connected through P2P directly or relayed through the server. The preliminary investigation did not investigate how iOS and Desktop are connected, and I would like to know why.

Why can non-Android devices get through?

I work from home, so I have no access to the network environment and devices where the problem occurs. Fortunately, my former colleagues built a call history record platform, which aggregates and displays the client and server logs of a given call. Using the times provided by my colleague, I checked several call records that had got through and took a screenshot of one at random.

A phone call that once got through

Although the log here is a bit strange, it still shows that the two parties are talking over a P2P connection. It doesn’t matter if you don’t understand it yet. I will explain it later, and when you look back you will see why this is a P2P call.

This is a crucial clue. Almost all voice calls made by iOS and Desktop use P2P connections, which leads to my guess.

Guess: The voice call in the test environment cannot be relayed through the TURN server

The calls that can be connected all use P2P, and the Android calls cannot be connected. Most likely the Android device cannot use a P2P connection because the VPN is turned on, so its calls need to be relayed. Maybe the migrated server in the test environment was unable to provide the relay service, so the calls failed to connect.

It is easy to verify this guess. In the test environment, use an iOS device on the cellular network to make a voice call to a Desktop that is connected to the intranet. Generally speaking, NAT traversal does not succeed on a cellular network, so the call must be relayed. The test showed the same problem as on Android: the call could not be connected. So the crux of the problem is not that Android cannot get through, but that the relay service does not work properly. Any call that needs to be relayed cannot be connected.

So the crux of the question is why it cannot be relayed. This server migration happens to be the TURN server migration, which is responsible for relaying. At least now I can reproduce the problem at home.

Why Android can’t use P2P connection after using VPN

iOS and Desktop can use P2P when VPN is turned on, but Android cannot. I feel that the implementation of this VPN software is different on each platform. On Android, it may be a mode similar to a global proxy. On other platforms, at least it doesn’t proxy LAN traffic. I actually downloaded the same VPN software used for the test, and found that I couldn’t connect it at all in my home. Maybe it’s been blocked. But from the UI of the software, this software does not provide options like global proxy.

Guess: Android can’t use P2P connection because the VPN proxies LAN traffic

Fortunately, now I can reproduce the problem even in the test environment at home, so it is very simple to verify this guess. I have an iPhone and an Android.

  • Connect both devices to WiFi. The voice call can be made. The call history record platform confirms that the two devices use P2P for the call, which proves that the two devices can connect directly in this LAN.
  • Connect the Android device to WiFi without VPN, and let the iPhone use the cellular network to make the voice call. This time the call cannot be connected, which proves that the relay service is not working properly.
  • Connect the Android device to WiFi with the VPN turned on in bypass LAN mode, so that traffic to the LAN is not proxied. Connect the iPhone to WiFi. The voice call can be made. The call history record confirms that the two devices use a P2P connection when the VPN is turned on but LAN traffic is not proxied.
  • Connect the Android device to WiFi with the VPN turned on in global mode. Connect the iPhone to WiFi. The voice call cannot be made. This proves that in global proxy mode the device cannot establish a P2P connection, and because the relay service does not work properly, the voice call cannot be made.

So far, it can be determined that, because of the way the VPN software is implemented, the Android device cannot communicate with other devices in the LAN, so a direct P2P connection cannot be used for voice calls.

Why is there no response to STUN binding request

After I reproduced the problem on my mobile phone, I also observed that there are many STUN binding requests in the logcat, but there is no STUN binding response. Normally, there is at least one STUN binding response.

It is also found that when the problem occurs, the client actually collected a series of ICE candidates of its own, and also obtained multiple ICE candidates of the remote peer. Probably the ICE process failed.

Guess: Unable to reach STUN server

When I first saw that the STUN binding request did not respond, I subconsciously judged that there was a problem with the connectivity of the STUN server, but this guess was wrong. I came to this wrong conclusion because of my unfamiliarity with WebRTC. Here is my understanding of WebRTC after knowing some information.

WebRTC: STUN, TURN, ICE

Gathering ICE candidates
If two mobile phones want to talk through WebRTC, they must establish a connection. The technology used to establish the connection is called ICE (Interactive Connectivity Establishment).

However, since both parties of the call do not know where the other party is, they must be able to exchange some information first, and the information exchange is carried out through the signaling server. In our IM, the client will use WebSocket to establish a connection with the signaling server, so that information can be exchanged.

In addition to the respective media information, the most important thing for establishing a connection is the ICE candidate. The gathering of ICE candidates is impossible with the WebRTC client alone. It needs the assistance of a server on the public network. This server is called STUN (Session Traversal Utilities for NAT) server.

To get help from STUN, the client first sends a UDP request to the STUN server. This request is generally called a STUN binding request. After exchanging several requests with the server, the client obtains the candidate information it wants. In addition to STUN, there is also the TURN (Traversal Using Relays around NAT) server. The WebRTC client asks TURN for an address that relays data. This address works like a parcel forwarding service: if the other WebRTC client manages to send a parcel to this address, TURN forwards it to your WebRTC client.

UDP ICE candidates include the following types:

  • Host candidate (host for short): the local IP address of the device. This is generally the IP address that the device itself shows.
  • Server-Reflexive candidate (srflx for short): the address of the device as seen by STUN after the device’s request has passed through the NAT device. It generally represents the public network address that the NAT device assigned to it. If the device is directly on the public network, it is the same as the host candidate.
  • Relayed candidate (relay for short): the relay address that the device requested from the TURN server. If a P2P connection cannot be established, another device can send data to this relay address, and the relay server forwards the data to the device. Of course, the remote peer may also have its own relay address. The relay server can be TURN itself.
  • Peer-Reflexive candidate (prflx for short): while checking the connectivity of the ICE connection, the local device may receive a request from the remote peer whose source address, as seen by the local device, is not among the candidates above. Such an address is called a peer-reflexive candidate.

There are also three ICE candidate types for TCP, which will not be described here.
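As a rough illustration of how these types rank against each other, here is a minimal sketch of the candidate priority formula from RFC 8445, using the type preference values that the RFC recommends (host 126, prflx 110, srflx 100, relay 0). The names CandidateType, TypePreference and CandidatePriority are made up for this sketch and are not taken from our IM or from the WebRTC source. A real implementation may choose other preference values.

```cpp
#include <cassert>
#include <cstdint>

// The four UDP candidate types described above (names are hypothetical).
enum class CandidateType { kHost, kSrflx, kPrflx, kRelay };

// Type preferences recommended by RFC 8445; implementations may differ.
inline uint32_t TypePreference(CandidateType type) {
  switch (type) {
    case CandidateType::kHost:  return 126;
    case CandidateType::kPrflx: return 110;
    case CandidateType::kSrflx: return 100;
    case CandidateType::kRelay: return 0;
  }
  return 0;
}

// priority = 2^24 * type preference + 2^8 * local preference
//            + (256 - component ID)
inline uint32_t CandidatePriority(CandidateType type,
                                  uint32_t local_preference,
                                  uint32_t component_id) {
  assert(local_preference <= 65535);
  assert(component_id >= 1 && component_id <= 256);
  return (TypePreference(type) << 24) + (local_preference << 8) +
         (256 - component_id);
}
```

With a local preference of 65535 and component ID 1, a host candidate gets priority 2130706431 and a relayed candidate gets 16777215, which is why a direct connection is normally preferred whenever it passes the connectivity check.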

Offer & Answer
Both parties to the call will collect their own ICE candidates, and exchange information with each other. In the stage of exchanging ICE candidates, the two parties have not established an effective connection, so they still need to exchange through the signaling server. In this step, one end A must actively send an Offer, and when the other end B receives the Offer, it knows A’s ICE candidate, and sends the Answer to A. A receives B’s Answer and knows B’s ICE candidate.

The signaling server can decide who sends the Offer. For example, the signaling server sends a require_offer command to a party, the party sends the Offer after receiving it. Then the signaling server hands it to the other party, and sends a require_answer to the other party. The other party sends the Answer after receiving it, and the signaling server forwards the Answer to the party that sent the Offer.

In this way, both parties have their own and each other’s ICE candidate lists. The candidates in the list are matched one by one, and there will be various candidate pairs.

For example (local host, remote host), generally if two machines are in the same LAN, or both are directly on the public network, and can access other devices in the LAN, this candidate pair can be connected.

Another example is (local server-reflexive, remote server-reflexive). If both parties are behind their own NAT devices, and both NAT devices are friendly to NAT traversal, they can be connected through this pair.
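The pairing step can be sketched as a plain cross product of the two lists. This is only an illustration under simplifying assumptions: real ICE also prunes and orders the pairs, and the Candidate, CandidatePair and FormPairs names are hypothetical, not taken from WebRTC.

```cpp
#include <string>
#include <vector>

enum class CandidateType { kHost, kSrflx, kPrflx, kRelay };

struct Candidate {
  CandidateType type;
  std::string address;  // "ip:port", used here only as a label
};

struct CandidatePair {
  Candidate local;
  Candidate remote;

  // A pair goes through a relay if either side is a relayed candidate.
  bool UsesRelay() const {
    return local.type == CandidateType::kRelay ||
           remote.type == CandidateType::kRelay;
  }
};

// Match every local candidate with every remote candidate.
inline std::vector<CandidatePair> FormPairs(
    const std::vector<Candidate>& local,
    const std::vector<Candidate>& remote) {
  std::vector<CandidatePair> pairs;
  pairs.reserve(local.size() * remote.size());
  for (const Candidate& l : local) {
    for (const Candidate& r : remote) {
      pairs.push_back(CandidatePair{l, r});
    }
  }
  return pairs;
}
```

With two local and two remote candidates this yields four pairs. Each of them still has to pass a connectivity check before it can be used.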

Connectivity check
Due to the complexity of the actual network, many candidate pairs cannot be connected in theory. In addition, even if they can be connected in theory, they may not be connected in practice. Therefore, the connectivity of the candidate pair should be checked. After the check, the candidate pair with the highest quality can be nominated to use on both sides.

The connectivity of an ICE candidate pair is checked by sending a STUN binding request. For example, A sends a STUN binding request for each ICE candidate pair. If the pair can be connected in that direction, B receives the request and sends a STUN binding response on the corresponding channel. This connectivity check is done on both sides of the call, that is, B also sends requests and waits for A’s responses. To distinguish it from the earlier STUN binding request that interacts with the STUN server, I prefer to call the STUN binding request used in connectivity checks a peer-to-peer STUN binding request.

In the connectivity check phase, each WebRTC client sends a STUN binding response after receiving a STUN binding request. The STUN binding request and response here are used to detect the connectivity of a candidate pair, but they reuse the same message types as the STUN binding request and response used when collecting the local ICE candidates. Don’t confuse the STUN binding request here with the one sent to the STUN server when collecting ICE candidates.
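To show why the two kinds of binding request look identical on the wire, here is a minimal sketch that splits a STUN message type into its method and class, following the bit layout in RFC 5389. The method numbers are the ones defined by the STUN and TURN RFCs. The function and enum names are my own and are not WebRTC identifiers.

```cpp
#include <cassert>
#include <cstdint>

enum class StunClass : uint8_t {
  kRequest = 0,
  kIndication = 1,
  kSuccess = 2,
  kError = 3
};

// Method numbers from RFC 5389 (Binding) and RFC 5766 (the TURN methods).
enum StunMethod : uint16_t {
  kBinding = 0x001,
  kAllocate = 0x003,
  kSend = 0x006,
  kData = 0x007,
  kCreatePermission = 0x008,
  kChannelBind = 0x009
};

// The 14-bit message type interleaves 12 method bits with 2 class bits
// (the class bits sit at bit 4 and bit 8).
inline uint16_t MethodOf(uint16_t type) {
  assert((type & 0xC000) == 0);  // top two bits of a STUN message are zero
  return static_cast<uint16_t>((type & 0x000F) | ((type & 0x00E0) >> 1) |
                               ((type & 0x3E00) >> 2));
}

inline StunClass ClassOf(uint16_t type) {
  return static_cast<StunClass>(((type & 0x0010) >> 4) |
                                ((type & 0x0100) >> 7));
}

// True for both the request sent to the STUN server during gathering and
// the peer-to-peer request sent during connectivity checks.
inline bool IsBindingRequest(uint16_t type) {
  return MethodOf(type) == kBinding && ClassOf(type) == StunClass::kRequest;
}
```

A binding request is type 0x0001 in both phases, so a log line alone does not say which phase it belongs to. Only the destination (STUN server or remote candidate) tells them apart.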

In the end, both parties select the candidate pair with the best quality for data transmission. The ICE Candidate Change shown in the portal of our call history record platform is the candidate pair nominated by both parties. Therefore, as long as the word “relay” does not appear in it, no relay is used and the call uses a P2P connection.

The figure below is a typical record of using relay.

A call record using relay

Timing problem
Although I divided these steps into several stages above, in fact they are not strictly separated in time.

For example, the two parties may finish collecting their ICE candidates at different times. Once A, the side that sends the Offer, has collected its ICE candidates, it can send the Offer and wait for the Answer. After B receives the Offer, it can send the Answer as soon as its own ICE candidates have been collected.

At this time, since the B side has already obtained the ICE candidates of both parties, it will start to check the connectivity of various candidate pairs, but A may not have received the Answer, so A has not yet entered the stage of connectivity check. Therefore, all peer-to-peer STUN binding requests sent by B will not have any response, that is, all these requests will time out. B will continue to check, and will not stop checking due to timeout. At the same time, B will also respond to the peer-to-peer STUN binding request it receives, but no requests are sent to it at this time.

When A receives the Answer, A will also enter the stage of checking the connectivity. At this time, on the one hand, A will send the peer-to-peer STUN binding request, on the other hand, A will also respond to the received peer-to-peer STUN binding request.

Who should respond to the request

According to the description above and what actually happens at run time, the STUN binding requests printed in the log are peer-to-peer STUN binding requests. They are sent to the remote ICE candidates. The remote candidates also include the remote peer’s relay address, so a STUN binding request is answered either through the remote peer’s relay or by the remote peer itself. The requester’s TURN may also need to relay traffic, so a peer-to-peer STUN binding request may be sent to the remote peer, to the requester’s own TURN, or to the remote peer’s relay, but the response always comes from the remote peer.

The STUN binding request in the candidate gathering stage is answered by STUN, and the peer-to-peer STUN binding request in the connectivity check stage is answered by the remote peer. The two must not be confused.

According to the log, both parties on the call have collected their own ICE candidates and have also received the ICE candidates of the remote peer. Both candidate lists contain server-reflexive and relayed addresses, so the clients must be able to connect to STUN and TURN.

Then why can’t the sender get a STUN binding response?

The binding request sent directly to the remote peer obviously cannot be received by the remote peer, because the VPN is used, and the VPN is in global proxy mode, which makes it impossible to communicate with the devices in the LAN, so the relay is needed. The crux of the question is why the relay server failed to forward the peer-to-peer STUN binding request successfully.

Unfortunately, STUN binding request failures are very common in WebRTC, and in many cases establishing a connection takes more than a single request. Our IM does not print the details of STUN binding request failures. After looking through all the logs, I was sure that they did not contain the answer I was looking for.

When the relay address is a public IP

Intuitively, the relay address should be a public network address. For example, it can be the TURN address of the local peer with a TURN assigned port. Of course, TURN can also assign the address of another server as the relay address.

If the relay address of the remote peer is the public IP with port, you can directly send data to it during the connectivity check/data transmission phase. But our situation here is a little different.

Why is the relay server not responding

During the investigation, I found that the relay’s address is not the same as the TURN server address, not even a public network address. For example, the following is a screenshot of the release environment:

Candidate pair for a successful call

In the release environment the relay is a private IP. TURN needs to provide services on the public network, so it cannot be this address.

The results returned by a STUN/TURN server can be tested with this tool: https://webrtc.github.io/samples/src/content/peerconnection/trickle-ice/. I tested the TURN in the release environment and found that the relay address is indeed a private IP address.

Is there a problem with private IP as a relay address?

According to the general understanding of TURN, the relay address IP should be at least a public network address. After capturing the traffic, I found that relay address was a deliberately given private IP.

Since capturing network traffic on Android is inconvenient, here is the Desktop traffic captured with Wireshark:

Request TURN to allocate a relay address:

Allocate request to TURN

Response returned by TURN:

Allocate response from TURN

The XOR-RELAYED-ADDRESS in the response is a private IP.
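For readers who want to decode such an attribute by hand, here is a minimal sketch of how the value of an IPv4 XOR-RELAYED-ADDRESS (or XOR-PEER-ADDRESS) attribute is un-XORed with the STUN magic cookie 0x2112A442, as RFC 5389 describes. The helper names are hypothetical, and only the 8-byte IPv4 form is handled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

constexpr uint32_t kStunMagicCookie = 0x2112A442;

struct SocketAddressV4 {
  uint32_t ip;    // host byte order
  uint16_t port;  // host byte order
};

// `value` is the 8-byte attribute value for IPv4:
// reserved(1) | family(1) | X-Port(2) | X-Address(4), all big-endian.
inline SocketAddressV4 DecodeXorAddressV4(const std::array<uint8_t, 8>& value) {
  assert(value[1] == 0x01);  // family 0x01 means IPv4
  const uint16_t x_port = static_cast<uint16_t>((value[2] << 8) | value[3]);
  const uint32_t x_addr = (static_cast<uint32_t>(value[4]) << 24) |
                          (static_cast<uint32_t>(value[5]) << 16) |
                          (static_cast<uint32_t>(value[6]) << 8) |
                          static_cast<uint32_t>(value[7]);
  SocketAddressV4 result;
  // The address is XORed with the whole cookie, the port with its top 16 bits.
  result.ip = x_addr ^ kStunMagicCookie;
  result.port = static_cast<uint16_t>(x_port ^ (kStunMagicCookie >> 16));
  return result;
}

inline std::string ToDotted(uint32_t ip) {
  return std::to_string((ip >> 24) & 0xFF) + "." +
         std::to_string((ip >> 16) & 0xFF) + "." +
         std::to_string((ip >> 8) & 0xFF) + "." + std::to_string(ip & 0xFF);
}
```

For example, the bytes 00 01 E2 42 2B 6D A1 46 decode to 10.127.5.4 with port 50000. The port in this example is invented for illustration, not taken from the capture.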

Because the relay address of the release environment is also a private IP, requests sent directly to it will not receive any response. The traffic capture also confirms this.

No response from relay address

A large number of Binding Requests are sent, but there is no response, because no relay server is deployed in my LAN. So how does the relay server in the release environment work?

How private IP relay works

In the release environment, the relay address is a private IP, and the voice data is not sent to the relay address but to the sender’s own TURN server.

Before sending data, the client sends a CreatePermission request in which XOR-PEER-ADDRESS is filled with the relay address of the remote peer.

After the CreatePermission succeeds, because the relay address must appear in a candidate pair, the next step is to check the connectivity of the candidate pair, that is, send a peer-to-peer STUN binding request.

If a response is received, a Channel-Bind Request is sent to TURN, and the relay address of the remote peer appears again as XOR-PEER-ADDRESS, together with a ChannelNumber specified by the client.

After getting the Channel-Bind Success Response, the client can send ChannelData to the TURN server with the same ChannelNumber, and TURN knows where to forward the data when it sees this ChannelNumber.

Therefore, from the client’s point of view, the relay address is just a virtual address of the remote peer, not an address that can be accessed directly. The client tells TURN that it wants to send data to this virtual address, and TURN forwards the data to the remote peer.

(The relay address in the screenshot has changed because it comes from a different call record, but they are all private IPs)

CreatePermission, Channel-Bind Request and ChannelData

So it is meaningless to investigate why the relay does not respond, because the data is not sent to the relay address, but to the TURN.

Why can’t the test environment make a call through relay

From the above we know how relay works in the release environment. Next, let’s see why it doesn’t work in the test environment. After capturing the traffic, it was found that the CreatePermission step failed.

CreatePermission failed

WebRTC: CreatePermission and Indication & ChannelData

TURN can be used to forward data. If there is data that you want to send to the remote peer, you can ask TURN to forward it.

TURN is not only used to forward data to the relay address, but also to forward data to srflx address.

Since TURN acts like a relay server when forwarding data, enterprise IT departments may be concerned that it could be used to bypass firewalls, so TURN requires the client to send CreatePermission before it forwards data.

After CreatePermission is successful, TURN can forward the client’s data. There are two ways for the client to request it to forward data.

Send Indication & Data Indication
A Send Indication carries data and must specify the address of the remote peer, which may be a server-reflexive or a relayed address. When TURN forwards data from the remote peer to this device, it arrives as a Data Indication.

ChannelData
Send Indication and Data Indication have additional address overhead. For application scenarios such as VoIP, the impact is relatively large. You can send a channel bind request to request TURN to bind a ChannelNumber to a certain address. After that, as long as ChannelNumber is included in the ChannelData sent by the device, TURN will know who to forward to. This saves additional address overhead.
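To put a number on that overhead, here is a minimal sketch of ChannelData framing as defined in RFC 5766: a 2-byte channel number, a 2-byte length, then the payload. That is 4 bytes of framing, compared with roughly 36 bytes for a Send Indication carrying an IPv4 peer address (20-byte STUN header, 12-byte XOR-PEER-ADDRESS attribute, 4-byte DATA attribute header). The function names are hypothetical, and padding for TCP transports is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Frame a payload as a TURN ChannelData message for a UDP transport.
// RFC 5766 allows channel numbers 0x4000 through 0x7FFF.
inline std::vector<uint8_t> EncodeChannelData(
    uint16_t channel_number, const std::vector<uint8_t>& payload) {
  assert(channel_number >= 0x4000 && channel_number <= 0x7FFF);
  assert(payload.size() <= 0xFFFF);
  const uint16_t length = static_cast<uint16_t>(payload.size());

  std::vector<uint8_t> frame;
  frame.reserve(4 + payload.size());
  frame.push_back(static_cast<uint8_t>(channel_number >> 8));
  frame.push_back(static_cast<uint8_t>(channel_number & 0xFF));
  frame.push_back(static_cast<uint8_t>(length >> 8));
  frame.push_back(static_cast<uint8_t>(length & 0xFF));
  frame.insert(frame.end(), payload.begin(), payload.end());
  return frame;
}

// STUN messages start with bits 00, ChannelData with bits 01, so the first
// byte is enough to tell the two apart on the same socket.
inline bool LooksLikeChannelData(uint8_t first_byte) {
  return (first_byte & 0xC0) == 0x40;
}
```

A 3-byte payload on channel 0x4001 becomes a 7-byte frame. For small, frequent audio packets, saving about 32 bytes per packet adds up.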

Why CreatePermission failed

Since I’m not a TURN developer, I don’t know how our TURN is written. But I assumed that our TURN borrows from the WebRTC sample, so I looked through the WebRTC sample source code and, by searching for “Forbidden”, found the code that handles CreatePermission:

    void TurnServerAllocation::HandleCreatePermissionRequest(
        const TurnMessage* msg) {
      // Check mandatory attributes.
      const StunAddressAttribute* peer_attr =
          msg->GetAddress(STUN_ATTR_XOR_PEER_ADDRESS);
      if (!peer_attr) {
        SendBadRequestResponse(msg);
        return;
      }

      if (server_->reject_private_addresses_ &&
          rtc::IPIsPrivate(peer_attr->GetAddress().ipaddr())) {
        // This is what makes CreatePermission fail.
        SendErrorResponse(msg, STUN_ERROR_FORBIDDEN, STUN_ERROR_REASON_FORBIDDEN);
        return;
      }

      // Add this permission.
      AddPermission(peer_attr->GetAddress().ipaddr());

      RTC_LOG(LS_INFO) << ToString() << ": Created permission, peer="
                       << peer_attr->GetAddress().ToSensitiveString();

      // Send a success response.
      TurnMessage response;
      InitResponse(msg, &response);
      SendResponse(&response);
    }

Here, the address in XOR-PEER-ADDRESS is read from the CreatePermission request. TURN then decides whether to reject a private address according to its reject_private_addresses configuration. The code that decides whether an address is private is as follows:

    static bool IPIsPrivateNetworkV4(const IPAddress& ip) {
      uint32_t ip_in_host_order = ip.v4AddressAsHostOrderInteger();
      return ((ip_in_host_order >> 24) == 10) ||
             ((ip_in_host_order >> 20) == ((172 << 4) | 1)) ||
             ((ip_in_host_order >> 16) == ((192 << 8) | 168));
    }

According to this code, the relay address 10.127.5.4 assigned by the release environment is a private IPv4 address, and the rejected relay address 10.71.19.38 assigned by the test environment is also a private IPv4 address, so the release and test TURN should be different in the reject_private_addresses parameter.

This property is false by default, and you have to call set_reject_private_addresses deliberately to change it to true. I feel that this is the problem.
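The reasoning above can be checked with a small standalone model. This is a sketch and not the real TURN server: ToyTurnServer, MakeIPv4 and PermissionResult are names invented here, and only the private-address test mirrors the WebRTC code quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Build a host-order IPv4 address from its four octets.
inline uint32_t MakeIPv4(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
  assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
  return (a << 24) | (b << 16) | (c << 8) | d;
}

// Same test as IPIsPrivateNetworkV4: 10/8, 172.16/12 and 192.168/16.
inline bool IsPrivateV4(uint32_t ip_in_host_order) {
  return ((ip_in_host_order >> 24) == 10) ||
         ((ip_in_host_order >> 20) == ((172 << 4) | 1)) ||
         ((ip_in_host_order >> 16) == ((192 << 8) | 168));
}

enum class PermissionResult { kGranted, kForbidden };

// Toy stand-in for the TURN server, keeping only the one decision that
// matters here. The flag defaults to false, as described above.
struct ToyTurnServer {
  bool reject_private_addresses = false;

  PermissionResult HandleCreatePermission(uint32_t peer_ip) const {
    if (reject_private_addresses && IsPrivateV4(peer_ip)) {
      return PermissionResult::kForbidden;  // answered with 403 Forbidden
    }
    return PermissionResult::kGranted;
  }
};
```

With the flag left at its default, a CreatePermission for 10.127.5.4 or 10.71.19.38 is granted. With the flag set to true, both are refused, which matches the difference observed between the release and test environments.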

Final conclusion

I asked SRE to check. There is an option to deny private IPs when starting the service, and it was set to true in the test environment. After it was changed to false, the problem that Android could not get voice calls through with the VPN turned on disappeared.

It took me more than a day to investigate and solve this problem. I had to read the logs and search for information about WebRTC as I went. Because I don’t know much about WebRTC, I took a lot of detours and wasted a lot of time. In addition, almost the whole investigation was done without source code. Generally, for unfamiliar things, I do not investigate a problem by inspecting the source code unless I really can’t figure out the root cause.

The investigation of difficult problems is actually an inspection of knowledge reserves and experience. If you are a person who knows WebRTC well, you should be able to find out the root cause of this problem within 20 minutes.

Another problem

A few weeks after the investigation of this problem, a similar problem was found. This time, CreatePermission succeeded, but the client log showed that all STUN binding requests to detect connectivity failed, so there was no Channel bind request.

At first, this problem was investigated by other people, including some who specialized in audio and video conferencing. Their conclusion was that the network provider blocked UDP, so it was suggested that we use TCP to communicate with STUN/TURN. My first reaction was to challenge this conclusion, because UDP is used all the way from the ICE stage to CreatePermission. Why would the provider block only the STUN binding requests after CreatePermission but let all other UDP traffic pass?

When the relay address of the remote peer is a private IP, the local client first requests CreatePermission from its own TURN, and then asks the TURN to forward the STUN binding request to check the connectivity. At this time, the remote peer needs to respond. If there is no response to the STUN binding request, the channel bind request will not be sent.

If both sides of the candidate pair use relay, the remote peer should send a STUN binding response to the relay of the local peer, and the local peer can then receive the response.

What differs in this case is that TURN is fine. The key is that one of the calling parties received the remote peer’s ICE candidates and then lost its network connection, which disconnected it from the signaling server. Its own ICE candidates were never sent to the signaling server, so the remote peer never received them.

At this time, one party is still waiting for ICE candidates, and the other party has already started ICE connection establishment, but there is no network, so the connection cannot be successfully established at all.

I could overturn the wrong conclusion because I knew that the STUN binding response for checking connectivity comes from the remote peer, not from TURN. I checked the remote peer’s log and found that it had not reached the connectivity check stage at all, let alone responded to any STUN binding request. It was still waiting for the peer’s ICE candidates. With this key discovery, I found out that one of the two parties had disconnected from the Internet. The reason for the disconnection was related to the iOS system, but at least it had nothing to do with the network provider blocking UDP.
