Cisco 9800 WLC Client Disconnections

wirelesslab.io
10 min read · Feb 13, 2024


Client disconnections are one of the most common issues in an enterprise wireless network. This article covers how to troubleshoot and fix some of the most commonly seen problems on the Cisco Catalyst 9800 Wireless LAN Controller (WLC). Most of the problems mentioned here also apply to the older AireOS WLCs.

Common Issues & Best Practices

1. Session Timeout

Recommendation:

  • Session timeout = 86400 seconds

Session timeout is the time after which the client is kicked off the network and forced to re-authenticate. On older releases of the Catalyst 9800 WLC, this value defaults to 1800 seconds (30 minutes), which is very aggressive and forces many unwanted disconnections throughout the work day. Newer releases increased the default value to 43200 seconds (12 hours).

Setting the value to 0 does not make the session timeout infinite. It is not recommended, as it can stop clients from performing fast roams on certain 9800 releases.

Instead, it is recommended to increase this value to 1 day (86400 seconds). This can be done via CLI under the policy profile config:

configure terminal
wireless profile policy <Policy-profile-name>
session-timeout 86400

or via webUI:

2. Idle Timeout

Recommendation:

Idle timeout = 3600 seconds

Idle timeout represents the time after which the client is kicked out of the network if it doesn’t send any data. By default, this value is set to 300 seconds (5 minutes), which is, again, too aggressive. Many client devices, especially phones, tend to go to sleep longer than 5 minutes, resulting in very frequent idle timeouts.

When a client leaves the network ungracefully, its stale entry can remain on the WLC for the duration of the idle timeout, which is why this value should not be increased too much. It is recommended to set it to 3600 seconds (1 hour).

It can be changed under policy profile using the commands:

conf t
wireless profile policy <Policy-profile-name>
idle-timeout 3600

or via webUI:
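To confirm that the session and idle timers took effect, the policy profile can be inspected from the CLI. This is a minimal sketch: `<Policy-profile-name>` is the same placeholder used in the commands above, and the `| include` filter assumes the output labels both timers with the word "Timeout", which may differ between releases (drop the filter to see the full profile).

```
show wireless profile policy detailed <Policy-profile-name> | include Timeout
```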

3. Fast Roaming (802.11r) & OKC

Recommendation:

  • FT = Enabled
    Over the DS = Disabled
  • Auth Key Mgmt = 802.1x & FT+802.1x
    OKC = Enabled

Without fast roaming protocols, moving between two access points forces the client device to perform a full authentication. For open and PSK-secured networks this is fine, as these types of authentication are generally fast. For dot1x-secured networks, however, full authentication can take upwards of 1 second. This means that every time the client changes AP, it experiences a gap of 1 second or more, which is very noticeable on a voice/video call.

The 802.11r Fast Transition (FT) roaming protocol allows clients to roam within a few milliseconds. It can be set to disabled, enabled, adaptive mode and so-called “mixed mode”, with or without Over the DS. The Over the DS option is not widely supported by client devices, so it is recommended to keep it disabled and use the default, over the air mode.

Adaptive mode is a special mode supported mostly by Apple and Samsung smartphones, which allows only them to fast roam. Everyone is still allowed to join.

Just setting FT to Enabled will allow only client devices that support it to join the network. Devices that do not support it will not be allowed to join.

However, setting FT to Enabled and changing the Auth Key Mgmt option to 802.1x & FT+802.1x puts the WLAN into so-called “mixed” mode, where both clients that support Fast Roaming and clients that do not are able to join. This is the recommended mode and the one that is compatible with the broadest set of client devices.

In order to set up the Mixed mode FT, run the commands under WLAN config:

conf t
wlan <Profile-Name> 1 <SSID-name>
security ft
no security ft over-the-ds
security wpa akm dot1x
security wpa akm ft dot1x

or do it via webUI:

For devices that do not support 802.11r, Opportunistic Key Caching (OKC) is an alternative roaming algorithm. It is slightly slower than 802.11r Fast Roaming, is enabled by default, and should be kept enabled.
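The resulting WLAN security settings can be read back with a show command. A minimal sketch, assuming the same `<Profile-Name>` placeholder as in the commands above; no output is quoted here because field labels vary between releases.

```
show wlan name <Profile-Name>
```

In the output, look for the FT, Over the DS, OKC and Auth Key Management fields. For mixed mode, FT should be reported as enabled, Over the DS as disabled, and both the 802.1x and FT+802.1x key management methods as enabled.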

4. DCA Algorithm Interval

Recommendation:

  • 2.4 GHz DCA interval = 4 hours
  • 5 GHz DCA interval = 8 or 12 hours
  • 6 GHz DCA Interval = 8 or 12 hours

The Dynamic Channel Assignment (DCA) algorithm is the process responsible for assigning channels to all APs, and by default it runs every 10 minutes. This means that the channel of any AP can potentially change every 10 minutes, and every channel change increases the chance that clients will be disconnected.

DCA algorithm intervals are configured globally using the commands:

configure terminal
ap dot11 24ghz rrm channel dca interval 4
ap dot11 5ghz rrm channel dca interval 12
ap dot11 6ghz rrm channel dca interval 12

or in the WebUI under Configuration > Radio Configurations > RRM > 2.4/5/6 GHz > DCA:

Do not forget to make the change for 2.4, 5 & 6 GHz networks.
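The interval currently in effect can be read back per band. A minimal sketch using the RRM channel show commands; the exact field labels in the output vary by release, so none are quoted here, and the 6 GHz command only applies to releases and hardware that support that band.

```
show ap dot11 24ghz channel
show ap dot11 5ghz channel
show ap dot11 6ghz channel
```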

5. DCA Channel Width

Recommendation:

On 5 GHz — Static 20/40 MHz

On 6 GHz — Static 20/40/80 MHz

Having a static channel width across the whole network makes it easier for clients to scan. If a client is already connected to a 20 MHz channel, it will first scan all 20 MHz channels before it starts scanning the 40 MHz ones. Setting the channel width to “Best” allows a mix of 20/40/80 MHz wide channels in the network, which slows down the roaming process.

To make roaming more reliable, it is recommended to keep the channel width static across all APs. Because of the limited number of channels available on 5 GHz, it is not recommended to enable 80 MHz wide channels in a typical office-like environment.

The change can be made globally or under an RF profile using the commands:

configure terminal
ap dot11 5ghz rrm channel dca chan-width 20

ap dot11 5ghz rf-profile <RF-Profile-Name>
channel chan-width 20
ap dot11 6ghz rf-profile <RF-Profile-Name>
channel chan-width 20

or via webUI:

Global config — value should never be set to Best
Config under RF profile
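To check that every AP actually ended up on the intended width, the per-radio summary can be reviewed once DCA has run. A sketch, assuming the summary output on your release includes a per-radio channel width column; layouts vary, so no sample output is shown.

```
show ap dot11 5ghz summary
show ap dot11 6ghz summary
```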

6. EAP Broadcast Key Interval

Recommendation:

EAP-Broadcast Key Interval = 86400

By default, the AP and all connected clients renegotiate the broadcast keys every hour. This means that the average office user performs this exchange about 8 times per day, increasing the chances of something going wrong.

It is very common for a sleeping client to not respond to the M5 packet that the AP sends to request the broadcast key exchange, which results in the client being deleted with the CO_CLIENT_DELETE_REASON_GROUP_KEY_UPDATE_TIMEOUT delete reason.

It is recommended to increase the broadcast key interval to 1 day (86400 seconds) using the command:

wireless security dot1x group-key interval 86400

or in the webUI under Configuration > Security > Advanced EAP:

7. EAP Request & EAP Identity Request Timeout And Retries

Recommendation:

  • EAP-Identity-Request Timeout = 1s
    EAP-Identity-Request Max Retries = 10
  • EAP-Request Timeout = 1s
    EAP-Request Max Retries = 10
  • Client driver upgrade

When dot1x clients that do not support Fast Roaming have to move from one AP to another, they have to perform a full dot1x authentication, including the whole EAP exchange process.

During this exchange, the WLC sends out two important packets, the EAP Identity Request and the EAP Request, to which the client is expected to respond. If the client doesn't respond, the WLC waits 30 seconds before re-transmitting the packet. It attempts a total of 2 retries before it gives up and kicks the client out. This means that if the client is failing to respond, it can take up to a minute for the whole process to fail.

Most commonly, due to driver-side bugs, devices fail to respond to the Identity Request packet. This issue typically shows up as the delete reasons CO_CLIENT_DELETE_REASON_CLIENT_EAP_TIMEOUT_FAILURE and CO_CLIENT_DELETE_REASON_CLIENT_EAP_ID_TIMEOUT on the WLC, and as the error 5440 Endpoint abandoned EAP session and started new on Cisco ISE.

Setting these values to 10 retries with a 1-second timeout ensures that the WLC tries to reach the client more aggressively and gives up after roughly 10 seconds instead of up to a minute. Changing these values does not guarantee that the issue will be resolved. The only real fix is to upgrade the drivers of the client device, in the hope that it then starts responding to the EAP (Identity) Request packets.
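To gauge how often clients are actually being removed for these reasons, the controller's delete reason counters and the per-client detail can be checked. This is a sketch: `<client-mac>` is a placeholder for the MAC address of an affected client, and the counter names in the output may be worded differently from the CO_CLIENT_DELETE_REASON_* strings quoted above.

```
show wireless stats client delete reasons
show wireless client mac-address <client-mac> detail
```

Comparing the counters before and after a change gives a rough indication of whether it helped.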

The timers can be configured using:

conf t
wireless security dot1x identity-request retries 10
wireless security dot1x identity-request timeout 1
wireless security dot1x request retries 10
wireless security dot1x request timeout 1

or via the webUI under Configuration > Security > Advanced EAP:
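Whether the non-default EAP timers are in place can be checked in the running configuration. A minimal sketch that simply filters on the command prefix used above; values left at their defaults may not appear in the output. The same filter also lists the group-key interval if it was changed.

```
show running-config | include wireless security dot1x
```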

8. WPA3 & PMF

Recommendation:

WPA3 Policy = Disabled

PMF = Disabled

WPA3 and the Protected Management Frames (PMF) feature have been out for quite a few years, but there are still a lot of issues to be worked out with many older client devices.

It is recommended to disable WPA3 support and PMF, especially on non-enterprise networks where you do not have control over the clients. This can be done via the webUI under the WLAN config:

9. DHCP Required

Recommendation:

DHCP Required = Disabled

With this feature enabled, the client devices are required to perform DHCP every time they join the network. All the clients will do this by default. However, the problems come up when they start roaming.

Most of the time, devices perform only part of the DHCP DORA (Discover, Offer, Request, ACK) process when they move from AP to AP, exchanging only Request + ACK to speed up the roam. Certain client devices, however, decide not to perform DHCP at all after moving from AP to AP and simply re-use the old address, essentially behaving like a client with a static IP address.

With the DHCP Required option enabled, devices that do not perform DHCP after roaming are not allowed to join and are eventually kicked out. Most commonly, they recover automatically after being kicked out by performing the whole DHCP DORA process from scratch, but this still causes a significant gap in connectivity. These occurrences show up with the CO_CLIENT_DELETE_REASON_IPLEARN_CONNECT_TIMEOUT delete reason, or with CO_CLIENT_DELETE_REASON_MN_AP_IPLEARN_TIMEOUT in FlexConnect deployments.

This option affects only clients performing 802.11i slow roaming (which is essentially a full authentication when they roam from AP to AP). It does not affect clients performing OKC or 802.11r Fast Roaming. Keep in mind that clients that support 802.11r Fast Roaming might occasionally skip the fast roam and fall back to a slow, full-authentication roam.

It is recommended to disable this feature under the policy profile using the command:

configure terminal
wireless profile policy <Policy-profile-name>
no ipv4 dhcp required

or via WebUI:
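The state of the option can be confirmed from the policy profile details. A sketch, assuming the output labels the field with the string "DHCP" (remove the filter if nothing is returned); `<Policy-profile-name>` is the same placeholder as above.

```
show wireless profile policy detailed <Policy-profile-name> | include DHCP
```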

10. RX-SOP

Recommendation:

RX-SOP = Auto (disabled)

RX-SOP is a feature where the AP drops all packets heard at a signal strength worse than the configured threshold. The intended use is for an AP to drop packets sent by clients that are very far away, in the hope that the client will realise that the AP is not responding and try to roam to a new one.

It offers four settings, three predefined and one custom:

  • High = -79 dBm on 2.4GHz & -76 dBm on 5GHz
  • Medium = -82 dBm on 2.4GHz & -78 dBm on 5GHz
  • Low = -85 dBm on 2.4GHz & -80 dBm on 5GHz
  • Custom = defined by user

In reality, this feature has very little effect on client roaming decisions. My tests with a MacBook and Android phones show that sticky clients repeatedly try to talk to a far-away AP, and having RX-SOP enabled had almost no effect on their roaming decision. The only result of enabling RX-SOP is that, instead of having a bad connection to the far-away AP, the client has no connectivity at all because its packets are dropped.

In deployments where no proper active post-deployment site survey was performed, this feature can cause very apparent outages for sticky clients, especially those on voice/video calls.

This feature should be disabled (set to auto) globally or under the RF profile, depending on which is in use, using the commands:

configure terminal
ap dot11 5ghz rx-sop threshold auto
ap dot11 24ghz rx-sop threshold auto

ap dot11 24ghz rf-profile <24GHz-RF-profile-name>
high-density rx-sop threshold auto
ap dot11 5ghz rf-profile <5GHz-RF-profile-name>
high-density rx-sop threshold auto

or via WebUI:

This is global config
This is RF profile config. Keep in mind that predefined RF profiles cannot be changed
