AWS Redshift WLM(Work Load Management)

7 min readMay 25, 2020

Workload Management

Amazon Redshift workload management (WLM) enables users to flexibly manage priorities within workloads so that short, fast-running queries won’t get stuck in queues behind long-running queries.

Basically, when we create a redshift cluster, it has default WLM configurations attached to it. In default configuration, there are two queues. First is for superuser with concurrency of 1 and second queue is default queue for other users with concurrency of 5. Which means that users, in parallel, can run upto 5 queries. So for example, if this queue has 5 long running queries, short queries will have to wait for these queries to finish. To solve this problem, we use WLM so that we can create separate queues for short queries and for long queries. WLM also gives us permission to divide overall memory of cluster between the queues.

Redshift Parameter Group

In Amazon Redshift, you associate a parameter group with each cluster that you create. The parameter group is a group of parameters that apply to all of the databases that you create in the cluster. These parameters configure database settings such as query timeout and datestyle. WLM is part of parameter group configuration.

Defining Query Queues

When users run queries in Amazon Redshift, the queries are routed to query queues. Each query queue contains a number of query slots. Each queue is allocated a portion of the cluster's available memory. A queue's memory is divided among the queue's query slots. You can configure WLM properties for each query queue to specify the way that memory is allocated among slots, how queries can be routed to specific queues at run time, and when to cancel long-running queries. You can also use the wlm_query_slot_count parameter, which is separate from the WLM properties, to temporarily enable queries to use more memory by allocating multiple slots.

By default, Amazon Redshift configures the following query queues:

● One superuser queue. The superuser queue is reserved for superusers only and it can't be configured. You should only use this queue when you need to run queries that affect the system or for troubleshooting purposes. For example, use this queue when you need to cancel a user's long-running query or to add users to the database. You should not use it to perform routine queries.

● One default user queue. The default queue is initially configured to run five queries concurrently. You can change the concurrency, timeout, and memory allocation properties for the default queue, but you cannot specify user groups or query groups. The default queue must be the last queue in the WLM configuration. Any queries that are not routed to other queues run in the default queue.

You can add additional query queues to the default WLM configuration, up to a total of eight user queues. You can configure the following for each query queue:

● Concurrency level

Queries in a queue run concurrently until they reach the WLM query slot count, or concurrency level, defined for that queue. Subsequent queries then wait in the queue.
Each queue can be configured with up to 50 query slots. The maximum WLM query slot count for all user-defined queues is 50. The limit includes the default queue, but doesn’t include the reserved Superuser queue.
Note: WLM concurrency level is different from the number of concurrent user connections that can be made to a cluster. The maximum number of concurrent user connections is 500.

● User groups

A comma-separated list of user group names. When members of the user group run queries in the database, their queries are routed to the queue that is associated with their user group.

● Query groups

A comma-separated list of query groups. When members of the query group run queries in the database, their queries are routed to the queue that is associated with their query group.

● WLM memory percent to use

The percentage of memory to allocate to the queue. If you specify a memory percentage for at least one of the queues, you must specify a percentage for all other queues, up to a total of 100 percent. If your memory allocation is below 100 percent across all of the queues, the unallocated memory is managed by the service. The service can temporarily give this unallocated memory to a queue that requests additional memory for processing.

● WLM timeout

WLM timeout doesn’t apply to a query that has reached the returning state. To view the state of a query, see the STV_WLM_QUERY_STATE system table. COPY statements and maintenance operations, such as ANALYZE and VACUUM, are not subject to WLM timeout.
The function of WLM timeout is similar to the statement_timeout configuration parameter, except that, where the statement_timeout configuration parameter applies to the entire cluster, WLM timeout is specific to a single queue in the WLM configuration.
If statement_timeout is also specified, the lower of statement_timeout and WLM timeout (max_execution_time) is used.

● WLM query queue hopping

A query can be hopped due to a WLM timeout or a query monitoring rule (QMR) hop action.
When a query is hopped, WLM attempts to route the query to the next matching queue based on the WLM queue assignment rules. If the query doesn’t match any other queue definition, the query is canceled. It’s not assigned to the default queue.
The following table summarizes the behavior of different types of queries with a WLM timeout.

The following table summarizes the behavior of different types of queries with a QMR hop action.

● Query monitoring rules

Query monitoring rules define metrics-based performance boundaries for WLM queues and specify what action to take when a query goes beyond those boundaries.

https://docs.aws.amazon.com/redshift/latest/dg/cm-c-wlm-query-monitoring-rules.html

Modifying the WLM Configuration

You might need to reboot the cluster after changing the WLM configuration.

Short Query Acceleration

Short query acceleration (SQA) prioritizes selected short-running queries ahead of longer-running queries. SQA executes short-running queries in a dedicated space, so that SQA queries aren’t forced to wait in queues behind longer queries. SQA only prioritizes queries that are short-running and are in a user-defined queue.CREATE TABLE AS (CTAS) statements and read-only queries, such as SELECT statements, are eligible for SQA.

SQA is enabled by default in the default parameter group and for all new parameter groups. To disable SQA in the Amazon Redshift console, edit the WLM configuration for a parameter group and deselect Enable short query acceleration.

When you enable SQA, your total WLM query slot count, or concurrency, across all user-defined queues must be 15 or fewer. If you enable SQA using the AWS CLI or the Amazon Redshift API, the slot count limitation is not enforced.

Monitoring SQA

To check whether SQA is enabled, run the following query. If the query returns a row, then SQA is enabled.

select * from stv_wlm_service_class_config
where service_class = 14;

WLM Queue Assignment Rules

If a user is logged in as a superuser and runs a query in the query group labeled superuser, the query is assigned to the Superuser queue.
If a user belongs to a listed user group or if a user runs a query within a listed query group, the query is assigned to the first matching queue.
If a query doesn’t meet any criteria, the query is assigned to the default queue, which is the last queue defined in the WLM configuration.

https://docs.aws.amazon.com/redshift/latest/dg/cm-c-wlm-queue-assignment-rules.html

Assigning Queries to Queues

https://docs.aws.amazon.com/redshift/latest/dg/cm-c-executing-queries.html

WLM Dynamic and Static Configuration Properties

The WLM configuration properties are either dynamic or static. If you change any of the dynamic properties, you don’t need to reboot your cluster for the changes to take effect. While dynamic changes are being applied, your cluster status is modifying. If you add or remove query queues or change any of the static properties, you must restart your cluster before any WLM parameter changes, including changes to dynamic properties, take effect.

The following WLM properties are static:

● User groups

● User group wildcard

● Query groups

● Query group wildcard

The following WLM properties are dynamic:

● Concurrency

● Percent of memory to use

● Timeout

If the timeout value is changed, the new value is applied to any query that begins execution after the value is changed. If the concurrency or percent of memory to use are changed, Amazon Redshift transitions to the new configuration dynamically so that currently running queries are not affected by the change.

WLM Query Monitoring Rules