AWS Redshift: Troubleshooting made simple

Arnaud Milleker
Jan 2, 2020 · 10 min read

Welcome! Here you will discover how to make Redshift great again by identifying the exact problem you have and the right solution for it.

At the beginning, everything was so simple. You were alone on your beautiful new cluster. It was new, big, and so fast. But quickly, more and more people started working on it. Like invaders in a new promised land, they plunder the wealth of your cluster without thinking about the consequences.

Pioneers discovering Redshift

Only one person is responsible for it: you! After a while, you run into some penalizing problems:

  1. Disk full error
  2. Your query is not running
  3. Your query is slow

We will see together how to tackle these problems. This article also discusses the basics of a performant Redshift cluster, where you can speed up a query by 20-30x on average, especially in the Distkeys and Sortkeys sections. Feel free to use it for troubleshooting, but also for performance tuning.

Disk Full

error: Disk full

It can’t be possible. Not during your watch. You run toward your admin interface (AWS Portal > Redshift > Clusters > Your cluster > Cluster performance > Percentage disk) … And you see it: The Great Wave off Kanagawa

Disk hits 100%

This is the apocalypse. When the disk hits 100%, every query is killed (not only yours), so you can expect every user to come and ask you what happened. Fun fact: users often think they are the source of the problem and keep it to themselves, ashamed.

What happened? There was not enough memory, so the query used the disk to store the temporary data it needed. This process is called "SWAP" (spilling to disk). But even that was not enough: the whole disk was used, causing a massive crash. We will see how to avoid this through volume management.

To identify the tables that need your attention the most:

SELECT "table", size, tbl_rows
FROM svv_table_info
ORDER BY size DESC;

Architecture for volume management

  1. Good practice: keep data below about 66% of your disk, so there is headroom left for SWAP. If you are above that and there is nothing left to reduce, you need more nodes. Choose classic resizing if you add nodes.
  2. If you need temporary space, you can do an elastic resize of your cluster to get more. As it is a "fake" resize (it redistributes the slices, not the data, which can induce skewness), go back to your previous configuration afterwards.
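To keep an eye on the disk before it fills up, you can also check usage per node directly in SQL. A minimal sketch against the stv_partitions system table (superuser access may be required):

```sql
-- Approximate percentage of disk used on each node
SELECT owner AS node,
       SUM(used) * 100.0 / SUM(capacity) AS pct_disk_used
FROM stv_partitions
GROUP BY owner
ORDER BY owner;
```

A node whose percentage climbs much faster than the others is also a hint of skewness (see the Distkeys section).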

Table management for volume improvements

  1. Remove tables that are never used (not scanned in the last 30 days). You can find a script here.
  2. Shrink VARCHAR columns as much as possible, as they are responsible for most of your disk space use. You can use the following query to find the maximum actual size of a column:
-- get max varchar size of your_column
SELECT max(octet_length(your_column)) FROM your_table;
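Note that Redshift cannot shrink a VARCHAR in place (ALTER COLUMN only allows increasing the length), so reducing a column means a deep copy. A minimal sketch, where your_table, the id column, and the 64-byte target size are hypothetical:

```sql
BEGIN;
-- New table with the right-sized column (64 chosen from the max size found above, plus headroom)
CREATE TABLE your_table_new (
  id          BIGINT,
  your_column VARCHAR(64)
);
INSERT INTO your_table_new
SELECT id, your_column FROM your_table;
-- Swap the tables
ALTER TABLE your_table RENAME TO your_table_old;
ALTER TABLE your_table_new RENAME TO your_table;
COMMIT;
DROP TABLE your_table_old;
```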

Query optimization for volume handling

  1. Never use SELECT *; take only the columns you need.
  2. Limit the source tables with WHERE filters.
  3. Avoid data redistribution between nodes, which multiplies the volume of your query several times over. Use Distkeys, which define how your data is split between nodes. For example, if you have several tables with client_id and you know you will JOIN on it, use it as the Distkey.
  4. Run regular VACUUM and ANALYZE, so Redshift can use the Distkeys.
  5. Avoid nested SELECTs: the intermediate results do not keep the Distkey.
  6. When you JOIN data coming from Spectrum, first load it into a temporary table with a Distkey, to avoid redistributing a huge Spectrum table.
  7. Read the query plan to know if there is data redistribution:
Query plan: look at the uppercase DS_DIST_* labels; if the last one is not "NONE", there is a data redistribution, and that's bad!
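As a quick sketch (the orders/customers tables and their columns are hypothetical), EXPLAIN shows the redistribution type directly in the plan:

```sql
EXPLAIN
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id;
-- With customer_id as the distkey of both tables, the plan should contain
--   XN Hash Join DS_DIST_NONE
-- DS_BCAST_INNER or DS_DIST_BOTH means data is moved between nodes during the join.
```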

The good thing about a disk full error is that at least your query was running!…

Query is not running

-- displays every query that is running or about to run
SELECT user_name, db_name, pid, query
FROM stv_recents WHERE status = 'Running';
-- displays every currently running query
SELECT * FROM stv_inflight;

If your query is not running, it will appear in the result of the first query but not in the second. What if there are more queries like yours? You can see it in the AWS Redshift interface: AWS Portal -> Redshift -> Clusters -> Select your cluster -> Database performance

Waiting time in green

You can see that on Dec-30 at 2:40 (ETL scheduling), we had more waiting time than query execution (reading + writing) time. There are four possible causes: locks, bad queues, architecture, and bad performance.

Locks

SELECT * FROM svv_transactions;

If lock_mode is AccessExclusiveLock, you have a locked item. To unlock it, you can either wait, or terminate the session holding the lock:

SELECT pg_terminate_backend(your_pid_here);

In some cases, the termination can be slow (as Redshift needs to roll back some tasks). In other cases, this terminate command is not enough and you have to … reboot the cluster. It will take several minutes.
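To see who is blocked by whom before terminating anything, you can cross the waiting and granted locks on the same relation. A minimal sketch against svv_transactions:

```sql
-- Sessions holding or waiting for a lock on a contended relation
SELECT pid, txn_owner, lock_mode, granted, relation
FROM svv_transactions
WHERE relation IN (
  SELECT relation FROM svv_transactions WHERE granted = false
)
ORDER BY relation, granted DESC;
-- Rows with granted = true hold the lock; rows with granted = false are waiting.
```

The pid of the granted row is the one to pass to pg_terminate_backend if you decide to kill the blocker.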

Bad queues and WLM management

WLM (Workload Management) helps here in two ways:

  1. Managing query priority
  2. Preventing SWAP on disk

As I won’t focus on WLM here, I can’t cover all the possibilities. I prefer manual WLM for all the control it gives compared to automatic WLM (which didn’t work on our cluster because of the reporting tool).

WLM can reserve a minimum amount of memory that is always available for a user (you, the reporting tool, ETLs, …). The downside is that if you reserve 10% of the memory for the ETL, those 10% will not be used by anyone else, even when no ETL is running. That’s why we do not use “Memory %”. As we only use WLM to avoid disk full errors and memory SWAP, limiting the memory would not be a good idea. Note that AWS advises a total concurrency of around 15 query slots.

Why and how do you prevent Redshift SWAP with WLM? SWAP uses disk, which is much slower than memory, and you want to avoid it. By knowing how many queries of the same type can run on your cluster, you can prevent SWAP (or other negative effects) with WLM. Here is an example with our cluster:

  • 3 slots for ETLs and automated jobs. They hit the CPU hard and slow each other down if more than 3 queries are doing INSERT/COPY/DROP/TRUNCATE/…
  • 10 slots for reporting. Those queries are fast but arrive all together. With 10 running at the same time we rarely hit the disk for SWAP.
  • 3 slots for everything else. We can only have 3 user queries at the same time, which is enough for 25 data people. We added a 15-minute timeout for bad queries. NB: 15 minutes for a SELECT is often REALLY bad performance.

To know if your query is blocked by WLM:

SELECT * FROM stv_wlm_query_state ORDER BY query;

You will see the current state (waiting or running) of your query in its WLM queue, along with its queue_time.

Architecture

Some architecture choices put needless pressure on the cluster:

  1. Streaming events row by row: prefer Kinesis or Lambda to aggregate the data into bulk loads.
  2. Loading row by row with multiple INSERTs: prefer COPY.
  3. Connecting a reporting tool directly to Redshift: every common reporting tool has a cached/in-memory database, so put the Redshift data in there.
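For point 2, a bulk load looks like this; the schema, bucket path, and IAM role below are placeholders:

```sql
-- One COPY from S3 instead of many single-row INSERTs
COPY my_schema.events
FROM 's3://my-bucket/events/2020/01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
GZIP;
```

COPY loads the files in parallel across slices, which a stream of INSERTs cannot do.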

The best way to avoid blocked queries is to have only quick queries. Let’s see how we can do this…

Your query is slow

Your cluster nodes

Distkeys

  1. Diststyle KEY: Data is distributed across nodes by the key. The same keys land on the same node, even across different tables. Example: the distkey of table1 and table2 is customer_id. If you join both tables on customer_id, the data is joined locally. This spares a lot of network and memory, as you move only a subresult and not the full data. Use it when the table is big, the key has many distinct values, the data is well distributed between keys, and the key is regularly used in JOINs.
  2. Diststyle ALL: Puts the full table on every node. It’s interesting when the table is small and joined on several keys. With this, every join on the table is done on the local node; the data never has to be replicated during a query as it is already there.
  3. Diststyle EVEN: The data is distributed equally between nodes. Use it when the other two solutions don’t fit.
Table 1: distkey(col_id_T1), Table 2: distkey(C_ID_T2). Same IDs are on the same slice. If they are JOINed together, there is no network traffic
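The three styles translate directly to DDL. The tables below are hypothetical illustrations:

```sql
-- KEY: big fact table, joined on customer_id
CREATE TABLE orders (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2)
)
DISTSTYLE KEY DISTKEY (customer_id);

-- ALL: small lookup table, replicated on every node
CREATE TABLE countries (
  country_id SMALLINT,
  name       VARCHAR(64)
)
DISTSTYLE ALL;

-- EVEN: no natural join key, spread rows round-robin
CREATE TABLE raw_logs (
  log_time TIMESTAMP,
  message  VARCHAR(256)
)
DISTSTYLE EVEN;
```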

Again, the main goal of distkeys is to avoid any broadcast of data between nodes. At the same time, you want your nodes to be equally loaded. If the data is not equally distributed, you get “skewness”: the overloaded node will always finish late compared to the others.

It’s not EQUAL: there is a 6% difference. You lose 6% of performance because the overloaded node needs to work 6% more

To identify tables with skewness, you can use this link.

-- slice distribution between nodes. It should be exactly equal; if not, you need a classic (not elastic) resize of your cluster
SELECT node, count(*) FROM stv_slices GROUP BY 1;

Sortkeys

Compound sortkey

This is the default sortkey. It orders the data by the columns you specify, in the priority order you specify. Example: sortkey(id, date, city) orders your data by id, then by date within each id, and finally by city within each date. It helps WHERE filters a lot. Be aware that if city is in the WHERE clause but date is not, your sortkey will probably be a handicap, as you will still need to read every block for a specific city. You need to be sure the leading sortkey columns will always be used if you want the benefit of the later ones. Dates are usually good leading compound sortkeys as they appear in most queries.

compound sortkey (col_id_T1, date, city) with 3 values per block. For id1, Redshift knows in advance that blocks 1 & 2 are the right ones. For Paris, it needs to check every block

Interleaved sortkey

With this type of sortkey, the data is ordered by block without priority between columns. If you know that you will want to filter either by date (without city), or by city (without date), or by other columns, this is the right sortkey.
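In DDL, the two flavors look like this (hypothetical tables):

```sql
-- Compound: date leads because it appears in almost every WHERE clause
CREATE TABLE visits_compound (
  id         BIGINT,
  visit_date DATE,
  city       VARCHAR(32)
)
COMPOUND SORTKEY (visit_date, id, city);

-- Interleaved: equal weight to each column, for varied filter patterns
CREATE TABLE visits_interleaved (
  id         BIGINT,
  visit_date DATE,
  city       VARCHAR(32)
)
INTERLEAVED SORTKEY (visit_date, city);
```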

One optimization above all: the Merge Join

When your distkey and your first compound sortkey column are the same, Redshift can use the “merge join”. This join is incredibly fast and needs almost no memory. Whenever your distkey can be the value you join tables on, you should go for it. As the sortkey is then the join value, WHERE clauses won’t benefit as much, but this is a small loss for a big win. It is so effective because Redshift only needs one scan over both tables: as they are sorted alike (sortkey) with the same node distribution (distkey), the join can be done in a single pass.

Merge join mechanism

Example: a perfect match for this is the customer tables. customer_id is a good distkey (high cardinality, data split more or less equally between customers, very often used in joins). In this case, you put customer_id as the distkey and sortkey of every table containing it.
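A sketch of such a pair (the table and column names are hypothetical):

```sql
-- Both tables distributed AND sorted on the join key
CREATE TABLE customers (
  customer_id BIGINT,
  name        VARCHAR(128)
)
DISTKEY (customer_id) SORTKEY (customer_id);

CREATE TABLE orders (
  order_id    BIGINT,
  customer_id BIGINT
)
DISTKEY (customer_id) SORTKEY (customer_id);
-- EXPLAIN on a join between them should then show "XN Merge Join DS_DIST_NONE",
-- provided both tables are fully sorted (vacuumed).
```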

If you want to fully understand distkeys and sortkeys and get recipes for them, I strongly recommend this AWS article.

Reminders

  • Run regular VACUUM and ANALYZE (after each ETL for small tables, at end of day for big tables). Be careful, it’s a long and heavy process. You need this because some operations (“INSERT INTO” for example) don’t respect sortkeys, and tables less than 80% sorted can’t use sortkey optimizations.
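To see which tables need it most, and a hedged example of the commands themselves (my_schema.big_table is a placeholder):

```sql
-- How unsorted and how stale each table is
SELECT "table", unsorted, stats_off
FROM svv_table_info
ORDER BY unsorted DESC;

-- Re-sort and reclaim space, then refresh planner statistics
VACUUM FULL my_schema.big_table TO 99 PERCENT;
ANALYZE my_schema.big_table;
```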

I hope you liked the article. More than just a troubleshooting review, these tips are my best practices. I regularly use them to reduce query times by 90-95% … really! Feel free to ask for advice on LinkedIn.

If you want more technical news, follow our journey through our docto-tech-life newsletter.

Unleash your Redshift!

Doctolib

Improving Healthcare for Good

Arnaud Milleker

Written by

Passionate about Data, startup, gaming … And love to share that

