Tearing Down the Netezza Technocracy (or: Is my query using a Zone Map?)

Ross Leishman
Jul 4, 2018 · 9 min read

I used to be an Oracle developer. There, I said it. Actually, darn it all to heck, I’ll SHOUT it.

I USED TO BE AN ORACLE DEVELOPER, AND I’M PROUD.

You know what I liked about Oracle? If you wanted to know how it was going to execute a query, you did an Explain Plan. Oracle parsed it, decided how it was going to execute the query (like, for instance, which indexes it would use), then — and I can’t stress this bit enough — IT TOLD YOU WHAT IT WAS GOING TO DO.

But this is not an Oracle blog. You didn’t come here to listen to me rhapsodise about a database technology you don’t even have. No sir, you’re a Netezza dev and let me guess:

  • You have a table
  • Your table has a column
  • Your column has a Zone Map
  • You have NO FREAKIN IDEA whether your query is using the Zone Map.

Welcome to my world a few weeks ago. It’s almost like IBM are making it deliberately difficult, fostering a digital elite of technocrats where the humble developer cannot hope to tune their own queries.

Well I hope your job doesn’t rely on you being one of those digital elite because today I’m shedding some light on Zone Map usage and tearing down at least one of those walls protecting the Netezza Technocracy.

So, what is a Zone Map anyway?

Objection, Your Honour! Asked and answered. See my earlier blog: Netezza Zone Maps — They’re The Same As Indexes, Right? Too lazy to click? Fine, it’s a bit like an index (except it’s not — you really should read that other blog). Like an index, a Zone Map is a database structure that helps optimise a query by skipping blocks containing no matching rows.

Zone Maps make queries run faster, so if you’ve got a slow query, knowing whether it’s already using a Zone Map — and which Zone Map — is pretty key.

When is a Zone Map helpful?

Zone Maps act on filtering WHERE clauses (not joining WHERE clauses) of the form

There are some odd cases that are outside the scope of this blog, but here are some examples where a Zone Map will probably NOT be used:

EXPLAIN

The first stop for anyone with any kind of database development experience should be to use EXPLAIN to show the execution plan. So let’s try that.

First, here’s my sample table, which I’ve pre-loaded with 14M rows:

And here’s my query:

Note that WHERE clause — it filters the 14M row table down to a single row, so it would be great for performance if Netezza could avoid reading 99% of the data. It’s also worth noting that the column AN_INTEGER is a BIGINT data type, and Netezza creates Zone Maps for all integer, date, and timestamp columns by default, so we know a Zone Map exists for it to use.

Let’s try to get that plan. We can prefix the query with EXPLAIN VERBOSE, or if we’re using SQL Aginity, just click the Explain button.

Okay, so it doesn’t explicitly say it’s using a Zone Map, but that estimate of 1 row looks promising. Does that mean it looked at the Zone Map and figured out there’s only one matching row? Who can tell? Not me.

Let’s try a counter example by querying on a column without a Zone Map. We’ll use the column A_STRING, which — being a VARCHAR — does not attract a default Zone Map.

Hmmm, same result. It seems it’s not the Zone Map that’s giving us that row estimate, it’s Netezza’s gathered statistics.

Introducing PLANFILE

If you hadn’t guessed already, EXPLAIN is next to useless. It won’t tell you whether Netezza plans to use a Zone Map. To the best of my knowledge (ie. I’m guessing), that’s because Netezza doesn’t make that decision at parse time, it makes it at execution time. Ie. You need to run the query before it decides whether to use a Zone Map.

Whenever you run a query, Netezza writes away a bunch of runtime statistics to a Plan File. Now there’s probably a bunch of tricky ways to view the plan file, but if you’re like me then you want the easiest way, which is to issue the command SHOW PLANFILE immediately after running a query. Here’s what happens:

* The Plan File is quite big, so I’ve chopped out a lot to keep this readable

Now I grant you, this is pretty hard to read, so we’re going to ignore most of it and focus only on these two lines

For every table you read, you’ll see these pairs of Snippet and Detail in the Plan File. The identifiers (eg. 49100_1) can be matched back to the plan. Ie. Snippet/Detail ending in “_1” matches line 1[00] from the Execution Plan higher up in the file, which gives you the table name.

Looking at the Detail line, we see:

zm — that looks promising. Surely it stands for Zone Map. Looking at the TechNote on IBM’s support site we read that:

  • zmread is the number of zonemap pages read.
  • zmhits is the number of zonemap page reads that were satisfied by the cache.

Well that’s no help. It means that it did read a Zone Map, but I don’t care whether the Zone Map was found in the cache (ie. read earlier from disk), I want to know whether we avoided reading part of the table as a result of what we found.

Looking further:

  • table is Min/Avg/Max number of real table blocks.
  • scan is the Min/Avg/Max number of blocks that survived the zonemap filter.

BINGO. I can compare these two numbers and if scan is less than table, then the difference is the number of block reads we avoided.

Looking back at my query:

table 73/73.04/74 means that each Netezza S-Blade (or SPU) contains between 73 and 74 extents, with an average of 73.04 extents per SPU.

scan 73/73.04/74 means that after Zone Map optimisation, Netezza read between 73 and 74 extents per SPU, with an average of 73.04.

Ie. The Zone Map excluded no extents at all.

If we understand how Zone Maps work, this is no great surprise. For a Zone Map to be effective, the table needs to be more or less ordered on the zonemapped column (and if you’d read the blog I linked above, you’d know this). This often happens naturally with dates and some integers allocated from a sequence, where low values are found in older extents and high values in newer extents, but it is not the case for my table where values for AN_INTEGER are scattered throughout.

So what now? We’ve found that the query is using a Zone Map, but that it’s useless.

Q: Is there any way we can tune the Zone Map to make it more effective?

A: The answer is yes — we can re-order the table using an Organizing Key — but sadly that’s a topic for another blog.

Q: What if I don’t see the zmread/zmhits statistics? Does that mean it’s not using the Zone Map?

A: Correct. Either you’re querying on a column that has no Zone Map (eg. A String or non-integer Numeric), or it’s impossible to use a Zone Map because of your WHERE clause (functions, arithmetic, OR conditions — see above).

So what should it look like?

Here’s the Snippet/Detail line from a new version of this query after I rebuilt the table with an Organizing Key.

Note that this one has only one figure each for table and scan instead of Min/Avg/Max triplets — that’s because the Min, Avg and Max were all the same.

Also note that while each SPU contains 74 extents, only ONE extent was read from each SPU — a saving of 98% IO… thanks to Zone Maps.

Summary

So what did we learn?

  • EXPLAIN won’t tell you whether Netezza is using a Zone Map in your query.
  • We all hate EXPLAIN
  • SHOW PLANFILE after running a query displays the Plan File, from which you can determine whether the Zone Map was used, and if so, how effective it was.
  • We all love SHOW PLANFILE
  • Look for snippet/detail pairs of lines in the Plan File that contain table and scan statistics. For a complex query there will be many to choose from — some won’t be in pairs and some won’t be table scans. Match the suffix of the snippet ID back to the Execution Plan to find the table name.
  • If the scan figures are lower than the table figures, your Zone Map is working. If they are much lower, your Zone Map is working splendidly.

Easy, huh? Okay, it’s not, I agree, but the Netezza Technocracy is a formidable foe. Instead of knocking down the wall, maybe I’ve just loosened a brick or two. There’s more to come, though, so follow the DWS+Symplicit blog to get it all in your in-box.

* * * *

If you thought this was amusing, leave me a “clap”. If you thought it was informative, leave me a few. Thanks for reading, and watch out for more Netezza tuning fun in the DWS+Symplicit blog.

DWS GROUP | DWS, Symplicit, Phoenix, Projects Assured

Some thoughts from the folk at DWS Group

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade