Index Only access with Oracle, MySQL, PostgreSQL, and Microsoft SQL Server
In my previous post about the advantages of index access over full table scans, I mentioned covering indexes. This is when an Index Range Scan can retrieve all columns without going to the table. Adding to an index all the columns used by the SELECT or WHERE clause is an important tuning technique for queries that are around the inflection point between index access and table full scan. But not all RDBMS are equal. What PostgreSQL calls ‘Index Only’ actually reads the table, except for static data with no concurrent modifications.
I’ll show the execution plans for this Index Only access on Oracle, MySQL, PostgreSQL, and MS SQLServer. As my skills on the non-Oracle ones are very limited, do not hesitate to comment if you think something is not correct.
I create a DEMO table
As I don’t want to bother with datatypes names. I use CREATE TABLE AS SELECT
This works on MySQL and PostgreSQL:
create table DEMO as select 1 A,1 B,1 C union all select 1 A,1 B,1 C
With Oracle I need to mention a table and DUAL is there for this goal:
create table DEMO as select 1 A, 1 B, 1 C from dual
union all select 1 A, 1 B , 1 C from dual;
With SQL Server , no ‘Create Table As Select’, but the INTO syntax is used to mention the table to create:
select 1 A, 1 B, 1 C into DEMO union all select 1 A, 1 B , 1 C;
This table has 2 rows. I’ll add more rows so that the optimizer/query planner can base its estimations on statistics.
My goal is to query with a predicate ‘where A=2’ returning a small number of rows, as I want to benefit from index access. This works on all databases: insert a lot of (1,1,1) with a cartesian product and a few (2,0,0)
insert into DEMO select 1,1,1
from DEMO a,DEMO b,DEMO c,DEMO d,DEMO e,DEMO f,DEMO g,DEMO h;
insert into DEMO values (2,0,0);
insert into DEMO select a.A, a.B, a.C from DEMO a;
create index DEMO_AB on DEMO(A,B);
select A,count(*) from DEMO group by A;
Because this version of Oracle gathers statistics during the CTAS but not for further DML, I delete statistics to rely on dynamic sampling:
begin dbms_stats.delete_table_stats('','DEMO'); end;
For MySQL I don’t know how to get the execution plan with execution statistics, so I simply used EXPLAIN
explain select sum(A) from DEMO where A=2;
explain select sum(B) from DEMO where A=2;
explain select sum(C) from DEMO where A=2;
The goal is to show the plan when I select only the column used for access(A), or an additional column found in the index (B) or a column which exists only in the table. The one with B is the one where I expect Index Only access.
For PostgreSQL the EXPLAIN statement can execute and show execution statistics:
select sum(A) from DEMO where A=2;
For SQL Server, this is done with SET STATISTICS PROFILE before running the statements:
set statistics profile on
For Oracle, I used DBMS_XPLAN :
select /*+ gather_plan_statistics */ sum(B) from DEMO where A=2;
select plan_table_output from dbms_xplan.display_cursor(format=>'allstats last +projection');
I’ve run them with db<>fiddle so that you can see the exact execution with the link provided.
When I access only A or B there is no need to go to the table with Oracle. This is a true Index Only access.
The +PROJECTION format shows which columns are returned by each operation and here it is clear that B comes from the index. No need to go to the table for each index entry.
The last example which selects column C is the only one where Oracle has to access the table:
In summary, with Oracle, I have to go to the table only when I need a column from it that is not in the index. This is a very interesting solution when I have a range scan which selects a large number of rows, where accessing rows scattered in the table is expensive, but where the table is too large for a full table scan to be efficient. Then the Index Only access is a solution. This technique is also known as Fat Index (from Tapio Lahdenmaki work).
PostgreSQL, since 9.2, has an access path explicitly called Index Only but if you run my example, you will not see it. And even when you query only the column A you will access the table:
This is just an optimized version of going to the table for each index entry, where a bitmap from index entries is built before scanning the table. The query planner may have chosen Index Only access, but it would be less efficient because - despite its name - Index Only access will finally access to the table rows. You see ‘Heap Fetches’ when it is the case. I blogged about it in the past. The reason is that PostgreSQL MVCC (multi-versioning to be able to read without blocking writes) is at Tuple level, and the index has no information about what is committed or not.
Postgres has a little optimization for this, using the visibility map, but this is maintained asynchronously by the vacuum process. In my db<>fiddle example you will see a real Index Only after a full vacuum:
In summary, PostgreSQL Index Only is really ‘Index Only’ when you see ‘Heap Fetches: 0’ and this means that the ‘fat index’ optimization is possible only for tables with rare modifications and frequent vacuum.
MySQL with the InnoDB engine is MVCC at block level like Oracle, where versioned index blocks contain the information about the visibility. You can see that when I select column A or B which are in the index, the ‘Extra’ information of the execution plan mentions ‘Using Index’ which means that there’s no access to the table:
SQL Server 2017
SQL Server can do Index Only:
I can see the difference when selecting the C column which is not in the index. The need to go to the table for each index entry (RID Lookup) makes the optimizer chose a Table Scan for this example (except when hinted):
As db<>fiddle is down for SQLite when writing this, I’ve run it with sqlfiddle http://sqlfiddle.com/#!5/59a60/4 and I see ‘COVERING’ mentioned when selecting only A or B:
SEARCH TABLE DEMO USING COVERING INDEX DEMO_AB (A=?)
Oracle and MySQL with InnoDB engine are the ones who can really do Index Only, aka Covering Index, in a high OLTP (where readers do not block writers). SQL Server can also do it, but with reads blocking writes (except with snapshot isolation). PostgreSQL can also do it, with non-blocking reads, but not efficiently when the tables have concurrent modifications.
Of course, my favorite is Oracle as this access path, and the MVCC at block level, is there for decades. Here is Oracle 7.3 execution plan for the same example: