Single SQL to count rows in all tables
Oracle 18c XE
Well that was easy, not a COUNT(*) in sight and the query completes in 20ms according to SQL Developer. But why are last_analyzed and stale_stats queried and what is this data dictionary table user_tab_statistics?
Let’s delve into it…
To do demos in Oracle I like to use the Human Resources (HR) sample schema which ships with every version. This is a default install of 18c XE connecting to the standard port (1521) and the XE pluggable database (XEPDB1)
C:\>sqlplus hr/hr@//localhost:1521/XEPDB1SQL*Plus: Release 18.0.0.0.0 - Production on Thu Jul 4 16:15:34 2019
Version 18.4.0.0.0Copyright (c) 1982, 2018, Oracle. All rights reserved.ERROR:
ORA-28000: The account is locked.
Oops, that didn’t work. We’ll need to unlock the HR account first.
C:\>sqlplus sys/<password>@//localhost:1521/XEPDB1 as sysdbaSQL*Plus: Release 18.0.0.0.0 - Production on Thu Jul 4 16:12:21 2019
Version 18.4.0.0.0Copyright (c) 1982, 2018, Oracle. All rights reserved.Connected to:
Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production
Version 18.4.0.0.0SQL> ALTER USER hr IDENTIFIED BY hr ACCOUNT UNLOCK;User altered.
Log into the HR account using HR as a password we IDENTIFIED BY above:
C:\>sqlplus hr/hr@//localhost:1521/XEPDB1SQL*Plus: Release 18.0.0.0.0 - Production on Thu Jul 4 16:17:28 2019
Version 18.4.0.0.0Copyright (c) 1982, 2018, Oracle. All rights reserved.Last Successful login time: Thu Jul 04 2019 16:14:47 +08:00Connected to:
Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production
Version 18.4.0.0.0SQL>
Now we can run a simple query:
SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYYMMDD';
SQL> COLUMN last_analyzed FORMAT 99999999 HEADING 'Last|Analyzed'
SQL> COLUMN num_rows FORMAT 99999 HEADING 'Num|Rows'
SQL> COLUMN table_name FORMAT a19
SQL> SELECT table_name, num_rows, last_analyzed
2 FROM user_tables;
Num Last
TABLE_NAME Rows Analyzed
------------------- ------ --------
REGIONS
COUNTRIES
LOCATIONS
DEPARTMENTS
JOBS
EMPLOYEES
JOB_HISTORY7 rows selected.SQL>
Unfortunately we have NULL in the num_rows column when we were expecting numbers. [1]
These row counts are calculated in real time, they come from statistics. The freshness of the statistics dictate the accuracy. It is the LAST_ANALYZED column which tell us when the stats were most recently gathered, NULL means there are no statistics at all. So let’s gather some, first for a single table and then for all the tables in the schema to see the different methods.
SQL> SET TIMING ON
SQL> EXECUTE dbms_stats.gather_table_stats(ownname=>'HR',tabname=>'DEPARTMENTS',estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE,cascade=>TRUE,degree=>DBMS_STATS.DEFAULT_DEGREE,method_opt=>'FOR ALL INDEXED COLUMNS');PL/SQL procedure successfully completed.Elapsed: 00:00:00.20
SQL> SET TIMING OFF
SQL> SELECT table_name, num_rows, last_analyzed
2 FROM user_tables;
Num Last
TABLE_NAME Rows Analyzed
------------------- ------ --------
REGIONS
COUNTRIES
LOCATIONS
DEPARTMENTS 27 20190704
JOBS
EMPLOYEES
JOB_HISTORY7 rows selected.SQL> SET TIMING ON
SQL> EXECUTE dbms_stats.gather_schema_stats(ownname=>'HR',estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE,cascade=>TRUE,degree=>DBMS_STATS.DEFAULT_DEGREE,method_opt=>'FOR ALL INDEXED COLUMNS');PL/SQL procedure successfully completed.Elapsed: 00:00:00.51
SQL> SET TIMING OFF
SQL> SELECT table_name, num_rows, last_analyzed
2 FROM user_tables;
Num Last
TABLE_NAME Rows Analyzed
------------------- ------ --------
REGIONS 4 20190704
COUNTRIES 25 20190704
LOCATIONS 23 20190704
DEPARTMENTS 27 20190704
JOBS 19 20190704
EMPLOYEES 107 20190704
JOB_HISTORY 10 201907047 rows selected.
Great, we got our row counts and the stats gathering took under a second but these are small tables. For larger tables generally we might want to gather sample statistics, not count every row. So let’s create a table with 10K rows:
SQL> CREATE TABLE test_stats_estimate AS
2 SELECT LEVEL id, CAST(DBMS_RANDOM.string('a',100) AS VARCHAR2(100)) txt FROM dual
3 CONNECT BY LEVEL <= 10000;Table created
But gather stats on only 10% of the table:
SQL> EXECUTE dbms_stats.gather_table_stats(ownname=>'HR',tabname=>'TEST_SMALL_ESTIMATE_PERCENT',estimate_percent=>10,cascade=>TRUE,degree=>DBMS_STATS.DEFAULT_DEGREE,method_opt=>'FOR ALL INDEXED COLUMNS');PL/SQL procedure successfully completed.
What we find is that the number of rows is an approximation, not exactly 10K:
SQL> SELECT table_name, num_rows, last_analyzed FROM user_tables WHERE table_name LIKE 'TEST%'; Num Last
TABLE_NAME Rows Analyzed
------------------- ------ --------
TEST_STATS_ESTIMATE 9979 20190704
What happens if we delete half the rows in the table?
SQL> DELETE FROM test_small_estimate_percent WHERE ROWNUM <= 5000;5000 rows deleted.Elapsed: 00:00:00.08
SQL> SELECT COUNT(*) FROM test_small_estimate_percent;COUNT(*)
----------
5000SQL> SELECT table_name, num_rows, last_analyzed FROM user_tables WHERE table_name LIKE 'TEST%'; Num Last
TABLE_NAME Rows Analyzed
------------------- ------ --------
TEST_STATS_ESTIMATE 9979 20190704
We can see the difference in a COUNT() but there’s no change to num_rows. What about if we add new rows?
SQL> INSERT /*+ APPEND */ INTO test_small_estimate_percent
2 SELECT LEVEL id, CAST(DBMS_RANDOM.string('a',100) AS VARCHAR2(100)) txt FROM dual
3 CONNECT BY LEVEL <= 10000;10000 rows created.SQL> COMMIT;Commit complete.SQL> SELECT COUNT(*) FROM test_stats_estimate; COUNT(*)
----------
15000SQL> SELECT table_name, num_rows, last_analyzed FROM user_tables WHERE table_name LIKE 'TEST%'; Num Last
TABLE_NAME Rows Analyzed
------------------- ------ --------
TEST_STATS_ESTIMATE 9979 20190704
How about if we truncate the table?
SQL> TRUNCATE TABLE test_stats_estimate;
Table truncated.
SQL> SELECT COUNT(*) FROM test_stats_estimate; COUNT(*)
----------
0SQL> SELECT table_name, num_rows, last_analyzed FROM user_tables WHERE table_name LIKE 'TEST%'; Num Last
TABLE_NAME Rows Analyzed
------------------- ------ --------
TEST_STATS_ESTIMATE 9979 20190704
No matter what we do the statistics generated row count isn’t updated. Looks like this method of find a row count is useless — you not only need to have stats but they have to be current. Or is it useless? Oracle provides a method to update stale statistics. How does it know?
SQL> SELECT table_name, inserts, updates, deletes, truncated FROM all_tab_modifications WHERE table_owner = 'HR';TABLE_NAME INSERTS UPDATES DELETES TRU
------------------------------ ---------- ---------- ---------- ---
TEST_SMALL_ESTIMATE_PERCENT 10000 0 19839 YES
This information is summarized by Oracle into a single stale_stats flag:
SQL> SELECT table_name,num_rows,last_analyzed,stale_stats FROM user_tab_statistics;TABLE_NAME NUM_ROWS LAST_ANALYZED STALE_S
------------------------------ ---------- --------------- -------
TEST_SMALL_ESTIMATE_PERCENT 10023 20190704 YES
LOCATIONS 23 20190704 NO
DEPARTMENTS 27 20190704 NO
EMPLOYEES 107 20190704 NO
COUNTRIES 25 20190704 NO
REGIONS 4 20190704 NO
JOBS 19 20190704 NO
JOB_HISTORY 10 20190704 NO8 rows selected.
Here we can see that the statistics for our target table are stale. To have Oracle automatically gather stats for stale tables we can use:
EXECUTE DBMS_STATS.gather_schema_stats (ownname => 'HR', cascade => TRUE, options => 'GATHER AUTO');
Which leads us back to where this post started, with:
The automatic optimizer statistics collection job which calls DBMS_STATS package runs in predefined maintenance windows and these maintenance windows are open once a day during which various jobs including the gathering of statistics is performed. SYSTEM can see the job in the scheduler:
SELECT *
FROM dba_scheduler_jobs
WHERE job_name LIKE '%STAT%';
See my next post if you want to expand this to calculate disk usage.
Here’s the entire sqlplus script which can be run on the command line:
Using Oracle Database 18c Express Edition (XE) — Free download
[1] This won’t happen on a fresh DB install, I deleted the statistics using:
EXECUTE DBMS_STATS.DELETE_SCHEMA_STATS(ownname=>’HR’);