When Oracle Statistic Gathering times out.

In a previous post, I explained how to see where the Auto Stats job has been running and timed out:

I got a case where it always timed out at the end of the standard maintenance window. One table takes many hours, longer than the largest maintenance window, it will always be killed at the end. And, because it stayed stale, and staler each day, this table was always listed first by the Auto Stat job. And many tables never got their chance to get their stats gathered for … years.

In that case, the priority is to gather statistics. That can be long. Then I run the job manually:

exec dbms_auto_task_immediate.gather_optimizer_stats;

Here, it will never time-out (and the auto job will not start at maintenance window start). This manual gathering can take many days. Of course, this gives time to think about a solution, like reading Nigel Bayliss recommendations:

If I want to kill the manual job, because one table takes really too long and I decide to skip it for the moment, here is my query to find it:

select 'alter system kill session '''||sid||','||serial#||',@'||inst_id||''' /* '||action||' started on '||logon_time||'*/;' "Kill me with this:" from gv$session where module='DBMS_SCHEDULER' and action like 'ORA$AT^_OS^_MANUAL^_%' escape '^';

Which gives me the kill statement, and the time when I started it:

Before killing, I’ll check the long queries from it with the goal to find a solution for it:

select executions,users_executing,round(elapsed_time/1e6/60/60,1) hours,substr(coalesce(info,sql_text),1,60) info,sql_id from gv$sql natural left outer join (select address,hash_value,sql_id,plan_hash_value,child_address,child_number,id,rtrim(operation||' '||object_owner||' '||object_name) info from gv$sql_plan where object_name is not null) where elapsed_time>1e6*10*60 and action like 'ORA$AT_OS_%' order by last_active_time,id

In this example, I can see that one table is running for 4 days:

Now I kill this statistic gathering job. What I want for the moment is to exclude this table from the automatic statistics gathering. Unfortunately, I cannot change the AUTOSTATS_TARGET at table level, then I lock the stats. And run DBMS_AUTO_TASK_IMMEDIATE.GATHER_OPTIMIZER_STATS again.

This is just to quickly resolve the gap we had on many tables. The few tables locked will need further considerations. I even got a funny case where the statistics gathering was long because… statistics where stale. It was in 11g, an IOT where the CBO decided to with ‘db file sequential reads’. I deleted the statistics and the gathering used an optimized execution plan then. When you have really bad statistics, it may be better to have no statistics (and then do dynamic sampling) rather than completely stale ones.