awr_plan_change脚本中平均执行时间不正确浅析?
awr_plan_change.sql脚本是Kerry Osborne的一个脚本,这个脚本也是我非常喜欢并且经常使用的一个脚本。脚本如下所示
set lines 155
col execs for 999,999,999
col avg_etime for 999,999.999
col avg_lio for 999,999,999.9
col sql_id for a16
col begin_interval_time for a30
col node for 99999
break on plan_hash_value on startup_time skip 1
select ss.snap_id
, ss.instance_number node
, begin_interval_time
, sql_id
, plan_hash_value
, nvl(executions_delta,0) execs
,(elapsed_time_delta/decode(nvl(executions_delta,0),0,1,executions_delta))/1000000 avg_etime
,(buffer_gets_delta/decode(nvl(buffer_gets_delta,0),0,1,executions_delta)) avg_lio
from DBA_HIST_SQLSTAT S, DBA_HIST_SNAPSHOT SS
where sql_id = nvl('&sql_id','799tsyjq9p1r6')
and ss.snap_id = S.snap_id
and ss.instance_number = S.instance_number
and executions_delta > 0
order by 1, 2, 3
/
正常情况下,这个脚本并没有什么问题,但是这个脚本在某些场景下,它计算SQL语句的平均执行时间(avg_etime)是有问题的(avg_etime在有些场景是不准确的)。下面以一个案例来简单分析一下。如下所示,SQL_ID=799tsyjq9p1r6是数据仓库中的一个SQL语句,使用awr_plan_change.sql分析这个SQL,发现它的平均执行时间(AVG_ETIME)有非常大的波动,如下所示,我们以2023-10-04为起点,如下所示,你会发现它的平均执行时间多的时候为1559.825秒,短的时侯为25.080秒,这个SQL并没有使用绑定变量,它的执行计划从2023-10-04号到2023-10-08号没有变化过(PLAN_HASH_VALUE没有变化),那么这个SQL的平均执行时间为什么有这么大的波动呢?
其实,确切来说,就是awr_plan_change.sql中计算的平均执行时间(AVG_ETIME)有误,首先要说明的是数据库的快照间隔(SNAP_INTERVAL)是30分钟,而这个SQL的执行时间超过AWR的快照间隔了。这个SQL的开始执行时间和结束时间,跨越了几个snap_id,我们可以使用sqlstat.sql分析如下所示:
DBW1:SYS>select
2 sql_id,
3 plan_hash_value,
4 TO_CHAR(BEGIN_INTERVAL_TIME,'yyyy-mm-dd hh24:mi:ss') AS BEGIN_INTERVAL_TIME,
5 TO_CHAR(END_INTERVAL_TIME,'yyyy-mm-dd hh24:mi:ss') AS END_INTERVAL_TIME,
6 executions_delta,
7 ROUND(ELAPSED_TIME_DELTA/(nonzeroexecutions*1000000),1) "Elapsed Average",
8 ROUND(CPU_TIME_DELTA/(nonzeroexecutions*1000000),1) "CPU Average",
9 ROUND(IOWAIT_DELTA/(nonzeroexecutions*1000000),1) "IO Average",
10 ROUND(CLWAIT_DELTA/(nonzeroexecutions*1000000),1) "Cluster Average",
11 ROUND(APWAIT_DELTA/(nonzeroexecutions*1000000),1) "Application Average",
12 ROUND(CCWAIT_DELTA/(nonzeroexecutions*1000000),1) "Concurrency Average",
13 BUFFER_GETS_DELTA/nonzeroexecutions "Average buffer gets",
14 DISK_READS_DELTA/nonzeroexecutions "Average disk reads",
15 trunc(PHYSICAL_WRITE_BYTES_DELTA/(1024*1024*nonzeroexecutions)) "Average disk write megabytes",
16 ROWS_PROCESSED_DELTA/nonzeroexecutions "Average rows processed"
17 from
18 (select
19 ss.snap_id,
20 ss.sql_id,
21 ss.plan_hash_value,
22 sn.BEGIN_INTERVAL_TIME,
23 sn.END_INTERVAL_TIME,
24 ss.executions_delta,
25 case ss.executions_delta when 0 then 1 else ss.executions_delta end nonzeroexecutions,
26 ELAPSED_TIME_DELTA,
27 CPU_TIME_DELTA,
28 IOWAIT_DELTA,
29 CLWAIT_DELTA,
30 APWAIT_DELTA,
31 CCWAIT_DELTA,
32 BUFFER_GETS_DELTA,
33 DISK_READS_DELTA,
34 PHYSICAL_WRITE_BYTES_DELTA,
35 ROWS_PROCESSED_DELTA
36 from DBA_HIST_SQLSTAT ss,DBA_HIST_SNAPSHOT sn
37 where ss.sql_id = '&sql_id'
38 and ss.snap_id=sn.snap_id
39 and ss.INSTANCE_NUMBER=sn.INSTANCE_NUMBER)
40 where ELAPSED_TIME_DELTA > 0
41 order by snap_id,sql_id;
Enter value for sql_id: 799tsyjq9p1r6
SQL_ID PLAN_HASH_VALUE BEGIN_INTERVAL_TIME END_INTERVAL_TIME EXECUTIONS_DELTA Elapsed Average CPU Average IO Average Cluster Average Application Average Concurrency Average Average buffer gets Average disk reads Average disk write megabytes Average rows processed
------------------ --------------- --------------------------------------------------------- ------------------------- ---------------- --------------- ----------- ---------- --------------- ------------------- ------------------- ------------------- ------------------ ---------------------------- ----------------------
799tsyjq9p1r6 1879238228 2023-09-24 14:00:37 2023-09-24 14:30:44 0 1312.3 56.3 1283.1 0 0 0 2983520 621742 0 0
799tsyjq9p1r6 1879238228 2023-09-24 14:30:44 2023-09-24 15:00:50 1 968.7 33.2 950 0 0 .1 2522751 328654 0 31635
799tsyjq9p1r6 1879238228 2023-09-25 14:30:16 2023-09-25 15:00:23 0 1581.8 52.1 1556.6 0 0 0 2216777 523756 0 0
799tsyjq9p1r6 1879238228 2023-09-25 15:00:23 2023-09-25 15:30:30 1 1238.1 49.7 1211.7 0 0 0 2457976 547253 0 26778
799tsyjq9p1r6 1879238228 2023-09-26 14:00:40 2023-09-26 14:30:47 0 182 6.6 178.9 0 0 0 249644 63126 0 0
799tsyjq9p1r6 1879238228 2023-09-26 14:30:47 2023-09-26 15:00:54 0 1806.2 61.5 1775.5 0 0 0 2761065 665285 0 0
799tsyjq9p1r6 1879238228 2023-09-26 15:00:54 2023-09-26 15:30:01 1 421.8 18.2 411.3 0 0 .1 1123432 243842 0 23641
799tsyjq9p1r6 1879238228 2023-09-27 14:00:09 2023-09-27 14:30:16 0 922.4 33.9 905.8 0 0 0 1579540 335758 0 0
799tsyjq9p1r6 1879238228 2023-09-27 14:30:16 2023-09-27 15:00:23 1 1753.4 58 1722.9 0 0 0 3440756 593153 0 28687
799tsyjq9p1r6 1879238228 2023-09-28 14:00:39 2023-09-28 14:30:46 0 948.9 32.6 933.9 0 0 0 1237907 301664 0 0
799tsyjq9p1r6 1879238228 2023-09-28 14:30:46 2023-09-28 15:00:53 1 1794.8 61.5 1764.1 0 0 0 2946573 637203 0 23879
799tsyjq9p1r6 1879238228 2023-09-29 14:00:10 2023-09-29 14:30:18 1 337.1 21.6 323.9 0 0 0 1693956 273991 0 9581
799tsyjq9p1r6 1879238228 2023-09-30 14:00:39 2023-09-30 14:30:46 0 1034.6 39.3 1015.1 0 0 0 1833993 405250 0 0
799tsyjq9p1r6 1879238228 2023-09-30 14:30:46 2023-09-30 15:00:53 0 1804 57.5 1771.7 0 0 0 4533708 541639 0 0
799tsyjq9p1r6 1879238228 2023-09-30 15:00:53 2023-09-30 15:31:00 1 72.9 3.7 70.5 0 0 0 369115 31273 0 38389
799tsyjq9p1r6 1879238228 2023-10-01 14:00:09 2023-10-01 14:30:17 0 118.9 4.3 116.8 0 0 0 179510 44841 0 0
799tsyjq9p1r6 1879238228 2023-10-01 14:30:17 2023-10-01 15:00:24 0 1806.4 53.9 1780.9 0 0 0 2070104 543732 0 0
799tsyjq9p1r6 1879238228 2023-10-01 15:00:24 2023-10-01 15:30:31 0 1806.2 56.2 1779 0 0 .1 2234512 470020 0 0
799tsyjq9p1r6 1879238228 2023-10-01 15:30:31 2023-10-01 16:00:38 1 1027.9 33.4 1010.2 0 0 0 1920920 339045 0 36342
799tsyjq9p1r6 1879238228 2023-10-02 14:00:40 2023-10-02 14:30:47 0 1447.3 46.4 1426.4 0 0 0 1699747 415567 0 0
799tsyjq9p1r6 1879238228 2023-10-02 14:30:47 2023-10-02 15:00:54 0 1806 59.1 1775.6 0 0 .1 3280821 595508 0 0
799tsyjq9p1r6 1879238228 2023-10-02 15:00:54 2023-10-02 15:30:01 1 221.6 9.1 216 0 0 .1 742730 96869 0 32297
799tsyjq9p1r6 1879238228 2023-10-03 14:00:10 2023-10-03 14:30:17 0 720.1 26.9 706.8 0 0 0 1153872 262211 0 0
799tsyjq9p1r6 1879238228 2023-10-03 14:30:17 2023-10-03 15:00:24 1 1399.9 47 1372.8 0 0 0 3553183 437388 0 26545
799tsyjq9p1r6 1879238228 2023-10-04 14:00:38 2023-10-04 14:30:45 0 1146.6 47.3 1121.8 0 0 0 2332277 545996 0 0
799tsyjq9p1r6 1879238228 2023-10-04 14:30:45 2023-10-04 15:00:51 1 1042.1 36.9 1020.6 0 0 0 2892948 368538 0 29540
799tsyjq9p1r6 1879238228 2023-10-05 14:00:05 2023-10-05 14:30:12 0 939.4 35.5 921.7 0 0 0 1556583 360616 0 0
799tsyjq9p1r6 1879238228 2023-10-05 14:30:12 2023-10-05 15:00:19 0 1802.9 59.5 1768.4 0 0 .1 4668380 553976 0 0
799tsyjq9p1r6 1879238228 2023-10-05 15:00:19 2023-10-05 15:30:26 1 25.1 1.3 24.2 0 0 0 103952 11067 0 35528
799tsyjq9p1r6 1879238228 2023-10-06 14:00:31 2023-10-06 14:30:38 0 1059.1 44 1036.6 0 0 0 2127809 447816 0 0
799tsyjq9p1r6 1879238228 2023-10-06 14:30:38 2023-10-06 15:00:45 1 1559.8 60.8 1524.6 0 0 0 4663034 562076 0 38124
799tsyjq9p1r6 1879238228 2023-10-07 14:00:58 2023-10-07 14:30:05 0 1101.6 47.7 1077.5 0 0 0 2064020 547169 0 0
799tsyjq9p1r6 1879238228 2023-10-07 14:30:05 2023-10-07 15:00:12 0 1806.2 66.7 1771.2 0 0 0 3817811 694678 0 0
799tsyjq9p1r6 1879238228 2023-10-07 15:00:12 2023-10-07 15:30:19 1 88.3 3.3 86.3 0 0 0 244627 25431 0 34400
799tsyjq9p1r6 1879238228 2023-10-08 14:30:31 2023-10-08 15:00:38 0 1638 42.5 1617.2 0 0 0 1728020 412431 0 0
799tsyjq9p1r6 1879238228 2023-10-08 15:00:38 2023-10-08 15:30:45 0 1808.6 57.1 1780.4 0 0 0 2167202 498306 0 0
799tsyjq9p1r6 1879238228 2023-10-08 15:30:45 2023-10-08 16:00:52 1 1187.7 32.4 1171.5 0 0 0 1682580 297268 0 31062
可以清晰的看到2023-10-07号,这个SQL的执行时间在快照2023-10-07 14:00和2023-10-07 15:00AWR快照区间内完成,而在14:00-14:30, 14:30-15:00内,SQL没有执行完成,所以它的执行次数为0,在15:00-15:30这个快照区间内,它的ELAPSED_TIME_DELTA为88.3,所以这样统计时,它的"Elapsed Average"(它等价于awr_plan_change.sql脚本中的AVG_ETIME)为88.3秒,而这个SQL的实际执行时间为1101.6+ 1806.2+ 88.3 ~= 2996.1秒,而2023-10-08号的执行时长为1638 +1808.6 + 1187.7~= 4634.3秒。那么awr_plan_change.sql统计出错的原因是啥呢?因为脚本里面有一个条件:and executions_delta > 0,它排除了那些正在执行,但是快照间隔内没有执行完的记录数据(因为这些记录的executions_delta为0),正确的做法是要汇总这些数据然后除以执行次数才是SQL正在的平均执行时间。所以awr_plan_change.sql对于哪些执行时间超过AWR采样间隔的SQL,它的统计数据就会不正确。
其实DBA_HIST_SQLSTAT的elapsed_time_delta这个字段,它统计的是AWR采样间隔内,SQL语句(执行一次或多次)的执行时间,也就是说它是这个采样间隔内差值,一个变化值,而不是SQL语句的完整累加值。Delta本身就是第4个希腊字母,其实delta也表示微积分中的曲率,在数学和科学中表示单位时间内,指标的的变化量。所以,你分析AWR报告时,也要注意一个问题,如果你生成AWR报告的快照开始时间和结束时间不能覆盖SQL语句的开始执行时间和结束执行时间,那么AWR报告中的SQL语句的Elapsed Time per Exec (s)也是不正确的。这个也是很多人没有特别留意的地方。