Oracle 11g中的IO Calibrate(IO校准)
转载自:http://blog.itpub.net/17203031/viewspace-1063523/
Oracle数据库发展到今天,“IO为王”已经是一种发展方向趋势。ExtraData一体机的重要特色之一就是最大程度的发挥IO能力、提高IO吞吐量。
相比CPU和内存,IO存储有其特殊性。我们讨论IO,通常成为I/O栈(I/O Stack)。I/O栈设计的对象是一系列关键组件层,包括HBA、Storage Switches、Storage Array和Physical Disks。这些对象共同合力,才能形成系统整体的IO能力。
四层关键组件,共同形成“木桶效应”。只要有一个层面存在不足,必然成为IO中的短板。I/O难调,也就是在这个方面。但是对于Oracle而言,我们需要关注的是IO整体性能,也就是整体的效果。
Oracle 11g有两个对于性能方面的测试工具,一个就是RAT(Real Application Test),另一个就是IO校准(Calibrate IO)。RAT是一种负载重演组件,当进行系统软硬件升级的时候,我们一个很关注的问题是:此次变化能否提升系统性能、能提升多少,会不会有新的瓶颈。这个在过去是不能实现的,只能够在升级之后通过实践去发现。但是RAT可以捕获实际系统负载情况,将其在新环境下进行重演,并且进行度量比较。IO调教的作用也是IO负载模拟,从而判断出实际真实的系统IO情况。
本篇我们就介绍IO校准特性。
1、发现IO校准
首先聊聊为什么要进行校准。IO是一个多组件共同影响的统一体,多个组件之间大部分情况下是不能够完全如同理想情况下工作的。所以需要进行硬件标准指标和实际情况之间进行校准,来获取准确的IO数据。
获取精确IO有什么用途呢?根源还是Oracle自动化和智能化的需要。进入11g之后,Oracle向智能化的步子是在加快的过程。Oracle从CBO开始,进行自动化并行决策的Auto DOP就需要IO校准的信息。
2、配置IO校准
我们进行配置过程,首先选择Oracle 11gR2进行测试。
SQL> select * from v$version;
BANNER
---------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
11g中有一个视图v$io_calibration_status,记录了系统进行校准过程信息。和统计量不同,Oracle是不会自动进行IO校准的,而需要DBA手工完成。
SQL> select * from v$io_calibration_status;
STATUS CALIBRATION_TIME
------------- --------------------------------------------------------------------------------
NOT AVAILABLE
注意:进行校准过程,一般需要配置异步IO功能。
SQL> show parameter disk_asy
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
disk_asynch_io boolean TRUE
SQL> select name,asynch_io from v$datafile f,v$iostat_file i
2 where f.file#=i.file_no
3 and (filetype_name='Data File' or filetype_name='Temp File');
NAME ASYNCH_IO
-------------------------------------------------- ---------
+DATA/ora11g/datafile/system.256.825944325 ASYNC_ON
+DATA/ora11g/datafile/system.256.825944325 ASYNC_ON
+DATA/ora11g/datafile/sysaux.257.825944327 ASYNC_ON
+DATA/ora11g/datafile/undotbs1.258.825944329 ASYNC_ON
+DATA/ora11g/datafile/users.259.825944329 ASYNC_ON
+DATA/ora11g/datafile/example.265.825944513 ASYNC_ON
6 rows selected
IO校准并不是单独的列出功能,而是融入到Oracle的Resource Manager功能包里面。调用IO校准的功能包DBMS_RESOURCE_MANAGER.CALIBRATE_IO,其中两个输入参数,一个是磁盘的个数,另一个是允许的最大IO延迟。这两个参数可以通过咨询运维团队和厂商实现。
调用过程如下:
SQL> set serveroutput on;
SQL> DECLARE
2 lat INTEGER;
3 iops INTEGER;
4 mbps INTEGER;
5 BEGIN
6 --DBMS_RESOURCE_MANAGER.CALIBRATE_IO(, ,iops, mbps, lat);
7 DBMS_RESOURCE_MANAGER.CALIBRATE_IO (2, 10, iops, mbps, lat);
8 DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
9 DBMS_OUTPUT.PUT_LINE ('latency = ' || lat);
10 dbms_output.put_line('max_mbps = ' || mbps);
11 end;
12 /
max_iops = 111
latency = 8
max_mbps = 62
PL/SQL procedure successfully completed
Executed in 811.547 seconds
这个执行过程执行超过800s,时间不算短。最后计算出测算出的最大iops、延迟和最大mbps(每秒MB)。
在执行过程中,我们查看视图v$io_calibration_status。
SQL> select * from v$io_calibration_status;
STATUS CALIBRATION_TIME
------------- --------------------------------------------------------------------------------
IN PROGRESS 14-12月-13 11.20.20.120 上午
此时的状态,从Not Available变为Ready。在校准过程中,Oracle会形成对存储的大量IO读写操作。我们借助Linux下的sar命令,监控全部过程。
[root@SimpleLinux ~]# sar -b 5 100 -o /tmp/res2
Linux 2.6.18-128.el5 (SimpleLinux.localdomain) 12/13/2013
11:25:08 AM tps rtps wtps bread/s bwrtn/s
11:25:13 AM 8.33 0.00 8.33 0.00 134.92
11:25:18 AM 23.02 1.59 21.43 50.79 311.90
11:25:23 AM 5.96 1.59 4.37 50.89 85.88
11:25:28 AM 7.14 1.59 5.56 50.79 89.68
11:25:33 AM 2.78 0.00 2.78 0.00 44.44
11:25:38 AM 5.96 1.59 4.37 50.89 85.88
11:25:43 AM 257.65 253.28 4.37 4141.55 76.34
11:25:48 AM 281.75 276.19 5.56 4415.87 219.05
11:25:53 AM 278.33 273.56 4.77 4427.83 89.07
11:25:58 AM 289.50 266.53 22.97 4264.55 237.62
11:26:03 AM 232.14 228.97 3.17 3688.89 50.79
11:26:08 AM 268.53 264.14 4.38 4608.76 92.43
注意TPS的变化过程。启动校准之后,Oracle生成大量的IO操作,来判断存储的极限。这个过程也就是让我们了解当前IO架构的上限。
我们通过Excel画出全过程的TPS、RTPS和WTPS趋势。
结束IO校准之后,我们可以查看到IO调教过程信息。
SQL> select * from v$io_calibration_status;
STATUS CALIBRATION_TIME
------------- --------------------------------------------------------------------------------
READY 14-12月-13 11.39.10.194 上午
3、校准使用
我们进行IO校准,可以为Oracle很多功能提供决策依据。如果没有进行过IO校准,Auto DOP就不能正常工作。
SQL> explain plan for select /*+parallel*/ * from scott.emp;
Explained
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1408123770
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 532 | 2 (0)| 00:00:01
| 1 | PX COORDINATOR | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 14 | 532 | 2 (0)| 00:00:01
| 3 | PX BLOCK ITERATOR | | 14 | 532 | 2 (0)| 00:00:01
| 4 | TABLE ACCESS FULL| EMP | 14 | 532 | 2 (0)| 00:00:01
--------------------------------------------------------------------------------
Note
-----
- automatic DOP: skipped because of IO calibrate statistics are missing
15 rows selected
收集IO Calibrate统计量之后,就可以看到并行度效果。
SQL> explain plan for select /*+parallel*/ * from scott.emp;
Explained
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2873591275
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 532 | 2 (0)| 00:00:01
| 1 | PX COORDINATOR | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 14 | 532 | 2 (0)| 00:00:01
| 3 | PX BLOCK ITERATOR | | 14 | 532 | 2 (0)| 00:00:01
| 4 | TABLE ACCESS FULL| EMP | 14 | 532 | 2 (0)| 00:00:01
--------------------------------------------------------------------------------
Note
-----
- automatic DOP: Computed Degree of Parallelism is 2
15 rows selected
4、结论
Oracle自动化、智能化过程中,是需要提供很多辅助信息的。Calibrate IO是一个重要方面。Oracle不进行自动的Calibrate IO统计量的原因大体有三个:
首先是Oracle并不知道实际磁盘的标准指标。第二是Oracle校准过程生成很大的IO,如果不慎会引起很大产品问题。第三是Disk IO性能不会经常性发生变化。
In this Document
Purpose
Scope Ask Questions, Get Help, And Share Your Experiences With This Article
Details IO Calibration Tuning Parameters
References
Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.
In 11.2.0.2 Automatic Degree of Parallelism can only be used if I/O statistics are gathered.
This Note explains what DBA has to do in order to make sure Automatic Degree of Parallelism works.
SCOPEFor DBAs
The AutoDOP is not a feature to use more parallelism. It is a feature that restricts the parallel to maximize throughput,
so it is expected that with AutoDOP not all queries will run in parallel and the ones that do run in parallel may not run with full parallelism, as this is the technical specifications of the feature.
Ask Questions, Get Help, And Share Your Experiences With This Article
Would you like to explore this topic further with other Oracle Customers, Oracle Employees, and Industry Experts?
Click here to join the discussion where you can ask questions, get help from others, and share your experiences with this specific article.
Discover discussions about other articles and helpful subjects by clicking here to access the main My Oracle Support Community page for Database Datawarehousing.
When PARALLEL_DEGREE_POLICY is set to AUTO, Oracle Database determines whether a statement should run in parallel based on the cost of the operations in the execution plan and the hardware characteristics.
AutoDOP is used when PARALLEL or PARALLEL(AUTO) statement level hint is used regardless of the value of the PARALLEL_DEGREE_POLICY (see documentation).
IO Calibration
The hardware characteristics include I/O calibration statistics so these statistics must be gathered otherwise Oracle Database does not use the automatic degree parallelism feature.
If I/O calibration has not been run to gather the required statistics, the explain plan includes the following text in its notes section: ": skipped because of IO calibrate statistics are missing"
explain plan for select /*+ parallel */ * from emp;Plan hash value: 2873591275 --------------------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |--------------------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 14 | 532 | 2 (0)| 00:00:01 | | | || 1 | PX COORDINATOR | | | | | | | | || 2 | PX SEND QC (RANDOM)| :TQ10000 | 14 | 532 | 2 (0)| 00:00:01 | Q1,00 | P->S | QC (RAND) || 3 | PX BLOCK ITERATOR | | 14 | 532 | 2 (0)| 00:00:01 | Q1,00 | PCWC | || 4 | TABLE ACCESS FULL| EMP | 14 | 532 | 2 (0)| 00:00:01 | Q1,00 | PCWP | |-------------------------------------------------------------------------------------------------------------- Note ----- - dynamic sampling used for this statement (level=2) - automatic DOP: skipped because of IO calibrate statistics are missing
The Oracle PL/SQL package DBMS_RESOURCE_MANAGER.CALIBRATE_IO is used to execute the calibration. The duration of the calibration is dictated by the NUM_DISKS variable as well as the number of nodes in the RAC cluster.
SET SERVEROUTPUT ONDECLARE lat INTEGER; iops INTEGER; mbps INTEGER;BEGIN --DBMS_RESOURCE_MANAGER.CALIBRATE_IO(, ,iops, mbps, lat); DBMS_RESOURCE_MANAGER.CALIBRATE_IO (28, 10, iops, mbps, lat); DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops); DBMS_OUTPUT.PUT_LINE ('latency = ' || lat); dbms_output.put_line('max_mbps = ' || mbps);END;/
Note that the first two variables (NUM_DISKS, MAX_LATENCY) are input variables, and the remaining three are output variables.
NUM_DISKS - To get the most accurate results, its best to provide the actual number of physical disks that are used for this database. The Storage Administrator can provide this value. Keep in mind that when ASM is used to manage the database files, say in the DATA diskgroup, then only physical disks that make up the DATA diskgroup should be used for the NUM_DISKS variable; i.e.; do not include the disks from the FRA diskgroup. In the example above the DATA diskgroup is made up of 28 physicals (presented as 4 LUNs or ASM disks)
LATENCY– Maximum tolerable latency in milliseconds for database-block-sized IO requests.
You find more information about CALIBRATE_IO in Note: 727062.1 Configuring and using Calibrate I/O.
In order to verify whether the calibration run was successful, query V$IO_CALIBRATION_STATUS after you executed DBMS_RESOURCE_MANAGER.CALIBRATE_IO call.
select * from V$IO_CALIBRATION_STATUS;STATUS CALIBRATION_TIME-------- ----------------------------- READY 25-NOV-10 08.53.08.536
The execution plan now shows that the feature automatic degree of parallelism can be used:
explain plan for select /*+ parallel */ * from emp;
Plan hash value: 2873591275 --------------------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |--------------------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 14 | 532 | 2 (0)| 00:00:01 | | | || 1 | PX COORDINATOR | | | | | | | | || 2 | PX SEND QC (RANDOM)| :TQ10000 | 14 | 532 | 2 (0)| 00:00:01 | Q1,00 | P->S | QC (RAND) || 3 | PX BLOCK ITERATOR | | 14 | 532 | 2 (0)| 00:00:01 | Q1,00 | PCWC | || 4 | TABLE ACCESS FULL| EMP | 14 | 532 | 2 (0)| 00:00:01 | Q1,00 | PCWP | |--------------------------------------------------------------------------------------------------------------Note ----- - dynamic sampling used for this statement (level=2)- automatic DOP: Computed Degree of Parallelism is 2
There is known issue with DBMS_RESOURCE_MANAGER.CALIBRATE_IO.
Note: 10180307.8 DBRM DBMS_RESOUCE_MANAGER.CALIBRATE_IO REPORTS VERY HIGH MAX_PMBPS If CALIBRATE_IO can not be used you can set the relevant value manual:
delete from resource_io_calibrate$;
insert into resource_io_calibrate$
values(current_timestamp, current_timestamp, 0, 0, 200, 0, 0);
commit;
You have to restart the database after this change.
200 is a value that works well for Machine with a fast I/O subsystem, as example for Exadata Machines. If you set the number (max_pmbps) lower your calculated DOP will increase. If you set max_pmbps higher the calculated DOP are decreased. It appears that 200 is a reasonable value to deal with concurrency on a system.
Automatic DOP is not computed if
- the database is not opened, or
- the database is in restricted access (DBA) or read-only or migrate mode, or
- database is suspended, or
- instance is not open , or
- the SQL cursor is not supported do run in AutoDOP mode.
Tuning Parameters
When you use AutoDOP, you may want to adjust some tuning parameters. See Document 1549214.1 Setup, Monitor, And Tune Parallelism In The Database for information about these parameters. The parallel_servers_target should always be smaller than parallel_max_servers, with parallel_servers_target anywhere from 75% to 50% of parallel_max_servers. If you start seeing a lot of DOP downgrades, you should make the distance between the values for these two parameters greater.
parallel_servers_target
parallel_min_time_threshold
Overview I/O Calibration
Oracle Database 11g introduces an I/O Calibration mechanism, whereby you can run I/O calibration tests either through the Enterprise Manager Performance page or a PL/SQL package. Oracle’s I/O calibration is a variation on the Clarion tool. In an Oracle database, the I/O workload is of two basic types—small random I/O and large sequential I/O. OLTP applications usually experience the small random I/O workload, where the speed with which small I/O requests are serviced is paramount. Thus, disk spinning and seeking times are of critical importance. OLAP applications, on the other hand, employ the large sequential I/O in general. For these types of applications, the critical factor is the capacity of the I/O channel. The larger the I/O channels between the database server and the storage system, the larger the I/O throughput. Oracle uses the following two metrics, each measuring the efficacy of one type of I/O workload:
- IOPS (I/O per second) The IOPS rate is the number of small random I/Os the system can perform in a second and depends on the spin speed of disks. You can increase the IOPS rate by increasing the number of disks in the storage array or by using faster disk drives, which have a high RPM and lower seek time.
- MBPS (megabytes per second) This metric measures the data transfer rate between the server and the storage array and depends on the capacity of the I/O channel between the two systems. A larger I/O channel means a higher MBPS rate.
Two important terms need clarification in this discussion: throughput and latency. The throughput of a system determines how fast it can transfer data and is measured by the MBPS metric. The channel capacity determines the overall throughput of the system, and it thus puts the ceiling on the amount of data transfer. Latency refers to the lag between the time an I/O request is made and when the request is serviced by the storage system. High latency indicates a system that’s overloaded and you can reduce latency by striping data across multiple spindles, so different disks can service the same I/O request in parallel.
Oracle recommends that you use the new I/O Calibration tool to determine I/O metrics in a database. It takes about 10 minutes to run the tools and you should pick a time when the database workload is light to avoid overstressing the storage system. You can run only a single calibration task at a time. If you perform the task in an RAC environment, the workload is generated simultaneously from all instances in the system. You can either run the tool with Enterprise Manager or through PL/SQL.
Calibrating I/O Using PL/SQL
You can also use the new procedure CALIBRATE_IO from the DBMS_ RESOURCE_MANAGER package to run the I/O Calibration task. Here is an example:
begin
exec dbms_resource_manager.calibrate_io(-
num_disks => 1, -
max_latency => 10, -
max_iops => :max_iops, -
max_mbps => :max_mbps, -
actual_latency => :actual_latency);
end;
/
In the CALIBRATE_IO procedure, the following are the key parameters:
- num_disks: Approximate number of disks in the storage array.
- max_latency: Maximum tolerable latency (in milliseconds) for an I/O request.
- max_ios: Maximum number of random DB block-sized read requests that can be serviced.
- max_mbps: Maximum number of randomly distributed 1MB reads that can be serviced (in megabytes per second).
- actual_latency: Average latency of DB block-sized I/O requests at max_iops rate (in milliseconds).
The procedure only works if asynchronous I/O is enabled. If asynchronous I/O is not enabled, the procedure returns the following error.
DECLARE
*
ERROR at line 1:
ORA-56708: Could not find any datafiles with asynchronous i/o capability
ORA-06512: at "SYS.DBMS_RMIN", line 453
ORA-06512: at "SYS.DBMS_RESOURCE_MANAGER", line 1153
ORA-06512: at line 6
You can use the FILESYSTEMIO_OPTIONS static initialization parameter to enable or disable asynchronous I/O or direct I/O on file system files. This parameter is platform-specific and has a default value that is best for a particular platform.
FILESYTEMIO_OPTIONS can be set to one of the following values:
- ASYNCH: enable asynchronous I/O on file system files, which has no timing requirement for transmission.
- DIRECTIO: enable direct I/O on file system files, which bypasses the buffer cache.
- SETALL: enable both asynchronous and direct I/O on file system files.
- NONE: disable both asynchronous and direct I/O on file system files.
SQL> SHOW PARAMETER FILESYSTEMIO_OPTIONS
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options string none
ALTER SYSTEM SET FILESYSTEMIO_OPTIONS=SETALL SCOPE=SPFILE;
SHUTDOWN IMMEDIATE;
STARTUP;
Usage Notes
- Only users with the SYSDBA privilege can run this procedure. Qualified users must also turn on timed_statistics, and ensure asynch_io is enabled for datafiles. This can be achieved by setting filesystemio_options to either ASYNCH or SETALL. One can also query the asynch_io status by means of the following SQL statement:
col name format a50
SELECT name, asynch_io FROM v$datafile f,v$iostat_file i
WHERE f.file# = i.file_no
AND filetype_name = 'Data File'; - Only one calibration can be run at a time. If another calibration is initiated at the same time, it will fail.
- For an Oracle Real Application Clusters (Oracle RAC) database, the workload is simultaneously generated from all instances.
For timed_statistics paramter,please refer to following illustration :
TIMED_STATISTICS specifies whether or not statistics related to time are collected.Values:
- true: The statistics are collected and stored in trace files or displayed in the V$SESSTATS and V$SYSSTATS dynamic performance views.
- false: The value of all time-related statistics is set to zero. This setting lets Oracle avoid the overhead of requesting the time from the operating system.
Starting with release 11.1.0.7.0, the value of the TIMED_STATISTICS parameter cannot be set to false if the value of STATISTICS_LEVEL is set to TYPICAL or ALL.
On some systems with very fast timer access, Oracle might enable timing even if this parameter is set to false. On these systems, setting the parameter to true can sometimes produce more accurate statistics for long-running operations.
The [G]V$IO_CALIBRATION_STATUS views show the current status of the calibration runs. During a run the status of 'IN PROGRESS' is displayed. Once a run is complete the status switches to 'READY' and the calibration time is displayed.Besides,NOT AVAILABLE mean Calibration results not available and CALIBRATION_TIME tell us End time of the last calibration run
SQL> SELECT * FROM v$io_calibration_status;
STATUS CALIBRATION_TIME
------------- -------------------------------
IN PROGRESS
SQL> SELECT * FROM v$io_calibration_status;
STATUS CALIBRATION_TIME
------------- ---------------------------------------------------------------------------
READY 28-JUL-2008 14:37:38.410
1 row selected.
Once you execute the CALIBRATE_IO procedure, you can query the V$IO_ CALIBRATION_STATUS and the DBA_RSRC_IO_CALIBRATE views to check the results. Here’s a sample query:
SQL> select max_iops, max_mbps, max_pmbps, latency
from dba_rsrc_io_calibrate;
MAX_IOPS MAX_MBPS MAX_PMBPS LATENCY
---------- ------------ ------------ ----------
133 12 6 64
Oracle Database 11g collects I/O statistics in three different dimensions to provide a consistent set of statistics for I/O calls.These dimensions are RDBMS components grouped into 12 functional groups.
- The V$IOSTAT_ FUNCTION view provides the details. I/O statistics are collected for each consumer group that’s part of the currently enabled resource plan.
- The V$IOSTAT_CONSUMER_GROUP view has the details.
- Individual file level I/O statistics are collected and stored in the V$IOSTAT_ FILE view for details.