ORA-19599 When Backing up an Archivelog that is Corrupt
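A few days ago I ran into a backup failure: during an RMAN backup, one of the archived logs turned out to be corrupt. It was the first time I had seen this, so I am recording the details of the case here.
The backup job failed. The RMAN output log showed that an archived log file was corrupt: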
RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
RMAN-08515: archived log file name=/eapdblog/eap_1_666_1155313416.arc thread=1 sequence=666
RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
RMAN-08515: archived log file name=/eapdblog/eap_1_667_1155313416.arc thread=1 sequence=667
RMAN-03009: failure of backup command on dev_0 channel at 04/09/2024 09:44:50
ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: non RMAN, but media manager or vendor specific failure, error text:
Vendor specific error: OB2_EndObjectBackup() failed ERR(-2)
ORA-19599: block number 316064 is corrupt in archived log /eapdblog/eap_1_660_1155313416.arc
Validating the archived logs confirmed that eap_1_660_1155313416.arc is indeed corrupt:
RMAN> validate archivelog all;
Starting validate at 09-APR-24
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=261 device type=DISK
channel ORA_DISK_1: starting validation of archived log
channel ORA_DISK_1: specifying archived log(s) for validation
input archived log thread=1 sequence=660 RECID=645 STAMP=1165788069
input archived log thread=1 sequence=663 RECID=648 STAMP=1165824445
input archived log thread=1 sequence=664 RECID=649 STAMP=1165828881
input archived log thread=1 sequence=665 RECID=650 STAMP=1165829178
input archived log thread=1 sequence=666 RECID=651 STAMP=1165829976
input archived log thread=1 sequence=667 RECID=652 STAMP=1165830268
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Archived Logs
=====================
Thrd Seq Status Blocks Failing Blocks Examined Name
---- ------- ------ -------------- --------------- ---------------
1 660 FAILED 8 346599 /eapdblog/eap_1_660_1155313416.arc
1 663 OK 0 382900 /eapdblog/eap_1_663_1155313416.arc
1 664 OK 0 94593 /eapdblog/eap_1_664_1155313416.arc
1 665 OK 0 1748 /eapdblog/eap_1_665_1155313416.arc
1 666 OK 0 17557 /eapdblog/eap_1_666_1155313416.arc
1 667 OK 0 4226 /eapdblog/eap_1_667_1155313416.arc
validate found one or more corrupt blocks
See trace file /eapdb/diag/rdbms/eap/eap/trace/eap_ora_917867.trc for details
Finished validate at 09-APR-24
RMAN> exit
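When this check runs as part of a scheduled job, it helps to pick the failed logs out of the VALIDATE report automatically. Below is a minimal Python sketch, assuming the RMAN output has been captured as plain text (for example via shell redirection). The names `ArchivedLogCheck` and `failed_archived_logs` are hypothetical, and the regex assumes the whitespace-separated row layout of the "List of Archived Logs" table shown above.

```python
import re
from typing import NamedTuple


class ArchivedLogCheck(NamedTuple):
    """One data row of the RMAN 'List of Archived Logs' table (hypothetical type)."""
    thread: int
    sequence: int
    status: str
    failing_blocks: int
    blocks_examined: int
    name: str


# Matches a data row such as "1 660 FAILED 8 346599 /eapdblog/eap_1_660_1155313416.arc".
# The header line ("Thrd Seq ...") and the dashed separator do not start with digits, so they are skipped.
ROW_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\w+)\s+(\d+)\s+(\d+)\s+(\S+)\s*$")


def failed_archived_logs(rman_output: str) -> list[ArchivedLogCheck]:
    """Return the archived logs whose Status column is anything other than OK."""
    rows = []
    in_table = False
    for line in rman_output.splitlines():
        if line.startswith("List of Archived Logs"):
            in_table = True  # only parse rows after the table title
            continue
        if not in_table:
            continue
        m = ROW_RE.match(line)
        if m:
            thrd, seq, status, failing, examined, name = m.groups()
            rows.append(ArchivedLogCheck(int(thrd), int(seq), status,
                                         int(failing), int(examined), name))
    return [r for r in rows if r.status != "OK"]


# Usage with an excerpt of the report above
sample = """List of Archived Logs
=====================
Thrd Seq Status Blocks Failing Blocks Examined Name
---- ------- ------ -------------- --------------- ---------------
1 660 FAILED 8 346599 /eapdblog/eap_1_660_1155313416.arc
1 663 OK 0 382900 /eapdblog/eap_1_663_1155313416.arc
1 664 OK 0 94593 /eapdblog/eap_1_664_1155313416.arc
"""

for r in failed_archived_logs(sample):
    print(r.sequence, r.failing_blocks, r.name)  # → 660 8 /eapdblog/eap_1_660_1155313416.arc
```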
The alert log shows the same corruption:
2024-04-08T23:15:05.730996+08:00
***
Corrupt block seq: 660 blocknum=316064.
Bad header found during backing up archived log
Data in bad block - flag:1. format:34. bno:93696. seq:649
beg:16 cks:21324
calculated check value: 21324
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
2024-04-08T23:15:21.671470+08:00
***
Corrupt block seq: 660 blocknum=316064.
Bad header found during backing up archived log
Data in bad block - flag:1. format:34. bno:93696. seq:649
beg:16 cks:21324
calculated check value: 21324
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
2024-04-08T23:15:36.695623+08:00
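In the entries above, the location Oracle expected (seq 660, block 316064) differs from the values in the "Data in bad block" line (seq 649, bno 93696). My reading, which is an interpretation and not something the alert log states explicitly, is that these are the sequence and block number recorded in the bad block's own header, which would explain the "Bad header" message. The hypothetical Python helper below pairs each "Corrupt block" line with the following "Data in bad block" line and reports the mismatches, deduplicating repeated reports of the same block.

```python
import re

# Patterns assume the exact alert-log wording shown above.
CORRUPT_RE = re.compile(r"Corrupt block seq: (\d+) blocknum=(\d+)\.")
BAD_BLOCK_RE = re.compile(
    r"Data in bad block - flag:(\d+)\. format:(\d+)\. bno:(\d+)\. seq:(\d+)")


def header_mismatches(alert_text: str) -> set[tuple[int, int, int, int]]:
    """Return (expected_seq, expected_block, header_seq, header_block) tuples
    for corrupt blocks whose reported seq/bno differ from the expected location."""
    results = set()
    expected = None
    for line in alert_text.splitlines():
        m = CORRUPT_RE.search(line)
        if m:
            expected = (int(m.group(1)), int(m.group(2)))
            continue
        m = BAD_BLOCK_RE.search(line)
        if m and expected is not None:
            header_block, header_seq = int(m.group(3)), int(m.group(4))
            if (header_seq, header_block) != expected:
                results.add((expected[0], expected[1], header_seq, header_block))
            expected = None  # each 'Corrupt block' line is paired at most once
    return results


# Usage with the two identical reports from the alert log above
sample = """Corrupt block seq: 660 blocknum=316064.
Bad header found during backing up archived log
Data in bad block - flag:1. format:34. bno:93696. seq:649
Corrupt block seq: 660 blocknum=316064.
Bad header found during backing up archived log
Data in bad block - flag:1. format:34. bno:93696. seq:649
"""
print(header_mismatches(sample))  # → {(660, 316064, 649, 93696)}
```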
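Although I knew the archived log was corrupt, I did not know what had caused it. I had previously seen a case shared by others, ORA-1578 ORA-353 ORA-19599 Corrupt blocks with zeros when filesystemio_options=SETALL on ext4 file system using Linux (Doc ID 1487957.1), but the current environment, shown below, is completely different from the one described in Doc ID 1487957.1:
Operating system: Red Hat Enterprise Linux release 8.8 (Ootpa)
Database version: Oracle 19c Enterprise Edition 19.20.0.0.0
File system: xfs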
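I opened a Service Request and uploaded the relevant logs along with a dump of the corrupt archived log. Oracle Support eventually replied that the symptoms closely match two unpublished bugs (see the screenshot). This problem occurs very rarely, and it was the first time I had encountered it. Support's advice was: if it happens only occasionally, keep monitoring; if it happens frequently, change filesystemio_options to directio as a workaround. The dump of the archived log was generated as follows: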
sqlplus / as sysdba
oradebug setmypid
oradebug tracefile_name
alter system dump logfile '/eapdblog/eap_1_660_1155313416.arc' VALIDATE;
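Support's advice hinges on how often the corruption occurs. One way to measure that is to count distinct corrupt (seq, blocknum) pairs per day in the alert log, so that repeated reports from backup retries (such as the two entries at 23:15:05 and 23:15:21 above) count as a single incident. This is a minimal sketch with a hypothetical function name; it assumes the ISO 8601 timestamp lines and "Corrupt block seq:" wording shown above, and it does not define what counts as "frequent".

```python
import re
from collections import Counter

# Alert-log timestamp lines look like "2024-04-08T23:15:05.730996+08:00".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})T\d{2}:\d{2}:\d{2}")
CORRUPT_RE = re.compile(r"Corrupt block seq: (\d+) blocknum=(\d+)\.")


def corrupt_blocks_per_day(alert_text: str) -> Counter:
    """Count distinct (seq, blocknum) corruption reports per calendar day.

    The day is taken from the most recent timestamp line (local time as
    written in the log); repeated reports of the same block that day count once.
    """
    seen = set()
    day = None
    for line in alert_text.splitlines():
        m = TS_RE.match(line)
        if m:
            day = m.group(1)
            continue
        m = CORRUPT_RE.search(line)
        if m and day is not None:
            seen.add((day, int(m.group(1)), int(m.group(2))))
    return Counter(d for d, _, _ in seen)


# Usage with the two alert-log entries shown earlier
sample = """2024-04-08T23:15:05.730996+08:00
***
Corrupt block seq: 660 blocknum=316064.
2024-04-08T23:15:21.671470+08:00
***
Corrupt block seq: 660 blocknum=316064.
"""
print(corrupt_blocks_per_day(sample))  # → Counter({'2024-04-08': 1})
```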
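I took the steps below, then ran a new full RMAN backup and monitored the system for several more days. The error has not recurred so far.
Manually delete the corrupt archived log file at the OS level, then crosscheck so that RMAN marks it EXPIRED, and delete its expired record: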
RMAN> crosscheck archivelog all;
RMAN> DELETE EXPIRED ARCHIVELOG sequence 660;