Friday, December 21, 2007

The power of a stack trace

Introduction

The real test to measure a database administrator's troubleshooting skills is during P1 issues and emergencies. We were in such a situation for one of our batch processing 10g database the other day. The batch processing application failed to start after the regular restart every sunday. Upon further investigation the application logfile contained 'ORA-01031: insufficient privileges' error. Looking at it for the first time made me think it was a "table/synonym access or package execution permission that was missing'. But its practically not possible to find the user and database object that is causing this problem. Time was running out and the batch processing has come to a halt because of this issue. At this time i realized i can use the ORA-error stack trace option to find the source of the error.

Procedure


  1. Login to the database (sqlplus '/ as sysdba ) and issue the following command
  2. alter system set events '1031 trace name errorstack level 3';
  3. Let the application team retry their application startup command/script
  4. It fails again, but this time a trace file will be generated in the user_dump_destination directory(udump)
  5. Goto the udump directory and view the stack trace.
  6. Login to the database (sqlplus '/ as sysdba ) again to turn off the stack trace
  7. alter system set events '1031 trace name context off';

In our case the stack trace had the following contents

ksedmp: internal or fatal error
ORA-01031: insufficient privileges
Current SQL statement for this session:
alter system set optimizer_index_cost_adj =20


Bang on Target !!! The error is because the application start program was running 'alter system set optimizer_index_cost_adj =20'. Any 'alter system' command should be run by only privileged users (sysdba, and sometimes system depending on the type of command). The application team admitted to have inadvertently added this 'alter system' command instead of the planned 'alter session' command. Soon the error was corrected and the batch processing started again.

Caveats

Disable the error tracing as soon as the problem is reproduced and logfile is generated in the user dump directory. Failure to disable the stack trace in a timely fastion can fillup the user dump destination directory.

Conclusion

Database wide errors(such as a backup failure or a tablespace out of freespace) are reported in the alertlog and therefore easy to identify leading to a faster rootcause identification. However ORA-errors caused by user sessions are not reported in database server logfiles and hence the rootcause/source of error is not easy to find out. Enabling a stack trace for such an ORA error will help the database produce a logfile in the udump directory which can be analyzed to find the source. A DBA can heave a sigh of relief after resolving a production priority or emergency issue, but a speedy resolution and identification of rootcause goes a long way in increasing the DBA's reputation.

Related Posts Plugin for WordPress, Blogger...