Friday, December 21, 2007

The power of a stack trace

Introduction

The real test to measure a database administrator's troubleshooting skills is during P1 issues and emergencies. We were in such a situation for one of our batch processing 10g database the other day. The batch processing application failed to start after the regular restart every sunday. Upon further investigation the application logfile contained 'ORA-01031: insufficient privileges' error. Looking at it for the first time made me think it was a "table/synonym access or package execution permission that was missing'. But its practically not possible to find the user and database object that is causing this problem. Time was running out and the batch processing has come to a halt because of this issue. At this time i realized i can use the ORA-error stack trace option to find the source of the error.

Procedure


  1. Login to the database (sqlplus '/ as sysdba ) and issue the following command
  2. alter system set events '1031 trace name errorstack level 3';
  3. Let the application team retry their application startup command/script
  4. It fails again, but this time a trace file will be generated in the user_dump_destination directory(udump)
  5. Goto the udump directory and view the stack trace.
  6. Login to the database (sqlplus '/ as sysdba ) again to turn off the stack trace
  7. alter system set events '1031 trace name context off';

In our case the stack trace had the following contents

ksedmp: internal or fatal error
ORA-01031: insufficient privileges
Current SQL statement for this session:
alter system set optimizer_index_cost_adj =20


Bang on Target !!! The error is because the application start program was running 'alter system set optimizer_index_cost_adj =20'. Any 'alter system' command should be run by only privileged users (sysdba, and sometimes system depending on the type of command). The application team admitted to have inadvertently added this 'alter system' command instead of the planned 'alter session' command. Soon the error was corrected and the batch processing started again.

Caveats

Disable the error tracing as soon as the problem is reproduced and logfile is generated in the user dump directory. Failure to disable the stack trace in a timely fastion can fillup the user dump destination directory.

Conclusion

Database wide errors(such as a backup failure or a tablespace out of freespace) are reported in the alertlog and therefore easy to identify leading to a faster rootcause identification. However ORA-errors caused by user sessions are not reported in database server logfiles and hence the rootcause/source of error is not easy to find out. Enabling a stack trace for such an ORA error will help the database produce a logfile in the udump directory which can be analyzed to find the source. A DBA can heave a sigh of relief after resolving a production priority or emergency issue, but a speedy resolution and identification of rootcause goes a long way in increasing the DBA's reputation.

Friday, November 30, 2007

How to rename a datafile which has blank characters or whitespace

Introduction

One of my earlier posts has the procedure that describes how to rename a datafile. A similar but slightly different procedure is when a datafile has newline or whitespace character in it. Obviously its not a great practice to create a datafile with newline or whitespace characters is in it. This post describes how to remove a newline or any special characters in datafiles.

Problem

In our production database there is a datafile that has 1 newline and 4 whitespace characters in it. It was created long ago by a DBA by mistake. Due to these special characters, every database clone procedure has a special task to rename this datafile before creating the target database's controlfile. This extra manual step can be avoided if the datafile is renamed in the production database.

Solution

1. select file_name
from dba_data_files
where file_name like '%appsprod_custcpsinx03%';

/dbf07/oracle/appsproddata/
custcpsprod_inx03.dbf

2. Note the above newline and the whitespace characters.

3. select dump(name)
from dba_data_files
where file_name like '%custcpsprod_inx03.dbf%'
DUMP(NAME)--------------------------------------------------------------------------------Typ=1 Len=57: 47,100,98,102,48,55,47,111,114,97,99,108,101,47,97,112,112,115,112,114,111,100,100,97,116,97,47,10,32,32,32,32,97,112,112,115,112,114,111,100,95,99,117,115,116,99,112,115,105,110,120,48,51,46,100,98,102

4. The above dumps all the ascii equavalent values of the characters in the datafile.
5. http://www.columbia.edu/kermit/ascii.html can be referred for ascii values and their equavalents.

6. 10 is newline and 32 is whitespace. Therefore the above datafile has 1 newline and 4 whitespace characters.
7. Shutdown the database
8. cd /dbf07/oracle/appsproddata/
9. ls –lrt *appsprod_custcpsinx03.dbf
10. -rw-r----- 1 oracle dba 3145736192 Nov 29 19:35
appsprod_custcpsinx03.dbf
11. mv *appsprod_custcpsinx03.dbf appsprod_custcpsinx03.dbf
12. Login as 'sqlplus '/ as sysdba' and issue 'startup mount'
13. Verify the existing datafile entry in the controlfile – select name from v$datafile where name like ‘%appsprod_custcpsinx%’; - It still has the old entry in the controlfile
14. alter database rename file '/dbf07/oracle/appsproddata/
appsprod_custcpsinx03.dbf '
to
'/dbf07/oracle/appsproddata/appsprod_custcpsinx03.dbf'

15. In the above alter database statement, exactly one newline and exactly 4 whitespace characters have to be entered. Otherwise controlfile will not recognize the datafile.

16. alter database open

Conclusion

Whereas Oracle allows a datafile to have any number of newline or whitespace characters, it is not a great practice to create datafiles with these special characters and as noticed above it is not a straightforward rename exercise. Renaming any datafile requires a database bounce or atleast a 'tablespace offline' and this means downtime for business.

Tuesday, October 30, 2007

10g Certification IZ0-040

I obtained my 10g Certification last week. Pending since long, it completes a major requirement for that real 10g DBA. Since i am already 9i Certified Professional, upgrading to 10g wasn't such a big deal.

I found passing the 4 exams of the initial Oracle 9i OCP quite a big thing, especially dealing with the amount of books and experience needed to pass the exam. Its like passing one of your engineering semester exams as part of your Bachelor's degree. However 10g certification is going to be easy for any existing 9i certified professional. I am posting my experience with the exam here.

  1. 60 questions. Atleast 44 answers have to be correct.
  2. Questions will be independent of 10g R.1 and 10g R.2 releases. You can still pass the exam if you are just experienced in 10g R.1 and not 10g R.2 release.
  3. Only new features will be tested here (including some 9i advanced new features). If one wants to pass this exam easily, one needs to study thoroughly the 10 new features guide found here.

Thursday, October 11, 2007

Oracle listener password

Ever been in a situation where your database listener has been shutdown by an end user ? Yes , thats right, the default database listener configuration in 9i comes *WITHOUT* a password and any client or end user who obtains the listener name and a TNS entry(listener port and host name) , can shutdown the database server's listener sitting from that client machine !!! . All the places that i worked so far had their database server listeners' unsecured when i first checked them.

Quarterly database and apps CPU patches are recommended to fix security gaps and loopholes, but database listener security is often overlooked and easiest thing to break.

Problem



Database listeners are not always password protected. They are vulnerable to security attacks from any host in the network with a TNS information of the target database and an oracle client software. The TNS connection information is already widely distributed to users for e.g to run queries using TOAD or SQL*Navigator etc., so any user can easily *stop* database listeners inadvertently thereby comprising new database connections.

Below is a simulation of how these listeners can be stopped from a client machine

1) Obtain target database TNS connection descriptor
2) Put the connection descriptor in a client machine’s tnsnames.ora
3) Login to the client machine and issue ‘lsnrctl stop db listener name’


This stop command from the client machine will stop the database listeners on the server

Solution



Password protection is the solution for this problem. Passwords can be either encrypted or unencrypted. For maximum security the database listeners should be password protected with encryption.

Implementation

Setting Encrypted Password

1) Login to each of the database instance server host as oracle owner
2) lsnrctl
3) LSNRCTL> set current_listener listener_name3) LSNRCTL> change_password
Old password : Hit Enter KeyNew password : Type the new passwordReenter new password: Retype the new password4) LSNRCTL>set password
Password: Enter the newly created password5) LSNRCTL> save_config listener name
This will create a new passwords parameter in the $TNS_ADMIN/listener.ora and it is encrypted in the file. The old file will be backed up with a listener.bak extention.

Resetting the password if its forgotten

1) Login to the database instance server as oracle owner
2) kill -9 listener pid3) Open the listener.ora and delete the passwords_listener_name line
4) lsnrctl
5) LSNRCTL> set current_listener listener name
6) LSNRCTL> start
7) LSNRCTL> change_password
Old password: Hit Enter key
New password: Type a new passwordReenter new password: Retype the new password
8) LSNRCTL> set password
Password: Enter the newly typed password again
9) LSNRCTL> save_config listener name

Changing an existing password

1) Login to the database instance server as oracle owner
2) kill -9 listener pid
3) lsnrctl
4) LSNRCTL> set current_listener listener name
5) LSNRCTL> start
6) LSNRCTL> change_password
Old password: Enter old passwordNew password: Type a new password
Reenter new password: Retype the new password
7) LSNRCTL> set password
Password: Enter the new password again
8) LSNRCTL> save_config listener name

Listener administration with a password

start

Starting the listener does not require a password. A listener can be started in the conventional method without setting any password

stop

Stopping the listener always requires a password.

1) lsnrctl
2) LSNRCTL>set password
Password: Enter the password3) stop listener name

status

A password is also required for finding the status of a listener

1) lsnrctl
2) LSNRCTL>set password
Password: Enter the password
3) status listener name
Testing

1) Obtain the database TNS descriptor information
2) Put the connection descriptor in a client machine’s tnsnames.ora
3) Login to the client machine and issue ‘lsnrctl stop
The stop command will fail with the following message.

TNS-01169: The listener has not recognized the password

Uninstall (Status Quo)

1) Login to the database instance server as oracle owner
2) kill -9 listener pid
3) Open the listener.ora and delete the passwords_listener_name line
4) lsnrctl
5) LSNRCTL> set current_listener listener name
6) LSNRCTL> start

Caveats

1) A client can still stop/status a listener after setting the password explicitly. Hence its always recommended to choose a strong password with encryption.

Tuesday, October 2, 2007

How to Change APPS Password without running ADCONFIG

Any other ORACLE database user password can be changed simply by the 'alter user username identified by xxxxx' command. However the APPS user password change procedure in an Oracle Applications 11i instance has got a few more additional steps. In this post i will outline all the steps required without running the adconfig utility as well as the verification process after changing the password.
 
APPS password change procedure WITHOUT running adconfig
  1. Login to any of the admin tiers of the 11i instance
  2. FNDCPASS apps/old_pwd 0 Y system/sys_pwd SYSTEM APPLSYS new_pwd
  3. Edit $ORACLE_HOME/reports60/server/CGIcmd.dat and update with the new password
  4. Login to all of the middle tiers of the 11i instance (or just one of them if its a shared appl tier configuration)
  5. Edit the following files and update them with the new password
  6. $FND_TOP/bin/appsweb.cfg
  7. $FND_TOP/resource/wfmail.cfg
  8. $ORACLE_HOME/reports60/server/CGIcmd.dat
  9. $IAS_ORACLE_HOME/Apache/modplsql/cfg/wdbsvr.app
  10. Recreate any database links that are owned by the apps user
  11. Update any 3rd party tools, interfaces or scripts that have the password hardcoded

APPS password change procedure WITH adconfig

  1. Login to any of the admin tiers of the 11i instance
  2. FNDCPASS apps/old_pwd 0 Y system/sys_pwd SYSTEM APPLSYS new_pwd
  3. Backup any customizations in the configuration files
  4. cd $COMMON_TOP/admin/scripts/$CONTEXT_NAME and run adautocfg.sh
  5. Restore the customizations to the configuration files
  6. Login to all of the middle tiers of the 11i instance (or just one of them if its a shared appl tier configuration)
  7. Backup any customizations in the configuration files
  8. cd $COMMON_TOP/admin/scripts/$CONTEXT_NAME and run adautocfg.sh
  9. Restore the customizations to the configuration files
  10. Recreate any database links that are owned by the apps user
  11. Update any 3rd party tools, interfaces or scripts that have the password hardcoded

Verification

  1. Login to the database as apps user
  2. Complete the AOL/J diagnostic test. This webpage asks for the new apps password and tries to complete a jdbc connection. http://11i_web_URL:port/OA_HTML/jsp/fnd/aoljtest.jsp

Conclusion

There may be lot of customizations in the configuration files that are generated by the adautocfg.sh. Saving and restoring them each and every time adautocfg.sh is run is the key to a successful apps password change procedure when adconfig is run (adautocfg.sh). If one is not sure about all the customizations in the configuration files, one can choose the first procedure above WITHOUT running adconfig.

Thursday, September 27, 2007

RMAN Duplication to a new host using Veritas Netbackup in Oracle 9i

We use Veritas Netbackup 5.1 as the Media Management Software for creating our production tape backups. Yesterday, I successfully created a RMAN DUPLICATE database on a different host using these tape backups .I am documenting the procedure below

NAMING CONVENTIONS

1)RMAN target = source database of the RMAN backups
2)duplicate database = destination database that needs to be created from the source database backup
3)auxiliary database = duplicate database
6)RMAN repository = separate schema (RMAN catalog) in a separate database that contains source database’s backup details
7)dev7 is the name of the duplicate database to be created
8)Media Manager Server = Netbackup Media Manager Server.
9)Media Manager Client = source database host
10)PROD = source database name
11)DEV = duplicate database name
12)RMANCT = RMAN catalog database name
13)RMANPROD = RMAN catalog repository schema name

CREATE DESTINATION DATABASE osuser and group SIMILAR to SOURCE DATABASE osuser and group

One of the requirements of RMAN duplication using VERITAS NETBACKUP is that the destination database's osuser and group should be similar to what it is in the source database node

e.g If source database ORACLE_HOME owner is oracle:dba, the destination database ORACLE_HOME owner should also be oracle:dba

MAKE BACKUP TAPES accessible to DESTINATION HOST

1)Contact your NETBACKUP system administrators to the source database BACKUP TAPES accessible to DESTINATION HOST

2)Verify by the following command

a) Login oracle@destination_db_host
b) /usr/openv/netbackup/bin/bplist -C source_db_host -t 4 -l -R /

The above bplist command should list all the backup sets that are in the RMAN catalog

Prepare Duplicate database

1)Login oracle@destination_db_host
2)Copy the source database's initSID.ora to the destination host and add a new parameter DB_FILE_NAME_CONVERT. For e.g if the source database files are in /oracle/prod/d1 and /oracle/prod/d2 directories and if the corresponding restoration paths in the destination host are to be /oracle/dev/d1 and /oracle/dev/d2 respectively,

DB_FILE_NAME_CONVERT=(/oracle/prod/d1/,/oracle/dev/d1/,/oracle/prod/d2,/oracle/dev/d2/)

3)sqlplus '/ as sysdba'

4)startup nomount

5)Edit the $TNS_ADMIN/tnsnames.ora and add connection descriptors for the source database PROD and its catalog database RMANCT

Run the DUPLICATE command

RMAN always restores from the latest tape backups by default to create new database with the DUPLICATE command.

1)Login to oracle@destination_host

export NB_ORA_CLIENT=source_host

2)Connect to RMAN

rman LOG=/tmp/client/dev.log
connect target sys/***@prod
connect catalog rmanprod/****@rmanct
connect auxiliary /


3)Issue the Duplicate commands

RUN
{
allocate auxiliary channel t1 type 'SBT_TAPE';
allocate auxiliary channel t2 type 'SBT_TAPE';
SEND 'NB_ORA_SERV=media manager server, NB_ORA_CLIENT=source_host';
duplicate
target database to dev
pfile=/ora_base/oracle/devdb/9.2.0/dbs/initdev.ora
logfile
'/oracle/dev/logs/dev_log1.dbf' size 256M,
'/oracle/dev/logs/dev_log2.dbf' size 256M,
'/oracle/dev/logs/dev_log3.dbf' size 256M
}


4)Verify the dev alert log and the RMAN log /tmp/client/dev.log
5) The whole restore process will happen over the network depending on the network speed and the source database size

Restoration of only archive logs from RMAN

1) Login oracle@destination_host
2) select * from v$instance ( it should show ‘MOUNTED’ status)
3) shutdown immediate
4) Login oracle@source_host
5) Restore the archivelog files with a separate RMAN command.

rman
connect target /
connect catalog rmanprod/****@rmanct
run
{
allocate channel t1 type 'SBT_TAPE';
set archivelog destination to '/home/oracle/';
restore archivelog
from time "to_date('09/24/07 8:00:00', 'mm/dd/yy hh24:mi:ss')" until time "to_date('09/24/07 13:00:00', 'mm/dd/yy hh24:mi:ss')";
}

7)Verify the restored archive logfiles in /home/oracle/
8)Copy them to destination node using rcp or scp command
9)Login to destination node oracle@destination_host
10)Mount the destination database dev and recover manually using the 'recover database until cancel using backup controlfile' command
11)Type ‘cancel’ after applying the necessary archive logfiles to make the destination database consistent
12) alter database open resetlogs
13) Create the temporary tablespace

How to create a duplicate database until a specific time in the RMAN repository

The procedure remains the same except one more ‘set until’ command. The set until time command has to be put in the run block just before the duplicate command to create the duplicate database upto a specific point in time.

set until time= "to_date('mm/dd/yyyy hh24:mi:ss', 'MM/DD/YYYY HH24:MI:SS')";

Affect of Duplicate command on Recovery Catalog

There is no affect of duplication on the recovery catalog. RMAN generates a new DBID for the duplicate database. Unless this DBID is registered in the catalog, RMAN does not create or update any records for the newly created duplicate database.

It is not necessary to register the newly created duplicate database in the RMAN catalog unless the newly created duplicate database wants to use RMAN for its own backups

Friday, September 21, 2007

How to verify whether a patch has been applied in 11i

There are several known ways for checking whether a patch (simple patch, patchset, family pack or minipack or maintenance pack) has been applied in the Oracle 11i E-Business Suite.

However in these "several" ways, what is the best method which is also suppored by Oracle Support ?

Yesterday i was doing patch analysis for installing Oracle Support Diagnostics 2.5 product (IZU) in our 11.5.10.2 production system. It listed 11i.ATG_PF.H.4 (Rollup 4) as the prequisite. So i downloaded patchsets.sh (the Oracle Patch Comparision utility) and ran the script to find the current Running Versions in our system for patchsets and family packs.

It listed 11i.ATG_PF.H(3438354) as the Running Version and 11i.ATG_PF.H.5(5473858) (Rel-By_Metalink) as the 'Latest Available' one in the Report_11i.txt output file.
 
So as per the patchsets.sh tool, there is no 11i.ATG_PF.H.4 (4676589) in our system. I went ahead with my patch analysis only to know later that '11i.ATG_PF.H.4' has indeed been applied to our system.

AD_PATCH_DRIVERS shows 4676589, whereas patchsets.sh does not show this is applied !! though Metalink Note 162524.1 shows either method is OK.

4 Different methods as per Metalink for finding patch information in 11i
------------------------------------------------------------------------
1) patchsets.sh (Patch Comparision Tool)
2) AD_PATCH_DRIVERS table
3) Two reports adphrept.sql (patch history) and adfhrept.sql(file
history) in $AD_TOP directory
4) Login to Oracle Applications Manager (OAM) => Applied Patches => Simple Search by 'Patch ID)

What is the best method out of the 4 above ?
---------------------------------------------
Finding patch history through OAM is the best method. It is easy to use and it has a web-based interface. AD_PATCH_DRIVERS too is good if you want to write a SQL statement.

What does the Documentation say ?
----------------------------------
According to the 'AD Maintenance Utilities' PDF for 11.5.10.2 , OAM is the best method for finding patch history.


Wednesday, September 19, 2007

Adding Datafile with same name but different directory

Yesterday, one of my colleagues added a datafile with the same filename but different directory by mistake through OEM. He quickly realized it but suprised to find that OEM did not complain and created the datafile successfully.

We checked the alertlog and there were no errors either. We tried the same thing in a different test database and no issues there too. So this proves that Oracle treats the filename in its entirety (including the directory path) and hence the new datafile is still unique.

However this is not a great practice and hence we are going to use our next scheduled downtime to rename this newly added datafile.

Renaming a Datafile when the database is shutdown
-------------------------------------------------

1) Login to all of the RAC database nodes and issue 'shutdown immediate'
2) cp existing_datafile_name new_datafile_name
3) Login to the master RAC node and issue 'startup mount'
4) alter database rename file 'old_directory/old_datafilename'
to
'new_directory/new_datafilename'
5) shutdown immediate
6) Login to all the of the RAC database nodes and issue 'startup'
7) Verify the new name in v$datafile and dba_data_files.
8) Remove the datafile with the wrong name from the operating system using the OS command.

Tuesday, September 18, 2007

Working on RMAN Duplication to a different host

Since a week i am working on changing our Oracle 11i E-Business Suite Database cloning procedure. In the current method the source database is put in hotbackup mode, data files and the needed archive logfiles are copied to the destination host and the clone/test database is created.

But the problem with this current method is we are not really testing the source database backups. A good DBA should always create test/clone databases from the source database backups instead of copying datafiles in backup mode. Our production database has a full biweekly RMAN tape backup. Though these backups are successful as per their "log files", they have never been used as a source to create for cloning purposes.

The right procedure now is to use these RMAN tape backups and then perform database cloning using the RMAN DUPLICATION feature.

We use VERITAS NETBACKUP as the Media Management Software for the RMAN tape backups. I am working with the Unix system administrator, NETBACKUP administrator but encountering some NETBACKUP errors. We opened a ticket with Veritas and i will blog again once the issue is resolved. I will also detail the issue rootcause and the solution for it.
Related Posts Plugin for WordPress, Blogger...