Friday, December 30, 2011

Oracle expdp tablespace mode


In my first expdp article, I explained how to use expdp as sysdba, along with other options such as nohup, excluding schemas, and compression. In this article, I will give an example of expdp in tablespace mode. For schema backups, DBAs traditionally reach for expdp because it produces a logical backup. For tablespace backups, however, most DBAs take physical backups of the tablespaces instead of using expdp.

However, expdp also provides a tablespace mode. In fact, several tablespaces can be backed up at once, and other options such as parallel and compression can be combined with it. The command below is one such example.

expdp TABLESPACES=FINTS_DATA,HRTS_DATA dumpfile=DATA_PUMP_DIR:ts_dumps_%U.dmp LOGFILE=DATA_PUMP_DIR:ts_dumps_log.log    parallel=2    filesize=20G   compression=ALL

In the above example, expdp exports the contents of the FINTS_DATA and HRTS_DATA tablespaces. The export also runs faster because of the parallel=2 option (provided multiple CPUs are available on the database server).

Furthermore, the filesize option specifies that each dumpfile generated should be no more than 20G in size. Oracle creates an additional dumpfile whenever the current dumpfile reaches the 20G threshold. The %U in the dumpfile name guarantees that each file gets a unique name.

The compression option is a new feature in Oracle 11g. It compresses the output dumpfiles at a ratio of roughly 1:10. This is a great feature if the tablespaces are large and one wants to save space in the DATA_PUMP_DIR directory used for the exported dumpfiles. During a subsequent impdp run, Oracle automatically decompresses these dumpfiles while loading the data, with no additional impdp option required.
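
For completeness, a matching tablespace-mode import from the same dumpfiles could look like the sketch below (the options are simply carried over from the export above; the logfile name is an assumption):

impdp TABLESPACES=FINTS_DATA,HRTS_DATA dumpfile=DATA_PUMP_DIR:ts_dumps_%U.dmp LOGFILE=DATA_PUMP_DIR:ts_imp_log.log parallel=2

No decompression-related option is needed; impdp reads the compressed dumpfiles transparently.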

Wednesday, December 28, 2011

RMAN Uncatalog using DBMS_BACKUP_RESTORE

Introduction

Most Oracle DBAs rely on the command line RMAN interface ($ORACLE_HOME/bin/rman) to run RMAN backup and recovery commands. Every RMAN backup file (backup piece) that is generated is identified by a handle, which is its file name. Some RMAN operations can also be performed via the little-documented DBMS_BACKUP_RESTORE package, and it turns out to be quite useful in some cases, such as when one wants to uncatalog many RMAN backup pieces at once.

Problem

The other day I was transferring RMAN backup pieces from one server to another to create a clone database using RMAN restore and recovery commands. Due to a transfer issue, all the backup pieces (files) generated on a given day were corrupted during the transfer. Because the Oracle Flash Recovery Area (FRA) was used, all the backup pieces were transferred into the FRA. Some of you already know that the FRA has a nicely laid out directory structure, with one of the subdirectories named after the date the backup piece was generated.

For example, /backup/flash_recovery_area/APPSDB/backupset/2011_02_15/o1_mf****_.bkp. Note the date 2011_02_15 in the directory path.

I had already cataloged these backup pieces together using the catalog start with '/backup/flash_recovery_area/APPSDB/backupset/2011_02_15/' noprompt command.

Because these files were corrupted during the transfer, the RMAN recovery failed. I fixed the problem using the approach below, run from SQL*Plus.

1) set lines 200 pages 0 feedback off trimspool on
2) spool uncatalog_files.sql
3)
select
'exec sys.dbms_backup_restore.deleteBackupPiece('||recid||','||stamp||','''||handle||''','||set_stamp||','||set_count||','||piece#||');'
from v$backup_piece
where handle like '%2011_02_15%';
4) spool off
5) exit
6) Now run the generated uncatalog_files.sql file (in SQL*Plus, connected as sysdba)
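
For illustration, each line that the query spools looks roughly like this (the numeric values below are made up, and the handle is the elided path shown earlier):

-- every numeric value below is made up, purely for illustration
exec sys.dbms_backup_restore.deleteBackupPiece(412,745238891,'/backup/flash_recovery_area/APPSDB/backupset/2011_02_15/o1_mf****_.bkp',745238880,1,1);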

The dbms_backup_restore.deleteBackupPiece procedure removes the backup metadata for these corrupted files and also physically removes the files from disk. Now transfer the backup pieces again from the source server to the destination server, catalog them again and retry the recovery. This time it should succeed.

Conclusion
One can also use the CHANGE backuppiece '#backup piece file name#' UNCATALOG command. But when there are dozens of backup pieces to uncatalog all at once, dynamically generating the uncatalog commands from v$backup_piece with dbms_backup_restore.deleteBackupPiece, as shown above, is the more convenient route.
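
For reference, the single-piece RMAN alternative looks like this (using the same elided file name from earlier as a stand-in):

RMAN> CHANGE BACKUPPIECE '/backup/flash_recovery_area/APPSDB/backupset/2011_02_15/o1_mf****_.bkp' UNCATALOG;

Note that UNCATALOG only removes the metadata; unlike deleteBackupPiece above, it does not delete the physical file from disk.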


Friday, November 25, 2011

Oracle ASM disk failure - Part 2

Introduction

In Part 1, I wrote about the scenario where ASM detects read/write errors and records them in the READ_ERRS/WRITE_ERRS columns of v$asm_disk for the affected ASM disk; the DBA then has to explicitly drop the disk in ASM. This article is about a different scenario, one in which the ASM instance itself performs the 'drop disk' operation.

This post assumes that you are using ASM redundancy (Normal or High) and that you are not using ASMLib. The commands and syntax could differ if you are using ASMLib.

Scenario

In this scenario, ASM drops the disk automatically. Furthermore, READ_ERRS/WRITE_ERRS in v$asm_disk may show NULL (instead of an actual count of read or write errors).

How to identify the failed disk

Unlike the scenario discussed in Part 1 of this ASM series, the ASM instance can initiate the 'drop disk' by itself in some situations. In the example below, let the failed disk be '/dev/sds1'.

select path
from v$asm_disk
where read_errs is NULL;

/dev/sds1

select path
from v$asm_disk
where write_errs is NULL;

/dev/sds1

Additionally, the HEADER_STATUS in v$asm_disk returns a value of UNKNOWN.

select mount_status,header_status,mode_status,state
from v$asm_disk
where path = '/dev/sds1';

CLOSED UNKNOWN ONLINE NORMAL

Compare this with the scenario described in Part 1, where HEADER_STATUS is still shown as MEMBER and READ_ERRS/WRITE_ERRS hold values > 0.

The following are the errors mentioned in the +ASM alert log file when the failure was first noticed.

WARNING: initiating offline of disk
NOTE: cache closing disk
WARNING: PST-initiated drop disk

ORA-27061: waiting for async I/Os failed
WARNING: IO Failed. subsys:System dg:0, diskname:/dev/sds1


No "drop disk" command required by DBA

The disk has already been dropped by the ASM instance, so there is no need to run an "alter diskgroup ... drop disk" command again. Instead, the DBA has to work with the system administrator to physically locate the failed disk in the disk enclosure and remove it.

Add the replacement disk

1) Get the replacement/new device name, partition it and change ownership to the database owner. For example, let the disk path after partitioning be /dev/sdk1
2) select distinct header_status from v$asm_disk where path = '/dev/sdk1'; (Must show as CANDIDATE)

3) alter diskgroup #name# add disk '/dev/sdk1';

4) ASM starts the re-balancing operation due to the above disk add command. One can monitor the progress of the re-balance operation by checking v$asm_operation.

select state,power,group_number,EST_MINUTES
from v$asm_operation;

After a few minutes/hours the above operation completes (no rows returned).

5) The disk add operation is now considered complete.

How to decrease the ASM re-balance operation time

While the ASM re-balancing operation is in progress, the DBA can make it complete more quickly by raising the 'ASM power', for example with the command below.

alter diskgroup #name# rebalance power 8;

The default power is 1 (i.e. ASM starts one re-balance background process, called an ARB process, to handle the re-balancing work). The above command dynamically starts 8 ARB processes (ARB0 to ARB7), which can dramatically decrease the re-balance time. The maximum power limit in 11g R1 is 11 (up to 11 ARB processes can be started).
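
The power set this way applies only to the re-balance currently running; future re-balance operations fall back to the ASM_POWER_LIMIT initialization parameter. A quick sketch of checking and, if desired, raising that default on the ASM instance (the value 4 is just an example):

show parameter asm_power_limit
alter system set asm_power_limit=4;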

Conclusion

I am not exactly sure why ASM reports a failed disk in these different ways, but these are the two scenarios that I am aware of so far.

None of the above maintenance operations (failed disk removal from the disk enclosure, new disk addition) causes downtime for end users, so they can be completed during normal business hours. The re-balance operation can cause a slight degradation of performance, hence the suggestion to increase the power limit and let it complete quickly.

Friday, November 11, 2011

Oracle ASM disk failure - Part 1

Introduction

Oracle Automatic Storage Management (ASM) was introduced in Oracle 10g. ASM provides advanced storage management features such as disk I/O re-balancing, volume management and easy database file name management. It can also provide mirroring of data for high availability and redundancy in the event of a disk failure (mirroring is optional). ASM guarantees that data extents (table and index row data, etc.) on one disk are mirrored on another disk (normal redundancy) or on two other disks (high redundancy).
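
As a quick illustration of the redundancy just described, a normal-redundancy diskgroup is created with at least two failure groups, so that every extent gets a mirror copy in a different failure group. A sketch only; the diskgroup name, failgroup names and device paths are examples:

create diskgroup DATA normal redundancy
  failgroup fg1 disk '/dev/sdc1'
  failgroup fg2 disk '/dev/sdd1';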

I have faced ASM disk failures a few times when redundancy (mirroring) was enabled, and none of them resulted in an issue for end users. ASM automatically detects the disk failure and services Oracle SQL requests by reading from the mirrored copy on the surviving disk. Such a failure is handled gracefully and entirely managed by Oracle; I am very impressed by the fault tolerance capability in ASM.

But the Oracle DBA must soon work with the system administrator to replace the failed disk. If the mirrored disk also fails before the replacement, end-user SQL will start failing because both the primary and mirrored copies are gone.

This post assumes that you are using ASM redundancy (Normal or High) and that you are not using ASMLib. The commands and syntax could differ if you are using ASMLib.


How to identify a failed disk

An ASM disk failure of this kind is transparent to end users, and one can be caught unaware without proactive database monitoring. The DBA can write a small program that regularly checks the alert log file, or a SQL script that checks for any read/write errors.

If either of the below queries return rows, then it is confirmed there are one or more ASM disks that have failed.

select path,name,mount_status,header_status
from v$asm_disk
where WRITE_ERRS > 0;

select path,name,mount_status,header_status
from v$asm_disk
where READ_ERRS > 0;

But despite the read/write errors, the header_status column value may still be shown as "MEMBER".
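
As a rough illustration of the monitoring idea above, the sketch below runs that check from a shell script (for example from cron) and mails the DBA if anything is found. It is a minimal sketch only; the ORACLE_HOME path, the ASM SID and the mail address are assumptions to adapt to your environment.

#!/bin/sh
# Minimal sketch: alert the DBA when any ASM disk reports read/write errors.
# ORACLE_HOME, ORACLE_SID and the mail address below are assumed values.
export ORACLE_HOME=/u01/app/oracle/product/11.1.0/asm
export ORACLE_SID=+ASM
export PATH=$ORACLE_HOME/bin:$PATH

ERRS=$(sqlplus -s "/ as sysasm" <<'EOF'
set heading off feedback off pages 0
select count(*) from v$asm_disk where read_errs > 0 or write_errs > 0;
EOF
)

if [ "$(echo $ERRS | tr -d ' ')" != "0" ]; then
  echo "ASM disks with read/write errors detected on $(hostname)" | mailx -s "ASM disk read/write errors" dba@example.com
fi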

Drop the failed disk

1) alter diskgroup #name# drop disk #disk name#;

Caution: Do NOT physically remove the failed disk YET from the disk enclosure of the server. The above command is executed immediately, but ASM also starts a lengthy re-balance operation. The disk should be physically removed only after the header_status for the failed disk becomes FORMER. This status is set after the re-balance operation is completed. One can monitor the progress of the re-balance operation by checking v$asm_operation.

select state,power,group_number,EST_MINUTES
from v$asm_operation;

After a few minutes/hours the re-balance completes (the query returns no rows). Then verify that the header_status of the dropped disk is now FORMER, and ask the system administrator to physically pull the disk from the enclosure. The LED for the failed disk should have turned off, which helps identify its physical location in the enclosure.
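
A quick way to confirm this before the physical removal is to list the disks whose header now shows FORMER:

select path, header_status
from v$asm_disk
where header_status = 'FORMER';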

Add the replacement disk

1) Get the replacement device name, partition it and change ownership to the database owner (a short shell sketch of this step follows these numbered steps). For example, let the disk path after partitioning be /dev/sdk1
2) select distinct header_status from v$asm_disk where path = '/dev/sdk1'; (Must show as CANDIDATE)

3) alter diskgroup #name# add disk '/dev/sdk1';
4) ASM starts the re-balancing operation due to the above disk add command.
One can monitor the progress of the re-balance operation by checking v$asm_operation.

select state,power,group_number,EST_MINUTES
from v$asm_operation;

After a few minutes/hours the above operation completes (no rows returned).

5) The disk add operation is now considered complete.
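
As mentioned in step 1, here is a rough sketch of preparing the replacement device on Linux; the device names, the partitioning tool and the oracle:dba ownership are assumptions for a typical environment:

# interactively create a single primary partition spanning the new disk
fdisk /dev/sdk
# hand the partition to the Oracle software owner so ASM can use it
chown oracle:dba /dev/sdk1
chmod 660 /dev/sdk1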


How to decrease the ASM re-balance operation time

While the ASM re-balancing operation is in progress, the DBA can make it complete more quickly by raising the 'ASM power', for example with the command below.

alter diskgroup #name# rebalance power 8;

The default power is 1 (i.e. ASM starts one re-balance background process, called an ARB process, to handle the re-balancing work). The above command dynamically starts 8 ARB processes (ARB0 to ARB7), which can dramatically decrease the re-balance time. The maximum power limit in 11g R1 is 11 (up to 11 ARB processes can be started).

Conclusion

None of the above maintenance operations (disk drop, disk add) causes downtime for end users, so they can be completed during normal business hours. The re-balance operation can cause a slight degradation of performance, hence the suggestion to increase the power limit and let it complete quickly.

Saturday, October 22, 2011

Database Administrator : Top Ten Best Jobs in America

CNN.com has released the BEST JOBS in AMERICA report for 2011. Database Administrator again stands in the TOP TEN. It is currently placed at No. 8.

Here is the LINK for more information


This is the third time in a row that Database Administrator has been among the TOP TEN in-demand jobs in the USA.




Thursday, June 9, 2011

Oracle expdp and impdp - Some useful tips

Introduction

Many of you may be using Oracle Data Pump Export (expdp) and Data Pump Import (impdp) for logical backups of objects and schemas, as well as of full databases, for example when performing a database platform migration. Here are some tips that I have found useful on an Oracle 11g R1 database.

1) How to run expdp as sysdba

There are two ways of running it as sysdba: either by giving the credentials on the command line, or by giving them when expdp prompts for the username. Note the escape characters in the command line.

i) expdp \"/ as sysdba\" parfile=myparfile.lst
ii) expdp parfile=myparfile.lst

Export: Release 11.1.0.7.0 - 64bit Production on Thursday, 15 May, 2011 13:11:57

Copyright (c) 2003, 2007, Oracle. All rights reserved.

Username: / as sysdba

2) How to use nohup

nohup lets one run an operating system command in the background; the process is not terminated even if the operating system (OS) user logs out. Using nohup with expdp can give errors unless a parfile (parameter file) is used.

The below is an example of using nohup to perform a full database export with the parfile option.

The parfile (fulldb_export.par) has the following contents

dumpfile=DATA_PUMP_DIR:mydb_full_%U.dmp
LOGFILE=DATA_PUMP_DIR:mydb_full_dump.log
parallel=4
filesize=40G
full=Y

nohup expdp \"/ as sysdba\" parfile=fulldb_export.par &
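
Since the job now runs in the background, its progress can be followed by tailing the logfile named in the parfile; assuming DATA_PUMP_DIR still points to its default location, that would be something like:

tail -f $ORACLE_HOME/rdbms/log/mydb_full_dump.log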


3) How to exclude schemas

Get the syntax right when you want to exclude a particular schema during the export or import. If one wants to exclude the SYSTEM schema, one has to use the EXCLUDE=SCHEMA:"in\('SYSTEM'\)" option.

4) How to exclude multiple schemas

If one wants to exclude multiple schemas during an export or import, for example the SYSTEM and FIN schemas, one has to use the EXCLUDE=SCHEMA:"in\('SYSTEM','FIN'\)" option.
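
The backslashes above only protect the parentheses and quotes from the shell. If the EXCLUDE clause is placed in a parfile instead of on the command line, no escaping is needed; the equivalent parfile line would simply be:

EXCLUDE=SCHEMA:"IN ('SYSTEM','FIN')"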

5) Avoid filling up the $ORACLE_HOME location

By default, DATA_PUMP_DIR (see dba_directories) points to the $ORACLE_HOME/rdbms/log directory. If this location does not have much free space left, $ORACLE_HOME can become 100% full when large dump files are written there, causing problems for normal database operations. To avoid this issue, consider creating a symbolic link from $ORACLE_HOME/rdbms/log to a location (for example /lot_of_freespace) where plenty of free space is available.

cd $ORACLE_HOME/rdbms
mv log log.old
ln -s /lot_of_freespace log
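
An alternative to the symbolic link, if you prefer not to touch $ORACLE_HOME at all, is to create a separate directory object and reference it in the dumpfile and logfile clauses. A sketch; the directory name EXP_DIR is just an example:

create or replace directory EXP_DIR as '/lot_of_freespace';
grant read, write on directory EXP_DIR to system;
-- then in the parfile: dumpfile=EXP_DIR:mydb_full_%U.dmp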


6) Consider COMPRESSION option to reduce the export dump file sizes

Oracle 11g introduced the ability to compress the contents of the export dump file. I am impressed with it because, during a subsequent import (impdp), Oracle decompresses automatically with no additional options required. This is similar to the RMAN backup compression introduced in Oracle 10g. I have noticed good compression ratios with expdp (roughly 1:10 to 1:15). However, the Utilities guide is not clear on whether the COMPRESSION feature requires a separate license from Oracle (Oracle Advanced Compression).

The option for expdp is COMPRESSION=ALL. It is not applicable for impdp.

Friday, April 22, 2011

Flash Recovery Area - Disable during recovery

Introduction

The Flash Recovery Area (FRA) is one of my favorite features in the Oracle Database. The FRA helps with automatic management of archived log files, RMAN backup pieces, control file autobackups and flashback log files (if FLASHBACK DATABASE is turned on). The FRA neatly creates all these files in a well-defined sub-directory structure (based on dates) and also automatically deletes obsolete backup pieces (if an RMAN recovery window is set).

The other day, I was tasked with creating a clone of an Oracle 11g R1 (11.1.0.7) production database using the RMAN restore and recovery commands on the destination host. As usual, I made a copy of the production database's parameter file and edited it for the new (clone) database. I did not remove any of the existing parameters, though, and retained them all, including the FRA parameters DB_RECOVERY_FILE_DEST and DB_RECOVERY_FILE_DEST_SIZE.

Problem

The RMAN restore was not successful. Just before the restore started, Oracle strangely deleted two datafiles (as per the alert log); RMAN then created two empty datafiles with the same names as the deleted ones before starting the actual restore. These two datafiles had been created years ago on the production database and were online datafiles; there was nothing wrong with them.


**FROM THE ALERT LOG**

Thu Mar 24 10:49:24 2011
Deleted Oracle managed file +ORADATA/prod_myhost_db/datafile/fints_index2.1887.709799705
Deleted Oracle managed file +ORADATA/prod_myhost_db/datafile/fints_index2.1906.723204215

**FROM THE RMAN LOG**

creating datafile file number=1630 name=+ORADATA/prod_myhost_db/datafile/fints_index2.1887.709799705
creating datafile file number=1649 name=+ORADATA/prod_myhost_db/datafile/fints_index2.1906.723204215

channel d1: starting datafile backup set restore
channel d1: specifying datafile(s) to restore from backup set
channel d1: restoring datafile 00030 to +ORADATA/prod_myhost_db/datafile/sales.1690.700575591
channel d1: restoring datafile 00063 to +ORADATA/prod_myhost_db/datafile/argus2_index2.263.700567185
channel d1: restoring datafile 00083 to +ORADATA/prod_myhost_db/datafile/bits_index2.1800.700576857
channel d1: restoring datafile 00112 to +ORADATA/prod_myhost_db/datafile/cbo_data2.1826.700576943
....
....
....

Solution

I got this working by disabling the FRA. I commented out the two FRA parameters DB_RECOVERY_FILE_DEST and DB_RECOVERY_FILE_DEST_SIZE in the destination database's init parameter file. Then I recreated the +ORADATA ASM diskgroup (formatting the ASM disks to remove the files restored so far) and restarted the restore. This time the restore was successful.
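
In the destination pfile this simply meant commenting out the two lines, along these lines (the file name and values shown are placeholders, not the production settings):

# initCLONE.ora -- FRA disabled for the duration of the restore/recovery
# db_recovery_file_dest='+FRA'
# db_recovery_file_dest_size=500G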

Conclusion

Turn off the FRA (remove or comment out DB_RECOVERY_FILE_DEST and DB_RECOVERY_FILE_DEST_SIZE) on the destination database until the cloning process (restore and recovery) is over. Once the new database is opened successfully, the FRA can be turned on again.
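
Re-enabling it afterwards is just a matter of setting the two parameters back, for example (the diskgroup name and size are placeholders; the size must be set before the destination):

alter system set db_recovery_file_dest_size=500G;
alter system set db_recovery_file_dest='+FRA';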