Saturday, October 25, 2014

alter table set unused versus drop column

Introduction

You may be asked to drop a column from a database table during a business day (or) to be marked as unusable immediately due to a security concern. Once you have approval to drop a column, it may be wise to first set it as unused and only drop it later. A drop column command incurs a lot of database I/O and hence should be avoided during a business day or peak business hour for the database.

Explanation with an example

Let us create 2 tables named test_unused and dropped respectively with 1 million rows and 3 columns named a,b,c in each of them. So these two tables are identical in number of columns and number of rows.

Table creation scripts:

create table scott.test_unused as
select level a,  level+1||'b' b, rpad('c-',mod(level,100),'*') c
from dual
connect by level <= 1e6;


create table scott.dropped as
select level a,  level+1||'b' b, rpad('c-',mod(level,100),'*') c
from dual
connect by level <= 1e6;


Case 1: Set the column as unused first

alter table test_unused set unused (B,C)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          2          0           0
Execute      1      0.00       0.01          0         98          1           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.01          0        100          1           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 60

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  log file sync                                   1        0.00          0.00
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00
********************************************************************************


Case 2: Drop the column right away

alter table dropped drop (B,C)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1     19.96      76.81       9597      14486    2099302           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2     19.96      76.81       9597      14486    2099302           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 60

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                        235        0.32         22.15
  log buffer space                              126        0.70         21.34
  reliable message                               46        0.99          3.06
  rdbms ipc reply                                62        0.00          0.00
  log file switch completion                      9        0.25          1.52
  db file sequential read                        91        0.28          6.67
  latch: cache buffers chains                     1        0.00          0.00
  log file sync                                   1        0.36          0.36
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1       19.39         19.39
********************************************************************************


As you can see, setting a column as UNUSED just marks a column for logical deletion and the columns are no longer available for DML, DDL or SELECT commands. But the data in the columns is still intact in the segment and not dropped.

UNUSED  is much faster and as it doesn't update the data file with the changes. The actual DROP will take place after issuing DROP UNUSED columns. This is quite evident from the elapsed time of 0.1 seconds and wait events.

DROP will drop the column data from the segment right away and therefore you can notice the high elapsed time of 76.81  seconds.

Conclusion

Therefore, if the intent  is to cut-off access to columns without I/O latency in production during peak business hours, then first mark it as UNUSED and then during the weekend or off peak hours, complete the physical drop of the column using DROP UNUSED.