Wednesday, December 12, 2007

Oracle Table/Data Compression

Description/Overview

Oracle table/data compression was introduced with the 9i Release 2 version of the database.

This feature was a huge step in reducing the size of our large tables and helping reduce physical I/O resulting from queries against those large tables.

Table data compression works by eliminating duplicate values within a block. The more duplicate values a block contains, the better the compression. Each duplicate value for the block is stored once in a symbol table at the beginning of the block, and the occurrences of those values in the rest of the block reference the symbol table entries. Because of this, better compression ratios can often be achieved by loading data in a specific order, for example primary key order where the primary key is made up of several columns, so that duplicate values land in the same blocks. Larger block sizes can also yield greater compression ratios, mainly because more data fits into each block and therefore the chance of duplication within a block is greater. Also keep in mind that pctfree is set to 0 on all tables created with the compress keyword.
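As an illustration, the following minimal sketch creates a compressed table and loads it in sorted order so duplicate values cluster into the same blocks; the table and column names are hypothetical.

-- Create a compressed table and load it in sorted order so that duplicate
-- values for the leading columns land in the same blocks and compress better.
-- (pctfree defaults to 0 when the table is created with compress.)
create table orders_compressed compress
as
select *
from orders
order by customer_id, order_date;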

Table data compression works on partitioned and non-partitioned tables as well as partitioned and non-partitioned materialized views (snapshots). When a table is partitioned, compression can be applied to all partitions or only to selected partitions. This gives the flexibility, for instance, to wait for a partition to become static before compressing it. However, table data compression is not maintained for new rows, so a table, tablespace, or table partition can contain both compressed and uncompressed data. The reason compression is not maintained is that it only works with direct-load type operations such as direct path SQL*Loader, CREATE TABLE ... AS SELECT, and parallel inserts or inserts using the /*+ APPEND */ hint. Since compression is not maintained, modifying compressed data can cause some fragmentation through row chaining and wasted disk space.
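As a sketch of partition-level compression, assuming a hypothetical range-partitioned table, an older static partition can be compressed while the current partition stays uncompressed:

-- Hypothetical range-partitioned table: the static 2006 partition is
-- compressed, while the still-changing 2007 partition is not.
create table sales_history
(
   sale_id   number,
   sale_date date,
   amount    number
)
partition by range (sale_date)
(
   partition p_2006 values less than (to_date('01-JAN-2007','DD-MON-YYYY')) compress,
   partition p_2007 values less than (to_date('01-JAN-2008','DD-MON-YYYY')) nocompress
);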

Oracle indicates the impact of compression on read operations is so minimal that a negative impact on queries does not need to be taken into account when considering implementing table data compression. The impact of table data compression happens at the time the data is bulk loaded into the table.

There are several ways to apply data compression to a table (sketches of each follow the list):

1. Set at the tablespace level, so that all tables created in the tablespace are compressed by default.
2. Set with the compress keyword at table creation.
3. Move the table via alter table ... move compress.
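Below are minimal sketches of the three approaches; the tablespace, datafile, and table names are hypothetical.

-- 1. Default compression for all tables created in a tablespace
create tablespace data_comp
   datafile 'data_comp01.dbf' size 100m
   default compress;

-- 2. Compression declared when the table is created
create table big_table_c compress
as select * from big_table;

-- 3. Compress an existing uncompressed table by rebuilding it in place
alter table big_table move compress;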


Benefits

1. Can reduce the physical I/O required, especially for full table scans, and also for index-driven access when many rows are returned, because each block read from disk can contain more data.
2. Can improve performance of queries with heavy physical I/O activity. Query testing has shown a 50+% reduction in physical I/O in many cases.
3. Most tables during testing showed a 50% or better disk space savings when compressed, so compressing large tables can significantly reduce physical disk space usage.
4. Improved buffer cache utilization, since blocks remain compressed in buffer cache memory.

Limitations

1. CPU overhead. The impact seems minimal in most cases; large full table scans appear to have the largest CPU impact. Be sure to test larger queries against large tables to verify the CPU impact does not reach the limits of the host hardware, shifting the bottleneck from I/O to CPU.
2. Table data compression does not compress new rows. Compression can only happen on direct-load operations, so periodic maintenance may be required to maintain the compression (a sketch follows this list).
3. Direct-load operation performance is impacted due to having to compress the data.
4. Table data compression does not work on LOB data type columns (CLOB, BLOB, etc.) or on data types based on LOBs, such as VARRAYs or XML data stored in a CLOB column.
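Because compression is not maintained for conventional DML, periodic maintenance can be done by rebuilding the table or partition. A minimal sketch follows, reusing the hypothetical names from above (the index name is also hypothetical); note that a move marks the table's indexes unusable, so they must be rebuilt afterward.

-- Rebuild the whole table so rows added by conventional DML are compressed again
alter table big_table move compress;

-- Or rebuild only a single partition of a partitioned table
alter table sales_history move partition p_2006 compress;

-- Indexes on the moved table/partition become unusable and must be rebuilt
alter index big_table_pk rebuild;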



Calculating the Compression Ratio for an existing table

Below is a function that calculates the compression ratio for an existing table; it accepts the table name as a parameter. The function comes from "Table Compression in Oracle9i Release 2", an Oracle white paper from May 2002.

create function compression_ratio (tabname varchar2)
return number
is

-- sample percentage
pct number := 0.000099;
-- original block count (should be less than 10k)
blkcnt number := 0;
-- compressed block count
blkcntc number;

begin
-- create an empty uncompressed copy of the table's structure
execute immediate 'create table TEMP_UNCOMPRESSED pctfree 0 as select * from ' || tabname || ' where rownum < 1';

-- sample progressively larger percentages of the table until the
-- uncompressed sample covers enough blocks (or the whole table is sampled)
while ((pct < 100) and (blkcnt < 1000)) loop

execute immediate 'truncate table TEMP_UNCOMPRESSED';

execute immediate 'insert into TEMP_UNCOMPRESSED select * from ' || tabname || ' sample block (' || pct || ',10)';

execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid))) from TEMP_UNCOMPRESSED' into blkcnt;

pct := pct * 10;

end loop;

-- build a compressed copy of the sample and count its blocks
execute immediate 'create table TEMP_COMPRESSED compress as select * from TEMP_UNCOMPRESSED';

execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid))) from TEMP_COMPRESSED' into blkcntc;

execute immediate 'drop table TEMP_COMPRESSED';

execute immediate 'drop table TEMP_UNCOMPRESSED';

return (blkcnt/blkcntc);

end;

/
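A usage sketch follows; the table name is hypothetical, and the invoking user needs privileges to create and drop the temporary tables the function builds.

set serveroutput on

declare
   ratio number;
begin
   ratio := compression_ratio('MY_BIG_TABLE');
   dbms_output.put_line('Estimated compression ratio: ' || round(ratio, 2));
end;
/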

Summary

Table data compression does indeed shrink the amount of space a table takes in the database. The small test shows that there is an impact on creating the table with a CREATE TABLE ... AS SELECT: the compressed table create took 2.01 seconds while the uncompressed table took 0.40 seconds. Not a large difference here, but one that does show there is an impact on the CREATE TABLE ... AS SELECT DDL. On very large loads the time difference could be significant; more testing with larger data sets appears to be needed.

The table data compression loads using the /*+ APPEND */ hint on an INSERT ... SELECT had much closer timings to a normal insert: the compressed load test took 4.85 seconds and the uncompressed table insert took 3.26 seconds.

Table data compression reads did show a slight performance gain in elapsed time, as expected. The read from the compressed table showed fewer buffer gets and fewer physical reads, as well as a better elapsed time, as we would expect from doing less physical and logical I/O. The read from the compressed table showed 4,121 logical reads, 303 physical reads, and an elapsed time of 2.25 seconds, whereas the read from the uncompressed table did 5,436 logical reads and 1,724 physical reads with an elapsed time of 3.12 seconds. This shows that there is a level of compression for the blocks in the buffer cache as well as for the physical disk reads, although the reduction in logical reads does not match the 50% reduction in the size of the table.

In conclusion, this test was done on a Windows XP laptop running an Oracle 10.2.0.1 database, and given the small test and the hardware it was done on, further testing on a more robust system with varying table sizes is needed. Based on the physical and logical read differences in the tests, and knowing the overhead compression can place on the CPU of a system, I would conclude that compressing the data in a table can certainly offer significant space savings, and potentially good I/O read savings for queries, which can result in lower elapsed times. However, to get the lower elapsed times there needs to be sufficient CPU capacity to handle the decompression on larger physical read operations.


Test Results Summary

SQL>
SQL> -- let see how long it takes to create the tables
SQL> -- and load the data for compressed and
SQL> -- uncompressed versions.
SQL> set timing on
SQL>
SQL> -- create a table compressed and load with data
SQL> create table test_compression_compressed
2 COMPRESS
3 as
4 select * from dba_tables ;

Table created.

Elapsed: 00:00:02.01
SQL>
SQL> commit ;

Commit complete.

Elapsed: 00:00:00.00
SQL>
SQL> -- create a table and load with data uncompressed
SQL> create table test_compression_uncompressed
2 as
3 select * from dba_tables ;

Table created.

Elapsed: 00:00:00.40
SQL>
SQL> commit ;

Commit complete.

Elapsed: 00:00:00.00
SQL>
SQL> set timing off
SQL>
SQL> -- Lets see how large our compressed table is
SQL> select sum(bytes)/1024 from dba_segments where owner = 'MRMESSIN' and segment_name = 'TEST_COMPRESSION_COMPRESSED' ;

SUM(BYTES)/1024
---------------
128

SQL>
SQL> -- Lets see how large our uncompressed table is
SQL> select sum(bytes)/1024 from dba_segments where owner = 'MRMESSIN' and segment_name = 'TEST_COMPRESSION_UNCOMPRESSED' ;

SUM(BYTES)/1024
---------------
512

SQL>
SQL> -----------------------------------------------------------------------------------------------------------------------
SQL> set timing on
SQL>
SQL> -- Now lets insert a ton more records into the compressed table and show timing
SQL> declare
2 l number := 0 ;
3
4 begin
5
6 loop
7 if l > 30 then
8 exit ;
9 end if ;
10
11 l := l + 1 ;
12
13 insert /*+ APPEND */ into test_compression_compressed select * from dba_tables ;
14 -- we have to commit after each insert append operation or get an error.
15 commit ;
16 end loop ;
17 end ;
18 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:04.85
SQL>
SQL>
SQL> -- Now lets insert a ton more records into the uncompressed table and show timing
SQL> declare
2 l number := 0 ;
3
4 begin
5
6 loop
7 if l > 30 then
8 exit ;
9 end if ;
10
11 l := l + 1 ;
12
13 insert into test_compression_uncompressed select * from dba_tables ;
14 commit ;
15 end loop ;
16 end ;
17 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.26
SQL>
SQL> set timing off
SQL>
SQL> -- Lets see how large our compressed table is
SQL> select sum(bytes)/1024 from dba_segments where owner = 'MRMESSIN' and segment_name = 'TEST_COMPRESSION_COMPRESSED' ;

SUM(BYTES)/1024
---------------
3072

SQL>
SQL> -- Lets see how large our uncompressed table is
SQL> select sum(bytes)/1024 from dba_segments where owner = 'MRMESSIN' and segment_name = 'TEST_COMPRESSION_UNCOMPRESSED' ;

SUM(BYTES)/1024
---------------
14336

SQL>
SQL> -- flush buffer cache
SQL> alter system flush buffer_cache ;

System altered.

SQL>
SQL> -- Check timing on compressed table with full table scan
SQL> set autotrace traceonly
SQL> set timing on
SQL> select * from test_compression_compressed ;

56318 rows selected.

Elapsed: 00:00:02.25

Execution Plan
---------------------------------------------------------
Plan hash value: 1423006034

-------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                             | 44034 |   19M |   100   (6)| 00:00:02 |
|   1 |  TABLE ACCESS FULL| TEST_COMPRESSION_COMPRESSED | 44034 |   19M |   100   (6)| 00:00:02 |
-------------------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement


Statistics
----------------------------------------------------------
28 recursive calls
1 db block gets
4121 consistent gets
303 physical reads
4432 redo size
3161898 bytes sent via SQL*Net to client
41675 bytes received via SQL*Net from client
3756 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts(disk)
56318 rows processed

SQL> set autotrace off
SQL> set timing off
SQL>
SQL> -- Flush buffer cache
SQL> alter system flush buffer_cache ;

System altered.

SQL>
SQL> -- Check timing on uncompressed table with full table scan
SQL> set autotrace traceonly
SQL> set timing on
SQL> select * from test_compression_uncompressed ;

56319 rows selected.

Elapsed: 00:00:03.12


Execution Plan
----------------------------------------------------------
Plan hash value: 3640389637

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               | 59403 |   26M |   530   (2)| 00:00:06 |
|   1 |  TABLE ACCESS FULL| TEST_COMPRESSION_UNCOMPRESSED | 59403 |   26M |   530   (2)| 00:00:06 |
---------------------------------------------------------------------------------------------------

Note
----
- dynamic sampling used for this statement


Statistics
-------------------------------------------------------
29 recursive calls
1 db block gets
5436 consistent gets
1724 physical reads
132 redo size
12545290 bytes sent via SQL*Net to client
41675 bytes received via SQL*Net from client
3756 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
56319 rows processed

SQL> set autotrace off
SQL> set timing off
SQL>
SQL> set echo off

