Friday, October 1, 2010

Exadata Hybrid Columnar Compression

Summary
Hybrid Columnar Compression was introduced with the Exadata version 2 and is one of the most highlighted advantages of an Exadata as it improves performance and significantly reduces the storage footprint of the data on physical storage. By reducing the foot print of the data on physical storage it also reduces the foot print of that data in the database cache area as well therefore offering overall improvement in system memory resources in addition to physical storage. While you can use the columnar compression options in Exadata it does not prevent you from using the Oracle 11gR2 Advanced Compression as well giving you three different levels of compress based on the data within your database.

There are 2 types of columnar compression with Exadata, compress for query and compress for archive. The compress for query option offers a high level of compression, but not as high as for archive, but offers better query performance then the for archive option. While the OLTP compression in 11gR2 could work well for more online type access.

The advantages to the multiple types of compression is the level of compression based on the type of data. This gives the ability to true start to consider the Information Lifecycle Management of your data in regards to how it is stored and better manage your storage resources. For example let say you have a very large table partitioned by month where the first 24 months of data is subject to regular insert, update, delete and query activity, then from 24 months to 60 months the data is not updated, but query activity is heavy then data over 60 months is rarely accessed but does need to be available. Your best option for the most recent 24 months of database would most likely be OLTP compression. For the data between 24 and 60 months compress for query columnar compression would be your best option and the data over 60 months old would be best suited for compress for archive column compression.

Compress for Query Columnar Compression
Compress for query Columnar Compress offers a good balance between frequent query access and compression level. The compress for query can be high (default) or low. This indicates the level of compression and subsequently impact on CPU resources to handle the compression. The query option has between a 5-10% runtime hot on direct loads over an uncompress table load which in most cases can be considered minimal when compared to the disk savings achieved. The compression significantly reduces the size of the table and at the same time improves the performance of queries without very high impact on other system resources. With the implementation of Hybrid Columnar Compression for query high reduction consistently has between 60-70% . The table compress for query below shows some real world results when using the compress for query high option. With the implementation of Hybrid Columnar Compression with the compress for archive option

Compress for Query High
Uncompressed(MB) Compress      Reduction(MB)    % Reduction
903              344           559              61.90
1088             408           680              62.50
960              361           599              62.40
1088             416           672              61.76
1152             400           752              65.28
1091             400           691              63.33
1216             456           760              62.50
1112             408           704              63.31

Compress for Archive Columnar Compression
The compress for archive columnar compression offers the highest level of compression available, but with the higher level of compression the impact to system CPU resources increases therefore making it a less desired option where consistent query access is needed. The archive compress option averages between 70-80% size reduction over uncompressed tables. The archive option is also a significant hit on direct load runtimes as an average 50% runtime hits have been experienced over query compressed table direct loads. This hit on load time could also make this a less desired option for objects with frequent load activities where load runtimes are critical. As the name indicates it works very well for compressing down data that is archived and needs to be available, but is not accessed frequently or load times are a critical requirement.

Compress for Archive
Uncompressed(MB) Compress     Reduction(MB)    % Reduction
903              264          639              70.76
1088             304          784              72.06
960              272          688              71.67
1088             312          776              71.32
1152             336          816              70.83
1091             328          763              69.94
1216             352          864              71.05
1112             304          808              72.66 

Effectively Utilizing the Hybrid Column Compression              
Query and Archive Hybrid Columnar Compression is most effective for historical tables/date where the data no longer changes.  Being that they are historical tables they are typically quite large we usually will utilize with a partitioned table as best practice for large tables is to partition them.  The most effective use of the Hybrid Columnar Compression is to use them both within your partitioned historical table where the more frequently queries data partitions are query hybrid columnar compressed and the less frequently accessed data is archive columnar compressed.  This creates an effect Information Life Cycle Management that will help you manage your partition sizes based on their age and access frequency.  

Notice in the example below the more recently accessed data is partitioned by month allowing faster access to data through more fine tuned partition elimination.  Space is saved through the query hybrid columnar compression on those partitions.  Now as the data gets older partition maintenance is performed to combined partitions by year and the new partition for the entire year utilizes hybrid columnar compression to save a little more space and still keep the data accessible.

Example
CREATE TABLE ARCHIVE."CLAIM"
   (   "CLAIM_ID"            VARCHAR2(36)   NOT NULL ENABLE,
       "CLAIM_TYPE_CODE"     VARCHAR2(2)    NOT NULL ENABLE,
       "PAYER_PROCESS_STAGE" VARCHAR2(10),
       "SYS_CREATE_DATE"     DATE,
       "SYS_UPDATE_DATE"     DATE)
TABLESPACE ARCHIVE_2004
PARTITION BY RANGE (SYS_CREATE_DATE) interval (NUMTOYMINTERVAL(1,'month'))               
   (PARTITION CLAIM_2004 VALUES LESS THAN (TO_DATE ('1-JAN-2005','DD-MON-YYYY'))
                   NOLOGGING TABLESPACE ARCHIVE_2004 compress for archive high,
    PARTITION CLAIM_2005 VALUES LESS THAN (TO_DATE ('1-JAN-2006','DD-MON-YYYY'))
                   NOLOGGING TABLESPACE ARCHIVE_2005 compress for archive high,
    PARTITION CLAIM_2006 VALUES LESS THAN (TO_DATE ('1-JAN-2007','DD-MON-YYYY'))
                   NOLOGGING TABLESPACE ARCHIVE_2006 compress for archive high,
    PARTITION CLAIM_200701 VALUES LESS THAN (TO_DATE ('1-FEB-2007','DD-MON-YYYY'))
                   NOLOGGING TABLESPACE OTHER_PART_QUERY compress for query,
    PARTITION CLAIM_200702 VALUES LESS THAN (TO_DATE ('1-MAR-2007','DD-MON-YYYY'))
                   NOLOGGING TABLESPACE OTHER_PART_QUERY compress for query,
    PARTITION CLAIM_200703 VALUES LESS THAN (TO_DATE ('1-APR-2007','DD-MON-YYYY'))
                   NOLOGGING TABLESPACE OTHER_PART_QUERY compress for query
   )
compress for archive high;

No comments: