Sunday, December 09, 2007

DB2 Partitioning

Divide and conquer- that’s the mantra used to deal with large volume of data in data warehouses. What are the features available in DB2 to divide or partition the data? DB2 provides following three features to partition your data.

Data Partitioning Feature (DPF)
Multi-Dimensional Clustering (MDC)
Table Partitioning (TP)

DPF allows you to partition the whole database. In fact, database partitions are defined on the instance level. Hence, all the databases under an instance are partitioned in the same way. DPF uses a hashing algorithm to distribute the data of a table across multiple database partitions based on the values in columns of distribution key of the table. Distribution key has to be selected carefully while creating the table to ensure a balanced distribution of data among all the database partitions. Normally, the distribution key with columns of high distinct cardinality provides good distribution of data. DPF is based on shared nothing architecture, i.e. each partition has its own processes, memory and disk storage. DPF allows you to scale the size and performance of your database by adding more partitions and additional resources (CPU, Memory, Disk storage) with each additional partition. Because of shared nothing architecture DPF can achieve almost linear scalability. Multiple partitions can be created on a single machine or it can be spread across multiple machines such that each machine can have one or more partitions.

MDC allows you to cluster your data on multiple dimensions. A common example is clustering your data warehouse fact table based on say following three columns: month of transaction date, region or state, industry of your customer. Once you cluster your table using above three dimensions, there will be a storage block (of extent size) allocated for each unique combination of values of these three dimensions. Also, a block index will be created which will have an entry pointing to each block rather than each row of the table. As a result, block indexes are much smaller compared to regular Row ID indexes and index scan on block indexes are much faster. Another advantage is that the data in a MDC table is self organized; you don’t have to reorg these tables. The biggest advantage of MDC table is query performance. If your query includes one or more dimensional columns in where clause, the block index will help to quickly figure out the subset of blocks that may have your result dataset. However, you should be aware that block indexes do not support index-only access. This is because a block index can point to an empty block, i.e. a block with now rows for a unique combination of dimension values. The performance of an MDC table is greatly dependent upon the proper choice of dimensions and the block (extent) size of the tablespace for the given data and application workload.

Table partitioning (TP) is a new feature in DB2 UDB V9. This feature is also known as range partitioning because using this feature you can define data partitions (not to be confused with database partitions in DPF) based on range of values of one or more than one columns. Each data partition is a separate database object and they can be placed in different tablespaces, if desired. Also, TP supports attaching and detaching a data partition from the TP table. A detached data partition becomes a regular table. These attach and detach features make the data roll-in and roll-out very convenient in a TP table.

So far I have very briefly described three partitioning features available in DB2 UDB. Another important notion is that all the three partitioning features can be used on the same table and decision of using one type of partitioning feature does not interfere with the decision of using other type of partitioning. Listed below are few good references which describe the DB2 partitioning features and its usage in data warehousing environment in more detail.


The IBM Redbook Leveraging DB2 Data Warehouse Edition for Business Intelligence – Chapter 10 (10.4 Partitioning)

IBM developerWorks article DB2 partitioning features

The IBM Redbook Up and Running with DB2 UDB ESE: Partitioning for Performance in an e-Business Intelligence World

3 comments:

Anonymous said...

Your blogs are always useful
Regards,
Jack
http://db2examples.googlepages.com/

Ian Stevanovich said...

Radhesh-

I hope you are well. I came across your articles in Db2 magazine - I am a technical recruiter and actually working on a position calling for a very strong DB2 DBA/Developer in toronto...I wanted to see if you might know anyone, and conversely, if there's anything I can do to be of assistance to you. Please let me know...and feel free to contact me.

Best Regards,
Ian (istevanovich@siperian.com)

ganesh said...

Hi There,

I configured ssh and updated db2rshmd parameter for few servers it is working fine with out asking password.other servers it is asking password .The differens bet ween these servers are i moved to .ssh folder and cofigured it is working fine but some servers are I was in other folder and executed Can some one will guide me how to reconfigure without asking password for db2_all

Note :Even i performed remove .ssh folder and set null to db2rshmd and restarted instance angain configured but there is o use still it is asking password

Regards,
Ganesh K
ganesh.kunchakarra@gmail.com.