Wednesday, December 26, 2007

Database Roles

If you are familiar with Oracle Database, you will be happy to know that DB2 9.5 has introduced few new features that you may already be familiar with in Oracle. First of all, there are several SQL related compatibility features (e.g. support for dual, decode etc.). Check the developerWorks article DB2 Viper 2 compatibility features for more details. These compatibility features will greatly help the migration from Oracle to DB2 database.

Database Roles, another new feature in DB2 9.5, was also available in Oracle since long. A role has a certain set of privileges and a user who is granted membership in this role inherits those privileges. Roles are also available in Informix, SQL Server, Sybase and may be some other RDBMS’s.

DB2 provided an alternative to role in the form of group based authorization. You may grant one or more privileges to user groups and all the users in that group will automatically have those privileges. You may add a user to that group (equivalent to granting a role to a user) or remove a user from the group (equivalent to revoking a role from a user). However, there are two major limitations with group based authorization

  1. Privileges and authorities granted to groups are not considered when a user is creating views, materialized query tables (MQTs), SQL routines, triggers and packages containing static SQL. This is because groups and users are controlled externally from the DB2 database system, for example, by an operating system or an LDAP server. So if a user will be removed from a group, database may not be aware of it and would not update the validity of created objects mentioned above.
  2. A role can be granted to a role and so a role hierarchy is possible, which is not possible with group based authorization.

To overcome the above limitations, DB2 9.5 introduced roles in addition to group based authorization. In DB2, a role is a database object that groups together one or more privileges and can be assigned to users, groups, PUBLIC, or other roles by using a GRANT statement. For more details, check the Roles at DB2 Information Center.

Sunday, December 09, 2007

DB2 Partitioning

Divide and conquer- that’s the mantra used to deal with large volume of data in data warehouses. What are the features available in DB2 to divide or partition the data? DB2 provides following three features to partition your data.

Data Partitioning Feature (DPF)
Multi-Dimensional Clustering (MDC)
Table Partitioning (TP)

DPF allows you to partition the whole database. In fact, database partitions are defined on the instance level. Hence, all the databases under an instance are partitioned in the same way. DPF uses a hashing algorithm to distribute the data of a table across multiple database partitions based on the values in columns of distribution key of the table. Distribution key has to be selected carefully while creating the table to ensure a balanced distribution of data among all the database partitions. Normally, the distribution key with columns of high distinct cardinality provides good distribution of data. DPF is based on shared nothing architecture, i.e. each partition has its own processes, memory and disk storage. DPF allows you to scale the size and performance of your database by adding more partitions and additional resources (CPU, Memory, Disk storage) with each additional partition. Because of shared nothing architecture DPF can achieve almost linear scalability. Multiple partitions can be created on a single machine or it can be spread across multiple machines such that each machine can have one or more partitions.

MDC allows you to cluster your data on multiple dimensions. A common example is clustering your data warehouse fact table based on say following three columns: month of transaction date, region or state, industry of your customer. Once you cluster your table using above three dimensions, there will be a storage block (of extent size) allocated for each unique combination of values of these three dimensions. Also, a block index will be created which will have an entry pointing to each block rather than each row of the table. As a result, block indexes are much smaller compared to regular Row ID indexes and index scan on block indexes are much faster. Another advantage is that the data in a MDC table is self organized; you don’t have to reorg these tables. The biggest advantage of MDC table is query performance. If your query includes one or more dimensional columns in where clause, the block index will help to quickly figure out the subset of blocks that may have your result dataset. However, you should be aware that block indexes do not support index-only access. This is because a block index can point to an empty block, i.e. a block with now rows for a unique combination of dimension values. The performance of an MDC table is greatly dependent upon the proper choice of dimensions and the block (extent) size of the tablespace for the given data and application workload.

Table partitioning (TP) is a new feature in DB2 UDB V9. This feature is also known as range partitioning because using this feature you can define data partitions (not to be confused with database partitions in DPF) based on range of values of one or more than one columns. Each data partition is a separate database object and they can be placed in different tablespaces, if desired. Also, TP supports attaching and detaching a data partition from the TP table. A detached data partition becomes a regular table. These attach and detach features make the data roll-in and roll-out very convenient in a TP table.

So far I have very briefly described three partitioning features available in DB2 UDB. Another important notion is that all the three partitioning features can be used on the same table and decision of using one type of partitioning feature does not interfere with the decision of using other type of partitioning. Listed below are few good references which describe the DB2 partitioning features and its usage in data warehousing environment in more detail.

The IBM Redbook Leveraging DB2 Data Warehouse Edition for Business Intelligence – Chapter 10 (10.4 Partitioning)

IBM developerWorks article DB2 partitioning features

The IBM Redbook Up and Running with DB2 UDB ESE: Partitioning for Performance in an e-Business Intelligence World