Wednesday, December 26, 2007

Database Roles

If you are familiar with Oracle Database, you will be happy to know that DB2 9.5 has introduced few new features that you may already be familiar with in Oracle. First of all, there are several SQL related compatibility features (e.g. support for dual, decode etc.). Check the developerWorks article DB2 Viper 2 compatibility features for more details. These compatibility features will greatly help the migration from Oracle to DB2 database.

Database Roles, another new feature in DB2 9.5, was also available in Oracle since long. A role has a certain set of privileges and a user who is granted membership in this role inherits those privileges. Roles are also available in Informix, SQL Server, Sybase and may be some other RDBMS’s.

DB2 provided an alternative to role in the form of group based authorization. You may grant one or more privileges to user groups and all the users in that group will automatically have those privileges. You may add a user to that group (equivalent to granting a role to a user) or remove a user from the group (equivalent to revoking a role from a user). However, there are two major limitations with group based authorization

  1. Privileges and authorities granted to groups are not considered when a user is creating views, materialized query tables (MQTs), SQL routines, triggers and packages containing static SQL. This is because groups and users are controlled externally from the DB2 database system, for example, by an operating system or an LDAP server. So if a user will be removed from a group, database may not be aware of it and would not update the validity of created objects mentioned above.
  2. A role can be granted to a role and so a role hierarchy is possible, which is not possible with group based authorization.

To overcome the above limitations, DB2 9.5 introduced roles in addition to group based authorization. In DB2, a role is a database object that groups together one or more privileges and can be assigned to users, groups, PUBLIC, or other roles by using a GRANT statement. For more details, check the Roles at DB2 Information Center.

Sunday, December 09, 2007

DB2 Partitioning

Divide and conquer- that’s the mantra used to deal with large volume of data in data warehouses. What are the features available in DB2 to divide or partition the data? DB2 provides following three features to partition your data.

Data Partitioning Feature (DPF)
Multi-Dimensional Clustering (MDC)
Table Partitioning (TP)

DPF allows you to partition the whole database. In fact, database partitions are defined on the instance level. Hence, all the databases under an instance are partitioned in the same way. DPF uses a hashing algorithm to distribute the data of a table across multiple database partitions based on the values in columns of distribution key of the table. Distribution key has to be selected carefully while creating the table to ensure a balanced distribution of data among all the database partitions. Normally, the distribution key with columns of high distinct cardinality provides good distribution of data. DPF is based on shared nothing architecture, i.e. each partition has its own processes, memory and disk storage. DPF allows you to scale the size and performance of your database by adding more partitions and additional resources (CPU, Memory, Disk storage) with each additional partition. Because of shared nothing architecture DPF can achieve almost linear scalability. Multiple partitions can be created on a single machine or it can be spread across multiple machines such that each machine can have one or more partitions.

MDC allows you to cluster your data on multiple dimensions. A common example is clustering your data warehouse fact table based on say following three columns: month of transaction date, region or state, industry of your customer. Once you cluster your table using above three dimensions, there will be a storage block (of extent size) allocated for each unique combination of values of these three dimensions. Also, a block index will be created which will have an entry pointing to each block rather than each row of the table. As a result, block indexes are much smaller compared to regular Row ID indexes and index scan on block indexes are much faster. Another advantage is that the data in a MDC table is self organized; you don’t have to reorg these tables. The biggest advantage of MDC table is query performance. If your query includes one or more dimensional columns in where clause, the block index will help to quickly figure out the subset of blocks that may have your result dataset. However, you should be aware that block indexes do not support index-only access. This is because a block index can point to an empty block, i.e. a block with now rows for a unique combination of dimension values. The performance of an MDC table is greatly dependent upon the proper choice of dimensions and the block (extent) size of the tablespace for the given data and application workload.

Table partitioning (TP) is a new feature in DB2 UDB V9. This feature is also known as range partitioning because using this feature you can define data partitions (not to be confused with database partitions in DPF) based on range of values of one or more than one columns. Each data partition is a separate database object and they can be placed in different tablespaces, if desired. Also, TP supports attaching and detaching a data partition from the TP table. A detached data partition becomes a regular table. These attach and detach features make the data roll-in and roll-out very convenient in a TP table.

So far I have very briefly described three partitioning features available in DB2 UDB. Another important notion is that all the three partitioning features can be used on the same table and decision of using one type of partitioning feature does not interfere with the decision of using other type of partitioning. Listed below are few good references which describe the DB2 partitioning features and its usage in data warehousing environment in more detail.


The IBM Redbook Leveraging DB2 Data Warehouse Edition for Business Intelligence – Chapter 10 (10.4 Partitioning)

IBM developerWorks article DB2 partitioning features

The IBM Redbook Up and Running with DB2 UDB ESE: Partitioning for Performance in an e-Business Intelligence World

Sunday, July 29, 2007

db2_all and RAHUSER

If you need to run a command on all or a subset of database partitions, you can use db2_all.

For example, to change the database configuration LOGFILSZ to 100 for sample database on all database partitions that are specified in node configuration file, we can use the following command:

db2_all “UPDATE DB CFG FOR sample USING LOGFILSZ 100”

Similarly, if you have to change the database configuration only on partition 1, you can run the following command:

db2_all “<<+1<UPDATE DB CFG FOR sample USING LOGFILSZ 100”

If you have to change the database configuration on all the database partitions except partition 0, you can run the following command:

db2_all “<<-0<UPDATE DB CFG FOR sample USING LOGFILSZ 100”


On UNIX based platforms, db2_all uses the remote shell program specified by DB2RSHCMD registry variable. Starting with DB2 v8.1 FP 9, we can chose either rsh or ssh for DB2RSHCMD registry variable. If this registry variable is not set, rsh (or remsh for HP-UX) is used.

There is an environment variable RAHUSER, which can be used to specify the user ID under which the remote command is to be run on UNIX based platforms.

Though the primary purpose of db2_all is to run the command on all the database partitions that you specify, it can also be used for the purpose of running the command under a different user ID with the help of RAHUSER environment variable. Thus, db2_all becomes useful even in a single-partition database. However, db2_all is available with Enterprise Server Edition (ESE) only. So, if you have a Workgroup Server Edition (WSE) instance, you would not be able to use db2_all.

The above mentioned use of db2_all, which allows you to run a command under a different user ID is particularly helpful in situations where your task automation script always runs as a fixed user ID or a common user ID say ScriptUser and if you wish to run the commands inside those scripts under a different user ID say DB2CommandUser (e.g. a user with SYSDBA authorization). In such a scenario, you can use following two solutions:

1. Use the ssh or rsh command directly to run the commands inside script as DB2CommandUser.
2. Use db2_all instead to run the command inside script as DB2CommandUser. This allows you to have a generic script which can use either rsh or ssh remote shell program depending on the DB2RSHCMD registry variable.

If you are using ssh as remote shell program for db2_all, ssh authentication should be configured such that ScriptUser can run the commands under the login of DB2CommandUser without prompted for any password or passphrase. This can be verified by running the following command while logged in as ScriptUser.

ssh DB2CommandUser@<HostName>

pwd

If the above command runs successfully without prompting for any password or passphrase, it indicates that the ssh authentication has been setup correctly. In the above command <HostName> can be localhost or a remote host depending on where do you want to run the command.

Similarly, if you are using rsh remote shell program for db2_all, the .rhosts file under the DB2CommandUser home directory must include the following line in order to allow ScriptUser to run command under DB2CommandUser login.

<HostName>
ScriptUser

Where, <HostName> is the host name of computer where ScriptUser exists. The permission of .rhosts file should not allow the Group or Other to write to this file. The recommended permission for .rhosts is 600, though 644 will also work.

On Windows platform, db2_all uses “DB2 Remote Command Service” to run the commands on all database partitions that you specify. Also, RAHUSER should be the logon account associated with the DB2 Remote Command Service.

Friday, July 27, 2007

Copy and Drop Schema

Do you need to make the exact copy of all the objects in a given schema? Are you planning to write a script for this task? If you are using DB2 version 9, perhaps you can avoid writing this script and still perform the same task using a single statement.

DB2 version 9 provides a new stored procedure called ADMIN_COPY_SCHEMA that can be used to copy all the objects from one schema to another schema name. The syntax for this stored procedure can be found at


ADMIN_COPY_SCHEMA at DB2 Info Center

In addition to the copy schema stored procedure, DB2 9 also offers a drop schema stored procedure to drop all the objects in a schema. The stored procedure is called ADMIN_DROP_SCHEMA. The syntax of this stored procedure can be found at

ADMIN_DROP_SCHEMA at DB2 Info Center