Saturday, June 20, 2009

SQL Tips & Techniques- String Splitting

In my previous post SQL Tips & Techniques- String Aggregation, I discussed how to perform string aggregation. Recently, I received a question about doing the reverse in SQL, i.e., string splitting.

Listing 1

For example, we have a table myTab2 with column values as shown above. As you can see, the string value in the LIST column is a list of sub-strings separated by “, ”. Now, we would like to split the string in the LIST column and present the sub-strings in separate rows. Hence, the query should return two rows for CATEGORY = 1, three rows for CATEGORY = 2, and two rows for CATEGORY = 3.

Here is the query that performs the required string splitting.

Listing 2

The above query uses recursive SQL and returns the following.

Listing 3

I learned the above use of recursive SQL from Knut Stolze’s developerWorks article Parsing Strings in SQL. If you know a different way to do the string splitting using SQL, please share it.
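For reference, the recursive approach can be sketched as a common table expression like the one below. The column names (CATEGORY, LIST) follow the example above; the VARCHAR lengths and the trailing-separator trick are assumptions for illustration, not necessarily the exact query from the listing:

```sql
-- Split the comma-separated LIST column into one row per sub-string.
-- A ", " is appended to the list so that every element ends with the separator.
WITH split (category, pos, element, rest) AS (
  SELECT category,
         1,
         CAST(SUBSTR(list || ', ', 1, LOCATE(', ', list || ', ') - 1) AS VARCHAR(100)),
         CAST(SUBSTR(list || ', ', LOCATE(', ', list || ', ') + 2) AS VARCHAR(100))
  FROM myTab2
  UNION ALL
  SELECT category,
         pos + 1,
         CAST(SUBSTR(rest, 1, LOCATE(', ', rest) - 1) AS VARCHAR(100)),
         CAST(SUBSTR(rest, LOCATE(', ', rest) + 2) AS VARCHAR(100))
  FROM split
  WHERE LOCATE(', ', rest) > 0
)
SELECT category, element
FROM split
ORDER BY category, pos;
```

Each recursive step peels off the sub-string before the first “, ” and carries the remainder forward in the rest column until no separator is left.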

Saturday, April 11, 2009

DB2 Authentication & Password Encryption Method

A few days ago, I received an e-mail from a colleague with the following error message:

I verified the password of db2inst1 at the OS level and the password “db2test” was correct. I also verified the authentication-related configuration parameters, and everything looked okay. The authentication type was set to SERVER. The connection worked fine if we ran only “db2 connect to test” without the user and using keywords. The DB2 version was 8.1 and the OS was SUSE Linux Enterprise Server 10.

You might be wondering how come I am still dealing with DB2 version 8.1. Well, as I have mentioned in earlier posts, the company I work for (Stratavia Corporation) has a software product for data center automation called Data Palette. Data Palette still supports DB2 V8.1, and hence we have to test on DB2 V8.1 as well. Going back to the original problem, I approached our system administrator to figure out what was so unique about the db2inst1 password that it caused the above authentication error. We found that the password encryption method used for db2inst1 was different from that on the other V8.1 instances, where we did not have the above-mentioned authentication problem. On the problematic system, the password encryption method was “blowfish”. We changed it to “md5” in “/etc/default/passwd”, which on SUSE Linux holds the configuration parameter that controls the password encryption method.

After changing the password encryption method, we reset the password of db2inst1, and this time it was encrypted using md5. On Linux systems, you can check the encrypted password in the /etc/shadow file (you have to be root). A blowfish-encrypted password is almost double the length of an md5-encrypted one. Also, an md5-crypt hash starts with $1$, while a blowfish (bcrypt) hash starts with $2a$.
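As a quick illustration, the encryption method can be recognized from the hash prefix alone (the sample hashes below are made up for illustration, not real passwords):

```shell
# Classify a crypt(3) hash by its prefix:
# $1$ marks an md5-crypt hash; $2a$ (or $2y$) marks a blowfish/bcrypt hash.
classify_hash() {
  case "$1" in
    '$1$'*) echo "md5" ;;
    '$2'*)  echo "blowfish" ;;
    *)      echo "other" ;;
  esac
}

classify_hash '$1$Zs6qPfWe$G7yWJx0yi7NqUWg9C8nFm/'        # prints "md5"
classify_hash '$2a$05$bvIG6Nmid91Mu9RcmmWZfO5HJIMCT8riNW0hEp8f6/FuA2/mHZFpe'  # prints "blowfish"
```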

After the above change, the authentication problem was gone. So it appears DB2 V8.1 (Fix Pack 0) does not support authentication with passwords encrypted using blowfish. Out of curiosity, I upgraded the same instance to V8.2 to test whether blowfish password encryption would work, and with V8.2 it worked fine. Hence, we should keep in mind that DB2 authentication (when using OS authentication) depends on the password encryption method used by the OS. The user password may work fine at the OS level, but DB2 authentication may still fail if DB2 does not support the password encryption method used by the OS.

Monday, February 16, 2009

DB2 and EC2

Though irrelevant, there is a similar alphabetical pattern between DB2 and EC2: E comes just after D, and C comes just after B in the alphabet. Maybe this similarity aroused my interest in EC2... Also, IBM and Amazon have partnered to deliver DB2 in the Amazon EC2 (Elastic Compute Cloud) environment.

Cloud computing, the biggest buzzword of the moment, is no longer perceived as mere hype. It has already started helping small startup companies, and new businesses are emerging on top of cloud computing infrastructure. Gradually, more and more companies will start embracing cloud computing.

Though I have been hearing about cloud computing for over a year, I never bothered to think about how similar or different managing DB2 databases hosted in a cloud computing environment would be. When I read IBM’s announcement (actually, I followed the link from Anant Jhingran’s blog post), it crossed my mind that very soon we DBAs may have to deal with DB2 instances hosted in the cloud. Suddenly, several questions started arising in my mind. Will it be accessible through a web browser only, or can we still use our favorite SSH client or Remote Desktop (in the case of Windows)? What will the database physical design look like? Can we also access these virtual servers through VPN, and so on?

Out of curiosity, I thought of signing up for Amazon’s service to get a feel for EC2. Looking at their pricing, it did not appear to cost much just to try it out. On second thought, I decided to go through their documentation first. The documentation is concise and to the point, and it helped clear the clouds over my understanding of Elastic Compute Cloud.

Here is a very high level summary of what I learned about EC2 from their documentation.
  1. The virtual servers in the EC2 environment are basically running instances of an AMI (Amazon Machine Image). An AMI contains all the software, including the OS and its configuration settings, applications, libraries, etc.
  2. There are already many existing AMIs (free as well as paid). Also, a new AMI can be created from scratch or based on an existing AMI. However, a new AMI has to be saved on Amazon S3 (Simple Storage Service), and there is a charge for this storage.
  3. There are different types of instances based on CPU, Memory, Storage and I/O requirements and accordingly Amazon has different pricing for each type of instance.
  4. AMI instances are not persistent. You lose all your changes when you shut down the OS or terminate the AMI instance.
  5. For online data that requires persistence (for example, a DB2 database), they have a different solution called Elastic Block Store (EBS). An EBS volume can be attached to a host (a running AMI instance) and either used as a raw device or formatted with a file system.
  6. Snapshots of EBS volumes can be created on Amazon S3 (Simple Storage Service). A snapshot can then be used as the starting point for new EBS volumes.
  7. Static IP addresses can also be assigned to instances. They are called Elastic IP addresses.
  8. We can access the servers hosted in EC2 environment using SSH client or Remote Desktop.
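As a rough sketch of how an EBS volume would serve a DB2 database, the steps would look something like the following (the volume and instance IDs are placeholders, and the commands assume the EC2 command-line API tools of that era):

```
# create a 10 GB volume in the instance's availability zone
ec2-create-volume --size 10 --availability-zone us-east-1a
# attach it to a running instance as a block device
ec2-attach-volume vol-12345678 -i i-12345678 -d /dev/sdf
# then, on the instance, either use /dev/sdf as a raw device
# or put a file system on it for the DB2 data
mkfs.ext3 /dev/sdf
mount /dev/sdf /db2data
```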

I like the ease of provisioning and of server downsizing and upsizing available in the EC2 environment. Also, it appears to me that the setup and maintenance of disaster recovery and business continuity solutions will be relatively easy in EC2. Testing additional computing power for an increased workload (capacity planning) will be much easier and more cost effective: we don’t have to make a permanent investment in computing power only to realize later that the investment did not solve the capacity problem.

Having said all the good things about EC2, I do agree that we will come across several unpleasant aspects of it when we start using it. I found some of the experiences discussed on Colin Percival’s blog.

In general, one of the big concerns about cloud computing is company policies and the security issues related to storing a company’s data outside its network. Though EC2 appears to have reasonable security provisions, I am not an expert on network security and hence will abstain from commenting on security. However, if cloud computing provides, or will provide, robust security, I would anticipate that company policies will be revised over time to adapt to the new computing environment.

Tuesday, January 13, 2009

Where Does a DB Request Spend Time?

While analyzing slow response times or poor database performance, you might be curious to know where a database request spends its time. This information feeds into database performance intelligence and guides the tuning effort in the right direction. However, I doubt anyone has ever found a satisfactory answer to this question for a DB2 database.

Basically, a database request may spend time either waiting for a service (wait time) or being serviced (service time); in fact, this applies to any request to a computer system. The total time spent by a database request should be the sum of all the wait times and service times for that request. Also, a database request may spend time inside the database system as well as outside it (e.g., on the network). Ideally, the monitoring features of a database system should provide all of the wait times and service times spent inside the database system.

In this post I will investigate the monitoring information available through db2batch and snapshot monitoring, in an attempt to identify the individual components that make up the total time spent by a SQL statement. First, I will look at the monitoring information captured by db2batch. Listing 1 shows the SQL script I used for the db2batch test.

Listing 1

In case you are curious about the definition of table t1, it has the same structure as syscat.columns, and I inserted all the rows from syscat.columns into t1 four times. Also, t1 is created in a DMS tablespace whose associated buffer pool has just 1000 4K pages.
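A db2batch input script of the kind described would look roughly like this (the query itself is a hypothetical stand-in; any sufficiently heavy query against t1 would do — what matters is the PERF_DETAIL directive):

```sql
--#SET PERF_DETAIL 3
-- a deliberately expensive self-join against the 4x-inflated copy of syscat.columns
SELECT COUNT(*)
FROM t1 a, t1 b
WHERE a.tabname = b.tabname AND a.colname = b.colname;
-- run as: db2batch -d <dbname> -f script.sql
```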

PERF_DETAIL 3 instructs db2batch to capture a summary of monitoring information. Listing 2 shows the db2batch output after filtering out all the lines not related to time.

Listing 2

The SQL statement took a total of 130.567 seconds. Of that, Agent CPU Time was 124.36 seconds, but what about the remaining 6.207 seconds? Even if we assume that buffer pool read time is not included in Agent CPU Time, the components still do not add up to the total elapsed time.

Next, I will investigate DB2 snapshot monitoring and identify any time-related stats. Ideally, a database request should correspond to a statement. However, the application-level snapshot provides more time-related stats, so I will present a test example using the application snapshot.

I created a test script as shown in Listing 3:

Listing 3
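The script was along these lines (a sketch reconstructing the steps described; the query and file names are assumptions):

```
#!/bin/sh
db2 connect to test
# single-row result, run in auto-commit mode, redirected to test.out
db2 "SELECT COUNT(*) FROM t1 a, t1 b WHERE a.tabname = b.tabname" > test.out
db2 get snapshot for applications on test > snapshot.out
db2 terminate
```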

I ran the script with the time command as shown below

Listing 4

Here is the content of test.out

Listing 5

Listing 6 shows the time related lines from the application snapshot

Listing 6

Now let’s try to analyze the output of this test. The script took 2 minutes and 11.12 seconds (131.12 seconds). The query was run locally on the database server to avoid any network time. Also, the query returns just a single row, to minimize the time associated with sending the result set to standard output. That is why the “Elapsed time of last completed uow” from the application snapshot is pretty close to the time captured by the time command. Even the remaining fraction of a second is mostly due to the other statements in the script (connect, get snapshot, terminate, etc.).

Since the statement was run in auto-commit mode, the UOW comprises a single statement. Thus the SELECT statement took 130.9 seconds. Now let’s try to break up those 130.9 seconds and identify the individual components. We see that User CPU Time by agents is 124.78 seconds. Apart from that, the following time components sum up to less than a second:

Listing 7

Adding them all to the User CPU time still gives less than 126 seconds. So where were the remaining 5 seconds spent? In fact, many of the time components shown in Listing 7 (sort time, buffer pool read time, etc.) might be included in the User CPU Time. In any case, we cannot account for at least 5 seconds, and this gap of unaccounted time will be even larger for more complex queries. Also, note that the time components shown in Listing 7 are not even available at the statement level, where they would make more sense.
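The size of the gap follows directly from the figures above, assuming, generously, that the Listing 7 components total a full second:

```shell
# 130.9 s UOW elapsed - (124.78 s agent CPU + ~1 s of other components)
awk 'BEGIN { printf "%.2f\n", 130.9 - (124.78 + 1.0) }'   # prints 5.12
```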

With these simple examples, I just wanted to show that DB2 monitoring does not provide a complete picture of all the time spent inside the database system. I have reached the same conclusion even with event monitoring. Ideally, database monitoring should provide the total time spent by a statement and its individual components in a parent-child format. I would love to hear from you if you have more insight or information regarding the time spent by a database request inside DB2.