Tuesday, January 13, 2009

Where a DB Request Spends Time?

While analyzing slow response or poor database performance, you might be curious to know where a database request spends time. This information will feed to database performance intelligence and guide the tuning effort in the right direction. However, I doubt if anyone ever found a satisfactory answer to this question for a DB2 database.

Basically, a database request may spend time either waiting for a service (wait time) or being serviced (service time). In fact this applies to any request to a computer system. The total time spent by a database request should be sum total of all the wait times and the service times for that request. Also, a database request may spend time inside the database system as well as outside the database system (e.g. Network). For a database request, the monitoring features of a database system should ideally provide all these wait times and service times spent inside database system.

In this post I will investigate the monitoring info available through db2batch and snapshot monitoring in attempt to identify the individual components that make up the total time spent by a SQL statement. First, I will investigate the monitoring info captured by db2batch. Listing 1 shows the SQL script I used for db2batch test

Listing 1

In case you are curious about the definition of table t1, it has the same structure as syscat.columns and I inserted all the rows from syscat.columns four times into t1. Also, t1 is created under a DMS tablespace with the associated buffer pool of just 1000 4K pages.

PERF_DETAIL 3 instructs db2batch to capture summary of monitoring information. Listing 2 shows the db2batch output after filtering out all the lines not related with time from the monitoring info

Listing 2

The SQL statement took total 130.567 seconds. Out of which Agent CPU Time was 124.36 seconds, but what about the remaining time (6.207 seconds). Even if we assume that buffer pool read time is not inclusive in Agent CPU Time, still it does not sum up to total elapsed time.

Next I will investigate the DB2 snapshot monitoring and identify any time related stats. Ideally, a database request should correspond to a statement. However, application level snapshot provides more time related stats, hence I will present a test example using application snapshot.

I created a test script as shown in Listing 3:

Listing 3

I ran the script with the time command as shown below

Listing 4

Here is the content of test.out

Listing 5

Listing 6 shows the time related lines from the application snapshot

Listing 6

Now let’s try to analyze the output of this test. The script test.sh took 2 minutes and 11.12 seconds (131.12 seconds). The query was run locally on the database server to avoid any network time. Also, the query returns just a single row to minimize the time associated with sending the result set to standard output. That’s the reason “Elapsed time of last completed uow” from application snapshot is pretty close to the time captured by the time command. Even the difference of fraction of a second is mostly because of other statements in test.sh (connect, get snapshot, terminate etc.).

Since the statement was run in auto-commit mode, the UOW comprises of single statement. Thus the SELECT statement took 130.9 seconds. Now let’s try to break up the 130.9 seconds and identify the individual components. We see that User CPU Time by agents is 124.78 seconds. Apart from that, the following time components sum up to less than a second

Listing 7

Adding them all to User CPU time will still be less than 126 seconds. So where was the remaining 5 seconds spent? In fact, many of the time components shown in Listing 7 (sort time, buffer pool read time etc.) might be inclusive in User CPU Time. In any case, we don’t have any accountability of at least 5 seconds. This gap of unaccountable time will be even longer for more complex queries. Also, note that all the time components shown in Listing 7 are not even available at the statement level where it makes more sense.

With these simple examples, I just wanted to show that DB2 monitoring does not provide the complete picture of all the time spent inside the database system. I have reached to the same conclusion even with the event monitoring. Ideally, the database monitoring should provide the total time spent by a statement and its individual components in a parent-child format. I would love to hear from you if you have any more insight or info regarding the time spent by a database request inside DB2.