Saturday, March 20, 2010

Where Does a DB Request Spend Time in DB2 9.7?

This title may sound familiar if you have been following my blog for over a year. More than a year ago, I wrote about the shortcomings of time-spent monitoring in DB2 (see my post “Where a DB Request Spends Time”). When DB2 9.7 was released (around June 2009), I was thrilled to see “Time-spent monitor elements are more comprehensive” listed under its monitoring enhancements. Since then I had been meaning to install DB2 9.7 on my laptop and take a closer look at this enhancement. I finally got a chance to do so, and here I am going to share my findings.

Without spending too much time trying to come up with a complex workload, I created a table t3 with a structure similar to syscat.columns and populated it with rows from syscat.columns. Then I ran a query against this table as shown below. The real elapsed time of this SQL was more than 38 minutes (long enough for my analysis purposes).

Listing 1
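(The original listing is not preserved in this copy of the post. Based on the description above, and on the comment thread below where the query is quoted as a COUNT(*), the setup would have looked roughly like the following sketch; the exact statements are my assumption, not the author's listing.)

```sql
-- Sketch of the workload setup (reconstruction, not the original listing):
-- create a table with the same structure as SYSCAT.COLUMNS and copy its rows
CREATE TABLE t3 LIKE syscat.columns;
INSERT INTO t3 SELECT * FROM syscat.columns;

-- The monitored statement (quoted in the comments below), timed from the
-- shell, e.g.:  time db2 "SELECT COUNT(*) FROM t3"
SELECT COUNT(*) FROM t3;
```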

Before I start analyzing the time spent by the above SQL, let me list some of the queries that I used for this analysis.

The following query gives the client idle wait time, total request time, total wait time, and various processing times for each connection in the database.

Listing 2
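Listing 2 itself is not reproduced here. As a hedged sketch of what such a connection-level query could look like against the MON_GET_CONNECTION table function (NULL selects all connections, -2 all members; the particular column selection is my assumption, following the indentation convention described later in the post):

```sql
-- Sketch only: columns chosen to illustrate the hierarchy, where the
-- indented columns are components of TOTAL_RQST_TIME
SELECT application_handle,
       client_idle_wait_time,
       total_rqst_time,
         total_wait_time,
         total_compile_proc_time,
         total_section_proc_time,
         total_commit_proc_time,
         total_rollback_proc_time,
         total_runstats_proc_time,
         total_reorg_proc_time,
         total_load_proc_time
  FROM TABLE(MON_GET_CONNECTION(CAST(NULL AS BIGINT), -2)) AS t;
```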

If you want to drill down into the total wait time of each connection, the following query can be used.

Listing 3
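Again, the original listing is missing here; a sketch of a wait-time drill-down over MON_GET_CONNECTION (the column choice is my assumption) might be:

```sql
-- Sketch only: indented columns are components of TOTAL_WAIT_TIME
SELECT application_handle,
       total_wait_time,
         lock_wait_time,
         log_buffer_wait_time,
         log_disk_wait_time,
         pool_read_time,
         pool_write_time,
         direct_read_time,
         direct_write_time,
         tcpip_recv_wait_time,
         tcpip_send_wait_time,
         fcm_recv_wait_time,
         fcm_send_wait_time,
         diaglog_write_wait_time
  FROM TABLE(MON_GET_CONNECTION(CAST(NULL AS BIGINT), -2)) AS t;
```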

A connection can have more than one unit of work (UOW), and if you want to monitor time spent at the UOW level, the following query can be used.

Listing 4
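The UOW-level listing is likewise not preserved; a sketch using the MON_GET_UNIT_OF_WORK table function (same parameter convention; the columns are my assumption) could be:

```sql
-- Sketch only: one row per active unit of work
SELECT application_handle,
       uow_id,
       client_idle_wait_time,
       total_rqst_time,
         total_wait_time,
         total_compile_proc_time,
         total_section_proc_time,
         total_commit_proc_time,
         total_rollback_proc_time
  FROM TABLE(MON_GET_UNIT_OF_WORK(CAST(NULL AS BIGINT), -2)) AS t;
```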

Again, if you want to drill down into the total wait time of each UOW, the following query can be used.

Listing 5

If you want to monitor time spent at the statement level, you can use the mon_get_pkg_cache_stmt table function. The following query gives information on the time spent by the top 10 dynamic SQL statements (sorted by statement execution time).

Listing 6
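Listing 6 is not preserved either; a sketch of such a top-10 query over mon_get_pkg_cache_stmt ('D' restricts the output to dynamic statements; the selected columns are my assumption):

```sql
-- Sketch only: top 10 dynamic statements by execution time
SELECT num_executions,
       stmt_exec_time,
         total_act_wait_time,
       SUBSTR(stmt_text, 1, 100) AS stmt_text
  FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, -2)) AS t
 ORDER BY stmt_exec_time DESC
 FETCH FIRST 10 ROWS ONLY;
```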

If you want to drill down into the wait time of the dynamic SQL statements returned by the above query, the following query can be used.

Listing 7

In all the above monitoring queries (Listings 2 through 7), I have intentionally left out the WHERE clause so that you can add one appropriate to your specific need. Also, the result set of the monitoring table functions used in these queries can be narrowed through the input parameters passed to them. Check the documentation of these table functions for more detail.

In each of the above monitoring queries, the SELECT list is indented: the indented columns are the individual components of the time-spent monitor element selected just before them. “Hierarchy of time-spent monitor elements” on the DB2 Information Center was very helpful while writing these queries. You may also have noticed that most of the wait-time components are suffixed with “_wait_time” and, similarly, most of the processing-time components with “_proc_time”. Any remaining ambiguity is resolved by the documented hierarchy of time-spent monitor elements.

Although I found the time-spent monitor elements available at each level (connection, UOW, statement) comprehensive, here I am going to present only the UOW-level time-spent monitoring for the SQL execution in Listing 1.

Here is the output of db2_mon_uow.sql at the end of the SQL execution in Listing 1. To make it more readable, I have presented it vertically.

Listing 8

In the above output, and in fact in all the columns selected in Listings 2 through 7, the unit of time is milliseconds. As you can see, the total request time (~2242 seconds) is very close to the sum of the various time components (~2237 seconds) that constitute it. Even the minor difference (5 seconds, or about 0.2%) is explained by the “Other” category documented under “Hierarchy of time-spent monitor elements” on the DB2 Information Center. The real elapsed time shown in the output of the “time” command (in Listing 1) is higher than the total request time reported by DB2, but that can be attributed to the difference in the time-recording boundaries used by the “time” command versus those used by DB2.

Similarly, if you want to drill down into the total wait time, here is the output from db2_mon_uow_wait.sql.

Listing 9

The difference between total wait time and sum of individual wait time components is just 65 milliseconds.

With this time-spent monitoring enhancement, we can now easily find out where a DB request spends most of its time and accordingly address any problem areas in the database. Kudos to the IBM DB2 team for putting this enhancement into DB2 9.7.


NHM said...

In most of those monitoring table functions, you seem to need to know the application handle of the session you want to monitor.

Assume that session A, which runs the "SELECT COUNT(*) FROM t3", is also the one fetching the statistics.
How can session A get its own application handle to run the query in, say, Listing 2?

Finally, regarding the COMPILE PROC TIME, SECTION PROC TIME, ... LOAD PROC TIME: shouldn't all these proc times add up to TOTAL_CPU_TIME?

If so, I am under the impression that I am not getting results as good as yours. In my case, the query elapsed time seems to be considerably larger than my connection's TOTAL_CPU_TIME + TOTAL_WAIT_TIME.

Thanks for your time.

Radhesh Kumar said...


You are right, most of the monitoring table functions take an application handle as an input parameter. If you provide a specific application handle, they output monitoring data only for that application (session). If you do not provide a specific application handle (i.e., you pass NULL), they output monitoring data for all applications.

Since in DB2 each session is uniquely identified by an application handle, I am not sure why you would run the monitoring query from the same session and still expect output only for the monitored query. If you use the connection-level monitoring table function, it gives you the cumulative time-spent data for all the SQL executed under that connection. Ideally, your monitoring session should be separate from the session you want to monitor. You have the option of getting time-spent monitoring data at the connection (session) level as well as at the UOW (transaction) level.

Regarding your question on TOTAL_CPU_TIME: if I understand correctly, you are getting a TOTAL_CPU_TIME that is less than the sum of all the processing times. It would be good to know the exact values you are getting. However, TOTAL_CPU_TIME should be looked at only to get an idea of the amount of CPU used (as reported by the operating system). The way I understand it, processing time includes CPU usage plus the overhead associated with obtaining the CPU (excluding explicit waits on hardware/software resources). That is why, if you look at the hierarchy of time-spent monitor elements, it does not mix CPU time in.

Hope this clarifies your doubts.

NHM said...

Hi Radhesh Kumar,

I am monitoring these statistics from within the same session that runs the query, because once the session that runs the query "terminates" (closes its connection), it disappears from the MON_GET_CONNECTION table function.

Therefore, since the statistics should be collected once the "select count(*) from t3" finishes, my only option is to collect them inside that same session.

I'd make the query session run with three transactions between connect and terminate: connect, T1, T2, T3, terminate.
(T1) would save the statistics you illustrated in your code listings into snapshot table s1.

(T2) would run the "select count(*)".

(T3) would snapshot the same statistics as T1 into table s2.

The snapshot difference (s2 - s1) should represent the wait time and processing time of the query run in T2.

I don't understand how you use a separate session to monitor these statistics. I understand that these monitoring table functions give information about all connections and units of work in the database, but I am under the impression that this information only lingers in the system while those connections are active.

"if I understand correctly, you are getting TOTAL_CPU_TIME less than summation of all the processing times."

Well, it is either that or the TOTAL_WAIT_TIME; one of the two statistics must be lower than it should be.
For instance, I ran a query whose elapsed time was 3504 seconds, with a TOTAL_WAIT_TIME of 84 seconds and a TOTAL_CPU_TIME of 457 seconds.

Below is the code I used to take the snapshot s2. At that point I was using the statistics I had considered most interesting. Now I am going to try your code to see if I get better results.



NHM said...

Hi again,

It seems that adding up all the PROC_TIME statistics you described in your code listings works much better than TOTAL_CPU_TIME.

Thanks, this blog entry was a lifesaver. :)

Radhesh Kumar said...

Following comment from NHM did not get published due to website error.

Hi again,

It seems that adding up all the PROC_TIME statistics you described in your code listings works much better than TOTAL_CPU_TIME.

Thanks, this blog entry was a lifesaver. :)


Glad to know that you have better stats now. I will go through your previous comment later and respond to it.

Radhesh Kumar said...


Sorry for the delayed response to your previous comment. It seems you would like to incorporate time-spent monitoring into your application. In that case, the approach you have suggested is reasonable.

The general approach I have suggested is mostly for DBAs trying to analyze the time spent by different application connections in their database. In that case, they want to analyze the time spent so far by a given application connection (while the connection is still active).

If a DBA would like to test and analyze the end-to-end time spent by a given SQL statement (say, X), that is also possible through MON_GET_CONNECTION. Basically, he connects to the database and runs X. From a different session, he finds the application handle for the session running X. When X completes, the connection is still active, since he did not terminate it. At that point, the DBA queries MON_GET_CONNECTION (from the different session) with the application handle of the session that just ran X. Any time spent before and after the execution of X (in the associated connection) is reported under CLIENT_IDLE_WAIT_TIME.
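As a sketch of the two steps described above (run from the monitoring session; the filter columns and the handle value below are illustrative placeholders, not from the original post):

```sql
-- Step 1: from the monitoring session, list connections to find the
-- application handle of the session that is running SQL X
SELECT application_handle, client_applname
  FROM TABLE(MON_GET_CONNECTION(CAST(NULL AS BIGINT), -2)) AS t;

-- Step 2: after X completes (its connection still open), fetch the metrics
-- for that handle; 1234 here is a placeholder for the handle found above
SELECT total_rqst_time, total_wait_time, client_idle_wait_time
  FROM TABLE(MON_GET_CONNECTION(CAST(1234 AS BIGINT), -2)) AS t;
```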

In addition, DBAs can use the MON_GET_PKG_CACHE_STMT table function to find the end-to-end time-spent monitoring info for the various SQL statements run against the database (as long as they are still in the package cache). However, the metrics returned by this table function are aggregated over every execution of a statement, so they should be divided by NUM_EXECUTIONS to get average per-execution values.

Anonymous said...

Hi Radhesh Kumar

I have these settings in my DB (DB2 9.7 Fix Pack 2, AIX):

Request metrics (MON_REQ_METRICS) = EXTENDED

Activity metrics (MON_ACT_METRICS) = NONE

Object metrics (MON_OBJ_METRICS) = NONE

Unit of work events (MON_UOW_DATA) = NONE

So I expect to have the data in both MON_GET_CONNECTION and MON_GET_UNIT_OF_WORK.

Instead, I have the data in MON_GET_CONNECTION but not in MON_GET_UNIT_OF_WORK.

Thanks in advance


Radhesh Kumar said...

Hi Franco,

Sorry for the delay in responding. You are right: with the settings you have, you should get data from MON_GET_UNIT_OF_WORK. If you are not getting data from this table function, you may want to contact IBM support for troubleshooting.


Nitin Gandhi said...

Came across your blog via a Google search. Very informative. Keep it up, buddy.