Friday, February 29, 2008

SQL Tips & Techniques- String Aggregation(Contd.)

In my last post, I showed two ways to perform string aggregation. Both of them used recursive SQL. I also mentioned that a built-in function to perform string aggregation would be preferred. Though, I have not found a direct function for string aggregation, one of the comments on my previous post suggested the use of XMLAGG function. In fact, Serge Rielau used the XMLAGG function for string aggregation in his presentation “SQL on Fire! Part 1”. Let’s look at the query using XMLAGG function to perform the string aggregation.


Listing 5


SELECT category,

SUBSTR(names, 1, LENGTH(names) -2) as list

FROM

(SELECT

category,

REPLACE

(REPLACE

(XMLSERIALIZE

(CONTENT XMLAGG(XMLELEMENT(NAME a, name)

ORDER BY name)

AS VARCHAR(50)), '<A>', ''),

'</A> ', ', ') AS names

FROM mytab1

GROUP BY category) AS X


CATEGORY LIST

----------- -------------------------------------------------

1 abc, bcd

2 cde, def, efg

3 fgh, ghi

3 record(s) selected.



In the above query, first we are converting each name into an XML expression using XMLELEMENT function. Here, XMLELEMENT function will add the tags <A> and </A> around each name. After the aggregation, we get rid of these tags using REPLACE function. Though, XMLAGG function is also a work around for string aggregation, I would prefer it over recursive SQL because it’s easier to understand and in my test it had lower estimated cost. Ideally, what we want is a function similar to myConcat as shown below.

Listing 6


SELECT category,

myConcat(name, ‘, ’) as list

FROM mytab1

GROUP BY category


CATEGORY LIST

----------- -------------------------------------------------

1 abc, bcd

2 cde, def, efg

3 fgh, ghi

3 record(s) selected.




Is that something in your wish list also?


Wednesday, February 27, 2008

SQL Tips & Techniques- String Aggregation

In my experience, application or database developers will have to deal with writing SQL queries more than a DBA (specially a System DBA). However, many a times DBAs are expected to help with writing complex queries. Keeping that in mind, I have decided to post SQL tips and techniques as I encounter them.

Recently, I had a requirement of performing string aggregation by concatenating them while grouping based on another column. For example, we have a table myTab1 with CATEGORY and NAME column values as shown below.

Listing 1

SELECT category, name

FROM myTab1

ORDER BY category


CATEGORY NAME

----------- ----------

1 abc

1 bcd

2 cde

2 def

2 efg

3 fgh

3 ghi

7 record(s) selected.

The expected result is to have one CATEGORY per row with all the NAME values for a CATEGORY concatenated with a comma and a space in between. So the result should look as shown below.

Listing 2

CATEGORY LIST

----------- ----------

1 abc, bcd

2 cde, def, efg

3 fgh, ghi

This could have been so simple if there was an aggregate function (similar to SUM, AVG, COUNT) to perform this task. However, there is no built-in function for it. So either you can write a function of your own or you have to use recursive SQL. You may refer the article “User-defined Aggregate Functions in DB2 Universal Database”, if you want to write a function for it. Here, I will show you a query that uses recursive SQL

Listing 3

WITH

t1(rowNum, category, name) AS

( SELECT rownumber() over(partition by category),

category, name

FROM myTab1 ),

t2(category, list, cnt) AS

( SELECT category, VARCHAR(name, 50), 1

FROM t1

WHERE rowNum = 1

UNION ALL

SELECT t2.category, t2.list || ', ' || t1.name,

t2.cnt + 1

FROM t2, t1

WHERE t2.category = t1.category AND

t2.cnt + 1 = t1.rowNum )

SELECT category, list

FROM t2

WHERE ( category, cnt ) IN (

SELECT category,

MAX(rowNum)

FROM t1

GROUP BY category )

ORDER BY 1


CATEGORY LIST

----------- -------------------------------------------------

SQL0347W The recursive common table expression "RKUMAR.T2" may contain an infinite loop. SQLSTATE=01605

1 abc, bcd

2 cde, def, efg

3 fgh, ghi

3 record(s) selected with 1 warning messages printed.

Notice that in the above query, the definition of table expression t2 refers itself and hence we call such queries recursive. If you first run the query used in the definition of table expression t1, then run the entire query with t1 and t2 but without the last WHERE clause, you can easily follow logic involved here.

DB2 version 9.5 also supports the CONNECT BY syntax of Oracle for recursive queries. You may check the article DB2 Viper 2 compatibility features for details on CONNECT BY syntax. We may write the query using CONNECT BY as shown below.

Listing 4

SELECT category, list

FROM

(SELECT category, level,

MAX(level) OVER(PARTITION BY category) as maxlevel,

VARCHAR(SUBSTR(SYS_CONNECT_BY_PATH(name,', '), 3),50)

as list

FROM

(SELECT category,

name,

ROW_NUMBER() OVER (PARTITION BY category

ORDER BY name) AS curr,

ROW_NUMBER() OVER (PARTITION BY category

ORDER BY name) -1 AS prev

FROM myTab1)

CONNECT BY prev = PRIOR curr and category = PRIOR category)

WHERE level = maxlevel

ORDER BY 1


CATEGORY LIST

----------- -------------------------------------------------

1 abc, bcd

2 cde, def, efg

3 fgh, ghi

3 record(s) selected.


In addition to CONNECT BY, the above query also uses the function MAX as OLAP aggregation function. You may check DB2 SQL Reference manual or the article “OLAP Functions in DB2” to learn about OLAP functions.

Wednesday, February 06, 2008

Locking issues

Locking issues in a database often lead to finger-pointing between DBAs and Application developers. Application folks will complain that their application is performing poorly because of database locking issues. DBAs will respond that it’s the application design issue that’s causing the locks. In fact, the locking issues can be resolved by the combined effort of DBAs and Application folks.

DBAs have tools to diagnose the locking issue and explain to application folks the root cause. Then DBAs and Application folks can work together to analyze the transaction logic that’s causing the locking issue and make corrections if possible. Sometimes, the locking issues can be resolved or reduced by database level changes (e.g. changing maxlocks and locklist database parameters).

Now, let’s see what are the locking issues in DB2 and how can a DBA diagnose them. DB2 can have mainly three types of locking issues

· Deadlocks

· Lock escalation

· Lock wait and lock timeout

As a DBA, you should periodically check the occurrences of above locking issues through Database snapshot and make sure that there are not many occurrences of these locking issues. I will briefly describe here each of the three locking issues and also provide pointers for further details.

Deadlocks

A deadlock is created when two applications are each locking data needed by the other, resulting in a situation when neither application can continue execution. DB2 deadlock detector is responsible for handling deadlocks. When a deadlock is detected, the deadlock detector will choose a victim that will be automatically rolled back and issued an SQL0911 error message with a return code of 2. By rolling back the victim, the lock conflict is removed and the application can continue processing. Database parameter ‘dlchktime’ controls the time interval for checking deadlocks. DB2 also creates an event monitor called db2detaildeadlock by default to capture the relevant info about the deadlock which can be used to identify the root cause of the deadlock. Check the link “Deadlocks” at Information Center for more details.

Lock escalation

Lock escalation occurs when database manager will attempt to free up memory allocated to locking by obtaining table locks and releasing existing row locks. The desired effect is to make more lock memory available for additional applications. The following two database parameters have a direct effect on lock escalation:

· locklist- amount of storage (in units of 4k pages) that is allocated to hold the list of all locks held by all applications concurrently connected to the database.

· maxlocks- the allowable percentage of locklist that can be used by an application.

Both the above database parameters are automatic configurable by default in version 9. Though the above two database parameters influence the lock escalation, we should not always change them to resolve the lock escalation. Check the link “Correcting lock escalation problems” at DB2 Information Center for a step wise approach to resolve the lock escalation issue.

Lock wait and lock timeout

Finally, what’s lock wait and lock timeout? An application, that makes a request for a lock that is not compatible with the existing locks on the object, will have to wait to acquire the lock and this phenomenon is called lock wait. There is a database parameter called locktimeout that determines how long the application in lock wait state should continue waiting before it times out. If the timeout period is exceeded, the waiting application receives an SQL0911 error message with a return code 68 and the application’s unit of work (transaction) is rolled back by the database manager.

Note that the default value of locktimeout parameter is -1, which causes the lock timeout to be disabled. That is, an application that is waiting for a lock will continue to wait indefinitely or until the lock is released. To avoid such a scenario, it’s recommended to set a positive value for locktimeout (specially in transactional environment). In transactional environment, the recommended starting value for locktimeout is 30 seconds. However, tuning may be necessary in order to find the more appropriate value for this parameter.

You can find out the current lock waits in database using lock wait snapshot. Check the link SNAP_GET_LOCKWAIT table function at DB2 Information Center for more details about lock wait snapshot.

Diagnosing the lock timeout is little involved. Through the snapshot monitors, you would not have the details of applications involved in lock contention that resulted in the lock timeout. Prior to version 9, you could capture such details using event monitors. However, the event monitors to capture the transactions and statements may cause considerable performance impact and hence that might not be an option in your environment. DB2 v9.1 FP4 introduced lock timeout reporting feature to help the diagnosis of lock timeouts. There is a registry variable called DB2_CAPTURE_LOCKTIMEOUT, which controls the lock timeout reporting function. When this registry variable is set to ON, it captures basic information about each lock timeout that happens within the DB2 instance and a lock timeout report is created under the instance diagnostic path (diagpath). Check the following links at DB2 Information Center for more details.

· Lock timeout reporting

· Lock timeout report files

Hope the above info will help to reduce locking issues in your environment. As mentioned in the beginning, DBAs can help to diagnose the locking issues. However in most of the cases, the locking issue has to be resolved from application side (correcting the application logic, using the proper isolation level etc.).