Wednesday, February 06, 2008

Locking issues

Locking issues in a database often lead to finger-pointing between DBAs and Application developers. Application folks will complain that their application is performing poorly because of database locking issues. DBAs will respond that it’s the application design issue that’s causing the locks. In fact, the locking issues can be resolved by the combined effort of DBAs and Application folks.

DBAs have tools to diagnose the locking issue and explain to application folks the root cause. Then DBAs and Application folks can work together to analyze the transaction logic that’s causing the locking issue and make corrections if possible. Sometimes, the locking issues can be resolved or reduced by database level changes (e.g. changing maxlocks and locklist database parameters).

Now, let’s see what are the locking issues in DB2 and how can a DBA diagnose them. DB2 can have mainly three types of locking issues

· Deadlocks

· Lock escalation

· Lock wait and lock timeout

As a DBA, you should periodically check the occurrences of above locking issues through Database snapshot and make sure that there are not many occurrences of these locking issues. I will briefly describe here each of the three locking issues and also provide pointers for further details.

Deadlocks

A deadlock is created when two applications are each locking data needed by the other, resulting in a situation when neither application can continue execution. DB2 deadlock detector is responsible for handling deadlocks. When a deadlock is detected, the deadlock detector will choose a victim that will be automatically rolled back and issued an SQL0911 error message with a return code of 2. By rolling back the victim, the lock conflict is removed and the application can continue processing. Database parameter ‘dlchktime’ controls the time interval for checking deadlocks. DB2 also creates an event monitor called db2detaildeadlock by default to capture the relevant info about the deadlock which can be used to identify the root cause of the deadlock. Check the link “Deadlocks” at Information Center for more details.

Lock escalation

Lock escalation occurs when database manager will attempt to free up memory allocated to locking by obtaining table locks and releasing existing row locks. The desired effect is to make more lock memory available for additional applications. The following two database parameters have a direct effect on lock escalation:

· locklist- amount of storage (in units of 4k pages) that is allocated to hold the list of all locks held by all applications concurrently connected to the database.

· maxlocks- the allowable percentage of locklist that can be used by an application.

Both the above database parameters are automatic configurable by default in version 9. Though the above two database parameters influence the lock escalation, we should not always change them to resolve the lock escalation. Check the link “Correcting lock escalation problems” at DB2 Information Center for a step wise approach to resolve the lock escalation issue.

Lock wait and lock timeout

Finally, what’s lock wait and lock timeout? An application, that makes a request for a lock that is not compatible with the existing locks on the object, will have to wait to acquire the lock and this phenomenon is called lock wait. There is a database parameter called locktimeout that determines how long the application in lock wait state should continue waiting before it times out. If the timeout period is exceeded, the waiting application receives an SQL0911 error message with a return code 68 and the application’s unit of work (transaction) is rolled back by the database manager.

Note that the default value of locktimeout parameter is -1, which causes the lock timeout to be disabled. That is, an application that is waiting for a lock will continue to wait indefinitely or until the lock is released. To avoid such a scenario, it’s recommended to set a positive value for locktimeout (specially in transactional environment). In transactional environment, the recommended starting value for locktimeout is 30 seconds. However, tuning may be necessary in order to find the more appropriate value for this parameter.

You can find out the current lock waits in database using lock wait snapshot. Check the link SNAP_GET_LOCKWAIT table function at DB2 Information Center for more details about lock wait snapshot.

Diagnosing the lock timeout is little involved. Through the snapshot monitors, you would not have the details of applications involved in lock contention that resulted in the lock timeout. Prior to version 9, you could capture such details using event monitors. However, the event monitors to capture the transactions and statements may cause considerable performance impact and hence that might not be an option in your environment. DB2 v9.1 FP4 introduced lock timeout reporting feature to help the diagnosis of lock timeouts. There is a registry variable called DB2_CAPTURE_LOCKTIMEOUT, which controls the lock timeout reporting function. When this registry variable is set to ON, it captures basic information about each lock timeout that happens within the DB2 instance and a lock timeout report is created under the instance diagnostic path (diagpath). Check the following links at DB2 Information Center for more details.

· Lock timeout reporting

· Lock timeout report files

Hope the above info will help to reduce locking issues in your environment. As mentioned in the beginning, DBAs can help to diagnose the locking issues. However in most of the cases, the locking issue has to be resolved from application side (correcting the application logic, using the proper isolation level etc.).

9 comments:

Ram said...

Nice post Radhesh, useful info. It may also help if there is some guidance on how to set the relative values of locktimeout and dlchktime with the pros and cons of setting one higher than the other?

Radhesh Kumar said...

Good point Ram. I will elaborate on dlchktime vs locktimeout here.

Basically the two parameters deal with slightly different scenario. The deadlock check time (dlchktime) parameter deals with deadlocks. When deadlock occurs, it has to be resolved (by third party), otherwise it will continue to exist forever. Hence, as soon as a deadlock is detected, the deadlock detector will attempt to resolve it. There is no waiting time involved while trying to resolve deadlock. The dlchktime parameter decides how frequently deadlock detector should check for deadlocks. The default value of this parameter is 10 seconds, which is good in most of the cases. Checking for deadlocks too frequently, will cause more work for DB2 and hence more resource usage. On the other hand, if you check it infrequently and if you have deadlocks in your environment, the longer it takes to detect the deadlock, the more you are wasting the resource associated with deadlocked transactions.

The locktimeout parameter determines how long an application should continue to be in lock-wait state, before it times out. While application is in lock-wait state and before it times out, there are chances that waiting application will get hold of required lock and lock-wait will automatically be resolved. So you have to consider the following factors while setting the locktimeout parameter.
• How much delay caused by lock-wait is acceptable in the response time?
• From user’s perspective, whether it’s preferable to wait longer rather than getting the request timeout error or vice versa?
• Whether you need to reduce lock-wait time because lock-wait applications are consuming considerable resource?

However, if there are too many lock waits, you should focus on identifying the root cause of it and then resolving it. In an OLTP environment, the recommended starting value for locktimeout parameter is 30 seconds. You may have to tune it in your specific environment. For query-only environment, you could start with a higher value. You should also look at percentage of lock timeouts out of total lock waits in your database (through database snapshot) while tuning locktimeout parameter. If most of the lock waits are getting timed out, it may indicate that locktimeout parameter is set too low.

Normally, the dlchktime parameter should be set less than the locktimeout parameter. If you set your locktimeout parameter less than dlchktime parameter, the waiting locks in deadlock scenario may timeout even before it gets detected by deadlock detector.

Anonymous said...

Ramesh very good post.
I have a question: do you know in detail what is placed in the Lock share memory space when a Row Lock is there? Is like a pointer to the Row or a row identification whith the Lock Type that have?

Anonymous said...

Radhesh very good post.
Thanks a lot..

Just want to know is there any good way to monitor the DB2 snapshot in production environment other than snapshot monitoring or event monitoring. what about using db2pd to capture the deadlock information as soon as it happens in the application.

-- S. D. Pawar.

Radhesh Kumar said...

Hi Pawar- If I understand correctly, you would like to capture relevant info when a deadlock occurs. The most common way to do this is using db2detaildeadlock as I mentioned in my original post. This event monitor does not have much performance impact and can be used in production environment.

However, If you don't want to use this event monitor, you can explore "db2pdcfg -catch" option. Check the following link to get more info on db2pdcfg
http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp?topic=/com.ibm.db2.luw.admin.cmd.doc/doc/r0023252.html

Anonymous said...

Thanks for taking the time to discuss this, I feel strongly about it and love learning extra on this topic. If doable, as you gain experience, would you mind updating your blog with further information? It is extremely helpful for me.

Anonymous said...

Many thanks for posting this, It?s just what I used to be researching for on bing. I?d lots relatively hear opinions from a person, barely than a company web page, that?s why I like blogs so significantly. Many thanks!

Anonymous said...

Good information!!!

I came across a lock timeout situation where 2 of my select statements are holding U-update locks. Is it possible that a select can trigger a update on the tables?

Also normally what recommendations do you have for a lock timeout situation?

Thanks
Satya

Anonymous said...

I am currently having lock wait problem which I am unable to diagnose I have one table apprve_no which contains dept and a no. second table is of transaction.

I am using select update query to get next update no from approve_no table and then inserting a row with that no in tran table. These 2 i am doing in a transcation at front end. there are around 26 depts in approve_no table but this lock tiome out is coming for only R dept. How to diagnose this problem. Please help. No deadlocks no large lockwaits on DB.