Sunday, November 30, 2008

SQL Tips & Techniques- Correlated Reference

Sometimes even a simple SQL query can become challenging if we don't approach it with the right technique. A few months ago I came across one such SQL query problem, which I am going to share in this post. I was writing a database monitoring routine using a snapshot table function. For simplicity, let's consider the following query:

Listing 1
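A minimal sketch of such a query (the database name TEST and the columns selected are assumptions; SNAPSHOT_DATABASE returns many more monitor elements):

SELECT total_log_used, total_log_available
FROM TABLE(SNAPSHOT_DATABASE('TEST', 0)) AS s   -- 'TEST' is a placeholder database name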

The above query returns the log used and available on partition 0, because the second argument to the SNAPSHOT_DATABASE table function is 0. I had to collect this information for each partition, and using db2_all was not an option because the SQL was supposed to run through a JDBC connection. My first instinct was to find the list of partitions using the following query and then run the query in Listing 1 in a loop, once per partition.

Listing 2
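A sketch of such a query, based on the DB_PARTITIONS table function:

SELECT partition_number
FROM TABLE(DB_PARTITIONS()) AS x   -- one row per database partition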


However, I wanted a single SQL query that returns the complete result, as shown below:

Listing 3
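The shape of the desired result, one row per partition (values omitted):

PARTITION_NUMBER TOTAL_LOG_USED       TOTAL_LOG_AVAILABLE
---------------- -------------------- --------------------
(one row for each database partition)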

The problem appeared easy in the beginning, but it became more difficult the longer I thought about it, and I could not come up with the SQL query. Then I decided to check the complete syntax of a DB2 SQL query in the DB2 SQL Reference, to make sure I was not missing any syntax that could help with this problem. And I was indeed missing something: the correlated reference, which can also be used in the arguments of a table function.

Most likely you are already familiar with correlated subqueries and correlation names. Even if these terms do not sound familiar, I am sure you have used them in your SQL. Let's look at an example to make it clear.


Listing 4
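A minimal sketch of such a query, assuming the EMPLOYEE table of the DB2 SAMPLE database:

SELECT empno, lastname, salary
FROM employee x
WHERE salary > (SELECT AVG(salary)
                FROM employee
                WHERE workdept = x.workdept)   -- x.workdept comes from the parent query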

In the above query, notice the use of x.workdept inside the subquery. The employee table in the FROM clause of the parent query has the correlation name x. Though x is not present in the FROM clause of the subquery, x.workdept is still used in the WHERE clause of the subquery. This is called a correlated subquery. In the above example, both the parent query and the subquery use the employee table. However, the subquery may reference completely different tables and still use columns from the parent query's tables. A correlated subquery can also appear in the SELECT and HAVING clauses, as shown below.

Listing 5
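A sketch of both variations, again assuming the SAMPLE database's EMPLOYEE table:

-- Correlated subquery in the SELECT clause
SELECT x.empno, x.salary,
       (SELECT AVG(salary)
        FROM employee
        WHERE workdept = x.workdept) AS dept_avg_salary
FROM employee x

-- Correlated subquery in the HAVING clause
SELECT workdept, AVG(salary) AS avg_salary
FROM employee e
GROUP BY workdept
HAVING AVG(salary) > (SELECT AVG(salary)
                      FROM employee
                      WHERE workdept <> e.workdept)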


A correlated reference can also be used in a nested table expression, as shown below.

Listing 6
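A sketch of such a query (DB2 requires the TABLE keyword in front of a nested table expression that contains a correlated reference):

SELECT d.deptno, d.deptname, empinfo.avgsal
FROM department d,
     TABLE (SELECT AVG(e.salary) AS avgsal
            FROM employee e
            WHERE e.workdept = d.deptno) AS empinfo   -- d.deptno comes from the department table reference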

The above example has a nested table expression named empinfo. Notice the use of d.deptno inside the query of the nested table expression empinfo.

As I mentioned earlier, a correlated reference can also be used in the arguments of a table function, and this is what solved my SQL problem. The following SQL gives the result I wanted:

Listing 7
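A sketch of the solution (the database name TEST is a placeholder):

SELECT x.partition_number, s.total_log_used, s.total_log_available
FROM TABLE(DB_PARTITIONS()) AS x,
     TABLE(SNAPSHOT_DATABASE('TEST', x.partition_number)) AS s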

In the above query, x.partition_number is used in the argument of the SNAPSHOT_DATABASE table function. Swapping the order of the DB_PARTITIONS and SNAPSHOT_DATABASE table functions in the FROM clause of the above query results in an error, because table references in a FROM clause are processed from left to right, and x.partition_number must be resolved before it is used in the argument of the SNAPSHOT_DATABASE table function. The same rule applies to correlated subqueries as well: the correlated reference must come from a table reference at a higher level in the hierarchy of subqueries.

Tuesday, September 30, 2008

Don’t Ignore Naming Convention

Recently I worked with an organization that has hundreds of DB2 databases and still maintains a uniform, consistent naming convention across all of them. The naming convention starts with the server name, instance name and database name itself. Just by looking at an instance name, you can tell whether it is a production, development or QA instance and which application it serves. The instance home directory, database path, log path, archive log path, database backup path... all follow a consistent naming convention.

In the above-mentioned environment, if a file system is getting full on any of the DB2 database servers, you immediately know whether that file system is database related and which instance/database is using it. If a new DBA joins the team, he can quickly get familiar with the large environment just by learning the conventions. If a database has to be backed up, he knows the backup target, and if a database has to be recovered, he knows where to find the backup image and archive logs, all because of the consistent naming convention.

A good naming convention not only helps a new DBA come up to speed quickly, but it also simplifies the automation of DBA tasks. Many of the decision points in automation benefit from an established convention. An automation platform like Data Palette can capture all the variations and weirdness in your database environment and still enable the decision making. However, more variation often leads to complexity in implementing and managing the automation.

Just having any naming convention is not enough; you should have a good naming convention. Below are some of the qualities of a good one.

1. Make your naming convention informative
You can name your instances db2ins01, db2ins02 and so on, but such a naming convention carries no information except that it is a DB2 instance. Instead, if your instance name tells you whether it is a production, development or QA instance, you get more information out of the name. Also, if your instance name includes INST or your database name includes DB, that provides no additional information when you already know you are looking at an instance name or a database name.

Similarly, the directory structure of your instance-, database- and table-space-level paths should incorporate the corresponding instance name, database partition number and database name hierarchy. By looking at the instance/database configuration and the system catalog, you can figure out most of the associated paths. However, the reverse mapping, i.e. knowing a path and trying to figure out who is using it, is straightforward only if you have an appropriate naming convention.

Whenever a name is derived from other base names, do not abbreviate or shorten the base names when building the derived name. For example, if your table space container path includes a directory named after your database, that directory should carry the complete database name, not an abbreviated or short form. If you use an abbreviated name, the path becomes less informative, because you now need additional data (the mapping between full name and short name) to determine whether that directory name corresponds to the database name.


2. Use fixed-width numbers in your naming convention
If your naming convention includes a sequence number, make it a fixed-width sequence number. For example, if you know you will need at most four digits for a sequence number, use 0001 instead of just 1. It is also better to have your database partition numbers (in db2nodes.cfg) fixed width. For example, if you have 16 partitions in your database, numbered 0 to 15, name your partitions 00, 01, 02 and so on. Such partition numbers have a distinct advantage when using "##" in your db2_all commands: the character sequence "##" (two hash characters) is replaced by the partition number when a command runs through db2_all. So if you use NODE00## in your db2_all command, it expands to NODE0000, NODE0001 ... NODE0015 in the above example of a 16-partition database, as shown in the sketch below.
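A backup command along these lines (the target path is hypothetical) writes each partition's backup image into a directory named after that partition:

db2_all "db2 BACKUP DB test TO /backup/NODE00##"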

No matter how good a naming convention is, if it is not followed consistently, it is not useful. I started this post with an ideal example of consistency in naming. At the other extreme, I have also come across organizations where DBAs come up with a new naming convention every time they set up a new database. To make it even worse, even within a single database the naming convention would drift over time... it all depended on the mood of the DBA while coming up with the name.

In my opinion, naming conventions should be defined as a planned effort, well documented, and strictly enforced by a change control board or a similar regulating authority. I fully agree that there are scenarios where we cannot expect a uniform naming convention across a larger organization; for example, imagine a company being acquired by another company and, as a result, the IT departments of the two companies merging into a single organization. However, naming conventions should not fall apart simply because of the ignorance or negligence of individuals in an organization.

XML Superstar Contest

Did you hear about the "Search for the XML Superstar" contest? If not, check out the details below (as provided by IBM). If you or a friend or colleague is an XML guru, here is a chance to win great prizes.

The International DB2 User Group (IDUG) is proud to announce the launch of the “Search for the XML Superstar” contest in the United States. This worldwide initiative is aimed at highlighting the benefits of DB2 and related technologies to database developers and university students.

The “Search for the XML Superstar” has five categories:
  1. XML application programming
  2. Ported applications
  3. Database gadgets
  4. Videos
  5. Query challenge
Contestants will participate in a combination of education, exposure to practical applications and “tests” that will validate understanding and ability to program IBM database technologies. Entries will be judged by an independent panel of DB2 professionals.

Participants in the contest will be eligible to win one of a series of great prizes. Those who submit star-quality entries could receive an all-expense paid trip to an IDUG conference, a new laptop, an iPod, a Sony PlayStation 3 or a Nintendo Wii system. The contest will also allow students to win opportunities for interviews and lab visits, and will enhance the skills that catch employers’ eyes on a resume. Developers will be able to gain recognition from world-class consultants and be celebrated for standing out among their peers.

The contest launches in the U.S. tomorrow. An IBM press release is coming out tomorrow, and the website (xmlchallenge.com) will launch then as well. Also, check out the IBM DB2 Query Console (beta), an application hosted in the cloud that allows you to run SQL and XQuery statements against a DB2 database. Good luck in the contest!

Wednesday, August 20, 2008

SSV Backup

As I mentioned in my previous post, DB2 Partitioning, the Database Partitioning Feature (DPF) enables DB2 to provide a very scalable database solution. Applications connecting to the database do not have to be aware of database partitioning. However, the DBAs managing a multi-partition database have to deal with some complexities. One of the administrative complexities of a multi-partition database is backup and restore. Each database partition has to be backed up separately, and the catalog partition has to be backed up before any other partition. The db2_all command is used in such scenarios to run a command against one or more database partitions, as shown below. The following example is an offline backup of a multi-partition database "test" whose catalog partition is 0.


db2_all '<<+0< db2 BACKUP DATABASE test to /backup_path'

db2_all '|<<-0< db2 BACKUP DATABASE test to /backup_path'


The first line in the above example backs up partition 0, and the second line backs up the remaining partitions in parallel. With this approach, the backup timestamps of the partitions will not be the same. Hence, we have to identify the backup image for each partition separately while restoring the database.


In DB2 9.5, you no longer have to deal with this complexity, thanks to single system view (SSV) backup. An SSV backup lets you back up some or all database partitions using the ON DBPARTITIONNUMS or ON ALL DBPARTITIONNUMS clause of the BACKUP DATABASE command. The backup images created by an SSV backup carry the same timestamp for every database partition. The SSV backup has to be run from the catalog partition. In the following example, all the database partitions are backed up using an SSV backup. Since the backup images of all the database partitions share one timestamp, all the partitions can be restored with a single command using db2_all.



db2 BACKUP DB test ON ALL DBPARTITIONNUMS ONLINE INCLUDE LOGS

db2_all "db2 RESTORE DB test TAKEN AT 2008-08-17-23.44.56.125237"

db2 ROLLFORWARD DB test TO END OF BACKUP ON ALL DBPARTITIONNUMS



Some more good news... In the above example, notice that the backup image includes the logs. Prior to version 9.5, logs could be included in an online backup image only for single-partition databases. With SSV backup, it is possible to include the logs in the online backup image of a multi-partition database as well. Finally, notice the use of the TO END OF BACKUP clause with the ROLLFORWARD command. It makes it simple to roll forward all the database partitions to the minimum recovery time and saves the effort of determining the minimum recovery time across all the database partitions.

New Poll- Backup Target

Thanks everyone for participating in my previous poll regarding DB2 autonomic features. Here is the final poll result:


Self-tuning memory manager – 14 (51%)
Automatic storage – 11 (40%)
Automatic maintenance – 8 (29%)
Configuration advisor – 7 (25%)
Health monitor – 9 (33%)
Other – 1 (3%)
None – 6 (22%)


Interestingly, the usage of these features follows almost the same order as their positions in the list. It is evident from the poll result that the top two entries, self-tuning memory manager and automatic storage, are the most popular autonomic features.


I have added a new poll... No, this is not about Democratic vs. Republican :) This one is about your database backup target. Please indicate the backup target exactly as you specify it in your backup command. For example, if your backup target is TSM, choose TSM even though TSM will eventually store the backup image on tape. If you select "Other", it would be nice if you could mention the exact target in a comment on this post. I hope there will be more participation this time, so that the poll result is more realistic. So, please don't forget to provide your input in this poll!

Tuesday, July 08, 2008

A Peek at the DB2 Log Record

Every DB2 database has a recovery log. The recovery log keeps track of all changes made to the database and is used for crash recovery or rollforward recovery. The recovery log is made up of a number of log extents, each contained in a separate file called a log file. Normally, a DBA has to deal only with the sizing, archive method and archive destination of the recovery logs. However, in some scenarios (e.g. SQL replication, HADR) you may come across a term called the log sequence number (LSN). To understand the LSN, we have to go one step further, i.e. inside the log file. Here I will explain what an LSN is and how to map an LSN to a log file and vice versa.


A log file contains a number of log records. A log sequence number (LSN) uniquely identifies a log record inside a log file. The LSN represents the relative byte address of the first byte of the log record within the recovery log. A transaction can generate more than one log record. Each log record also carries a transaction identifier, a field in the log record header; hence, all the log records written by a single transaction share a common transaction identifier.


If you have an LSN and need to find the log file that contains the log record for that LSN, you can use the db2flsn command.


db2flsn -q <input_LSN>
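For example, using the first LSN of log file S0000001.LOG from the db2logsForRfwd output shown later in this post (the -q option prints only the log file name; exact output may vary by version):

$ db2flsn -q 00000177000C
S0000001.LOG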


If you need to do the reverse, i.e. find the range of LSNs and timestamps inside a log file, it becomes a little tricky. There is a file, DB2TSCHG.HIS, under the database path that contains the history of table space changes at a log-file level. You can find the database path from a database snapshot.


db2 get snapshot for db on <DB Name> | grep "Database path"


For each log file, DB2TSCHG.HIS contains information that helps identify which table spaces are affected by that log file. Though the primary purpose of DB2TSCHG.HIS is to track table space changes, it also records the range of LSNs and timestamps within each log file. This information can be displayed using the db2logsForRfwd command, as shown below.


$ db2logsForRfwd DB2TSCHG.HIS -all
----------------------------------------------------------
Log file number : S0000000.LOG
First New Record LSN : 00000138800C
Log Extent Tail LSN : 000001770000
Backup End Time Stamp : 0
Flags : 107
pID : 1209781597 2008-05-03-02.26.37.000000 GMT
cID : 1215376321 2008-07-06-20.32.01.000000 GMT
Tablespace ID :
----------------------------------------------------------
Log file number : S0000001.LOG
First New Record LSN : 00000177000C
Log Extent Tail LSN : 000001B58000
Backup End Time Stamp : 0
Flags : 107
pID : 1215376321 2008-07-06-20.32.01.000000 GMT
cID : 1215376322 2008-07-06-20.32.02.000000 GMT
Tablespace ID :
----------------------------------------------------------
Log file number : S0000002.LOG
First New Record LSN : 000001B5800C
Log Extent Tail LSN : 000001F40000
Backup End Time Stamp : 0
Flags : 107
pID : 1215376322 2008-07-06-20.32.02.000000 GMT
cID : 1215376323 2008-07-06-20.32.03.000000 GMT
Tablespace ID : 0 2

If the DB2TSCHG.HIS file is not in the current path, provide its fully qualified path to the db2logsForRfwd command. Also note the following points regarding the DB2TSCHG.HIS file.

· This history file is available only for databases with archive logging enabled.

· This history file has entries only for archived log files, not for active log files.

· The PRUNE HISTORY command prunes this history file as well.

· When the registry variable DB2_COLLECT_TS_REC_INFO is off, history records are not written to DB2TSCHG.HIS.


In the above output, the "First New Record LSN" and "Log Extent Tail LSN" fields provide the range of LSNs in each log file. The timestamp range is available in the second part of the pID and cID fields, which stand for the previous and current LogID. Have you seen a reference to the LogID anywhere else? The header record of a backup image also includes a LogID. You can display the information in a backup image header using the db2ckbkp command, as shown below.


db2ckbkp -h <backup image filename>


You might be curious about the significance of the LogID, and so am I. But at this point, I don't know any more about it.


If you want to dive deeper into log records, check out the db2ReadLog API. You may also look at the DB2 add-on tool Recovery Expert if you need a tool for log analysis.

Sunday, June 29, 2008

Table Space Auto-resize in Real DB2 World

For the past few months, my personal time with the DB2 community has been encroached upon by excessive customer engagements; hence, no blog post for a long time. I am afraid it may continue like this for a while. However, these customer engagements brought me in direct contact with many DBAs in the DB2 community and provided an opportunity to learn about the problems and issues in their DB2 environments.


My customer engagements were primarily focused on standardizing and automating several DBA tasks using Stratavia's Data Palette (a software platform for data center automation). It initially surprised me that in many of the customer environments, table space size management is on the top-priority list for automation. But why not use DB2's built-in table space auto-resize feature? One reason could be a DB2 version older than 8.2.2 (table space auto-resize was introduced in v8.2.2). However, this was not the reason. Here I will present a gap analysis between the auto-resize feature and the actual requirements in the customer environments I was engaged with.


First of all, let's consider the typical process of table space size management. It can be a planned effort (as a result of monthly or annual table space capacity planning) or it can be ad hoc (increase the size of table spaces that are about to become 100% full). Since the auto-resize feature deals with the second scenario, I will focus on that part only. Normally, DBAs have some form of monitoring for table space size. When the size enters an alert zone (percent used above a threshold), we plan on increasing it. This is a proactive approach. Compare it with the scenario where the table space is increased only when it does not have enough space to satisfy a database request (almost 100% full); that is how the auto-resize feature works. What if there is no space available on the file system for extending the table space, or auto-resize has already reached its maximum size? Whatever the reason, if the table space cannot be extended by the auto-resize feature, it may affect the business processes using that table space. Hence a proactive approach is essential for managing table space size.


Secondly, there are environment-specific custom requirements associated with increasing the size of table spaces; for example, sending a notification whenever a table space is extended, or ensuring that extending the table space does not make the associated file system more than 95% full. Such requirements cannot be fulfilled by the auto-resize feature.


The third point is a little subtle but very important for big organizations with hundreds or thousands of databases. Imagine the administrative effort required to manage auto-resize on the table spaces in all these databases. Such organizations need a central system to apply and manage automation, rather than local solutions like the auto-resize feature.


Let me end this post with a clarification: I have no intention to denounce DB2's table space auto-resize feature. In fact, it may fit many environments, and many of you may find it useful. However, in my experience, organizations require a more comprehensive automation solution for managing table space size than DB2's auto-resize feature. I would like to hear about your experience with the table space auto-resize feature if you are using it in your environment.

Saturday, March 22, 2008

Mapping Application to Dynamic SQL

If you need to find the dynamic SQL statements in the database package cache, you can do so using the dynamic SQL snapshot. If you need to check the details of your applications, you can use the application snapshot. If you are accessing the application snapshot through a table function, you have to use the statement snapshot to find the statement currently executing in an application. However, as soon as the statement finishes executing, its text disappears from the statement snapshot. And even though the dynamic SQL snapshot still shows the statement, it carries no information about the application that executed it. Hence, if you are trying to map an application to a dynamic SQL statement, the snapshot table functions mentioned above may not be very useful.

I suffered from this limitation of DB2 for quite some time. Interestingly, other competing databases (Oracle and SQL Server) provide the link between the SQL in the cache and their sessions. However, I recently found that in version 9 the mapping between applications and dynamic SQL statements is available through the db2pd (DB2 Problem Diagnostic) tool. In fact, this db2pd enhancement was made in a fix pack after v8.2. With this enhancement, the "applications" option of db2pd provides the following new columns:

· C-AnchID : Current anchor ID

· C-StmtUID : Current statement unique ID

· L-AnchID : Last anchor ID

· L-StmtUID : Last statement unique ID

The first pair, the anchor ID and statement unique ID, can be used to find the current SQL statement being executed by an application, and the second pair can be used to find the last SQL statement executed by that application. The "dynamic" option of db2pd includes the following columns to match against an anchor ID and statement unique ID pair:

· AnchID : Anchor ID of the statement

· StmtUID : Statement unique ID

You may check "Scenario 2" under Monitoring and troubleshooting using db2pd in the DB2 Information Center for an example of mapping an application to a dynamic SQL statement. You may also run the following command and verify it yourself:

db2pd -appl -dyn -db <dbname>

However, the problem with db2pd is that you cannot run it through SQL. Hence, if you want to collect the above information on a regular basis and store it in a table as part of your database monitoring, you need to parse the db2pd output. I wish the above AnchID and StmtUID information were included in the application and dynamic SQL snapshots so that it could be accessed through the snapshot table functions using SQL. But at least we now have the mapping available through db2pd, which is much better than no mapping at all.

Friday, February 29, 2008

SQL Tips & Techniques- String Aggregation(Contd.)

In my last post, I showed two ways to perform string aggregation, both using recursive SQL. I also mentioned that a built-in function to perform string aggregation would be preferable. Though I have not found a direct function for string aggregation, one of the comments on my previous post suggested the use of the XMLAGG function. In fact, Serge Rielau used the XMLAGG function for string aggregation in his presentation "SQL on Fire! Part 1". Let's look at a query that uses the XMLAGG function to perform string aggregation.


Listing 5


SELECT category,
       SUBSTR(names, 1, LENGTH(names) - 2) AS list
FROM
  (SELECT category,
          REPLACE(
            REPLACE(
              XMLSERIALIZE(
                CONTENT XMLAGG(XMLELEMENT(NAME a, name)
                        ORDER BY name)
                AS VARCHAR(50)),
            '<A>', ''),
          '</A>', ', ') AS names
   FROM mytab1
   GROUP BY category) AS X

CATEGORY    LIST
----------- -------------------------------------------------
          1 abc, bcd
          2 cde, def, efg
          3 fgh, ghi

  3 record(s) selected.



In the above query, we first convert each name into an XML element using the XMLELEMENT function, which wraps the tags <A> and </A> around each name. After the aggregation, we strip these tags using the REPLACE function. Though the XMLAGG approach is also a workaround for string aggregation, I prefer it over recursive SQL because it is easier to understand, and in my test it had a lower estimated cost. Ideally, what we want is a function similar to the hypothetical myConcat shown below.

Listing 6


SELECT category,
       myConcat(name, ', ') AS list
FROM mytab1
GROUP BY category

CATEGORY    LIST
----------- -------------------------------------------------
          1 abc, bcd
          2 cde, def, efg
          3 fgh, ghi

  3 record(s) selected.




Is that something on your wish list too?


Wednesday, February 27, 2008

SQL Tips & Techniques- String Aggregation

In my experience, application and database developers have to deal with writing SQL queries more than a DBA (especially a system DBA) does. However, DBAs are often expected to help with writing complex queries. Keeping that in mind, I have decided to post SQL tips and techniques as I encounter them.

Recently, I had a requirement to aggregate strings by concatenating them while grouping on another column. For example, we have a table myTab1 with CATEGORY and NAME column values as shown below.

Listing 1

SELECT category, name
FROM myTab1
ORDER BY category

CATEGORY    NAME
----------- ----------
          1 abc
          1 bcd
          2 cde
          2 def
          2 efg
          3 fgh
          3 ghi

  7 record(s) selected.

The expected result has one row per CATEGORY, with all the NAME values for that CATEGORY concatenated, separated by a comma and a space. So the result should look as shown below.

Listing 2

CATEGORY    LIST
----------- ----------
          1 abc, bcd
          2 cde, def, efg
          3 fgh, ghi

This would have been simple if there were an aggregate function (similar to SUM, AVG or COUNT) to perform this task. However, there is no built-in function for it, so you can either write a function of your own or use recursive SQL. You may refer to the article "User-defined Aggregate Functions in DB2 Universal Database" if you want to write such a function. Here, I will show you a query that uses recursive SQL.

Listing 3

WITH
t1(rowNum, category, name) AS
  ( SELECT rownumber() over(partition by category),
           category, name
    FROM myTab1 ),
t2(category, list, cnt) AS
  ( SELECT category, VARCHAR(name, 50), 1
    FROM t1
    WHERE rowNum = 1
    UNION ALL
    SELECT t2.category, t2.list || ', ' || t1.name,
           t2.cnt + 1
    FROM t2, t1
    WHERE t2.category = t1.category AND
          t2.cnt + 1 = t1.rowNum )
SELECT category, list
FROM t2
WHERE ( category, cnt ) IN (
      SELECT category,
             MAX(rowNum)
      FROM t1
      GROUP BY category )
ORDER BY 1

CATEGORY    LIST
----------- -------------------------------------------------
SQL0347W The recursive common table expression "RKUMAR.T2" may contain an infinite loop. SQLSTATE=01605
          1 abc, bcd
          2 cde, def, efg
          3 fgh, ghi

  3 record(s) selected with 1 warning messages printed.

Notice that in the above query the definition of the table expression t2 refers to itself; hence we call such queries recursive. If you first run the query used in the definition of the table expression t1, and then run the entire query with t1 and t2 but without the last WHERE clause, you can easily follow the logic involved here.

DB2 version 9.5 also supports Oracle's CONNECT BY syntax for recursive queries. You may check the article "DB2 Viper 2 compatibility features" for details on the CONNECT BY syntax. We may write the query using CONNECT BY as shown below.

Listing 4

SELECT category, list
FROM
  (SELECT category, level,
          MAX(level) OVER(PARTITION BY category) AS maxlevel,
          VARCHAR(SUBSTR(SYS_CONNECT_BY_PATH(name, ', '), 3), 50)
            AS list
   FROM
     (SELECT category,
             name,
             ROW_NUMBER() OVER (PARTITION BY category
                                ORDER BY name) AS curr,
             ROW_NUMBER() OVER (PARTITION BY category
                                ORDER BY name) - 1 AS prev
      FROM myTab1)
   CONNECT BY prev = PRIOR curr AND category = PRIOR category)
WHERE level = maxlevel
ORDER BY 1

CATEGORY    LIST
----------- -------------------------------------------------
          1 abc, bcd
          2 cde, def, efg
          3 fgh, ghi

  3 record(s) selected.


In addition to CONNECT BY, the above query also uses the MAX function as an OLAP aggregation function. You may check the DB2 SQL Reference manual or the article "OLAP Functions in DB2" to learn about OLAP functions.