Duplicates Slowing You Down? (DBMS only)

Are you suffering from database performance problems, but don’t understand why? Performance problems can stem from many factors, from over-utilization to poor application design, or even from inefficient algorithms within DBMS itself. In this article, we will look at one such problem, where DBMS itself may be contributing to poor performance, and discuss possible workarounds.

Large numbers of duplicates in sorted sets can cause severe performance problems when new records are inserted into the set. The problem occurs when a record is inserted with a new unique key value that sorts just after a key value that has many duplicates. In this case, DBMS needs to insert the new entry between the two key values: at the end of the long list of duplicates, and immediately before the next key value.

Assume we were to insert a “B” record into the following list: AAAAAAADD

DBMS uses the index structure to quickly determine that “A” is the last key value less than the value to be inserted, “B”. DBMS then uses the index to locate the first “A” record. From there, it simply walks the chain set until it finds a record with a key value greater than “A”; each duplicate encountered may result in an I/O. Because the index structure itself needs to be updated, DBMS holds an exclusive lock on the index, causing other processes that need this index to stall during the insertion.

Once the “B” record is inserted, the set looks as follows: AAAAAAABDD

Adding a record with a value of “C” will be efficient, because the preceding key value, “B”, is unique; DBMS will not have to walk a long list of duplicates to insert the new entry. Adding records that match existing keys is also efficient: DBMS simply uses the index structure to locate the existing entry, adds the new record to the beginning of the duplicate list, and updates the index accordingly. Thus, in the above example, adding more “A” records will be efficient. The performance problem surfaces only when inserting new key values that sort immediately after a key value with many duplicates; in that case, DBMS walks each duplicate member to find the location where the new record should be inserted, possibly incurring an I/O for each duplicate!
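
To make the cost concrete, here is a minimal Python sketch, a model of our own rather than DBMS internals, that simulates this insertion behavior; each chain member examined counts as one potential I/O.

# Minimal sketch (not DBMS internals) of the chain walk described
# above. The chain is modeled as a Python list of key values kept in
# sorted order, and the "index" maps each unique key value to the
# position of its first member. Record touches stand in for I/Os.

def insert_naive(chain, new_key):
    index = {}
    for pos, key in enumerate(chain):
        index.setdefault(key, pos)   # index points at first member only

    if new_key in index:             # existing key: link in at the head
        chain.insert(index[new_key], new_key)
        return index[new_key], 1     # of its duplicate list; one touch

    # New key: start at the first member of the preceding key value...
    preceding = [k for k in index if k < new_key]
    pos = index[max(preceding)] if preceding else 0

    # ...then walk the chain one member at a time to find the slot.
    touches = 0
    while pos < len(chain) and chain[pos] < new_key:
        touches += 1                 # each member examined may be an I/O
        pos += 1
    chain.insert(pos, new_key)
    return pos, touches

chain = list("AAAAAAADD")
print(insert_naive(chain, "B"))      # (7, 7): walks all seven "A"s
print("".join(chain))                # AAAAAAABDD
print(insert_naive(chain, "C"))      # (8, 1): "B" is unique, one step
print(insert_naive(chain, "A"))      # (0, 1): existing key, cheap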

How to diagnose this problem

How do you know if you are suffering from this problem? First, get a list of all sorted sets in your database where duplicates are allowed. This information is available from the schema ($ DBO/DUMP/SCHEMA <root-file>). For each of these sorted sets, you will want to know which ones have many duplicates. You can determine this by extracting the data from the database and counting the number of duplicates for each key value; DBO/UNLOAD is a great tool for this. Applications such as WorkStream, which have built-in performance-monitoring tools, can also be extremely helpful: look for “anomalies” in the time and I/O required to perform DML STORE, CONNECT, and RECONNECT verbs, then determine whether the records involved are members of any sorted sets that allow duplicates. DBO/SHOW STATS (stalls) can also provide insight into which resources are involved in a significant number of lock stalls; if index records are frequently involved, this too suggests that duplicate keys may be the problem.
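
As a sketch of the counting step, assuming you have already reduced the unloaded data to a plain text file with one sort-key value per line (the file name, and that preprocessing, are assumptions; the exact DBO/UNLOAD output format varies):

# Rough sketch for sizing the duplicate problem once the key data is
# out of the database. Assumes one sort-key value per line.
from collections import Counter

def duplicate_report(path, top=10):
    with open(path) as f:
        counts = Counter(line.rstrip("\n") for line in f)
    print(f"{len(counts)} unique key values")
    for key, n in counts.most_common(top):   # worst offenders first
        if n > 1:
            print(f"{n:>8} duplicates for key {key!r}")

duplicate_report("orders_sorted_by_part.keys")   # hypothetical file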

Understanding the Indexed Set Mode

An INDEXED set in DBMS is built on a simple CHAIN set, with one notable addition: besides the chain that links each member of the set, an INDEXED set also maintains a balanced-tree index structure that points to the first record with each unique key value. Thus, when performing lookups within indexed sets, DBMS can quickly traverse this index structure to find the first matching key. When inserting a new key value, DBMS uses the index structure to locate the first record with a key value that sorts before the new key. Once it finds this record, it simply walks the CHAIN set to find the correct insertion point; each duplicate member may result in an I/O, so if there are 10,000 duplicates, you are in for a long wait.

Application Problem or DBMS Problem?

Clearly, avoiding duplicates within the application would eliminate this problem… but is that really what you want to do in your application? In SCI’s opinion, this is a DBMS problem, not a problem with the underlying application. DBMS should simply (well, maybe not so simply) find the first record with a key value greater than or equal to the key to be inserted, then use the CHAIN set’s PRIOR pointer to find the insertion point. If this algorithm were used, there would be a maximum of 2 I/Os: a significant improvement!
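
Here is a sketch of that proposed algorithm, again a Python model of our own rather than DBMS internals: the index locates the first record whose key sorts at or after the new key, and the chain’s PRIOR pointer steps back one record, so the cost stays bounded no matter how many duplicates precede the new key.

class Node:
    def __init__(self, key):
        self.key, self.prior, self.next = key, None, None

def build(keys):
    """Build a doubly linked chain from already-sorted keys, plus an
    index mapping each unique key value to its first node."""
    head = tail = None
    index = {}
    for key in keys:
        node = Node(key)
        if tail is None:
            head = node
        else:
            tail.next, node.prior = node, tail
        tail = node
        index.setdefault(key, node)
    return head, index

def insert_improved(head, index, new_key):
    node = Node(new_key)
    # The real balanced tree would find this in O(log n) lookups; the
    # scan here is just the sketch's stand-in for that tree search.
    successors = [k for k in index if k >= new_key]
    if successors:
        succ = index[min(successors)]    # touch 1: first node of the
        prev = succ.prior                # next key; touch 2: its PRIOR
        node.next, node.prior = succ, prev
        succ.prior = node
        if prev is None:
            head = node
        else:
            prev.next = node
    else:                                # new key sorts last; a real
        tail = head                      # implementation would keep a
        while tail.next:                 # tail pointer instead of
            tail = tail.next             # walking to find it
        tail.next, node.prior = node, tail
    index[new_key] = node                # node is now the first member
    return head                          # of its key value

head, index = build("AAAAAAADD")
head = insert_improved(head, index, "B")   # two touches, not eight
node, out = head, []
while node:
    out.append(node.key)
    node = node.next
print("".join(out))                        # AAAAAAABDD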

Duplicates in the “real world”

Okay, so maybe Oracle is not ready to fix the problem in the next release… so, what are we to do?

Workarounds

There are a few ways you may be able to work around this deficiency in DBMS. The first is to modify the application so that duplicate key values are either eliminated or minimized. The second is to insert “dummy” records with key values that sort just slightly higher than those keys with long lists of duplicates.

Eliminating or Minimizing Duplicate Keys

Duplicate keys could be eliminated or minimized by adding an additional field to the end of the key and populating this new field with either a unique sequence number or a “random” value (which minimizes, rather than eliminates, the duplicates). This solution causes two problems. First, it requires changes to the database schema and to your application, which is not a trivial task. Second, it creates additional entries in the index structure itself, resulting in additional overhead. For example, if previously there was one key value with 1,000 duplicates, there was only one entry in the index; if we add a sequence number to the key, making all key values unique, the index itself will contain 1,000 entries (one for each unique key value). So the overhead of the index structure itself may increase significantly.
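
A sketch of the key-suffix idea follows; the field and sequence names are made up, and the real change happens in the schema and application DML, not in Python. A sequence number makes every key unique, at one index entry per record; a small random suffix is a middle ground that merely caps each duplicate chain at roughly n/buckets members.

import itertools
import random

seq = itertools.count(1)              # stands in for a real sequence

def unique_key(app_key):
    """Every stored key distinct: no duplicates, maximum index growth."""
    return f"{app_key}{next(seq):08d}"          # e.g. "A00000001"

def spread_key(app_key, buckets=16):
    """Duplicates capped, index growth capped at `buckets` per key."""
    return f"{app_key}{random.randrange(buckets):02d}"

# 1,000 "A" records: unique_key yields 1,000 index entries, while
# spread_key yields at most 16 entries of roughly 62 duplicates each.
print(unique_key("A"))
print(spread_key("A"))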

Adding Dummy Records

By adding “dummy” records with unique key values that sort immediately after long lists of duplicates, you can minimize the impact of this performance problem. However, this assumes that the “dummy” records will not adversely affect your application. The other problem is that inserting the dummy records themselves can be extremely slow; after all, for each dummy record inserted, DBMS must walk all of the preceding duplicate entries (exactly the situation we are trying to avoid in the first place).
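
One way to choose such a dummy key value, sketched below, is to append the lowest-sorting character to the heavily duplicated key, assuming the key field can actually store such a value (fixed-length, space-padded fields may need a different low-sorting suffix). Later unique keys such as “B” then land next to the dummy record instead of walking every “A” duplicate; the dummy insert itself still pays the full chain walk once.

def dummy_key(hot_key):
    return hot_key + "\x00"      # smallest possible suffix

# Sorts after the hot key but before the next real key value.
assert "A" < dummy_key("A") < "B"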



