HomeSearchSite MapContact Us RemoteDBA Services for CODASYL DBMS and RdbPreserving Mission Critical IT Applications Through Virtualization

Remote Management of OpenVMS SystemsExtreme Performance Consulting


DATABASE
ADMINISTRATION

Space Area Mangement Pages (SPAM) in DBMS (DBMS only)
TECHNICAL ARTICLES -> PERFORMANCE ARTICLES
Back ]

Space Area Management (SPAM) is a technique used by Oracle Rdb and Oracle CODASYL DBMS to improve the storage performance of nearly full storage areas. The objective of SPAM technology is to allow the DBCS (Database Control System) to quickly locate a page with sufficient free space to store new records in your database. SPAM pages maintain an inventory of available space on database storage area pages. They enable databases with SPAM pages to spend less time searching for free space in your database during STORE operations. Without SPAM technology, the DBCS would have to search each page in the storage area until it finds a page with sufficient free space to store the new record.

Although Rdb and DBMS both use SPAM technology, the implementation in the two engines differs. This article examines the inner workings of SPAM technology in Oracle CODASYL DBMS. By understanding the purpose of SPAMs and how they work, you can make better decisions when implementing SPAMs in your database design. You will also learn why decreasing SPAM thresholds to a tiny number in versions prior to v4.3 didn't result in a decreased level of record fragmentation.

The optimal SPAM values for your database storage areas depend on the size of the records stored in the area, the desired record placement strategy, the level of database contention in your database as well as the "typical" order in which records are stored on a database page. For example, storing a large record followed by several smaller records requires different strategy than if the small records are stored first, followed by the large records.

SPAM pages maintain a bit-mapped inventory of the free space available on the storage pages in your database. Two (2) bits are used to track the free space of each storage page in your database.

SPAM Bits Decimal Threshold Default
0 0 0 0 0
0 1 1 1 70
1 0 2 2 85
1 1 3 3 95

 

Thus, the free space on each database page is represented by a value between zero (0) and three (3), inclusive. These values correspond to the threshold values you set when you create the database (or when you restore the database using DBO/RESTORE/THRESHOLD). The SPAM entry for a page is updated whenever the amount of data on that page crosses a threshold boundary. It is important to note, that while the DBCS uses the SPAM pages to locate pages with sufficient free space, the threshold values of the SPAM pages are based on used space.

THE SPAM ALGORITHM (pre v4.3)
The DBCS first checks the target page for free space, regardless of SPAM thresholds. The target page is the page in the database that the DBCS wants to store the new record based on the PLACEMENT clause in the storage schema. If there is not enough space available on the target page for the new record, then the DBCS checks the other pages in the current database buffer for a page with sufficient free space to store the new record. The current database buffer is the buffer that contains the target page (as well as additional database pages based on the buffer length). The DBCS only checks the SPAM pages to look for free space when none of the pages in the current buffer contain sufficient free space to store the new record. Thus, with versions of DBMS prior to version 4.3, it is possible to exceed the third threshold (T3) when storing new records. That is, there was no way to define the maximum fullness of the database pages for storing new records. This could cause excessive fullness of the database pages, resulting in severe record fragmentation when the DBCS adds set pointer clusters to records during a CONNECT operation...regardless of SPAM values!

THE SPAM ALGORITHM (after v4.3)
The SPAM algorithm has been enhanced slightly for version 4.3 of DBMS. While earlier releases of DBMS would store a new record on any page in the buffer containing the target page regardless of SPAM values, version 4.3 will not store a record on a page that is already at the T3 threshold. Sites that calculated SPAM thresholds based on the old (pre v4.3) algorithm should adjust their SPAM thresholds to reflect the new SPAM algorithm introduced with version 4.3. While this change offers some improvement over the prior SPAM algorithm, what database designers really need is the ability to specify how much free space to leave on a page. Thus, by knowing how many records will be stored on a page and how much each record is likely to grow (due to set insertions and dynamic data items), it would be possible to reserve this space on the page for future record expansion.

SPAM PARAMETERS
There are two database parameters that affect Space Management; the threshold values, and the interval. Selecting optimal values for SPAM parameters can dramatically improve database performance. Using the default values is guaranteed to be a poor choice.

The threshold values determine the percent fullness at which the SPAM entries are updated, while the interval determines how many database storage pages are managed by each SPAM page.

SPAM INTERVAL
The /INTERVAL qualifier determines how many data pages are managed by each SPAM page in a database storage area. Increasing the SPAM interval may reduce I/O, but may cause additional contention for SPAM pages in a high contention update environment. The default interval for SPAM pages is 256. This means that, by default, 256 database storage pages are managed by each SPAM page. This is also the minimum value that you can specify for the SPAM interval. The maximum SPAM interval value is determined by the following equation:

MAXIMUM_SPAM_INTERVAL = ((blocks-per-page * 512) - 22) * 4

The value of 512 is the number of bytes per block. The value of 22 is the number of bytes in the SPAM page header, and the value of 4 is the number of SPAM entries (two bits) per byte (eight bits).

In general, the larger your storage area, the larger the SPAM interval value should be. This methodology optimizes SPAM effectiveness by reducing the free space search time, while balancing the likelihood of SPAM page conflict. Low contention environments should use the maximum interval value, while high-contention update environments should lower their interval values.

The trade-off when setting SPAM intervals is data page I/O (searching for free space in which to store new records) versus SPAM page conflicts (which results in additional synchronization and I/O to the SPAM page). DBMS provides statistics to help you measure the performance impact of both sides of the SPAM interval equation.

INTERVAL STATISTICS
The best measure of SPAM page contention can be seen on the PIO statistics page of DBO STATISTICS. The statistic, "unmark buffer SPAM page," measures how often SPAM pages are being written back to the database (SPAM page write I/Os). However, this statistic is not a direct measure of SPAM page contention, since SPAM pages are written back to the database due to COMMIT/ROLLBACK, buffer overflows as well as contention.

PIO STATISTICS
If the "unmark buffer SPAM page" statistic is significantly higher than the "unmark buffer transaction," this is a good indication of SPAM page conflicts. Another measure of SPAM page contention can be found by comparing the "unmark buffer -- SPAM page" with the "unmark buffer lock conflict" statistic. However, use caution when making this comparison the "unmark-buffer lock conflict" statistic includes all page buffers not just the SPAM pages. SPAM page contention also shows up in the page locking statistics. Increased SPAM contention would cause an increase in blocking ASTs for page locks. However, since the page locks also manage the entire buffer pool, it is difficult (if not impossible) to determine the severity or cost of SPAM page conflicts from this statistic.

SPAM THRESHOLDS
You can specify up to three threshold values for the /THRESHOLD qualifier. It is possible to set two or more of the threshold values to the same number, thus effectively eliminating one of the threshold levels. If you have only one or two record-types of different lengths in your storage area, use only one or two threshold values to manage space in that storage area. Reducing the number of thresholds may reduce the frequency in which the SPAM pages are updated, thus reducing SPAM page conflicts. By default, DBMS sets three thresholds, with the values 70, 85, and 95 percent.

SPAM threshold values are based on the percentage of used space on a database page. However, to effectively plan your threshold values, you must think of threshold values in terms of how much free space is guaranteed to exist on a database page for each threshold value. The following table helps to illustrate this concept:

Threshold SPAM Value Guaranteed Percent Free
0 0 30%
1 70 15%
2 85 5%
3 95 0%


Thus, if you attempt to store a record that occupies 65% of the database page, the DBCS will check every page that is at the T0 threshold. The DBCS will not waste time checking those pages whose threshold values exceed the T0 level, since they do not guarantee sufficient free space for the new record. Note, that in this example, it is still possible that the DBCS will physically examine several database pages since the lowest threshold value (70) guarantees only 30% free space, while 65 percent is required to store this new record. Thus, if most or all of the pages in the storage area are above 35% fullness, the DBCS will yield little or no benefit from the SPAM thresholds. An approximation of the efficiency of this threshold value can be measured by using the following equation:

SPAM_efficiency =

SPAM_threshold - (100 - pct_needed)
SPAM_threshold - SPAM _ threshold (-1)

= 70 - (100 - 65) = 50%
70 - 0

In this example, only 50 percent of the pages at the T0 threshold will have sufficient space to store the new record! This equation is only approximate, since it assumes a normal distribution of page percentage fullness. In practice, the actual distribution page fullness tends not to follow a normal distribution.

The objective when selecting thresholds, is to choose values that will minimize the amount of physical I/O necessary to store new records in the database. Since the record size, placement strategies and update activity is likely to vary from area to area you must calculate and use different threshold values for each area in your database.

SPAM ALGORITHMS
To determine the optimal threshold values, we will need to understand more about the free space search algorithm used by VAX DBMS.

MAX_FREE_LEN...
The actual amount of space that SPAMS are based on is called the "maximum free length." This value represents the amount of space available on a database page for storing user records (page size less page overhead). The calculation for the max_free_len is different for areas with snapshots and those areas without snapshots.
 

Pages with snapshots:

MAX_FREE_LEN =
(blocks-per-page * 512)
- (24 + 4 + 4 + 18 + 10)
or
(blocks-per-page * 512) - 60

Pages without snapshots:

MAX_FREE_LEN =
(blocks-per-page * 512)
- (24 + 4 + 10)
or
= (blocks-per-page * 512) - 38

The value of 512 is the number of bytes per block. The value of 24 is the number of bytes in the database page header. The first value of 4 is the number of bytes per line-index entry, the second value of 4 (pages with snapshots only) is the number of bytes per TSN (Transaction Sequence Numbers are used by snapshot transactions) index entry. The value of 18 (pages with snapshots only) is the number of bytes in the page footer. The value of 10 is the minimum number of bytes reserved for the SYSTEM record on each page.

SPAM thresholds are percentages of MAX_FREE_LEN, and are always rounded down to the next even byte. Thus, the MAX_FREE_LEN of a storage area with snapshots and a page size of 2 blocks per page is (1024 - 60) or 964 bytes. Using the default thresholds of 70, 85, and 95, the actual threshold byte-values are 674, 818, and 914 respectively. For an existing database, you can confirm these values by issuing the following command:

$ DBO/DUMP root-file-spec/OPT=DEBUG

For each storage area, look for the following symbols:

SPAM_T1 = 1st SPAM threshold in bytes
SPAM_T2 = 2nd SPAM threshold in bytes
SPAM_T3 = 3rd SPAM threshold in bytes

DETERMINING REQUIRED SPACE...
Next, we need to understand how much space the DBCS will look for when storing a new record. The DBCS uses the following equation to determine the required space needed to store the new record:

REQUIRED_LEN =
MAX(record-len + 5, 10)
+ PAGE_PAD_LEN

The value of record-len is the size of the static data portion of the record, as defined by the storage schema. It is easy to determine this value for each of the record-types in your database by using DBO/DUMP/STORAGE/OPTION = DEBUG and searching for the value of the "Static data portion length" of each record-type.

PAGE_PAD_LEN...
The DBCS actually reserves a portion of each database page (based on the PAGE_PAD_LEN) for future record growth on the page. The value of PAGE_PAD_LEN is currently calculated as 10 percent of the physical page size (blocks-per-page * 512) * 0.10. Thus, any threshold value greater than 90% (such as the default T3 value of 95%) is useless! The value of PAGE_PAD_LEN for each area can be verified by dumping the database root file as described above. Search for the value of PAGE_PAD_LEN for each storage area in your database. As stated previously, the value used by the DBCS for the record length, is the length of the static data portion of the record. This value can be verified using DBO/DUMP/STORAGE/OPTION = DEBUG and looking for the value of the "Static data portion length" for each record-type in your database. This value does not include the length of known (automatic) set pointer clusters or the length of dynamic storage items. In other words, the DBCS does not use the actual record-length of the storage segment the value used by the DBCS will always be less than the actual record length.

When looking at the SPAM pages, the DBCS looks for the largest SPAM threshold (in terms of bytes) that is less than or equal to MAX_FREE_LEN - REQUIRED_LEN. Pages at the T3 threshold are never searched, since they cannot guarantee any free space. Pages at the T0 level are always searched, since they guarantee the "most" free space. The following SPAM Selection Example Table illustrates what threshold value DBCS will search for using the default thresholds for a database area with snapshots and a page size of two blocks per page, and assuming a page pad length of ten percent (102 bytes):

THRESHOLD STATISTICS
The objective when setting threshold values is to minimize the number of pages the DBCS searches when storing new data. DBMS provides several statistics that can help you measure the effectiveness of your SPAM thresholds.

RECORD STATISTICS
The Record Statistic, Pages Checked, is the single most useful statistic for determining the overall effectiveness of your SPAM thresholds. This statistic shows the number of data pages that were checked before the DBCS found a page with sufficient space to store the record. By comparing the Pages Checked statistic, with the Records Stored statistic, you can get a very accurate picture of the effectiveness of your SPAM thresholds. In a well-tuned database, Pages Checked should be no more than 2 times Records Stored.

PIO STATISTICS
The PIO Statistic, "SPAM Page Request", provides one measure of the effectiveness of your SPAM strategy. A high number of SPAM page requests relative to the database activity may indicate that your SPAM interval is too low, and that the DBCS is having a difficult time locating free space for new records. This number will also increase when the storage area becomes over utilized.

SPAM SELECTION EXAMPLE
Threshold Level Max Free Bytes Max Free Len Guar Free Bytes Page Pad Len Max Record Size When Threshold is Used
T3 914 964 0 102 = -107 Never Searched!
T2 818 964 50 102 = -57 Never Searched!
T1 674 964 146 102 = 39 Searched only if the record-len <= 39
T0 0 964 290 102 = 183 Always searched

GUARANTEED_FREE_BYTES = Max_Free_Len - Max_Free_Bytes(T-1)
PAGE_PAD_LEN = 10% of Page size in bytes
MAX_RECORD_SIZE = Guaranteed_Free_Bytes - Page_Pad_Len - 5

 

Top


If you have a technical question about this article or about any other 
CODASYL DBMS or Rdb topic, then ask our experts.

How would you rate this article?

5 (Highest)
4
3
2

0 (Did not apply)
Comments about this article or topic suggestions for new articles

Please type the word above:

Copyright 2016 Software Concepts International
All Rights Reserved