Space Area Management
(SPAM) is a technique used by Oracle Rdb and Oracle CODASYL DBMS to
improve the storage performance of nearly full storage areas.The objective of SPAM technology is to allow the DBCS (Database
Control System) to quickly locate a page with sufficient free space to
store new records in your database.SPAM
pages maintain an inventory of available space on database storage area
pages.They enable databases with
SPAM pages to spend less time searching for free space in your database
during STORE operations.Without
SPAM technology, the DBCS would have to search each page in the storage
area until it finds a page with sufficient free space to store the new
record.
Although Rdb and DBMS both
use SPAM technology, the implementation in the two engines differs.This article examines the inner workings of SPAM technology in
Oracle CODASYL DBMS.By
understanding the purpose of SPAMs and how they work, you can make better
decisions when implementing SPAMs in your database design.You will also learn why decreasing SPAM thresholds to a tiny number
in versions prior to v4.3 didn't result in a decreased level of record
fragmentation.
The optimal SPAM values
for your database storage areas depend on the size of the records stored
in the area, the desired record placement strategy, the level of database
contention in your database as well as the "typical" order in
which records are stored on a database page.For example, storing a large record followed by several smaller
records requires different strategy than if the small records are stored
first, followed by the large records.
SPAM pages maintain a
bit-mapped inventory of the free space available on the storage pages in
your database.Two (2) bits are
used to track the free space of each storage page in your database.
SPAM Bits
Decimal
Threshold
Default
0 0
0
0
0
0 1
1
1
70
1 0
2
2
85
1 1
3
3
95
Thus, the free space on
each database page is represented by a value between zero (0) and three
(3), inclusive.These values
correspond to the threshold values you set when you create the database
(or when you restore the database using DBO/RESTORE/THRESHOLD).The SPAM entry for a page is updated whenever the amount of data on
that page crosses a threshold boundary.It
is important to note, that while the DBCS uses the SPAM pages to locate
pages with sufficient free space, the threshold values of the SPAM
pages are based on used space.
THE SPAM
ALGORITHM (pre v4.3)
The DBCS first checks the target page for free space, regardless
of SPAM thresholds.The target
page is the page in the database that the DBCS wants to store the
new record based on the PLACEMENT clause in the storage schema.If there is not enough space available on the target page for the
new record, then the DBCS checks the other pages in the current
database buffer for a page with sufficient free space to store the new
record.The current database buffer is the buffer that contains the
target page (as well as additional database pages based on the buffer
length).The DBCS only
checks the SPAM pages to look for free space when none of the pages in the
current buffer contain sufficient free space to store the new record.Thus, with versions of DBMS prior to version 4.3, it is possible to
exceed the third threshold (T3) when storing new records.That is, there was no way to define the maximum fullness of the
database pages for storing new records.This
could cause excessive fullness of the database pages, resulting in severe
record fragmentation when the DBCS adds set pointer clusters to records
during a CONNECT operation...regardless of SPAM values!
THE SPAM
ALGORITHM (after v4.3)
The SPAM algorithm has been enhanced slightly for version 4.3 of DBMS.While earlier releases of DBMS would store a new record on any page
in the buffer containing the target page regardless of SPAM values,
version 4.3 will not store a record on a page that is already at
the T3 threshold.Sites that
calculated SPAM thresholds based on the old (pre v4.3) algorithm should
adjust their SPAM thresholds to reflect the new SPAM algorithm introduced
with version 4.3.While this change
offers some improvement over the prior SPAM algorithm, what database
designers really need is the ability to specify how much free space
to leave on a page.Thus, by
knowing how many records will be stored on a page and how much each record
is likely to grow (due to set insertions and dynamic data items), it would
be possible to reserve this space on the page for future record expansion.
SPAM
PARAMETERS
There are two database parameters that affect Space Management; the threshold
values, and the interval.Selecting
optimal values for SPAM parameters can dramatically improve database
performance. Using the default values is guaranteed to be a poor
choice.
The threshold values
determine the percent fullness at which the SPAM entries are updated,
while the interval determines how many database storage pages aremanaged
by each SPAM page.
SPAM
INTERVAL
The /INTERVAL qualifier determines how many data pages are managed by each
SPAM page in a database storage area.Increasing
the SPAM interval may reduce I/O, but may cause additional contention for
SPAM pages in a high contention update environment.The default interval for SPAM pages is 256.This means that, by default, 256 database storage pages are managed
by each SPAM page.This is also the
minimum value that you can specify for the SPAM interval.The maximum SPAM interval value is determined by the following
equation:
The value of 512 is the
number of bytes per block.The
value of 22 is the number of bytes in the SPAM page header, and the value
of 4 is the number of SPAM entries (two bits) per byte (eight bits).
In general, the larger
your storage area, the larger the SPAM interval value should be.This methodology optimizes SPAM effectiveness by reducing the free
space search time, while balancing the likelihood of SPAM page conflict.Low contention environments should use the maximum interval value,
while high-contention update environments should lower their
interval values.
The trade-off when setting
SPAM intervals is data page I/O (searching for free space in which to
store new records) versus SPAM page conflicts (which results in additional
synchronization and I/O to the SPAM page).DBMS provides statistics to help you measure the performance impact
of both sides of the SPAM interval equation.
INTERVAL
STATISTICS
The best measure of SPAM page contention can be seen on the PIO statistics
page of DBO STATISTICS.The
statistic, "unmark buffer – SPAM page,"
measures how often SPAM pages are being written back to the database (SPAM
page write I/Os).However, this
statistic is not a direct measure of SPAM page contention, since SPAM
pages are written back to the database due to COMMIT/ROLLBACK, buffer
overflows as well as contention.
PIO
STATISTICS
If the "unmark buffer – SPAM
page" statistic is significantly higher than the "unmark buffer –
transaction," this is a good indication of SPAM page conflicts.Another measure of SPAM page contention can be found by comparing
the "unmark buffer -- SPAM page" with the "unmark buffer – lock
conflict" statistic.However,
use caution when making this comparison – the
"unmark-buffer – lock
conflict" statistic includes allpage buffers – not just
the SPAM pages.SPAM page
contention also shows up in the page locking statistics.Increased SPAM contention would cause an increase in blocking ASTs
for page locks.However, since the
page locks also manage the entire buffer pool, it is difficult (if
not impossible) to determine the severity or cost of SPAM page conflicts
from this statistic.
SPAM
THRESHOLDS
You can specify up to three threshold values for the /THRESHOLD qualifier.It is possible to set two or more of the threshold values to the
same number, thus effectively eliminating one of the threshold levels.If you have only one or two record-types of different lengths in
your storage area, use only one or two threshold values to manage space in
that storage area.Reducing the
number of thresholds may reduce the frequency in which the SPAM pages
are updated, thus reducing SPAM page conflicts.By default, DBMS sets three thresholds, with the values 70, 85,
and 95 percent.
SPAM threshold values are
based on the percentage of used space on a database page.However, to effectively plan your threshold values, you must think
of threshold values in terms of how much free space is guaranteed
to exist on a database page for each threshold value.The following table helps to illustrate this concept:
Threshold
SPAM Value
Guaranteed Percent Free
0
0
30%
1
70
15%
2
85
5%
3
95
0%
Thus, if you attempt to store a record that occupies 65% of the database
page, the DBCS will check every page that is at the T0 threshold.The DBCS will not waste time checking those pages whose threshold
values exceed the T0 level, since they do not guarantee sufficient free
space for the new record.Note,
that in this example, it is still possible that the DBCS will physically
examine several database pages since the lowest threshold value (70)
guarantees only 30% free space, while 65 percent is required to store this
new record.Thus, if most or all of
the pages in the storage area are above 35% fullness, the DBCS will yield
little or no benefit from the SPAM thresholds.An approximation of the efficiency of this threshold value can be
measured by using the following equation:
In this example, only 50
percent of the pages at the T0 threshold will have sufficient space to
store the new record! This equation is only approximate, since it assumes
a normal distribution of page percentage fullness.In practice, the actual distribution page fullness tends not
to follow a normal distribution.
The objective when
selecting thresholds, is to choose values that will minimize the amount of
physical I/O necessary to store new records in the database.Since the record size, placement strategies and update activity is
likely to vary from area to area you must calculate and use different
threshold values for each area in your database.
SPAM
ALGORITHMS
To determine the optimal threshold values, we will need to understand more
about the free space search algorithm used by VAX DBMS.
MAX_FREE_LEN...
The actual amount of space that SPAMS are based on is called the
"maximum free length."This
value represents the amount of space available on a database page for
storing user records (page size less page overhead).The calculation for the max_free_len is different for areas with
snapshots and those areas without snapshots.
The value of 512 is the
number of bytes per block.The
value of 24 is the number of bytes in the database page header.The first value of 4 is the number of bytes per line-index entry,
the second value of 4 (pages with snapshots only) is the number of bytes
per TSN (Transaction Sequence Numbers are used by snapshot transactions)
index entry.The value of 18 (pages
with snapshots only) is the number of bytes in the page footer.The value of 10 is the minimum number of bytes reserved for the
SYSTEM record on each page.
SPAM thresholds are
percentages of MAX_FREE_LEN, and are always rounded down to the next even
byte.Thus, the MAX_FREE_LEN of a
storage area with snapshots and a page size of 2 blocks per page is (1024
- 60) or 964 bytes.Using the
default thresholds of 70, 85, and 95, the actual threshold byte-values are
674, 818, and 914 respectively.For
an existing database, you can confirm these values by issuing the
following command:
$ DBO/DUMP
root-file-spec/OPT=DEBUG
For each storage area,
look for the following symbols:
SPAM_T1 = 1st SPAM
threshold in bytes
SPAM_T2 = 2nd SPAM threshold in bytes
SPAM_T3 = 3rd SPAM threshold in bytes
DETERMINING
REQUIRED SPACE...
Next, we need to understand how much space the DBCS will look for when
storing a new record.The DBCS uses
the following equation to determine the required space needed to store the
new record:
The value of record-len
is the size of the static data portion of the record, as
defined by the storage schema.It
is easy to determine this value for each of the record-types in your
database by using DBO/DUMP/STORAGE/OPTION = DEBUG and searching for the
value of the "Static data portion length" of each record-type.
PAGE_PAD_LEN...
The DBCS actually reserves a portion of each database page (based on the
PAGE_PAD_LEN) for future record growth on the page.The value of PAGE_PAD_LEN is currently calculated as 10 percent of
the physical page size (blocks-per-page * 512) * 0.10. Thus, any threshold
value greater than 90% (such as the default T3 value of 95%) is useless!
The value of PAGE_PAD_LEN for each area can be verified by dumping the
database root file as described above.Search
for the value of PAGE_PAD_LEN for each storage area in your database. As
stated previously, the value used by the DBCS for the record length, is
the length of the static data portion of the record.This value can be verified using DBO/DUMP/STORAGE/OPTION = DEBUG
and looking for the value of the "Static data portion length"
for each record-type in your database.This
value does not include the length of known (automatic) set pointer
clusters or the length of dynamic storage items.In other words, the DBCS does not use the actual record-length of
the storage segment – the value
used by the DBCS will always be less than the actual record
length.
When looking at the SPAM
pages, the DBCS looks for the largest SPAM threshold (in terms of bytes)
that is less than or equal to MAX_FREE_LEN - REQUIRED_LEN.Pages at the T3 threshold are never searched, since they
cannot guarantee any free space.Pages
at the T0 level are always searched, since they guarantee the
"most" free space.The
following SPAM Selection Example Table illustrates what threshold value
DBCS will search for using the default thresholds for a database area with
snapshots and a page size of two blocks per page, and assuming a page pad
length of ten percent (102 bytes):
THRESHOLD
STATISTICS
The objective when setting threshold values is to minimize the number of
pages the DBCS searches when storing new data.DBMS provides several statistics that can help you measure the
effectiveness of your SPAM thresholds.
RECORD
STATISTICS
The Record Statistic, Pages Checked, is the singlemost
useful statistic for determining the overall effectiveness of your SPAM
thresholds.This statistic shows the number of data pages that were checked
before the DBCS found a page with sufficient space to store the record.By comparing the Pages Checked statistic, with the Records
Stored statistic, you can get a very accurate picture of the
effectiveness of your SPAM thresholds.In
a well-tuned database, Pages Checked should be no more than 2 times
Records Stored.
PIO
STATISTICS
The PIO Statistic, "SPAM Page Request", provides one measure of
the effectiveness of your SPAM strategy.A
high number of SPAM page requests relative to the database activity may
indicate that your SPAM interval is too low, and that the DBCS is having a
difficult time locating free space for new records.This number will also increase when the storage area becomes over
utilized.
SPAM
SELECTION EXAMPLE
Threshold Level
Max
Free Bytes
Max
Free Len
Guar Free Bytes
Page Pad Len
Max
Record Size
When Threshold is Used
T3
914
964
0
102
= -107
Never Searched!
T2
818
964
50
102
= -57
Never Searched!
T1
674
964
146
102
= 39
Searched only if the record-len <= 39
T0
0
964
290
102
= 183
Always searched
GUARANTEED_FREE_BYTES
= Max_Free_Len - Max_Free_Bytes(T-1)
PAGE_PAD_LEN
= 10% of Page size in bytes
MAX_RECORD_SIZE
= Guaranteed_Free_Bytes - Page_Pad_Len - 5