Imagine the performance benefits of caching your entire database (or significant portions of it) in main memory. Except for writes to the AIJ and periodic flushes of the buffer pool, all I/O is eliminated.
When global buffers were introduced in version 4.2 of Rdb and 4.3 of DBMS, it appeared that Main Memory databases were within reach. After all, DBMS could now share 32,768 buffers up to 64 blocks each (about a gigabyte). In V6.1 the maximum number of global buffers was increased to 524,288 buffers! Could systems with several gigabytes of main memory combined with a massive global buffering strategy be the answer to all I/O bottlenecks?
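The arithmetic behind those limits is worth a quick sanity check. A sketch in Python, assuming the standard 512-byte OpenVMS disk block:

```python
# Sanity-check the buffer pool sizes quoted above.
# Assumes the standard OpenVMS disk block of 512 bytes.
BLOCK_BYTES = 512

def pool_bytes(buffers, blocks_per_buffer):
    """Total size of a global buffer pool in bytes."""
    return buffers * blocks_per_buffer * BLOCK_BYTES

# V4.2 (Rdb) / V4.3 (DBMS) maximum: 32,768 buffers of 64 blocks each
print(pool_bytes(32_768, 64) / 2**30)   # 1.0 -- "about a gigabyte"

# V6.1 maximum: 524,288 buffers of 64 blocks each
print(pool_bytes(524_288, 64) / 2**30)  # 16.0 -- sixteen gigabytes
```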
Even with the advances made to global buffering in V6.0 (use of recoverable latches for managing the Global Buffer Page Table [GBPT]), and Optimized Page Transfers (introduced in V6.1), massive global buffering is still impractical in a production environment. To understand where massive global buffering breaks down, we need to understand how Rdb and DBMS global buffers are implemented, and what resources they consume.
Rdb and DBMS use VMS global pages for caching global buffers. VMS global pages allow multiple processes on a VMS system to share the same physical memory. In moderation, this strategy works well. Rdb and DBMS use one global section that contains the root information, global buffers and after-image journal data. The approximate size (GBLPAGES) of the global section can be determined from the "Derived Data" section of RMU[DBO]/DUMP/HEADER.
In Rdb and DBMS V7.0.3, global buffers could be placed in SYSTEM SPACE GLOBAL SECTIONS. This space is not subject to system paging, allowing the data to be “pinned” in memory. A major benefit of system space global sections is that a process does not have to page-fault this address space into its working set – it is essentially “free.” However, system space is a critical VMS system resource – practically speaking, you will be able to use significantly less than 1GB, regardless of the total memory installed on the system.
System Parameters (SYSGEN)
GBLPAGFIL: Global sections must be able to be entirely mapped to the global page file (which is part of the system page file). This value is directly related to the size of your global buffer pool. If this is too small, then processes will fail with an "exceeded global page file limit" exception.
GBLSECTIONS: Rdb and DBMS use only one global section to map the root file and global buffers. Therefore, use of global buffers does not affect the number of global sections.
GBLPAGES: This must be large enough for all global section pages on the system. The number of GBLPAGES required for each database with global buffers can be estimated from the "Derived Data" section of RMU[DBO]/DUMP/HEADER. When implementing a "massive global buffering" strategy, GBLPAGES must also be enormous.
VIRTUALPAGECNT: When a process binds to a database, it maps the entire database global section into virtual memory. Prior to V6.1, each stream mapped the global section separately, so the virtual pages required accumulated with each additional stream.
LOCKIDTBL: Each page in the global section (whether referenced or not) requires a page lock. Thus, the number of global buffers directly impacts this parameter.
RESHASHTBL: This parameter is directly tied to the number of locks used on the system. Under-allocating this resource will increase CPU utilization.
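As a rough illustration of how these parameters scale together (the exact figures come from the "Derived Data" section of RMU[DBO]/DUMP/HEADER; the overhead constant and the one-lock-per-buffer rule below are simplifying assumptions, not Rdb's actual accounting):

```python
def estimate_sysgen(buffers, blocks_per_buffer, overhead_pages=5_000):
    """Very rough SYSGEN estimates for one database's global section.

    overhead_pages (root file + AIJ data) is a placeholder; the exact
    figure comes from RMU[DBO]/DUMP/HEADER "Derived Data".
    """
    # GBLPAGES are counted in 512-byte pagelets, the same size as a disk block.
    pool_pages = buffers * blocks_per_buffer
    gblpages = pool_pages + overhead_pages
    locks = buffers                   # assume roughly one page lock per buffer
    reshashtbl = 1
    while reshashtbl < locks:         # power of two >= expected lock count
        reshashtbl *= 2
    return {"GBLPAGES": gblpages,     # pages for this database's section
            "GBLPAGFIL": gblpages,    # section must fit in the global page file
            "LOCKIDTBL": locks,
            "RESHASHTBL": reshashtbl}

# The V6.1 maximum configuration: over 33 million global pages are needed.
print(estimate_sysgen(buffers=524_288, blocks_per_buffer=64))
```

Even this crude sketch shows why a "massive global buffering" strategy forces every one of these parameters toward its own extreme.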
Working Set Adjustments
AWSTIME: Determines how frequently working set characteristics are sampled by VMS.
PFRATH/PFRATL: Determines the upper and lower page fault rate thresholds to begin adjustments.
WSDEC/WSINC: Determines the size of the adjustments to the working set. Consider setting WSDEC to 0 (zero).
WSMAX: Determines the maximum working set possible on the system. A process's working set cannot grow larger than this value, regardless of its WSEXTENT.
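The adjustment loop these parameters drive can be sketched as a toy model. The default values and logic below are simplifications for illustration, not the actual VMS algorithm:

```python
# Toy model of the VMS automatic working-set adjustment loop.
# Parameter names mirror SYSGEN; values and logic are illustrative only.
def adjust_working_set(ws, fault_rate, *,
                       PFRATH=120, PFRATL=0, WSINC=2_400, WSDEC=0, WSMAX=65_536):
    """Return the new working-set size after one AWSTIME sampling interval."""
    if fault_rate > PFRATH:                  # faulting too hard: grow
        ws = min(ws + WSINC, WSMAX)
    elif fault_rate < PFRATL and WSDEC:      # idle: shrink (disabled if WSDEC=0)
        ws = max(ws - WSDEC, 0)
    return ws

ws = 10_000
ws = adjust_working_set(ws, fault_rate=200)  # heavy faulting -> working set grows
print(ws)  # 12400
```

Note that with WSDEC set to 0, as suggested above, the working set is never trimmed by this loop once it has grown.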
Lookaside lists (pre OpenVMS v6.0)
NPAGEDYN: From OpenVMS V6.0 onward, the lookaside lists are allocated from nonpaged pool (NPAGEDYN). Use AUTOGEN with feedback to assist in correct sizing.
User Parameters (AUTHORIZE)
PGFLQUOTA: This parameter determines the maximum number of pages that a process may use in the system page file. Mapped global section pages (global buffers) do not count against PGFLQUOTA. However, during an unbind, when the global section is unmapped, that memory becomes local to the process. Thus, both the page file and physical memory may take a hit at unbind time.
WSEXTENT: This parameter determines the maximum size of a process's working set. Memory use beyond this value will require page faulting. A value too small will cause excessive global valid page faulting to occur.
The Rdb [DBMS] Monitor
It is the DBMS monitor process that initially creates and maps each database global section. Thus, if multiple databases are accessed simultaneously, the MONITOR process must have sufficient resources to map all database global sections.
What it all means
OK, so now we know all of the SYSGEN and AUTHORIZE parameters that need to be adjusted when working with global buffers. Used in moderation, these parameters can be set to reasonable values, and global buffers work well. But when we begin working with massive global buffers (on the order of ¼ gigabyte or more), the approach begins to break down.
Why? If we review the SYSGEN and AUTHORIZE parameters required to bind and unbind a database with massive global buffering enabled, we can see that these values must also be massive. This is not to say that the massive values are, by themselves, bad.
However, mapping and unmapping global sections is not free: the larger the global section, the more expensive the operation. When you consider the time required for each process (stream) to map and unmap half a gigabyte of memory, bind and unbind times become massive. Even in an OLTP-style environment with database servers that pre-bind to the database, massive global buffers are impractical. In the event of a database shutdown or system failure, all processes will attempt to unbind simultaneously, overloading system resources. During a recovery scenario, with multiple recovery processes attempting to bind simultaneously, system resources will again be overloaded.
Even the recovery of a single process is unacceptable, because Rdb or DBMS takes out a FREEZE lock that prevents any other process from performing database work until the recovery completes. Assuming the recovery process takes only 3 minutes to bind to the database, all other database activity is suspended until this single recovery process completes. Thus, if you have 500 processes attempting to use the database, all 500 processes are suspended for the duration of the recovery!
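The cost of that freeze is easy to quantify with the hypothetical figures above:

```python
# Aggregate work lost while a FREEZE lock stalls the database.
# The 3-minute bind time and 500 processes are the hypothetical
# figures from the text, not measured values.
recovery_bind_minutes = 3
suspended_processes = 500

lost_process_minutes = recovery_bind_minutes * suspended_processes
print(lost_process_minutes)        # 1500 process-minutes of stalled work
print(lost_process_minutes / 60)   # 25.0 process-hours
```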
That's not to say
While massive global buffers (main-memory databases) are not practical at this time, this does not mean that global buffering itself is bad. In fact, many environments can realize significant reductions in I/O by utilizing moderate levels of global buffering to cache database "hot spots."
While main memory databases still elude us, there are still many options that will help to minimize I/O bottlenecks. First, determine the following:
1. Is access to the data evenly distributed across available disk drives and controllers?
2. Is there a sufficient number of disk drives available to sustain the required I/O rates?
3. Have conventional I/O spreading techniques such as I/O balancing and disk striping been explored?
If these issues have been addressed, then further steps must be taken.
Disk caching can also improve throughput. Software Concepts International recommends using a write-through (to disk immediately) cache rather than write-back (deferred writes) unless you are operating in a single-node system. When a disk cache is used in conjunction with Rdb/DBMS buffering, a double-buffering effect occurs: Rdb/DBMS first checks its buffers for the data, then checks the disk cache. Thus, disk caches generally serve best when used for massive caching and Rdb/DBMS buffering is minimized. When attempting to cache moderate amounts of data, Rdb/DBMS global buffers typically perform better than in-memory disk caches of the same size. However, disk caching may allow significantly more data to be cached than is practical using global buffering. As always, a solid, tested recovery strategy is recommended.
Solid-state (electronic) disks (SSDs) also offer a good way to minimize I/O bottlenecks. Rather than eliminating physical I/Os, as buffering or other caching strategies do, SSDs minimize I/O bottlenecks by eliminating the seek and rotational latencies associated with conventional disk drives. Thus, SSDs are able to achieve I/O rates in excess of 2000 I/Os per second. SSDs also utilize non-volatile RAM, with a battery backup that is able to survive power failures.
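The 2000 I/Os-per-second figure follows directly from the latencies involved. A sketch with illustrative numbers (the seek and rotation times below are assumptions typical of drives of the era, not from the text):

```python
# Why removing seek + rotational latency lifts I/O rates so sharply.
# Latency figures are illustrative assumptions, not measured values.
seek_ms = 8.0        # average seek time for a conventional drive
rotation_ms = 4.2    # average rotational latency (~half a spin at 7200 rpm)
transfer_ms = 0.5    # time to actually move the data

mechanical_iops = 1000 / (seek_ms + rotation_ms + transfer_ms)
ssd_iops = 1000 / transfer_ms      # seek and rotation eliminated

print(round(mechanical_iops))   # 79 I/Os per second
print(round(ssd_iops))          # 2000 I/Os per second
```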
SSDs appear as a conventional disk drive on your system – they connect to your system through a standard disk bus. Thus, if your I/O bottleneck is at the bus level, then a SSD will not help to eliminate the bottleneck.
RAM disks (a non-option)
"RAM disks" differ from SSDs in many ways. A RAM disk is a virtual disk that resides in main system (volatile) memory – making this too risky for a production environment. A RAM disk is not connected to a disk bus, and thus is not subject to the bus bottlenecks of SSDs. However, this also means that RAM disks cannot be shared across multiple systems. In the event of a system failure, the contents of your "RAM disk" will be lost. "RAM disks" are far too risky to be used in a production critical environment.