Understanding DBAM and Storage Management

DBAM requires physical disk storage to be allocated to it for the directory database and the transaction logs. Before you can set up the DBAM database, you or other administrators at your site need to perform the following tasks:

First, you need to examine your directory database requirements and determine how much disk storage you need for the DBAM database.
Then, you must decide on a fault-tolerant strategy for maximizing the DBAM database’s availability and reliability.
The next step is to allocate storage for DBAM on the operating system platform on which DirX Directory is to run according to your database requirements.
Finally, you need to describe the allocated disk storage to the DBAM database.

This chapter provides background information about disk technologies that may help you to understand the implications and requirements of the DBAM database on the physical disk configuration at your site. It briefly describes:

The anatomy of a disk device. This information provides an introduction to the components and terminology used to describe a disk device.
Redundant Array of Independent Disks (RAID) structure and management. This information will help you to understand RAID-based fault-tolerant configurations that can be used with DirX Directory to maximize its availability and reliability. It also makes recommendations about the fault-tolerant configurations best suited to DirX Directory.
The disk management architectures and tools of the two operating systems on which DirX Directory can be installed: Microsoft Windows and Linux. This information will help you to understand how the DirX Directory host operating system’s physical disks can be configured and the process for configuring them.

The chapter also describes the DBAM "view" of the underlying disk storage and introduces the tools you use to configure it to the storage you have allocated for its use.

Disk Hardware Concepts

A magnetic disk device consists of a set of platters on which data resides and a set of disk arms with heads that read and write data to and from the platters. Data is stored on both sides of a platter. Each arm-head pair reads and writes one side of a platter. For example, head 0 controls the top side of platter 0 (the topmost platter), and head 1 controls the bottom side of platter 0. The following figure illustrates these components.

Figure 1. Platter, Disk Arm, and Head Relationships

A platter consists of a number of tracks, for example 1024. Tracks exist on both sides of the platter. A cylinder is the set of tracks at the same position on the top and bottom sides of every platter. For example, cylinder 0 consists of the outermost top and bottom tracks of each platter. The following figure illustrates these components.

Figure 2. Platter, Track and Cylinder Relationships

A track is divided into sectors. A sector is the smallest physical storage unit on a disk. It has traditionally been 512 bytes in size, although many newer drives use 4096-byte sectors. Operating systems allocate sectors in groups called clusters. The following figure illustrates these components.

Track, Sector and Cluster Relationships

The operating system managing the disk device translates the cylinder, head, and sector physical format into a logical format.
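The translation from a physical address to a logical one can be sketched in a few lines. The following Python fragment is only an illustration of the conventional cylinder-head-sector to logical block mapping, not a description of what any particular operating system or DirX Directory does. The function names and the sample geometry (1024 cylinders, 16 heads, 63 sectors per track) are assumptions chosen for the example, and the sector size is the 512-byte value mentioned above.

```python
SECTOR_BYTES = 512  # traditional sector size; an assumption for this sketch


def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Map a (cylinder, head, sector) address to a zero-based logical block number.

    Cylinders and heads are numbered from 0; sectors are numbered from 1.
    """
    if cylinder < 0:
        raise ValueError("cylinder out of range")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def disk_capacity_bytes(cylinders, heads_per_cylinder, sectors_per_track):
    """Total capacity implied by a cylinder/head/sector geometry."""
    return cylinders * heads_per_cylinder * sectors_per_track * SECTOR_BYTES


# Hypothetical geometry, used only to exercise the functions.
first_block = chs_to_lba(0, 0, 1, heads_per_cylinder=16, sectors_per_track=63)
next_head = chs_to_lba(0, 1, 1, heads_per_cylinder=16, sectors_per_track=63)
capacity = disk_capacity_bytes(1024, 16, 63)
```

With this numbering, all sectors of one track are consecutive logical blocks, followed by the track under the next head of the same cylinder, so that reading consecutive blocks requires as little arm movement as possible.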

RAID Disk Management

A RAID system consists of an intelligent manager, either implemented in hardware or software, that can manage multiple disk drives so that the system can withstand the failure of any individual member without a loss of data. RAID provides a method of accessing multiple individual disks as if the array of disks is one large disk. Data access is spread over these multiple disks, reducing the risk of losing all data if one drive fails, and improving access time. RAID improves availability, but it cannot replace backup.

RAID Levels

The RAID Advisory Board has created industry standards for levels of RAID functionality based on the 1988 University of California, Berkeley RAID papers. Basic RAID levels of interest to this discussion include:

Disk striping (RAID level 0)
Disk mirroring (RAID level 1)
Disk striping and mirroring (RAID level 0+1, or 10)
Distributed parity RAID (RAID level 5)

Disk Striping (RAID-0)

Disk striping is a performance-oriented data mapping technique that provides no fault tolerance at all. Data is written in blocks across multiple disks so that one drive can be writing or reading a block while the next drive is seeking the next block. The following figure illustrates disk striping.

Figure 3. Disk Striping (RAID 0)

Disk striping provides a higher access rate and full use of the disk array’s capacity. However, if one disk fails, the entire group fails and data cannot be accessed until the disk has been repaired and the data has been restored.
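As a rough illustration of the mapping, the sketch below distributes logical blocks round-robin over the disks of a striped set. It assumes a stripe unit of exactly one block and a hypothetical function name; real RAID controllers let you choose the stripe size and may lay out data differently.

```python
def locate_block(logical_block, disk_count):
    """Return (disk index, block index on that disk) for a striped logical block.

    Assumes a stripe unit of one block and round-robin placement.
    """
    if disk_count < 1:
        raise ValueError("a striped set needs at least one disk")
    if logical_block < 0:
        raise ValueError("logical block numbers start at 0")
    return logical_block % disk_count, logical_block // disk_count


# Consecutive logical blocks land on different disks of a three-disk set,
# so neighbouring blocks can be read or written in parallel.
placement = [locate_block(block, 3) for block in range(6)]
```

Because every disk holds part of every large data set, the same mapping also shows why the loss of a single disk makes the whole striped set unusable.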

Disk Mirroring (RAID-1)

Disk mirroring provides an identical twin (or more) for a selected disk. Data is written twice—once to each disk. If there is a read failure on one of the disks, the RAID system can read the data from the other disk in the mirror set. The following figure illustrates disk mirroring.

Disk Mirroring (RAID 1)

Disk mirroring provides very good fault tolerance. It may give better read performance because data can be read from both (or all) disks in parallel, but writes must go to several disks, so writes can be more expensive. However, only half the available disk space can be used for storage, as the other half is needed to create the mirror. This makes the mirrored disk configuration more expensive to implement.

Mirrored Striped Set (RAID 0+1)

A mirrored striped set combines both disk mirroring and disk striping without parity. When data is written to a mirrored striped set, two mirrored virtual disks are created instead of just one virtual disk, as with striping. This configuration provides fast data access, like disk striping, and single-disk fault tolerance, like disk mirroring. However, like mirrored disks, only half the available disk space can be used for storage; the other half must be used to create the mirror.

Distributed Parity (RAID-5)

Parity adds fault tolerance to disk striping by including parity information with the data for error recovery. In Distributed Parity (RAID level 5), the parity information is distributed on the different disks (rather than being contained on a dedicated disk, as is the case with RAID levels 3 and 4). The parity information is used to recover data if one disk fails.
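The recovery step can be illustrated with a small sketch. RAID-5 implementations typically compute the parity block as the bytewise exclusive OR (XOR) of the data blocks in a stripe; the Python fragment below assumes this scheme and uses a hypothetical helper name and made-up block contents. It is not taken from any specific RAID product.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a sequence of equally sized blocks."""
    blocks = list(blocks)
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe with three data blocks (sample contents only).
data = [b"DirX", b"DBAM", b"RAID"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1]: XOR of the surviving
# data blocks and the parity block yields the missing block again.
rebuilt = xor_blocks([data[0], data[2], parity])
```

The same property explains why only one failed disk per stripe can be tolerated: with two blocks missing, the XOR of the remaining blocks no longer identifies either of them.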

The disadvantage of this configuration is a slower write cycle (n-2 reads and 2 writes for each block written, where n is the number of disks in the array). The usable capacity of the array is that of n-1 disks, and the configuration requires a minimum of three disks. Whereas RAID 0+1 requires 100% additional capacity to protect the data, RAID-5 requires as little as 50% (2+1) and commonly only 20% (5+1).
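The percentages quoted above express the redundant capacity as a fraction of the usable data capacity. The following sketch reproduces that arithmetic; the function names are hypothetical and the write cost simply restates the figure given in the text.

```python
def raid5_usable_disks(disk_count):
    """Number of disks' worth of capacity available for data in a RAID-5 array."""
    if disk_count < 3:
        raise ValueError("RAID-5 requires a minimum of three disks")
    return disk_count - 1


def raid5_overhead(disk_count):
    """Redundant capacity as a fraction of usable data capacity."""
    return 1 / raid5_usable_disks(disk_count)


def raid5_write_ios(disk_count):
    """(reads, writes) per block written, using the n-2 / 2 figure from the text."""
    raid5_usable_disks(disk_count)  # reuse the minimum-disk check
    return disk_count - 2, 2


small_array = raid5_overhead(3)   # 2 data disks + 1 disk of parity
common_array = raid5_overhead(6)  # 5 data disks + 1 disk of parity
```

By the same measure, a mirrored configuration always has an overhead of 1.0 (100%), because every data disk needs one additional disk for its copy.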

Recommendations for a DirX Directory Fault Tolerant Configuration

As mentioned earlier, RAID systems can be implemented as software-based systems or hardware-based systems. If you plan to provide a fault-tolerant environment for DirX Directory operation (it is an option, not a requirement), we recommend the following:

Choose a hardware-based fault-tolerant solution. There is a wide choice of hardware-based fault-tolerant solutions offered by both computer manufacturers and third-party suppliers.
Implement the RAID 0+1 configuration. If disk economy is an issue, implement the RAID-5 configuration instead (although disks are inexpensive).
Build in hardware redundancy for all components: controllers, buses, and so on.

You must provide an uninterruptible power supply (UPS) for the RAID configuration. Otherwise, data held in the RAID system’s disk cache may be lost if power fails. Running without disk caching is not a practical alternative on Windows because it carries a large performance penalty.

Windows Disk Management

A physical disk managed by Windows is either a basic disk or a dynamic disk. When you first install Windows, whether it is a new installation or an upgrade from Windows NT, all disks on the system are configured as basic disks. You can convert a basic disk to a dynamic disk with the Windows Disk Management snap-in tool.

Basic Disks

A basic disk is composed of one or more partitions. A basic disk can contain a maximum of 4 partitions. A partition is composed of one or more volumes (also called logical drives).

A partition can be a primary partition or an extended partition. A primary partition contains a single volume that is the size of the partition. An extended partition can contain more than one volume. The combined size of all the volumes on an extended partition must be less than or equal to the size of the partition.

Basic disks are backward compatible with earlier versions of Windows (for example, Windows NT 4.0 and Windows 98) and with MS-DOS.

Dynamic Disks

Dynamic disks are not available on notebook computers, and they are not supported on removable disks or on disks using Universal Serial Bus (USB) or FireWire (IEEE 1394) interfaces.

A dynamic disk contains volumes; there is no concept of partitions or logical drives. A dynamic disk has the following advantages:

Unlike a basic disk, which is limited to four partitions, a dynamic disk can contain an unlimited number of volumes. You can create as many volumes on a dynamic disk as there is disk space available.
You can extend volumes by adding free space to them.
On multiple disk systems, you can combine space from more than one disk into special volume configurations.
If you want to use the fault-tolerant features provided by the operating system as software RAID support, such as mirroring (RAID-1) or striping with parity (RAID-5), you must use dynamic disks.

As with basic disk partitions, you can assign drive letters or drive paths to volumes when you create them. However, drive paths must be used for disks in a DBAM configuration.

A volume can be:

A simple volume, which resides on a single disk.
A spanned volume, which combines space from up to 32 separate disks. Data fills the space on the first disk, and then the next, and so on.
A mirrored volume, which consists of two disks that each contain an identical copy of the volume to provide for fault tolerance.
A striped volume, which combines space from up to 32 separate disks. Data is spread evenly between all of the disks to provide disk-access efficiency.
A RAID-5 volume, which consists of three or more separate disks. Data and parity information are spread evenly over the disks to provide for fault tolerance.

Simple and spanned volumes can be extended: you can add available free space to increase their size.

RAID-5 volumes can only be created on Windows server computers.

Volumes and basic disk partitions cannot co-exist on the same physical disk, but you can have both dynamic and basic disks on the same system.

When creating volumes, keep in mind that:

A volume is a logical area that resides on one (simple volume) to several (spanned or striped volume) disks.
A disk can be divided into one or several volumes.

The following figures illustrate these concepts and how they relate to the DBAM data storage model (which is described in detail later on in this chapter in the section "DBAM Storage Model"). The first figure shows one disk with two simple volumes, one for directory data and one for transaction data.

Figure 4. One Disk, Two Simple Volumes

The second figure illustrates three disks with one spanned or striped volume for directory data and one simple volume for transaction data.

Figure 5. Three Disks, One Simple Volume, One Spanned Volume

Windows Disk Management Tool

Administrators use the Disk Management snap-in for the Microsoft Management Console (MMC) to manage basic and dynamic disks on Windows. To access the Disk Management snap-in:

Right-click the Start button, and then click Computer Management.
Expand the Storage folder in the hierarchical tree view and click the Disk Management folder. The Disk Management snap-in displays information about the disk devices available to the operating system, as shown in the following figure.

Figure 6. Disk Management Window

When you make changes to the disk configuration with Disk Management, you do not need to restart your computer for your changes to take effect. The MMC online help provides more information about how to use MMC and the Disk Management snap-in.

DBAM Requirements for Windows Disks

Windows disks to be used for DBAM data storage have the following requirements:

Dynamic disks are recommended (either upgraded from basic disks or created from scratch).
For best performance, the disk "write cache" feature must be enabled. This implies the use of an uninterruptible power supply (UPS). You can switch on the write cache under "Computer Management", "Device Manager", "Disk Drives", in the "Disk Properties" tab of each device concerned.
Volumes created for DBAM on these disks:
- Must be assigned a drive path (mounted at an empty folder in an NTFS partition). This action gives the volume a name that can be used in DBAM configuration. (DBAM does not support drive letters.)
- Must not be formatted.

In addition, the machine on which the DBAM database is to reside must run under the control of an uninterruptible power supply (UPS) or within a computer center that guarantees an uninterrupted power supply.

Note that RAID devices appear to the Windows disk management system as normal disks and are detected automatically when the system is booted.

Linux Disk Management

Refer to your system documentation to get information about Linux disk management.

The DBAM Storage Model

In the DBAM model, the raw devices—the volumes on Windows dynamic disks or the raw data slices (slices that do not contain a file system) on Linux disks—that comprise its database storage are one of two types:

A transaction device, which stores transaction logs. The DBAM database requires that you configure one (and only one) volume or slice as a transaction device.
A directory data device, which stores directory data. The DBAM database requires that you configure at least one unformatted volume or non-file-system slice as a directory data device. However, you can configure up to six volumes or slices to split directory data over several disks, according to their respective purposes, to improve performance.

You use the DBAM configuration and initialization tools to configure unformatted volumes or non-file-system slices as DBAM devices.

A DBAM data device can store the following kinds of directory data:

Real directory object data—the complete information about the object, including its relative distinguished name (RDN), a pointer to its parent object, its attributes and their values.
Pseudo object data—references to the "real" objects used internally by the DBAM database.
Attribute value index data—the references to the "real" objects that contain the corresponding attribute value.
Bit string data—the bit string references for bitmapped indexes.
Tree data—the hierarchical relationships between the objects.
General data—storage administration data that does not have a specific purpose.

You allocate directory data to a data device according to its type. You can allocate all data types on a single DBAM data device, or you can use multiple data devices to hold the different types of data. For example, you can allocate real object data to one data device, allocate attribute index data to a second data device, and allocate the remaining types of data to a third data device. Each data device maps to a raw device. You can create a maximum of six data devices on six different raw devices, with each data device allocated to one DBAM data type. However, you cannot allocate the same DBAM data type on two different data devices (raw devices).

You can also allocate DBAM data types to a data device according to size; for example, you can specify that 5GB of a data device is to be allocated to tree data, and the remaining disk space is to be divided up between pseudo object, bit string, and general data.

Thus, when you configure a raw device as a data device, you configure it according to the type(s) of directory data it will store and the amount of storage to be allocated to the type(s).
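The allocation rules described above (at most six data devices, and no data type on more than one device) can be checked before you run the configuration tools. The sketch below is only a planning aid under stated assumptions: the data type labels, device names, and the `validate_plan` function are hypothetical and are not dbamconfig syntax.

```python
# Hypothetical labels for the six kinds of directory data listed above.
DATA_TYPES = {"real", "pseudo", "attribute_index", "bit_string", "tree", "general"}
MAX_DATA_DEVICES = 6


def validate_plan(plan):
    """Check a mapping of data device name -> list of data types.

    Returns a list of rule violations; an empty list means the plan is
    consistent with the rules described in the text.
    """
    problems = []
    if len(plan) > MAX_DATA_DEVICES:
        problems.append(f"{len(plan)} data devices planned, maximum is {MAX_DATA_DEVICES}")
    owner = {}  # data type -> device it was first allocated to
    for device, data_types in plan.items():
        for data_type in data_types:
            if data_type not in DATA_TYPES:
                problems.append(f"{device}: unknown data type {data_type!r}")
            elif data_type in owner:
                problems.append(
                    f"{data_type} is allocated on both {owner[data_type]} and {device}"
                )
            else:
                owner[data_type] = device
    return problems


# The three-device example from the text, with made-up device names.
three_devices = {
    "data1": ["real"],
    "data2": ["attribute_index"],
    "data3": ["pseudo", "bit_string", "tree", "general"],
}
issues = validate_plan(three_devices)
```

A plan that places the same data type on two devices, for example real object data on both "data1" and "data2", would produce one reported problem instead of an empty list.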

The following figure illustrates the DBAM storage hierarchy.

Figure 7. DBAM Storage Hierarchy

In the figure:

The DBAM data types are allocated to one to six DBAM data devices, which map to one to six raw devices.
The DBAM transaction device maps to one raw device.
The file system is separate from the DBAM devices. It contains the DirX Directory installation and stores "normal" files such as journal, audit, work area and backup files.
The raw devices are defined at the operating system level using the operating system tools.
The optional RAID system is connected to the operating system to manage striped-and-mirrored and striped-with-parity configurations.

As noted in the figure, the block size of the real object and attribute value index data types is fixed, whereas the block size of the other data types is variable. The following figure illustrates a sample DBAM disk configuration to manage 30 million directory entries in a mirrored disk configuration.

Figure 8. Example DBAM Disk Configuration

DBAM Configuration Tools

To configure the DBAM database, you first use the DBAM configuration tool dbamconfig to create a configuration of raw devices to be used as the data device(s) and transaction device and save it as a "database profile". You then supply this profile to the DBAM initialization tool dbamboot, which uses the information in the profile to initialize the DBAM database and set up the raw device mapping to the DBAM data and transaction devices. Alternatively, you can use the dbaminit command to allocate physical disk space in the file system for the data and transaction devices instead of using raw devices.

Instead of running dbamconfig followed by dbamboot (or dbaminit), you can run the dirxconfig command, which performs first a dbamconfig and then a dbamboot (or dbaminit) command with pre-defined values.

dirxconfig

The dirxconfig tool is a command-line tool (accessible from the MS-DOS prompt in Windows) that is co-located with the DBAM database and the DSA. The dirxconfig tool internally performs first a dbamconfig and then a dbamboot command with pre-defined values. It distinguishes between values for a small, a mid-size and a large configuration.

dbamconfig

The dbamconfig tool is a command-line tool (accessible from the MS-DOS prompt in Windows) that is co-located with the DBAM database and the DSA. You use dbamconfig to create, delete and display DBAM database profiles, which are structures that link the allocated raw devices to the DBAM data device/transaction device format. These profiles are configuration information and are not part of the database data. They are stored in the registry on Windows and in a file on Linux.

When you use dbamconfig to create a profile, you specify one raw device as a transaction device and up to six raw devices as data devices. If you specify a single data device and do not make specific directory type allocations, dbamconfig uses the following default allocation:

Real object data is allocated to 40% of the device
Attribute value indexes are allocated to 40% of the device
The other data types are allocated to the remaining 20% of the device
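To see what the default split means for a device of a given size, the following sketch applies the 40/40/20 percentages to a byte count. The function name and dictionary keys are hypothetical, and the rounding (whole bytes, with any remainder going to the "other" share) is an assumption; the way dbamconfig itself rounds or aligns the areas is not described here.

```python
def default_allocation(device_bytes):
    """Split a single data device using the default 40% / 40% / 20% shares."""
    if device_bytes < 0:
        raise ValueError("device size must not be negative")
    real_object = device_bytes * 40 // 100
    attribute_index = device_bytes * 40 // 100
    # Whatever is left (nominally 20%) goes to the remaining data types.
    other = device_bytes - real_object - attribute_index
    return {
        "real_object": real_object,
        "attribute_value_index": attribute_index,
        "other": other,
    }


GIB = 2 ** 30
shares = default_allocation(100 * GIB)  # a hypothetical 100 GiB data device
shares_in_gib = {name: size / GIB for name, size in shares.items()}
```

For the 100 GiB device in the example, this yields 40 GiB for real object data, 40 GiB for attribute value indexes, and 20 GiB for the other data types.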

The DirX Administration Reference provides command syntax details and examples.

dbamboot

The dbamboot tool is a command-line tool (accessible from the MS-DOS prompt in Windows) that is co-located with the DBAM database and the DSA. You use the dbamboot tool to initialize the DBAM database according to a DBAM profile created with dbamconfig. You also use the dbamboot tool to define the size of individual real object data blocks and the number of indexed attribute types. You can also use the dbamboot tool whenever you want to re-initialize (delete) your database. The DirX Administration Reference provides command syntax details and examples.

dbaminit

The dbaminit tool is a command-line tool (accessible from the MS-DOS prompt in Windows) that is co-located with the DBAM database and the DSA. You use the dbaminit tool to initialize a file-based DBAM database; that is, you allocate physical disk space in the file system for the DBAM data device(s) and transaction device. When you use dbaminit to configure the database, you specify the name and size of a file to be used as the transaction device and name(s) and size(s) of file(s) to be used as the data device(s). The DirX Administration Reference provides command syntax details and examples.