Using Multiple Contact DSAs
This chapter describes how to set up and manage a floating master-shadow configuration that distributes DAP processing requests from LDAP servers across the shadow DSAs in the configuration for better DAP load balancing.
The sample shadow configuration provided in this chapter builds on the sample scenarios described in the chapters “Creating a Shadow DSA”, “Distributing the DIT Across Multiple DSAs”, “Multireplication” and “Creating a Synchronous Shadow DSA”. We suggest you read these chapters to familiarize yourself with shadowing, distribution and replication concepts and procedures before reading this chapter.
Understanding the Multiple Contact DSA Configuration
The LDAP servers configured in the DirX Directory service scenarios described in the previous chapters of this guide all use a single contact DSA – a DSA whose name and PSAP address are configured in the LDAP server’s dirxldap.cfg file – for performing the DAP operations derived from incoming LDAP operations. In these scenarios, there is a one-to-one relationship between the LDAP server and its contact DSA, which means that all LDAP traffic is forwarded to this DSA for processing.
In master-shadow configurations where the complete DIT is replicated to all consumer DSAs and all consumer DSAs have the same system configuration, the consumer DSAs are idempotent: an LDAP request can be forwarded to any one of the consumer DSAs for equivalent processing. This kind of configuration allows for the distribution of LDAP traffic and subsequent DAP operations among the consumer DSAs rather than isolating DAP processing to the DSA connected to the LDAP server and potentially overloading it while the other consumer DSAs remain mostly idle.
To improve DAP load balancing for this kind of master-shadow configuration, administrators can define multiple contact DSAs in an LDAP server’s configuration file. On each new DAP bind, the LDAP server selects the next of these DSAs using round-robin selection. The following figure illustrates the difference in DAP load balancing between single and multiple contact DSA configurations:
As shown in the figure, the multiple-contact DSA configuration distributes the load across all DSAs no matter how much load occurs at any of the LDAP server front ends.
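Each contact DSA is specified in dirxldap.cfg by its directory name and presentation address. As a preview of the format (this sketch uses the sample entries built later in “Building the Multiple Contact DSA Configuration”; the names, IP addresses and port are placeholders for your own configuration):

"/CN=DirX-DSA-host2" "TS=DSA2,NA='TCP/IP_IDM!internet=123.45.67.92+port=21200',DNS='(HOST=host2,PLAINPORT=21200)'"
"/CN=DirX-DSA-host3" "TS=DSA3,NA='TCP/IP_IDM!internet=123.45.67.93+port=21200',DNS='(HOST=host3,PLAINPORT=21200)'"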
Configuration Requirements
The multiple-contact DSA configuration has some requirements on DSA, LDAP server and network configuration:
-
All contact DSAs must be identical: they must shadow the complete DIT and they must all have the same system configuration.
-
The master DSA should not be included as a contact DSA because unexpected behavior can result. For example, unexpected chaining operations can occur when the master DSA is switched.
-
The shadow DSAs must use synchronous shadow agreements if the environment supports “read after write” clients and these clients are allowed to contact shadow DSAs. If asynchronous mode is used, there is a race condition between the time it takes for the DISP update to travel from the master to the shadows and how quickly the client performs the read after completing the modify. If the client read occurs before the shadow DSA is updated, the read returns the old data, not the latest content.
-
Network communication must be possible between all contact DSAs and the LDAP server. Consequently, all firewalls and routers in the configuration must permit DAP connections (IDM on top of TCP/IP with port 21200) from all consumer DSA machines to all other consumer DSA machines.
-
All contact DSAs should always be available. DSAs that are expected to be frequently off-line—especially during LDAP server startup—should not be included as contact DSAs. However, DSAs that fail or are taken off-line only intermittently can be included without causing problems.
-
Contact DSAs must all have similar response times for seamless interaction with LDAP clients. DSAs whose response times are known to be significantly slower—because of geographical distance, number of hops, or slow lines, for example—should not be included as contact DSAs.
-
The primary LDAP servers running on the shadow DSAs in the configuration should not be scheduled to re-start at the same time, and each LDAP server should be configured to iterate through its contact DSAs in a different order from the other LDAP servers. Using these techniques optimizes LDAP server startup performance and avoids potential startup delays should one or more DSAs become unresponsive.
Additional Features, Limitations and Issues
The multiple contact DSA configuration offers some benefits to a DirX Directory service installation, but also presents some limitations, tradeoffs and issues regarding other components in the installation:
-
The LDAP server in a multiple contact DSA configuration automatically moves to the next available DSA when it detects that a contact DSA is down and then retries the operation. As a result, this kind of configuration provides a simple failover mechanism that the single contact DSA configuration lacks. The section titled “Automatic Failover Handling” provides more detail about DSA failover and the consequences of DSA outages in the multiple contact DSA configuration.
-
Multiple contact DSA load distribution is independent of external load balancing across LDAP servers; for example, through an external hardware load balancer that distributes the LDAP load across the LDAP servers. If the DirX Directory environment includes an LDAP server load balancer, implementing a multiple contact DSA configuration is not necessary for load balancing, because the LDAP load balancer provides the same solution and multiple contact DSAs provide no additional load-balancing benefit. For failover, however, there are real benefits: the LDAP servers will retry a running operation and possibly succeed with the next DSA, keeping the error invisible to the client. This is not the case with load balancers in a single contact DSA environment: they return the DSA failure to the client, which then (possibly) retries, prompting the hardware load balancer to select another LDAP server.
-
Modifications performed at a shadow DSA in a synchronous shadowing configuration take longer than they do in an asynchronous shadowing configuration because all shadow DSAs must be synchronized before the modification is returned to the LDAP client. You can read more about the effects of synchronous and asynchronous shadowing in the chapter “Creating a Synchronous Shadow DSA”.
-
LDAP clients that perform a lot of modifications to the DIT and that connect to shadow DSAs may experience performance degradation, because the shadow DSA must forward each modify operation to the master DSA; this forwarding is unnecessary if the client is directly connected to the master. However, the master DSA will also experience heavy load in this scenario regardless of the contact DSA configuration, because the DirX Directory service permits only one DSA to perform updates. So, there is a tradeoff between reducing the search load at the master DSA by distributing it to the shadows and incurring the additional DSP call between the shadow and the master for every modify operation that enters a shadow. Predicting which strategy performs better is difficult and depends on many different parameters, so enabling DAP distribution may not always be the best choice.
-
There can be a mix of single contact and multiple contact DSA configurations within a set of multiple LDAP servers on a machine (via a subentry-specific LDAP configuration file). If Nagios monitoring is used to query and observe data from a dedicated DSA, the LDAP server that handles the DSA-specific Nagios queries should use the single contact DSA configuration (specifying the dedicated DSA as its contact) to ensure that the results always come from this DSA. The section “Mixing Single and Multi-Contact DSA Configurations” in this chapter provides an example.
Automatic Failover Handling
Although the multiple contact DSA configuration is intended primarily for DAP distribution, it also provides a simple but powerful failover handling capability. With a single contact DSA, the client receives an error when the DSA is down. In a multiple contact DSA configuration, the LDAP server moves to the next available DSA after detecting the dropout and then retries the operation, allowing the LDAP client operation to succeed in most cases.
| 100% success is an idealistic assumption, as there are rare situations in which an error can be returned if the DSA outage occurs at certain stages of an operation’s processing, especially if the operation is running in the DSA when the DSA crashes. In these situations, the outcome depends on exactly where the operation is when the crash occurs. For example, if the operation is returning its result by sending the DAP result PDU back to the LDAP server, the DSA dropout can be experienced at this stage as an invalid PDU received on the LDAP side and is thus handled differently than if just the socket is detected as closed due to the DSA crash. In the long term, most of these situations will end up with a normal socket-close detection on the LDAP side. In these cases, the LDAP server will retry the operation against another selectable DSA, and the client will see nearly 100% success for most cases. However, when severe network problems occur and no DSA can be reached, errors will appear. |
How the LDAP Server Temporarily Disables a Failing DSA
When the LDAP server detects an unavailable DSA, it automatically disables it from contact DSA selection for a specified time period to avoid frequently repeated retries and errors for this DSA. The LDAP server distributes the requests among the remaining DSAs without any further contact with the DSA it has disabled. When the disable period expires, the DSA becomes selectable again and the LDAP server retries it. Depending on the outcome of the retry, the DSA either remains enabled or is disabled again for the same time period.
The DIRX_LDAP_AUTO_DISABLE_FAILING_DSA environment variable controls how long the LDAP server excludes a DSA from selection; the format is
DIRX_LDAP_AUTO_DISABLE_FAILING_DSA=nn
where nn is the number of seconds for which the DSA is to be excluded from selection. The default is 60 seconds; to change it, set the DIRX_LDAP_AUTO_DISABLE_FAILING_DSA environment variable to a different value. See the DirX Directory Administration Reference chapter on DirX Directory environment variables for details about this and other DirX Directory environment variables.
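For example, to exclude a failing DSA from selection for two minutes instead of the default 60 seconds (a sketch; the value 120 is an arbitrary illustration, and the variable must be set in the service’s environment before the LDAP server starts):

DIRX_LDAP_AUTO_DISABLE_FAILING_DSA=120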
How the Watchdog Timer Handles DSA Failures
In DirX Directory, the main server processes are started and observed by their watchdog (on Windows, dirxsrv; on Linux, dirxdsas). In earlier versions of DirX Directory, the watchdog runs in a mode where it automatically re-starts the LDAP server when its local DSA crashes, resulting in an unconditional loss of all LDAP connections existing at the time of the DSA crash. In newer DirX Directory versions (V8.5 and higher), the watchdog does not automatically re-start the LDAP server if the local DSA crashes. Instead, the LDAP server detects a DSA crash through the drop of its backend DAP connections and, if running in multiple contact DSA mode, tries to establish new DAP connections to the other contact DSAs and retries the operation. As a result, the DSA outage remains invisible to the client.
| Trying a dropped DSA may take some time (due to TCP rules). Consequently, the operation, while ultimately successful, may take longer than it would have if the DSA were up. |
| You can enable the previous watchdog mode by setting the DIRX_WDOG_RESTART_LDAP_ON_DSA_RESTART environment variable. See the DirX Directory Administration Reference chapter on DirX Directory environment variables for more information about this and other DirX Directory environment variables. |
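A minimal sketch of what enabling the previous watchdog mode might look like (the variable name is from the note above; the value 1 is an assumption based on the boolean DirX Directory environment variables shown elsewhere in this guide, so check the reference page for the exact values):

DIRX_WDOG_RESTART_LDAP_ON_DSA_RESTART=1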
DSA Outage Handling and its Consequences
If multiple contact DSAs are configured, it is necessary to understand the consequences of the fact that a local LDAP server has connections to other DSAs rather than just the local one. Because the selection is based on simple round-robin, a DSA that is down permanently or for a long time will cause every nth backend connection attempt to fail. A dropout of a contact DSA can occur at three distinct points, as seen from the LDAP client’s point of view:
-
The selected DSA is down and the LDAP client performs a new bind
-
The LDAP client has established a connection but no operation is running when the DSA drops out
-
The LDAP client has established a connection and the DSA drops out while the client is performing an operation
To understand the reaction to each of these three cases, it is necessary to understand that each LDAP frontend connection is associated with one DAP backend connection to a DSA, represented by a DAP handle. The handle is created at LDAP bind time and is stored along with the LDAP connection. Thus, whenever an LDAP operation is performed on this LDAP connection, this DAP handle is used to perform the corresponding DAP request. Now let’s examine the consequences for each case.
Case 1: Selected DSA is Down, Client Performs a New Bind
In this case, no DAP handle exists at LDAP bind time. The DAP bind call fails and no DAP handle is generated. The LDAP server checks whether more than one contact DSA is configured and then retries the DAP bind implicitly, selecting the next available DSA until it succeeds or all DSAs have been tried.
If the DAP bind can be established to any of the configured DSAs, the client will receive success, and the dropout of a DSA is invisible.
If DSAs are down when the DAP bind occurs, it may take some time (mainly because of the TCP retry timeout) for TCP to report the unavailability. The time it takes for TCP to detect that a connect() call will not be answered depends on many factors and can vary from instantaneous to 20-30 seconds, depending on the topology and the machine states. For example, connects to machines that are down may experience the full 20-30 second timeout, while connects to co-located DSAs are detected instantly.
Bind Delay due to TCP Timeout
A DAP bind always starts by establishing a new TCP connection via a connect() call. If the target DSA is not reachable (for example, because its host is down), TCP internally retries the connect() call using the TCP retransmission mechanism, which retries the connection establishment several times (configured with TCP parameters) and returns an error only after the last try fails.
This behavior delays the detection of an unreachable peer and results in an unavoidable delay before the next DSA is selected. As a result, even if the next selection finally succeeds, the LDAP client can experience a delay before a DAP bind is established. Therefore, it is important to configure only DSAs that are assumed to be “always” available for selection.
| Be careful about changing TCP parameters to shorten the timeout sequences: they might have positive effects for the DAP rebind but may have fatal consequences for other aspects. The better choice is to remove “unreachable” DSAs from the contact DSA list entirely and/or disable them with the dirxextop extended operation ldap_disable_config_dsa. For details, see the dirxextop reference page in the DirX Directory Administration Reference. |
Effect of Unexpected Bind Delay
Due to the possible TCP delay in detecting unreachable peers, unexpected bind behavior can occur. Let’s assume the LDAP server has two DSAs configured as possible contact DSAs. Let’s further assume the host of DSA1 is down and TCP takes 10 seconds to detect the outage, while DSA2 is up and will respond immediately when contacted. Let’s also assume that DSA1 is the one to be selected next for a new DAP bind.
To better illustrate the effect, let’s also assume that a failing DSA is temporarily (and automatically) disabled for only a very short time (less time than it takes for TCP to detect the outage).
Looking at this scenario, you might assume that 100% of the binds succeed, with 50% of the new LDAP binds responding immediately and 50% responding with a delay of 10 seconds. However, this is not the case. To understand why, it is necessary to understand how the round-robin selection interacts with the TCP delay:
-
If an LDAP client performs a new bind, the DAP bind is tried against DSA1 first. After 10 seconds, TCP reports the error and DSA2 is selected. Because DSA2 responds instantly, the LDAP client sees the response after 10 seconds.
-
But what about the next new client bind? To which DSA will this bind try the DAP connection? Because the previous bind succeeded with DSA2, the round-robin algorithm again chooses DSA1 for the next DAP bind, and again the TCP delay occurs. Thus, in a scenario where one of the two DSAs is permanently down, all LDAP binds see the TCP delay, not just 50% of them.
This behavior is a direct consequence of the simple round-robin algorithm and the fact that 50% of the resources are down. Thus, it is essential that all configured DSAs are available all the time in a multiple contact DSA configuration. There are usually two types of DSA dropouts: accidental and intentional. Intentional dropouts are mostly caused by a machine reboot or by an intentional shutdown of the DirX Directory service—for example, for maintenance—and can last for some time. For these cases, we recommend using dirxextop ldap_disable_config_dsa to dynamically disable the corresponding DSA in the selection lists of the LDAP servers before shutting it down or rebooting the host. For details, see the dirxextop reference page in the DirX Directory Administration Reference.
Accidental dropouts are usually less critical, as TCP can usually detect the missing peer port listener quite rapidly, so no significant delay is to be expected in these cases.
This example also shows that you should not set the DIRX_LDAP_AUTO_DISABLE_FAILING_DSA environment variable too low. At a minimum, it should be higher than the TCP retransmission sequence time for the given host (a function of the local TCP configuration that typically lies in the range of 10 to 20 seconds). Thus, the default of 60 seconds is a good choice in most cases.
Case 2: LDAP Connection Exists, No Client Operation is Running
An existing LDAP connection always has a corresponding DAP bind represented by the DAP handle. A dropout of the DSA can only be detected when the DAP handle is used again to perform the next DAP operation. In this case, the next DAP operation experiences a DAP error indicating that the DAP handle is no longer valid. The LDAP server then tries to re-bind the DAP connection, implicitly iterating through the contact DSA list. On a successful DAP re-bind, the LDAP server exchanges the old (invalid) handle for the new one and retries the client operation on the new DAP connection. Thus, the DSA dropout is transparent.
DAP Connection Loss and Paged Searches
If the DAP handle has meanwhile become invalid and the next operation is a nextpage request for a previously started paged search, no re-bind is performed; instead, an LDAP_OPERATION_ERROR is returned. This behavior is necessary because a paged search usually leaves continuation information in the contact DSA that records where to continue when the next page is requested. Because this resume-info is stored in the memory of the DSA that received the first page request, no other DSA has any knowledge of it, and so a nextpage request cannot implicitly be redirected to another DSA. Instead, the nextpage request produces an error and the search cannot continue.
If a nextpage operation is performed after a different operation was processed in between, the invalid DAP handle may already have been restored via re-bind in the context of that operation, and so the nextpage request may use the wrong DSA, as the re-bind may have selected a different DSA than the one that holds the resume-info for the nextpage call. In this case, the operation returns an UNWILLING_TO_PERFORM error and an error message indicating that the QueryRef (the paging cookie) is invalid. (This situation can only occur if backend sharing is enabled in the LDAP server’s configuration.)
Case 3: LDAP Connection Exists, Client Operation is Running
If the DSA drops out while an operation is being performed, the corresponding DAP operation may experience an internal DAP error (for example, a REMOTE_ABORT, LOCAL_ABORT or BAD_SESSION error). The LDAP server analyzes the error. If a dropout is indicated, the server tries to re-bind the DAP connection (implicitly iterating through the contact DSAs) and on success, re-issues the DAP operation, making the DSA dropout transparent to the client.
Consequently, as long as only one DSA from the configured list is down, the LDAP client should not recognize the DSA outage. However, if more than one DSA is down, LDAP errors may appear even though the co-located DSA is still up and running.
How the LDAP Server Handles DSA Outages at Startup
When an LDAP server starts, it tries to read its configuration and schema from the DSA. There is no guarantee that the contacted DSA is the co-located DSA. If the selected DSA is down, the LDAP server tries to come up by retrying all contact DSAs, and it continues if at least one of them is up and returns the data required for configuration.
The LDAP server also establishes a pool of anonymous DAP connections, each of which uses a different DSA. If a DSA cannot be contacted, the LDAP server tries the next one until the full DAP pool is established or all DSAs are down, in which case the server exits.
TCP Timeout when Connecting to Dropped-out DSAs
Care must be taken if DSAs are down during LDAP server startup, as the detection of whether a DSA can be reached is based entirely on TCP rules and possibly on firewall configuration. If no firewalls are active, a client that tries to connect to a running machine where no DSA is up receives an immediate reject from the TCP system. In such cases, the LDAP server immediately continues with the next DSA. If firewalls or virtual machines are in place, chances are (depending on the rule set) that they are configured so that rejects are not sent back to the caller, and detection therefore occurs via TCP retry timeouts, which can take several seconds (typically 10-20 seconds) before the LDAP server receives a failure notice from TCP for its connect() request. In these cases, the LDAP server startup process may experience a significant delay ((n-1) * TCP timeout in the worst case) before the server finally starts.
Having many DSAs configured but only a few of them up during LDAP startup may lead to minutes of startup time. Consequently, don’t configure contact DSAs that are likely to be down frequently, especially during LDAP startup. If this situation occurs, either remove these DSAs from the contact DSA list for the LDAP server or—if the server is already up and should not be re-started—disable the DSAs for selection using dirxextop ldap_disable_config_dsa. For details, see the dirxextop reference page in the DirX Directory Administration Reference.
How Backend Sharing Affects DSA Selection
When backend sharing is active (the default), a new LDAP bind does not necessarily include the creation of a new DAP backend connection. Therefore, once a DAP bind exists, no new DSA selection takes place if the same user (but a different client) performs another LDAP bind. If a subsequent LDAP operation is forwarded to the DSA, the outage is detected and the DAP connection is restored internally. With backend sharing active, the restoration may meanwhile have occurred through some other client, because both clients share the same DAP connection and the first operation on this connection (whichever client issues it) detects the error and causes the internal re-bind.
Consequently, client “A”, which starts communicating with DSA1, may perform its next operation against DSA2, although it has never experienced any loss of its LDAP connection or received an error. This is the reason why it’s important for all DSAs to be idempotent.
If backend sharing is inactive, each new LDAP bind selects the next DSA from the configured list separately.
Planning the Multiple Contact DSA Configuration
Recall from “Creating a Synchronous Shadow DSA” that My-Company administrators set up a floating-master shadow configuration in which three DSAs forming the company’s data center back end use the synchronous shadowing protocol and are located within close geographical proximity to minimize network latency.
After evaluating the benefits, requirements and issues described in “Understanding the Multiple Contact DSA Configuration”, the administrators of the three DSAs in this master-shadow configuration decide to extend the LDAP servers of the two consumer DSAs to support the multiple contact DSA configuration for the following reasons:
-
The two consumer DSAs have already been configured to be identical: they shadow the complete DIT, have the same system configuration, and already support the synchronous shadowing protocol.
-
The DSAs are in close geographical proximity, so they can provide similar response times for optimum interaction with LDAP clients.
-
All the firewalls in the configuration permit DAP connections (IDM on top of TCP/IP with port 21200) between DSA2 and DSA3.
Building the Multiple Contact DSA Configuration
Building the multiple contact DSA configuration is a simple task: for each LDAP server that supports one of the consumer DSAs, the administrators need only remove the master DSA1 as the contact DSA and then add the names and presentation addresses of the two consumer DSAs to the LDAP server’s dirxldap.cfg configuration file.
Consequently, the DSA2 administrator updates the DSA contact information in the dirxldap.cfg file for the DSA2 LDAP server as follows:
"/CN=DirX-DSA-host2" "TS=DSA2,NA='TCP/IP_IDM!internet=123.45.67.92+port=21200',DNS='(HOST=host2,PLAINPORT=21200)'" "/CN=DirX-dsa-host3" "TS=DSA3,NA='TCP/IP_IDM!internet=123.45.67.93+port=21200',DNS='(HOST=host3,PLAINPORT=21200)'"
The DSA3 administrator updates the DSA contact information for the DSA3 LDAP server as follows:
"/CN=DirX-DSA-host3" "TS=DSA3,NA='TCP/IP_IDM!internet=123.45.67.93+port=21200',DNS='(HOST=host3,PLAINPORT=21200)'" "/CN=DirX-dsa-host2" "TS=DSA2,NA='TCP/IP_IDM!internet=123.45.67.92+port=21200',DNS='(HOST=host2,PLAINPORT=21200)'"
Note that the order in which each contact DSA’s information is listed should be different for each dirxldap.cfg file.
The administrators then re-start the LDAP servers for the updates to take effect. The multiple contact DSA configuration is now active.
Disabling and Enabling Contact DSAs
Recall from the section “Understanding the Multiple Contact DSA Configuration” that all contact DSAs must always be available to avoid TCP-related bind delays. The administrator for DSA2 determines that its host machine needs to be taken offline for software updates. Because contact DSAs must be disabled in a multiple contact DSA configuration before they become unreachable, the DSA2 and DSA3 administrators need to use the dirxextop ldap_disable_config_dsa extended operation to disable DSA2 dynamically in each LDAP server’s contact DSA selection (without editing the dirxldap.cfg file).
Administrators of DSA2 and DSA3 both use the following dirxextop command to disable DSA2 from their respective LDAP server configuration:
dirxextop -D cn=admin,o=my-company -w dirx -t ldap_disable_config_dsa -P /CN=DSA2
Now DSA2 is blocked from being selected as a contact DSA by each LDAP server until the administrators explicitly re-enable it with the dirxextop extended operation ldap_enable_config_dsa or until the LDAP server is re-started.
Once the update is complete and DSA2 is back online, the administrators use dirxextop to re-enable DSA2 as a contact DSA for their LDAP servers with the following command:
dirxextop -D cn=admin,o=my-company -w dirx -t ldap_enable_config_dsa -P /CN=DSA2
Monitoring a Multiple Contact DSA Configuration
As multiple DSA selection is performed completely internally and without any impact on the actual LDAP operations, an LDAP client cannot detect it from its operation results.
For the DirX Directory administrator, there are several ways to get information about multiple DSA selection and usage; some of them may require a deeper understanding of networking. The following sections describe these methods.
Using the LDAP Exception Logs to Monitor Contact DSAs
When the LDAP server starts, its startup log contains the full list of selectable DSAs as configured in dirxldap.cfg. For example:
0 "DirX Directory V8.5 64-Bit LDAP Server running.
OSName=Microsoft Windows 7 64-bit- Service Pack 1 (build 7601)
HostName=host2, IP4=123.45.67.92, 123.45.67.93,
Ldap-Port=8080, SSL-Port=636, Rpc-Port=6999, StartTLS=enabled
Accepted SSL Protocols=SSLv3.0 TLSv1.0 TLSv1.1 TLSv1.2
Ldap-Cfg=ldapConfiguration, Ldap-Conn-Max:555, Cache=disabled(3), SSL-CRL-Checking=OFF, PID=1788,
UID=A412447 (0), EUID=n/a (0), CP:o=my-company
ThreadPoolSize=32, Audit=on (level:max), SockMode=async, CtxLimit=12000, IPStack=4, BannedFilterAttr:-none-, FD_SETSIZE=8190"
-- 0x45046b7f NOTICE ldap_cfg mn_ldap_listener 1414
0 "LDAP-Server Up and Running... Using the following DSAs:
Contact-DSA:Name=/CN=DSA2, enabled=yes, fails=0,
PSAP=TS=DSA1,NA='TCP/IP_IDM!internet=1.2.3.4+port=4711',DNS='(HOST=host2,SSLPORT=21201,PLAINPORT=21200, MODE=ssl)'
Contact-DSA:Name=/CN=DSA3, enabled=yes, fails=6,
PSAP=TS=DSA1,NA='TCP/IP_IDM!internet=1.2.3.4+port=4711',DNS='(HOST=host3,SSLPORT=21201,PLAINPORT=21200, MODE=plain)'"
The lines starting with Contact-DSA give the names and PSAPs of the selectable DSAs. In this example, these are the two DSAs DSA2 and DSA3.
Whenever a new DAP bind fails, the server’s exception log indicates the DSA to which the bind failed. For example:
99 DRX_Bind(workspace: 0x000000000450AB88 session: 0x000000000450AA48
res: ??? do_sign: FALSE bound_session: ???) = DRX_LOCAL_ABORT_RECEIVED(78)
session: <ABSENT>
res: ???
bound_session: ???
-- 0x4504483e WARNING api dx_dap.cpp 1935-3257 30:58:154
99 DRX_Bind failed to selected DSA:
DSA-Name:"/CN=DSA2"
DSA-Addr:"TS=DSA1,NA='TCP/IP_IDM!internet=1.2.3.4+port=4711',DNS='(HOST=host2,SSLPORT=21201,PLAINPORT=21200,MODE=plain)'"
If you find entries like this one, check the corresponding DSA to make sure it’s up, reachable and running properly. If these messages appear for a longer period of time (significantly longer than just for a simple crash plus re-start time), it might be useful to disable this DSA for the lifetime of the LDAP server instance with the extended operation ldap_disable_config_dsa and then analyze the DSA problem before re-enabling it. For example, if a DSA from the selection list should be shut down for maintenance, it is a good idea to disable it first before shutting it down to avoid unnecessary bind errors and timeouts that can appear when this DSA gets selected by the round-robin algorithm.
Using netstat to Monitor Established Communications
As every DAP bind to a DSA first creates a TCP connection, you can use the netstat -an command to check the DSAs to which the LDAP server is connected by looking for the ESTABLISHED connections. For example:
TCP 123.45.67.92:21201 123.45.67.92:55925 ESTABLISHED
TCP 123.45.67.92:21201 123.45.67.92:55926 ESTABLISHED
TCP 123.45.67.92:21201 123.45.67.92:55927 ESTABLISHED
TCP 123.45.67.92:21201 123.45.67.92:55928 ESTABLISHED
TCP 123.45.67.92:21201 123.45.67.92:55929 ESTABLISHED
TCP 123.45.67.92:21201 123.45.67.92:55934 ESTABLISHED
TCP 123.45.67.92:53726 123.45.67.93:60001 ESTABLISHED
TCP 123.45.67.92:53915 123.45.67.46:445 ESTABLISHED
TCP 123.45.67.92:53917 123.45.67.99:22 ESTABLISHED
TCP 123.45.67.92:55925 123.45.67.92:21201 ESTABLISHED
TCP 123.45.67.92:55926 123.45.67.92:21201 ESTABLISHED
TCP 123.45.67.92:55927 123.45.67.92:21201 ESTABLISHED
TCP 123.45.67.92:55928 123.45.67.92:21201 ESTABLISHED
TCP 123.45.67.92:55929 123.45.67.92:21201 ESTABLISHED
TCP 123.45.67.92:55934 123.45.67.92:21201 ESTABLISHED
TCP 123.45.67.92:56009 123.45.67.93:443 TIME_WAIT
TCP 123.45.67.92:56010 123.45.67.93:443 TIME_WAIT
TCP 123.45.67.92:56015 123.45.67.16:80 ESTABLISHED
TCP 123.45.67.92:56024 123.45.67.86:445 SYN_SENT
TCP 123.45.67.92:56025 123.45.67.16:80 ESTABLISHED
TCP 123.45.67.92:63697 123.45.67.93:60001 ESTABLISHED
TCP 123.45.67.92:63702 123.45.67.93:60000 ESTABLISHED
TCP 123.45.67.92:63710 123.45.67.86:60001 ESTABLISHED
TCP 123.45.67.92:63756 123.45.67.24:49167 ESTABLISHED
TCP 123.45.67.92:63889 123.45.67.24:49167 ESTABLISHED
TCP 123.45.67.92:63898 123.45.67.93:60001 ESTABLISHED
TCP 123.45.67.92:64072 123.45.67.15:5061 ESTABLISHED
TCP 127.0.0.1:8307 0.0.0.0:0 LISTENING
TCP 127.0.0.1:9089 0.0.0.0:0 LISTENING
TCP 127.0.0.1:54204 127.0.0.1:54205 ESTABLISHED
TCP 127.0.0.1:54205 127.0.0.1:54204 ESTABLISHED
TCP 192.168.44.1:139 0.0.0.0:0 LISTENING
TCP 192.168.163.1:139 0.0.0.0:0 LISTENING
From your configured PSAPs, you can determine by IP address and port which DSAs the LDAP server holds connections to and which it does not.
In the example, we see that only connections to port 21201 appear but none to port 21200, which indicates that something is wrong with /CN=DSA3 (the DSA contacted over the plain port in this configuration).
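On Linux, for example, you can narrow the netstat output to the DSA ports used in this example (a convenience sketch; 21200 and 21201 are the PLAINPORT and SSLPORT values from this chapter’s configuration):

netstat -an | grep -E ':2120[01]'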
Using LDAP Audit
Possibly the best and easiest way to trace the multiple contact DSA feature is to examine the LDAP audit. The header section lists the configured DSAs, which gives you an overview of the available contact DSAs (similar to the LDAP server startup message).
For example:
Contact-DSA :Name=/CN=DSA2, enabled=yes, fails=0, PSAP=TS=DSA1,NA='TCP/IP_IDM!internet=1.2.3.4+port=4711',DNS='(HOST=host2,SSLPORT=21201,PLAINPORT=21200,MODE=ssl)'
Contact-DSA :Name=/CN=DSA3, enabled=yes, fails=6, PSAP=TS=DSA1,NA='TCP/IP_IDM!internet=1.2.3.4+port=4711',DNS='(HOST=host3,SSLPORT=21201,PLAINPORT=21200,MODE=plain)'
If you look at the LDAP audit’s bind records, you can discover what happens during DSA selection. For example:
----------------- OPERATION 000007 ----------------
Create Time :Mon Apr 11 13:54:31.974791 2016
Start Time :Mon Apr 11 13:54:31.974831 2016
Send End Time :Mon Apr 11 13:54:34.735626 2016
End Time :Mon Apr 11 13:54:34.735660 2016
PoolThread# :6 (0x1454)
OpUUID :ba7966b1-65a6-4b68-a339-8d7cdae464e3
DapBindId :000e0006
Contact-DSA :/CN=DSA2
Concurrency :1
OpStackSize :1
OpFlow In/Out :0/0
Duration :2.760829 sec
LDAP QTime :0.000039 sec
LDAP Prep Time:2.753598 sec (3 RecvCalls, 0 Wouldblocks)
LDAP Resp Time:0.000128 sec
LDAP Snd Time:0.000045 sec (1 SendCalls, 0 Wouldblocks)
LDAP Enc Time:0.000026 sec
OP Linger Time:0.000034 sec
API Time :0.007102 sec
API-Send :0.000276 sec
API-ICOM Wait :0.006670 sec
IDM Time :0.000123 sec (0 Wouldblocks)
DSA Time :0.006500 sec
API-Recv :0.000151 sec
API-Dec :0.000076 sec
User :cn=admin,o=my-company
IP+Port+Sd :[127.0.0.1]+56090+844
Op-Name :LDAP_Con1_Op0
UniqueOpID :7
Operation :BIND
Version :3
MessageID :47
Bind-Type :simple
Security :normal
DAP-Share-Count:1
DSA-Retry-Count:1
DSA-Retry-Dur :2.760190 sec
Controls # :2
Ctrl Type :1.3.6.1.4.1.21008.108.63.1 (Session Tracking Control)
Critical :no
SID-IP :123.45.67.92
SID-Name :DirX Manager 2.3 (Build 82; 2016-01-15 13:20:23) [4604]
SID-Oid :1.3.6.1.4.1.21008.108.63.1.3 (Sasl-Auth-Username)
SID-Info :cn=admin,o=my-company
Ctrl Type :1.3.6.1.4.1.42.2.27.8.5.1 (Password Policy)
Critical :no
Ctrl Val(len):0
Bytes Received :223
Bytes Returned :64
Socket Mode :ssl
Abandoned :no
Result Code :0 (success)
Error Message :Bind succeeded.
In this example, the field:
Contact-DSA :/CN=DSA2
indicates the DSA that was (finally) chosen for the DAP backend bind. The fields
DSA-Retry-Count:1
DSA-Retry-Dur :2.760190 sec
indicate how many DSAs failed to bind before the bind finally succeeded and how much time was spent iterating through multiple DSAs to get a DAP bind (or until all failed).
In this example—where /CN=DSA3 is down—we know that DSA3 was originally selected, that it failed, and that the failure took 2.7 seconds before DSA2 finally succeeded. It’s easy to see that most of the total time of the bind operation was spent in DAP bind iterations.
Note that if a network is down, a host is not running, a blocking firewall is active, and so on, TCP may take some time—typically around 20 seconds—to detect the situation by retransmission and timeout expiration. This delay cannot be avoided, as it is a TCP procedure and not a server application issue. This means that even if a bind finally succeeds, it might take significant time to succeed from the LDAP client’s point of view.
If the first selected DSA succeeds (no further retries are necessary), you’ll find:
DSA-Retry-Count:0
For audit records other than bind (for example, search), you can see the DSA in the Contact-DSA field. For example:
----------------- OPERATION 000008 ----------------
Create Time :Mon Apr 11 13:54:34.744390 2016
Start Time :Mon Apr 11 13:54:34.744466 2016
Send End Time :Mon Apr 11 13:54:34.746446 2016
End Time :Mon Apr 11 13:54:34.746456 2016
PoolThread# :7 (0xf8)
OpUUID :d91a1a59-3466-4f9e-889b-01feb16db6d2
DapBindId :000e0006
Contact-DSA :/CN=DSA2
Concurrency :1
OpStackSize :1
OpFlow In/Out :0/0
Duration :0.001990 sec
LDAP QTime :0.000076 sec
LDAP Prep Time:0.000428 sec (3 RecvCalls, 0 Wouldblocks)
LDAP Resp Time:0.000248 sec
LDAP Snd Time:0.000058 sec (2 SendCalls, 0 Wouldblocks)
LDAP Enc Time:0.000046 sec
OP Linger Time:0.000010 sec
API Time :0.001312 sec
API-Send :0.000065 sec
API-ICOM Wait :0.001053 sec
IDM Time :0.000079 sec (0 Wouldblocks)
DSA Time :0.000929 sec
API-Recv :0.000188 sec
API-Dec :0.000174 sec
User :cn=admin,o=my-company
IP+Port+Sd :[127.0.0.1]+56090+844
Op-Name :LDAP_Con1_Op1
UniqueOpID :8
Operation :SEARCH
Version :3
MessageID :48
Base Obj :(ldapRoot)
Scope :baselevel
FilterLen :18 (limit:1000)
Filter :(objectclass=PRES)
Size Limit :1000
Time Limit :0
Deref Alias :never
Types Only :no
Req Attr # :11
Req Attr :* (all user attributes)
Req Attr :namingContexts
Req Attr :altServer
Req Attr :supportedExtension
Req Attr :supportedControl
Req Attr :supportedSASLMechanisms
Req Attr :supportedLDAPVersion
Req Attr :subschemaSubentry
Req Attr :supportedFeatures
Req Attr :vendorName
Req Attr :vendorVersion
Found Entries :1
Found Attrs :11
Found Values :19
Op Ctx Size :147456 Bytes
API Ctx Size :81920 Bytes
All Ctx Size :55 MB
Controls # :1
Ctrl Type :1.3.6.1.4.1.21008.108.63.1 (Session Tracking Control)
Critical :no
SID-IP :123.45.67.92
SID-Name :DirX Manager 2.3 (Build 82; 2016-01-15 13:20:23) [4604]
SID-Oid :1.3.6.1.4.1.21008.108.63.1.3 (Sasl-Auth-Username)
SID-Info :cn=admin,o=my-company
Bytes Received :382
Bytes Returned :693
Socket Mode :plain
Cached Result :no
Abandoned :no
Result Code :0 (success)
Error Message :Search succeeded. Found 1 Entries (0 Aliases), 11 Attributes, 19 Values. (ChainedResult=no)
Using DirX Directory Extended Operations
You can use the dirxextop ldap_show_config_dsas extended operation to monitor which DSAs are configured, which DSA is to be selected next, which DSAs are currently enabled, and how many failures occurred when DAP binds were performed. The description of the dirxextop command in the DirX Directory Administration Reference provides command syntax and usage for the operation.
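For example, an invocation that follows the pattern of the disable/enable commands shown earlier in this chapter might look like this (a sketch; see the dirxextop reference page for the exact syntax and required options):

dirxextop -D cn=admin,o=my-company -w dirx -t ldap_show_config_dsas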
Here is example output returned by the operation:
List of configured Contact-DSAs for LDAP server on 'host2' at Tue May 17 14:21:26.335769
===============================================================================
DSA-Name:/CN=DSA2
Status :enabled
PSAP :TS=DSA1,NA='TCP/IP_IDM!internet=1.2.3.4+port=4711',DNS='(HOST=host1,SSLPORT=21201,PLAINPORT=21200,MODE=ssl)'
BindFails :2
PermDisables:0
TempDisables:2 (Last: Tue May 17 14:21:15.753710)
ReEnables :1
Selections :5
-------------------------------------------------------------------------------
(*) == DSA to be selected for next Backend-Bind
Mixing Single and Multi-Contact DSA Configurations
Recall from the section “Setting up Multiple LDAP Servers” in the chapter “Extending the DirX Directory Service” that multiple LDAP servers can be set up on a single machine to handle the requirements of particular LDAP clients. These additional LDAP servers can be configured to read their contact DSA information from their own LDAP server configuration files, allowing each additional LDAP server to use either a multiple contact DSA configuration or a single contact DSA configuration depending on its requirements.
When an LDAP client needs to receive its data from a dedicated DSA and the primary LDAP server is set up to use multiple contact DSAs, an additional LDAP server must be configured that uses the dedicated DSA as its single contact DSA and reads the contact information about this DSA from a separate LDAP configuration file that is specific to this additional LDAP server.
For example, suppose the administrator of DSA2 has set up the Nagios monitoring environment with the intention of evaluating DSA2’s performance over time. The primary LDAP server – the first server to be set up – uses the multiple contact DSA configuration because it’s fielding calls from LDAP clients for data-centric operations on DSA2’s DIT. To support the Nagios environment, the DSA2 administrator now needs to set up an additional LDAP server to handle Nagios communication to and from DSA2. This LDAP server’s configuration needs to specify DSA2 as the only contact DSA to ensure that the performance information returned by calls from Nagios plugins always comes from this particular DSA.
To set up the additional Nagios-specific LDAP server, the DSA2 administrator must perform the following tasks:
-
Set up the additional LDAP server to be dedicated to handling Nagios communications
-
Set up a specific LDAP configuration file for this dedicated server
-
Set the DirX Directory environment variable that permits LDAP servers to read their own LDAP configuration files
-
Re-start the DirX Directory service
Creating the Additional LDAP Server
To create the additional LDAP server, the DSA2 administrator follows the procedure described in the section “Setting up Multiple LDAP Servers” in the chapter “Extending the DirX Directory Service” in this guide. The common name that the DSA2 administrator uses for this additional LDAP server’s configuration subentry is dsa2ldapConfig2.
Creating the Additional Server’s LDAP Configuration File
Next, the DSA2 administrator creates an individual LDAP configuration file for the additional LDAP server that specifies DSA2 as the only contact DSA. The file name is in the format:
dirxldap.cfg[.subentry_name]
where subentry_name is the common name of the LDAP server’s configuration subentry that corresponds to the new LDAP server. In this scenario, it is dsa2ldapConfig2:
dirxldap.cfg.dsa2ldapConfig2
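The contact DSA entry in this file then lists only DSA2, using the same format as the primary LDAP server’s dirxldap.cfg (a sketch based on the sample addresses used earlier in this chapter):

"/CN=DirX-DSA-host2" "TS=DSA2,NA='TCP/IP_IDM!internet=123.45.67.92+port=21200',DNS='(HOST=host2,PLAINPORT=21200)'"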
Setting the Environment Variable
To enable the new LDAP server to read its own LDAP configuration file instead of the common dirxldap.cfg file, the DSA2 administrator must set the DirX Directory environment variable DIRX_LDAP_USE_SEPARATE_CLCFG_FILE before re-starting the DirX Directory service:
DIRX_LDAP_USE_SEPARATE_CLCFG_FILE=1
The additional Nagios-specific LDAP server will read the configuration file dirxldap.cfg.dsa2ldapConfig2 when it starts up again.