Operation

This chapter provides information about DLP server operation in the areas of server process startup, connect timeout, offline handling and server retry, round-robin selection and failover, and character set handling. It concludes with a general workflow example that shows how server selection works and how failover is handled for a user-routing rule.

LDAP Server Process Startup for DLP

When the LDAP server process (dirxldapv3) starts, it reads the ldapProxyMode attribute from its corresponding LDAP configuration subentry. The subentry name is specified on the command line with the -n option; if it is not specified, the server process uses the subentry name ldapConfiguration.
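For example, a server process start that names the subentry explicitly might look like this (only the -n option is taken from the description above; any further startup options are omitted):

dirxldapv3 -n ldapConfiguration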

When the ldapProxyMode attribute is set to 1 or 2, the server process runs as a DLP server. All well-known settings from the LDAP configuration subentry continue to apply as long as they refer to the client side (for example, LDAP port, number of pool threads, maximum number of client connections). Settings that affect the DAP backend (for example, unbind delay time, DAP share count) are ignored.

Note the following about LDAP server process startup as a DLP server:

  • If the ldapProxyMode attribute is not present in the LDAP configuration subentry, the LDAP server continues to run as a plain LDAP server.

  • If the server process starts with a proxy mode >= 1 but it cannot locate the DLP server configuration file, it starts in plain LDAP server mode.

  • If the server process starts with a proxy mode >= 1 but the DLP server configuration file contains syntax errors, the LDAP server does not start until the configuration file is corrected.

  • If the server process starts with a proxy mode >= 1 but the LDAP configuration subentry name specified in the -n option (or the default name ldapConfiguration if the -n option is not specified) does not match any of the LdapProxy object names defined in the DLP server configuration file, it starts in plain LDAP server mode. In short, the LDAP server process must find 1) a proxy mode >= 1 in the LDAP configuration subentry and 2) a matching LdapProxy object in the DLP server configuration file in order to establish itself as a working DLP server.
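For instance, a server process started with -n ldapConfiguration and a proxy mode of 1 or 2 only becomes a working DLP server if the DLP server configuration file defines an LdapProxy object with a matching name, along these lines (a minimal sketch; any further LdapProxy keys are omitted here):

{
   "object" : "LdapProxy",
   "name"   : "ldapConfiguration"
}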

Connect Timeout

Before the DLP server can forward a client request to a selected target server, it must perform a TCP-connect operation (possibly followed by an SSL-connect()). The connection is usually established rather quickly, but timeout effects may occur if the peer is unavailable or unreachable. A normal successful TCP connect consists of a three-way handshake between the initiator and the responder (we assume here that all IP addresses are properly configured and routable):

State   Initiator (DLP Server)   Direction   Responder (LDAP Server)
1       send SYN                 ->          recv SYN
2       recv SYN-ACK             <-          send SYN-ACK
3       send ACK                 ->          recv ACK

The connect timeout typically appears at state 2 when the initiator waits for the SYN-ACK from the peer. The amount of time spent in state 2 depends heavily on the state of the responder:

  a. Responder host is running, an application listens on the port

  b. Responder host is running, no application listens on the port

  c. Responder host is running, an application listens but a firewall blocks the initiator IP

  d. Responder host is down

  e. Network to the responder is broken

For cases a. and b., an immediate response is sent from the responder to the initiator (a SYN-ACK in case a., a RST in case b.).

For case c., it depends on the firewall whether or not it generates a RST (reset) to reject the connection. If it doesn’t, the SYN is unanswered and times out on the initiator side.

For cases d. and e., there is no peer TCP that could respond with an error, so the initial SYN packet is lost and is never answered. After some time, the initiator TCP re-transmits the SYN packet, likely with the same unanswered result. Usually the re-transmission occurs N times (a TCP parameter), and the time the sender is willing to wait doubles with each attempt. On the Windows platform, N is 3 and the initial wait is 3 seconds, which results in a 3 + 6 + 12 = 21 second timeout for cases c., d. and e.

The DLP server contacts the target servers sequentially, one after the other. Therefore, if a rule contains M servers, each try might take up to 21 seconds to detect that the target is down before the DLP server tries the next server from the list. You can use the ConnectTimeout value in the Defaults object definition to help reduce the time it takes to detect a server outage during the TCP-connect(). By default, the DLP server uses 3 seconds - the first retransmission timeout on Windows - to speed up detection of cases c., d. and e. We recommend leaving the default in place until TCP analysis has shown that some other value helps to overcome connection problems. Never set ConnectTimeout to 0, as it may render the DLP server completely unable to connect.
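To illustrate the mechanism (this is not the DLP server's actual implementation), the following Python sketch shows how an initiator-side timeout caps the connect wait; the host name and port are placeholders:

import socket

# Attempt the TCP connect, but give up after 3 seconds instead of
# sitting through the operating system's full SYN retransmission cycle.
try:
    conn = socket.create_connection(("ldap3.example.com", 389), timeout=3.0)
except OSError:
    # Covers timeouts as well as refused/unreachable errors: treat the
    # target as down and move on to the next server in the list.
    conn = None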

Note that there is no timeout for normal I/O read/write operations, because introducing one would impose the requirement of determining the worst-case runtime of any of the connected target servers for all possible legal searches. For example, when a search request is sent out, what is the right amount of time to wait before returning with a timeout? Choosing the right timeout requires the ability to predict the maximum search runtime that can occur with any possible search. This is almost impossible to calculate, as some simple searches are fast while more complicated searches can last a long time.

Offline Handling and Server Retry

When the DLP server forwards a request to a target LDAP server, it may detect a network failure. There are basically two kinds of incidents for such a failure:

  • Failure detected during the TCP connect() while establishing a connection to the target server

  • Failure detected during read/write I/O on an already established connection

Both detections lead the DLP server to mark the target server as OFFLINE.

If more than one target server is configured and the rule allows for failover, the DLP server continues by forwarding the request to the next target server. This continues until the request can be processed or until all servers fail. Switching to another target server is transparent to the client: if the error occurred on an already established connection, the DLP server automatically re-establishes the LDAP connection with the credentials that existed at failure time.
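The following Python sketch illustrates this failover loop under simplifying assumptions (the names and the bare socket connect are illustrations, not DLP internals; re-sending the bind with the saved credentials is only indicated in a comment):

import socket

offline = set()   # servers currently marked OFFLINE (simplified)

def forward_bind(servers, port=389, connect_timeout=3.0):
    # Try each target server in the rule's order; on a network failure,
    # mark the server OFFLINE and fail over to the next one.
    for host in servers:
        if host in offline:
            continue
        try:
            conn = socket.create_connection((host, port), timeout=connect_timeout)
        except OSError:
            offline.add(host)   # connect failed: mark OFFLINE, try the next server
            continue
        # A real DLP server would now (re-)send the LDAP bind PDU with the
        # credentials that existed at failure time, keeping the switch
        # transparent to the client.
        return conn
    raise OSError("all configured target servers failed")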

If a target server is marked OFFLINE, the DLP server will not select it for any further operations for a certain amount of time, no matter for which rule the server was configured: even if the outage was detected during an operation for user X, the server will not be selectable for any other user Y after the detection.

The selection of possible target servers for a user occurs at the time of the first operation on the corresponding client LDAP connection; usually this operation is a bind, but LDAPv3 allows starting with any operation, in which case the anonymous user is assumed. As a result, if a target server is marked as OFFLINE due to failure detection, this does not affect target servers that were already selected for other user connections existing at failure time. Users that establish a new LDAP connection, however, will not get servers that are marked OFFLINE as their target server choice.

You can use the OfflineRetryTimeout key in the Defaults object definition to control the duration for which an OFFLINE server is not selectable; after this amount of time has passed, the server is selectable again for a retry by new users. The default is 60 seconds.

Please note that there is a difference between “selectable” and a real retry.

When a target server is selectable, it means that the DLP server adds it to the list of possible servers configured by the corresponding rule. If the target server is not the rule’s primary server (the first one in the list), it may never be contacted at all. Thus, being selectable does not necessarily imply that an actual retry occurs. The server therefore remains selectable until a real retry returns another error, in which case it is marked as OFFLINE again and is not selectable for the configured retry time. If a target server is retried successfully after an offline-retry timeout, it remains selectable.
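The OFFLINE/selectable state can be pictured as a per-server timestamp, as in this minimal Python sketch (the names are illustrative; 60 seconds is the default described above):

import time

OFFLINE_RETRY_TIMEOUT = 60.0    # seconds
offline_until = {}              # server name -> time until which it is OFFLINE

def mark_offline(server):
    # A failed connect or I/O error makes the server unselectable for a while.
    offline_until[server] = time.monotonic() + OFFLINE_RETRY_TIMEOUT

def is_selectable(server):
    # Once the timeout has passed, the server may be placed on a rule's
    # target list again; an actual retry happens only if it is contacted.
    return time.monotonic() >= offline_until.get(server, 0.0)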

Be careful not to set OfflineRetryTimeout too high. As we have seen, a server that cannot be reached and is marked as OFFLINE is not retried until the OfflineRetryTimeout expires. Thus, if all relevant servers cannot be reached, they all stay in the OFFLINE state for at least OfflineRetryTimeout and are therefore not selected even if a server is physically up before the OfflineRetryTimeout expires. If you set this time to a high value – for example, 5 minutes – users that only have these servers configured for their use cannot perform further operations until the OfflineRetryTimeout expires. On the other hand, if you set OfflineRetryTimeout to a very low value – for example, 1 second – a lot of operations will retry these servers very frequently and may add a significant TCP timeout to their operation duration if a server is down for a long time.

We recommend setting OfflineRetryTimeout in the range of 30-60 seconds to establish a good balance between not frequently contacting servers that are down and detecting reasonably early that they are up again.
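Putting the two timeouts together, the corresponding Defaults object definition in the DLP server configuration file might look like this (a sketch showing the default values discussed above; any other Defaults keys are omitted, and the object syntax follows the ProxyRule example shown later in this chapter):

{
   "object"              : "Defaults",
   "ConnectTimeout"      : 3,
   "OfflineRetryTimeout" : 60
}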

Round-Robin Selection and Failover

If round-robin (RR) selection is enabled for a rule and failover (FO) is also enabled, they are both applied separately, which may lead to unexpected target selection for subsequent binds.

Let’s illustrate this concept with the following example:

  • User A has a rule where both RR and FO are enabled.

  • User A uses the target servers L1, L2 and L3.

  • The target server L3 is down.

  • The time between each bind is longer than the OfflineRetryTimeout setting.

On user A’s first bind, the selected target servers are L1, L2, L3. The first bind goes to L1 and succeeds. When user A makes a second bind later on, L2 is selected (due to RR), resulting in a successful bind. After some time, user A issues a third bind; L3 is selected (RR) and fails. Because FO is active, the next server L1 is selected and succeeds.

What happens when user A performs a fourth bind? Which server is contacted? It is server L1, because RR and FO are treated separately; that is, a failover selection (next server) does not change the next selection for the RR algorithm. Therefore, as RR selected L3 for bind #3 (which failed and was shifted to L1 by FO), the RR algorithm will select the next server after L3, which is L1.

This behavior may look strange, as L1 has received the last two binds from user A although RR is active, but it results from the fact that FO simply selects the next server after the currently failing server, while RR simply selects the next server in line after the last RR selection.
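The interaction becomes clear when the two mechanisms are modeled as separate pieces of state: an RR cursor that advances with every bind, and an FO walk that starts from the RR pick. The following Python sketch (an illustration, not DLP code) reproduces the four binds above:

servers = ["L1", "L2", "L3"]
down = {"L3"}       # L3 is simulated as unreachable
rr_cursor = 0       # index of the server RR will pick next

def bind():
    global rr_cursor
    pick = rr_cursor
    rr_cursor = (rr_cursor + 1) % len(servers)   # RR advances regardless of FO
    # Failover walks on from the RR pick until a reachable server is found.
    for step in range(len(servers)):
        server = servers[(pick + step) % len(servers)]
        if server not in down:
            return server
    raise RuntimeError("all servers are down")

print([bind() for _ in range(4)])   # prints ['L1', 'L2', 'L1', 'L1']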

Character Set Handling

Although proxy rule tokens, keys and components are plain ASCII strings, some assignment values within a proxy rule condition or action may contain special Latin-1 characters like ä, ö, ü, and so on (for example, in DNs). In order to match a condition or action containing these Latin-1 characters properly against the UTF-8 character values contained in LDAP, the DLP server must know which character set the DLP server configuration file uses. There are currently two choices, defined in the Defaults object definition of the DLP server configuration file:

"JSONCodeSet" : 0 // declares the configuration file to be a Latin-1 content file
"JSONCodeSet" : 1 // declares the configuration file to be a UTF-8 content file

If a value of 0 is specified, the DLP server interprets all condition and action strings contained in all ProxyRule objects as Latin-1 characters and performs an implicit conversion to UTF-8 before storing them in its internal configuration. This option can be useful if your JSON text editor does not support storing Latin-1 characters like ö as their multi-byte UTF-8 representations.

A value of 1 indicates that all condition and action strings in the DLP server configuration file are encoded in UTF-8 format and do not require conversion.By default, UTF-8 is assumed (JSONCodeSet = 1).
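To see why the declaration matters, note that the same character occupies different byte sequences in the two encodings; the following Python lines illustrate the kind of conversion that a JSONCodeSet of 0 implies:

latin1 = "ä".encode("latin-1")                     # b'\xe4': one byte in the file
utf8   = latin1.decode("latin-1").encode("utf-8")  # b'\xc3\xa4': two bytes
assert utf8 == "ä".encode("utf-8")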

General Operation Forwarding Example

This example describes the internal operation when a user X performs the sequence of LDAP operations bind, search, unbind, bind, search, unbind on a plain (non-SSL) connection against the DLP server. We assume that the primary server LDAP3 is down and unavailable. Servers LDAP1 and LDAP2 are up and available. We further assume that the DLP server has not yet contacted LDAP3 and so is not aware that it is down. We also assume that user X has an explicit rule of the form:

{
   "object"      : "ProxyRule",
   "ruleType"    : "UserRouting",
   "name"        : "USERROUTING2",
   "condition"   : "(user=cn=admin,o=pqr)",
   "actions"     : [ "forwardto(LDAP3,LDAP1)" ],
   "loadbalance" : 0,
   "failover"    : 1
}

Before the client can send out the bind PDU, it must establish a TCP connection. During TCP connection establishment, the DLP server recognizes a new LDAP connection and creates an internal object called LdapConnection. As no authentication has happened so far (remember that the bind PDU has not yet been sent), the DLP server assigns the ‘anonymous’ user to this new connection and waits for further data to arrive.

Once the TCP connection is established, the client can now send the LDAP bind PDU containing the credentials (user+pwd) for this connection. The DLP server receives this PDU, detects that it is a bind PDU and extracts the user name (DN) from it.

Next, the DLP server reads the configured rule sets and checks to see if there is a rule for user X. It finds the user rule and then builds a list of possible target servers by reading the forwardto servers from the rule and checking whether or not a round-robin selection should be used on the list. Because the rule sets loadbalance to 0, round-robin selection is not performed and the first server in the list (LDAP3) becomes the primary server. As LDAP3 has never been contacted, the DLP server assumes that it is available and selects it as the target server for user X.

Now the DLP server opens a TCP connection to LDAP3, detects the outage (this may take a few seconds) and marks LDAP3 as OFFLINE. Next, it checks whether the user rule allows failover and whether other servers are configured for failover. Both conditions are true. Thus, LDAP2 becomes the new primary target server. The DLP server performs a TCP connect against LDAP2, which now succeeds.

Next, the DLP server forwards the bind to this target server and receives the bind result PDU indicating a successful bind. The received PDU is now sent back to the client.

Once the client receives the bind success, it issues the search by sending the search PDU out to the DLP server. The DLP server receives the PDU and recognizes a search. For the DLP server, this means that no target server selection is necessary as there is already a working connection to LDAP2. Thus the DLP server simply sends out the search PDU to LDAP2 and receives the search result. After receiving the result, the DLP server returns the result to the client.

After the client has received the search result, it invokes the unbind operation. The unbind PDU is sent to the DLP server and is received. Again the DLP server knows that the target server is LDAP2 and sends out the unbind operation. As unbind is not a confirmed operation, the DLP server does not need to wait for a response and closes the TCP connection to LDAP2. It also closes the frontend connection to the client and destroys the internal LdapConnection object. As a result, the DLP server no longer has any frontend or backend connection and is back to the state it was in before the first bind.

Now the client sends the second bind. The DLP server establishes a TCP connection, creates a new LdapConnection object and assigns the anonymous user. The client sends the second bind PDU, which is received by the DLP server. The DLP server extracts user X from the bind and performs a lookup against the rules. It again finds a user rule without load balancing where the primary server is LDAP3. However, as it has detected that LDAP3 is OFFLINE, the DLP server ignores it, selects LDAP2 as the primary server, establishes a new TCP connection and sends the bind to LDAP2 again. The successful bind response is received and returned to the client. The remaining search and unbind operations work exactly as in the first sequence.