Distributed Cache and Performance

This section describes how the distributed cache employed by DirX Access influences overall system performance. It lists the configuration parameters that let administrators estimate performance and tune the system toward a target state. The cache is built on the Apache Ignite platform (version 2.6.0). Consequently, for advanced administration, we recommend that system administrators read at least the section “Production Readiness” (Preparing for Production) of the Apache Ignite documentation. For each DirX Access configuration parameter described in this section, we reference the corresponding configuration parameter in Ignite.

We focus on three aspects of overall system performance: computational performance, communication performance, and memory consumption. The administrator must also consider other aspects to achieve the desired balance between efficiency, security, and reliability.

Cache Mode

DirX Access employs a distributed cache and messaging system. These mechanisms are used to:

  • Speed up access to data

  • Distribute data among Services instances

  • Provide system-wide listening to predefined actions.

The goals that each cache serves determine which cache mode it should use. DirX Access enables a cache to operate in one of two modes: partitioned or replicated. Most caches have their mode predefined; however, several caches allow a choice.

Replicated Cache Mode

Replicated cache mode requires each cache entry to exist in the same state (to be replicated) in all the DirX Access Services instances. The PRIMARY_SYNC write synchronization mode is used; that is, the cache waits for the write/commit to complete on the primary DirX Access Services server but not for the backups to be updated. For n cache records and m Services servers, this has the following implications:

  • Memory - each of the m local caches contains n records.

  • Communication - the communication cost of each cache operation grows linearly with m (for example, the update action of a single record must be sent at least m times).

  • Performance - the cost of any operation over all records grows linearly with n and is independent of m.

Caches

In DirX Access, caches configured in replicated mode by default mainly satisfy the goal of speeding up access to data. These include caches containing configuration data and XACML and UMA policies.

Partitioned Cache Mode

In partitioned cache mode, each cache record is stored once (in a single Services server) as a primary copy, with a configurable number of backup copies on other machines. The PRIMARY_SYNC write synchronization mode is used (see the “Replicated Cache Mode” section). For n cache records, m Services servers, and b backups, this has the following implications:

  • Memory - each of the m local caches is expected to contain (1 + b) * n / m records.

  • Communication - the communication cost of each cache operation is constant with respect to m and grows linearly with b.

  • Performance - the cost of operations on locally stored cache records grows with n / m. Operations on all the records are distributed between all the Services servers, which adds communication overhead.
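The memory and communication implications of the two modes can be compared with a short calculation. The following is a minimal sketch, not part of DirX Access or Apache Ignite; the function names are hypothetical, and the message count of 1 + b for partitioned mode is our assumption (one message per copy), since the text only states that it grows linearly with b.

```python
def records_per_node(n, m, mode, backups=0):
    """Expected number of records held by each of the m local caches."""
    if mode == "replicated":
        # Every node holds every record.
        return n
    if mode == "partitioned":
        if backups > m - 1:
            raise ValueError("backups cannot exceed m - 1 (copies live on distinct nodes)")
        # One primary copy plus `backups` backup copies, spread over m nodes.
        return (1 + backups) * n / m
    raise ValueError(f"unknown cache mode: {mode}")


def messages_per_update(m, mode, backups=0):
    """Rough number of nodes that must receive a single-record update.

    Assumption: one message per stored copy of the record.
    """
    if mode == "replicated":
        return m
    if mode == "partitioned":
        return 1 + backups
    raise ValueError(f"unknown cache mode: {mode}")


if __name__ == "__main__":
    n, m, b = 1_000_000, 4, 1
    for mode in ("replicated", "partitioned"):
        print(mode,
              records_per_node(n, m, mode, backups=b),
              messages_per_update(m, mode, backups=b))
```

Note that with b = m - 1 the partitioned formula yields n records per node, which matches the replicated case.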

Caches

In DirX Access, the caches configured in partitioned mode by default predominantly serve as session memory. By “sessions”, we mean single sign-on sessions, SAML and OAuth data, and many similar types. These sessions are typically valid for a certain period. As a result, their removal is time-based, not space-based, and the “Cache max size” parameter of the “Server” configuration object must be set accordingly to prevent an out-of-memory state.
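Because session removal is time-based, the number of sessions held at any moment depends on how fast sessions are created and how long they stay valid. The following rough sketch estimates that number as arrival rate multiplied by validity period, and then applies the (1 + b) * n / m formula from the “Partitioned Cache Mode” section. The function names and the example figures are hypothetical, and the estimate assumes that sessions are removed only when their validity expires. How the result maps to a concrete “Cache max size” value depends on the units of that parameter, which this sketch does not assume.

```python
def expected_live_sessions(sessions_per_second, validity_seconds):
    """Steady-state number of valid sessions (arrival rate x lifetime)."""
    return sessions_per_second * validity_seconds


def expected_sessions_per_node(live_sessions, m, backups):
    """Apply the partitioned-mode formula (1 + b) * n / m."""
    return (1 + backups) * live_sessions / m


if __name__ == "__main__":
    # Hypothetical load: 50 new sessions per second, each valid for 8 hours.
    live = expected_live_sessions(50, 8 * 3600)
    per_node = expected_sessions_per_node(live, m=4, backups=1)
    print(live)      # 1440000
    print(per_node)  # 720000.0
```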

Number of Cache Backups

As seen in the previous section, the number of cache backups substantially influences the required memory space and system communication. While a small number of backups is desirable, so is system stability: the more backups are kept, the higher the probability that the original system state (for example, all SSO sessions) is preserved even in the case of a system failure (Services instance malfunction or network segmentation). In such a case, the backup records are redistributed between the remaining nodes, and new backups are created.

The original system state is preserved if the Services instances forming the cluster group after the failure together contain all the entries, either as a primary entry or as a backup. The main rule is that the number of backups determines the number of Services instances that can be disconnected from the cluster group without any loss.
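The rule can be checked on a small model. The following sketch places each entry on b + 1 distinct nodes and then tests every possible set of simultaneously failed nodes. The round-robin placement and all function names are assumptions made for illustration; Apache Ignite uses its own affinity function, and the model ignores the redistribution of backups that takes place after a failure.

```python
from itertools import combinations


def copy_holders(entry, m, b):
    """Nodes holding the primary and the b backups of an entry.

    Assumption: simple round-robin placement on b + 1 consecutive nodes.
    """
    return {(entry + offset) % m for offset in range(b + 1)}


def survives(n, m, b, failed):
    """True if every entry keeps at least one copy on a node outside `failed`."""
    failed = set(failed)
    return all(copy_holders(entry, m, b) - failed for entry in range(n))


def max_tolerated_failures(n, m, b):
    """Largest f such that ANY simultaneous loss of f nodes preserves all entries."""
    tolerated = 0
    for f in range(1, m):
        if all(survives(n, m, b, lost) for lost in combinations(range(m), f)):
            tolerated = f
        else:
            break
    return tolerated


if __name__ == "__main__":
    for b in range(3):
        print(b, max_tolerated_failures(n=100, m=5, b=b))
```

In this model, the number of tolerated simultaneous failures equals b for every tested value, which agrees with the rule above.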

Number of Cache Threads

The “Number of cache threads for each cache thread type” parameter of the “Server” configuration object determines the number of threads created for Apache Ignite. If the parameter is set to k, the numbers are:

  • 2*k for system pool, utility cache pool, data streamer cache pool, striped pool, and IGFS messages pool,

  • k for public pool, query pool, service pool, and asynchronous callback pool.
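The resulting thread counts can be tabulated for a given k. The following sketch only restates the two rules above; the function and constant names are hypothetical and do not correspond to any DirX Access or Apache Ignite API.

```python
# Pools that receive 2*k threads, as listed above.
DOUBLE_POOLS = (
    "system pool",
    "utility cache pool",
    "data streamer cache pool",
    "striped pool",
    "IGFS messages pool",
)

# Pools that receive k threads, as listed above.
SINGLE_POOLS = (
    "public pool",
    "query pool",
    "service pool",
    "asynchronous callback pool",
)


def pool_sizes(k):
    """Map each pool name to its thread count for parameter value k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    sizes = {name: 2 * k for name in DOUBLE_POOLS}
    sizes.update({name: k for name in SINGLE_POOLS})
    return sizes


if __name__ == "__main__":
    sizes = pool_sizes(4)
    for name, count in sizes.items():
        print(f"{name}: {count}")
    print("total:", sum(sizes.values()))  # 5 * 2k + 4 * k = 14k, so 56 for k = 4
```

Across these nine pools the total is 14 * k threads per Services instance, which is worth keeping in mind when choosing k on machines with few cores.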

Details about these pools can be found in the Ignite Configuration reference (Ignite 2.16.0).

Pre-set Ignite Parameters

DirX Access internally sets several Apache Ignite parameters that affect overall system behavior. The main ones are:

  • Failure detection timeout = 3s - determines how long a cluster node waits before considering a connection to another node failed.

  • Message queue = 4096 - defines the message queue limit for incoming and outgoing messages.

SSL/TLS Communication

The DirX Access configuration enables switching between plain and protected communication for the distributed caches. See the “Keystores” appendix of this guide for details about the crypto-material used for the protection.

Note that a protected configuration is typically slower because of the additional cryptographic operations it must perform. An administrator must consider whether SSL/TLS communication adds security in the given system setup or whether it is redundant and can therefore be switched off.

Garbage Collector

The DirX Access external log may contain the log entry “Possible too long JVM pause”. This entry comes from the Apache Ignite engine and is typically connected to the memory and network settings. It is not always necessary to change these two aspects; adjusting the system configuration, specifically the properties of the Java Garbage Collector, can be sufficient. If the administrator observes any related issues, we recommend following the instructions in Garbage Collection Tuning (https://apacheignite.readme.io/docs/jvm-and-system-tuning) and possibly discussing the results with the vendor.