Cache Server

Overview

The Cache Server is a component that was historically embedded within the DirX Access Server. Real-life scenarios have shown that the overall system benefits from carving this component out of the DirX Access Server, letting each side focus on a smaller set of features: the DirX Access Server concentrates less on infrastructural changes and lifecycle and more on the actual business services, while the Cache Server focuses purely on providing the following services to the DirX Access Server:

  • Information synchronization across all DirX Access Servers in a cluster

  • Messaging mechanism between the DirX Access Servers in a cluster

  • Long-term persistence of records

The Cache Server component has also been introduced to bring improvements in the following areas:

  • Overall system robustness and stability

  • Simpler patching and upgrade process

The Cache Servers do not fully replicate the records across all servers; instead, a partitioning algorithm computes where to store the primary and backup records. For more information, see the documentation of the Apache Ignite component.

High-level Architecture

This section explains the aspects of the high-level architecture that correspond to the Cache Server component. For their implications and recommendations for production deployments, see the DirX Access Blueprint page.

Components

This section lists the main system components.

DXA Cache Client

The Cache Client is an instance of the Apache Ignite Thick Client running within the DirX Access Server. It enables records on the Cache Servers to be managed via the communication channels. No cached or persisted records are stored in this component.

DXA Cache Server

The core of the Cache Server is the Apache Ignite Server, extended with a configuration interface and tailored to fit the purposes of the DirX Access system.

DXA Cache Server Persistence

The Cache Server Persistence is a file-system-based storage within the working directory of the Cache Server. It enables state to be persisted across long time periods and server restarts.

Protection of this storage has to be considered seriously, as it can contain security tokens, and a breach of it might lead to serious security violations. In the current version, DXA does not encrypt this storage.
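As a minimal hardening sketch, assuming a Linux host, a dedicated service account (hypothetically named dxacache) and /opt/dirxaccess/CacheServer as the working directory of the instance, access can be restricted at the file-system level:

    # Restrict the working directory (including the persistence storage)
    # to the service account; the path and user name are illustrative.
    chown -R dxacache:dxacache /opt/dirxaccess/CacheServer
    chmod -R o-rwx /opt/dirxaccess/CacheServer

On Windows, an equivalent restriction can be applied to the working directory with NTFS permissions (for example, via icacls).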

Figure 1. HLA Cache Server

Relationships

Figure 1 depicts all reasonable relationships between the Cache Server component and the other system components. Their descriptions follow.

Note: In contrast to older versions of DirX Access, the DirX Access Server components are not interconnected with each other. This makes them more resilient to network malfunctions.

Cluster of DXA Cache Servers

A cluster of Cache Servers is a set of Cache Servers configured to communicate with each other and to represent a single (distributed) storage of information. A ring topology is used, and the servers are interconnected via the discovery and communication channels.

DXA Cache Client - DXA Cache Server

Communication channels between Cache Clients and Cache Servers can be configured in an arbitrary way, with implications predominantly on the high-availability features. The underlying mechanism is described in the Routing section.

One-to-one

The typical deployment is represented by one Cache Client communicating with one Cache Server. This should be reflected on the physical level in such a way that the interconnected components have faster communication channels. In Figure 1, this relationship exists between DXA Cache Client 1 and DXA Cache Server 1.

One-to-many

In Figure 1, DXA Cache Client 2 connected to DXA Cache Server 2 and DXA Cache Server 3 represents this relationship. Consider it in scenarios where the performance of the business services is fully satisfied by a lower number of DXA Servers (e.g., one in each data center), while a higher number of (persisted) copies is necessary to cope with the possibility of a malfunction on the side of the Cache Servers.

Example scenario: There are two data centers, each with one DXA Server (in the case of a malfunction of one of them, the other one can take over). The communication between the data centers is slower; hence, keeping the backup of a record stored on a Cache Server in one data center on a Cache Server in the other data center is not feasible.

Many-to-one

In Figure 1, DXA Cache Client 2 and DXA Cache Client 3, both connected to DXA Cache Server 3, represent this relationship. Consider it in scenarios where the performance of the business services is more demanding, while a single Cache Server can serve multiple DXA Servers.

Example scenario: There are two data centers with an interconnection fast enough to allow just two Cache Servers overall (providing backup to each other). The services provided by the DXA Servers do not place a high demand on the records stored in the Cache Servers, but they are very demanding in terms of computational power (e.g., a high number of cryptographic operations). Multiplying the DXA Servers while keeping the low count of Cache Servers can be a good idea in this case.

DXA Cache Server - DXA Cache Server Persistence

Each Cache Server has a storage for persisted records in its collocated working directory.

Installation

Installation of DirX Access creates a template folder for the DXA Cache Server in {installation_directory}/Services/templates/cacheServer. To install a new Cache Server, the following actions have to be performed:

  1. Copy and rename the {installation_directory}/Services/templates/cacheServer folder to the desired place that will represent the working directory of the new Cache Server instance. (In the following, this new folder is referred to as CacheServer.)

  2. Update the CacheServer/etc/cacheServer.properties file with appropriate configuration values (see Configuration section).

  3. Optionally, upload any keystore referenced from the CacheServer/etc/cacheServer.properties configuration file into the working directory.

  4. Optionally, update the CacheServer/etc/log4j2.xml file to provide the desired logging output.

  5. Optionally, update the cacheServer.properties files of other Cache Servers from the same cluster. This concerns mainly the cache.servers configuration parameter. For more information, see the Configuration and Cache Server Cluster Lifecycle sections.

  6. Optionally, change the Java memory sizes in the CacheServer/bin/setenv.bat file.

  7. Depending on the operating system:

    1. Windows OS:

      1. Install the Windows service by executing the command CacheServer/bin/serviceInstall.bat %SERVICE_NAME% %DISPLAY_NAME% %SERVICE_USER% %SERVICE_PASSWORD%, where:

        1. %SERVICE_NAME% is the Windows service identifier,

        2. %DISPLAY_NAME% is the Windows service display name,

        3. %SERVICE_USER% specifies the name of the account under which the service should run (in the case the local system user shall be used, use LocalSystem and no value for %SERVICE_PASSWORD%), and

        4. %SERVICE_PASSWORD% is the password for the service user account (set by --ServiceUser parameter).

      2. You can then start/stop/restart the Cache Server component as a Windows service.

    2. Linux OS: Run the Cache Server component via the CacheServer/bin/cacheServer.bat file.
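For illustration, a minimal installation of a new instance on Windows could look as follows; the target directory D:\DirXAccess\CacheServer and the service names are placeholders:

    rem Step 1: copy the template into the new working directory.
    xcopy /E /I "{installation_directory}\Services\templates\cacheServer" "D:\DirXAccess\CacheServer"

    rem Steps 2-6: adjust etc\cacheServer.properties, the keystores, etc\log4j2.xml
    rem and bin\setenv.bat in the new directory as needed.

    rem Step 7: install the Windows service under the local system account
    rem (no password value is passed for LocalSystem).
    D:\DirXAccess\CacheServer\bin\serviceInstall.bat DXACacheServer1 "DXA Cache Server 1" LocalSystem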

Configuration

To configure the Cache Server successfully, some actions have to be performed on both the Cache Server and the corresponding DXA Server.

Cache Server Configuration

The cacheServer.properties file contains the main configuration parameters of the Cache Server. The file is periodically reloaded according to the config.reload.period parameter, with all changes since the last load being applied to the server immediately. Applying the changes may or may not lead to a restart of the embedded Ignite Cache Server instance, depending on which configuration is changed.

The initial property values in the template file reflect the defaults that are used when the properties are not explicitly configured. If a property is left empty, the resulting default is not universally predictable.

Properties shown in bold have to be filled in by the administrator; for the remaining properties, a default is used or the corresponding feature is not employed.

The properties, their defaults (where defined) and their descriptions are listed below.

home.dir (default: local directory)
Directory in the file system where the Cache Server stores its persistent and temporary files. The Cache Server process has to have read and write access to this directory.

config.reload.period (default: 300000)
Period, in milliseconds, with which this file is read and applied to the running Cache Server. A change of this value is propagated on the next period. If set to 0, the configuration file is never reloaded.

failure.detection.timeout (default: 10000)
The timeout of failure detection, in milliseconds. Blockage of a system worker thread is configured to be detected after twice this time.

log4j.configuration.file (default: etc/log4j2.xml)
Path to the Log4j2 configuration file.

server.address (default: localhost)
The address at which the Cache Server opens its ports for communication with the other nodes. It has to be the same as the address used for this Cache Server in the cache.servers parameter of the other nodes. It is part of the consistent identifier of this server and, as such, is also used as a reference by the DXA Server.

discovery.port (default: 31118)
The port at which the discovery mechanism runs. It is also part of the consistent identifier.

discovery.port.range (default: 1)
Determines how many ports (including the initial one) are tried if the initial port is not available.

communication.port (default: 31119)
The port at which the communication mechanism runs.

communication.port.range (default: 1)
Determines how many ports (including the initial one) are tried if the initial port is not available.

keystore.file
Path to the keystore file; currently, only PKCS12 keystores are supported. If left empty, the communication operates via a non-secure protocol; otherwise, the keystore is used.

keystore.password
Password of the configured keystore.

truststore.file
Path to the truststore file; currently, only PKCS12 truststores are supported. If left empty, the keystore.file is used as the truststore.

truststore.password
Password of the configured truststore.

data.center (default: dc1)
Identifier of the data center this Cache Server belongs to. The data center topology influences the routing and persistence algorithms.

cache.servers
A comma-delimited baseline topology with servers identified by their server address and discovery port. It is important to keep the value of this parameter in sync across all the Cache Servers in the same cluster.

data.region.default.initial.size (default: 104857600)
Initial size, in bytes, of the in-memory data region for the caches assigned to the default data region.

data.region.default.max.size (default: 2147483648)
Maximal size, in bytes, of the in-memory data region for the caches assigned to the default data region.

data.region.evicted.initial.size (default: 10485760)
Initial size, in bytes, of the in-memory data region for the caches assigned to the data region supporting eviction.

data.region.evicted.max.size (default: 268435456)
Maximal size, in bytes, of the in-memory data region for the caches assigned to the data region supporting eviction.
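For illustration, the cacheServer.properties of the first node of a two-node, TLS-secured cluster could look as follows; the hostnames, file names and passwords are placeholders only:

    # Identity and topology of this Cache Server (illustrative values).
    server.address=cache1.dc1.example.com
    discovery.port=31118
    communication.port=31119
    data.center=dc1
    cache.servers=cache1.dc1.example.com:31118,cache2.dc2.example.com:31118

    # TLS material located in the working directory (PKCS12 only);
    # leaving keystore.file empty would fall back to the non-secure protocol.
    keystore.file=etc/cacheServer.p12
    keystore.password=changeit
    truststore.file=etc/truststore.p12
    truststore.password=changeit

    # Reload this file every 5 minutes (300000 ms).
    config.reload.period=300000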

DXA Server Configuration

The DXA Server configuration parameters are described in detail in the Administration Guide. In this section, we provide additional information corresponding to the configuration of the Cache Server.

Cluster

The Cluster configuration object configures the whole cluster of DXA Servers with the same parameters. The Cache-related parameters occur in the Cache section of the Cluster configuration object. The following ones have a specific relationship to the configuration of the Cache Server:

  • Use SSL, Keystore Identifier and Truststore Identifier have to be configured to correspond to the cryptographic material configured at the Cache Servers.

  • The Cache Servers parameter has to contain a map of Cache Server consistent identifiers ({server.address}:{discovery.port}) to the data centers to which those Cache Servers belong ({data.center}). This provides the DXA Servers with the full topology of the cluster of Cache Servers and, together with the other configuration, enables the routing mechanism to be computed properly (see the illustrative values after this list).

  • Cross data center backup parameter: if true, the record distribution algorithm guarantees that at least one backup copy (if backups are configured) is assigned to a Cache Server residing in a different data center than the one holding the primary record (the primary record is stored in the Cache Server that is primary to the DXA Server creating the record). Use of this feature has to be decided with several aspects in mind: the performance of the communication between the data centers, high availability in the “data center down” scenario, etc.
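As an illustration only (the hostnames are hypothetical), the Cache Servers map of a two-data-center cluster could contain:

    cache1.dc1.example.com:31118 -> dc1
    cache2.dc2.example.com:31118 -> dc2

With at least one backup configured and Cross data center backup set to true, a record whose primary copy is stored on the dc1 server gets a backup copy on the dc2 server.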

Server

The Server configuration object configures parameters specific to a given server. Regarding the Cache Server, there is a single important configuration parameter:

  • Primary Servers is a list of Cache Servers ({server.address}:{discovery.port}) at which this DXA Server can store the cache records it creates (primary records). For more information, see the Routing section.
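Reusing the hypothetical two-data-center topology from the Cluster section, each DXA Server would typically list its collocated Cache Server:

    DXA Server in dc1:  Primary Servers = cache1.dc1.example.com:31118
    DXA Server in dc2:  Primary Servers = cache2.dc2.example.com:31118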

Cache Servers Cluster Lifecycle

The Cache Servers Cluster can be exposed to the following main actions:

  • Cache Server addition

  • Cache Server removal

  • Temporary downtime of Cache Server

  • Split of the cluster

To understand how Cache Servers behave upon these actions, it is important to first discuss the following terms:

  • Partitioning — Each cache is divided into multiple partitions (1024). Each cache record is assigned to one partition. In caches that are partitioned rather than fully replicated, each Cache Server contains only a subset of all partitions (its primary and backup partitions) and, hence, only a subset of all records. This decreases the cache size and the necessary communication. Employing a clever routing mechanism additionally means no increase in communication when the DXA Server requests given records. (A small conceptual sketch follows this list.)

  • Rebalancing — When the number of Cache Servers changes, some partitions lose their primary server or one of their backups. At this moment, the rebalancing mechanism takes place: the assignment of partitions to servers is changed so that each partition again has a primary server and the preconfigured number of backups. Depending on the content of the caches, rebalancing can be very costly in terms of both communication and computation.

  • Baseline — To cope with the cost of the rebalancing process, the cluster of Cache Servers has a current topology (all the servers currently active and interconnected in the cluster) and a permanent topology, the baseline. The current topology is computed automatically using the heartbeat mechanism (expecting responses from all online servers), and a change in it does not trigger the rebalancing mechanism. The baseline is configured manually, is equal to the cache.servers parameter, and a change to it does trigger the rebalancing mechanism. For this reason, it is very important to keep the cache.servers parameter in sync across all the Cache Server instances.

  • Cluster Activation — To prevent inconsistent behavior, no operation can be performed on a cluster until it is activated. The cluster is activated when all the Cache Servers listed in the baseline are active.
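As a small conceptual sketch of partitioning (the formula is a simplification; the exact affinity function is internal to Apache Ignite):

    partition(record)  ~  hash(record key) mod 1024
    owners(partition)  =  1 primary Cache Server + the configured number of backup Cache Servers

    Example: with 4 Cache Servers in the baseline and 1 backup per partition, each server owns
    roughly 1024 / 4 = 256 primary partitions and roughly 256 backup partitions, i.e., it holds
    about half of all records instead of all of them.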

Knowing these terms, we can describe what happens in the aforementioned situations:

  • Cache Server addition

    • Permanent Cache Server addition requires running through the steps listed in the Installation section. The Cache Server becomes part of the baseline after it is started AND a cache.servers property containing the new Cache Server is read on one of the Cache Servers. This can happen during the start of the new server or during the configuration reload of any other server. It must be pointed out that if a cache.servers parameter not containing the new server is loaded afterwards (e.g., by the configuration reload running on another server where the new server is not part of the cache.servers parameter), the new Cache Server can be immediately removed from the cluster again. For this reason, we stress the necessity of synchronizing the configuration of all the Cache Servers (see the example after this list).

  • Cache Server removal

    • Permanent removal of a Cache Server is executed by removing it from the cache.servers parameter AND loading this configuration (again, this can happen upon the (re)start of any Cache Server or upon the configuration reload). If the removed server is still running after this event, it remains part of the current topology but not of the baseline and hence does not contain any records. After a restart of this server, it does not reconnect to the cluster.

  • Temporary downtime of Cache Server

    • A temporary downtime of a Cache Server does not lead to rebalancing of the records in the caches. This may imply one of two states:

      • At least one of the other servers holds backup records for the primary records of the down server: the whole cluster operates without changes, and any changes to the records are done on the backup records.

      • Some primary records have no backups on the running servers: a request for these records by a DXA Server raises an exception in the processing of the whole request. Depending on the type of request, the DXA Server can either still process it (with some limitations) or produce an internal server error.

    • After reconnection, the originally down Cache Server takes back the handling of its primary records. In case any changes have been made to the records via their backups, the cluster tries to declare the latest updated record as the valid one. However, this can be slightly non-deterministic if the cluster has been split.

  • Split of the cluster (Disconnection of server(s))

    • A split of the cluster occurs when there are communication issues between the Cache Servers of the cluster. The split may even be into more than two parts.

    • All the sub-clusters keep running and are able to execute operations (with the limitations mentioned under the “Temporary downtime of Cache Server” bullet) until re-connection occurs. This, however, means that the caches holding the internal state (e.g., authenticated sessions) diverge in each part of the disconnected cluster. Due to the sticky-session mechanism, where the authenticated session (in the form of a cookie) always determines the same server to be used for all its requests, the disconnected state works and all servers keep running. At cluster reconnection time, the integrity must be re-established. This is achieved by preserving the sub-cluster containing the server whose cache has been running the longest (as it has the potential to contain the majority of the valid states). The other sub-clusters are restarted, which effectively means (among other things) that part of the records created during the disconnection time is lost.
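For illustration, when permanently adding a third server to a two-server cluster, the cache.servers value has to be updated to the identical new value in the cacheServer.properties of all three instances before it is (re)loaded; the hostnames are placeholders:

    # Before, on cache1 and cache2:
    cache.servers=cache1.dc1.example.com:31118,cache2.dc2.example.com:31118

    # After, on cache1, cache2 and the newly installed cache3:
    cache.servers=cache1.dc1.example.com:31118,cache2.dc2.example.com:31118,cache3.dc1.example.com:31118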

Patching Process

Splitting the original DXA Server into two specialized components enables a simpler patch application process, or generally any process that requires a temporary downtime of one or both of the components.

It is expected that the DXA Server needs patching more often, as it encompasses much more functionality and more third-party components. During patching of the DXA Server, the Cache Server can keep running and no data (not even temporarily cached data) are lost. This is a major improvement over the situation where patching of the original DXA Server led to emptying the caches (SSO sessions, etc.).

Routing

As depicted in Figure 1, the DXA Servers are not by default connected to all the Cache Servers, but only to a subset of them. This is configured on the side of the DXA Server by the Primary Servers configuration parameter. This configuration determines at which Cache Servers the DXA Server stores the records it creates (e.g., cached/persisted OAuth tokens, SSO session data, etc.), while backups of these records can also be stored on other servers.

This feature has been introduced to reflect the inequality of the communication links between components in a production environment. For example, while a DXA Server can be deployed on the same machine as its collocated Cache Server, another Cache Server can be deployed in a completely different data center. The difference in the time cost of storing a record on these two Cache Servers can be substantial; hence, the Primary Servers configuration is a good candidate to be exercised in this scenario.
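Tying the illustrative values from the Configuration section together (the hostnames remain hypothetical), the routing of a newly created record could look as follows:

    DXA Server 1 in dc1, Primary Servers = cache1.dc1.example.com:31118
      -> the primary copy of a new record (e.g., an SSO session) is stored on cache1.dc1.example.com
      -> with one backup configured and Cross data center backup enabled, the backup copy is
         placed on cache2.dc2.example.com:31118 in dc2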

Monitoring

The Apache Ignite component provides extensive monitoring features. Both the DXA Server and the DXA Cache Server components make it possible to use these features. For more information, see the Monitoring section of the Apache Ignite documentation.