Cache Server
Overview
The Cache Server is a component that was historically embedded within the DirX Access Server. Real-life scenarios have shown that the overall system benefits from carving this component out of the DirX Access Server, letting each side focus on a smaller feature set: the DirX Access Server concentrates on the actual business services rather than on infrastructural changes and lifecycle, while the Cache Server is oriented purely on providing the following services to the DirX Access Server:
- Information synchronization across all DirX Access Servers in a cluster
- Messaging mechanism between the DirX Access Servers in a cluster
- Long-term persistence of records
The Cache Server component has also been introduced to bring improvements in the following areas:
- Overall system robustness and stability
- Simpler patching and upgrade process
The Cache Servers do not fully replicate the records across all the servers; instead, a partitioning algorithm is employed that computes where to store the primary and backup records. For more information, see the documentation of the Apache Ignite component.
High-level Architecture
This section explains the aspects of the high-level architecture corresponding to the Cache Server component. To learn about their implications and the recommendations for production deployments, see the DirX Access Blueprint page.
Components
This section lists the main system components.
DXA Cache Client
The Cache Client is an instance of the Apache Ignite Thick Client running within the DirX Access Server. It enables the DirX Access Server to manage records on the Cache Servers via the communication channels. No cached or persisted records are stored in this component.
DXA Cache Server
The core of the Cache Server is the Apache Ignite Server, extended with a configuration interface and tailored to fit the purposes of the DirX Access system.
DXA Cache Server Persistence
The Cache Server Persistence is a file-system-based storage within the working directory of the Cache Server. It enables states to be persisted across long time periods and server restarts.
Protection of this storage has to be considered seriously, as it can contain security tokens and a breach of it might lead to serious security violations. In the current version, DXA does not encrypt this storage.
Relationships
Figure 1 depicts all reasonable relationships between the Cache Server component and the other system components. A description follows.
Note: Contrary to older versions of DirX Access, the DirX Access Server components are not interconnected with each other. This makes them more resilient to network malfunctions.
Cluster of DXA Cache Servers
A cluster of Cache Servers is a set of Cache Servers configured to communicate with each other and to represent a single (distributed) storage of information. A ring topology is used and the servers are interconnected via the discovery and communication channels.
DXA Cache Client - DXA Cache Server
Communication channels between Cache Clients and Cache Servers can be configured in an arbitrary way, with implications predominantly on the high-availability features. The underlying mechanism is described in the Routing section.
One-to-one
The typical deployment is represented by one Cache Client communicating with one Cache Server. This should be reflected on the physical level so that the interconnected components have faster communication channels. In Figure 1, this relationship is between DXA Cache Client 1 and DXA Cache Server 1.
One-to-many
In Figure 1, DXA Cache Client 2 connected to DXA Cache Server 2 and DXA Cache Server 3 represents this relationship. This should be considered in scenarios where the performance of business services is fully satisfied by a lower number of DXA Servers (e.g., one in each data center), while having a higher number of (persisted) copies is necessary to face the possibility of a malfunction on the side of the Cache Servers.
Example scenario: There are two data centers, each with one DXA Server (in the case of a malfunction of one of them, the second one can take over). The communication between the data centers is slower; hence, storing a backup of a record held by a Cache Server in one data center on a Cache Server in the other data center is not feasible.
Many-to-one
In Figure 1, DXA Cache Client 2 and DXA Cache Client 3, both connected to DXA Cache Server 3, represent this relationship. This should be considered in scenarios where the performance of business services is more demanding, while a single Cache Server can serve multiple DXA Servers.
Example scenario: There are two data centers with an interconnection fast enough to have just two Cache Servers overall (providing backup to each other). The services provided by the DXA Servers do not place a high demand on the records stored in the Cache Servers, but are very demanding in terms of operational power (e.g., a high number of cryptographic operations). Multiplying the DXA Servers while preserving the low count of Cache Servers can be a good idea in this case.
DXA Cache Server - DXA Cache Server Persistence
Each Cache Server has a storage for persisted records in its collocated working directory.
Installation
Installation of DirX Access creates a template folder for the DXA Cache Server in {installation_directory}/Services/templates/cacheServer.
To install a new Cache Server, the following actions have to be performed:
- Copy and rename the {installation_directory}/Services/templates/cacheServer folder into the desired place that will represent the working directory of the new Cache Server instance. (In the following, this new folder is referred to as CacheServer.)
- Update the CacheServer/etc/cacheServer.properties file with appropriate configuration values (see the Configuration section).
- Optionally, upload any keystore referenced from the CacheServer/etc/cacheServer.properties configuration file into the working directory.
- Optionally, update the CacheServer/etc/log4j2.xml file to provide the wanted logging output.
- Optionally, update the cacheServer.properties files of the other Cache Servers from the same cluster. This concerns mainly the cache.servers configuration parameter. For more information, see the Configuration and Cache Servers Cluster Lifecycle sections.
- Optionally, change the Java memory sizes in the CacheServer/bin/setenv.bat file.
- Depending on the operating system:
  - Windows OS:
    - Install the Windows service by executing the command CacheServer/bin/serviceInstall.bat %SERVICE_NAME% %DISPLAY_NAME% %SERVICE_USER% %SERVICE_PASSWORD% (see the example after this list), where:
      - %SERVICE_NAME% is the Windows service identifier,
      - %DISPLAY_NAME% is the Windows service display name,
      - %SERVICE_USER% specifies the name of the account under which the service should run (in case the local system user shall be used, use LocalSystem and leave %SERVICE_PASSWORD% empty), and
      - %SERVICE_PASSWORD% is the password for the service user account (the account set by the --ServiceUser parameter).
    - You can run/stop/restart the Cache Server component as the Windows service.
  - Linux OS: Run the Cache Server component via the CacheServer/bin/cacheServer.bat file.
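As a hedged illustration of these steps on Windows, the following commands show one possible sequence; the target directory C:\DXA\CacheServer, the service identifier DXACacheServer, and the display name are example values only:

```
rem Copy the template into the new working directory (example path)
xcopy /E /I "{installation_directory}\Services\templates\cacheServer" "C:\DXA\CacheServer"

rem Install the Windows service under the local system account (no password needed for LocalSystem)
C:\DXA\CacheServer\bin\serviceInstall.bat DXACacheServer "DXA Cache Server" LocalSystem
```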
Configuration
To configure the Cache Server successfully, some actions have to be performed on both sides, the Cache Server and the corresponding DXA Server.
Cache Server Configuration
The cacheServer.properties file contains the main configuration parameters of the Cache Server.
This file is periodically reloaded according to the config.reload.period parameter, with all changes since the last load being applied immediately on the server.
Applying the changes may or may not lead to a restart of the embedded Ignite Cache Server instance, depending on which configuration is changed.
The initial property values reflect the defaults that are used when those properties are not explicitly configured. If a property is left empty, the resulting default value is not universally predictable.
The properties shown in bold have to be filled in by the administrator; for the other properties, a default is used or the corresponding feature is not employed.
| Property | Default | Description |
|---|---|---|
| home.dir | local directory | Directory in the file system where the Cache Server stores its persistent and temporary files. The Cache Server process has to have read and write access to this directory. |
| config.reload.period | 300000 | Period with which this file is read and applied on the running Cache Server, in milliseconds. A change of this value is applied with the next reload period. If set to 0, the configuration file is never reloaded. |
| failure.detection.timeout | 10000 | The timeout of failure detection in milliseconds. Blockage of a system worker thread is configured to be detected after twice this time. |
| log4j.configuration.file | etc/log4j2.xml | Path to the Log4J2 configuration file. |
| server.address | localhost | The address at which the Cache Server opens its ports for communication with the other nodes. Has to be the same as the address used for this Cache Server in the cache.servers parameter. |
| discovery.port | 31118 | The port at which the discovery mechanism runs. It is also part of the consistent identifier. |
| discovery.port.range | 1 | Determines how many ports are tried if the initial port is not available (the initial port included). |
| communication.port | 31119 | The port at which the communication mechanism runs. |
| communication.port.range | 1 | Determines how many ports are tried if the initial port is not available (the initial port included). |
| keystore.file | | Path to the keystore file; currently only PKCS12 keystores are supported. If left empty, the communication operates via a non-secure protocol; otherwise, the keystore is used. |
| keystore.password | | Password to the configured keystore. |
| truststore.file | | Path to the truststore file; currently only PKCS12 truststores are supported. |
| truststore.password | | Password to the configured truststore. |
| data.center | dc1 | Identifier of the data center this Cache Server belongs to. The topology influences the routing and persistence algorithms. |
| cache.servers | | A comma-delimited baseline topology with servers identified by the server's address and discovery port. It is important to keep the value of this parameter in sync across all the Cache Servers in the same cluster. |
| data.region.default.initial.size | 104857600 | Initial size of the in-memory data region for the caches assigned to the default data region, in bytes. |
| data.region.default.max.size | 2147483648 | Maximal size of the in-memory data region for the caches assigned to the default data region, in bytes. |
| data.region.evicted.initial.size | 10485760 | Initial size of the in-memory data region for the caches assigned to the data region supporting eviction, in bytes. |
| data.region.evicted.max.size | 268435456 | Maximal size of the in-memory data region for the caches assigned to the data region supporting eviction, in bytes. |
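For orientation, a cacheServer.properties fragment for one member of a two-node cluster might look as follows; the directory, host names, and passwords are illustrative values, not defaults:

```
# Working directory of this Cache Server instance (example path)
home.dir=C:/DXA/CacheServer

# Address and ports under which this server is reachable by the other nodes
server.address=cache1.example.com
discovery.port=31118
communication.port=31119

# Data center this server belongs to
data.center=dc1

# Baseline topology - keep identical on all Cache Servers of the cluster
cache.servers=cache1.example.com:31118,cache2.example.com:31118

# Optional TLS material (PKCS12); if keystore.file is left empty, communication is non-secure
keystore.file=keystore.p12
keystore.password=changeit
truststore.file=truststore.p12
truststore.password=changeit
```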
DXA Server Configuration
The DXA Server configuration parameters are described in detail in the Administration Guide. In this section, we provide additional information corresponding to the configuration of the Cache Server.
Cluster
The Cluster configuration object configures the whole cluster of DXA Servers with the same parameters. The cache-related parameters occur in the Cache section of the Cluster configuration object. The following ones have a specific relationship to the configuration of the Cache Server:
- Use SSL, Keystore Identifier and Truststore Identifier have to be configured to correspond to the cryptographic material configured at the Cache Servers.
- The Cache Servers parameter has to contain a map of Cache Server consistent identifiers ({server.address}:{discovery.port}) and the corresponding data center to which each Cache Server belongs ({data.center}). This provides the DXA Servers with the full topology of the cluster of Cache Servers and, together with other configuration, enables the routing mechanism to be computed properly (see the example after this list).
- Cross data center backup parameter: if true, the record distribution algorithm guarantees that at least one backup copy (if configured) is assigned to a Cache Server residing in a different data center than the primary record (the primary record is stored in the Cache Server that is primary to the DXA Server creating the record). Use of this feature has to be decided with several aspects in mind: performance of the communication between the data centers, high availability in the "data center down" scenario, etc.
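For illustration only (the host names and data-center identifiers below are assumptions, not defaults), the Cache Servers map of a two-data-center cluster could look like this:

```
cache1.example.com:31118  ->  dc1
cache2.example.com:31118  ->  dc1
cache3.example.com:31118  ->  dc2
cache4.example.com:31118  ->  dc2
```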
Server
The Server configuration object configures parameters specific to a given server. Regarding the Cache Server, there is a single important configuration parameter:
- Primary Servers is a list of Cache Servers ({server.address}:{discovery.port}) at which this DXA Server can store the cache records it creates (primary records), as illustrated below. For more information, see the Routing section.
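Reusing the illustrative host names from the example above, a DXA Server collocated with the first Cache Server would typically list just that server; a second entry can be added if the DXA Server should also write directly to another Cache Server (the one-to-many relationship described earlier):

```
Primary Servers: cache1.example.com:31118
```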
Cache Servers Cluster Lifecycle
The Cache Servers Cluster can be exposed to the following main actions:
- Cache Server addition
- Cache Server removal
- Temporary downtime of a Cache Server
- Split of the cluster
To understand how Cache Servers behave upon these actions, it is important to first discuss the following terms:
- Partitioning — Each cache is divided into multiple partitions (1024). Each cache record is assigned to one partition. In the caches that are not fully replicated but partitioned, each Cache Server contains only a subset of all partitions (its primary and backup partitions) and, hence, only a subset of all records. This decreases the cache size and the necessary communication. Thanks to a clever routing mechanism, it also brings no increase in communication when a DXA Server requests given records.
- Rebalancing — When the number of Cache Servers changes, some partitions lose their primary server or one of their backups. At this moment, the rebalancing mechanism takes place: the assignment of partitions to servers is changed so that each partition again has a primary server and the preconfigured number of backups. Depending on the content of the caches, rebalancing can be very costly in both communication and computation.
- Baseline — To face the expensiveness of the rebalancing process, the cluster of Cache Servers has both a current topology (all the servers currently active and interconnected in the cluster) and a permanent topology, the baseline. The current topology is computed automatically using the heartbeat mechanism (expecting responses from all on-line servers), and a change in it does not trigger the rebalancing mechanism. The baseline is configured manually, is equal to the cache.servers parameter, and a change in it does trigger the rebalancing mechanism (see the example after this list). For this reason, it is very important to keep the cache.servers parameter in sync across all the Cache Server instances.
- Cluster Activation — To avoid inconsistent behavior, no operation can be performed on a cluster until it is activated. The cluster is activated when all the Cache Servers listed in the baseline are active.
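To make the distinction between the two topologies concrete, consider this illustrative configuration (example host names):

```
# Baseline (identical in every cacheServer.properties of the cluster)
cache.servers=cache1.example.com:31118,cache2.example.com:31118,cache3.example.com:31118

# If cache3 is temporarily down:
#   current topology = cache1, cache2          (heartbeat-derived; no rebalancing is triggered)
#   baseline         = cache1, cache2, cache3  (unchanged, so the partition assignment is kept)
```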
Knowing these terms, we can describe what happens in the aforementioned situations:
- Cache Server addition
  - Permanent Cache Server addition requires running through the steps listed in the Installation section. The Cache Server becomes part of the baseline after it is started AND a cache.servers property containing the new Cache Server is read on one of the Cache Servers. This can happen during the start of the new server or during the configuration reload action of any other server. It must be pointed out that if a cache.servers parameter not containing the new server is loaded afterwards (e.g., by running the configuration reload action on another server where the new server is not part of the cache.servers parameter), the new Cache Server can be immediately removed from the cluster. For this reason, we stress the necessity to synchronize the configuration of all the Cache Servers.
- Cache Server removal
  - Permanent removal of a Cache Server is executed by removing it from the cache.servers parameter AND loading this configuration (again, this can happen upon a (re)start of any Cache Server or upon the configuration reload action). If the server is still running after this event, it remains part of the current topology but not of the baseline and, hence, does not contain any records. After a restart of this server, it does not reconnect to the cluster.
- Temporary downtime of a Cache Server
  - The temporary downtime of a Cache Server does not lead to rebalancing of the records in the caches. This implies one of two states:
    - At least one of the other servers holds backup records for the primary records of the down server: the whole cluster operates without changes, and any changes to the records are done on the backup records.
    - Some primary records have no backups within the running servers: requesting these records by the DXA Servers causes an exception in the processing of the whole request. Depending on the type of request, the DXA Server can either still process it (with some limitations) or produce an internal server error.
  - After the reconnection, the originally down Cache Server takes back the handling of its primary records. In case any changes have been done to the records via their backups, the cluster tries to pronounce the most recently updated record as the valid one. However, this can be slightly non-deterministic in case the cluster has been split.
- Split of the cluster (disconnection of server(s))
  - The split of the cluster occurs when there are communication issues between the Cache Servers of the cluster. The cluster may even split into more than two parts.
  - All the sub-clusters keep running and are able to execute requests (with the limitations mentioned under the "Temporary downtime of a Cache Server" bullet) until re-connection occurs. This, however, means that the cache holding the internal state (e.g., authenticated sessions) becomes different in each part of the disconnected cluster. Due to the sticky-session mechanism, where the authenticated session (in the form of a cookie) always determines the same server to be used for all its requests, the disconnected state works and all servers keep running. At cluster reconnection time, the integrity must be re-established. This is achieved by preserving the sub-cluster containing the server whose cache has been running the longest (as it has the potential to contain the majority of the valid states). The other sub-clusters are restarted, which effectively means (among other things) that part of the records created during the disconnection time is lost.
Patching Process
Splitting the original DXA Server into two specialized components enables a simpler patch application process, or more generally any process that requires a temporary downtime of one or both of the components.
Firstly, it is expected that the DXA Server needs patching more often, as it encompasses much more functionality and more third-party components. During patching of the DXA Server, the Cache Server can keep running and no data (not even temporarily cached data) are lost. This is a major improvement over the situation where patching of the original DXA Server led to emptying the caches (SSO sessions, etc.).
Routing
As depicted in Figure 1, the DXA Servers are by default not connected to all the Cache Servers, but only to a subset of them. This is configured on the side of the DXA Server by the Primary Servers configuration parameter. This configuration determines at which Cache Servers the DXA Server stores the records it creates (e.g., cached/persisted OAuth tokens, SSO session data, etc.), while backups of these records can also be stored on other servers.
This feature has been introduced to reflect the inequality of communication connections between components in a production environment. For example, while a DXA Server can be deployed on the same machine as its collocated Cache Server, another Cache Server can be deployed in a completely different data center. The time cost of storing a record on these two different Cache Servers can differ substantially; hence, the Primary Servers configuration is a good candidate to be exercised in this scenario.
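A hedged sketch of such a deployment, again with illustrative host names: each DXA Server stores its primary records only on its collocated Cache Server, while backups may be placed on other Cache Servers (possibly in the other data center if Cross data center backup is enabled).

```
# Data center dc1
DXA Server 1 (same machine as cache1)  ->  Primary Servers: cache1.example.com:31118

# Data center dc2
DXA Server 2 (same machine as cache3)  ->  Primary Servers: cache3.example.com:31118
```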
Monitoring
The Apache Ignite component provides extensive monitoring features. Both the DXA Server and the DXA Cache Server components enable the use of these features. For more information, see the Monitoring section of the Apache Ignite documentation.