Performance
Introduction
The DirX Access Performance Guide helps administrators evaluate their deployment's performance. The guide first establishes a benchmarking framework and lists the testing tools used, and then describes the most notable scenarios together with the performance results to be expected, depending on the respective configuration and environmental parameters.
Testing Tools
The testing tools were chosen with an emphasis on ease of use, enabling replication of the tested scenarios in the customer's environment.
JMeter
Apache JMeter is an application designed to load test functional behavior and measure performance. JMeter can export and import test configurations; hence, the tests set up during the product development phase can be replicated with minor changes (hostnames, etc.) in the target deployment. For download and more information, see JMeter.
Clumsy
Clumsy is a network emulation tool. It can drop, lag, or tamper with network packets and thereby simulate product performance under specific network conditions. Clumsy's simple filtering syntax makes it possible to alter the communication on each of the product's communication channels separately, generating multiple different testing scenarios. For download and more information, see Clumsy.
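For illustration, a Clumsy filter targeting only the LDAP traffic between servers could use WinDivert filter syntax such as the following (the port number 389 is an assumption for plain LDAP; adjust it to your deployment's actual communication channels):

```
tcp.DstPort == 389 or tcp.SrcPort == 389
```

Combining such a filter with Clumsy's drop or lag functions restricts the simulated failure to a single communication channel while leaving the rest of the deployment's traffic untouched.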
Types of Tests
This section describes the types of tests that may be used in the actual tested scenarios.
Load Test
The purpose of this test is to estimate the upper limits of the system under test. The measured parameters are throughput and average response time. In addition, the tester is permitted to perform a warm-up run of the test scenario in order to bring the servers to a steady state. The test shall run for a sufficient period (typically on the order of hours) to ensure the generated results are sustainable.
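As an illustrative sketch (not part of the guide's tooling), the two measured parameters can be computed from a JMeter result (JTL) file saved as CSV; `timeStamp` (epoch milliseconds) and `elapsed` (milliseconds) are JMeter's default column names:

```python
import csv
from io import StringIO

def load_test_metrics(jtl_csv: str):
    """Compute throughput (requests/s) and average response time (ms)
    from a JMeter JTL result file exported as CSV."""
    rows = list(csv.DictReader(StringIO(jtl_csv)))
    if not rows:
        return 0.0, 0.0
    start = min(int(r["timeStamp"]) for r in rows)
    # A sample ends at its start time plus its elapsed time.
    end = max(int(r["timeStamp"]) + int(r["elapsed"]) for r in rows)
    duration_s = max((end - start) / 1000.0, 1e-9)
    throughput = len(rows) / duration_s
    avg_response_ms = sum(int(r["elapsed"]) for r in rows) / len(rows)
    return throughput, avg_response_ms

# Example with three synthetic samples spanning 2 seconds:
sample = "timeStamp,elapsed\n1000,100\n1500,200\n2500,500\n"
tps, avg = load_test_metrics(sample)
print(round(tps, 2), round(avg, 2))  # 3 requests over 2 s -> 1.5 266.67
```

In practice, the same computation would be run over the full JTL file of a multi-hour test, ideally excluding the warm-up interval.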
Linear Scale
The purpose of the linear scale test is to determine how the system utilization scales with throughput. Ideally, the system will show linear behavior throughout its performance range, with no drop-offs or flattening of the performance curve prior to saturation. In this test, virtual users (configured with non-zero think times) are ramped up slowly.
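The expected shape of the curve can be sketched with a simple closed-system model (the service time, think time, and capacity figures below are illustrative assumptions, not product data):

```python
def expected_throughput(users, service_s=0.05, think_s=1.0, capacity_rps=100.0):
    """Closed-system model: each virtual user completes one request per
    (think time + service time) interval until the server saturates,
    after which throughput is capped at the server's capacity."""
    offered_rps = users / (think_s + service_s)
    return min(offered_rps, capacity_rps)

# Ramp virtual users slowly and look for the knee where the curve flattens.
for users in range(20, 141, 20):
    print(users, round(expected_throughput(users), 1))
```

In an ideal linear-scale result, the measured curve tracks the offered load exactly up to saturation; an early flattening or drop-off indicates a bottleneck below the expected capacity.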
Unstable Environment Testing
To simulate system performance under unstable environment conditions, the tested scenario is performed while environment failures are enforced simultaneously (e.g., using the Clumsy tool). The failures are tested deterministically and represent the following states (the terms used correspond to Figure 1):
The event codes in the table below combine the Actions, Components, and Communication Channels defined in Figure 1.
Table of unstable environment events.
| Event # | Time | Event | Description |
|---|---|---|---|
| 0 | 0:00:00 | | Test start. |
| 1 | 0:10:00 | RSCT (DXA1) | Stop DXA1. |
| | 0:12:00 | | Start DXA1. |
| 2 | 0:17:00 | RSCT (DXA1, DXA2) | Stop DXA1 and DXA2. |
| | 0:19:00 | | Start DXA1 and DXA2. |
| 3 | 0:24:00 | RSCI (CSF) | Stop CSF. |
| | 0:24:30 | | Start CSF. |
| 4 | 0:29:30 | RSCI (CSL) | Stop CSL. |
| | 0:30:00 | | Start CSL. |
| 5 | 0:35:00 | RSCT (CSF) | Stop CSF. |
| | 0:37:00 | | Start CSF. |
| 6 | 0:42:00 | RSCT (CSL) | Stop CSL. |
| | 0:44:00 | | Start CSL. |
| 7 | 0:49:00 | RSCI (CSL, CSF) | Stop CSL and CSF. |
| | 0:49:30 | | Start CSL and CSF. |
| 8 | 0:54:30 | RSCT (CSL, CSF) | Stop CSL and CSF. |
| | 0:56:30 | | Start CSL and CSF. |
| 9 | 1:01:30 | RSCT (CSL, CSF, DXA1) | Stop CSL, CSF and DXA1. |
| | 1:03:30 | | Start CSL, CSF and DXA1. |
| 10 | 1:08:30 | RSCT (CSL, CSF, DXA1, DXA2) | Stop CSL, CSF, DXA1 and DXA2. |
| | 1:10:30 | | Start CSL, CSF, DXA1 and DXA2. |
| 11 | 1:15:30 | RCCI (B_CSL_CSF) | Block traffic from CSL to CSF and vice versa. |
| | 1:16:00 | | Allow traffic from CSL to CSF and vice versa. |
| 12 | 1:21:00 | RCCT (B_CSL_CSF) | Block traffic from CSL to CSF and vice versa. |
| | 1:23:00 | | Allow traffic from CSL to CSF and vice versa. |
| 13 | 1:28:00 | RCCT (U_CSF_CSL) | Block traffic from CSF to CSL. |
| | 1:30:00 | | Allow traffic from CSF to CSL. |
| 14 | 1:35:00 | RCCT (U_CSL_CSF) | Block traffic from CSL to CSF. |
| | 1:37:00 | | Allow traffic from CSL to CSF. |
| 15 | 1:42:00 | RCCT (B_DXA1_CSL) | Block traffic from DXA1 to CSL and vice versa. |
| | 1:44:00 | | Allow traffic from DXA1 to CSL and vice versa. |
| 16 | 1:49:00 | RCCT (B_DXA1_CSF) | Block traffic from DXA1 to CSF and vice versa. |
| | 1:51:00 | | Allow traffic from DXA1 to CSF and vice versa. |
| 17 | 1:56:00 | RCCT (B_DXA*_CS*) | Block traffic from DXA1 and DXA2 to CSF and CSL and vice versa. |
| | 1:58:00 | | Allow traffic from DXA1 and DXA2 to CSF and CSL and vice versa. |
| 18 | 2:03:00 | RCCT (B_DXA*_CSL, B_CSL_CSF) | Block traffic from DXA1 and DXA2 to CSL and vice versa, and from CSL to CSF and vice versa. |
| | 2:05:00 | | Allow traffic from DXA1 and DXA2 to CSL and vice versa, and from CSL to CSF and vice versa. |
| 19 | 2:10:00 | RCCT (B_DXA*_CSF, B_CSL_CSF) | Block traffic from DXA1 and DXA2 to CSF and vice versa, and from CSL to CSF and vice versa. |
| | 2:12:00 | | Allow traffic from DXA1 and DXA2 to CSF and vice versa, and from CSL to CSF and vice versa. |
In each of these scenarios, the measured parameters are throughput and average response time. Furthermore, these scenarios are ones in which the deployment can still operate. There are numerous other scenarios in which the system becomes unresponsive (e.g., an Application Repository malfunction); these are not part of this type of testing. Although some errors might be expected, they are justifiable given the testing approach, as the system as a whole remains responsive at any given time (e.g., errors for requests already sent to a server being shut down).
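The deterministic schedule above can also be generated programmatically; this sketch (the helper name and output format are our own, not part of the guide's tooling) reproduces the first two events:

```python
from datetime import timedelta

def outage(offset_min, duration_min, targets):
    """One deterministic failure event relative to the test start:
    stop the listed components at the given offset, then start them
    again after the outage duration."""
    names = ", ".join(targets)
    stop = timedelta(minutes=offset_min)
    start = stop + timedelta(minutes=duration_min)
    return (str(stop), f"Stop {names}."), (str(start), f"Start {names}.")

# Events 1 and 2 of the schedule above (2-minute outages):
for event in (outage(10, 2, ["DXA1"]), outage(17, 2, ["DXA1", "DXA2"])):
    for when, what in event:
        print(when, what)
```

Generating the schedule this way keeps the outage offsets and durations in one place, which makes it easier to replay the identical event sequence in a customer environment.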
General Deployment Diagrams
Figure 2 depicts a simplified dual server deployment for scenarios accessing resources at the DirX Access Servers, such as the Authentication Application or FEPs. The testing tool is directed at the Load Balancer, which in turn forwards each request to one of the two DirX Access Servers. The Load Balancer shall be configured so that it does not represent a bottleneck of the overall deployment. Furthermore, it has to enable a fast switch-over for the cases when one of the servers becomes unresponsive.
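As a minimal sketch of the required load-balancer behavior (not a description of any particular load-balancer product; all names are illustrative), round-robin distribution with a health check that skips an unresponsive server might look like:

```python
from itertools import cycle

class FailoverBalancer:
    """Round-robin over DirX Access Servers, skipping any server whose
    health check fails, so a dead node is bypassed on the next request."""
    def __init__(self, servers, is_healthy):
        self._ring = cycle(servers)
        self._n = len(servers)
        self._is_healthy = is_healthy

    def next_server(self):
        # Try each server at most once per request.
        for _ in range(self._n):
            server = next(self._ring)
            if self._is_healthy(server):
                return server
        raise RuntimeError("no healthy DirX Access Server available")

healthy = {"dxa1": True, "dxa2": True}
lb = FailoverBalancer(["dxa1", "dxa2"], lambda s: healthy[s])
print(lb.next_server())  # dxa1
healthy["dxa1"] = False   # simulate DXA1 becoming unresponsive
print(lb.next_server())  # dxa2 -- switch-over past the dead node
```

The switch-over speed of a real load balancer depends on how quickly its health check detects the failure; that detection interval directly shapes the error window seen in the unstable-environment tests.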
General Performance Influence Factors
In this chapter, we list the most important factors influencing overall system performance and discuss several of them with respect to the scenarios they might influence. Some of these parameters may be used in the actual testing scenarios to demonstrate how results differ across configurations.
Repository
- Physical placement of the repository (local vs. remote)
- Type of LDAP communications (plain vs. SSL/TLS protected)
- Allocation of User vs. Application Repository (co-located vs. separated)
- LDAP connection management setup
- Directory caching yes/no
- Size of User and Application Repositories
- General policy characteristics (number of Policy(Set) objects, width and depth of the policy tree, number of resource objects, complexity of matching/conditions)
- PDP configuration (e.g., presence/absence of custom finders)
- PEP configuration (e.g., RBAC vs. ABAC PDP, presence/absence of request injection or rich subject representation)
- AuthnMethod settings: user correlation yes/no, certificate path validation yes/no, CRL checking yes/no
- SubjectTemplate settings: refresh user account yes/no, update user account yes/no
- Cache settings: cache sizes, SSO cache mode replicated vs. partitioned
- Server settings: session correlation yes/no, update user account yes/no (Authn and Fed)
Server
- Number of Servers in a cluster
- Configuration
  - Cluster
    - Federation Service
      - Do Update User Account
      - Session Stored SAML Assertion Limit
    - Cluster Service
      - Cache: Use SSL/TLS
    - SSO Service
      - SSO Do Session Correlation
      - SSO Refresh Subject After Seconds
      - SSO Update Subject After Seconds
    - User Service
      - User Cache Enabled
      - LDAP Query Optimization
  - Server
    - Cluster Service
      - Session Cache initial/maximal size
      - Evicted Cache initial/maximal size
      - User Cache initial/maximal size
      - Cluster Group
  - Subject Template
    - Group Object Classes
    - Determine group membership
Client
- Cache settings for client: cache timeout secs (timeout = 0 means no caching, i.e., the Pep object is retrieved for each request; timeout > 0 means the Pep object is retrieved again after timeout seconds)
- Cluster update timeout
General
- Operating systems
- Size of hard disk and volatile memory
- JRE settings
  - The memory configuration parameters (set via wrapper.java.initmemory and wrapper.java.maxmemory) have the biggest influence on performance. While the initial memory size may remain relatively low, as it is increased whenever necessary, the maximum memory shall be determined cautiously. Experience from long-term projects has shown that for scenarios with a dual server deployment employing authentication and identity federation with 100-200k sessions, the maximum memory should be around 4 GB. While this memory is unused most of the time, it is necessary due to the doubling of demand during cluster reintegration.
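For example, the corresponding Java Service Wrapper settings for such a deployment might look like this (values are in MB; the 4096 value reflects the 4 GB recommendation above, while the initial size of 512 MB is an illustrative low value):

```
wrapper.java.initmemory=512
wrapper.java.maxmemory=4096
```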