Managing DirX Audit Server Error Handling

This chapter provides information on error handling inside DirX Audit Server, including:

How DirX Audit Server error handling works
Error message structure
Configuring DirX Audit Server error handling
Recoverable error handling
Non-recoverable error handling
Specific input validation and processing in File and JMS collectors

Datetime pattern values in folder and file names in this section use the standard notation of the Java datetime formatter pattern.

About DirX Audit Server Error Handling

A set of collector routes is created and run for every tenant. Starting with version 7.2, a separate instance of DirX Audit Server is also used for every tenant. Each collector route consists of the following parts, which are connected by the routing mechanisms:

An input endpoint (collector) that collects incoming records from an external system.
An optional transformer that transforms the external record format into a DirX Audit message if needed.
A Persistence unit that creates the final audit message, extends it with a digest and dimensions, and stores it in the database.

An error can occur at any of these steps while an incoming record is processed. If that happens, DirX Audit Server ensures that the record is not lost: it is either processed correctly or stored in error storage, together with additional information, for later automatic or manual processing.

An incoming record can contain one or more target messages (for example, when the File or the LDAP collector is used). If an error occurs while such a record is processed, the splitter component splits it into single messages, which are then processed one by one so that only the messages that cause errors are stored in error storage.

A record is stored in error storage if an error occurs that DirX Audit Server cannot resolve.

The error storage component stores errors into files. For every error, it stores both the original record and the information about the error (type, exception and related properties).

The record(s) are kept in the source system and should be picked up again later by the collector (on the next poll) in the following cases:

If the error occurs during initial reading in the collector (before passing to the transformer or the Persistence unit)
If the error is not handled correctly by the Persistence unit or the error storage component

If the error occurs later in the transformer or Persistence unit (but is handled by them or the error storage component), the incoming record is removed from the source system and is either correctly reprocessed by the DirX Audit Server or stored into the error storage.
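
The rules above for when a record stays in the source system can be condensed into a small decision function. This is only an illustrative Python sketch: the Stage enumeration, the function name and the handled flag are hypothetical and are not part of DirX Audit Server.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical names for the steps of a collector route."""
    COLLECTOR = "collector"        # initial reading of the record
    TRANSFORMER = "transformer"
    PERSISTENCE = "persistence"


def stays_in_source(stage: Stage, handled: bool) -> bool:
    """Return True if the record remains in the source system for the next poll.

    `handled` means the error was dealt with by the transformer, the
    Persistence unit or the error storage component.
    """
    if stage is Stage.COLLECTOR:
        # Errors during initial reading always leave the record in place.
        return True
    # Later errors remove the record only if something handled them.
    return not handled
```

For example, stays_in_source(Stage.PERSISTENCE, handled=True) is False: the record is removed from the source system because it was either reprocessed or written to error storage.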

Two main types of error can occur during processing:

A recoverable error – an error occurring in the Persistence unit that is caused by a temporary problem with the database (for example, a network connectivity problem). These errors are expected to be resolved within a short time. DirX Audit Server tries to reprocess the affected messages twice with a short delay. If these attempts fail, the message is stored in a special folder and is reprocessed automatically later. Error files are stored in the type folder 100-recoverable.
A non-recoverable error – an error caused by the record content (for example, an invalid format or missing required fields). These records cannot be processed and are stored as files in error storage. These errors must be analyzed and processed manually. Unless the error belongs to one of the sub-types below, error files are stored in the type folder 200-nonrecoverable.

There are two sub-types that are stored in separate folders:

Duplicate – The incoming record was already stored in the database. These errors can be ignored. An increasing number of stored duplicates can indicate other problems in the installed systems (e.g. frequent network interruptions) and should be analyzed by the administrator. Error files are stored in type folder: 300-duplicates.
XML processing error – The error occurred when parsing or transforming the incoming record. These errors should be analyzed, and a correction should be applied on the source or in the transformation. Error files are stored in the type folder: 250-nonrecoverable-xml.
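
The mapping from error type to type folder can be summarized in a short Python sketch. The folder names are taken from the descriptions above; the function name and its boolean flags are hypothetical and only illustrate the classification, not how DirX Audit Server decides internally.

```python
def type_folder(temporary_db_problem: bool = False,
                duplicate: bool = False,
                xml_error: bool = False) -> str:
    """Return the type folder for an error (illustrative classification only)."""
    if temporary_db_problem:
        # Recoverable: retried automatically later.
        return "100-recoverable"
    if duplicate:
        # Sub-type: the record is already stored in the database.
        return "300-duplicates"
    if xml_error:
        # Sub-type: parsing or transformation failed.
        return "250-nonrecoverable-xml"
    # Any other content-related error.
    return "200-nonrecoverable"
```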

Error Message Structure

The base folder of the error storage is configurable in the Tenant Configuration Wizard. All subfolders (one for every error type and sub-type) are created automatically within this folder when first needed.
The following type subfolders can be created:

100-recoverable

200-nonrecoverable

250-nonrecoverable-xml

300-duplicates

When an error is stored, additional subfolders based on the current date are created within the corresponding type subfolder. Two levels are created: the first is based on the current year and month (in the format yyyy_MM) and the second on the current day of the month (in the format dd). For example, 2020_06/02 for June 2nd. For each error, two files are created in the final folder: a zipped content file containing the original record (file name format yyyyMMdd_HHmmssSSS-source_components_info.content.zip) and an info file containing additional error information (file name format yyyyMMdd_HHmmssSSS-source_components_info.info).

Here is an example of the full paths of the two files (for a duplicate coming through the DirX Identity File collector):

300-duplicates/2020_06/02/20200602_094739922-dxi_file-pers.info

300-duplicates/2020_06/02/20200602_094739922-dxi_file-pers.content.zip
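
As a cross-check of the naming scheme, the following Python sketch rebuilds both example paths from a timestamp. It assumes the Java patterns map to the strftime codes noted in the comments; the function and parameter names are hypothetical.

```python
from datetime import datetime
from typing import Tuple


def error_file_paths(type_folder: str, when: datetime,
                     source_info: str) -> Tuple[str, str]:
    """Return (info file path, content file path) relative to the base folder."""
    month_dir = when.strftime("%Y_%m")                   # yyyy_MM
    day_dir = when.strftime("%d")                        # dd
    millis = when.microsecond // 1000                    # SSS
    stamp = when.strftime("%Y%m%d_%H%M%S") + f"{millis:03d}"  # yyyyMMdd_HHmmssSSS
    base = f"{type_folder}/{month_dir}/{day_dir}/{stamp}-{source_info}"
    return base + ".info", base + ".content.zip"


# The duplicate from the example above: 2 June 2020, 09:47:39.922
info_path, content_path = error_file_paths(
    "300-duplicates", datetime(2020, 6, 2, 9, 47, 39, 922000), "dxi_file-pers")
```

With these inputs, info_path and content_path equal the two example paths shown above.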

Configuring DirX Audit Server Error Handling

For information on how to configure DirX Audit error handling, see the section “Server Error Handling” in “Configuring DirX Audit” in the DirX Audit Installation Guide.

Error Handling for Recoverable Errors

If a recoverable error is stored in error storage, it is automatically processed again at least three times. If any of these attempts is successful (the stored record is persisted into the database), the stored error is removed along with the corresponding information file and the attempt information (if it was created before).

If the attempt is not successful (an error occurs), the stored error is kept in the folder and the information about the processing error is stored in an additional file (a file name with the suffix .redelivery.properties). Only one additional file is kept (an older one is overwritten if it exists).

There are two jobs scheduled by default for this processing with different schedules and scopes:

Daily recovery processor (named recovery_1) – scheduled to run every day at 7 p.m. with a scope of the last two days.
Weekly recovery processor (named recovery_2) – scheduled to run every Sunday at 5 a.m. with a scope of the last seven days.

Each job tries to redeliver all errors stored within the given scope. Every stored error is tried once during a single job execution.
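
The outcome of a single redelivery attempt, as described above, can be sketched in Python. This is a simplified illustration under stated assumptions: the persist callback stands in for storing the record in the database, and the attempt file is assumed to share the base name of the info file; only the suffix .redelivery.properties comes from the description above, while its content here is invented for the example.

```python
from pathlib import Path
from typing import Callable


def redeliver(info_file: Path, persist: Callable[[bytes], None]) -> bool:
    """Try to persist one stored error once; return True on success."""
    base = info_file.name[: -len(".info")]
    content_file = info_file.with_name(base + ".content.zip")
    # Assumed naming of the attempt file (hypothetical base name).
    attempt_file = info_file.with_name(base + ".redelivery.properties")
    try:
        persist(content_file.read_bytes())
    except Exception as exc:  # any failure keeps the stored error in place
        # Overwrite the previous attempt information, if any.
        attempt_file.write_text(f"error={exc}\n", encoding="utf-8")
        return False
    # Success: remove the stored error, its info file and attempt information.
    for path in (content_file, info_file, attempt_file):
        path.unlink(missing_ok=True)
    return True
```

A recovery job would call such a function once per stored error found within its scope.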

The job configuration can be modified manually by editing configuration files. You can modify either the default file (then all tenants use the changed configuration) or the tenant-specific configuration file (then only this tenant uses the changed configuration). Tenant-specific values override the values from the default file.

The location of the default file is:
install_path/conf/defaults/tenant/configuration.cfg

The location of the tenant-specific configuration file is:
install_path/conf/tenants/tenant_id/configuration.cfg

The two jobs are simply numbered 1 and 2; they differ only in schedule and scope and are otherwise functionally identical. The relevant sections are:

[server.apps.error_handling.recovery_1]
enabled = true
cron_expression = 0+0+19+?+*+*
scope_days = 2

[server.apps.error_handling.recovery_2]
enabled = true
cron_expression = 0+0+5+?+*+SUN
scope_days = 7

The following options can be set:

enabled – Boolean value – whether (true) or not (false) the job will run.
cron_expression – String value – a CRON-like expression for the job schedule.
scope_days – Integer value – the number of past days to check for stored errors. For example, a value of 2 means today and two days before (yesterday and the day before yesterday).
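
To make the meaning of scope_days concrete, the following Python sketch lists the dates and the corresponding date subfolders that a job would cover. The function names are hypothetical; the folder layout (yyyy_MM/dd) follows the error storage structure described earlier.

```python
from datetime import date, timedelta
from typing import List


def dates_in_scope(today: date, scope_days: int) -> List[date]:
    """Today plus the `scope_days` preceding days, newest first."""
    return [today - timedelta(days=n) for n in range(scope_days + 1)]


def folders_in_scope(today: date, scope_days: int) -> List[str]:
    """Date subfolders (yyyy_MM/dd) that fall within the scope."""
    return [d.strftime("%Y_%m/%d") for d in dates_in_scope(today, scope_days)]


# scope_days = 2 on 2 June 2020 covers three days
folders = folders_in_scope(date(2020, 6, 2), 2)
```

Here folders is ["2020_06/02", "2020_06/01", "2020_05/31"], that is, today and the two days before.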

To apply the changed values, reconfigure the server jobs for the given tenant using the Tenant Configuration Wizard or restart the DirX Audit Server service for that tenant.
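
The way tenant-specific values take precedence over the defaults can be illustrated with Python's configparser module. This is only a sketch: it assumes the files can be read as INI-style text and that overriding works per option, and the tenant value scope_days = 3 is invented for the example. It does not show how DirX Audit Server itself loads its configuration.

```python
import configparser

DEFAULT_CFG = """
[server.apps.error_handling.recovery_1]
enabled = true
cron_expression = 0+0+19+?+*+*
scope_days = 2
"""

# Hypothetical tenant-specific file that changes only the scope.
TENANT_CFG = """
[server.apps.error_handling.recovery_1]
scope_days = 3
"""


def effective_job_config(default_text: str, tenant_text: str, section: str) -> dict:
    """Merge default and tenant settings; values read later win."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(default_text)
    parser.read_string(tenant_text)   # overrides matching options only
    job = parser[section]
    return {
        "enabled": job.getboolean("enabled"),
        "cron_expression": job["cron_expression"],
        "scope_days": job.getint("scope_days"),
    }


job_config = effective_job_config(
    DEFAULT_CFG, TENANT_CFG, "server.apps.error_handling.recovery_1")
```

In this example the job keeps the default schedule and enabled flag, while scope_days takes the tenant-specific value 3.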

The administrator should check all stored errors that remain in the recoverable errors folder for longer than the largest scope configured for the two jobs (by default, eight days: today plus the seven preceding days). Such stored errors are no longer processed automatically and should be analyzed and processed manually.

Error Handling for Non-recoverable Errors

Stored non-recoverable errors are never processed automatically. The administrator should analyze them manually.

Duplicates do not need to be processed since they are already in the database. If the number of stored duplicates keeps growing, the entire system should be checked since it indicates that there are other problems (for example, frequent network disconnections).

Stored XML errors should always be checked because these records are missing in the database. Typical causes are a wrong format (duplicated XML header, invalid XML syntax, collector type mismatch), invalid content (missing required elements or attributes) or an error in the transformation template (if the source format is different; for example, DirX Identity records). Once the problem is fixed, the stored files should be reprocessed manually, for example, using the corresponding File collector.

Specific Input Validation and Processing in File and JMS Collectors

File and JMS collectors perform additional validation before or during processing of input data. They have different ways of performing input validation and specific processing of the input.

File collectors do not try to process files that do not pass the basic validation checks. Error handling is not used for such input files, and the files stay in the input folder.

If the file passes all validation checks, it is processed further. The file content is not processed all at once. It is cut into groups of a given number of records (by default 10). These groups are then processed separately in sequence. Any error that happens during this cutting process causes the whole file to be processed by error handling and stored in the error storage. Some of the file content may already have been stored successfully in the database before this error occurred.
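
The following Python sketch illustrates why part of a file can already be in the database when the whole file ends up in error storage. It is a simplified model, not the actual implementation: the function names are hypothetical, and persist_group stands in for storing one group of records.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def in_groups(records: Iterable, size: int = 10) -> Iterator[List]:
    """Cut a stream of records into consecutive groups of at most `size`."""
    it = iter(records)
    while True:
        group = list(islice(it, size))
        if not group:
            return
        yield group


def process_file(records: Iterable,
                 persist_group: Callable[[List], None],
                 size: int = 10) -> Tuple[int, bool]:
    """Return (records already stored, whether the file goes to error handling)."""
    stored = 0
    try:
        for group in in_groups(records, size):
            persist_group(group)
            stored += len(group)
    except Exception:
        # An error while cutting: the whole file is handed to error handling,
        # although earlier groups have already been stored.
        return stored, True
    return stored, False
```

If the 21st record of a file cannot be read during cutting, the first two groups (20 records) have already been stored when the file is passed to error handling.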

JMS collectors perform the basic validation check after the message is read. If the content is not valid, the message is passed to error handling and stored in the error storage.

The following sections provide more details on these checks.

File Collector Input Validation

The File collector performs the following validation checks before reading and later when processing a file from the input folder:

File name and file size reprocessing check – The File collector stores the name and size of every file that has been processed. If you later put a file with exactly the same name and size into the input folder, it is not processed and stays in the input folder. No log message is written to the log file unless a specific logger is set to DEBUG level. The list of processed files is kept in memory (by default, the last 1000 processed files). The list is cleared when DirX Audit Server is restarted or the collector route is redeployed.
XML format check – The file content is checked for basic conformance to the XML format expected for the given source product, for example DirX Identity. The file is not processed if the content is not valid XML or does not contain the expected root element and first sub-element. No log message is written to the log file unless a specific logger is set to DEBUG level. A file that does not pass this check stays in the input folder.
File name mask check – The file name is checked against the file name mask configured in the Tenant Configuration Wizard for the given File collector. Files that do not match the mask are not processed. No log entry is created in this case.
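
The file name mask check and the name and size reprocessing check can be modeled in a few lines of Python. This is an illustrative sketch only: the class and function names are hypothetical, the mask is assumed to use glob-style wildcards, and the oldest entry is assumed to be dropped when the list is full.

```python
from collections import OrderedDict
from fnmatch import fnmatchcase


class ProcessedFiles:
    """In-memory list of the most recently processed (name, size) pairs."""

    def __init__(self, capacity: int = 1000):
        self._seen = OrderedDict()
        self._capacity = capacity

    def already_processed(self, name: str, size: int) -> bool:
        return (name, size) in self._seen

    def remember(self, name: str, size: int) -> None:
        key = (name, size)
        self._seen[key] = True
        self._seen.move_to_end(key)
        while len(self._seen) > self._capacity:
            self._seen.popitem(last=False)   # drop the oldest entry


def should_process(name: str, size: int, mask: str,
                   processed: ProcessedFiles) -> bool:
    """A file is picked up only if it matches the mask and is not a repeat."""
    return fnmatchcase(name, mask) and not processed.already_processed(name, size)
```

Because the list lives only in memory, creating a new ProcessedFiles object corresponds to a server restart or route redeployment: a file with the same name and size would then be processed again.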

JMS Collector Input Validation

The JMS collector performs the following validation checks after reading a message from the input queue:

The message content is checked for basic conformance to the XML format expected for the given source product, for example DirX Identity. If the message content is not valid XML or does not contain the expected root element and first sub-element, the message is processed by error handling and stored in the error storage. The message is always removed from the input queue.
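
A basic conformance check of this kind can be sketched with Python's standard XML parser. The function name and the element names used in the example (events, event) are hypothetical; the real expected elements depend on the source product. Namespaced documents would need tags in the {namespace}name form.

```python
import xml.etree.ElementTree as ET


def conforms(content: str, expected_root: str, expected_first_child: str) -> bool:
    """Check well-formedness, the root element and the first sub-element."""
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        return False            # not valid XML (for example, a duplicated header)
    if root.tag != expected_root:
        return False
    first = next(iter(root), None)
    return first is not None and first.tag == expected_first_child


def jms_destination(content: str) -> str:
    """Where a JMS message goes after the check (illustrative labels only)."""
    return "processing" if conforms(content, "events", "event") else "error storage"
```

In both cases the message leaves the input queue; only its destination differs.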