Managing Fact and Dimension Tables
This chapter describes how to create and maintain the OLAP tables with facts and dimensions, which are the sources for all the Dashboard components and also some reports in DirX Audit Manager. DirX Audit provides two mechanisms for this purpose:
-
A fact population job that the DirX Audit Server runs periodically
-
A fact population tool that can be run on demand
The next sections describe these elements and also describe the fact population configuration files on which these elements rely.
Managing the Scheduled Fact Population Job
The DirX Audit Server can execute scheduled jobs, including a job for the regular population of OLAP cubes. This job obtains the fact and dimension tables from configuration files, creates tables if necessary and then updates them.
The Fact population job is part of DirX Audit Server. The job typically runs once per day and updates the facts for the previous several days.
In order to save time, the Fact population job typically should not calculate the facts for the entire database. Therefore, it supports configuration options that tell it how far into the past it should go. Based on this configuration, the Fact population job searches for audit messages and history entries created in the last several days, updates the dimension tables for newly created dimension types and values, calculates the facts and dimensions and overrides the appropriate rows of all fact tables.
Defining the Schedule
If you want to change the fact population schedule, you only need to start the Tenant Configuration Wizard for the respective tenant and choose the Scheduled Jobs Configuration. Finishing the Tenant Configuration Wizard modifies the respective tenant route XML definition file. If the DirX Audit Server for the tenant is running, the job is rescheduled immediately. For more details, see "Scheduled Jobs Configuration" in the DirX Audit Installation Guide.
It is also possible to modify the route manually in the file located at install_path/server_container/tenants/tenantID/deploy/routes/route-dxt-scheduler-factpopulation.xml.
Note that when manual modifications are made to this file, it must be copied to some other directory outside of the tenant route deployment folder, deleted from the tenant route deployment folder, and finally copied back into the tenant route deployment folder. If the file is not deleted from the folder and then copied back in, the modifications will not be reflected in the job scheduling.
The tenant route XML file contains several processing directives, but the most important parts are the route elements: route-dxt-scheduler-factpopulation,
factpopulation-script-based-datadb, and factpopulation-script-based-historydb.
To create a scheduled job, you need to specify the URI of a component that is able to process it. DirX Audit Server contains such a component - an Apache Camel endpoint - that accepts a new schedule at dxt-scheduler://groupName/jobName?jobParameters.
In this way, you can create a new scheduled job.
Time Factors
It is important to consider the effects of daylight savings time on job’s operation. For example, in the Central European time zone, you go from 1:59:59.999 to suddenly being 3:00:00.000 for spring forward and from 2:59:59.999 to being 2:00:00.000 for autumn back. If you schedule your job between 2:00:00.000 a.m. and 2:59:59.999 a.m. inclusive, it will not run on the daylight-saving date in spring and run twice on the daylight-saving date in autumn. Therefor it is highly recommended to schedule your job out of this range. An example below schedules the job to run daily at 3:01:00 a.m.
Defining a Trigger
To start a job in DirX Audit at certain time, the Quartz Scheduler, which is a Java variant of the Linux cron tool, is used. The cron expression string defines the regular schedule in terms of seconds, minutes, hours, day-of-month, month, day-of-week and year (optional). The format should be well-known from the UNIX tool. For a detailed tutorial on cron expressions, see the following Quartz tutorial: http://www.quartz-scheduler.org/documentation/2.3.1-SNAPSHOT/tutorials/tutorial-lesson-06.html.
For example, to run fact population daily at 4:01 AM, the following expression is sufficient:
0 1 4 * * ?
However, because the cron parameter is a part of the URI and it does not allow spaces, it is necessary to rewrite the cron expression like this:
cron=0+1+4+*+*+?&
Note the "& amp;" part of trigger. It is a URI parameter separator "&" encoded in UTF-8 and is necessary if there are other parameters that follow the cron parameter in the URI.
Defining a Job
A typical job definition for scheduled XML-based fact population uses the "relative date" job class:
... <route id="route-dxt-scheduler-factpopulation"> <from uri="dxt-scheduler://DXT-6bc24196-327d-441c-8d66-3633ee6b887b-jobs/PopulateFactTablesRelDatesJob?cron=0+0+5+?+*+*&jobClass=com.dirxcloud.audit.common.jobs.quartzjobs.PopulateFactTablesRelDatesJob&jobdatamap.tenant_id=6bc24196-327d-441c-8d66-3633ee6b887b&jobdatamap.arg_relative_date_from=TD-5&jobdatamap.arg_relative_date_to=TD-1" /> <log loggingLevel="INFO" logName="com.dirxcloud.audit.server.routes.scheduler.factpopulation.Route" message="Job 6bc24196-327d-441c-8d66-3633ee6b887b - PopulateFactTablesRelDatesJob (factpopulation-xml-based) has finished." /> <to uri="direct:factpopulation-script-based-datadb"/> </route> ...
Here is an example of what the script-based fact population job definition looks like:
... <route id="factpopulation-script-based-datadb"> <from uri="direct:factpopulation-script-based-datadb"/> <log loggingLevel="INFO" logName="com.dirxcloud.audit.server.routes.scheduler.factpopulation.Route" message="Job 6bc24196-327d-441c-8d66-3633ee6b887b - PopulateFactTablesRelDatesJob (factpopulation-script-based-datadb) is starting." /> <to uri="dxt-sqlrunner:DXT_DATA?tenantId=6bc24196-327d-441c-8d66-3633ee6b887b&useFreemarker=true&replaceRelDates=true&param_VALIDFROM=%REL_DATE_FROM(TD-5)%&param_VALIDTO=%REL_DATE_FROM(TD)%&folder=common/factpopulation&filesListFile=factpopulation_scripts_datadb_list.txt" /> <log loggingLevel="INFO" logName="com.dirxcloud.audit.server.routes.scheduler.factpopulation.Route" message="Job 6bc24196-327d-441c-8d66-3633ee6b887b - PopulateFactTablesRelDatesJob (factpopulation-script-based-datadb) has finished." /> <to uri="direct:factpopulation-script-based-historydb"/> </route> ...
As already mentioned, the main part is the route element, more specifically, the from element and its attribute URI. There are two main parameters that are mandatory for all scheduled jobs. The cron parameter specified according to the rules described here and the jobClass parameter that represents the Java class handling job. Make sure that the jobClass is correct. It needs to be an existing class of the DirX Audit Server’s Fact Population binding component. The following parameters are also important:
-
jobdatamap.tenant_id - defines the tenant identifier.
-
jobdatamap.arg_relative_date_from - defines the start date for the XML-based fact population and must conform to the relative date expression described in "Using Relative Date Expression".
-
jobdatamap.arg_relative_date_to - defines the end date for the XML-based fact population and must conform to the relative date expression described in "Using Relative Date Expression".
-
param_VALIDFROM - defines the start date for the script-based fact population.
Additional parameters include:
-
jobdatamap.arg_timezone_id
-
jobdatamap.arg_locale_country
-
jobdatamap.arg_locale_language
The additional parameters are all optional and define the time zone and the locale. They are used for setting dates. By default, the Java Virtual Machine default settings are used.
We recommend performing changes - either manually or by running the Tenant Configuration Wizard - with a running instance of the DirX Audit Server service for a specific tenant. If you want to change the data time scope specified by time parameters, you must modify the parameters manually. After you are finished with the (re)configuration of a job, you need to save the tenant route XML file in a different directory, delete it from the tenant route deployment directory, and then copy the saved file back to the tenant deployment directory. Modifying the file and saving the changes is not sufficient to load the changes; the file itself must be deleted from the route deployment folder and then copied back in for the changes to be triggered on the server.
Using Relative Date Expression
The following string defines the regular expression for the relative date:
^(TD|TW|TM|TY)(-[1-9][0-9]*)?$
The enumeration TD, TW, TM and TY stand for:
-
TD: this day / today - the current day.
-
TW: this week.
-
TM: this month.
-
TY: this year.
The optional suffixes -n specify a shift into the past relative to the T* expression.
Examples:
TD: For from-date: beginning of the current day.
TD: For to-date: end of the current day.
TD-1: For from-date: beginning of yesterday.
TD-1: For to-date: end of yesterday.
TW-1: For from-date: beginning of the last week. Depending on the locale this might be Sunday or Monday.
TW-1: For to date: end of the last week (Saturday or Sunday).
Running the Fact Population Tool
There are cases where scheduled fact population is not enough; for example:
-
When you have exported old audit messages to files, re-imported them and want to have facts for them.
DirX Audit provides the db_fact_population command line tool to address these use cases. The tool accepts a start and an end date and calculates the facts for this time range. See the section "Using the DirX Audit Tools" in the DirX Audit User Interface Guide for usage information on this tool.
Fact and dimension tables are created when the population runs for the first time. DirX Audit Manager can issue alerts about non-existing fact and dimension tables. To create the tables immediately after DirX Audit installation, run the fact population tool. For more details, see "Using the DirX Audit Tools" in the DirX Audit User Interface Guide.
Fact Population Configuration Files
Both the scheduled and on-demand fact population services use the following fact and dimension table configuration files, which are all contained in the folder install_path/conf/tenants/tenantID/fact-configuration when tenant-specific customization is defined or in the folder install_path/conf/fact-configuration where the default definition is available:
-
confFactTables.xml - fact tables configuration file.
-
confFacts.xml - facts configuration file.
-
confDimensions.xml - dimensions configuration file.
The Data DB and History DB (if licensed) must be set for the same DB server type.
See the section "Customizing Fact and Dimension Tables" in the DirX Audit Customization Guide for details on the structure of these files and how to add fact tables, facts or dimensions. The tenant-specific configuration is described there as well.
Note that both the scheduled and on-demand fact population services create the fact and dimension tables according to the configuration in these files when necessary. They also add fact and dimension columns to existing tables when missing, but they don’t delete existing columns from tables when they are removed from the configuration.