Directory Data File Formats

Different directories use different data file formats to represent the entries and attributes within them. The meta controller needs to be able to recognize the different directory data file formats so that it can:

Interpret the source data generated from different connected directories
Process the source data into different target directory formats

In general, an individual directory’s data file format is either:

Tagged, where each attribute in an entry is tagged by a prefix and optionally by a suffix. Directories that use tagged file formats include X.500 DAP directories, LDAP directories, and Lotus Notes directories.
Untagged (or "position-driven"), where each attribute in an entry is identified based on its position in the entry. Microsoft Exchange is an example of a directory that uses an untagged data file format.

The meta controller supports both untagged and tagged data file formats, including the LDAP Data Interchange Format (LDIF) tagged file format. This chapter describes the characteristics of tagged and untagged data file formats. It also provides a general discussion of the LDIF file format.

Tagged Data File Format

In a tagged data file, each entry and its attributes are tagged by a prefix and a suffix. The presence of these tags allows the entry to contain an unordered list of attributes. A multiple-value separator can be defined to separate the attribute values of a multivalued attribute.

Each entry can be limited to a single line or split over several lines. If the entry is split over several lines, the description of the data file (that is, the relevant fields in the attribute configuration file) must contain global information that determines the end of an entry.

Here is an example of a tagged file in which:

The prefixes are "SN=", "GN=", and so on
The suffixes used include the semicolon (;) and the pipe character (|)
The record (entry) separator is the line feed (<LF>).

The fields in the attribute configuration file that define this data format are:

Record-Sep:'\012'
Abbr:SN  Name:Surname     Prefix:'SN='
         Suffix:';'       Rec-Sep:''
         MRule:-
Abbr:GN  Name:Given-Name  Prefix:'GN='
         Suffix:'|'       Rec-Sep:''
         MRule:-
Abbr:TN  Name:Telephone-Number  Prefix:'TN='
         Suffix:'|'             Rec-Sep: ''
         MRule:-
Abbr:FTN Name:Fax-Telephone-Number  Prefix:'FAX='
         Suffix:';'                 Rec-Sep: ''
         MRule:-
Abbr:UP  Name:User-Password     Prefix:'UP='
         Suffix:';'             Rec-Sep: ''
         MRule:-

The data file format looks like:

SN=Serling;GN=Rod|TN=44597|FAX=44598;UP=twyl8t;<LF>
SN=Stefano;FAX=44232;UP=xante;<LF>
SN=Fontana;<LF>
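
As a rough illustration of how a single-line tagged file like this one could be read, the following Python sketch looks for a known prefix at the current position and reads up to the matching suffix. The TAGS table and the function names are hypothetical; they only mirror the attribute configuration shown above and do not represent the meta controller's own parser.

```python
# Hypothetical in-memory mirror of the attribute configuration shown above:
# abbreviation -> (prefix, suffix)
TAGS = {
    "SN": ("SN=", ";"),
    "GN": ("GN=", "|"),
    "TN": ("TN=", "|"),
    "FTN": ("FAX=", ";"),
    "UP": ("UP=", ";"),
}


def parse_tagged_record(record, tags=TAGS):
    """Split one tagged record into {abbreviation: value}."""
    entry = {}
    pos = 0
    while pos < len(record):
        for abbr, (prefix, suffix) in tags.items():
            if record.startswith(prefix, pos):
                start = pos + len(prefix)
                end = record.find(suffix, start)
                if end == -1:
                    raise ValueError(f"missing suffix {suffix!r} for {abbr}")
                entry[abbr] = record[start:end]
                pos = end + len(suffix)
                break
        else:
            # No configured prefix matches at this position.
            raise ValueError(f"unrecognized tag at offset {pos}")
    return entry


def parse_tagged_file(text, record_sep="\n", tags=TAGS):
    """Split the file at the record separator and parse each non-empty record."""
    return [parse_tagged_record(r, tags) for r in text.split(record_sep) if r]


sample = (
    "SN=Serling;GN=Rod|TN=44597|FAX=44598;UP=twyl8t;\n"
    "SN=Stefano;FAX=44232;UP=xante;\n"
    "SN=Fontana;\n"
)
entries = parse_tagged_file(sample)
```

Because each attribute is located by its tag rather than its position, the second and third records can omit attributes without any placeholder.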

Here is an example of a tagged file in which:

The prefixes are "SN=", "GN=", and so on.
The suffixes used include ";<LF>" and "|<LF>"
The record (entry) separator is the double line feed (<LF><LF>).

The fields in the attribute configuration file that define this data format are:

Record-Sep:'\012\012'
Abbr:SN  Name:Surname        Prefix:'SN='
         Suffix:';\012'      Rec-Sep: ''
         MRule:-
Abbr:GN  Name:Given-Name     Prefix:'GN='
         Suffix:'|\012'      Rec-Sep: ''
         MRule:-
Abbr:TN  Name:Telephone-Number  Prefix:'TN='
         Suffix:'|\012'         Rec-Sep: ''
         MRule:-
Abbr:FTN Name:Fax-Telephone-Number  Prefix:'FAX='
         Suffix:';\012'             Rec-Sep: ''
         MRule:-
Abbr:UP  Name:User-Password   Prefix:'UP='
         Suffix:';\012'       Rec-Sep: ''
         MRule:-

The data file format looks like:

SN=Serling;<LF>
GN=Rod|<LF>
TN=44597|<LF>
FAX=44598;<LF>
UP=twyl8t;<LF>
<LF>
SN=Stefano;<LF>
FAX=44232;<LF>
UP=xante;<LF>
<LF>
SN=Fontana;<LF>
<LF>

Untagged Data File Format

In an untagged data file, an individual attribute is identified based on its position in an entry. Attributes are either separated by an attribute separator or assigned a defined field width. The attribute separator can differ for each attribute, but typically it is the same for all attributes in the file. A multiple-value separator can be defined to separate the attribute values of a multivalued attribute. Because the attributes are identified by their positions in the entry, the separator of an attribute with no value immediately follows the separator of the preceding attribute.

The meta controller supports the following untagged file formats:

Comma-separated value (CSV) format
Fixed-width table format

CSV format is a special case (the comma is the delimiter between attributes) of a more generic "character-separated" format, in which attributes are delimited by any combination of characters. Here is an example of a character-separated format for the attribute types Surname, Given Name, Telephone Number, and Department. The fields in the attribute configuration file that define this format are:

Record-Sep:'\012'
Abbr:SN  Name:Surname        Prefix:''
         Suffix:'|'          Rec-Sep:''
         MRule:-
Abbr:GN  Name:Given-Name     Prefix:''
         Suffix:'|'          Rec-Sep:''
         MRule:-
Abbr:TN  Name:Telephone-Number   Prefix:''
         Suffix:'|'              Rec-Sep:''
         MRule:-
Abbr:DEP  Name:Department     Prefix:''
          Suffix:'|'          Rec-Sep:''
          MRule:-

The data format looks like:

Fontana|Frank|301-555-2223|NR 1 FE|<LF>
Brown|Murphy||NR 1 AC|<LF>
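
The position-driven interpretation of such a record can be sketched in Python as follows. The FIELDS list and the function name are hypothetical and simply restate the order of the attribute configuration above; the sketch assumes the same one-character suffix for every attribute.

```python
# Hypothetical field order, taken from the attribute configuration above.
FIELDS = ["SN", "GN", "TN", "DEP"]


def parse_separated_record(record, fields=FIELDS, sep="|"):
    """Map the positional values of one record to attribute abbreviations."""
    values = record.split(sep)
    # Every attribute is followed by its suffix, so the split leaves one
    # empty piece after the last separator; drop it.
    if values and values[-1] == "":
        values.pop()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    # An empty position means the attribute has no value in this entry.
    return {name: value for name, value in zip(fields, values) if value}


first = parse_separated_record("Fontana|Frank|301-555-2223|NR 1 FE|")
second = parse_separated_record("Brown|Murphy||NR 1 AC|")
```

In the second record the two adjacent separators mark the missing telephone number, which is why Department still ends up in the fourth position.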

Here is an example of a fixed-width table format for the attribute types Surname, Given Name, Telephone Number, Department. The fields in the attribute configuration file that define this format are:

Record-Sep:'\012'
Abbr:SN  Name:Surname        Prefix:''
         Suffix:''           Attrlen:15
         Rec-Sep:''          MRule:-
Abbr:GN  Name:Given-Name     Prefix:''
         Suffix:''           Attrlen:15
         Rec-Sep:''          MRule:-
Abbr:TN  Name:Telephone-Number  Prefix:''
         Suffix:''              Attrlen:12
         Rec-Sep:''             MRule:-
Abbr:DEP Name:Department     Prefix:''
         Suffix:''           Attrlen:10
         Rec-Sep:''          MRule:-

The data format looks like:

Fontana    Frank      301-555-2223   NR 1 FE      <LF>
Brown      Murphy                    NR 1 AC      <LF>

When using the table format, the output of each field is limited by the AttrLen component, where AttrLen defines the maximum length of the field. The field value is composed of the prefix, the attribute value(s), optionally multi-value separators, and finally the suffix. In tables, Prefix, Suffix, and Rec-Sep are usually empty (as shown in the example above). If the composed field value exceeds the length specified in AttrLen, the field value is truncated; a truncated value is indicated by the string “…” at the end of the field.
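
The padding and truncation rule can be sketched in Python as shown below. The WIDTHS table and the function names are hypothetical. The sketch also makes two assumptions that the text above does not settle: that the truncation marker is three periods, and that the marker counts toward the AttrLen width.

```python
# Hypothetical widths, taken from the AttrLen values in the configuration above.
WIDTHS = [("SN", 15), ("GN", 15), ("TN", 12), ("DEP", 10)]


def format_field(value, attr_len, marker="..."):
    """Pad a value to attr_len, or truncate it and append the marker.

    Assumption: the marker is part of the attr_len characters.
    """
    if len(value) > attr_len:
        return value[: attr_len - len(marker)] + marker
    return value.ljust(attr_len)


def format_record(entry, widths=WIDTHS):
    """Write one entry as a fixed-width line (without the record separator)."""
    return "".join(format_field(entry.get(abbr, ""), n) for abbr, n in widths)


def parse_record(line, widths=WIDTHS):
    """Read attributes back by slicing the line at the configured widths."""
    entry, pos = {}, 0
    for abbr, n in widths:
        value = line[pos:pos + n].rstrip()
        if value:
            entry[abbr] = value
        pos += n
    return entry


fontana = {"SN": "Fontana", "GN": "Frank", "TN": "301-555-2223", "DEP": "NR 1 FE"}
line = format_record(fontana)
truncated = format_field("Telecommunications", 10)
```

With these widths every record is 52 characters long, and a missing attribute appears as a run of blanks of the configured width.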

LDIF Format

LDIF format is a kind of tagged data file format. There are two types of LDIF format:

LDIF content format
LDIF change format

LDIF format supports the following features:

Base-64 encoding
UTF-8 encoding
References to external files, in URL format; for example, an attribute type/value pair such as:

jpegphoto:< file:///usr/local/photos/monroe.jpeg
Alternate record terminators, such as <CR>, <LF>, or <CR><LF>
Comment lines and continuation lines
Multiple separators between entries

The next sections briefly describe the LDIF content and change file formats. For a complete description of LDIF formats, see RFC 2849, "The LDAP Data Interchange Format (LDIF) - Technical Specification", by G. Good.
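
Several of the features listed above can be illustrated with a short Python sketch that follows the RFC 2849 conventions: a continuation line starts with a single space, a comment line starts with "#", a value introduced by "::" is base-64 encoded, and a value introduced by ":<" is a URL. The function names and the sample data are hypothetical, and the sketch assumes that base-64 values carry UTF-8 text.

```python
import base64
import re


def unfold(ldif_text):
    """Join continuation lines, then drop comment lines."""
    logical = []
    # Accept <CR><LF>, <CR>, or <LF> as line terminators.
    for raw in re.split(r"\r\n|\r|\n", ldif_text):
        if raw.startswith(" ") and logical:
            logical[-1] += raw[1:]  # continuation: strip the single leading space
        else:
            logical.append(raw)
    return [line for line in logical if not line.startswith("#")]


def parse_line(line):
    """Return (attribute, value) for one logical LDIF line."""
    name, _, rest = line.partition(":")
    if rest.startswith(":"):  # "attr:: <base-64 value>"
        return name, base64.b64decode(rest[1:].strip()).decode("utf-8")
    if rest.startswith("<"):  # "attr:< <URL>"
        return name, ("url", rest[1:].strip())
    return name, rest.lstrip(" ")


sample = (
    "# comment line\n"
    "dn: cn=Rene, o=Example\r\n"
    "cn:: UmVuw6k=\n"
    "description: folded va\n"
    " lue\n"
    "jpegphoto:< file:///usr/local/photos/monroe.jpeg\n"
)
pairs = [parse_line(line) for line in unfold(sample) if line]
```

The sketch only records the URL and does not read the referenced file.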

LDIF Content Format

A data file in LDIF content format contains a list of directory entries and their attributes. Each entry consists of a distinguished name and a list of attributes. Each attribute has a prefix and one or more values. For example:

dn: cn=George Costanza, ou=sales, o=NY Yankees, c=us
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: George Costanza
cn: G. Costanza
cn: Georgie
sn: Costanza
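
The structure of such an entry (one distinguished name plus attributes that may carry several values) can be sketched in Python as follows. The function name is hypothetical, and the sketch handles only plain "attribute: value" lines without continuation lines or base-64 values.

```python
from collections import defaultdict


def parse_content_entry(block):
    """Return (dn, {attribute: [values]}) for one LDIF content entry."""
    dn = None
    attrs = defaultdict(list)
    for line in block.strip().splitlines():
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if name == "dn":
            dn = value
        else:
            # Repeated attribute lines accumulate as a multivalued attribute.
            attrs[name].append(value)
    return dn, dict(attrs)


entry_text = """\
dn: cn=George Costanza, ou=sales, o=NY Yankees, c=us
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: George Costanza
cn: G. Costanza
cn: Georgie
sn: Costanza
"""
dn, attrs = parse_content_entry(entry_text)
```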

LDIF Change Format

A data file in LDIF change format contains a list of directory modifications. Each entry in the change file contains a special LDIF "changetype" attribute that indicates the type of directory modification to be made. There are four types of modification specified in an LDIF change file:

Add a directory entry
Delete a directory entry
Modify one or more attributes of a directory entry
Modify the distinguished name of a directory entry

These modifications correspond to the set of LDAP operations that modify a directory. The number and type of attributes present in each entry in the change file differ depending upon the value of the changetype attribute for the entry. The next sections describe the types of change file entry structures.

Add Directory Entry Format

If the value of the "changetype" attribute is "add", the entry contains the distinguished name of the entry to be created and the attribute type and value pairs to be created for the entry. For example:

dn: cn=Joe Isuzu, ou=sales, o=Isuzu, c=us
changetype: add
objectclass: top
objectclass: person
objectclass: organizationalPerson
surname: Isuzu

Delete Directory Entry Format

If the value of the "changetype" attribute is "delete", the entry contains the distinguished name of the entry to be deleted. For example:

dn: cn=Joe Isuzu, ou=sales, o=Isuzu, c=us
changetype: delete

Modify Entry Format

If the value of the "changetype" attribute is "modify", the entry contains the distinguished name of the entry to be modified and a list of attributes that represent one or more modifications to be made to attributes of the entry. There are three types of modification operations that can be recorded in an entry of "changetype" "modify":

Add an attribute value (including multiple attribute values)
Delete an attribute or an attribute value (including multiple attribute values)
Replace an attribute with other values (including multiple attribute values)

An attribute in the entry identifies the type of modification operation to be performed; its value is the name of the attribute on which to perform the operation. The attribute structure used to represent the modification to be made to the attribute differs depending on the type of modification. The dash (-) is the "end-of-modification" suffix and terminates each modification structure. For example:

dn: cn=S. Beckhardt, ou=iris, o=lotus, c=us
changetype: modify
add: telephonenumber
telephonenumber: 603 222 4344
-
delete: description
description: CFO
-
replace: surname
surname: S. Beckhardt
surname: Beckhardt
-
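
To make the layout of a "modify" entry concrete, the following Python sketch splits such an entry into its individual modification structures, using the dash line as the terminator of each one. The function name and the dictionary keys are hypothetical, and the sketch handles only plain "attribute: value" lines.

```python
OPERATIONS = ("add", "delete", "replace")


def parse_modify_record(record):
    """Return (dn, [modification, ...]) for one 'changetype: modify' entry."""
    dn = None
    mods = []
    current = None  # the modification structure currently being read
    for line in record.strip().splitlines():
        if line.strip() == "-":
            if current is None:
                raise ValueError("'-' without an open modification")
            mods.append(current)
            current = None
            continue
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if current is not None:
            # Inside a structure, only values of the named attribute may follow.
            if name != current["attr"].lower():
                raise ValueError(f"unexpected attribute {name!r}")
            current["values"].append(value)
        elif name == "dn":
            dn = value
        elif name == "changetype":
            if value != "modify":
                raise ValueError("not a modify entry")
        elif name in OPERATIONS:
            current = {"op": name, "attr": value, "values": []}
        else:
            raise ValueError(f"unexpected line {line!r}")
    return dn, mods


record = """\
dn: cn=S. Beckhardt, ou=iris, o=lotus, c=us
changetype: modify
add: telephonenumber
telephonenumber: 603 222 4344
-
delete: description
description: CFO
-
replace: surname
surname: S. Beckhardt
surname: Beckhardt
-
"""
dn, mods = parse_modify_record(record)
```

A "delete" structure with an empty value list would correspond to deleting the whole attribute.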

The next sections describe the "add", "delete", and "replace" modification structures.

Add Attribute Value Structure

The "add attribute value" structure consists of:

The "add" modification identifier attribute
One or more attribute type/value pairs that specify the new values to apply.

For example:

add: telephonenumber
telephonenumber: 617 235 4764
telephonenumber: 508 546 6645
-

Delete Attribute and Delete Attribute Value Structure

The "delete attribute value" structure consists of:

The "delete" modification identifier attribute
One or more attribute type/value pairs that specify the values to be deleted.

For example:

delete: description
description: engineer
description: software engineering
-

The "delete attribute" operation consists of the "delete" modification identifier whose value is the attribute to be deleted. For example:

delete: description
-

Replace Attribute Value Structure

The "replace attribute value" operation contains:

The "replace" modification identifier attribute
A list of attribute values that replace the attribute

For example:

replace: surname
surname: Soeder
surname: C. Soeder
-

Modify Distinguished Name/Modify Relative Distinguished Name Format

If the value of the "changetype" attribute is "moddn" or "modrdn", the entry consists of:

The distinguished name of the entry whose name is to be modified
Information that specifies the new RDN to be applied to the entry
Information that specifies whether to delete the old RDN; the value of this attribute is either "0" (do not delete) or "1" (delete) (only relevant if "changetype" is "moddn")
Information that specifies the distinguished name of the entry’s new superior (only relevant if "changetype" is "moddn")

For example:

dn: cn=richard hustvedt, ou=engr-ma, o=digital, c=us
changetype: moddn
newrdn: cn=r. hustvedt
deleteoldrdn: 0
newsuperior: ou=engr-nh, o=compaq, c=us
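
The effect of such an entry on the distinguished name can be sketched in Python as follows. The function name is hypothetical, and the sketch naively assumes that the first comma in the DN ends the RDN (that is, the RDN contains no escaped commas). The "deleteoldrdn" value is not used here because it affects the attribute values of the entry, not the resulting name.

```python
def apply_moddn(dn, newrdn, newsuperior=None):
    """Return the distinguished name that results from a moddn/modrdn entry."""
    # Naive split: assumes the RDN itself contains no escaped comma.
    _old_rdn, _, parent = dn.partition(",")
    if newsuperior is not None:
        parent = newsuperior
    parent = parent.strip()
    return f"{newrdn}, {parent}" if parent else newrdn


moved = apply_moddn(
    "cn=richard hustvedt, ou=engr-ma, o=digital, c=us",
    "cn=r. hustvedt",
    "ou=engr-nh, o=compaq, c=us",
)
renamed = apply_moddn(
    "cn=richard hustvedt, ou=engr-ma, o=digital, c=us",
    "cn=r. hustvedt",
)
```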

Extensible Markup Language (XML) Format

Extensible Markup Language (XML) is a flexible file format. It is described in the XML standard and in various documents. DirX Identity supports two XML formats:

Directory Services Markup Language (DSML) V1.0
Flat XML, which is a simple structured format that is similar to LDIF

The next sections briefly describe the XML file formats.

Directory Services Markup Language (DSML V1) Format

A data file in DSML V1 format contains sections of directory entries and their attributes. Each entry consists of sections of attributes enclosed in the <entry> and </entry> tags. The attributes are described in <attr> and <objectClass> sections. For example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dsml SYSTEM 'dsml.dtd'>
<dsml>
    <!-- status error-code="0" msg="Ok" entry-count="1" -->
    <directory-entries>
        <entry>
            <attr name="cn">
                <value>James R. Doran</value>
                <value>Jimmy Doran</value>
            </attr>
            <attr name="telephoneNumber">
                <value>1-914-656-2650</value>
            </attr>
            <attr name="mail">
                <value>jrdoran@us.othercompany.com</value>
            </attr>
            <objectClass>
                <oc-value>person</oc-value>
                <oc-value>organizationalPerson</oc-value>
                <oc-value>othercompanyPerson</oc-value>
                <oc-value>ePerson</oc-value>
            </objectClass>
        </entry>
        <entry>
            <attr name="cn">
                <value>Harry Hirsch</value>
            </attr>
...
            <attr name="mail">
                <value>hhirsch@owncompany.com</value>
            </attr>
            <objectClass>
                <oc-value>person</oc-value>
                <oc-value>organizationalPerson</oc-value>
                <oc-value>owncompanyPerson</oc-value>
                <oc-value>ePerson</oc-value>
            </objectClass>
        </entry>
    </directory-entries>
</dsml>

DirX Identity does not currently support DTDs or DTD sections.

You can find more information about DSML at http://www.dsml.org.
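
A file laid out like the example in this section can be read with a standard XML parser. The following Python sketch uses the standard library's xml.etree.ElementTree module and assumes the element names exactly as they appear in that example, without namespace prefixes. The function name is hypothetical, and the sample omits the XML declaration and the DOCTYPE line for brevity.

```python
import xml.etree.ElementTree as ET


def parse_dsml(text):
    """Return one {attribute: [values]} dictionary per <entry> element."""
    root = ET.fromstring(text)
    entries = []
    for entry in root.iter("entry"):
        attrs = {}
        for attr in entry.findall("attr"):
            attrs[attr.get("name")] = [v.text for v in attr.findall("value")]
        # Object classes are carried in their own section, not in <attr>.
        attrs["objectClass"] = [
            v.text for v in entry.findall("objectClass/oc-value")
        ]
        entries.append(attrs)
    return entries


sample = """<dsml>
    <directory-entries>
        <entry>
            <attr name="cn">
                <value>James R. Doran</value>
                <value>Jimmy Doran</value>
            </attr>
            <attr name="telephoneNumber">
                <value>1-914-656-2650</value>
            </attr>
            <objectClass>
                <oc-value>person</oc-value>
                <oc-value>organizationalPerson</oc-value>
            </objectClass>
        </entry>
    </directory-entries>
</dsml>"""
entries = parse_dsml(sample)
```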

Flat XML Format

A data file in flat XML format contains sections of directory entries and lists of their attributes. Each entry consists of a begin tag <entry>, the list of attributes and the end tag </entry>. Each attribute is described by a begin tag <attribute_name>, the attribute value and an end tag </attribute_name>. For example:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE address SYSTEM "C:\address.dtd">
<address>
    <entry>
        <dn>o=PQR</dn>
        <o>PQR</o>
        <description>PQR Company</description>
        <telephoneNumber>+49 12 345 67 890</telephoneNumber>
        <objectClass>organization</objectClass>
        <objectClass>top</objectClass>
        <createTimestamp>20000308120944Z</createTimestamp>
    </entry>
    <entry>
        <dn>cn=admin, o=PQR</dn>
...
        <createTimestamp>20000308120947Z</createTimestamp>
    </entry>
</address>
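
Because each child element of <entry> is an attribute named by its tag, a flat XML file can be mapped to attribute lists very directly. The following Python sketch uses the standard library's xml.etree.ElementTree module. The function name is hypothetical, and the sample contains only the first entry of the example above, without the XML declaration and the DOCTYPE line.

```python
import xml.etree.ElementTree as ET


def parse_flat_xml(text):
    """Return one {attribute: [values]} dictionary per <entry> element."""
    root = ET.fromstring(text)
    entries = []
    for entry in root.findall("entry"):
        attrs = {}
        for child in entry:
            # A repeated tag (for example objectClass) is a multivalued attribute.
            attrs.setdefault(child.tag, []).append(child.text)
        entries.append(attrs)
    return entries


sample = """<address>
    <entry>
        <dn>o=PQR</dn>
        <o>PQR</o>
        <description>PQR Company</description>
        <telephoneNumber>+49 12 345 67 890</telephoneNumber>
        <objectClass>organization</objectClass>
        <objectClass>top</objectClass>
        <createTimestamp>20000308120944Z</createTimestamp>
    </entry>
</address>"""
entries = parse_flat_xml(sample)
```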