Directory Data File Formats
Different directories use different data file formats to represent the entries and attributes within them. The meta controller needs to be able to recognize the different directory data file formats so that it can:
-
Interpret the source data generated from different connected directories
-
Process the source data into different target directory formats
In general, an individual directory’s data file format is either:
-
Tagged, where each attribute in an entry is tagged by a prefix and optionally by a suffix. Directories that use tagged file formats include X.500 DAP directories, LDAP directories, and Lotus Notes directories.
-
Untagged (or "position-driven"), where each attribute in an entry is identified based on its position in the entry. Microsoft Exchange is an example of a directory that uses an untagged data file format.
The meta controller supports both untagged and tagged data file formats, including the LDAP Data Interchange Format (LDIF) tagged file format. This chapter describes the characteristics of tagged and untagged data file formats. It also provides a general discussion of the LDIF and XML file formats.
Tagged Data File Format
In a tagged data file, each attribute in an entry is tagged by a prefix and a suffix. Because the tags identify the attributes, an entry can list its attributes in any order. A multiple-value separator can be defined to separate the values of a multivalued attribute.
Each entry can be limited to a single line or split over several lines. If the entry is split over several lines, the description of the data file (that is, the relevant fields in the attribute configuration file) must contain global information that determines the end of an entry.
Here is an example of a tagged file in which:
-
The prefixes are "SN=", "GN=", and so on
-
The suffixes used include the semicolon (;) and the pipe character (|)
-
The record (entry) separator is the line feed (<LF>).
The fields in the attribute configuration file that define this data format are:
Record-Sep:'\012'
Abbr:SN Name:Surname Prefix:'SN='
Suffix:';' Rec-Sep:''
MRule:-
Abbr:GN Name:Given-Name Prefix:'GN='
Suffix:'|' Rec-Sep:''
MRule:-
Abbr:TN Name:Telephone-Number Prefix:'TN='
Suffix:'|' Rec-Sep: ''
MRule:-
Abbr:FTN Name:Fax-Telephone-Number Prefix:'FAX='
Suffix:';' Rec-Sep: ''
MRule:-
Abbr:UP Name:User-Password Prefix:'UP='
Suffix:';' Rec-Sep: ''
MRule:-
The data file format looks like:
SN=Serling;GN=Rod|TN=44597|FAX=44598;UP=twyl8t;<LF>
SN=Stefano;FAX=44232;UP=xante;<LF>
SN=Fontana;<LF>
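To illustrate how the prefixes and suffixes let a reader recover attributes regardless of their order, here is a minimal Python sketch that parses the single-line example above. The ATTRIBUTES table mirrors the configuration fields; the function name parse_tagged is hypothetical and this is not the meta controller's implementation. The sketch assumes that no suffix string occurs inside an attribute value, and it ignores multiple-value separators.

```python
# Hypothetical table mirroring the attribute configuration above:
# (abbreviation, prefix, suffix)
ATTRIBUTES = [
    ("SN", "SN=", ";"),
    ("GN", "GN=", "|"),
    ("TN", "TN=", "|"),
    ("FTN", "FAX=", ";"),
    ("UP", "UP=", ";"),
]
RECORD_SEP = "\n"  # Record-Sep:'\012' (octal 012 is the line feed)


def parse_tagged(data: str) -> list[dict[str, str]]:
    """Split tagged data into entries; attributes may appear in any order."""
    entries = []
    for record in data.split(RECORD_SEP):
        if not record:
            continue  # skip the empty string after the final separator
        entry = {}
        pos = 0
        while pos < len(record):
            for abbr, prefix, suffix in ATTRIBUTES:
                if record.startswith(prefix, pos):
                    start = pos + len(prefix)
                    end = record.find(suffix, start)  # the suffix ends the value
                    if end == -1:
                        raise ValueError(f"missing suffix {suffix!r} for {abbr}")
                    entry[abbr] = record[start:end]
                    pos = end + len(suffix)
                    break
            else:
                raise ValueError(f"unknown tag at position {pos}: {record[pos:]!r}")
        entries.append(entry)
    return entries


sample = (
    "SN=Serling;GN=Rod|TN=44597|FAX=44598;UP=twyl8t;\n"
    "SN=Stefano;FAX=44232;UP=xante;\n"
    "SN=Fontana;\n"
)
for entry in parse_tagged(sample):
    print(entry)
```

Attributes that are absent from a record, such as GN for Stefano, simply do not appear in the resulting dictionary.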
Here is an example of a tagged file in which:
-
The prefixes are "SN=", "GN=", and so on.
-
The suffixes used include ";<LF>" and "|<LF>"
-
The record (entry) separator is the double line feed (<LF><LF>).
The fields in the attribute configuration file that define this data format are:
Record-Sep:'\012\012'
Abbr:SN Name:Surname Prefix:'SN='
Suffix:';\012' Rec-Sep: ''
MRule:-
Abbr:GN Name:Given-Name Prefix:'GN='
Suffix:'|\012' Rec-Sep: ''
MRule:-
Abbr:TN Name:Telephone-Number Prefix:'TN='
Suffix:'|\012' Rec-Sep: ''
MRule:-
Abbr:FTN Name:Fax-Telephone-Number Prefix:'FAX='
Suffix:';\012' Rec-Sep: ''
MRule:-
Abbr:UP Name:User-Password Prefix:'UP='
Suffix:';\012' Rec-Sep: ''
MRule:-
The data file format looks like:
SN=Serling;<LF>
GN=Rod|<LF>
TN=44597|<LF>
FAX=44598;<LF>
UP=twyl8t;<LF>
<LF>
SN=Stefano;<LF>
FAX=44232;<LF>
UP=xante;<LF>
<LF>
SN=Fontana;<LF>
<LF>
Untagged Data File Format
In an untagged data file, an individual attribute is identified by its position in an entry. Attributes are either separated by an attribute separator or assigned a fixed field width. The attribute separator can differ for each attribute, but typically it is the same for all attributes in the file. A multiple-value separator can be defined to separate the values of a multivalued attribute. Because attributes are identified by their positions in the entry, an attribute with no value must still occupy its position: its separator immediately follows the previous one, so two separators appear in a row.
The meta controller supports the following untagged file formats:
-
Comma-separated value (CSV) format
-
Fixed-width table format
CSV format is a special case of a more generic "character-separated" format, in which attributes are delimited by any combination of characters; in CSV format, the delimiter is the comma. Here is an example of a character-separated format for the attribute types Surname, Given Name, Telephone Number, and Department that uses the pipe character (|) as the delimiter. The fields in the attribute configuration file that define this format are:
Record-Sep:'\012'
Abbr:SN Name:Surname Prefix:''
Suffix:'|' Rec-Sep:''
MRule:-
Abbr:GN Name:Given-Name Prefix:''
Suffix:'|' Rec-Sep:''
MRule:-
Abbr:TN Name:Telephone-Number Prefix:''
Suffix:'|' Rec-Sep:''
MRule:-
Abbr:DEP Name:Department Prefix:''
Suffix:'|' Rec-Sep:''
MRule:-
The data format looks like:
Fontana|Frank|301-555-2223|NR 1 FE|<LF>
Brown|Murphy||NR 1 AC|<LF>
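Because position, not a tag, identifies each attribute, a reader maps the n-th field to the n-th configured attribute. The following Python sketch shows this for the example above. The names FIELDS and parse_separated are hypothetical, and the sketch assumes that the separator never occurs inside a value (no escaping).

```python
# Attribute order from the configuration above; position identifies the attribute.
FIELDS = ["Surname", "Given-Name", "Telephone-Number", "Department"]
SEPARATOR = "|"    # Suffix:'|' for every attribute
RECORD_SEP = "\n"  # Record-Sep:'\012'


def parse_separated(data: str) -> list[dict[str, str]]:
    entries = []
    for record in data.split(RECORD_SEP):
        if not record:
            continue
        # Every field ends with the separator, so the record ends with one too;
        # drop that trailing separator before splitting.
        if not record.endswith(SEPARATOR):
            raise ValueError(f"record does not end with {SEPARATOR!r}: {record!r}")
        values = record[: -len(SEPARATOR)].split(SEPARATOR)
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
        # Two adjacent separators yield an empty string: the attribute has no value.
        entries.append({name: v for name, v in zip(FIELDS, values) if v})
    return entries


sample = "Fontana|Frank|301-555-2223|NR 1 FE|\nBrown|Murphy||NR 1 AC|\n"
for entry in parse_separated(sample):
    print(entry)
```

The empty field between "Murphy" and "NR 1 AC" keeps Department in the fourth position, so Brown's entry simply has no Telephone-Number.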
Here is an example of a fixed-width table format for the attribute types Surname, Given Name, Telephone Number, and Department. The fields in the attribute configuration file that define this format are:
Record-Sep:'\012'
Abbr:SN Name:Surname Prefix:''
Suffix:'' Attrlen:15
Rec-Sep:'' MRule:-
Abbr:GN Name:Given-Name Prefix:''
Suffix:'' Attrlen:15
Rec-Sep:'' MRule:-
Abbr:TN Name:Telephone-Number Prefix:''
Suffix:'' Attrlen:12
Rec-Sep:'' MRule:-
Abbr:DEP Name:Department Prefix:''
Suffix:'' Attrlen:10
Rec-Sep:'' MRule:-
The data format looks like:
Fontana        Frank          301-555-2223NR 1 FE   <LF>
Brown          Murphy                     NR 1 AC   <LF>
When using the table format, the output of each field is limited by the Attrlen component, which defines the maximum length of the field. The field value is composed of the prefix, the attribute value(s), any multiple-value separators, and finally the suffix. In tables, Prefix, Suffix, and Rec-Sep are usually empty (as shown in the example above). If the composed field value exceeds the length specified in Attrlen, the field value is truncated; a truncated value is indicated by the string "…" at the end of the field.
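The composition and truncation rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the meta controller's code: the names COLUMNS, format_field, and format_row are hypothetical; values are assumed to be left-aligned and padded with spaces; and the truncation marker is assumed to be three ASCII dots counted within the field width.

```python
# Field widths taken from the Attrlen values in the configuration above.
COLUMNS = [
    ("Surname", 15),
    ("Given-Name", 15),
    ("Telephone-Number", 12),
    ("Department", 10),
]
TRUNCATION_MARK = "..."  # assumption: three ASCII dots, counted within the width


def format_field(value: str, width: int, prefix: str = "", suffix: str = "") -> str:
    """Compose prefix + value + suffix, then truncate or pad to exactly `width`."""
    composed = prefix + value + suffix
    if len(composed) > width:
        # Keep as much as fits and mark the cut with the truncation string.
        return composed[: width - len(TRUNCATION_MARK)] + TRUNCATION_MARK
    return composed.ljust(width)  # assumption: left-aligned, padded with spaces


def format_row(entry: dict[str, str]) -> str:
    # A missing attribute still occupies its full width (all spaces).
    return "".join(format_field(entry.get(name, ""), width) for name, width in COLUMNS)


rows = [
    {"Surname": "Fontana", "Given-Name": "Frank",
     "Telephone-Number": "301-555-2223", "Department": "NR 1 FE"},
    {"Surname": "Brown", "Given-Name": "Murphy", "Department": "NR 1 AC"},
]
print("\n".join(format_row(r) for r in rows))
print(format_field("Software Engineering", 10))  # prints "Softwar..."
```

Because every field is padded to its full width, each row is 15 + 15 + 12 + 10 = 52 characters long, and a reader can recover the attributes by slicing at fixed offsets.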
LDIF Format
LDIF format is a kind of tagged data file format. There are two types of LDIF format:
-
LDIF content format
-
LDIF change format
LDIF format supports the following features:
-
Base-64 encoding
-
UTF-8 encoding
-
References to external files, in URL format; for example, an attribute type/value pair such as:
jpegphoto:< file:///usr/local/photos/monroe.jpeg
-
Alternate record terminators, such as <CR>, <LF>, or <CR><LF>
-
Comment lines and continuation lines
-
Multiple separators between entries
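Several of these features concern how lines are read before any attribute is interpreted. The following Python sketch shows one way to handle comment lines, continuation lines (a line that begins with a single space continues the previous line), base-64 values ("type:: value"), and URL references ("type:< URL"). The function names unfold and parse_line are hypothetical; URL references are returned unresolved; and the sample value UmVuw6k= is simply the base-64 encoding of the UTF-8 string "René".

```python
import base64
from typing import Union


def unfold(text: str) -> list[str]:
    """Drop comment lines and join continuation lines (LDIF line folding)."""
    lines: list[str] = []
    in_comment = False
    for raw in text.splitlines():  # splits on <LF>, <CR><LF>, and <CR>
        if raw.startswith(" "):
            # Continuation line: drop the single leading space and append it
            # to the previous line, unless that line was a comment.
            if not in_comment and lines:
                lines[-1] += raw[1:]
            continue
        in_comment = raw.startswith("#")
        if not in_comment:
            lines.append(raw)  # blank lines are kept as entry separators
    return lines


def parse_line(line: str) -> tuple[str, Union[str, bytes]]:
    """Split an attribute line into (type, value)."""
    attr, sep, rest = line.partition(":")
    if not sep:
        raise ValueError(f"not an attribute line: {line!r}")
    if rest.startswith(":"):  # "type:: value" carries a base-64 encoded value
        return attr, base64.b64decode(rest[1:].strip())
    if rest.startswith("<"):  # "type:< URL" refers to an external file
        return attr, rest[1:].strip()  # returned unresolved in this sketch
    return attr, rest.strip()


sample = (
    "# a comment line\n"
    " that is continued\n"
    "dn: cn=George Costanza, ou=sales,\n"
    "  o=NY Yankees, c=us\n"
    "description:: UmVuw6k=\n"
    "jpegphoto:< file:///usr/local/photos/monroe.jpeg\n"
)
for line in unfold(sample):
    print(parse_line(line))
```

Decoded base-64 values are returned as bytes; text values can be converted with .decode("utf-8").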
The next sections briefly describe the LDIF content and change file formats. For a complete description of the LDIF formats, see RFC 2849, "The LDAP Data Interchange Format (LDIF) - Technical Specification" by G. Good.
LDIF Content Format
A data file in LDIF content format contains a list of directory entries and their attributes. Each entry consists of a distinguished name and a list of attributes. Each attribute line consists of the attribute type (which acts as the tag), a colon, and a value; a multivalued attribute has one line per value. For example:
dn: cn=George Costanza, ou=sales, o=NY Yankees, c=us
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: George Costanza
cn: G. Costanza
cn: Georgie
sn: Costanza
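A content file can be read by collecting attribute lines until a blank line ends the entry. The Python sketch below groups repeated attribute types into lists so that multivalued attributes such as cn keep all their values. The function name parse_ldif_content is hypothetical, and the sketch assumes input without comments, continuation lines, or base-64 values.

```python
def parse_ldif_content(text: str) -> list[dict[str, list[str]]]:
    """Group the lines of an LDIF content file into entries."""
    entries = []
    current: dict[str, list[str]] = {}
    for line in text.splitlines() + [""]:  # the extra "" flushes the last entry
        if not line.strip():
            if current:  # one or more blank lines separate entries
                entries.append(current)
                current = {}
            continue
        attr, _, value = line.partition(":")
        # Attribute type names are case-insensitive in LDAP; normalize to lower case.
        current.setdefault(attr.lower(), []).append(value.strip())
    return entries


sample = """dn: cn=George Costanza, ou=sales, o=NY Yankees, c=us
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: George Costanza
cn: G. Costanza
cn: Georgie
sn: Costanza
"""
entry = parse_ldif_content(sample)[0]
print(entry["cn"])  # ['George Costanza', 'G. Costanza', 'Georgie']
```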
LDIF Change Format
A data file in LDIF change format contains a list of directory modifications. Each entry in the change file contains a special LDIF "changetype" attribute that indicates the type of directory modification to be made. An LDIF change file can specify four types of modification:
-
Add a directory entry
-
Delete a directory entry
-
Modify one or more attributes of a directory entry
-
Modify the distinguished name of a directory entry
These modifications correspond to the set of LDAP operations that modify a directory. The number and type of attributes present in each entry in the change file differs depending upon the value of the changetype attribute for the entry. The next sections describe the types of change file entry structures.
Add Directory Entry Format
If the value of the "changetype" attribute is "add", the entry contains the distinguished name of the entry to be created and the attribute type and value pairs to be created for the entry. For example:
dn: cn=Joe Isuzu, ou=sales, o=Isuzu, c=us
changetype: add
objectclass: top
objectclass: person
objectclass: organizationalPerson
surname: Isuzu
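An "add" record is the entry's DN, the changetype line, and one line per attribute value. The following Python sketch generates such a record. The function name format_add_record is hypothetical, and it assumes that all values are plain strings that need no base-64 encoding.

```python
def format_add_record(dn: str, attributes: dict[str, list[str]]) -> str:
    """Build an LDIF change record that creates the entry `dn`."""
    lines = [f"dn: {dn}", "changetype: add"]
    for attr, values in attributes.items():
        lines.extend(f"{attr}: {value}" for value in values)  # one line per value
    return "\n".join(lines) + "\n"


record = format_add_record(
    "cn=Joe Isuzu, ou=sales, o=Isuzu, c=us",
    {"objectclass": ["top", "person", "organizationalPerson"], "surname": ["Isuzu"]},
)
print(record, end="")
```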
Delete Directory Entry Format
If the value of the "changetype" attribute is "delete", the entry contains the distinguished name of the entry to be deleted. For example:
dn: cn=Joe Isuzu, ou=sales, o=Isuzu, c=us
changetype: delete
Modify Entry Format
If the value of the "changetype" attribute is "modify", the entry contains the distinguished name of the entry to be modified and a list of attributes that represent one or more modifications to be made to attributes of the entry. Three types of modification operations can be recorded in an entry whose "changetype" is "modify":
-
Add an attribute value (including multiple attribute values)
-
Delete an attribute or an attribute value (including multiple attribute values)
-
Replace an attribute with other values (including multiple attribute values)
In each modification, the first attribute identifies the type of modification operation to be performed ("add", "delete", or "replace"); its value is the name of the attribute on which to perform the operation. The attribute structure used to represent the modification differs depending on the type of modification. The dash (-) is the "end-of-modification" suffix and terminates each modification structure. For example:
dn: cn=S. Beckhardt, ou=iris, o=lotus, c=us
changetype: modify
add: telephonenumber
telephonenumber: 603 222 4344
-
delete: description
description: CFO
-
replace: surname
surname: S. Beckhardt
surname: Beckhardt
-
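The dash terminator makes a modify record straightforward to split into its individual modifications. The Python sketch below turns a record like the one above into (operation, attribute, values) tuples. The function name parse_modify_record is hypothetical, and the sketch assumes a single record without comments, continuation lines, or base-64 values.

```python
def parse_modify_record(text: str) -> tuple[str, list[tuple[str, str, list[str]]]]:
    """Return (dn, [(operation, attribute, values), ...]) for one modify record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    name, _, dn = lines[0].partition(":")
    if name != "dn":
        raise ValueError("record must start with dn:")
    if lines[1].replace(" ", "") != "changetype:modify":
        raise ValueError("not a changetype: modify record")
    modifications = []
    current = None  # (operation, attribute, values) being collected
    for line in lines[2:]:
        if line == "-":  # end-of-modification suffix
            if current is None:
                raise ValueError("'-' without a modification identifier")
            modifications.append(current)
            current = None
            continue
        name, _, value = line.partition(":")
        value = value.strip()
        if current is None:
            # First line of a structure: "add", "delete" or "replace" names the attribute.
            if name not in ("add", "delete", "replace"):
                raise ValueError(f"unexpected modification identifier {name!r}")
            current = (name, value, [])
        elif name.lower() == current[1].lower():
            current[2].append(value)  # an attribute type/value pair
        else:
            raise ValueError(f"{name!r} does not match {current[1]!r}")
    if current is not None:
        raise ValueError("last modification is not terminated by '-'")
    return dn.strip(), modifications


sample = """dn: cn=S. Beckhardt, ou=iris, o=lotus, c=us
changetype: modify
add: telephonenumber
telephonenumber: 603 222 4344
-
delete: description
description: CFO
-
replace: surname
surname: S. Beckhardt
surname: Beckhardt
-
"""
dn, mods = parse_modify_record(sample)
for operation, attribute, values in mods:
    print(operation, attribute, values)
```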
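The next sections describe the "add" and "delete" modification structures. The "replace" structure has the same form as the "add" structure: the "replace" identifier followed by the attribute type/value pairs that make up the new set of values, as in the "replace: surname" modification in the example above.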
Add Attribute Value Structure
The "add attribute value" structure consists of:
-
The "add" modification identifier attribute
-
One or more attribute type/value pairs that specify the new values to apply.
For example:
add: telephonenumber
telephonenumber: 617 235 4764
telephonenumber: 508 546 6645
-
Delete Attribute and Delete Attribute Value Structure
The "delete attribute value" structure consists of:
-
The "delete" modification identifier attribute
-
One or more attribute type/value pairs that specify the values to be deleted.
For example:
delete: description
description: engineer
description: software engineering
-
The "delete attribute" structure consists of the "delete" modification identifier, whose value is the name of the attribute to be deleted, followed directly by the dash; no attribute type/value pairs are given. For example:
delete: description
-
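The add, delete-value, and delete-attribute structures differ only in whether attribute type/value pairs follow the identifier line. The Python sketch below renders any of them; the function name format_modification is hypothetical and values are assumed to need no base-64 encoding.

```python
from collections.abc import Sequence


def format_modification(operation: str, attribute: str, values: Sequence[str] = ()) -> str:
    """Render one modification structure, terminated by the "-" line."""
    if operation not in ("add", "delete", "replace"):
        raise ValueError(f"unknown modification operation {operation!r}")
    if operation == "add" and not values:
        raise ValueError("an add structure needs at least one value")
    lines = [f"{operation}: {attribute}"]
    # For "delete", listing values removes only those values;
    # listing none removes the whole attribute.
    lines.extend(f"{attribute}: {value}" for value in values)
    lines.append("-")
    return "\n".join(lines) + "\n"


print(format_modification("delete", "description", ["engineer", "software engineering"]), end="")
print(format_modification("delete", "description"), end="")
```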
Modify Distinguished Name/Modify Relative Distinguished Name Format
If the value of the "changetype" attribute is "moddn" or "modrdn", the entry consists of:
-
The distinguished name of the entry whose name is to be modified
-
Information that specifies the new RDN to be applied to the entry
-
Information that specifies whether to delete the old RDN; the value of this attribute is either "0" (do not delete) or "1" (delete) (only relevant if "changetype" is "moddn")
-
Information that specifies the distinguished name of the entry’s new superior (only relevant if "changetype" is "moddn")
For example:
dn: cn=richard hustvedt, ou=engr-ma, o=digital, c=us
changetype: moddn
newrdn: cn=r. hustvedt
deleteoldrdn: 0
newsuperior: ou=engr-nh, o=compaq, c=us
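The resulting distinguished name is the new RDN followed by either the new superior or, if none is given, the entry's current parent. The Python sketch below computes it; the function names split_dn and new_dn are hypothetical, the DN split is naive (it assumes no escaped commas inside values), and deleteoldrdn is not modeled because it affects the entry's attribute values rather than its name.

```python
from typing import Optional


def split_dn(dn: str) -> list[str]:
    # Naive split: assumes no escaped commas ("\,") inside attribute values.
    return [rdn.strip() for rdn in dn.split(",")]


def new_dn(dn: str, newrdn: str, newsuperior: Optional[str] = None) -> str:
    """Return the DN an entry has after a moddn/modrdn change is applied."""
    # Without newsuperior, the entry stays under its current parent.
    parent = newsuperior if newsuperior is not None else ", ".join(split_dn(dn)[1:])
    return f"{newrdn}, {parent}" if parent else newrdn


print(new_dn("cn=richard hustvedt, ou=engr-ma, o=digital, c=us",
             "cn=r. hustvedt", "ou=engr-nh, o=compaq, c=us"))
```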
Extensible Markup Language (XML) Format
Extensible Markup Language (XML) is a flexible file format defined by the W3C XML 1.0 Recommendation. DirX Identity supports two XML formats:
-
Directory Services Markup Language (DSML) V1.0
-
Flat XML, which is a simple structured format that is similar to LDIF
The next sections briefly describe the XML file formats.
Directory Services Markup Language (DSML V1) Format
A data file in DSML V1 format contains a section of directory entries and their attributes. Each entry consists of a section of attributes enclosed in <entry> and </entry> tags. The attributes are described in <attr> and <objectclass> sections. For example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dsml SYSTEM 'dsml.dtd'>
<dsml>
<!-- status error-code="0" msg="Ok" entry-count="1" -->
<directory-entries>
<entry>
<attr name="cn">
<value>James R. Doran</value>
<value>Jimmy Doran</value>
</attr>
<attr name="telephoneNumber">
<value>1-914-656-2650</value>
</attr>
<attr name="mail">
<value>jrdoran@us.othercompany.com</value>
</attr>
<objectClass>
<oc-value>person</oc-value>
<oc-value>organizationalPerson</oc-value>
<oc-value>othercompanyPerson</oc-value>
<oc-value>ePerson</oc-value>
</objectClass>
</entry>
<entry>
<attr name="cn">
<value>Harry Hirsch</value>
</attr>
...
<attr name="mail">
<value>hhirsch@owncompany.com</value>
</attr>
<objectClass>
<oc-value>person</oc-value>
<oc-value>organizationalPerson</oc-value>
<oc-value>owncompanyPerson</oc-value>
<oc-value>ePerson</oc-value>
</objectClass>
</entry>
</directory-entries>
</dsml>
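Because DSML is ordinary XML, a standard XML parser can read it. The Python sketch below uses the standard library's xml.etree.ElementTree module to collect each entry's <attr> values and object classes. The function name parse_dsml is hypothetical; the object class element is matched case-insensitively because the example above spells it <objectClass>; and the sample is a shortened version of the example above.

```python
import xml.etree.ElementTree as ET


def parse_dsml(data: bytes) -> list[dict[str, list[str]]]:
    """Return one {attribute name: [values]} dictionary per <entry> element."""
    root = ET.fromstring(data)  # the DOCTYPE line is accepted; dsml.dtd is not fetched
    entries = []
    for entry in root.iter("entry"):
        attrs: dict[str, list[str]] = {}
        for child in entry:
            if child.tag == "attr":
                values = [(v.text or "").strip() for v in child.findall("value")]
                attrs.setdefault(child.get("name"), []).extend(values)
            elif child.tag.lower() == "objectclass":  # <objectclass> or <objectClass>
                oc_values = [(v.text or "").strip() for v in child.findall("oc-value")]
                attrs.setdefault("objectClass", []).extend(oc_values)
        entries.append(attrs)
    return entries


sample = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dsml SYSTEM 'dsml.dtd'>
<dsml>
<directory-entries>
<entry>
<attr name="cn"><value>James R. Doran</value><value>Jimmy Doran</value></attr>
<attr name="mail"><value>jrdoran@us.othercompany.com</value></attr>
<objectClass><oc-value>person</oc-value><oc-value>ePerson</oc-value></objectClass>
</entry>
</directory-entries>
</dsml>"""
for entry in parse_dsml(sample):
    print(entry)
```

Passing bytes rather than a string lets the parser honor the ISO-8859-1 encoding declaration.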
DirX Identity does not currently support DTDs or DTD sections.
You can find more information about DSML at http://www.dsml.org.
Flat XML Format
A data file in flat XML format contains sections of directory entries and lists of their attributes. Each entry consists of a start tag <entry>, the list of attributes, and an end tag </entry>. Each attribute value is enclosed in a start tag <attribute_name> and an end tag </attribute_name>, where attribute_name is the name of the attribute; a multivalued attribute is represented by one element per value. For example:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE address SYSTEM "C:\address.dtd">
<address>
<entry>
<dn>o=PQR</dn>
<o>PQR</o>
<description>PQR Company</description>
<telephoneNumber>+49 12 345 67 890</telephoneNumber>
<objectClass>organization</objectClass>
<objectClass>top</objectClass>
<createTimestamp>20000308120944Z</createTimestamp>
</entry>
<entry>
<dn>cn=admin, o=PQR</dn>
...
<createTimestamp>20000308120947Z</createTimestamp>
</entry>
</address>
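In flat XML the element name is the attribute name, so a reader only needs to walk the children of each <entry>. The Python sketch below does this with xml.etree.ElementTree. The function name parse_flat_xml is hypothetical, and the sample omits the DOCTYPE line and the elided attributes of the example above.

```python
import xml.etree.ElementTree as ET


def parse_flat_xml(data: bytes) -> list[dict[str, list[str]]]:
    """Return one {attribute name: [values]} dictionary per <entry> element."""
    root = ET.fromstring(data)
    entries = []
    for entry in root.findall("entry"):
        attrs: dict[str, list[str]] = {}
        for child in entry:
            # The element name is the attribute name; repeated elements are extra values.
            attrs.setdefault(child.tag, []).append((child.text or "").strip())
        entries.append(attrs)
    return entries


sample = b"""<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<address>
<entry>
<dn>o=PQR</dn>
<o>PQR</o>
<description>PQR Company</description>
<objectClass>organization</objectClass>
<objectClass>top</objectClass>
</entry>
<entry>
<dn>cn=admin, o=PQR</dn>
<createTimestamp>20000308120947Z</createTimestamp>
</entry>
</address>"""
for entry in parse_flat_xml(sample):
    print(entry)
```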