Reveal AI DAT Import Instructions
DAT Loading
DAT files are standard Concordance export. This format is also used by other popular legal review platforms.
A DAT file typically has the first line be a header row identifying the field names.
The Concordance default delimiters are:
Field ¶ ASCII character (020)
Quote þ ASCII character (254)
To import DAT file into SQL database, follow the steps below:
Create a new SQL database in SQL Management Studio.
Right click the database created and select “Tasks… > Import Data…”
Select Flat File Source, browse to the location of the DAT file, then enter “þ” as Text qualifier:
Note
Notice you might need to adjust the code page/Unicode option based on the encoding used by the DAT file.
Click Columns tab, enter “¶” as Column delimiter, click Refresh button if needed:
Click Advanced tab, select all columns in the middle, and then select the DataType dropdown, use “text stream [DT_Text]” for import (Note: if the DAT file is using Unicode, use “Unicode text stream [DT_NTEXT]” instead):
In the next screen, select SQL Server Native Client... and make sure the database is pointing to the one created above:
Click Next and finish importing.
Prepare Data for Processing
Before you kick off processing using the data imported into the SQL database, you need to run scripts to prepare the data and make it compatible with the formats Story Engine requires.
The following scripts provide a sample SQL script to update the data:
Open the SQL database in SQL Management Studio.
Check total records imported, make sure it is matching the records in the DAT file.
Run SQL updates to prepare data for processing.
-- Find the database where the DAT file is imported
USE {database}
-- check the data imported
SELECT * FROM Documents
-- 1. add an id field if it does not exist
-- always a good practice to add it as primary key
ALTER TABLE Documents ADD Id INT IDENTITY PRIMARY KEY
-- 2. check controlnumber, groupidentifier and attachments count
-- if beg_bates exists, use beg_bates as the controlnumber
-- if beg_attach exists, use beg_attach as the groupidentifier
-- notice the groupidentifier for the parent must be equal to its controlnumber
-- update the attachments count using the groupidentifier
ALTER TABLE Documents ADD nexlp_attachcounts INT
UPDATE a SET a.nexlp_attachcounts = b.cnt - 1
FROM Documents a inner join (
SELECT beg_attach, count(*) AS cnt FROM documents GROUP BY beg_attach
) b ON a.beg_attach = b.beg_attach
WHERE a.beg_bates = b.beg_attach
-- 3. check custodian, if custodian does not exists, give it a placeholder value
SELECT COUNT(*) FROM Documents WHERE LEN(custodian) = 0 OR custodian IS NULL
ALTER TABLE Documents ADD custodian VARCHAR(200)
-- if custodian names have unicode characters, use NVARCHAR(200)
UPDATE Documents SET custodian = 'custodian' WHERE LEN(custodian) = 0 OR custodian IS NULL
-- 4. check MD5SUM, if it is empty, give it a unique value, for example, the control number
SELECT COUNT(*) FROM Documents WHERE LEN(MD5SUM) = 0 or MD5SUM IS NULL
UPDATE Documents SET MD5SUM = controlnumber WHERE LEN(MD5SUM) = 0 OR md5sum IS NULL
-- 5. check datatime, add nexlpdatetime field
ALTER TABLE Documents ADD nexlpdatetime DATETIME
UPDATE documents SET nexlpdatetime = CAST(REPLACE(primarydatetime, '.', ':') AS DATETIME)
-- the next step will depend on the data available in the DAT file,
-- if there are multiple date fields, try to check each of them (recommend precedence: DateSent, DateReceived, DateLastModified, DateCreated)
-- if the Date and Time are stored in two separate fields, join date and time fields into one
-- here is an example:
UPDATE Documents SET nexlpdatetime = CAST(DATESENT + ' ' + TIMESENT AS DATETIME) WHERE LEN(DATESENT) > 0
UPDATE Documents SET nexlpdatetime = CAST(DATERECEIVED + ' ' + TIMERECEIVED AS DATETIME) WHERE LEN(DATERECEIVED) > 0 AND nexlpdatetime IS NULL
UPDATE Documents SET nexlpdatetime = CAST(DATEMOD + ' ' + TIMEMOD AS DATETIME) WHERE LEN(DATEMOD) > 0 AND nexlpdatetime IS NULL
UPDATE Documents SET nexlpdatetime = CAST(DATECREATED + ' ' + TIMECREATED AS DATETIME) WHERE LEN(DATECREATED) > 0 AND nexlpdatetime IS NULL
-- 6. add nexlptimezone, typicall options are 'EST', 'CST', 'PST', 'GMT'
ALTER TABLE Documents ADD nexlptimezone VARCHAR(20)
UPDATE Documents SET nexlptimezone = 'EST'
-- 7. add processing status
ALTER TABLE Documents ADD ProcessingStatus INT
UPDATE Documents SET ProcessingStatus = 0
-- to improve performance, also add a SQL index
CREATE INDEX idx_processing ON Documents (ProcessingStatus ASC)
-- 8. update text file path
-- provide the absolute path for the text files, UNC path recommended
-- for example, the query below updates original path from "text\\CDPHP_0001.txt" to "\\relativity\client\Text\CDPHP_00001.txt
UPDATE Documents SET text_path = REPLACE(text_path, 'text\\', '\\relativity\client\Text\')
Appendix:
The following table lists recommended fields for Story Engine processing purposes:
ID | Field Name | Field Type | Required | Notes |
---|---|---|---|---|
1 | Id | Number | Yes * | Numeric identifier of the document, for Relativity, use ArtifactID |
2 | Control Number | Text | Yes | Text identifier of the document |
3 | Custodian | Text | Yes | Name of the custodian, default value “empty_custodian” |
4 | MD5 Hash | Text | Yes | MD5Hash value, default to Control Number if empty |
5 | Group Identifier | Text | Yes | Used to group email and attachments together; default to Control Number if empty; Notice For parent, the Group Identifier must equal the Control Number |
6 | Process Status | Number | Yes * | Indicates Reveal AI processing status |
7 | Extracted Text | Text | Yes | Path to the text file exported, or text content |
8 | Email From | Text | Recommended | Email only, Email's "From" field |
9 | Email To | Text | Recommended | Email only, Email's "To" field |
10 | Email CC | Text | Recommended | Email only, Email's "CC" field |
11 | Email BCC | Text | Recommended | Email only, Email's "BCC" field |
12 | Email Subject | Text | Recommended | Email only, Email's "Subject" field |
13 | Primary Date Time | DateTime | Recommended | Date and Time the email was sent, or last modified date time for Attachments/EFiles. It is important to include the “Time” part for this field |
14 | Email Timezone | Text | Recommended | The time zone used by processing application. The most common values are: EST/CST/PST/GMT |
15 | Email Attachment Count | Number | Recommended | Email only, number of attachments the email has |
16 | eFile Author | Text | Recommended | Last Author of the efile or attachments, if exists |
17 | eFile Filename | Text | Recommended | Filename of the efile or attachments |