Skip to main content

Reveal Review Publication

Reveal AI DAT Import Instructions

DAT Loading
  1. DAT files are standard Concordance export. This format is also used by other popular legal review platforms.

  2. A DAT file typically has the first line be a header row identifying the field names.

  3. The Concordance default delimiters are:

    • Field ¶ ASCII character (020)

    • Quote þ ASCII character (254)

  4. To import DAT file into SQL database, follow the steps below:

    1. Create a new SQL database in SQL Management Studio.

    2. Right click the database created and select “Tasks… > Import Data…

      SaveDat1.png
    3. Select Flat File Source, browse to the location of the DAT file, then enter “þ” as Text qualifier:

      SaveDat2.png

      Note

      Notice you might need to adjust the code page/Unicode option based on the encoding used by the DAT file.

    4. Click Columns tab, enter “¶” as Column delimiter, click Refresh button if needed:

      SaveDat3.png
    5. Click Advanced tab, select all columns in the middle, and then select the DataType dropdown, use “text stream [DT_Text]” for import (Note: if the DAT file is using Unicode, use “Unicode text stream [DT_NTEXT]” instead):

      SaveDat4.png
    6. In the next screen, select SQL Server Native Client... and make sure the database is pointing to the one created above:

      SaveDat5.png
    7. Click Next and finish importing.

Prepare Data for Processing

Before you kick off processing using the data imported into the SQL database, you need to run scripts to prepare the data and make it compatible with the formats Story Engine requires.

The following scripts provide a sample SQL script to update the data:

  1. Open the SQL database in SQL Management Studio.

  2. Check total records imported, make sure it is matching the records in the DAT file.

  3. Run SQL updates to prepare data for processing.

-- Find the database where the DAT file is imported

USE {database}

-- check the data imported

 SELECT * FROM Documents

-- 1. add an id field if it does not exist  

-- always a good practice to add it as primary key  

ALTER TABLE Documents ADD Id INT IDENTITY PRIMARY KEY

-- 2. check controlnumber, groupidentifier and attachments count  

-- if beg_bates exists, use beg_bates as the controlnumber  

-- if beg_attach exists, use beg_attach as the groupidentifier  

-- notice the groupidentifier for the parent must be equal to its controlnumber  

-- update the attachments count using the groupidentifier  

ALTER TABLE Documents ADD nexlp_attachcounts INT  

UPDATE a SET a.nexlp_attachcounts = b.cnt - 1  

FROM Documents a inner join (  

SELECT beg_attach, count(*) AS cnt FROM documents GROUP BY beg_attach

 ) b ON a.beg_attach = b.beg_attach

WHERE a.beg_bates = b.beg_attach

-- 3. check custodian, if custodian does not exists, give it a placeholder value  

SELECT COUNT(*) FROM Documents WHERE LEN(custodian) = 0 OR custodian IS NULL  

ALTER TABLE Documents ADD custodian VARCHAR(200)

-- if custodian names have unicode characters, use NVARCHAR(200)  

UPDATE Documents SET custodian = 'custodian' WHERE LEN(custodian) = 0 OR custodian IS NULL

-- 4. check MD5SUM, if it is empty, give it a unique value, for example, the control number

 SELECT COUNT(*) FROM Documents WHERE LEN(MD5SUM) = 0 or MD5SUM IS NULL

 UPDATE Documents SET MD5SUM = controlnumber WHERE LEN(MD5SUM) = 0 OR md5sum IS NULL

-- 5. check datatime, add nexlpdatetime field  

ALTER TABLE Documents ADD nexlpdatetime DATETIME  

UPDATE documents SET nexlpdatetime = CAST(REPLACE(primarydatetime, '.', ':') AS DATETIME)

-- the next step will depend on the data available in the DAT file,

-- if there are multiple date fields, try to check each of them (recommend precedence: DateSent, DateReceived, DateLastModified, DateCreated)  

-- if the Date and Time are stored in two separate fields, join date and time fields into one  

-- here is an example:  

UPDATE Documents SET nexlpdatetime = CAST(DATESENT + ' ' + TIMESENT AS DATETIME) WHERE LEN(DATESENT) > 0  

UPDATE Documents SET nexlpdatetime = CAST(DATERECEIVED + ' ' + TIMERECEIVED AS DATETIME) WHERE LEN(DATERECEIVED) > 0 AND nexlpdatetime IS NULL  

UPDATE Documents SET nexlpdatetime = CAST(DATEMOD + ' ' + TIMEMOD AS DATETIME) WHERE LEN(DATEMOD) > 0 AND nexlpdatetime IS NULL  

UPDATE Documents SET nexlpdatetime = CAST(DATECREATED + ' ' + TIMECREATED AS DATETIME) WHERE LEN(DATECREATED) > 0 AND nexlpdatetime IS NULL

-- 6. add nexlptimezone, typicall options are 'EST', 'CST', 'PST', 'GMT'  

ALTER TABLE Documents ADD nexlptimezone VARCHAR(20)  

UPDATE Documents SET nexlptimezone = 'EST'

-- 7. add processing status  

ALTER TABLE Documents ADD ProcessingStatus INT  

UPDATE Documents SET ProcessingStatus = 0  

-- to improve performance, also add a SQL index  

CREATE INDEX idx_processing ON Documents (ProcessingStatus ASC)

-- 8. update text file path  

-- provide the absolute path for the text files, UNC path recommended  

-- for example, the query below updates original path from "text\\CDPHP_0001.txt" to "\\relativity\client\Text\CDPHP_00001.txt  

UPDATE Documents SET text_path = REPLACE(text_path, 'text\\', '\\relativity\client\Text\')

Appendix:

The following table lists recommended fields for Story Engine processing purposes:

ID

Field Name

Field Type

Required

Notes

1

Id

Number

Yes *

Numeric identifier of the document, for Relativity, use ArtifactID

2

Control Number

Text

Yes

Text identifier of the document

3

Custodian

Text

Yes

Name of the custodian, default value “empty_custodian”

4

MD5 Hash

Text

Yes

MD5Hash value, default to Control Number if empty

5

Group Identifier

Text

Yes

Used to group email and attachments together; default to Control Number if empty; Notice For parent, the Group Identifier must equal the Control Number

6

Process Status

Number

Yes *

Indicates Reveal AI processing status

7

Extracted Text

Text

Yes

Path to the text file exported, or text content

8

Email From

Text

Recommended

Email only, Email's "From" field

9

Email To

Text

Recommended

Email only, Email's "To" field

10

Email CC

Text

Recommended

Email only, Email's "CC" field

11

Email BCC

Text

Recommended

Email only, Email's "BCC" field

12

Email Subject

Text

Recommended

Email only, Email's "Subject" field

13

Primary Date Time

DateTime

Recommended

Date and Time the email was sent, or last modified date time for Attachments/EFiles. It is important to include the “Time” part for this field

14

Email Timezone

Text

Recommended

The time zone used by processing application. The most common values are: EST/CST/PST/GMT

15

Email Attachment Count

Number

Recommended

Email only, number of attachments the email has

16

eFile Author

Text

Recommended

Last Author of the efile or attachments, if exists

17

eFile Filename

Text

Recommended

Filename of the efile or attachments