Indexing Data
After a document import has completed, it is a best practice to start the indexing process. At the most basic level, an index locates the words imported into the project to enable searching and highlighting of search hits. Reveal also runs additional processes during this time (i.e., conversion into HTML, text extraction, and color detection).
The table below indicates whether indexing is required after each event.
Event | Run Index |
---|---|
OCR | Yes |
Data Import - load files | Yes |
Index Error | Yes |
Data Import - Discovery Manager | No |
Uploader | No |
Transcription | No |
To start indexing, go to the Create pane and choose Indexes.
In the Case Name field, use the dropdown menu to select your project.
In the Index table, select the set of loaded data to be indexed, as identified in the Import File column.
To alert teams and/or users upon completion of an indexing job, select the Options button and go to the Notifications tab to choose the appropriate recipients.
Choose the Index/Re-Index button located in the middle of the window beneath the Index table.
Select the Document Text Sets you wish to index. By default, the Native/HTML, Extracted, and OCR/Loaded text sets are present, but additional text sets may have been added to the current project.
The Text Set choices determine the order in which the sets will be indexed.
OCR/Loaded will be indexed first, if present, then the Extracted Text will be indexed, followed by the Native/HTML view.
This is done to get data into the project as quickly as possible to make the documents searchable.
As soon as a Text Set completes indexing, the project will become searchable.
When indexing is complete, the documents may be viewed in the Native/HTML viewer in the Document Review screen.
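The indexing order described above can be sketched in a few lines of Python. This is an illustration only, not Reveal's implementation: the function name and the placement of custom text sets (which the text does not specify) are assumptions.

```python
# Order of the default text sets as described in this section.
DEFAULT_ORDER = ["OCR/Loaded", "Extracted", "Native/HTML"]


def indexing_order(selected_text_sets):
    """Return the selected text sets in the order they would be indexed."""
    # Default sets are indexed in the documented order, if selected.
    ordered = [name for name in DEFAULT_ORDER if name in selected_text_sets]
    # Assumption: custom text sets are placed last; the guide does not say
    # where they fall in the sequence.
    custom = [name for name in selected_text_sets if name not in DEFAULT_ORDER]
    return ordered + custom


# Selecting only Native/HTML and OCR/Loaded still indexes OCR/Loaded first.
print(indexing_order(["Native/HTML", "OCR/Loaded"]))
```

Because searchable text is processed before the Native/HTML view, the project can become searchable while the slower conversion work is still running.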
Specific documents can be targeted by using the Doc List field to specify a list file containing the identified document numbering field (BEGDOC by default). Users can choose among the Unindexed and Errors, Non-Error, and All Documents options.
Users can change the priority of a specific indexing job. The priority of a job is relative, so if a user makes all indexing jobs High priority, the net effect is that all jobs have the same priority and, therefore, none is effectively prioritized.
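The following sketch shows why priority is only meaningful relative to other jobs. It is a simplified model, not Reveal's scheduler: the level names other than High, the job names, and the rule that equal-priority jobs run in submission order are all assumptions made for illustration.

```python
# Hypothetical priority levels; only "High" is named in this guide.
PRIORITY_RANK = {"High": 0, "Normal": 1, "Low": 2}


def run_order(jobs):
    """jobs: list of (job_name, priority) tuples in submission order."""
    # sorted() is stable, so jobs with equal priority keep their
    # submission order (an assumed tie-breaking rule).
    ranked = sorted(jobs, key=lambda job: PRIORITY_RANK[job[1]])
    return [name for name, _ in ranked]


# One High job among Normal jobs moves to the front.
print(run_order([("A", "Normal"), ("B", "High"), ("C", "Normal")]))

# Marking every job High yields the same order as marking none High.
print(run_order([("A", "High"), ("B", "High"), ("C", "High")]))
```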
Status Tab
Progress
Total Documents – All documents that the current job will attempt to index. The scope is determined by the selected data sets or Doc List.
Completed – Documents that the current job has attempted to index.
Note
The “Successful” count plus the Warnings “Total” count equals the “Completed” count.
Successful – Documents that were successfully indexed in the current job.
Remaining – Documents that the current job has yet to attempt to index.
Skipped – When running an “Unindexed and Error” or “Non-Error” index job, unindexable files (e.g., empty source files) are skipped. Skipped documents are included in the Successful count.
Not Defined – The path to the OCR or native file is missing.
Retries – The indexing process was re-attempted on a document. This typically happens when an attempt to index a document times out.
Warnings
Missing – The native file was not found.
Too Large – The file is larger than the size limits set within the Text Set settings. See Creating Document Text Sets in the Review Manager Administration Guide for details.
Empty Source – The file is empty, so it has no content to extract or convert.
Not Supported – An error occurred when extracting text and creating HTML from an unsupported file type.
Convert Failures – An error occurred when extracting text and creating HTML from the native. These are typically corrupt or encrypted files.
Convert Empty – Conversion of the native was successful, but there was no output text.
Insert Failures – There was a failure when attempting to add the document’s text to the Elastic Index.
Note
Convert is the process of extracting text and rendering HTML from the native.
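The relationship between the Status tab counts can be checked with a short sketch. The class and field names are hypothetical, the sample numbers are made up for illustration, and the Remaining formula is an assumption inferred from the definitions above rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass
class IndexJobStatus:
    total_documents: int  # all documents in scope for the job
    successful: int       # includes skipped documents, per the Skipped definition
    warnings_total: int   # the Warnings "Total" count

    @property
    def completed(self) -> int:
        # Stated in the Note: Successful + Warnings Total = Completed.
        return self.successful + self.warnings_total

    @property
    def remaining(self) -> int:
        # Assumption: Remaining = Total Documents - Completed.
        return self.total_documents - self.completed


# Made-up example values for a job that is still in progress.
status = IndexJobStatus(total_documents=1000, successful=940, warnings_total=35)
print(status.completed, status.remaining)
```

If the Completed count shown for a job does not equal Successful plus the Warnings Total, that difference is worth investigating before relying on search results.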