High Precision Classification (HPC)
Reveal AI 3.0 introduces COSMIC with High Precision Classification, COSMIC HPC. HPC is an enhancement of the COSMIC supervised machine learning technology that allows the user to achieve desired model quality with a considerably smaller amount of review effort.
COSMIC HPC utilizes user highlights of most relevant passages of documents in addition to user tags. The highlights become the high precision input to the model. The model learns faster because it focuses on the relevant content. As the model scores documents, it identifies the most relevant snippets inside the document. The user sees the model scores for each document as well as the model highlights for the relevant snippets.
This user guide will detail how to use COSMIC HPC, addressing:
COSMIC HPC Configuration Settings
To start using COSMIC HPC and configure its settings, go to the COSMIC Mission Control Settings. To navigate there, use the link to Applied AI in the Flyout Menu at the upper left corner of the screen.
When you either Create COSMIC Group or open the Settings for any listed model, the following settings need to be configured. Items 13, 14, 15, 16 and 17 are new in Reveal AI 3.0 and allow you to configure COSMIC HPC options:
Name: must be supplied for the model. It should readily identify the purpose of the model.
Training Mode: Active Learning or Infinite Learning. Infinite Learning reduces the need to review non-relevant documents. Active Learning aims at limiting the human coding of even relevant documents. Defaults to Active Learning.
Checkout Size: Number of documents sent to reviewers at one time when entering the queue.
Retraining Interval: The number of non-control-set documents that must be coded before documents in the document universe are re-classified.
Training Queue Size: The approximate number of documents to be added to the training queue after reclassification.
Minimum Positive Examples: The minimum number of documents tagged as Yes before a reviewer can start training COSMIC.
Control Set Percentage: The portion of random sample documents to be set aside per training queue for the purpose of statistical measurement. The Percentage can range from 0% to 100% of the training queue.
Is Inclusive Only: Tells the classifier to classify and select for training All documents or Inclusive (Inclusive emails, attachments, loose eFiles) documents only.
Autotune Enabled: Allows the system to automatically adjust weights based on current results to achieve best results (by default: On).
Autotune Cutoff Threshold: Maximum number of documents that will be used for autotuning (by default: 1000).
Stability Threshold: Minimum number of document tags added or deleted that will trigger stability to recalculate.
Auto Submit Status:
Enabled: System will automatically submit newly coded documents to the classifier when the retraining interval is reached.
Disabled: System will not submit newly coded documents to the classifier.
Infinite | Override - Continue Training with Stable Model: Continue submitting new documents to classifier even after stability is reached.
Standby: System will automatically set Auto Submit Status to “Standby” once it reaches stability. Under this status, the classifier will run only when a previously reviewed document is selected for a Control Set.
Turn on/off COSMIC HPC enhancement:
COSMIC – Train the model by tagging full documents.
COSMIC HPC – Train the model by highlighting relevant text within the document.
COSMIC Autodetect – Autodetect determines if COSMIC HPC is beneficial for the current model and turns it on and off, as appropriate.
Training Settings: Check to enable Run Score Reports to analyze and export COSMIC scoring details. Window Size and Step Size settings control the size of snippets that are analyzed by the model during scoring of a document.
Precision Output: Precision Output has options for the highlights that are returned by the model:
All highlights means that highlights for all snippets with scores above zero will be displayed on the document.
The Only Highlights with Score Above Threshold option returns only the model highlights for snippets with scores at or above the threshold.
The Highlights for Training Queue Only Above Threshold option returns only highlights for documents selected for the Training Queue that are at or above the threshold.
Threshold is the model score threshold that is used to configure what model highlights will be displayed in the document viewer.
Before clicking Save, make sure to check Available in Storybook so that the model can be used.
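To illustrate how the Window Size and Step Size settings might shape the snippets a model scores, here is a minimal Python sketch of a sliding window over a document's words. It is an illustration only: the function name `sliding_snippets`, the use of words as the unit of measurement, and the handling of the final shorter window are assumptions, not a description of how Reveal AI segments documents internally.

```python
from typing import List


def sliding_snippets(words: List[str], window_size: int, step_size: int) -> List[List[str]]:
    """Split a document into overlapping snippets (hypothetical helper).

    window_size: how many words each snippet holds (assumed unit: words).
    step_size:   how far the window advances between snippets.
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")

    snippets: List[List[str]] = []
    start = 0
    while start < len(words):
        snippets.append(words[start:start + window_size])
        # Stop once the current window has reached the end of the document.
        if start + window_size >= len(words):
            break
        start += step_size
    return snippets


if __name__ == "__main__":
    text = "a b c d e f g".split()
    # Window of 4 words advancing 2 words at a time gives three snippets:
    # [a b c d], [c d e f], [e f g]
    for snippet in sliding_snippets(text, window_size=4, step_size=2):
        print(" ".join(snippet))
```

In this sketch, a smaller step size produces more overlapping snippets to score, and a larger window size produces longer snippets.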
Let’s now examine how users create and manage highlights when using COSMIC HPC.
HPC Highlighting
COSMIC HPC allows users to create models that deliver desired results more quickly. It does this by letting users provide highlights as additional input to the model. The user can tag a document as responsive and highlight its most relevant sections to focus the model on the most relevant part of the document's content.
Creating and Managing Highlights
Training is selected from the flyout menu at the upper left corner of the screen. This may be done at any point: following a search or selection of any filtering facets, or simply by selecting a thread from the search results list in the right-hand panel.
Select the Focus Model that you wish to train. This is the Model which you will be highlighting for HPC.
The Training viewer may initially open with highlighting turned off. The document view below is in Classic mode, which shows all email header information in presenting a thread from the latest message back to the earliest.
To turn on COSMIC HPC highlights, first click the Eye icon to turn on showing and selecting highlights. Note that the document view has been changed to Story because this is the only view that supports user and model highlights.
Creating User Highlights
Click the drop-down arrow next to Select… in the highlights box and select High Precision.
You will now be able to highlight and then tag the highlighted text throughout the document. NOTE that a highlight may be tagged as positive or negative; both are valuable in creating the model. You also have the option to turn on display of entities found in the text. We will discuss this further in reviewing the Thread Details pane below. The initial highlight will require clicking Save; you may later set your model highlighting to save automatically, as will be seen below.
You can also display the highlights you have created for any other models, as well as the focus model highlights from the latest round of COSMIC HPC, or turn off the display of other models at your option.
Where highlighting for two or more models has been applied to the same text, the highlight color will show as grey.
Using Model Highlights
As a user tags documents and provides highlights, COSMIC HPC runs in the background, creating updated models from the user tags and user highlights. Once the model is updated and has generated document scores, it also identifies the snippets in the documents that are most relevant to the user tags and user highlights. Those snippets are shown with gold-colored highlights. These are the highlights from the focus model selected in step 2.
These model highlights are not stored permanently and change with each round of COSMIC.
To turn these highlights into input to the model, the user must highlight the suggested snippet again and tag it with the appropriate COSMIC model.
Models may be configured to show highlights only above a set threshold (default is 80) to reduce proliferation of highlights. This threshold, along with the Precision Output highlighting, is configured in COSMIC Mission Control Settings, and will take effect after a Run Full Process or as COSMIC HPC is running during document review.
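The effect of the Precision Output options and the threshold can be sketched in a few lines of Python. This is a hedged illustration, not the product's code: the `ModelHighlight` class, the mode names, and the 0 to 100 score scale are assumptions made for the example, and the third mode is read here as "highlight score at or above the threshold, on a Training Queue document".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelHighlight:
    text: str
    score: float              # assumed to be on a 0-100 scale
    in_training_queue: bool   # whether the document is in the Training Queue


def visible_highlights(
    highlights: List[ModelHighlight],
    mode: str,
    threshold: float = 80.0,  # the guide gives 80 as the default threshold
) -> List[ModelHighlight]:
    """Return the model highlights that would be displayed (hypothetical helper)."""
    if mode == "all":
        # All highlights: every snippet with a score above zero.
        return [h for h in highlights if h.score > 0]
    if mode == "above_threshold":
        # Only snippets scoring at or above the threshold.
        return [h for h in highlights if h.score >= threshold]
    if mode == "training_queue_above_threshold":
        # Same cutoff, restricted to Training Queue documents.
        return [h for h in highlights if h.in_training_queue and h.score >= threshold]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    sample = [
        ModelHighlight("low-scoring snippet", 12.0, True),
        ModelHighlight("strong snippet", 91.0, False),
    ]
    for h in visible_highlights(sample, "above_threshold"):
        print(h.text, h.score)
```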
Managing and Viewing Highlights in Precision Hits Panel
Click Save to store updated tags and stay on the current document, or Save – next to save and proceed to the next document. Clicking on the … <options> button offers the option to automatically move on after Save.
The Precision Hits panel in the Thread Details pane contains tools to manage the display and coding of High Precision Hits. The small figure at the lower right end of a highlight indicates that it was created by a user. Save will save the highlight and its tag.
Once a highlight is saved, if it carries a positive tag, the segment containing the highlight and the document containing that segment are assigned a positive tag as well.
Notice too that the highlight is the same color as is shown for the Personal Communications model that is the Primary model for this Training round.
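The tag propagation described above, where a positive highlight also marks its segment and its document as positive, can be sketched as follows. The class and function names are hypothetical and exist only for this illustration. The guide does not describe any propagation for negative highlights, so the sketch leaves segment and document tags unchanged in that case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Highlight:
    text: str
    tag: str  # "positive" or "negative"


@dataclass
class Segment:
    highlights: List[Highlight] = field(default_factory=list)
    tag: Optional[str] = None


@dataclass
class Document:
    segments: List[Segment] = field(default_factory=list)
    tag: Optional[str] = None


def save_highlight(doc: Document, segment: Segment, highlight: Highlight) -> None:
    """Store a highlight and propagate a positive tag upward (illustrative only)."""
    segment.highlights.append(highlight)
    if highlight.tag == "positive":
        # A positive highlight makes its segment and its document positive.
        segment.tag = "positive"
        doc.tag = "positive"
    # Assumption: a negative highlight does not change segment or document tags.


if __name__ == "__main__":
    seg = Segment()
    doc = Document(segments=[seg])
    save_highlight(doc, seg, Highlight("lunch plans on Friday", "positive"))
    print(seg.tag, doc.tag)
```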
You can enable rapid highlighting mode by clicking the pen icon. An example of enabling rapid highlighting for Personal Communications is shown in the image below.
The Precision Hits controls for HPC models also include:
Precision Hits controls:
The ability to turn display of all Precision Hits model highlights on or off.
Hit-to-hit navigation within a document.
Enable or disable entity annotation for the document.
Precision Hit model controls:
The ability to turn display of model highlights on or off.
Enable rapid highlighting for the model.
Under the Options (…) menu –
Show only <model>
Customize color (for model highlighting).
Upgrade options
This applies to existing Storybooks that will be upgraded to Reveal AI 3.0. COSMIC HPC will be available only after text vectors are re-processed. Text vectors only processing can be initiated in the Administration section. If text vectors are not re-processed and the user selects the COSMIC HPC option or the Autodetect option, an error message will be displayed and COSMIC will not run. The same applies to using AI Model Library models created with COSMIC HPC. They require the text vectors to be re-processed.
If an existing Storybook is upgraded to 3.0 and additional data is added to it afterward, COSMIC HPC will be available only if the text vectors have been re-processed.
Library models and HPC
Reveal AI offers an AI Model Library with pre-built AI models.
Models created with COSMIC HPC can be imported from the Model Library and applied to documents. The full functionality of the COSMIC AI model management is supported for models created with COSMIC HPC. To apply a COSMIC HPC model to a storybook, follow the standard process of using AI Model library models. The names of the models from the Reveal AI Model Library will identify models created with COSMIC HPC.
To reference an AI Library model within a storybook:
Select Applied AI from the flyout menu at the upper left corner of the screen. This will open the Active Learning table in a new browser window.
Click Settings for the model that you wish to configure.
Open the Referenced Models tab for that model.
Click the drop-down arrow under Name and scroll down to select the model you wish to reference. (For more information about model access permissions users may reference System Admin > Tenant > Model Library.)
Select the Negative Set Type to be applied with this model:
Use External – employ training that has been applied to this model previously.
Sample destination storybook – apply local training to this model.
Both – combine existing and local training to determine the negative set.
Click Add.
The referenced model will appear, along with any description, positive and negative count and other details, with a note above the table that you must Run Full Process under the Settings tab to run the referenced model with your local storybook model.
To complete this process, go to the Settings tab and click Run Full Process.
Autodetect mode
Autodetect is a COSMIC process that runs in the background as models are trained. Autodetect determines if COSMIC HPC is beneficial for the current model and turns it on and off, as appropriate.
Possible scenarios in which using COSMIC HPC may not be beneficial are when the wrong portion of a document is highlighted, or a wrong label is assigned to highlights, or the highlighted snippets are too long or too short for the current model.
Autodetect compares the models created with two COSMIC modes: COSMIC HPC On and COSMIC HPC Off. In other words, it compares the model created with document tags against the model created with document tags and highlights. This comparison is done every three rounds of COSMIC when a sufficient number of additional documents has been tagged. The outcome is recorded, and if one mode is better three times in a row, Autodetect switches to that mode. Autodetect is configured not to switch on every fluctuation in the outcome; however, if one mode is consistently better, it will switch to that mode.
One expected consequence of the Autodetect approach is that a document score report may not show both the COSMIC and COSMIC HPC score until several rounds, possibly as many as 6 or 9, have been completed. This is normal and not a cause for concern.
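The switching rule described in this section (compare every three rounds, switch after three consecutive wins) can be sketched as a small state machine. This is a simplified illustration under stated assumptions: the class name, the single numeric quality value per mode, the starting mode, and the treatment of ties as breaking a winning streak are all hypothetical, and the requirement for a sufficient number of newly tagged documents is not modeled.

```python
from dataclasses import dataclass, field
from typing import List

COMPARE_EVERY = 3    # compare the two modes every three rounds (per the guide)
WINS_TO_SWITCH = 3   # consecutive wins required before switching (per the guide)


@dataclass
class AutodetectSketch:
    hpc_on: bool = True   # assumption: the starting mode is arbitrary here
    rounds_seen: int = 0
    outcomes: List[str] = field(default_factory=list)

    def on_round(self, quality_doc_only: float, quality_with_highlights: float) -> bool:
        """Record one COSMIC round and return whether HPC is on afterwards."""
        self.rounds_seen += 1
        if self.rounds_seen % COMPARE_EVERY != 0:
            return self.hpc_on  # no comparison on this round

        if quality_with_highlights > quality_doc_only:
            self.outcomes.append("hpc")
        elif quality_doc_only > quality_with_highlights:
            self.outcomes.append("doc")
        else:
            self.outcomes.append("tie")  # assumption: a tie breaks any streak

        recent = self.outcomes[-WINS_TO_SWITCH:]
        if len(recent) == WINS_TO_SWITCH and len(set(recent)) == 1 and recent[0] != "tie":
            self.hpc_on = recent[0] == "hpc"
        return self.hpc_on


if __name__ == "__main__":
    detector = AutodetectSketch(hpc_on=True)
    for round_number in range(1, 10):
        state = detector.on_round(quality_doc_only=0.80, quality_with_highlights=0.70)
        print(round_number, "HPC on" if state else "HPC off")
```

In this sketch a switch cannot happen before the ninth round, because three comparisons are needed and each one takes three rounds.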
Score report
The score report provides insight into the scoring process and shows the scoring details for one document with respect to the focus model. When COSMIC HPC is ON or COSMIC Autodetect is used, the report shows the scoring details for the following:
Document-Only Score - the model created with document tags.
Document+Segment Score - document + segment tags (if segment tags are provided).
Document+Highlights Score - the model created with document tags and highlights.
When COSMIC HPC is off, the score report does not show the score details for Document+Highlights.
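As a compact restatement of the rules above, the following Python sketch lists which score rows a report would contain for a given mode. The function name and mode strings are hypothetical. The sketch also assumes that the Document+Segment row still appears when COSMIC HPC is off, which the guide does not state either way.

```python
from typing import List


def score_report_rows(hpc_mode: str, has_segment_tags: bool) -> List[str]:
    """Return the score rows expected in a report (illustrative only).

    hpc_mode: "off", "hpc", or "autodetect" (hypothetical labels).
    """
    if hpc_mode not in ("off", "hpc", "autodetect"):
        raise ValueError(f"unknown mode: {hpc_mode}")

    rows = ["Document-Only Score"]
    if has_segment_tags:
        # Shown only if segment tags are provided.
        rows.append("Document+Segment Score")
    if hpc_mode in ("hpc", "autodetect"):
        # Omitted when COSMIC HPC is off.
        rows.append("Document+Highlights Score")
    return rows


if __name__ == "__main__":
    print(score_report_rows("hpc", has_segment_tags=False))
    print(score_report_rows("off", has_segment_tags=True))
```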
COSMIC score reports are generated by default at each round of COSMIC. The option to generate score reports can be turned on and off.
Score reports offer a downloadable top-level summary in which users can analyze the details of COSMIC and COSMIC HPC scoring and see the top words and the top highlighted snippet from the scoring process. The green checkmark shows the mode that produces the final score assigned to the document. Each section in the report provides insights into the model and into the top scoring words for each document and snippet.
The option to turn score report generation on and off is available under Reports and Settings within the Training screen.
The first level is a summary showing the final score assigned to the document and the scores produced by each option considered.
From here, users may drill down for details either by opening the Model Report or by clicking the plus (‘+’) sign next to the desired model.
By clicking on the plus (‘+’) of any snippet listed in the report the user can see further detail about the words comprising the snippet:
Tradeoffs
There is a speed trade-off among the different options for highlight generation and for score report generation.
If the option "All highlights" is selected, it might take longer for the COSMIC scores and model highlights to be available. In addition, displaying highlights for all snippets with non-zero scores may be noisy. We recommend the option of generating only highlights for the snippets with scores at or above the threshold.
Generating score reports also takes some additional time.