9. COSMIC (Cognitive Machine Coding)
COSMIC (Cognitive Machine Coding) is the next generation of technology-assisted review from NexLP. COSMIC is an optional Story Engine feature. To use COSMIC, at least one COSMIC Group must be created in the storybook.
A. Create COSMIC Group
Users can add a new COSMIC Group in two ways after creating or opening a storybook.
Use the COSMIC SETTINGS tab
Click the COSMIC SETTINGS tab to open the COSMIC Settings page:
Click “Create COSMIC Group” to build a new COSMIC group:
Use the Models dropdown in the thread viewer under the TRAINING tab:
To create a new COSMIC Group, enter the coding name for the COSMIC Group and select the Training mode. Also make sure Available in Storybook is checked.
Click Save to add this COSMIC group to the target storybook.
Available in Storybook: Check/uncheck this option to enable/disable the COSMIC group for the selected storybook.
Name: Name the COSMIC file, for example, “Relevant”.
Training Mode: By default, Story Engine uses “Active Learning”. A user can switch to “Infinite Learning”. The differences between Active Learning and Infinite Learning are subtle, but powerful.
In COSMIC Active Learning, the goal is to train the classifier with as little work as possible. The COSMIC queue is curated to provide documents for review that will teach the classifier the most, limiting the overall amount of time required to train the classifier.
Infinite Learning is akin to Continuous Active Learning. In this case, we provide a queue of high-scoring unreviewed documents. The goal in this mode is to encourage reviewers to look at only relevant documents, and to continue review until all relevant documents are tagged by human reviewers.
The system allows users to create multiple COSMIC groups.
Once a COSMIC Group is created, it will become visible in the Thread Viewer.
With at least one COSMIC Group created, the COSMIC Panel becomes available in the Thread Viewer. Your Thread Viewer should now look like this:
See Section 9.C -- Reviewing & Tagging Documents in COSMIC for more details on COSMIC Panel features.
B. COSMIC Queues
Reviewers can train the COSMIC model by reviewing documents using different queues. COSMIC queues are designed to provide documents to reviewers for tagging.
Click the TRAINING tab on the main page to open COSMIC Queue Selector:
The options on the COSMIC Queue Selector page are:
Focus Model:
Under Focus Model select which COSMIC Group will be primary for purposes of tagging.
Document List:
Select which document set you would like to work with.
Current Results: this option will randomly select documents from the currently loaded documents set.
Model-specific queues:
COSMIC AI Queue: the training queue used to interactively work with the COSMIC service. When using this queue, the COSMIC service automatically classifies documents every time reviewers finish tagging a certain number of documents. Samples will be drawn based on current COSMIC scores and sent to reviewers for confirmation. This is the recommended way to review documents for the bulk of the COSMIC review.
Tip
The training queue will only be available after a reviewer provides a minimum number of “positive” documents plus one negative document. The minimum positive documents required is defined in the Minimum Positive Examples setting in the COSMIC Mission Control option.
Cluster Queue: this queue is only available when clustering is enabled; each cluster is guaranteed to have at least one sample document. This queue is typically used at the beginning of the COSMIC process when a reviewer needs to do a walkthrough of the dataset to enable the training queue.
Random Queue: the random queue is used to randomly pull documents from the population; this queue is typically used at the beginning of the COSMIC process when a reviewer needs to provide a minimum number of documents to enable the training queue.
Control Set Queue: this queue provides Control Set documents.
Saved Search Queue: this queue is based on a saved search that a user creates; the system will randomly draw document samples based on the saved search population.
Click Get Count to get the number of documents available for that queue. Click Apply to enter the thread viewer and start reviewing documents.
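The availability rule described in the Tip above can be expressed as a simple check. The following is a minimal sketch only: the function and parameter names are hypothetical and are not part of Story Engine, and it assumes the rule is exactly "at least Minimum Positive Examples positive documents plus at least one negative document".

```python
def training_queue_available(positive_count: int,
                             negative_count: int,
                             minimum_positive_examples: int) -> bool:
    """Return True once enough seed documents have been tagged.

    Hypothetical helper: mirrors the rule stated in the Tip, where
    minimum_positive_examples stands in for the Minimum Positive
    Examples setting.
    """
    enough_positives = positive_count >= minimum_positive_examples
    has_a_negative = negative_count >= 1
    return enough_positives and has_a_negative


# Example: with a minimum of 5 positives, 5 Yes tags and 1 No tag are enough.
print(training_queue_available(5, 1, 5))
```

Until this check passes, the Random, Cluster or Saved Search queues are the way to collect the remaining seed documents.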
C. Reviewing & Tagging Documents in COSMIC
You can use the COSMIC panel to review and tag documents. The numbered sections are identified below.
Note that for documents that do not qualify for COSMIC, a warning message appears at the top of the document:
Primary COSMIC Group Selector
Click the dropdown to select the primary COSMIC group when you have more than one COSMIC group in the current storybook.
Primary COSMIC Group Tags
Sets of tags used for the primary COSMIC Group, based on the labels set for the COSMIC group. Note these tags are at a document level.
Review the documents and tag each document as Yes, No or Skip (based on labels set for the COSMIC group). Notice that when a reviewer tags a segment as Yes at a segment level, the system automatically assigns Yes at a document level. It is recommended that reviewers use a segment tag to identify Yes documents, if possible.
If the document is selected as a control set (see Control Set Percentage setting for more details), the Primary COSMIC Group Selector button will be labelled as COSMIC Group (Queue) CS:
Users can also turn COSMIC models on or off, or add new COSMIC models on the fly, by clicking the Models dropdown below:
Note
By default, the AutoSave feature is on. Click “…” to turn the AutoSave feature on or off. When AutoSave is on, clicking Yes, No or Skip automatically saves the choice and moves to the next document in the queue:
If AutoSave is off, when a reviewer selects one of the choices for the primary COSMIC Group, the reviewer will need to click Save or Save + next to save the choice.
Secondary COSMIC Groups
In the next row(s) appear tagging buttons for the other available COSMIC groups. These allow you to set tags to the current document without switching the focus of the primary COSMIC group.
Click any tags in the secondary tagging panel to tag for a secondary COSMIC model. The color of the tag automatically shows Green (Yes) when clicked once, Red (No) when clicked twice, or White (not tagged) when the tag button is left alone or clicked three times.
For example, the picture below shows a document being coded for the Sports COSMIC Group. At the top level this document has been tagged Yes for this group. At the secondary level below, Personal has been clicked once to set the green Yes for this COSMIC model, and Junk has been clicked twice to set the red No for this COSMIC model to further define the training offered by this document:
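The click behavior of a secondary tag button, as described above, is a three-state cycle. The sketch below models it as a small state machine; the class and function names are hypothetical and only illustrate the White, Green, Red sequence, not how Story Engine implements it.

```python
from enum import Enum


class TagState(Enum):
    WHITE = "not tagged"
    GREEN = "Yes"
    RED = "No"


# Each click advances the button to the next state in the cycle.
_NEXT_STATE = {
    TagState.WHITE: TagState.GREEN,
    TagState.GREEN: TagState.RED,
    TagState.RED: TagState.WHITE,
}


def click(state: TagState) -> TagState:
    """Return the state of a secondary tag button after one click."""
    return _NEXT_STATE[state]


def state_after(clicks: int, start: TagState = TagState.WHITE) -> TagState:
    """Return the state after a number of clicks from the start state."""
    state = start
    for _ in range(clicks):
        state = click(state)
    return state


# One click sets Yes, two clicks set No, three clicks clear the tag.
for n in range(4):
    print(n, state_after(n).name)
```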
Include/Exclude Selector
By default, all documents are included in a user’s model. However, if the user believes a document or a set of documents is irrelevant to building a model, they may choose to exclude those documents from the model.
Segment Level COSMIC Tags
Reviewers can also assign a tag at the segment level for the primary COSMIC Group. The segment level COSMIC tag is located on the top right corner of each segment. Notice the system only allows a Yes tag. When a reviewer tags a segment as Yes, the whole document will also be automatically tagged as Yes.
The Reset button is used to clear segment level COSMIC tagging.
The bottom right of the segment shows whether the same segment has been tagged in another document. For example, the segment below has been tagged as Yes in enron000003 and No in enron0000011.
COSMIC Cycle Information
By clicking the COSMIC SETTINGS tab you can review the current status of the COSMIC projects.
Cycle:
Pre-Training: no cycle has started.
Cycle #: The current cycle number.
Post-Training: This appears when the COSMIC group is in Disabled or Standby status.
Progress: The progress section shows number of documents tagged in current cycle/total required documents for current cycle.
Classifier Status:
Idle: The classifier is waiting for documents.
Submitted: Tagged documents have been submitted to the classifier.
Thinking: The classifier is classifying documents.
Disabled: Occurs when COSMIC group is set to Disabled.
Standby: Occurs when COSMIC group is set to Standby.
Error-Service Stopped: Service is stopped.
Error: The service has encountered an error.
Not Enough Data Provided: The COSMIC group was run without enough positive and negative samples. This only happens when a user kicks off Run Full Process without enough samples.
COSMIC Notifications
When a reviewer finishes the last document in the current queue, the system will display a notification based on the readiness of the model:
The model has completed its initialization phase and COSMIC AI Queue is ready:
The model is still being trained, and has not reached stability:
Training has completed and stability has been reached:
D. COSMIC Mission Control
Users can access COSMIC Settings in the Navigation Bar.
Click Report or Settings to open COSMIC Mission Control.
COSMIC Report
The COSMIC report tab shows the status of COSMIC, including the Cycle Info, Statistics table and the control set.
Cycle Info: shows the current cycle info.
Cycle: Current cycle number, or Pre-Training.
Progress: Number of positive samples identified for the current cycle so far.
Classifier Status: Current status of the classifier.
Classifier Detailed Status: Status of classifier with any available details.
Current Stability Status: Unknown, unstable, or stable.
Referenced Models: If the storybook uses any external model, it will show up in the “Referenced Models” line.
Training Mode: Active Learning or Infinite Learning.
Model Type: Identifies where model is assigned.
Last Refreshed: The last time the classifier was run.
Print: Open the COSMIC report in a printable format.
Statistics (Display Mode): displays the current status of documents in the storybook. The user can select to display All Docs or Inclusive Docs. Detailed statistics are provided for the following Ranges:
High (60% to 100%)
Medium (40% to 60%)
Low (0% to 40%)
Scored (Total) (0% to 100%)
Uncertain – No Model Features (-500): The document has some metadata or text features, but none of those features exist in the current run of the COSMIC classification model. Because the model does not have the set of features that the document has, the document cannot be scored against the model, so it gets marked as Empty.
Unclassified (-100): These are the documents that get a probability score of 50 from the COSMIC classifier. This means they are classified as neither positive nor negative as per the current round of COSMIC classification.
No Score – Errored Documents (-200): An error occurred during the second pass of processing for these documents.
No Score – Empty Documents (-300): The document has no text or metadata representation in the form of vectors. This almost never happens for emails, because they carry metadata in fields such as From, Sent and Subject, but it can happen for an attachment that has a unique filename.
TIP: In order to reduce this number, tag more documents and train the model. As the model grows stronger it will reduce the number of empty documents by being able to recognize the metadata and text features in order to give the documents a proper score.
No Score – Missing Text Vectors (-400): this number reflects the number of documents for which our system cannot locate Text Vectors. This can occur if something abnormal happened during the processing or if any vector files have changed location at any point.
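The ranges and special score codes listed above can be summarized as a lookup. This is a minimal sketch under stated assumptions: scores are on a 0 to 100 scale, a score exactly on a boundary (40 or 60) is placed in the higher range (the guide does not say which range owns the boundary), and the function name is hypothetical.

```python
# Special codes from the Statistics table; these are not real scores.
SPECIAL_CODES = {
    -100: "Unclassified",
    -200: "No Score - Errored Documents",
    -300: "No Score - Empty Documents",
    -400: "No Score - Missing Text Vectors",
    -500: "Uncertain - No Model Features",
}


def score_category(score: float) -> str:
    """Map a COSMIC score or special code to its Statistics row."""
    if score in SPECIAL_CODES:
        return SPECIAL_CODES[score]
    if not 0 <= score <= 100:
        raise ValueError(f"unexpected score: {score}")
    # Assumption: boundary values belong to the higher range.
    if score >= 60:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"


for value in (85, 60, 45, 10, -300):
    print(value, "->", score_category(value))
```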
Threshold Slider: allows you to adjust the threshold for Relevancy cut-off; it affects the numbers showing up in the Scored Set numbers and graph below:
Control Set: shows the control set, how the control set documents are tagged, and if they are currently scored as Positive/Negative by the system.
Download Control Set History: reviewers can keep track of the Control Set history by clicking the Download Control Set History button. The downloaded CSV report contains Cycle, Precision, Recall, F1, Richness Labeled, Precision Range (+/-), Recall Range (+/-), Richness Labeled Range (+/-), Labeled Positive, Labeled Negative, Labeled Skip and Had Gap info for each cycle:
Precision/Recall Rates: shows the Precision, Recall and F1 scores and the Richness of the data.
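For readers unfamiliar with these measures, the sketch below computes Precision, Recall and F1 using their standard textbook definitions from control set counts. It is an illustration only; the guide does not document the exact calculation Story Engine performs, and the function name and input counts are hypothetical.

```python
def precision_recall_f1(true_positives: int,
                        false_positives: int,
                        false_negatives: int) -> tuple:
    """Standard definitions, with 0.0 returned when a ratio is undefined.

    true_positives:  tagged Yes by reviewers and scored positive
    false_positives: tagged No by reviewers but scored positive
    false_negatives: tagged Yes by reviewers but scored negative
    """
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical control set: 8 correct positives, 2 false alarms, 2 misses.
p, r, f = precision_recall_f1(8, 2, 2)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```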
Settings
The SETTINGS tab provides various settings used to configure COSMIC:
The settings available on this page include:
COSMIC Functions:
Run Full Process: Use this function to force the system to re-classify the whole document set at any time.
Run Warmup Process: Sends a request to the COSMIC service to warm up the target storybook. Administrators should run this once at the beginning of a COSMIC project. This step reduces the time required to run the initial classification.
Reset COSMIC Group: Sends a request to COSMIC service to clean current COSMIC tags and scores assigned to the documents.
Warning
Use this option with caution; the system does not keep a copy of the existing COSMIC tags and scores. We recommend creating a backup of the tags and scores before reset.
Update Metadata Vectors: Allows you to update metadata vectors for this storybook. This button gives the user the option to incorporate all new entity examples into the COSMIC metadata vector, even if they are not in a model that has been run.
Note
The metadata vectors are updated whenever an entity model is run.
Available in Storybook: Check/uncheck this option to enable/disable the COSMIC group in the storybook.
COSMIC Configuration:
Name: Labeled Name of COSMIC Tag set.
Training Mode: Active Learning or Infinite Learning. Infinite Learning reduces the need to review non-relevant documents. Active Learning aims at limiting the human coding of even relevant documents.
Checkout Size: Number of documents sent to reviewers at one time when entering the queue.
Retraining Interval: The number of non-control set documents that must be coded before the document universe is re-classified.
Training Queue Size: The approximate number of documents to be added to the training queue after reclassification.
Minimum Positive Examples: The minimum number of documents tagged as Yes before a reviewer can start training COSMIC.
Control Set Percentage: The portion of random sample documents to be set aside per training queue for the purpose of statistical measurement. The Percentage can range from 0% to 100% of the training queue.
Is Inclusive Only: Tells the classifier to classify and select for training All documents or Inclusive (Inclusive emails, attachments, loose eFiles) documents only.
* See Appendix A for more details about “Is Inclusive Only” setting.
Autotune Enabled: Allows the system to automatically adjust weights based on current results to achieve best results (by default: On).
Autotune Cutoff Threshold: Maximum number of documents that will be used for autotuning (by default: 1000).
Stability Threshold: Minimum number of document tags added or deleted that will trigger stability to recalculate.
Auto Submit Status:
Enabled: System will automatically submit newly coded documents to the classifier when the retraining interval is reached.
Disabled: System will not submit newly coded documents to the classifier.
Infinite | Override - Continue Training with Stable Model: Continue submitting new documents to classifier even after stability is reached.
Standby: System will automatically set Auto Submit Status to “Standby” once it reaches stability. Under this status, the classifier will only run when a previously reviewed document is selected as a Control Set document.
Warning
If a user chooses to set the option to “Infinite | Override - Continue Training with Stable Model”, the newly coded documents may set the system back to “Still Learning” status.
Labels:
Training Queue Tag Name: Name of the training queue to be used for COSMIC.
Positive Tag Name: This label is for the Responsive coding button.
Negative Tag Name: This label is for the Not Responsive coding button.
Skip Tag Name: This label is for the skip button to allow reviewers to temporarily “pass” on a document.
Classifier Configuration Weights:
The Classifier Configuration Weights table displays the weights given to each feature (beyond the word content as such) of each document segment. The features include enriched natural language processing (NLP) and metadata often implying dynamics outside the literal content of the documents themselves – such as emotionally charged communications, indications of pressure applied or endured, or the over-use of capitalization or personal pronouns. Also, the feature list includes both standard and custom entities.
The weights range from 0 to 1. By default, each feature is given a 0.1 weight. If a feature is given a higher weight, it will have more impact on the classifier. The Autotune feature may assign a different weight to the feature if it is enabled, or a user can manually adjust the weights.
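The weight rules above (values from 0 to 1, default of 0.1) can be pictured as a small validated mapping. This is a hedged sketch: the feature names used here are invented for illustration and the helper functions are hypothetical, not a Story Engine API.

```python
DEFAULT_WEIGHT = 0.1  # default weight stated in the guide


def make_weights(features):
    """Give every feature the default weight."""
    return {name: DEFAULT_WEIGHT for name in features}


def set_weight(weights, feature, value):
    """Manually adjust one weight, enforcing the 0 to 1 range."""
    if feature not in weights:
        raise KeyError(feature)
    if not 0.0 <= value <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    weights[feature] = value


# Hypothetical feature names, for illustration only.
weights = make_weights(["sentiment", "capitalization", "custom_entity"])
set_weight(weights, "sentiment", 0.5)  # give this feature more impact
print(weights)
```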
Package:
See Section 9.E. -- COSMIC Model Library below.
Stability History, Agreement and Review Actions
Stability History
The STABILITY HISTORY tab shows the results each time the system recalculates stability. The results can be Still Learning, Approaching Stability, Stabilized and Still Learning (Data Load).
For the system to reach stability, it has to encounter Approaching Stability 3 times in a row.
The system automatically sets stability to Still Learning (Data Load) each time new data has been added to the storybook.
The following example shows the stability history of a COSMIC review project. “Download Stability History” allows a user to download a copy of the history report, including Status, Pos+Neg Count, Cycle and Stability Score for each calculation. The View Details link shows the documents tagged in that cycle.
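The "3 times in a row" rule for reaching stability can be sketched as a streak check over the stability history. This is an assumption-laden illustration: it treats the history as a list of result strings, oldest first, and assumes the streak must be the most recent three results. The function name is hypothetical.

```python
from typing import Sequence

REQUIRED_STREAK = 3  # consecutive "Approaching Stability" results needed


def has_reached_stability(results: Sequence[str]) -> bool:
    """Return True if the latest results end in a long enough streak."""
    streak = 0
    for result in reversed(results):
        if result != "Approaching Stability":
            break  # any other result, such as a data load, resets the streak
        streak += 1
    return streak >= REQUIRED_STREAK


history = [
    "Still Learning",
    "Approaching Stability",
    "Approaching Stability",
    "Approaching Stability",
]
print(has_reached_stability(history))
```

Under the same assumption, appending "Still Learning (Data Load)" to the history breaks the streak, which matches the reset that occurs when new data is added to the storybook.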
Agreement Report:
The AGREEMENT tab shows the level of agreement between the model and the reviewers for each cycle.
Cycle Start: the starting review cycle for analyzing agreement.
Cycle End: the ending review cycle for analyzing agreement.
Threshold: the score threshold for positive (Yes).
Display type: choose either “List” or “Matrix”.
Analyze Button: press to run new report.
When the display type is set to use “List”, the report below shows the following columns:
Cycle: the review cycle under examination.
Run Date: the time when COSMIC runs classification for the cycle.
Stability Cycle: the stability cycle which contains this review cycle.
Stability Cycle Status: the status of the stability cycle which contains this coding cycle.
Stability Cycle Score: the stability score of the stability cycle which included this coding cycle.
Reviewer Yes/Model Yes: the number of documents tagged as Yes by reviewers and, among those documents, how many also have COSMIC scores equal to or above the threshold.
Agreement Yes Rate: the percentage of documents tagged as Yes by reviewers that also have COSMIC scores equal to or above the threshold, out of the total number of documents tagged as Yes by reviewers in this cycle. For example, in cycle 15 above, 5 documents are tagged as Yes by reviewers and 4 of them also have COSMIC scores equal to or above the threshold, so the Agreement Yes Rate is 80% (4 out of 5).
Reviewer No/Model No: the number of documents tagged as No by reviewers and, among those documents, how many also have COSMIC scores below the threshold.
Agreement No Rate: the percentage of documents tagged as No by reviewers that also have COSMIC scores below the threshold, out of the total number of documents tagged as No by reviewers in this cycle. For example, in cycle 15 above, 4 documents are tagged as No by reviewers and 3 of them also have COSMIC scores below the threshold, so the Agreement No Rate is 75% (3 out of 4).
Overall Labelled Count: total documents tagged in the review cycle.
Overall Agreement Rate: the average of Agreement Yes Rate and Agreement No Rate.
View Details: provides a link to show detailed reviewer actions.
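The arithmetic behind the agreement columns can be checked with a few lines of code. The sketch below uses the cycle 15 figures quoted above and takes the Overall Agreement Rate to be the simple average of the two rates, as the column description states; the function names are hypothetical.

```python
def _rate(agreed: int, total: int) -> float:
    """Percentage of reviewer tags that the model agrees with."""
    if total <= 0:
        raise ValueError("no documents tagged in this category")
    return 100 * agreed / total


def agreement_rates(reviewer_yes: int, model_yes: int,
                    reviewer_no: int, model_no: int) -> tuple:
    """Return (Agreement Yes Rate, Agreement No Rate, Overall Agreement Rate)."""
    yes_rate = _rate(model_yes, reviewer_yes)
    no_rate = _rate(model_no, reviewer_no)
    overall = (yes_rate + no_rate) / 2
    return yes_rate, no_rate, overall


# Cycle 15: 4 of 5 Yes tags and 3 of 4 No tags agree with the model.
print(agreement_rates(5, 4, 4, 3))  # (80.0, 75.0, 77.5)
```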
When the display type is set to use “Matrix”, the following additional settings become available:
Score Width: the interval of scores to display in matrix.
Cycle Group size: how many review cycles to group into one cycle for analyzing.
The example below shows the agreement report using the “Matrix” display type, with score width set to “10” and every 3 review cycles grouped into one (to access this, change the Display Type from “List” to “Matrix”):
Each cell in the report displays the number of documents in the score range that were tagged by a reviewer, how many of those documents are scored as Yes or No (based on the 55% threshold set above), and the percentage of agreement between human coding and the COSMIC score.
For example, note the indicated cell in the report above. For cycles 14-16, under column “0.40”, it shows “67% (2/3)”. This indicates that from review cycle 14 to 16, three documents with a COSMIC score between 0.3 and 0.4 were tagged, two of which were tagged No by reviewers. Because scores between 0.3 and 0.4 are below the 0.55 threshold, this shows a 67% agreement between the reviewers and the COSMIC score.
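The matrix cell in the example can be reproduced with a short calculation. In this sketch the three document scores are invented values inside the 0.3 to 0.4 range, and a document counts as a model Yes when its score is equal to or above the threshold; the function name is hypothetical and only Yes and No tags are considered.

```python
def cell_agreement(tagged_docs, threshold):
    """Return (agreed, total, percent) for one matrix cell.

    tagged_docs is a list of (score, reviewer_tag) pairs, where
    reviewer_tag is "Yes" or "No".
    """
    agreed = sum(
        1 for score, tag in tagged_docs
        if (score >= threshold) == (tag == "Yes")
    )
    total = len(tagged_docs)
    return agreed, total, round(100 * agreed / total)


# Hypothetical scores for the three documents in cycles 14-16.
docs = [(0.32, "No"), (0.35, "No"), (0.38, "Yes")]
agreed, total, percent = cell_agreement(docs, threshold=0.55)
print(f"{percent}% ({agreed}/{total})")  # 67% (2/3)
```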
Reviewer Actions:
The REVIEWER ACTIONS tab shows the associated reviewer tagging actions:
Reviewer: the name of the reviewer.
Tag: Positive, Negative, Skip, Exclude.
Score: score assigned to the document at the time of tagging.
Agreed?
Yes: system agrees with the tag assigned.
No: system does not agree with the tag assigned.
Date: date and time at the time of tagging.
Cycle: # of cycle at the time of tagging.
Copy Id: ID of the document tagged.
Control Number: control number of the document tagged.
Added or Removed from Training Set:
Added: tagged Yes.
Removed: tagged No.
Removed (Auto added to Control set): when a document was first tagged as a seed document and later selected as a Control Set document, the system automatically removes the document from training.
Included (Exclude tag removed): User tagged a previously Excluded document as Included.
Excluded (Exclude tag added): User tagged a document as Excluded.
In Training Queue?
Yes: tagged from within any COSMIC queue.
No: not tagged from within a COSMIC queue.
COSMIC Group Queue
Not in COSMIC Queue: when tags are assigned from outside a COSMIC queue.
[COSMIC GROUP]: primary COSMIC Group when tagging occurred.
Queue Type: one of the COSMIC queue types; see Section 9.B -- COSMIC Queues for the possible COSMIC queue types.
Assigned To Report
The ASSIGNED TO REPORT tab shows documents currently assigned to any reviewers.
In case a reviewer is removed from the review team, the System Administrator can use the Unassign button to check in documents originally assigned to that reviewer so they can be returned to the queue.
Correlation Matrix
Correlation Matrix provides a cross examination of COSMIC scores from all COSMIC groups.
Score Range ([COSMIC GROUP]): the scope to examine; a user can select a Low, Medium or High scope to examine.
Display Mode: Inclusive Docs or All Docs.
Click the Analyze button to run the reports based on the two options above.
The matrix reports the percentage and number of documents with Low, Medium and High scores for each COSMIC group. A user can also click the link to load the corresponding documents into the thread viewer.
E. COSMIC Model Library
Admin rights are required to publish a model to the COSMIC model library. See Admin Guide Section 1. Storybook Settings > F. COSMIC Groups. All other COSMIC model access is available to the end-user.
Accessing the COSMIC Model Library
You access the COSMIC Model Library through the User dropdown menu by choosing Model library:
The Model library page appears:
NexLP Pre-trained Starter Models
Story Engine comes with a growing number of pre-trained starter reference models. These models are ready out of the box to apply to your data. The list of models is augmented as new versions of Story Engine are released.
Import COSMIC Model
Import a model to the existing model library if you want to use the model in a different Story Engine environment. Click the Model Library tab and then the Import button to show the Model Library Import window:
Select the Choose File button to browse in order to locate the NPS file downloaded in the step above.
Click the Mode tab, find the storybook to which you would like to import the COSMIC model, then click View to view existing models as well as add new models:
A user can set negative sets by using the negative set type options:
Use External: COSMIC will use the negative examples provided in the imported model file. All the negative examples from the external model will be used for training the classifier.
Sample Destination Storybook: COSMIC will randomly draw samples from the current storybook. No negative examples will be used from the external model. A random sample of negative documents from the current storybook will be generated by the classifier based on the number of positive examples present in the external model. Documents that are too short will be excluded. Non-inclusive documents and documents with a pre-existing COSMIC tag will also be excluded.
Both: COSMIC will use both external examples and random samples from the current storybook.
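The three negative set types can be pictured with the following sketch. It is not the COSMIC implementation: the Doc class, the minimum length of 50 characters, and the choice to draw as many samples as there are positive examples are all assumptions made for illustration (the guide says only that the sample is "based on" the positive count and that documents that are "too short" are excluded).

```python
import random
from dataclasses import dataclass


@dataclass
class Doc:  # hypothetical document record
    doc_id: str
    text: str
    is_inclusive: bool = True
    has_cosmic_tag: bool = False


MIN_LENGTH = 50  # assumed cut-off for "too short"; not from the guide


def sample_negatives(storybook_docs, positive_count, rng=None):
    """Randomly draw eligible documents from the current storybook."""
    rng = rng or random.Random()
    eligible = [
        d for d in storybook_docs
        if len(d.text) >= MIN_LENGTH and d.is_inclusive and not d.has_cosmic_tag
    ]
    # Assumption: one sampled negative per external positive example.
    return rng.sample(eligible, min(positive_count, len(eligible)))


def build_negative_set(mode, external_negatives, storybook_docs,
                       positive_count, rng=None):
    """Assemble the negative examples for the chosen negative set type."""
    if mode == "Use External":
        return list(external_negatives)
    if mode == "Sample Destination Storybook":
        return sample_negatives(storybook_docs, positive_count, rng)
    if mode == "Both":
        return list(external_negatives) + sample_negatives(
            storybook_docs, positive_count, rng)
    raise ValueError(f"unknown negative set type: {mode}")
```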
Important
Selecting “Both” is highly recommended.
Note
All positive sets are automatically included from a referenced model and applied to the target COSMIC model.
Once the COSMIC model is successfully added, you will see it in the table.
Next, you must use Run Full Process under the SETTINGS tab to apply the model to your data.