3. Searching and Filtering Documents
Story Engine provides many ways to search for relevant documents. Users can search by traditional keywords, topics and people, or use AI-augmented filtering methods such as COSMIC scores and Emotional Intelligence.
Users can either type directly into the natural language search bar, assisted by our artificial intelligence, or use one of the dropdown menus to return more specific results. You can use Term reports to run multiple string searches and reports.
A. Keyword Search
Story Engine provides full text searching capabilities by using the keywords search tab. Users can expand the area by clicking the Keyword drop-down button, and enter the keywords they want to search:
By default, Story Engine searches the email subject, the email body and the full text of attachments and loose documents. Email header lines (except the email subject), greetings, signatures and disclaimers are normally excluded from this search. You can verify this for any document by viewing it in the Story layout of the thread viewer, where excluded text appears in grey type instead of black. See Section 8 - Thread Viewer below for more information on thread viewing.
Users can specify keywords to retrieve threads where they are mentioned. After choosing Reload topics or Apply, users can also use custom sliders to adjust the weight of words related to the search, either to find instances where the words are mentioned together or to exclude them.
Each slider has 3 settings:
Far left (exclude),
Center/Default (neutral), and
Far right (expand results).
Requiring the presence or absence of an additional term will increase or decrease the number of search results accordingly.
For example, when we search for “accounting w/10 fraud” we may want to expand the search results to also include “auditor” or “financial”, but none where “energy” is the only concept in the search hit.
The following screen shows how a user would represent this.
Furthermore, for each specific term, users can add more “related” concepts. Clicking the Related button for “auditor” brings up the following screen, populated with concepts related to “auditor”. These can be utilized to expand the search even further – by leveraging the user’s knowledge with the relevant concepts.
The following operators are currently supported only by Story Engine Keyword search:
*: wildcard, for example, “fund*” returns “funding”, “funded”, etc.
AND: additive condition connector; for example, “fund AND research” returns email segments with both words in the content.
OR: multiple optional condition connector; for example, “fund OR research” returns email segments with either word in the content.
AND NOT: negative additive condition connector; for example, “fund AND NOT research” returns email segments with “fund” but not “research” in the content.
W/{number} - "Within" operator. The number represents the maximum number of words allowed between the occurrences of two terms. For example, “fund w/5 research” returns email segments that contain both words within 5 words of each other.
Note
This operator strongly favors finding proximate words within a single sentence. Accordingly, the space between sentences counts as 8 additional words. A line break (or the space between paragraphs) counts as 128 additional words.
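The gap penalties in the note can be illustrated with a minimal Python sketch. It is not Story Engine's implementation: the tokenizer, the naive sentence splitter, and the choice to measure distance as the difference between word positions are all assumptions made for illustration. Only the values 8 and 128 come from the note above.

```python
import re

SENTENCE_GAP = 8      # extra words charged per sentence boundary (from the note)
LINE_BREAK_GAP = 128  # extra words charged per line break (from the note)


def positions(text):
    """Return (word, effective_position) pairs with the gap penalties applied."""
    placed = []
    pos = 0
    for line_no, line in enumerate(text.split("\n")):
        if line_no > 0:
            pos += LINE_BREAK_GAP
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", line) if s]
        for sent_no, sentence in enumerate(sentences):
            if sent_no > 0:
                pos += SENTENCE_GAP
            for word in re.findall(r"\w+", sentence.lower()):
                placed.append((word, pos))
                pos += 1
    return placed


def within(text, term_a, term_b, n):
    """True if some occurrence of term_a lies within n effective words of term_b."""
    placed = positions(text)
    a = [p for w, p in placed if w == term_a.lower()]
    b = [p for w, p in placed if w == term_b.lower()]
    return any(abs(x - y) <= n for x in a for y in b)
```

Under these assumptions, "fund" and "research" in the same short sentence satisfy w/5, while the same two words in adjacent sentences usually do not, because the sentence boundary adds 8 to the distance.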
AND and OR operators can be used together with the W/{number} connector. However, each keyword must be joined to the W/{number} connector inside its own set of parentheses. For example, the following search works:
(Fun w/10 happy) or (Suspect w/2 happy)
However, the following search will not work:
(Fun or Suspect) w/2 happy
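An unsupported search of the second form can be rewritten into the supported form by repeating the proximity clause for each alternative. The helper below is a hypothetical convenience for building such strings, not a Story Engine feature.

```python
def expand_proximity(alternatives, n, anchor):
    """Rewrite (a OR b) w/n anchor as (a w/n anchor) OR (b w/n anchor)."""
    clauses = [f"({term} w/{n} {anchor})" for term in alternatives]
    return " OR ".join(clauses)


# Build the supported form of the unsupported query above.
query = expand_proximity(["Fun", "Suspect"], 2, "happy")
```

The resulting string is "(Fun w/2 happy) OR (Suspect w/2 happy)", which can be pasted into the keyword search.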
B. Search by Person
When searching for a person in the global search, users can choose whether to return documents To, From, To or From, or To and From the person of interest, or documents in which the person is Discussed. This feature is useful when trying to pinpoint communications between specific people, or to find documents discussing specific people.
From: filter to return only emails written by the person.
To: filter to return only emails received by the person.
To or From: filter to return emails written or received by the person.
To and From: filter to return emails written and received by the same person.
Discussed: filter returns threads where the person is discussed within the content of the document.
To search communications between two specific people, select To to specify that you would like to retrieve emails received by the individual and then click the Apply button.
Then, add another person to the search bar using the Person filter, specifying From. Using the AND connector means that you will return results sent to person A and from person B.
In the graphic below we visualize documents sent TO Vincent Kaminski AND FROM Shirley Crenshaw.
You can add another person to the To field. By default, Story Engine uses the OR operator to connect each of the specified To persons.
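The way these person filters combine can be sketched in Python. The Email class and the matches function are hypothetical names used only for illustration. The OR between To persons and the AND between the To and From groups follow the text above, while treating multiple From persons as ORed is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    sender: str
    recipients: list = field(default_factory=list)


def matches(email, to_people=(), from_people=()):
    """To values are ORed together, then ANDed with the From condition."""
    to_ok = not to_people or any(p in email.recipients for p in to_people)
    # Assumption: several From persons are also combined with OR.
    from_ok = not from_people or email.sender in from_people
    return to_ok and from_ok


msg = Email(sender="Shirley Crenshaw", recipients=["Vincent Kaminski"])
hit = matches(msg, to_people=["Vincent Kaminski"], from_people=["Shirley Crenshaw"])
```

Here hit is True because the message was sent to Vincent Kaminski and from Shirley Crenshaw. Reversing the two roles would not match.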
C. Search by Date
Click the Date dropdown to open the date search popup.
Date Operators: use one of these search operators.
Before: Return all documents dated before the selected date.
On: Return all documents dated on the selected date.
After: Return all documents dated after the selected date.
Between: Return all documents between the specified start and ending dates.
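The four operators correspond to simple date comparisons. The sketch below uses a hypothetical date_matches function. Whether Between includes its start and end dates is not stated here, so the inclusive comparison is an assumption.

```python
from datetime import date


def date_matches(doc_date, operator, start, end=None):
    """Evaluate one date operator against a document date."""
    if operator == "before":
        return doc_date < start
    if operator == "on":
        return doc_date == start
    if operator == "after":
        return doc_date > start
    if operator == "between":
        if end is None:
            raise ValueError("between needs both a start and an end date")
        # Assumption: both endpoints are included in the range.
        return start <= doc_date <= end
    raise ValueError(f"unknown operator: {operator}")


in_range = date_matches(date(2001, 5, 14), "between", date(2001, 5, 1), date(2001, 5, 31))
```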
Work shift: Users can select one of the work shift options to further limit search results. The options available are All shifts, Business Hours, Evening Business Hours or After Hours.
Inclusive/Non-Inclusive:
By default, the search returns more precise results (fewer documents) by searching only inclusive documents.
Users can include non-inclusive documents in the search results by moving the precision slider to the right (less precise).
Customize date searching options:
Search Metadata: This option allows users to specify whether to search metadata at the segment level or the thread level.
Search Mentions: Users can also search for mentions of a specific date in the body by selecting the Search Mentions checkbox.
Partial ranges: Use this option to include documents whose date mentions represent a range that falls only partially within the target search dates.
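The Partial ranges option can be read as an interval overlap test. The sketch below assumes that, with the option off, a mentioned range must fall entirely inside the target dates. That default behavior, and the function name, are assumptions.

```python
from datetime import date


def range_matches(mention_start, mention_end, target_start, target_end, allow_partial):
    """Compare a mentioned date range against the target search dates."""
    fully_inside = target_start <= mention_start and mention_end <= target_end
    # Two closed ranges overlap when each one starts before the other ends.
    overlaps = mention_start <= target_end and target_start <= mention_end
    return overlaps if allow_partial else fully_inside


# A mention of "March 20 to April 10" against a target of March 2001.
partial = range_matches(date(2001, 3, 20), date(2001, 4, 10),
                        date(2001, 3, 1), date(2001, 3, 31), allow_partial=True)
```

With allow_partial set to False, the same mention would not match, because part of the range falls outside the target month.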
D. Search by COSMIC Score
When filtering by COSMIC Score, documents are returned based on how well they fit the probability model determined by a custom machine learning model. Users train a “model” data set by tagging for COSMIC groups, and then the algorithm works through the remaining documents based on the user-trained data, with each term's relevance probability returned as a number between 0 and 100. This is what the COSMIC Mission Control Statistics table underlying this process looks like:
This enables Story Engine to return similarly classified documents based on the probability of matching the model.
The COSMIC Score drop-down button is used to interactively weigh COSMIC scores by their relative probability. In the Settings, users can choose Low, Medium or High probability as well as no score, errors, and custom ranges.
Users can define a number of thresholds for any COSMIC model in their storybook. For example, a user can check the ALL option and combine a High probability of “Issues: Finance” with a Low probability of “Issues: HR” to find only documents that fit both thresholds. Users can instead choose the ANY option, which requires just one of the thresholds to be met. ALL requires the thresholds for every selected COSMIC model to be met before a document is returned.
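The difference between ALL and ANY can be sketched as follows. The numeric cutoffs for Low, Medium and High are hypothetical placeholders, since the actual band boundaries are not given here, and the function names are illustrative only.

```python
# Hypothetical band boundaries on the 0-100 scale; the real cutoffs may differ.
BANDS = {"low": (0, 33), "medium": (34, 66), "high": (67, 100)}


def in_band(score, band):
    """Check one COSMIC score against one threshold option."""
    if band == "no score":
        return score is None
    if score is None:
        return False
    low, high = BANDS[band]
    return low <= score <= high


def doc_matches(scores, thresholds, mode="ALL"):
    """ALL requires every threshold to hold; ANY requires at least one."""
    checks = [in_band(scores.get(model), band) for model, band in thresholds.items()]
    return all(checks) if mode == "ALL" else any(checks)


thresholds = {"Issues: Finance": "high", "Issues: HR": "low"}
doc = {"Issues: Finance": 91, "Issues: HR": 70}
```

For this document, doc_matches(doc, thresholds, "ALL") is False because the HR score is not low, while the ANY mode returns True because the Finance threshold is met.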
E. Search by Emotions
When searching by Emotions, users can find models by keyword searching. Users can also specify minimum or maximum thresholds for Intent Score, Opportunity Score, Pressure Score, Rationalization Score, Positivity Score and Negativity Score.
For each emotion, users can specify one or a range of threshold options:
Any score
No score
Low
Medium
High
Custom
By combining these filter options, users can search for documents with, for example, a medium intent score and a high negativity score.
F. Search by Thread Intelligence
Users can select Threads to search by Thread Intelligence to find documents and threads with a certain number of recipients, number of email segments (length of the conversation), or reciprocal ratio (social status).
The “Social status” condition represents emails sent or received by a person with high or low social status as determined by reciprocal ratio. A large reciprocal ratio (>5) often indicates that the person associated with the email address may be mass marketing or spamming, while a reciprocal ratio of less than 1 means that the individual receives more messages than they send.
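The description implies that the reciprocal ratio is messages sent divided by messages received. The sketch below works under that assumption. The exact formula is not given here, and the handling of zero received messages and the label for ratios between 1 and 5 are illustrative choices.

```python
def reciprocal_ratio(sent, received):
    """Assumed definition: messages sent divided by messages received."""
    if received == 0:
        return float("inf") if sent else 0.0
    return sent / received


def describe(ratio):
    """Apply the two thresholds mentioned in the text (>5 and <1)."""
    if ratio > 5:
        return "possible mass sender"
    if ratio < 1:
        return "receives more than sends"
    return "between thresholds"  # illustrative label, not a Story Engine term


label = describe(reciprocal_ratio(sent=600, received=100))
```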
G. Search Metadata
In the global search bar, users click on the Metadata tab to search by metadata.
In the drop-down, click Select… and choose from Control number, ID, Group, Thread, Random, External IDs, Record type or Inclusive Only.
Record type: Use Record Type to filter by the kind of file and the manner in which it was sent. The options are Email, Attachment and EFile:
Email: parent email.
Attachment: attachment to an email. Note that an email attached to another email would be treated as an attachment not an email.
EFile: stand-alone parent document such as MS Office docs, PDF files, etc.
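The three record types can be summarized as a small decision rule. The function and its two inputs are hypothetical; the point is that any document with a parent is an Attachment, even when it is itself an email.

```python
def record_type(is_email, has_parent):
    """Classify a document using the three record types described above."""
    if has_parent:
        # An email attached to another email is still an Attachment.
        return "Attachment"
    return "Email" if is_email else "EFile"


attached_email = record_type(is_email=True, has_parent=True)
```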
H. Search Domain
Searching by domain is used to find out which people associated with which domains were sending and receiving messages. Domain is accessible from the “All filters” dropdown menu.
Exclude a domain by placing your mouse over the “Include/Exclude” button. In the dropdown menu, select “Exclude”.
Include/Exclude: exclusively return or filter out specific domains.
From: returns emails written by the person associated with the domain on the baseball card.
To: returns emails received by the person associated with the domain on the baseball card.
To or From: returns emails written or received by the person associated with the domain on the baseball card.
These options are available at any time from the global search and filtering bar. Similar to communication filtering, users can use domain filtering to find emails sent from domain A to domain B. The graphic below shows emails from people associated with the citicorp.com domain sent to people associated with the enron.com domain:
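A domain-to-domain filter of this kind can be sketched in Python. The helper names and the example addresses are hypothetical, and taking the domain as the text after the last "@" is a simplifying assumption.

```python
def domain_of(address):
    """Return the lower-cased part of an address after the last '@'."""
    return address.rsplit("@", 1)[-1].lower()


def sent_between_domains(sender, recipients, from_domain, to_domain):
    """True if the sender is in from_domain and any recipient is in to_domain."""
    return (domain_of(sender) == from_domain.lower()
            and any(domain_of(r) == to_domain.lower() for r in recipients))


# Hypothetical addresses, used only to exercise the filter.
found = sent_between_domains("analyst@citicorp.com", ["trader@enron.com"],
                             "citicorp.com", "enron.com")
```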
I. Search Tags
Searching by tag is available from the “Tags” dropdown search bar. Once accessed, users can search tags by name, or simply choose from tags listed in the dropdown menu.
Document Choice Filter: click the “plus sign” to select the document choice to be filtered.
Operators available: Any of these, None of these, Is set, Is Not Set.
Choices available: Yes, No, Skip, Control Set.
Under the “Control Set” drop-down, users can select whether to return documents “In control set” or “Not in control set”.
Under the Model Options drop-down, users can select whether to return documents “Included in model” or “Excluded from model”.
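The four document choice operators can be modeled as follows. This is a sketch with a hypothetical function name. Whether "None of these" also returns documents with no choice set is not stated here, so that behavior is an assumption.

```python
def choice_matches(value, operator, choices=()):
    """Evaluate a document choice filter; value is None when no choice is set."""
    if operator == "Is set":
        return value is not None
    if operator == "Is Not Set":
        return value is None
    if operator == "Any of these":
        return value in choices
    if operator == "None of these":
        # Assumption: documents with no choice set also pass this operator.
        return value not in choices
    raise ValueError(f"unknown operator: {operator}")


reviewed = choice_matches("Yes", "Any of these", ["Yes", "No"])
```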
J. Search Entity
In addition to the filtering options provided in the filtering panel, a user can also search entities or their mentions using the entity searching function listed under the “Entities” dropdown menu:
Which entities are displayed is determined by admin configuration.
You may also create new Custom Entity types meeting your requirements and keying off your data. See Section 5 - Custom Entity Types for details.
Click any of the entity types to bring up the search window. Notice that underneath the search window is the Detection filter.
Choose +Detection and you see the following list of possible methods through which entities have been discovered:
Entities may be found by:
Term Report
Entity Model
User
Each of these is described in Section 5 - Custom Entity Types.
Users can type any keywords in the search window to search mentions associated with the targeted entity type.
An entity can have multiple mentions. For example, the Topic entity “audit” might contain mentions like “audit”, “audit committee”, “audit consideration”. Once an entity is selected in the search bar, the user can choose “+Filter mentions” and then further narrow down the search results by checking/unchecking the mentions associated with the entity.
Click on the Apply button to confirm the selection.
K. Search Languages
Users can search for the different languages that appear in documents.
From the search bar, users can immediately see the list of different languages detected, as well as the number of documents detected in each language.
Users can narrow down search results by selecting different levels of prevalence of a language within the documents. Prevalence of a language is calculated from the number of characters in that language within a document, and is assigned at the document level.
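A character-based prevalence calculation might look like the sketch below. It assumes the document has already been split into spans labeled with a detected language. That input format, and expressing prevalence as a share of all characters, are assumptions.

```python
from collections import Counter


def language_prevalence(spans):
    """spans: iterable of (language, text) pairs from one document."""
    counts = Counter()
    for language, text in spans:
        counts[language] += len(text)  # prevalence is counted in characters
    total = sum(counts.values())
    if total == 0:
        return {}
    return {language: n / total for language, n in counts.items()}


shares = language_prevalence([("English", "a" * 75), ("Spanish", "b" * 25)])
```

In this example, shares maps English to 0.75 and Spanish to 0.25 for the document as a whole.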
See Appendix E for specifics on language capabilities.
L. Save Search Results
Search results can be saved to a Saved Search. Click the button to open Save Search drop down and select Save this search to invoke the saved search dialog box:
In the dialog box, type the name you want to use for this saved search. If you want to replace an existing search, choose Overwrite a saved search. Click Save to save the search.
The “Make available for training” option is available if you want to use this search for a training queue. (See “Story Engine 2.0 COSMIC Guide” for more details.)
The “Make available for Insights modules” option indicates you want this search to appear as a choice for criteria in creating an Insights graphic. See Section 2 - Story Engine Exploring Page > C. Insights for more details.
M. Term Report
You can use Term reports for complex searching and reporting. Start by choosing the term reports wand icon and then press Create report.
The following panel appears:
By selecting the question mark icon next to “Term report type” you may review the descriptions of your two choices:
Search term reports (detailed in this section) are based on NexLP’s search technology. Understand where your hits exist in your data and drill down into documents to see their context.
Search for:
Keywords
Wildcard words
Proximity expressions
Boolean expressions
Entity search and extract reports (detailed in Section 5 - Custom Entity Types) extract entities from search lists, enable entity annotation across documents for hits, and create entity models from hits.
Search for:
Keywords
Wildcard words
Regular expressions (See Appendix F: Examples of Useful Regular Expressions).
In this section we will describe the Search term reports. The other choice, Entity search and extract, is used in Custom entity creation described in Section 5 - Custom Entity Types.
Choose Search term.
Name*: Name your search
Term report type*: You have chosen Search term. (The other choice, Entity search and extract, is used in custom entity creation explained in Section 5 - Custom Entity Types.)
Enter search term. One search string per line*: Each discrete search string for which you want specific reporting should be on a separate line. Each line is connected by an implied Boolean OR.
Run based on saved search: Click arrow for drop down and choose from:
Top Five Custodians
All Custodians
Global View
[Any saved searches]
Notes: Add notes as needed.
You may choose either Create or Create and run report.
Choose the Create button to create a search without running a report.
The created search(es) along with search specifications will be displayed.
To run a report, choose the triangle icon under Run full report.
Notifications of search status are accessed through the bell icon next to your name, which provides information such as the term report title, the terms, the storybook and the status:
Alternately, to create a search and run the report immediately choose Create and run report.
Whether you run the report immediately or later, clicking on the resulting report name will provide the hit report per search string.
The following options and information are provided:
+ Add Term: Choose this to add additional lines to your search.
Run Full Report: This runs a report on all search terms.
The report columns:
Keyword: The separate search strings that comprise the search.
Documents: The number of documents found.
Documents with group: The number of documents in the families of the documents that were found.
Unique Hits: The number of documents which were returned for only this search string and for no other search string.
NOTE: “Unique hits” may be helpful in assessing whether a particular search string is over-inclusive. For example, in an anti-trust dispute between competing pharmaceutical companies there may be a list of search strings which in various combinations might indicate objectionable intention. One of those might be “the pharma industry” as in the sentence “There is too much competition in the pharma industry.” But the documents where the phrase “the pharma industry” is the only hit might be reasonably suspected of being false positives. Therefore, the “unique hits” report may be used to negotiate a refinement in the search terms in the interest of efficiency and cost control.
Status: Successful, Never Run, Error or Out-of-date.
Last run time: The last time this search string was run.
Run report: You can rerun the report per line.
Actions: You may delete an individual search string.
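The Documents, Documents with group and Unique Hits columns can be reproduced from per-string hit sets, as in the sketch below. The input structures and the function name are hypothetical, and counting the hit documents themselves as part of their families is an assumption.

```python
def term_report(hits, family_of):
    """hits: {search_string: set of doc ids}; family_of: {doc id: family id}."""
    members = {}
    for doc, family in family_of.items():
        members.setdefault(family, set()).add(doc)

    report = {}
    for term, docs in hits.items():
        # Documents returned by any other search string in the report.
        others = set().union(*(d for t, d in hits.items() if t != term))
        # Every document that shares a family with at least one hit.
        with_group = set().union(*(members[family_of[d]] for d in docs))
        report[term] = {
            "documents": len(docs),
            "documents_with_group": len(with_group),
            "unique_hits": len(docs - others),
        }
    return report


hits = {"fund*": {1, 2, 3}, "the pharma industry": {3, 4}}
family_of = {1: "A", 2: "A", 3: "B", 4: "C", 5: "C"}
report = term_report(hits, family_of)
```

In this example, document 4 is the only document returned solely by "the pharma industry", so that string has one unique hit. Its family also contains document 5, which raises its Documents with group count to 3.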