Reveal Review Publication

Rebuild Data

You may need to generate or update certain data after initial processing. Processing updates for three items have been consolidated under Rebuild Data.

The following data may be generated or rebuilt here:

  1. AI Vectors

    AI vector generation lets you build vectors after processing is complete. A typical case: during processing you saw no need for NexBERT vectors, but you have since changed your mind.

    60638c6a8b8e1.png

    Choose which of the two vector types you wish to build. The Notifications icon in the image below shows that COSMIC text vector creation has been completed.

    60638c6cd6001.png
  2. Near Duplicates

    Near duplicate processing, which identifies similar documents, may be run from here either if it was skipped during initial ingestion or if linked updates are to be consolidated. Click Generate near duplicate documents, then look for the checkmark on a green Notifications icon at the upper right of the control bar.

    60638c6eef2b5.png
  3. Cluster Sets

    Clustering analyses textual documents and groups conceptually similar documents. In Reveal AI 1.14.04 and later, Cluster Queues are created automatically each time data processing completes.

    To manually adjust the parameters and re-create clusters, follow the instructions below:

    60638c70a7040.png
    • Enable: check this box to enable clustering.

    • Name: the name of the cluster you’d like to create.

    • Queue Sample Size: sets the sample size for the cluster queue.

    • Max Cluster Count: the maximum number of clusters the system will create; the recommended value is 2.

    Click Save to save the settings; the Run Cluster Set button will then appear. Click the Run Cluster Set button to start clustering.

    Note that clustering runs as a backend service. To check whether clustering is complete, open Viz and click the “Treemap” button shown below:

    6042335018dcb.png

See also Appendix D for a list of system-generated clusters.
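
If you keep a record of the Cluster Sets values before entering them on the screen, the four fields described under Cluster Sets can be sketched as a small Python structure. This is only an illustration and not part of Reveal or any Reveal API: the class name ClusterSetSettings, its field names, and the checks in problems() are hypothetical, and the validation rules (non-empty name, positive numbers) are assumptions rather than documented limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterSetSettings:
    """Hypothetical record of the four Cluster Sets fields (not a Reveal API)."""

    enable: bool            # the "Enable" checkbox
    name: str               # the "Name" field
    queue_sample_size: int  # the "Queue Sample Size" field
    max_cluster_count: int  # the "Max Cluster Count" field

    def problems(self) -> List[str]:
        """Return human-readable issues; the rules here are assumptions."""
        issues = []
        if not self.enable:
            issues.append("Enable is unchecked, so clustering is not enabled.")
        if not self.name.strip():
            issues.append("Name is empty.")
        if self.queue_sample_size <= 0:
            issues.append("Queue Sample Size should be a positive number.")
        if self.max_cluster_count <= 0:
            issues.append("Max Cluster Count should be a positive number.")
        return issues


# Example values only; 2 is the Max Cluster Count recommended in the text.
settings = ClusterSetSettings(
    enable=True,
    name="Initial review clusters",
    queue_sample_size=500,
    max_cluster_count=2,
)

for issue in settings.problems():
    print(issue)
```

With the example values no issues are reported. The values still have to be entered and saved in the Cluster Sets screen by hand; the sketch only checks them beforehand.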