Reveal Review Publication

Why Are BERT and MBERT Important for eDiscovery?

BERT (Bidirectional Encoder Representations from Transformers) interprets each word in light of the words around it, much as a human reader does. When it was introduced, BERT achieved state-of-the-art results on 11 natural language understanding tasks, which is strong evidence of how well it models language.

AI models based on MBERT (multilingual BERT) need to be trained in only one language to score documents in any of the supported languages. This works because the underlying MBERT model was trained on large amounts of data in 104 languages simultaneously and encodes the combined knowledge of those languages.
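
The idea of training in one language and scoring in another can be illustrated with a deliberately tiny sketch. It does not use MBERT: a hand-made table of two-dimensional vectors (`SHARED_EMBEDDINGS`) stands in for the shared multilingual representation, and a nearest-centroid rule stands in for the classifier. Every word, vector, label, and function name here is a hypothetical chosen for illustration only.

```python
# Toy illustration only: a hand-made "shared embedding" table stands in for MBERT.
# All words, vectors, and labels below are hypothetical.
SHARED_EMBEDDINGS = {
    # English and Spanish words with similar meanings get nearby vectors.
    "contract": (1.0, 0.0), "contrato": (0.9, 0.1),
    "breach": (0.8, 0.2), "incumplimiento": (0.8, 0.1),
    "lunch": (0.0, 1.0), "almuerzo": (0.1, 0.9),
    "party": (0.1, 0.8), "fiesta": (0.2, 0.9),
}


def embed(text):
    """Average the vectors of known words; unknown words are ignored."""
    vectors = [SHARED_EMBEDDINGS[w] for w in text.lower().split()
               if w in SHARED_EMBEDDINGS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def train_centroids(labeled_docs):
    """Compute one centroid per label from (text, label) pairs."""
    grouped = {}
    for text, label in labeled_docs:
        grouped.setdefault(label, []).append(embed(text))
    return {label: tuple(sum(axis) / len(vecs) for axis in zip(*vecs))
            for label, vecs in grouped.items()}


def predict(centroids, text):
    """Return the label whose centroid is closest to the document vector."""
    doc = embed(text)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(doc, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Train on English examples only.
english_training = [
    ("contract breach", "responsive"),
    ("lunch party", "not responsive"),
]
centroids = train_centroids(english_training)

# Score Spanish documents that the classifier never saw during training.
spanish_legal = predict(centroids, "incumplimiento contrato")
spanish_social = predict(centroids, "almuerzo fiesta")
```

Because both languages share one vector space, the centroids learned from English text also separate the Spanish documents. MBERT learns a far richer shared representation from data rather than from a hand-written table, but the transfer principle is the same.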

BERT and MBERT are especially exciting for legal and compliance teams because they have outperformed other classification techniques on a variety of tasks, many of which resemble the model-building process in eDiscovery and in proactive solutions. The Reveal AI Data Science team confirmed that BERT-based models improve on the F1 score of traditional classification in a variety of tests.
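
For readers unfamiliar with the F1 score mentioned above, it is the harmonic mean of precision and recall. The following is a minimal sketch of the standard calculation, not Reveal's evaluation code; the function name `f1_score_binary` and the sample labels are made up for illustration.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Compute the F1 score for one positive class from parallel label lists."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)

    # Precision: of the documents predicted positive, how many really are.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: of the truly positive documents, how many were found.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0

    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical reviewer tags (1 = responsive) and model predictions.
reviewer_tags = [1, 1, 1, 0, 0, 0, 1, 0]
model_predictions = [1, 1, 0, 0, 1, 0, 1, 0]
score = f1_score_binary(reviewer_tags, model_predictions)
```

In this sample the model finds three of the four responsive documents and flags one non-responsive document by mistake, so precision and recall are both 0.75 and the F1 score is 0.75. A higher F1 score means the model finds more of the relevant documents and raises fewer false alarms.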

Languages Supported by MBERT

Afrikaans

Albanian

Arabic

Aragonese

Armenian

Asturian

Azerbaijani

Bashkir

Basque

Bavarian

Belarusian

Bengali

Bishnupriya Manipuri

Bosnian

Breton

Bulgarian

Burmese

Catalan

Cebuano

Chechen

Chinese (Simplified)

Chinese (Traditional)

Chuvash

Croatian

Czech

Danish

Dutch

English

Estonian

Finnish

French

Galician

Georgian

German

Greek

Gujarati

Haitian

Hebrew

Hindi

Hungarian

Icelandic

Ido

Indonesian

Irish

Italian

Japanese

Javanese

Kannada

Kazakh

Kirghiz

Korean

Latin

Latvian

Lithuanian

Lombard

Low Saxon

Luxembourgish

Macedonian

Malagasy

Malay

Malayalam

Marathi

Minangkabau

Mongolian

Nepali

Newar

Norwegian (Bokmål)

Norwegian (Nynorsk)

Occitan

Persian (Farsi)

Piedmontese

Polish

Portuguese

Punjabi

Romanian

Russian

Scots

Serbian

Serbo-Croatian

Sicilian

Slovak

Slovenian

South Azerbaijani

Spanish

Sundanese

Swahili

Swedish

Tagalog

Tajik

Tamil

Tatar

Telugu

Thai

Turkish

Ukrainian

Urdu

Uzbek

Vietnamese

Volapük

Waray-Waray

Welsh

West Frisian

Western Punjabi

Yoruba