Why Are BERT and MBERT Important for eDiscovery?
BERT interprets the meaning of a word from the words around it, much as a human reader uses context to understand language. When it was published, BERT achieved state-of-the-art results on 11 natural language understanding tasks, which is strong evidence of how much language understanding the model captures.
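BERT's use of context comes from the self-attention layers of its Transformer encoder. In these layers, each word's representation is recomputed as a weighted mix of the representations of the words around it. The toy sketch below illustrates that idea with a single attention head. The 2-D word vectors are invented for illustration, and the query/key/value projections are left as the identity. Real BERT learns these projections and stacks many heads and layers. The point is only that the same word, "bank", ends up with a different vector next to "river" than next to "loan".

```python
import math

# Hypothetical 2-D "static" word vectors, invented for illustration only.
# Real BERT uses learned vectors with hundreds of dimensions.
EMBED = {
    "river": [1.0, 0.0],
    "bank": [0.5, 0.5],
    "loan": [0.0, 1.0],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Simplification: queries, keys, and values are the raw embeddings
    (identity projections); BERT learns separate projection matrices.
    """
    vecs = [EMBED[t] for t in tokens]
    d = len(vecs[0])
    outputs = []
    for q in vecs:
        # Similarity of this word to every word in the sentence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vecs]
        weights = softmax(scores)
        # New representation: attention-weighted average of all word vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(d)])
    return outputs


bank_near_river = self_attention(["river", "bank"])[1]
bank_near_loan = self_attention(["bank", "loan"])[0]
print(bank_near_river, bank_near_loan)  # [0.75, 0.25] [0.25, 0.75]
```

The input vector for "bank" is identical in both sentences. Only the surrounding words change its output representation.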
AI models based on MBERT need to be trained in only one language to score documents in any of the supported languages. This works because the underlying MBERT model was trained on large amounts of text in 104 languages simultaneously, so it encodes knowledge shared across all of those languages.
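The toy sketch below makes the train-once, score-anywhere idea concrete. It rests on a simplifying assumption: MBERT maps documents on the same topic to nearby vectors regardless of language. The 3-D vectors, the label names `responsive` and `not_responsive`, and the nearest-centroid classifier are all hypothetical. The classifier stands in for a fine-tuned classification head. None of this describes Reveal's actual implementation.

```python
import math

# Hypothetical 3-D vectors standing in for MBERT document embeddings.
# The invented numbers encode the key assumption: documents about the
# same topic land close together regardless of language.
ENGLISH_TRAINING = [
    ([0.9, 0.1, 0.0], "responsive"),      # e.g. a supply contract
    ([0.8, 0.2, 0.1], "responsive"),
    ([0.1, 0.9, 0.2], "not_responsive"),  # e.g. a cafeteria menu
    ([0.0, 0.8, 0.3], "not_responsive"),
]

SPANISH_TO_SCORE = {
    "contrato de suministro": [0.85, 0.15, 0.05],
    "menú de la cafetería": [0.05, 0.85, 0.25],
}


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def train(examples):
    """Nearest-centroid classifier: one mean vector per label."""
    by_label = {}
    for vec, label in examples:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(model, vec):
    """Return the label whose centroid is most similar to vec."""
    return max(model, key=lambda label: cosine(model[label], vec))


model = train(ENGLISH_TRAINING)  # labels exist for English documents only
for text, vec in SPANISH_TO_SCORE.items():
    print(text, "->", classify(model, vec))
# contrato de suministro -> responsive
# menú de la cafetería -> not_responsive
```

No Spanish document was labeled. The Spanish documents are scored correctly only because, by assumption, they share a vector space with the English training data.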
BERT and MBERT are especially promising for legal and compliance teams because they have outperformed other classification techniques on a variety of tasks, many of which resemble the model-building process in eDiscovery and in proactive solutions. In a range of tests, the Reveal AI Data Science team found that BERT-based models achieved higher F1 scores than traditional classification methods.
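F1 combines precision P and recall R into a single score, F1 = 2PR / (P + R). Precision is the share of documents flagged by the model that are actually relevant. Recall is the share of relevant documents that the model finds. The minimal sketch below uses invented counts for illustration, not figures from Reveal's tests.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = harmonic mean of precision and recall (0.0 if undefined)."""
    predicted = true_positives + false_positives  # documents the model flagged
    actual = true_positives + false_negatives     # documents truly relevant
    if predicted == 0 or actual == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical review: the model flags 100 documents as responsive, 80
# correctly (TP=80, FP=20), and misses 40 responsive documents (FN=40).
score = f1_score(80, 20, 40)
print(round(score, 3))  # 0.727
```

F1 uses the harmonic mean, so a model cannot score well by maximizing only one of precision or recall. Both matter when deciding which documents to produce.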
Languages Supported by MBERT
Afrikaans | Albanian | Arabic | Aragonese |
Armenian | Asturian | Azerbaijani | Bashkir |
Basque | Bavarian | Belarusian | Bengali |
Bishnupriya Manipuri | Bosnian | Breton | Bulgarian |
Burmese | Catalan | Cebuano | Chechen |
Chinese (Simplified) | Chinese (Traditional) | Chuvash | Croatian |
Czech | Danish | Dutch | English |
Estonian | Finnish | French | Galician |
Georgian | German | Greek | Gujarati |
Haitian | Hebrew | Hindi | Hungarian |
Icelandic | Ido | Indonesian | Irish |
Italian | Japanese | Javanese | Kannada |
Kazakh | Kirghiz | Korean | Latin |
Latvian | Lithuanian | Lombard | Low Saxon |
Luxembourgish | Macedonian | Malagasy | Malay |
Malayalam | Marathi | Minangkabau | Mongolian |
Nepali | Newar | Norwegian (Bokmal) | Norwegian (Nynorsk) |
Occitan | Persian (Farsi) | Piedmontese | Polish |
Portuguese | Punjabi | Romanian | Russian |
Scots | Serbian | Serbo-Croatian | Sicilian |
Slovak | Slovenian | South Azerbaijani | Spanish |
Sundanese | Swahili | Swedish | Tagalog |
Tajik | Tamil | Tatar | Telugu |
Thai | Turkish | Ukrainian | Urdu |
Uzbek | Vietnamese | Volapük | Waray-Waray |
Welsh | West Frisian | Western Punjabi | Yoruba |