What are BERT and mBERT?
BERT is a deep learning model developed at Google specifically for natural language data. BERT stands for Bidirectional Encoder Representations from Transformers. It is one of the first deep learning models that essentially learns a language: BERT infers the meaning of a word from its surrounding context, much as humans do. Google uses BERT in Search to go beyond keywords and better understand the intent behind user queries.
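To make "meaning from context" concrete, here is a minimal sketch, assuming the Hugging Face `transformers` library and PyTorch (neither is named in the original text). It compares the contextual vectors BERT produces for the word "bank" in two different sentences; because BERT encodes context, the two vectors differ.

```python
# Minimal sketch: contextual embeddings with BERT.
# Assumes the Hugging Face `transformers` library and PyTorch are installed.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# The word "bank" appears in two different contexts.
sentences = ["He sat by the river bank.", "She deposited cash at the bank."]

embeddings = []
for text in sentences:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Find the position of the token "bank" and take its contextual vector.
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    idx = tokens.index("bank")
    embeddings.append(outputs.last_hidden_state[0, idx])

# The similarity is well below 1.0: same word, different contextual meanings.
similarity = torch.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"Cosine similarity between the two 'bank' vectors: {similarity.item():.3f}")
```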
mBERT stands for multilingual BERT and is the next step toward models that understand the meaning of words in context across languages. mBERT is a deep learning model that was trained on 104 languages simultaneously, so a single model encodes knowledge of all 104 languages together. This lets us build models that can be applied out of the box to data in any of those 104 languages.
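The sketch below, under the same assumptions as above (Hugging Face `transformers` and PyTorch), illustrates the "one model, many languages" idea: a single mBERT checkpoint encodes the same sentence written in three of its 104 training languages. Mean-pooling the token vectors into a sentence vector is a common simplification here, not something the original text prescribes.

```python
# Minimal sketch: one mBERT checkpoint handling several languages.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")

# The same sentence in three of mBERT's 104 training languages.
sentences = {
    "English": "The cat sleeps on the sofa.",
    "German": "Die Katze schläft auf dem Sofa.",
    "Spanish": "El gato duerme en el sofá.",
}

for language, text in sentences.items():
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the token vectors into a single sentence vector.
    sentence_vector = outputs.last_hidden_state.mean(dim=1)
    print(language, sentence_vector.shape)  # each: torch.Size([1, 768])
```

Because all languages share one vocabulary and one set of weights, a classifier trained on top of these vectors in one language can often be applied directly to the others.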