Status: Read
Author: Fangxiaoyu Feng, Yinfei Yang, et al.
Topic: Attention, Siamese Network, Text, Transformers
Category: Embeddings
Conference: arXiv
Year: 2020
Link: https://arxiv.org/abs/2007.01852
Summary: A dual-encoder BERT model (LaBSE) that learns multilingual sentence embeddings covering 109 languages and transfers zero-shot to languages unseen during training.
- The authors introduce a training scheme that allows a BERT transformer to learn multilingual sentence embeddings; training data covers 109 languages, and the model is evaluated on bi-text retrieval over 112 languages on the Tatoeba benchmark.
- Monolingual data is used to pre-train the model for the different languages; this approach has been shown to give good results in prior work.
- Parallel (translation-pair) data is then used to fine-tune the model to learn multilingual sentence embeddings.
- A dual-encoder BERT with shared parameters, initialized from the monolingually pre-trained model, is used: one tower encodes the source-language text and the other encodes the target-language text (see the sketch after this list).
- The source and target [CLS] token embeddings are treated as sentence embeddings, and an additive margin softmax loss is applied to pull translation pairs close to each other while pushing them away from in-batch negatives.
- Additive margin loss is applicable to Siamese-style networks in general.
- The multilingual embeddings are a good fit for downstream models whose input may arrive in any of several languages (see the usage sketch below).
- A good embedding model for low-resource and zero-resource languages, since the shared embedding space transfers to languages with little or no training data.
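
A minimal PyTorch sketch of the dual-encoder setup with additive margin softmax over in-batch negatives, assuming a shared multilingual BERT checkpoint as the initialization; the `margin` and `scale` values are illustrative guesses, not the paper's tuned hyperparameters:

```python
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

# Per-pair loss (additive margin softmax with in-batch negatives):
#   L_i = -log( exp(s(x_i, y_i) - m) / (exp(s(x_i, y_i) - m) + sum_{j != i} exp(s(x_i, y_j))) )
# where s is cosine similarity and m is the additive margin.

class DualEncoder(torch.nn.Module):
    def __init__(self, name="bert-base-multilingual-cased"):  # assumed init checkpoint
        super().__init__()
        self.bert = BertModel.from_pretrained(name)  # one BERT -> both towers share parameters

    def encode(self, batch):
        out = self.bert(**batch)
        cls = out.last_hidden_state[:, 0]  # [CLS] token embedding as the sentence embedding
        return F.normalize(cls, dim=-1)    # unit norm, so dot products are cosine similarities

def additive_margin_loss(src_emb, tgt_emb, margin=0.3, scale=20.0):
    """Bidirectional additive margin softmax; margin/scale are illustrative values."""
    sim = src_emb @ tgt_emb.t()                                     # (B, B) cosine similarities
    sim = sim - margin * torch.eye(sim.size(0), device=sim.device)  # margin on positives only
    sim = sim * scale                                               # temperature scaling
    labels = torch.arange(sim.size(0), device=sim.device)           # positives on the diagonal
    # source -> target and target -> source directions
    return F.cross_entropy(sim, labels) + F.cross_entropy(sim.t(), labels)

tok = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = DualEncoder()
src = tok(["How are you?", "The weather is nice."], return_tensors="pt", padding=True)
tgt = tok(["¿Cómo estás?", "Hace buen tiempo."], return_tensors="pt", padding=True)
loss = additive_margin_loss(model.encode(src), model.encode(tgt))
```

Subtracting the margin only on the diagonal forces each positive pair to beat every in-batch negative by at least `m`, which is what tightens translation pairs in the shared space.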
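
And a quick downstream-usage sketch; `sentence-transformers/LaBSE` is the Hugging Face hub ID of the released checkpoint (the paper itself distributed the model via TF Hub), used here for cross-lingual retrieval by cosine similarity:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")

queries = ["A dog is playing in the park."]
candidates = [
    "Ein Hund spielt im Park.",    # German: a dog is playing in the park
    "Il gatto dorme sul divano.",  # Italian: the cat sleeps on the sofa
    "Un chien joue dans le parc.", # French: a dog is playing in the park
]

q_emb = model.encode(queries, normalize_embeddings=True)
c_emb = model.encode(candidates, normalize_embeddings=True)

# The two translations of the query should score well above the unrelated sentence.
print(util.cos_sim(q_emb, c_emb))
```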