Conneau et al. 2018 – Word Translation without Parallel Data
Notes for conneau17_word_trans_without_paral_data
1. Background
- This builds on an old idea from Mikolov et al. 2013
- The idea goes something like this: we have monolingual word embeddings for two languages. Maybe we can align the two embedding spaces and read off a bilingual dictionary
- How should we align the spaces? Pick ~5000 anchor pairs of known translations and find a linear map \(W\) from source embeddings \(X\) to target embeddings \(Y\) that minimizes \(\|WX - Y\|_F\) over the anchor pairs
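A minimal numpy sketch of that supervised alignment step on toy data, solving the least-squares problem in closed form (the dimensions, variable names, and synthetic "embeddings" are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4           # embedding dimension (toy; real embeddings are ~300-d)
n_anchors = 50  # stand-in for the ~5000 anchor pairs

X = rng.normal(size=(n_anchors, d))  # source-language anchor embeddings (one per row)
W_true = rng.normal(size=(d, d))     # hidden "true" mapping used to build the toy target side
Y = X @ W_true.T                     # target-language anchor embeddings

# Closed-form least-squares solution of min_W ||W X - Y||_F over the anchors
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
W = W.T

print(np.allclose(W, W_true))  # on noiseless toy data, the mapping is recovered exactly
```

With real embeddings the anchors are noisy, so \(W\) only approximately maps source words onto their translations; a translation is then read off by nearest-neighbor search of \(Wx\) in the target space.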
2. What's the innovation in this paper?
- Let's build on that idea, but do away with the anchor points
- Instead, let's learn \(W\) with an adversarial approach. A discriminator tries to distinguish between mapped source points sampled from \(WX\) and real target points sampled from \(Y\)
- A generator tries to find a \(W\) to fool the discriminator
- This works: the unsupervised mapping matches and sometimes outperforms supervised aligners on word translation
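The adversarial game can be sketched on toy data. This is a hedged illustration, not the paper's setup: the discriminator here is plain logistic regression with hand-derived gradients (the paper uses a small MLP, orthogonality constraints on \(W\), and a refinement step), and the 2-D "embedding clouds" are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
X = rng.normal(loc=[3.0, 0.0], size=(200, d))  # toy source embeddings
Y = rng.normal(loc=[0.0, 3.0], size=(200, d))  # toy target embeddings (unpaired!)

W = np.eye(d)                           # generator: the linear map to be learned
w, b = rng.normal(size=d) * 0.1, 0.0    # discriminator: logistic-regression parameters

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

lr = 0.01
for _ in range(500):
    # --- discriminator step: label mapped source WX as 1, real target Y as 0 ---
    fx = X @ W.T
    p_src = sigmoid(fx @ w + b)
    p_tgt = sigmoid(Y @ w + b)
    # gradients of the binary cross-entropy w.r.t. (w, b)
    g_w = ((p_src - 1) @ fx + p_tgt @ Y) / len(X)
    g_b = (p_src - 1).mean() + p_tgt.mean()
    w -= lr * g_w
    b -= lr * g_b

    # --- generator step: nudge W so the discriminator mistakes WX for target ---
    p_src = sigmoid(X @ W.T @ w + b)
    # gradient of -mean log(1 - D(Wx)) w.r.t. W
    g_W = np.outer(w, (p_src[:, None] * X).mean(axis=0))
    W -= lr * g_W

# The mapped source cloud should have moved toward the target cloud.
init_gap = np.linalg.norm(X.mean(0) - Y.mean(0))
gap = np.linalg.norm((X @ W.T).mean(0) - Y.mean(0))
print(gap < init_gap)
```

A linear-logit discriminator can only separate the clouds by a linear boundary, so this toy mainly aligns the means; the paper's MLP discriminator gives the generator a richer signal, which is what lets it recover a full rotation of the embedding space without any anchors.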