This repository is the official Python and CuPy implementation of the following TASLP paper:
H. Cao, L. Li, C. Zhu, M. Yang and T. Zhao, "Dual Word Embedding for Robust Unsupervised Bilingual Lexicon Induction," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2606-2615, 2023. [pdf]
We propose a novel approach that makes full use of both input and output vectors for more robust unsupervised bilingual lexicon induction (UBLI). We discover the Common Difference Property: one orthogonal transformation can connect not only the input vectors of two languages but also their output vectors. Therefore, we can learn just one transformation and induce two different dictionaries, one from the input vectors and one from the output vectors. From these two quite different dictionaries, a more accurate lexicon with less noise can be obtained by taking their intersection during the UBLI procedure. Our method is based on VecMap.
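The core idea can be illustrated with a minimal sketch (not the actual implementation, which lives in map_embeddings_wc.py and uses CSLS retrieval): map both the input and output source embeddings with the same orthogonal transformation, induce one dictionary from each view by nearest-neighbour search, and keep only the word pairs on which the two dictionaries agree. All function and variable names below are hypothetical.

import numpy as np

def induce_dictionary(src, trg):
    # Nearest-neighbour dictionary induction over length-normalised
    # embeddings (cosine similarity); returns the best target index
    # for every source word.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    trg = trg / np.linalg.norm(trg, axis=1, keepdims=True)
    return (src @ trg.T).argmax(axis=1)

def intersect_dictionaries(x_in, x_out, z_in, z_out, W):
    # Apply the single orthogonal mapping W to both the input and the
    # output (context) source embeddings, induce a dictionary from each
    # view, and keep only the entries where the two views agree.
    d_in = induce_dictionary(x_in @ W, z_in)     # dictionary from input vectors
    d_out = induce_dictionary(x_out @ W, z_out)  # dictionary from output vectors
    agree = np.where(d_in == d_out)[0]
    return {int(i): int(d_in[i]) for i in agree}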
The main part is map_embeddings_wc.py. You can run map_embeddings_wc.py to train cross-lingual word embeddings:
python map_embeddings_wc.py --unsupervised src_word_embeddings.vec src_context_embeddings.vec trg_word_embeddings.vec trg_context_embeddings.vec ./mapped_src.vec ./mapped_src_context.vec ./mapped_trg.vec ./mapped_trg_context.vec -v --cuda
For evaluation, you can run eval_translation.py:
python eval_translation.py ./mapped_src.vec ./mapped_trg.vec -d /path/of/test_dictionary --retrieval csls --cuda
The --cuda flag is optional, depending on whether you have a GPU.
Please cite our paper if you find this repository useful.
@ARTICLE{10167848,
author={Cao, Hailong and Li, Liguo and Zhu, Conghui and Yang, Muyun and Zhao, Tiejun},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
title={Dual Word Embedding for Robust Unsupervised Bilingual Lexicon Induction},
year={2023},
volume={31},
number={},
pages={2606-2615},
doi={10.1109/TASLP.2023.3290425}}