atnafuatx/masakhane_xqa

Cross-lingual Question Answering for African Languages

Environment and Repository Setup

  • Set up a virtual environment using either Conda or venv:

    conda create -n xor_qa_venv python=3.9 anaconda
    conda activate xor_qa_venv

    or

    python3 -m venv xor_qa_venv
    source xor_qa_venv/bin/activate
  • Clone the repo

    git clone https://github.com/ToluClassics/masakhane_xqa --recurse-submodules
  • Install Requirements

    pip install -r requirements.txt

Source Data

The English and French passages for this project are drawn from Wikipedia snapshots dated 2022-05-01 and 2022-04-20, respectively, and are downloaded from the Internet Archive to enable open-domain experiments. The raw documents can be downloaded from the following URLs:

Processing dumps

The processed dumps are available on HuggingFace. We recommend using this exact corpus so that the baseline results can be reproduced. To download:

If you prefer to run the processing pipeline yourself, note that we adopt the same processing used in the Dense Passage Retriever paper. The pipeline is bundled into this script, which you can run as follows:

    bash scripts/generate_process_dumps.sh /path/to/dir_containing_dumps

For more detail, this document provides a breakdown of the individual steps.
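
For orientation, the Dense Passage Retriever paper splits each Wikipedia article into non-overlapping passages of 100 words. The sketch below illustrates only that chunking idea; it assumes plain whitespace tokenization, and `split_into_passages` is a hypothetical helper, not the function used by the script in this repository.

```python
from typing import Iterator


def split_into_passages(text: str, words_per_passage: int = 100) -> Iterator[str]:
    """Yield consecutive, non-overlapping passages of at most `words_per_passage` words.

    Assumption: words are separated by whitespace. The actual pipeline may
    clean or tokenize the text differently before chunking.
    """
    words = text.split()
    for start in range(0, len(words), words_per_passage):
        yield " ".join(words[start:start + words_per_passage])


# Toy article of 250 placeholder words (illustrative data only).
article = " ".join(f"w{i}" for i in range(250))
passages = list(split_into_passages(article))
print([len(p.split()) for p in passages])  # [100, 100, 50]
```

The final passage is shorter than 100 words whenever the article length is not a multiple of the passage size; whether such remainders are kept or padded is a design choice left to the real pipeline.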

Retriever

BM25
