MatchXML

This is the official repository for the paper "MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification".

Install the environment

  • Create a virtual environment
    # We recommend using Anaconda to create a conda environment
    conda create --name matchxml python=3.8
    conda activate matchxml
  • Install the required software:
    pip install -r requirements.txt

Prepare Data

# eurlex-4k, wiki10-31k, amazoncat-31k, wiki-500k, amazon-670k, amazon-3m

  • Download six XMC datasets from XR-Transformer

  • Download our trained label embeddings from Google Drive and save them to xmc-base/{dataset}

  • Download our static text features (static sentence embeddings + TF-IDF features) from Google Drive and save them to xmc-base/{dataset}/tfidf-attnxml, replacing the original TF-IDF features.
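
Before training, it can help to confirm that the downloads landed where the steps above expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and it checks only that the two directories named above exist for each dataset (`xmc-base/{dataset}` and `xmc-base/{dataset}/tfidf-attnxml`). It makes no assumption about the file names inside them.

```python
from pathlib import Path

# Dataset names as listed in this README.
DATASETS = (
    "eurlex-4k", "wiki10-31k", "amazoncat-31k",
    "wiki-500k", "amazon-670k", "amazon-3m",
)


def missing_dirs(root, datasets=DATASETS):
    """Return the expected directories that do not exist under `root`.

    Hypothetical helper: checks directory presence only, not file contents.
    """
    missing = []
    for name in datasets:
        dataset_dir = Path(root) / name
        # Label embeddings are saved to {root}/{dataset}; static text
        # features are saved to {root}/{dataset}/tfidf-attnxml.
        for expected in (dataset_dir, dataset_dir / "tfidf-attnxml"):
            if not expected.is_dir():
                missing.append(expected)
    return missing
```

Calling `missing_dirs("xmc-base")` from the repository root returns an empty list once every dataset directory is in place. Pass a shorter `datasets` sequence if only some datasets were downloaded.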

Train and evaluate MatchXML

# eurlex-4k, wiki10-31k, amazoncat-31k, wiki-500k, amazon-670k, amazon-3m

bash run.sh {dataset}

Train label2vec

# eurlex-4k, wiki10-31k, amazoncat-31k, wiki-500k, amazon-670k, amazon-3m

bash ./label2vec_run/{dataset}.sh
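
The `{dataset}` placeholder in the training commands stands for one of the six dataset names listed in the comment. The sketch below builds both command lines for a given dataset and rejects unknown names. The helpers `build_commands` and `run` are hypothetical and not part of the repository. The script paths are taken from this README.

```python
import subprocess

# Dataset names as listed in this README.
DATASETS = (
    "eurlex-4k", "wiki10-31k", "amazoncat-31k",
    "wiki-500k", "amazon-670k", "amazon-3m",
)


def build_commands(dataset):
    """Return the argument lists for the two training scripts (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return {
        # Equivalent to: bash run.sh {dataset}
        "matchxml": ["bash", "run.sh", dataset],
        # Equivalent to: bash ./label2vec_run/{dataset}.sh
        "label2vec": ["bash", f"./label2vec_run/{dataset}.sh"],
    }


def run(dataset, step):
    """Run one step from the repository root; raises if the script exits non-zero."""
    subprocess.run(build_commands(dataset)[step], check=True)
```

For example, `run("eurlex-4k", "matchxml")` executed from the repository root is equivalent to `bash run.sh eurlex-4k`.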

Generate static text features

python sentence_embedding.py

Pre-trained models

  • Our pre-trained models can be downloaded from Google Drive

Citation

If you find this work useful in your research, please consider citing:

@article{ye2024matchxml,
  title={MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification},
  author={Ye, Hui and Sunderraman, Rajshekhar and Ji, Shihao},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  year={2024},
  publisher={IEEE}
}
