MatchXML

This is the official repository for the paper "MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification".

Install the environment

Create a virtual environment

# We recommend using Anaconda to create a conda environment
conda create --name matchxml python=3.8
conda activate matchxml

Install the required software:
```
pip install -r requirements.txt
```

Prepare Data

# Datasets: eurlex-4k, wiki10-31k, amazoncat-31k, wiki-500k, amazon-670k, amazon-3m

Download the six XMC datasets from XR-Transformer.
Download our trained label embeddings from Google Drive and save them to xmc-base/{dataset}.
Download our static text features (static sentence embeddings + TF-IDF features) from Google Drive and save them to xmc-base/{dataset}/tfidf-attnxml, replacing the original TF-IDF features.
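Before training, it can help to confirm that the downloads landed where the steps above say they should. The following is a minimal sketch, not part of the repository: `missing_dirs` is a hypothetical helper, and it checks only the two directories named above, since the file names inside them are not documented here.

```python
from pathlib import Path

# Dataset names as listed in this README.
DATASETS = (
    "eurlex-4k", "wiki10-31k", "amazoncat-31k",
    "wiki-500k", "amazon-670k", "amazon-3m",
)


def missing_dirs(dataset, root="xmc-base"):
    """Return the expected directories for `dataset` that do not exist yet."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    base = Path(root) / dataset
    # Only the two directories named in the steps above are checked; the
    # files inside them are not listed in this README, so none are assumed.
    expected = [base, base / "tfidf-attnxml"]
    return [str(p) for p in expected if not p.is_dir()]


if __name__ == "__main__":
    for name in DATASETS:
        missing = missing_dirs(name)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{name}: {status}")
```

Run it from the repository root; any dataset reported as missing still needs one of the downloads above.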

Train and evaluate MatchXML

# {dataset} is one of: eurlex-4k, wiki10-31k, amazoncat-31k, wiki-500k, amazon-670k, amazon-3m

bash run.sh {dataset}

Train label2vec

# {dataset} is one of: eurlex-4k, wiki10-31k, amazoncat-31k, wiki-500k, amazon-670k, amazon-3m

bash ./label2vec_run/{dataset}.sh
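Both training scripts take the same dataset names, so a small wrapper can validate the name and run them in sequence. The sketch below is illustrative only: `training_commands` and `run_all` are hypothetical helpers, and running label2vec before `run.sh` is an assumption. This README does not say where the label2vec scripts write their output or whether `run.sh` picks it up automatically, so check that before relying on it.

```python
import subprocess

# Dataset names as listed in this README.
DATASETS = (
    "eurlex-4k", "wiki10-31k", "amazoncat-31k",
    "wiki-500k", "amazon-670k", "amazon-3m",
)


def training_commands(dataset, with_label2vec=False):
    """Build the shell commands from this README for one dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    commands = []
    if with_label2vec:
        # Assumption: label embeddings are trained before MatchXML itself.
        commands.append(["bash", f"./label2vec_run/{dataset}.sh"])
    commands.append(["bash", "run.sh", dataset])
    return commands


def run_all(dataset, with_label2vec=False):
    """Run the commands in order, stopping at the first failure."""
    for cmd in training_commands(dataset, with_label2vec):
        subprocess.run(cmd, check=True)
```

For example, `run_all("eurlex-4k")` is equivalent to `bash run.sh eurlex-4k`, and an unknown dataset name fails before any script starts.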

Generate static text features

python sentence_embedding.py
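The static text features are described above as "static sentence embeddings + TF-IDF features", which suggests the two are joined per document. How `sentence_embedding.py` actually does this is not shown in this README. The sketch below illustrates plain column-wise concatenation as one plausible reading, using toy values and a hypothetical `concat_features` helper. Real features would normally be stored as sparse matrices, not Python dicts.

```python
def concat_features(tfidf_row, tfidf_dim, embedding):
    """Append a dense embedding to a sparse TF-IDF row.

    tfidf_row: dict mapping column index -> weight (sparse row).
    tfidf_dim: total number of TF-IDF columns.
    embedding: list of floats (dense sentence embedding).
    Returns (combined sparse row, combined dimension).
    """
    if any(not 0 <= col < tfidf_dim for col in tfidf_row):
        raise ValueError("TF-IDF column index out of range")
    combined = dict(tfidf_row)
    for i, value in enumerate(embedding):
        if value != 0.0:
            # Dense columns are placed after all TF-IDF columns.
            combined[tfidf_dim + i] = value
    return combined, tfidf_dim + len(embedding)


if __name__ == "__main__":
    row, dim = concat_features({0: 0.5, 3: 0.2}, 5, [0.1, 0.0, -0.3])
    print(row, dim)
```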

Pre-trained models

Our pre-trained models can be downloaded from Google Drive.

Citation

If you find this work useful in your research, please consider citing:

@article{ye2024matchxml,
  title={MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification},
  author={Ye, Hui and Sunderraman, Rajshekhar and Ji, Shihao},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  year={2024},
  publisher={IEEE}
}
