MatchXML

This is the official repository for the paper "MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification".

Install the environment

Create a virtual environment

```
# We recommend using Anaconda to create a conda environment
conda create --name matchxml python=3.8
conda activate matchxml
```

Install the required software:
```
pip install -r requirements.txt
```

Prepare Data

The six datasets are eurlex-4k, wiki10-31k, amazoncat-31k, wiki-500k, amazon-670k, and amazon-3m; {dataset} below stands for one of these names.

Download the six XMC datasets from XR-Transformer.
Download our trained label embeddings from Google Drive and save them to the xmc-base/{dataset} directory.
Download our static text features (static sentence embeddings + TF-IDF features) from Google Drive and save them to xmc-base/{dataset}/tfidf-attnxml, replacing the original TF-IDF features.
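
Before training, it can help to confirm that the downloads landed where the steps above say they should. The following is a minimal sketch and not part of the repository: the helper name `missing_dirs` is hypothetical, and it checks only the two directories named above (`xmc-base/{dataset}` and `xmc-base/{dataset}/tfidf-attnxml`), because the file names inside them are not listed here.

```python
from pathlib import Path

# Dataset names as listed in this README.
DATASETS = (
    "eurlex-4k", "wiki10-31k", "amazoncat-31k",
    "wiki-500k", "amazon-670k", "amazon-3m",
)


def missing_dirs(dataset, root="xmc-base"):
    """Return the expected directories for `dataset` that do not exist yet.

    Hypothetical helper: it only checks the directory layout described in
    this README, not the files inside those directories.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    base = Path(root) / dataset
    expected = [base, base / "tfidf-attnxml"]
    return [str(p) for p in expected if not p.is_dir()]


if __name__ == "__main__":
    # Report the layout status of every dataset under ./xmc-base.
    for name in DATASETS:
        missing = missing_dirs(name)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{name}: {status}")
```

An empty list means both directories are present for that dataset; it says nothing about whether the files inside them are complete.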

Train and evaluate MatchXML

```
# eurlex-4k, wiki10-31k, amazoncat-31k, wiki-500k, amazon-670k, amazon-3m
bash run.sh {dataset}
```
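
To run training and evaluation on several datasets in sequence, a small wrapper around the command above can be used. This is a sketch under stated assumptions: `build_command` and `run_all` are hypothetical names, the code is assumed to be called from the repository root, and `run.sh` is assumed to take only the dataset name, as shown above.

```python
import subprocess

# Dataset names as listed in this README.
DATASETS = (
    "eurlex-4k", "wiki10-31k", "amazoncat-31k",
    "wiki-500k", "amazon-670k", "amazon-3m",
)


def build_command(dataset):
    """Return the argument list for `bash run.sh {dataset}`.

    Hypothetical helper: rejects names that are not in DATASETS so a typo
    fails immediately instead of partway through a long run.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return ["bash", "run.sh", dataset]


def run_all(datasets):
    """Run training and evaluation for each dataset, one after another."""
    # Validate every name first, before launching any run.
    commands = [build_command(d) for d in datasets]
    for cmd in commands:
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError and stops at the first failure.
        subprocess.run(cmd, check=True)
```

Calling `run_all(["eurlex-4k", "wiki10-31k"])` from the repository root would then run the two datasets back to back and stop if either run exits with an error.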

Train label2vec

```
# eurlex-4k, wiki10-31k, amazoncat-31k, wiki-500k, amazon-670k, amazon-3m
bash ./label2vec_run/{dataset}.sh
```

Generate static text features

```
python sentence_embedding.py
```

Pre-trained models

Our pre-trained models can be downloaded from Google Drive.

Citation

If you find this work useful in your research, please consider citing:

```
@article{ye2024matchxml,
  title={MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification},
  author={Ye, Hui and Sunderraman, Rajshekhar and Ji, Shihao},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  year={2024},
  publisher={IEEE}
}
```
