DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs

Dataset

All eight dynamic text-attributed graphs provided by DTGB can be downloaded from here.

Data Format

Each graph is stored in three files:

edge_list.csv: stores each edge of the DyTAG as a tuple (u, v, r, t, l), where u is the id of the source entity, v is the id of the target entity, r is the id of the relation between them, t is the timestamp at which the edge occurs, and l is the label of the edge.
entity_text.csv: stores the mapping from entity ids (e.g., u and v) to the text descriptions of entities.
relation_text.csv: stores the mapping from relation ids (e.g., r) to the text descriptions of relations.
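
The three files can be read with Python's standard csv module. This is a minimal sketch, not the repository's loader: the header names (u, v, r, t, l for edge_list.csv and id, text for the two text files) are assumptions, so adjust them to the actual CSV headers. Toy in-memory files stand in for the real ones.

```python
import csv
import io
from typing import Dict, List, NamedTuple, TextIO


class Edge(NamedTuple):
    """One DyTAG edge (u, v, r, t, l)."""
    u: int    # source entity id
    v: int    # target entity id
    r: int    # relation id
    t: float  # timestamp at which the edge occurs
    l: int    # edge label


def read_edges(f: TextIO) -> List[Edge]:
    # ASSUMPTION: edge_list.csv has a header row with columns named u, v, r, t, l.
    return [
        Edge(int(row["u"]), int(row["v"]), int(row["r"]), float(row["t"]), int(row["l"]))
        for row in csv.DictReader(f)
    ]


def read_text_map(f: TextIO, id_col: str = "id", text_col: str = "text") -> Dict[int, str]:
    # ASSUMPTION: entity_text.csv / relation_text.csv have one id column and one text column.
    return {int(row[id_col]): row[text_col] for row in csv.DictReader(f)}


if __name__ == "__main__":
    # Toy files standing in for the real CSVs.
    edges = read_edges(io.StringIO("u,v,r,t,l\n0,1,0,1.0,2\n1,2,1,3.5,0\n"))
    entities = read_text_map(io.StringIO("id,text\n0,Alice\n1,Bob\n2,Carol\n"))
    relations = read_text_map(io.StringIO("id,text\n0,praises\n1,criticizes\n"))
    for e in edges:
        # Render each edge as "source relation target" using the text mappings.
        print(entities[e.u], relations[e.r], entities[e.v], "at t =", e.t, "label", e.l)
```

Because DictReader handles quoted fields, text descriptions containing commas are parsed correctly.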

Usage

After downloading the datasets, uncompress them into the DyLink_Datasets folder.
Run get_pretrained_embeddings.py to obtain the BERT-based node and edge text embeddings, which are saved as e_feat.npy and r_feat.npy, respectively.
Run get_LLM_data.ipynb to build the training and test sets for the textual relation generation task, which are saved as LLM_train.pkl and LLM_test.pkl, respectively.
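
Once e_feat.npy and r_feat.npy exist, they can be loaded with np.load and indexed by id. The sketch below assumes (not confirmed here) that row i of e_feat.npy is the embedding of entity id i and row i of r_feat.npy is the embedding of relation id i; edge_features is a hypothetical helper that concatenates the three text embeddings of one edge. Toy arrays replace the real files.

```python
import numpy as np


def edge_features(e_feat: np.ndarray, r_feat: np.ndarray, u: int, v: int, r: int) -> np.ndarray:
    """Concatenate source-entity, target-entity and relation text embeddings for one edge.

    ASSUMPTION: row i of e_feat holds entity id i, row i of r_feat holds relation id i.
    """
    return np.concatenate([e_feat[u], e_feat[v], r_feat[r]])


if __name__ == "__main__":
    # Real usage: e_feat = np.load(<path to e_feat.npy>), r_feat = np.load(<path to r_feat.npy>).
    # Toy arrays stand in here with 4-dimensional embeddings.
    e_feat = np.random.rand(3, 4).astype(np.float32)
    r_feat = np.random.rand(2, 4).astype(np.float32)
    x = edge_features(e_feat, r_feat, u=0, v=2, r=1)
    print(x.shape)  # (12,)
```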

Reproduce the Results

Future Link Prediction Task

Example of training DyGFormer on the GDELT dataset without text attributes:

python train_link_prediction.py --dataset_name GDELT --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --num_runs 5 --gpu 0 --use_feature no

Example of training DyGFormer on the GDELT dataset with text attributes:

python train_link_prediction.py --dataset_name GDELT --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --num_runs 5 --gpu 0 --use_feature Bert
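
The two commands above differ only in --use_feature. A small driver can run both settings in sequence; this is a sketch with hypothetical helper names (link_prediction_cmd, sweep), not part of the repository. It only prints the commands unless dry_run is set to False, and it should be run from the repository root.

```python
import shlex
import subprocess
from typing import List


def link_prediction_cmd(dataset: str, use_feature: str, gpu: int = 0) -> List[str]:
    """Build the DyGFormer link-prediction command for one --use_feature setting."""
    if use_feature not in ("no", "Bert"):
        raise ValueError("use_feature must be 'no' or 'Bert'")
    return [
        "python", "train_link_prediction.py",
        "--dataset_name", dataset,
        "--model_name", "DyGFormer",
        "--patch_size", "2",
        "--max_input_sequence_length", "64",
        "--num_runs", "5",
        "--gpu", str(gpu),
        "--use_feature", use_feature,
    ]


def sweep(dataset: str, dry_run: bool = True) -> List[List[str]]:
    """Print (and optionally run) the command for both feature settings."""
    cmds = [link_prediction_cmd(dataset, feature) for feature in ("no", "Bert")]
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing run
    return cmds


if __name__ == "__main__":
    sweep("GDELT")
```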

The AP and AUC-ROC metrics on the test set (in both the transductive and inductive settings) are automatically saved in saved_resuts/DyGFormer/GDELT/DyGFormer_seed0no.json.
The best checkpoint is saved in the saved_resuts/DyGFormer/GDELT/ folder and is later used to reproduce the performance on the destination node retrieval task.
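
For reference, AP and AUC-ROC can be computed from the labels of the test edges (1 for a real future edge, 0 for a sampled negative) and the model's scores. The pure-Python sketch below uses the standard definitions; it is not the repository's implementation. For untied scores, the results match scikit-learn's average_precision_score and roc_auc_score.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP: mean precision at the rank of each positive (assumes untied scores, >= 1 positive)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` edges
    return total / hits


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.1]
    print(roc_auc(labels, scores))            # 0.75
    print(average_precision(labels, scores))  # (1/1 + 2/3) / 2
```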

Destination Node Retrieval Task

After obtaining the best checkpoint from the Future Link Prediction Task, the Hits@k metrics of the Destination Node Retrieval Task can be reproduced by running:

python evaluate_node_retrieval.py --dataset_name GDELT --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --negative_sample_strategy random --num_runs 5 --gpu 0 --use_feature no

The negative_sample_strategy hyper-parameter controls the candidate sampling strategy and can be either random or historical.
The use_feature hyper-parameter controls whether BERT-based embeddings are used and can be either no or Bert.
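
To illustrate the evaluation, the sketch below draws negative destination candidates and computes Hits@k for one query. It is a simplified assumption of how the two strategies behave, not the repository's code: random draws uniformly from all node ids, while historical draws from destinations seen before the query time and tops up with random nodes if that pool is too small. Hits@k is 1 when the true destination ranks within the top k of itself plus the candidates.

```python
import random
from typing import List, Sequence, Set


def sample_candidates(dst: int, num_nodes: int, history: Set[int], k: int,
                      strategy: str, rng: random.Random) -> List[int]:
    """Return k distinct negative candidates, never including the true destination dst.

    SIMPLIFIED ASSUMPTION: 'historical' samples from previously seen destinations first.
    """
    pool_all = [n for n in range(num_nodes) if n != dst]
    if strategy == "random":
        return rng.sample(pool_all, k)
    if strategy == "historical":
        hist = [n for n in sorted(history) if n != dst]
        chosen = rng.sample(hist, min(k, len(hist)))
        taken = set(chosen)
        rest = [n for n in pool_all if n not in taken]
        return chosen + rng.sample(rest, k - len(chosen))  # top up with random nodes
    raise ValueError(f"unknown strategy: {strategy}")


def hits_at_k(true_score: float, neg_scores: Sequence[float], k: int) -> float:
    """1.0 if the true destination ranks in the top k, else 0.0."""
    rank = 1 + sum(s > true_score for s in neg_scores)  # ties resolved in favour of the true node
    return 1.0 if rank <= k else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    negs = sample_candidates(dst=3, num_nodes=10, history={1, 3, 5}, k=4,
                             strategy="historical", rng=rng)
    print(negs)  # contains 1 and 5 plus two random nodes, never 3
    neg_scores = [rng.random() for _ in negs]  # stand-in for model scores
    print(hits_at_k(0.9, neg_scores, k=1))
```

Averaging hits_at_k over all test queries gives the Hits@k value for the test set.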

Edge Classification Task

Example of training DyGFormer on the GDELT dataset without text attributes:

python train_edge_classification.py --dataset_name GDELT --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --num_runs 5 --gpu 0 --use_feature no

The Precision, Recall, and F1-score metrics on the test set will be automatically saved in saved_resuts/DyGFormer/GDELT/edge_classification_DyGFormer_seed0no.json
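
Since edge labels l can take more than two values, Precision, Recall, and F1-score need an averaging scheme across classes. The sketch below uses macro averaging (an unweighted mean over classes); this is an assumption, and the script may use a different average such as weighted.

```python
from collections import Counter
from typing import Sequence, Tuple


def macro_prf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true label t
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    print(macro_prf([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]))
```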

Textual Relation Generation Task

After obtaining the LLM_train.pkl and LLM_test.pkl files, you can directly reproduce the performance of the original LLMs by running:

python LLM_eval.py -config_path=LLM_configs/vicuna_7b_qlora_uncensored.yaml -model=raw

You can switch to a different LLM via the config_path hyper-parameter.
The generated text will be saved in s_his_o_des_his_result_vicuna7b.pkl.

To compute the BERTScore metrics, update the file path in LLM_metric.py and run:

python LLM_metric.py
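
BERTScore compares a generated text with a reference text through their contextual token embeddings. The sketch below shows only the core greedy-matching step on given embeddings, without IDF weighting or baseline rescaling, and is not taken from LLM_metric.py. In practice, the bert_score package's score(cands, refs, lang="en") computes the full metric with a pretrained model.

```python
import numpy as np


def bertscore_from_embeddings(cand: np.ndarray, ref: np.ndarray) -> tuple:
    """Greedy-matching BERTScore (P, R, F1) from token embeddings of shape (n_tokens, dim).

    Simplified: no IDF weighting and no baseline rescaling.
    """
    cand = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sim = cand @ ref.T                  # cosine similarities, shape (n_cand, n_ref)
    precision = sim.max(axis=1).mean()  # each candidate token -> best reference token
    recall = sim.max(axis=0).mean()     # each reference token -> best candidate token
    f1 = 2 * precision * recall / (precision + recall)
    return float(precision), float(recall), float(f1)


if __name__ == "__main__":
    cand = np.array([[1.0, 0.0], [0.0, 1.0]])  # two candidate tokens
    ref = np.array([[1.0, 0.0]])               # one reference token
    print(bertscore_from_embeddings(cand, ref))
```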

To fine-tune the LLMs, run:

python LLM_train.py LLM_configs/vicuna_7b_qlora_uncensored.yaml

and then reproduce the performance of the fine-tuned LLMs by running:

python LLM_eval.py -config_path=LLM_configs/vicuna_7b_qlora_uncensored.yaml -model=lora
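
The three LLM commands of this section can be chained for a given config. The sketch below uses a hypothetical helper (llm_pipeline) that only prints the commands unless dry_run is set to False; the BERTScore step is left out because LLM_metric.py needs its file path edited by hand.

```python
import shlex
import subprocess
from typing import List

CONFIG = "LLM_configs/vicuna_7b_qlora_uncensored.yaml"


def llm_pipeline(config_path: str = CONFIG, dry_run: bool = True) -> List[List[str]]:
    """Evaluate the raw LLM, fine-tune it, then evaluate the fine-tuned (LoRA) model."""
    cmds = [
        ["python", "LLM_eval.py", f"-config_path={config_path}", "-model=raw"],
        ["python", "LLM_train.py", config_path],
        ["python", "LLM_eval.py", f"-config_path={config_path}", "-model=lora"],
    ]
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop if any step fails
    return cmds


if __name__ == "__main__":
    llm_pipeline()
```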

Contact

For any questions or suggestions, please open an issue or contact us at [email protected].

Acknowledgements

The code and model implementations are based on the DyGLib project. Thanks for their great contributions!

Reference

@article{zhang2024dtgb,
  title={DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs},
  author={Zhang, Jiasheng and Chen, Jialin and Yang, Menglin and Feng, Aosong and Liang, Shuang and Shao, Jie and Ying, Rex},
  journal={arXiv preprint arXiv:2406.12072},
  year={2024}
}
