
Classification of Edge-dependent Labels of Nodes in Hypergraphs

We provide (1) datasets and source code for (2) the benchmark task, (3) the downstream tasks, and (4) the ablation studies of WHATsNET from the paper: Classification of Edge-Dependent Labels of Nodes in Hypergraphs, Minyoung Choe, Sunwoo Kim, Jaemin Yoo, and Kijung Shin, KDD 2023.

(1) Datasets

We provide six real-world datasets for our new benchmark task (/dataset/) and the preprocessing code (/dataset/PreprocessCode/):

  • Co-authorship: DBLP and AMinerAuthor
  • Email: Enron and Eu
  • StackOverflow: Biology and Physics
# File Organization

|__ hypergraph.txt              # used for constructing the hypergraph; the i-th line lists the nodes v_1, v_2, ... of the i-th hyperedge
|__ hypergraph_pos.txt          # used for edge-dependent node labels; the i-th line lists the labels of v_1, v_2, ... within the i-th hyperedge (same node order as hypergraph.txt)
|__ [valid/test]_hindex_0.txt   # used for splitting train/valid/test
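
As a concrete illustration, below is a minimal Python sketch of reading the two paired files together; the comma delimiter, file paths, and function name are assumptions for illustration, not the repository's own loader.

# Minimal sketch: read hyperedges and their edge-dependent node labels in parallel.
# Assumes comma-separated lines; adjust the delimiter to match the actual files.
def load_hypergraph(edge_path="hypergraph.txt", label_path="hypergraph_pos.txt"):
    hyperedges, edge_labels = [], []
    with open(edge_path) as fe, open(label_path) as fl:
        for edge_line, label_line in zip(fe, fl):
            nodes = edge_line.strip().split(",")
            labels = label_line.strip().split(",")
            assert len(nodes) == len(labels)  # labels align with nodes in the same order
            hyperedges.append(nodes)
            edge_labels.append(labels)
    return hyperedges, edge_labels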

(2) Benchmark Task

We provide source code for running WHATsNET as well as nine competitors on all the benchmark datasets above:

  • BaselineU and BaselineP
  • HNHN, HGNN, HCHA, HAT, UniGCNII, HNN
  • HST, AST
  • WHATsNET

(3) Downstream Task

We apply our benchmark task to the following downstream tasks: ranking aggregation, clustering, and product return prediction (see Run Downstream Tasks below).

(4) Reproducing All Results in the Paper

  • Ablation studies of WHATsNET
      • w/o WithinATT and WithinOrderPE
      • WHATsNET-IM
      • Positional encoding schemes
      • Replacing WithinATT in updating node embeddings
      • Number of inducing points
      • Types of node centralities
  • Visualization of WHATsNET
  • Evaluation of WHATsNET on node-level label distribution preservation

How to Run

Preprocessing

Before training WHATsNET, node centralities must be computed:

cd preprocess
python nodecentrality.py --algo [degree,kcore,pagerank,eigenvec] --dataname [name of dataset]
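
For intuition, the sketch below shows one way a degree-based node centrality could be computed directly from hypergraph.txt (a node's degree is the number of hyperedges containing it); it is an illustration under an assumed comma-separated format, not the implementation in preprocess/nodecentrality.py.

# Minimal sketch: degree centrality of each node from hypergraph.txt.
from collections import Counter

def degree_centrality(path="hypergraph.txt"):
    degree = Counter()
    with open(path) as f:
        for line in f:
            nodes = set(line.strip().split(","))  # assumed delimiter; count each hyperedge once per node
            degree.update(nodes)
    return dict(degree)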

Run WHATsNET

You can

  • train WHATsNET,
  • evaluate WHATsNET on the JSD of node-level label distributions (a JSD sketch follows the command block below),
  • predict edge-dependent node labels with a trained WHATsNET, and
  • analyze node embeddings for visualization: the concatenated embeddings of a node-hyperedge pair and the node embeddings before/after WithinATT,

by running the command below:

python train.py/evaluate.py/predict.py/analysis.py  --vorder_input "degree_nodecentrality,eigenvec_nodecentrality,pagerank_nodecentrality,kcore_nodecentrality" 
                                                 --embedder whatsnet --att_type_v OrderPE --agg_type_v PrevQ --att_type_e OrderPE --agg_type_e PrevQ 
                                                 --dataset_name [name of dataset]
                                                 --num_att_layer [number of layers in WithinATT]
                                                 --num_layers [number of layers] 
                                                 --bs [batch size]
                                                 --lr [learning rate]
                                                 --sampling [size of sampling incident hyperedges in aggregation at nodes]
                                                 [--analyze_att  when running analysis.py]
                                                 --scorer sm --scorer_num_layers 1 --dropout 0.7 --optimizer "adam" --k 0 --gamma 0.99 --dim_hidden 64 --dim_edge 128 --dim_vertex 128 --epochs 100 --test_epoch 5
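
For reference, the JSD evaluation mentioned above compares node-level label distributions. The snippet below is a hedged sketch of the Jensen-Shannon divergence between two discrete distributions (represented as probability vectors over the label classes); it is an illustration, not the repository's evaluate.py.

# Minimal sketch: Jensen-Shannon divergence between two label distributions.
import numpy as np
from scipy.stats import entropy  # entropy(p, q) gives the KL divergence KL(p || q)

def jsd(p, q, base=2):
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * entropy(p, m, base=base) + 0.5 * entropy(q, m, base=base)

# e.g., ground-truth vs. predicted label distribution of one node over 3 label classes (hypothetical numbers)
print(jsd([0.6, 0.3, 0.1], [0.5, 0.4, 0.1]))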

Run Benchmark Tasks

You can run all ten models on each dataset (DBLP, AMinerAuthor, emailEnron, emailEu, StackOverflowBiology, StackOverflowPhyscis) by

cd run
./run_[DBLP,AMinerAuthor,emailEnron,emailEu,StackOverflowBiology,StackOverflowPhyscis].sh

For each model, we use the hyperparameters chosen from the search space by the best mean of Micro-F1 and Macro-F1.
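
As a sketch of this selection criterion (illustrative names, not the repository's tuning script), each hyperparameter setting can be scored by the mean of Micro-F1 and Macro-F1 on the validation split:

# Minimal sketch: model-selection score = mean of Micro-F1 and Macro-F1.
from sklearn.metrics import f1_score

def selection_score(y_true, y_pred):
    micro = f1_score(y_true, y_pred, average="micro")
    macro = f1_score(y_true, y_pred, average="macro")
    return (micro + macro) / 2.0

# best_setting = max(settings, key=lambda s: selection_score(y_valid, predict_with(s)))  # hypothetical helpers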

Run Downstream Tasks

We provide the edge-dependent node labels predicted by WHATsNET, as well as by AST and HST, in train_results/.

We also provide shell scripts for the all-in-one process (train, predict, and evaluate on the downstream task) in run/DownstreamTask/.

You can run the three downstream tasks with WHATsNET and the baselines as follows:

  • Ranking Aggregation: in the RankingAggregation directory, run ranking.py for the Halo2 game dataset and aminer_ranking.py for the AMiner dataset with author H-index
  • Clustering: in the Clustering directory, run clustering.py for DBLP and clustering_aminer.py for AMiner
  • Product Return Prediction: in the ProductReturnPred directory, build the synthetic dataset with makedata/Simulate data.ipynb and prepare it for training models on our benchmark task with makedata/MakeHypergraph.ipynb. After training the models, run makedata/prepare_predicted.py and evaluate them with script/main_prod.py

Run Ablation Studies

You can also run all ablation studies of WHATsNET by

cd run
./run_ablation.sh
./run_ablation_centrality.sh

Environment

The environment for running the code is specified in requirements.txt. Additionally, install the required libraries by following install.sh.
