Demonstration of REIGNN - REcommender Inductive Graph Neural Network | Version 1

Welcome to the official repository of the REIGNN demonstration app, a web service for assessing scientific collaborations, powered by a GNN-based recommender system. This repository contains the source code for the CIKM'22 paper "Demonstration of REIGNN: R&D Team Management with Graph Machine Learning".

Artyom Sosedka, Anastasia Martynova, Vladislav Tishin, Natalia Semenova, Vadim Porvatov

arXiv PDF: to be added.

Prerequisites

For the REIGNN model:

numpy==1.19.5
pandas==1.3.0
torch==1.8.1
torch-sparse==0.6.12  
torch-scatter==2.0.8
torch-cluster==1.5.9
torch-spline-conv==1.2.1
torch-geometric==2.0.3
wandb==0.12.9

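Note: installing PyTorch Geometric with GPU support can be tricky, because the companion packages listed above (torch-sparse, torch-scatter, torch-cluster, torch-spline-conv) must be built for the same PyTorch and CUDA versions as the installed torch.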
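
Because version mismatches are the usual source of trouble here, a quick check of the installed packages against the pins above can save time. The following is a minimal sketch using only the standard library; the helper name `check_pins` is hypothetical and not part of this repository.

```python
from importlib import metadata

# Pins copied from the model prerequisites listed above.
PINS = {
    "numpy": "1.19.5",
    "pandas": "1.3.0",
    "torch": "1.8.1",
    "torch-sparse": "0.6.12",
    "torch-scatter": "2.0.8",
    "torch-cluster": "1.5.9",
    "torch-spline-conv": "1.2.1",
    "torch-geometric": "2.0.3",
    "wandb": "0.12.9",
}


def check_pins(pins):
    """Return {package: (expected, found)} for every package that deviates."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        # A local build tag such as "1.8.1+cu111" still counts as a match.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package is installed at the expected version; it does not verify that the compiled extensions match the local CUDA toolkit.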

For backend:

python = "~3.8"
fastapi = "^0.73.0"
pydantic = { extras = ["dotenv"], version = "^1.8.2" }
uvicorn = "^0.17.4"
cookiecutter = "^1.7.3"
loguru = "^0.6.0"
pandas = "^1.4.0"
scipy = "^1.8.0"
networkx = "^2.6.3"
mypy = "^0.931"
ujson = "^5.1.0"

For frontend:

TBD

Dataset

We propose a dataset for evaluating the REIGNN model. The initial data was gathered from the Semantic Scholar Open Research Corpus and the SCImago Journal & Country Rank website.

CS1021_general
Network type	Co-authorship	Citation
Nodes	2798829	2504381
Edges	30796749	17934023
Clustering	0.714	0.138

CS1021_demo
Network type	Co-authorship	Citation
Nodes	3808	12618
Edges	31332	34975
Clustering	0.644	0.199
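
The statistics in the tables above can be reproduced for any edge list. The sketch below is a pure-Python illustration under two assumptions that the README does not state: that "Clustering" means the average local clustering coefficient, and that the graph is treated as undirected and simple. The function name `graph_stats` and the toy edge list are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def graph_stats(edges):
    """Return (nodes, edges, average clustering) of an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    # Every undirected edge is stored twice, once per endpoint.
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    local = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            local.append(0.0)  # nodes with fewer than two neighbours count as 0
            continue
        # Count the neighbour pairs that are themselves connected.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        local.append(2.0 * links / (k * (k - 1)))
    avg = sum(local) / len(local) if local else 0.0
    return len(adj), n_edges, avg


# Toy graph: a triangle (0, 1, 2) with one pendant node attached to node 2.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(graph_stats(toy_edges))
```

Nodes that never appear in the edge list are not counted, so isolated nodes would have to be added separately.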

To obtain the full dataset, download the additional files by running download.sh. The final file structure consists of a general part and a local part of the dataset.

General part

We use the large subgraph CS1021_general, extracted from the Semantic Scholar Corpus, as the basis for the CS1021_demo dataset. All papers were published between January 1st, 2011 and December 31st, 2021 and belong to the area of Computer Science.

SSORC_CS_2010_2021_authors_edge_list.csv - common graph edge list.
SSORC_CS_2010_2021_authors_edges_papers_indices.csv - common table describing relations between edges in a co-authorship graph (collaborations) and nodes in a citation graph (papers).
SSORC_CS_2010_2021_authors_features.csv - table with one-hot encoded authors' research interests.
SSORC_CS_2010_2021_papers_features_vectorized_compressed_32.csv - table of paper abstracts vectorized with the Universal Sentence Encoder.
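
To make the encoding of research interests concrete, the sketch below turns a list of topics into a binary indicator vector. The vocabulary, the function name `encode_interests`, and the choice to set several positions for an author with several interests are assumptions for illustration; the actual column layout of the features file is not described here.

```python
def encode_interests(interests, vocabulary):
    """Binary indicator vector with a 1 for every interest the author has."""
    index = {topic: i for i, topic in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for topic in interests:
        vector[index[topic]] = 1  # raises KeyError for an unknown topic
    return vector


# Hypothetical topic vocabulary, for illustration only.
VOCAB = ["databases", "graphics", "machine learning", "networks"]
print(encode_interests(["machine learning", "databases"], VOCAB))
```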

Local part

<...>authors.edgelist - edge list of the dataset's co-authorship graph.
<...>papers.edgelist - edge list of the dataset's citation graph.
<...>authors_edges_papers_indices.csv - table describing relations between edges in a co-authorship graph (collaborations) and nodes in a citation graph (papers).
<...>papers_targets.csv - target values for each auxiliary task regarding edges in a co-authorship graph.
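
The relation between collaborations and papers described above can be pictured as a mapping from each co-authorship edge to the papers written together. The sketch below builds such a mapping from a small in-memory CSV; the column names (`author_a`, `author_b`, `paper_index`) and the helper `load_edge_to_papers` are hypothetical and do not reflect the actual schema of the repository files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt: column names and layout are assumptions for
# illustration only, not the real schema of the dataset files.
SAMPLE = """author_a,author_b,paper_index
0,1,10
0,1,11
2,1,11
"""


def load_edge_to_papers(handle):
    """Map each undirected collaboration (a, b) to the set of paper indices."""
    mapping = defaultdict(set)
    for row in csv.DictReader(handle):
        a, b = int(row["author_a"]), int(row["author_b"])
        edge = (min(a, b), max(a, b))  # canonical order for undirected edges
        mapping[edge].add(int(row["paper_index"]))
    return dict(mapping)


edge_to_papers = load_edge_to_papers(io.StringIO(SAMPLE))
print(edge_to_papers)
```

Storing each edge in a canonical order keeps (1, 2) and (2, 1) from being treated as different collaborations.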

App running

To create the .venv and install the dependencies needed to run the app, execute backend/setup.sh. To launch the server, run backend/start.sh.

Model running

The multitask version of the REIGNN model is available at model/REIGNN.py. If you want to perform tests in a separate environment, you can use the following code:

import torch
import torch.nn as nn

from model.dataloader import get_data
from model.utils import run
from model.REIGNN import REIGNN

root_dir = '../'
dataset_name, split_name, split_number = 'CS1021small', '5_0.1', 0
citation_graph, train_data, val_data, test_data, authors_to_papers, batch_list_x, batch_list_owner = get_data(root_dir, dataset_name, split_name, split_number)

# Global training settings
epochs_per_launch, lr = 15000, 0.001
# Fall back to CPU when no CUDA device is available
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# Local (architecture) settings: number of convolutional layers and latent sizes
# for the citation (c_) and co-authorship (a_) parts of the model
c_conv_num, c_latent_size, a_conv_num, a_latent_size = 2, 128, 3, 384
operator, link_size, heads = "hadamard", 128, 1

# Multitask weights
mt_weights = [0.05, 0.05, 0.05, 0.05]

# W&B parameters
wandb_output, project_name, entity, group = False, 'REIGNN', 'test_entity', 'test_group'

# Define the model
model = REIGNN(citation_graph.to(device), heads, device,
               train_data.to(device), val_data.to(device), test_data.to(device),
               authors_to_papers,
               cit_layers=c_conv_num, latent_size_cit=c_latent_size,
               auth_layers=a_conv_num, latent_size_auth=a_latent_size,
               link_size=link_size).to(device)

# Adam optimizer with an L1 (mean absolute error) criterion
optimizer, criterion = torch.optim.Adam(model.parameters(), lr=lr), nn.L1Loss()
run(wandb_output, project_name, group, entity, mt_weights, model, optimizer, criterion, operator, batch_list_x, batch_list_owner, epochs_per_launch)
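
This README does not spell out how `operator` and `mt_weights` are consumed inside `run`. The sketch below only illustrates what these names conventionally mean: a Hadamard operator builds an edge representation as the element-wise product of two node embeddings, and multitask weights scale auxiliary losses before they are added to the main loss. The function names and the exact form of the combined loss are assumptions, written in plain Python so that the arithmetic is easy to follow.

```python
def hadamard(z_u, z_v):
    """Element-wise product of two equally sized node embeddings."""
    if len(z_u) != len(z_v):
        raise ValueError("embeddings must have the same length")
    return [a * b for a, b in zip(z_u, z_v)]


def l1_loss(pred, target):
    """Mean absolute error, the quantity nn.L1Loss computes by default."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def multitask_loss(main_loss, aux_losses, weights):
    """Main-task loss plus a weighted sum of auxiliary losses (assumed form)."""
    if len(aux_losses) != len(weights):
        raise ValueError("one weight per auxiliary loss is required")
    return main_loss + sum(w * l for w, l in zip(weights, aux_losses))


# Edge representation for a pair of toy three-dimensional embeddings.
edge_repr = hadamard([1.0, 2.0, 3.0], [0.5, 0.0, 2.0])

# Four auxiliary tasks with the same toy L1 error, weighted by 0.05 each.
aux = [l1_loss([1.0, 2.0], [1.5, 2.5])] * 4
total = multitask_loss(0.3, aux, [0.05, 0.05, 0.05, 0.05])
print(edge_repr, total)
```

With small weights such as 0.05, the auxiliary tasks act as a mild regularizer rather than dominating the main objective.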

Contact us

If you have questions about the code, you are welcome to open an issue or send us an email; we will respond as soon as possible.

License

The code is released as open-source software under the MIT license.

Citation

To be added
