Project for Machine Learning on Real World Networks
- The `docs` folder contains all of the reports: M0, M1, M2, and M3.
  - The final report is under the `M3` folder.
- The `graph_creation` folder contains code that creates the `5-Day Contact Network` and analyzes it.
  - `main.ipynb` creates the contact network.
  - `analysis.ipynb` analyzes the contact network.
  - The code can be run by executing the code blocks in the `.ipynb` files.
- The `link_prediction` folder contains code for static link prediction as well as node feature generation for the nodes in the `5-Day Contact Network`.
  - `feature_selection_method1` contains code to generate `average locations travelled to per day` and `average distance travelled per day` for each node.
  - `feature_selection_method2` contains code to extract `age`, `gender`, and `SAG score` for each node.
  - `feature_combination` contains code that combines all features and normalizes them.
  - `main.ipynb` contains the code to train the static link prediction model on the 5-Day Contact Network.
  - `link_prediction_next_day.ipynb` contains the code to perform link prediction on the 6th day using the static link prediction model.
  - The code can be run by executing the code blocks in the `.ipynb` files.
- The `link_prediction_graph_creation` folder contains code that creates the `Temporal Contact Networks`, performs analysis on those networks, generates node features for nodes in the `Temporal Contact Networks`, formats the 62 contact networks for use by the TGN model, and performs an SIR simulation on the `Temporal Contact Networks`.
  - `main.py` creates the 62 contact networks, one for each day from July 1st, 2020 to August 31st, 2020.
  - `feature_selection_method1` contains code to generate `average locations travelled to per day` and `average distance travelled per day` for each node.
  - `feature_selection_method2` contains code to extract `age`, `gender`, and `SAG score` for each node.
  - `feature_combination` contains code that combines all features and normalizes them.
  - `generate_link_pred_input.ipynb` contains code that uses the Temporal Contact Networks and their features to generate a data format that can be used by the TGN link prediction framework (it converts `.gexf` files to `csv` and `npy` files).
  - `simulation.py` contains code that uses the `Temporal Contact Networks` to run an SIR simulation.
  - The code can be run by executing the code blocks in the `.ipynb` files.
- The `tgn` folder contains the code (which is open source and created by the authors of TGN) that is used for the temporal link prediction.
  - Original TGN GitHub: https://github.com/twitter-research/tgn
  - Original TGN Paper: https://arxiv.org/abs/2006.10637
  - `data` contains data used by the framework, which includes edge data and node data.
    - Data, such as node features and edge information in csv format, was created in `link_prediction_graph_creation/generate_link_pred_input.ipynb` and imported to this `tgn/data` directory.
  - `evaluation` contains code that evaluates the model. We modified this to plot the ROC AUC curve and generate the confusion matrix, which can be used to determine recall.
    - We modified this file to generate additional result statistics, such as the confusion matrix, and plot the results.
  - `modules` contains code for the main modules in the TGN framework.
    - `memory.py` - stores memory for the nodes
    - `message_function.py` - generates messages (information) regarding each node interaction (such as creation of edges)
    - `message_aggregator.py` - combines a batch of messages to keep one per node
    - `memory_updater.py` - updates the memory using the current memory and new messages
    - `embedding_module.py` - generates node embeddings by using the node interaction (such as creation of edges) and the node memory
  - `model` contains code for the model that is used by the embedding module to create node embeddings.
  - `train_self_supervised.py` contains the code to train and test the temporal link prediction model.
    - The model can be trained and tested by running the following command: `python3 train_self_supervised.py -d data --use_memory --prefix tgn-attn-data --n_runs 10`
  - `generate_visuals.ipynb` contains the code to generate charts and graphs, such as the ROC Curve and Precision-Recall Curve, to visualize the results.
    - The code can be run by executing the code blocks in the `.ipynb` files.
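
The `link_prediction_graph_creation` entry describes an SIR simulation run over the daily `Temporal Contact Networks`. The sketch below illustrates the general idea of a discrete-time SIR process on a sequence of daily edge lists. It is not the contents of `simulation.py`: the function name `run_sir`, the edge-list input format, and the per-contact infection probability `beta` and daily recovery probability `gamma` are all assumptions.

```python
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def run_sir(
    daily_edges: List[List[Edge]],
    initially_infected: Set[str],
    beta: float,
    gamma: float,
    seed: int = 0,
) -> List[Dict[str, int]]:
    """Simulate SIR over one contact network per day and return daily S/I/R counts.

    beta:  assumed probability that one contact with an infected node infects
           a susceptible node on that day.
    gamma: assumed probability that an infected node recovers on a given day.
    """
    rng = random.Random(seed)
    nodes = {n for edges in daily_edges for edge in edges for n in edge}
    nodes |= set(initially_infected)
    infected: Set[str] = set(initially_infected)
    recovered: Set[str] = set()
    history: List[Dict[str, int]] = []

    for edges in daily_edges:
        # Infection step: only nodes infected at the start of the day can transmit.
        newly_infected: Set[str] = set()
        for u, v in edges:
            for src, dst in ((u, v), (v, u)):
                susceptible = dst not in infected and dst not in recovered
                if src in infected and susceptible and rng.random() < beta:
                    newly_infected.add(dst)

        # Recovery step: sorted() keeps the random draws reproducible for a fixed seed.
        newly_recovered = {n for n in sorted(infected) if rng.random() < gamma}

        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
        history.append(
            {
                "S": len(nodes) - len(infected) - len(recovered),
                "I": len(infected),
                "R": len(recovered),
            }
        )
    return history


if __name__ == "__main__":
    # Two toy days of contacts, made up for this example only.
    days = [[("a", "b"), ("b", "c")], [("a", "b"), ("b", "c")]]
    for day, counts in enumerate(run_sir(days, {"a"}, beta=0.3, gamma=0.1), start=1):
        print(day, counts)
```

In this sketch a node infected on a given day cannot transmit or recover until the following day, which is one common convention for discrete-time simulations; other orderings are equally valid.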