Toy project for evaluating the applicability of Graph Structure Learning on different tasks:
- Metro-lines traffic prediction
- Individual whale movement
- Whale search trends across countries
Graph Structure Learning (GSL) is a method that aims to jointly learn the structure of a graph and an objective that relies on it.
This short article on my blog explains the theory behind it.
The data for this task has been generated synthetically, which gives more freedom and makes the role of GSL in the obtained results clearer.
A Markov model framework has been used to generate the data using a pre-defined graph structure:
- Define a graph structure: create edges & nodes, and normalize edge out-weights to sum to 1 (including self edges).
- Define an initial state: the number of people at each station.
- For n iterations, simulate the Markov process: at each step, every node redistributes its population to the other nodes (and itself) according to the edge weights, as sketched below.
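A minimal sketch of this generation process, using a random graph and arbitrary placeholder values (the project itself uses a pre-defined, metro-like structure rather than a random one):

```python
import numpy as np

# Illustrative sketch of the synthetic data generation: node count,
# graph weights, and initial populations are arbitrary placeholders.
rng = np.random.default_rng(0)

n_stations = 5
# Random non-negative weights, including self-edges on the diagonal.
weights = rng.random((n_stations, n_stations))
# Normalize out-weights so each row sums to 1 (a stochastic matrix).
transition = weights / weights.sum(axis=1, keepdims=True)

# Initial state: number of people at each station.
state = rng.integers(100, 1000, size=n_stations).astype(float)

# Simulate the Markov process: at each step every node redistributes
# its population along its outgoing edges.
n_iterations = 50
history = [state]
for _ in range(n_iterations):
    state = state @ transition
    history.append(state)

series = np.stack(history)  # shape: (n_iterations + 1, n_stations)
```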
In this task, we will try to demonstrate that we can predict whale search trends using the MTGNN model from Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks, a time-series forecasting paper that learns a graph over the given set of series.
To showcase that ability, we will use Google Trends data on the search frequency of the subject Whale (Animal).
This should give a proxy for whale sightings along the coasts of different countries.
We hope the model can learn to predict search frequency in different locations and learn a relevant underlying graph structure along the way.
The data we use is noisy and only a naive approximation of the real whale sighting frequency, due to differences in Google usage across regions and in the accessibility and popularity of whale watching. However, it is rich enough to allow for a fun exploration of Graph Structure Learning 😄
Google Trends data is publicly available and can be scraped programmatically. We use the PyTrends library to gather this data into a pandas data frame, as sketched below.
The frequency of acquisition varies, averaging around one point every 5 days.
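A minimal sketch of such a fetch with PyTrends, assuming the plain keyword "whale" and a handful of placeholder country codes (the actual identifier for the Whale (Animal) topic has to be looked up on Google Trends):

```python
import pandas as pd
from pytrends.request import TrendReq

# Illustrative sketch: fetch "whale" search interest over time for a few
# countries. Keyword, countries, and timeframe are placeholders.
pytrends = TrendReq(hl="en-US")

countries = ["FR", "US", "AU", "ZA"]
frames = {}
for geo in countries:
    pytrends.build_payload(kw_list=["whale"], timeframe="today 5-y", geo=geo)
    df = pytrends.interest_over_time()
    frames[geo] = df["whale"]

# One column per country, indexed by date (roughly one point every few days).
trends = pd.DataFrame(frames)
```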
Data from Google Trends is already pre-processed to take values in the [0, 100] interval:
"Results reflect the proportion of searches for a given keyword in a specific region and time period, relative to the region where that keyword's usage rate is highest (a value of 100). A value of 50 thus means the keyword was used half as often in the region concerned, and a value of 0 means there is insufficient data for this keyword."
For our learning objective, we will add some more processing:
- Standardization: we standardize the series location by location, not only to center the data around 0, but also in an attempt to remove the bias introduced by different Google usage at different locations.
- Smoothing: a moving average to smooth the values.
- Upsampling: we artificially increase the number of ticks in the dataset, from a ~5-day frequency to an hourly frequency (see the sketch after this list).
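A minimal sketch of these three steps with pandas, reusing the `trends` frame from the previous snippet (window size and interpolation method are arbitrary choices, not the project's tuned values):

```python
import pandas as pd

# Standardize each location's series independently to remove per-country
# bias in Google usage.
standardized = (trends - trends.mean()) / trends.std()

# Smooth the series with a moving average.
smoothed = standardized.rolling(window=5, min_periods=1).mean()

# Upsample from the ~5-day native frequency to an hourly frequency.
upsampled = smoothed.resample("1h").interpolate(method="linear")
```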
The model in use is the MTGNN as implemented in the PyTorch Geometric Temporal library.
PyTorch Geometric Temporal is "a temporal graph neural network extension library for PyTorch Geometric". It provides direct access to this model from its catalog.
One of the biggest challenges for us is overfitting. The model, by default, is quite large compared to our dataset.
We will therefore use different methods to try and regularize our model:
- Adam optimizer with weight decay (L2 regularization)
- Large dropout value
- Upsampling of the data, as mentioned above
- Small hidden dimensions
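A minimal sketch of the optimizer side of this, assuming `model` is the MTGNN instance; the learning rate and weight-decay values below are placeholders, not the tuned ones:

```python
import torch

# Adam with weight decay (L2 regularization on all parameters).
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,
    weight_decay=1e-4,
)

# The dropout value and the hidden dimensions are hyperparameters of the
# model itself and are set when the MTGNN is constructed.
```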
The model manages to learn the time series easily, given their very periodic behavior.
However, the learned graph does not have to show any cluster structure to succeed, since every series carries roughly the same kind of information.
A second task will be to try and predict the movement of individual whales in the ocean.
To do that, we will simplify the context and use the h3 library to discretize movement into hexagons, as sketched below.
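A minimal sketch of that discretization, with placeholder coordinates and resolution; `geo_to_h3` is the h3-py v3 name (v4 renamed it to `latlng_to_cell`):

```python
import h3

# Map a GPS fix to an H3 hexagon index. Coordinates and resolution
# are placeholders; a coarser resolution means larger hexagons.
lat, lon = 42.5, -70.0
resolution = 4

cell = h3.geo_to_h3(lat, lon, resolution)  # an H3 cell index (hex string)

# A whale trajectory then becomes a sequence of cells.
track = [(42.5, -70.0), (42.6, -69.8), (42.8, -69.5)]
cells = [h3.geo_to_h3(la, lo, resolution) for la, lo in track]
```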
The data comes from the Movebank database.
Movebank is a free, online database of animal tracking data hosted by the Max Planck Institute of Animal Behavior.
[TODO]
These experiments helped me write this article on the practical use of GSL for Metro Network Structure Learning.