A PyTorch implementation of the Neural State Machine as presented here. This is the first code implementation of the model and is based on V1 of the paper on arXiv. As can be expected, the code is incomplete and makes several assumptions where the paper wasn't clear enough. For the time being, this code is not ready to run, and several steps are needed before it can train on any VQA dataset. Chief among these is the need for a functioning graph-rcnn to generate the scene graphs.
In the meantime, I hope the code helps readers understand the paper better, and I'm open to any collaborators who wish to help with features or efficiency.
We use Facebook's implementation of Mask R-CNN: maskrcnn-benchmark. To compile it, use the following snippet.
# pytorch, apex and maskrcnn_benchmark must be compiled with the same version of CUDA.
cd maskrcnn_benchmark
python setup.py build develop
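Because the build depends on matching CUDA versions, it can help to check them before compiling. The following is a minimal sketch, not part of this repository: it compares the CUDA version PyTorch was built with (`torch.version.cuda`) against the one reported by `nvcc --version`. The helper names (`parse_nvcc_release`, `versions_match`, and so on) are hypothetical, and the check assumes `nvcc` is on your `PATH`. It does not inspect apex.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract 'major.minor' from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def torch_cuda_version():
    """CUDA version PyTorch was built with, or None (no torch / CPU-only build)."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


def nvcc_cuda_version():
    """CUDA version of the nvcc found on PATH, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def versions_match(a, b):
    """True only if both versions are known and agree on major.minor."""
    if a is None or b is None:
        return False
    return a.split(".")[:2] == b.split(".")[:2]


if __name__ == "__main__":
    torch_ver = torch_cuda_version()
    nvcc_ver = nvcc_cuda_version()
    print(f"PyTorch built with CUDA: {torch_ver}")
    print(f"nvcc reports CUDA:       {nvcc_ver}")
    if versions_match(torch_ver, nvcc_ver):
        print("CUDA versions agree.")
    else:
        print("CUDA versions differ or could not be determined.")
```

If the two versions differ, the extensions built by `setup.py` may fail to compile or load, so it is usually easier to fix the mismatch first.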