Fun-GAI: hallucinating mushrooms with generative AI

Authors: Joshua Placidi, Sara Sabzikari, Vincenzo Incutti, Ka Yeon Kim

Introduction

This is a project originally built in 12 hours for a Biology + Generative Artificial Intelligence hackathon. We trained a variational auto-encoder (VAE) to learn a latent space representation of the physiology of mushrooms, the fruiting bodies of fungi.

It has been estimated that more than 90% of all fungal species have yet to be described by science [1]. We built and trained a VAE from scratch to synthesise what new, yet undiscovered, mushrooms could look like. This project was built as a fun exploration of how an auto-encoder learns to represent images of mushrooms in a latent space. The culmination of our work can be seen in the gifs at the top of the page.

Biological

In our project, we delved into the intriguing world of fungi, focusing specifically on the physiology of their fruiting bodies: mushrooms.

We recognised that mushrooms, as the visible fruiting body of fungi, offer a more accessible means of differentiating mushroom-producing species than examining mycelium/mycelial networks alone. Some distinctive physical characteristics are:

  • shape
  • color
  • gill type

These are valuable markers for identifying and categorising different species. However, relying solely on morphology for fungal classification may overlook substantial biological information inherent in these organisms. Acknowledging the limitations of morphology-based classification, we recognise that incorporating genomic, environmental, and other data would provide a more accurate means of classifying mushrooms and of exploring viable hypothetical species in the latent space.

We demonstrate the feasibility and potential of using the latent space representation of physiological variables as a proof of concept. This approach opens up exciting possibilities for exploring unknown species and broadening our understanding of the diverse world of fungi.

Technical

VAEs learn in a self-supervised manner to predict their own input: given an input $X \in \mathbb{R}^{3 \times 224 \times 224}$, the model produces an output $\hat{X} \in \mathbb{R}^{3 \times 224 \times 224}$ with the objective of minimising the difference between $X$ and $\hat{X}$. The model has an encoder-decoder structure with a bottleneck in the middle; the bottleneck forces the encoder to learn to compress the input into a latent representation $z = \mathrm{encoder}(X)$, which the decoder then tries to project back to the original input, $\hat{X} = \mathrm{decoder}(z)$. The idea is that the VAE has to learn to extract useful information to store in $z$. We train the model to minimise the reconstruction loss, measured as the mean-squared error between $X$ and $\hat{X}$. Additionally, VAEs add a KL-divergence term to the loss function, encouraging the model to learn normally distributed latents, which gives a more coherent latent space and enables generative sampling.
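A minimal sketch of this setup, assuming a PyTorch implementation (the layer sizes, latent dimension, and class names here are illustrative, not taken from the repository code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvVAE(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        # Encoder: compress a 3x224x224 image down to the bottleneck.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),    # -> 32x112x112
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),   # -> 64x56x56
            nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # -> 128x28x28
            nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 28 * 28, latent_dim)
        self.fc_logvar = nn.Linear(128 * 28 * 28, latent_dim)
        # Decoder: project the latent z back up to image space.
        self.fc_dec = nn.Linear(latent_dim, 128 * 28 * 28)
        self.decoder = nn.Sequential(
            nn.Unflatten(1, (128, 28, 28)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # -> 64x56x56
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # -> 32x112x112
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),    # -> 3x224x224
            nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterisation trick: sample z from N(mu, sigma^2) differentiably.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        x_hat = self.decoder(self.fc_dec(z))
        return x_hat, mu, logvar

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    # Reconstruction term: mean-squared error between X and X_hat.
    recon = F.mse_loss(x_hat, x, reduction="mean")
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```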

To generate new samples $Y$ with the learned model, we simply pass a randomly initialised latent $z \sim \mathcal{N}(0, I)$ to the decoder: $Y = \mathrm{decoder}(z)$.
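Using the (assumed) ConvVAE from the sketch above, generative sampling amounts to drawing latents from the standard normal prior and decoding them:

```python
import torch

model = ConvVAE(latent_dim=128)  # in practice, load trained weights here
model.eval()

with torch.no_grad():
    z = torch.randn(16, 128)                   # z ~ N(0, I), one row per sample
    samples = model.decoder(model.fc_dec(z))   # shape: (16, 3, 224, 224)
```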

We used two datasets:

  • Danish Fungi
  • Mushroom Common Genus

We ran a pretraining cycle using the Danish Fungi dataset and then fine-tuned the model on the Mushroom Common Genus dataset. We used the following hyperparameters (a sketch of the corresponding training loop follows the list):

  • batch_size: 64
  • initial_learning_rate: $1 \times 10^{-4}$
  • num_epochs: 20
  • split_ratio: 0.9
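
A hypothetical sketch of the fine-tuning loop with these hyperparameters; the dataset path, image transforms, and choice of the Adam optimiser are assumptions rather than details taken from the repository:

```python
import torch
from torch.utils.data import random_split, DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Hypothetical local path to the Mushroom Common Genus images.
dataset = datasets.ImageFolder("data/mushroom_common_genus", transform=transform)

# split_ratio: 0.9 -> 90% train / 10% validation
n_train = int(0.9 * len(dataset))
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)  # batch_size: 64

model = ConvVAE(latent_dim=128)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # initial_learning_rate

for epoch in range(20):  # num_epochs: 20
    for x, _ in train_loader:
        x_hat, mu, logvar = model(x)
        loss = vae_loss(x, x_hat, mu, logvar)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```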