Tensornet representation #167

Closed
wants to merge 33 commits into from

Conversation

MarshallYan
Collaborator

@MarshallYan MarshallYan commented Jun 24, 2024

Description

Implement the modelforge TensorNetRepresentation module, which matches the output of TensorEmbedding's forward. In the original implementation, TensorEmbedding is used by the TensorNet class, whose forward is called whenever TensorNet is updated. The output of TensorEmbedding corresponds to the "X" tensor in the TensorNet paper.

Todos

  • Set up TensorNet embedding module
  • Implement modelforge TensorNetRepresentation
  • Rename parameters from TensorNet to more intuitive names
  • Match output X tensors

Questions

  • static_shapes
  • What do the steps mean when static_shapes==True?

Status

  • Ready to go

chrisiacovella and others added 13 commits June 17, 2024 13:48
…y, the array of atomic species was assumed to be unique; now we also consider the order of the species array. Only entries with the same atomic-species array that appear consecutively are considered the same molecule for purposes of grouping conformers; if the same species array is encountered again later in the array, it is treated as a new molecule. Molecule names are no longer just the string built from the species array; they also have a molecule number appended.
This revises how unique molecules are determined for ANI2x and updates the dataset yaml files.
@MarshallYan MarshallYan requested a review from wiederm June 24, 2024 00:05
@MarshallYan
Collaborator Author

@wiederm
How edge_index, edge_vec, and edge_weight are calculated is unclear. The problems are:

  1. edge_weight may not have units of distance, which causes confusion in the unit transformations in the RSF.
  2. The calculation is defined in torchmdnet.extensions.__init__, where torch.ops.torchmdnet_extensions.get_neighbor_pairs is called. I am unable to trace further how this function is implemented.

@MarshallYan
Collaborator Author

@wiederm
It seems to me that the input wrappers (NNPInput and *NeuralNetworkData) try to generalize the information needed to build a model in separate layers. However, I can't find where box and batch from TensorNet should go. Which data structure should be modified? Or how do we handle input that doesn't fit the designed generalization?

Just as a reminder, edge_index, edge_weight, edge_vec = self.distance(pos, batch, box) is in the forward function of class TensorNet in torchmd-net.

@MarshallYan
Collaborator Author

@wiederm
TensorNet uses a pairlist with duplicated permuted indices (both 0-1 and 1-0 are included) and orders these indices differently. Other than that, it seems that edge_weight == d_ij and edge_vec == r_ij. But is there an easier way to verify that? torch.allclose doesn't work directly, since the two tensors differ in dimension and ordering.
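
One way to check this directly (a minimal sketch, not code from either repository; sort_by_pair is a hypothetical helper): bring both pairlists into a canonical (i, j) order, then compare the reordered tensors with torch.allclose.

import torch

def sort_by_pair(pair_index, values):
    # pair_index: shape (2, n_pairs); values: shape (n_pairs, ...)
    n_atoms = int(pair_index.max()) + 1
    keys = pair_index[0] * n_atoms + pair_index[1]  # unique key per (i, j) pair
    order = torch.argsort(keys)
    return pair_index[:, order], values[order]

# Hypothetical usage, assuming both lists contain every (i, j) and (j, i) pair:
# _, w_ref = sort_by_pair(edge_index, edge_weight)   # torchmd-net
# _, w_mf = sort_by_pair(pair_indices, d_ij)         # modelforge
# assert torch.allclose(w_ref, w_mf)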

@MarshallYan
Collaborator Author

MarshallYan commented Jul 2, 2024

@wiederm
I added a representation-unit system so the model can be computed in both angstrom and nanometer. Angstrom is used to compare the results with TensorNet, and I am also curious whether using nanometer internally makes a difference during training.

The only change I made that may affect the 'main' branch is that 'CosineCutoff' now has a 'representation_unit' parameter that defaults to 'unit.nanometer'. (So I would assume this won't change anything outside tensornet.)

  • Does ANI use nanometer in its original implementation? If not, how did you manage to test ANI without some internal unit transformation?

@wiederm
Member

wiederm commented Jul 3, 2024

It seems to me that the input wrappers (NNPInput and *NeuralNetworkData) try to generalize the information needed to build a model in separate layers. However, I can't find where box and batch from TensorNet should go. Which data structure should be modified? Or how do we handle input that doesn't fit the designed generalization?

We discussed this offline: edge_weight is the pairwise distance, and edge_vec is the pairwise distance vector. The box vectors and batch are passed in with the input and don't need to be generated. The data structure is the NNPInput dataclass.

TensorNet uses a pairlist with duplicated permuted indices

So does modelforge! All pairwise interactions are listed; that means every atom $i$ encounters every atom $j$ within its neighborhood.

@wiederm
Member

wiederm commented Jul 3, 2024

Does ANI use nanometer in its original implementation?

No, ANI uses angstrom. Testing is pretty simple: if you have a class that implements some transformation, you can simply do:

import torch

# RefClass / MfClass are placeholders for the reference (torchmd-net) and
# modelforge implementations of the radial basis function.
input_data_in_angstrom = torch.rand(5, 1) * 5  # random distances on the interval [0, 5) angstrom
reference_implementation = RefClass()
modelforge_implementation = MfClass()

# the rbf is defined on an interval [min_distance, max_distance].
# you can convert between rbfs in different length units by scaling the rbfs,
# therefore you can convert from nm to A by multiplying by 10

assert torch.allclose(
    modelforge_implementation(input_data_in_angstrom / 10) * 10,
    reference_implementation(input_data_in_angstrom),
)

@MarshallYan
Collaborator Author

@wiederm
I may have found a bug in TensorNet. They apply the cutoff function twice in their code, which is inconsistent with their paper. I currently stick to the way they coded it, but we may need further discussion on this issue.

In class 'ExpNormalSmearing':

def forward(self, dist):
    dist = dist.unsqueeze(-1)
    return self.cutoff_fn(dist) * torch.exp(
        -self.betas
        * (torch.exp(self.alpha * (-dist + self.cutoff_lower)) - self.means) ** 2
    )

The output of this forward function is called 'edge_attr' in 'TensorEmbedding'. This should be $e_k^{RBF}$ in their paper, without the cutoff function applied.
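
Reading the equation directly off the quoted code (notation mine): the forward returns $\phi_{\text{cut}}(d_{ij}) \cdot \exp\!\big(-\beta_k\,(e^{\alpha(-d_{ij}+d_{\text{lower}})}-\mu_k)^2\big)$, i.e. the paper's $e_k^{RBF}$ already multiplied once by the cutoff function $\phi_{\text{cut}}$.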

However, in 'TensorEmbedding':

def _get_tensor_messages(
    self, Zij: Tensor, edge_weight: Tensor, edge_vec_norm: Tensor, edge_attr: Tensor
) -> Tuple[Tensor, Tensor, Tensor]:
    C = self.cutoff(edge_weight).reshape(-1, 1, 1, 1) * Zij
    eye = torch.eye(3, 3, device=edge_vec_norm.device, dtype=edge_vec_norm.dtype)[
        None, None, ...
    ]
    Iij = self.distance_proj1(edge_attr)[..., None, None] * C * eye
    Aij = (
        self.distance_proj2(edge_attr)[..., None, None]
        * C
        * vector_to_skewtensor(edge_vec_norm)[..., None, :, :]
    )
    Sij = (
        self.distance_proj3(edge_attr)[..., None, None]
        * C
        * vector_to_symtensor(edge_vec_norm)[..., None, :, :]
    )
    return Iij, Aij, Sij

The cutoff function C is multiplied into I, A, and S a second time.

@MarshallYan
Collaborator Author

@wiederm
The other thing that bothers me is that, mathematically, Iij shouldn't be initialized with the identity matrix; rather, it should be $\frac{1}{3}\lVert v\rVert^2 \cdot I$ based on my calculation.
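
For reference (notation mine, following the standard irreducible decomposition rather than the TensorNet code): for $X = vv^{T}$ one has $\mathrm{tr}(X) = \lVert v\rVert^2$, so the isotropic part is $I^{0} = \frac{1}{3}\mathrm{tr}(X)\,\mathrm{Id} = \frac{1}{3}\lVert v\rVert^2\,\mathrm{Id}$, the antisymmetric part is $A = \frac{1}{2}(X - X^{T})$, and the traceless symmetric part is $S = \frac{1}{2}(X + X^{T}) - \frac{1}{3}\mathrm{tr}(X)\,\mathrm{Id}$.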


@MarshallYan
Collaborator Author

@wiederm
The pairwise indices used in torchmd-net differ from those in modelforge, resulting in different dimensions being fed into nn.Linear and therefore different outputs.

In modelforge, if we have 4 atoms [0, 1, 2, 3], pair_indices is [[0, 0, 0, 1, 1, 2], [1, 2, 3, 2, 3, 3]]; meanwhile, in torchmd-net, edge_index is [[1, 2, 2, 3, 3, 3, 0, 0, 1, 0, 1, 2], [0, 0, 1, 0, 1, 2, 1, 2, 2, 3, 3, 3]]. That is not a big problem by itself, since I just need to append the reversed pairs at the end. However, the radial symmetry vector corresponding to these indices is fed into an nn.Linear layer, and in the example above the outputs cover 6 vs. 12 atom pairs.

The real problem is that in torchmd-net, since the first 6 pairs are the same as the last 6 pairs (just reversed), I would expect the output of that linear layer to match as well (the first 6 tensors equal to the last 6 tensors), but that is not the case. Thus, in the later summation, the results from the two approaches differ.

One way that would definitely fix this is to convert the modelforge index system to the torchmd-net one inside the representation module, but I don't think that is what we want in modelforge. We should discuss how to solve this.
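
To illustrate the expectation described above (a minimal sketch with hypothetical sizes, not code from either repository): an nn.Linear maps identical rows to identical rows, so if the radial features for (i, j) and (j, i) really are the same, the projected features must match too, and any mismatch has to come from a different pair ordering or different features.

import torch
import torch.nn as nn

torch.manual_seed(0)
n_pairs, n_rbf, hidden = 6, 8, 4
rbf_ij = torch.randn(n_pairs, n_rbf)
rbf_full = torch.cat([rbf_ij, rbf_ij], dim=0)  # features for (i, j) followed by (j, i)

proj = nn.Linear(n_rbf, hidden)
out = proj(rbf_full)
assert torch.allclose(out[:n_pairs], out[n_pairs:])  # duplicated inputs -> duplicated outputs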

@wiederm
Member

wiederm commented Jul 5, 2024

Note the boolean set in the Pairlist:

def __init__(self, only_unique_pairs: bool = False):

If this is set to False (the default), it will return the same number of pairs as the pairlist in TensorNet. ANI is an outlier and requires this boolean to be set to True.
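
As an illustration of the two conventions (a brute-force sketch, not modelforge's actual Pairlist implementation): unique pairs list each (i, j) with i < j once, while the full list also contains the reversed (j, i) pairs, as TensorNet expects.

import torch

def enumerate_pairs(n_atoms: int, only_unique_pairs: bool = False) -> torch.Tensor:
    i, j = torch.meshgrid(torch.arange(n_atoms), torch.arange(n_atoms), indexing="ij")
    mask = i < j if only_unique_pairs else i != j
    return torch.stack([i[mask], j[mask]])  # shape (2, n_pairs)

print(enumerate_pairs(4, only_unique_pairs=True).shape)   # torch.Size([2, 6])
print(enumerate_pairs(4, only_unique_pairs=False).shape)  # torch.Size([2, 12])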

@wiederm
Member

wiederm commented Jul 5, 2024

I am not completely sure that I follow this:

The real problem here is that in torchmd-net, since the first 6 pairs are the same as the last 6 pairs, I assume the output of that linear layer should also be the case (the first 6 tensors are the same as the last 6 tensors), while it's not the case. Thus in later summation, results from two ways are different.

Can you provide a minimal example that exemplifies your concern?

@wiederm
Member

wiederm commented Jul 5, 2024

If possible, it might be very useful to discuss this in a working test.

…ensornet_representation

# Conflicts:
#	modelforge/potential/tensornet.py
#	modelforge/tests/test_tensornet.py
@MarshallYan
Collaborator Author

Note the boolean set in the Pairlist:

def __init__(self, only_unique_pairs: bool = False):

If this is set to False (the default), it will return the same number of pairs as the pairlist in TensorNet. ANI is an outlier and requires this boolean to be set to True.

Thanks! This has been fixed now.

@@ -10,11 +14,23 @@
from modelforge.potential.utils import TensorNetRadialSymmetryFunction
from modelforge.potential.utils import NeuralNetworkData
from modelforge.potential.utils import NNPInput
from .models import PairListOutputs

ATOMIC_NUMBER_TO_INDEX_MAP = {
Member

I'd suggest importing this from potential/ani2x.py

self.rsf_projection_I = nn.Linear(
number_of_radial_basis_functions,
hidden_channels,
dtype=dtype,
Member

You don't have to set dtype here. All parameters in sub-modules are automatically registered as nn.Parameters, and the device and dtype can be set for each parameter at runtime.
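
For example (generic PyTorch, not the TensorNet module itself), the cast can be done once at runtime:

import torch
import torch.nn as nn

layer = nn.Linear(8, 16)                      # no dtype argument needed at construction
layer = layer.to(dtype=torch.float64)         # casts all registered parameters
layer = layer.to("cuda" if torch.cuda.is_available() else "cpu")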

Comment on lines +172 to +191
self.rsf_projection_A = nn.Linear(
number_of_radial_basis_functions,
hidden_channels,
dtype=dtype,
)
self.rsf_projection_S = nn.Linear(
number_of_radial_basis_functions,
hidden_channels,
dtype=dtype,
)
self.atomic_number_i_embedding_layer = nn.Embedding(
max_atomic_number,
hidden_channels,
dtype=dtype,
)
self.atomic_number_ij_embedding_layer = nn.Linear(
2 * hidden_channels,
hidden_channels,
dtype=dtype,
)
Member

it might make sense to use a ModuleDict here: https://pytorch.org/docs/stable/generated/torch.nn.ModuleDict.html

self.rsf_projections = nn.ModuleDict(
    {
        "I": nn.Linear(number_of_radial_basis_functions, hidden_channels, dtype=dtype),
        "A": nn.Linear(number_of_radial_basis_functions, hidden_channels, dtype=dtype),
        "S": nn.Linear(number_of_radial_basis_functions, hidden_channels, dtype=dtype),
    }
)

Comment on lines +194 to +197
for _ in range(3):
self.linears_tensor.append(
nn.Linear(hidden_channels, hidden_channels, bias=False)
)
Member

are you sure that for each linear layer bias=False?

Member

and no activation function is used?

Member

I'd prefer not to use append operations for `ModuleList`. It makes it difficult to identify all the components of the list at a glance. In that case, I'd suggest using something like:

self.linears_tensor = nn.ModuleList(
    [nn.Linear(hidden_channels, hidden_channels, bias=False) for _ in range(3)]
)

Member

and maybe the Dense implementation of modelforge?

Comment on lines +199 to +205
self.linears_scalar.append(
nn.Linear(hidden_channels, 2 * hidden_channels, bias=True, dtype=dtype)
)
self.linears_scalar.append(
nn.Linear(2 * hidden_channels, 3 * hidden_channels, bias=True, dtype=dtype)
)
self.init_norm = nn.LayerNorm(hidden_channels, dtype=dtype)
Member

You can write this as:

self.linear_layers_for_scalar_properties = nn.ModuleList(
    [
        nn.Linear(hidden_channels, 2 * hidden_channels, bias=True),
        nn.Linear(2 * hidden_channels, 3 * hidden_channels, bias=True),
    ]
)

which is slightly easier to read.

GaussianRadialBasisFunction(),
representation_unit: unit.Quantity = unit.angstrom,
):
self.representation_unit = representation_unit
Member

I don't think that this is necessary. All internal units are in nanometer, but for certain operations it might be useful to convert to angstrom.

Comment on lines +812 to +813
_max_distance = self.max_distance.to(self.representation_unit).m
_min_distance = self.min_distance.to(self.representation_unit).m
Member

convert everything to nanometer

):
# TensorNet evenly distribute centers in log space
start_value = torch.exp(
torch.scalar_tensor(
-_max_distance_in_nanometer*10 + _min_distance_in_nanometer*10, # in angstrom
-_max_distance + _min_distance, # in angstrom
Member

You can write this exactly as outlined here:

def calculate_radial_basis_centers(

@@ -822,31 +857,35 @@ def calculate_radial_basis_centers(
return centers

def calculate_radial_scale_factor(
Member

at first glance this should also match the PhysNetRadialBasisFunction. But please double check that claim!

Comment on lines +884 to +887
_max_distance = self.max_distance.to(self.representation_unit).m
_min_distance = self.min_distance.to(self.representation_unit).m
alpha = 5.0 / (_max_distance - _min_distance)
d_ij = torch.exp(alpha * (-d_ij + _min_distance))
Member

d_ij should always be the pairwise distance. If transformations are performed that are specific to the radial basis function, I think they should be moved inside the radial basis function implementation (and shouldn't overwrite d_ij).
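
A generic sketch of that layout (hypothetical class, not modelforge's actual implementation): the exponential transformation lives inside the RBF module and acts on a local copy, so the d_ij passed around the model is never overwritten. Only the forward transformation mirrors the quoted code; the means/betas initialization is a placeholder.

import torch
import torch.nn as nn

class ExpNormalRBF(nn.Module):
    def __init__(self, min_distance: float, max_distance: float, number_of_radial_basis_functions: int):
        super().__init__()
        self.min_distance = min_distance
        self.alpha = 5.0 / (max_distance - min_distance)
        # placeholder initialization of centers and widths
        self.register_buffer("means", torch.linspace(0.0, 1.0, number_of_radial_basis_functions))
        self.register_buffer("betas", torch.ones(number_of_radial_basis_functions))

    def forward(self, d_ij: torch.Tensor) -> torch.Tensor:
        # transform a local copy; the caller's d_ij stays the plain pairwise distance
        x = torch.exp(self.alpha * (-d_ij + self.min_distance)).unsqueeze(-1)
        return torch.exp(-self.betas * (x - self.means) ** 2)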

@MarshallYan MarshallYan mentioned this pull request Jul 9, 2024
@ArnNag
Collaborator

ArnNag commented Jul 20, 2024

@wiederm I may have found a bug in TensorNet. They apply the cutoff function twice in their code, which is inconsistent with their paper. I currently stick to the way they coded it, but we may need further discussion on this issue.

In class 'ExpNormalSmearing':

def forward(self, dist):
    dist = dist.unsqueeze(-1)
    return self.cutoff_fn(dist) * torch.exp(
        -self.betas
        * (torch.exp(self.alpha * (-dist + self.cutoff_lower)) - self.means) ** 2
    )

The output of this forward function is called 'edge_attr' in 'TensorEmbedding'. This should be $e_k^{RBF}$ in their paper, without the cutoff function applied.

However, in 'TensorEmbedding':

def _get_tensor_messages(
    self, Zij: Tensor, edge_weight: Tensor, edge_vec_norm: Tensor, edge_attr: Tensor
) -> Tuple[Tensor, Tensor, Tensor]:
    C = self.cutoff(edge_weight).reshape(-1, 1, 1, 1) * Zij
    eye = torch.eye(3, 3, device=edge_vec_norm.device, dtype=edge_vec_norm.dtype)[
        None, None, ...
    ]
    Iij = self.distance_proj1(edge_attr)[..., None, None] * C * eye
    Aij = (
        self.distance_proj2(edge_attr)[..., None, None]
        * C
        * vector_to_skewtensor(edge_vec_norm)[..., None, :, :]
    )
    Sij = (
        self.distance_proj3(edge_attr)[..., None, None]
        * C
        * vector_to_symtensor(edge_vec_norm)[..., None, :, :]
    )
    return Iij, Aij, Sij

The cutoff function C is multiplied into I, A, and S a second time.

Was this resolved? It seems that they wanted to make sure that the outputs of intermediate linear layers also decay to 0 at the cutoff distance, even before they multiply by the cutoff-filtered distances.

@wiederm
Member

wiederm commented Jul 23, 2024

@wiederm I may have found a bug in TensorNet. They apply the cutoff function twice in their code, which is inconsistent with their paper. [...]

Was this resolved? It seems that they wanted to make sure that the outputs of intermediate linear layers also decay to 0 at the cutoff distance, even before they multiply by the cutoff-filtered distances.

Yes, this has been resolved! We came to the same conclusion and left it in the code.

@wiederm
Member

wiederm commented Jul 23, 2024

Closing this PR since it is superseded by PR #181.

@wiederm wiederm closed this Jul 23, 2024