Sparse Molecules #169
Comments
Looks like we can do this:

from typing import List, Optional

from pydantic import BaseModel


class LazyModel(BaseModel):
    symbols: List[int]
    # Stored as `mass_`; the alias below lets callers pass and serialize `mass`.
    mass_: Optional[List[float]] = None

    class Config:
        fields = {'mass_': 'mass'}

    @property
    def mass(self) -> List[float]:
        # Return the stored value if one was provided, otherwise compute it on demand.
        mass = self.__dict__.get('mass_')
        if mass is None:
            mass = [42]  # [MASS_LOOKUP[k] for k in self.symbols]
        return mass

Example:

m = LazyModel(symbols=[0], mass=[10.3])
print(m.mass)
#> [10.3]
print(m.dict(exclude_unset=True, by_alias=True))
#> {'symbols': [0], 'mass': [10.3]}

m = LazyModel(symbols=[0])
print(m.mass)
#> [42]
print(m.dict(exclude_unset=True, by_alias=True))
#> {'symbols': [0]}

This makes things a bit bulky but workable for this somewhat special class. We would automatically set This would skip the need for a
That's looking very promising. There are complications with, say, mass coming not just from the mass field but also from the label field, but that's a problem on my side, not with the pydantic integration. Yay for alias fields.
@property
def labels(self):
    return [LABEL_LOOKUP[x] for x in self.symbols]

@property
def mass(self):
    return [MASS_LOOKUP[x] for x in self.labels]

This should correctly chain.
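To make the chaining concrete, here is a minimal runnable sketch that uses a plain Python class instead of the pydantic model. The contents of `LABEL_LOOKUP` and `MASS_LOOKUP` are illustrative assumptions (the thread does not show them), and `ChainedMolecule` is a hypothetical name.

```python
from typing import Dict, List

# Hypothetical lookup tables; the values are for illustration only.
LABEL_LOOKUP: Dict[int, str] = {1: "H", 8: "O"}
MASS_LOOKUP: Dict[str, float] = {"H": 1.008, "O": 15.999}


class ChainedMolecule:
    """Plain-class stand-in showing one derived property feeding another."""

    def __init__(self, symbols: List[int]) -> None:
        self.symbols = symbols

    @property
    def labels(self) -> List[str]:
        # Derived directly from the stored symbols.
        return [LABEL_LOOKUP[x] for x in self.symbols]

    @property
    def mass(self) -> List[float]:
        # Derived from labels, which are themselves derived from symbols.
        return [MASS_LOOKUP[x] for x in self.labels]


water = ChainedMolecule(symbols=[8, 1, 1])
print(water.labels)  # ['O', 'H', 'H']
print(water.mass)    # [15.999, 1.008, 1.008]
```

Nothing is stored beyond `symbols`, so every access recomputes both lists.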
I think the total number of lazy fields are:
Keeping the booleans around as they are lightweight. I can start in on this soon. @loriab Do you have thoughts on how to keep
No, nothing that we can regenerate should hang around; we should go all in. It affects both time and data size. I think we should cache the values, however.
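One way the caching idea could look, sketched on a plain class with the standard library's `functools.cached_property` (Python 3.8+). This is an assumption about the approach, not what the project implemented; `CachedMolecule`, the `lookups` counter, and the `MASS_LOOKUP` values are hypothetical.

```python
from functools import cached_property
from typing import Dict, List

# Hypothetical lookup table; the values are for illustration only.
MASS_LOOKUP: Dict[int, float] = {1: 1.008, 8: 15.999}


class CachedMolecule:
    """Plain-class stand-in: mass is computed on first access, then reused."""

    def __init__(self, symbols: List[int]) -> None:
        self.symbols = symbols
        self.lookups = 0  # counts how many times mass is actually computed

    @cached_property
    def mass(self) -> List[float]:
        self.lookups += 1
        return [MASS_LOOKUP[x] for x in self.symbols]


mol = CachedMolecule([8, 1, 1])
first = mol.mass
second = mol.mass  # served from the instance cache, no recomputation
print(mol.lookups)  # 1
```

The cached value lives on the instance, so it would still need to be excluded from serialization to keep the stored form sparse.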
We can also strip data size down by only filling fields that were passed in:

validated_kwargs = validate(kwargs)
kwargs = {k: validated_kwargs[k] for k in kwargs.keys()}

This might make it easy to handle everything else. It is safe to say the 90% case is symbols + geometry, and we need to fill in multiplicity + charge. The 9% case is where we need to do that for fragments; everything else likely falls in the 1%.
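A self-contained sketch of that filtering step follows. The `validate` function here is a hypothetical stand-in that only fills defaults, and the field names `charge`, `multiplicity`, and `mass` are assumptions chosen to mirror the discussion.

```python
from typing import Any, Dict


def validate(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical validator: returns the input with default fields filled in."""
    defaults: Dict[str, Any] = {"charge": 0.0, "multiplicity": 1, "mass": None}
    filled = dict(defaults)
    filled.update(kwargs)
    return filled


def sparse_fields(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Validate everything, but keep only the keys the caller actually passed."""
    validated_kwargs = validate(kwargs)
    return {k: validated_kwargs[k] for k in kwargs}


user_input = {"symbols": ["He"], "geometry": [0.0, 0.0, 0.0]}
sparse = sparse_fields(user_input)
print(sorted(sparse))  # ['geometry', 'symbols']
```

The defaults are still computed during validation; they are simply not stored, so this reduces size rather than creation time.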
Closed by #191.
Is your feature request related to a problem? Please describe.
Molecules are typically 3x as heavy as they need to be, as many items like mass/real/isotope are generally not needed. This causes issues with both object creation speed and serialized size at a number of layers.
Describe alternatives you've considered
Several options are possible:
- Molecule.mass = None: all values are defaulted to None instead of created values. We could have create/strip functions that would fill/remove these fields on the fly.
- .dict() would remove all created fields in the serialization.
- Molecule.mass is a lazily evaluated field unless provided. This may not be possible due to metaclass issues, but it is worth a try.
- A SimpleMolecule class which contains a subset of Molecule fields. Not my favorite solution, but a quick one.
This becomes a fairly serious issue for the Database and for handling large numbers of Molecules.
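The first option, None defaults plus helpers that fill or remove derivable fields, might be sketched as below. Molecules are represented as plain dicts to keep the example self-contained; `fill`, `strip`, and the `MASS_LOOKUP` values are hypothetical and not part of any existing API.

```python
from typing import Any, Dict

# Hypothetical lookup table; the values are for illustration only.
MASS_LOOKUP: Dict[str, float] = {"H": 1.008, "O": 15.999}


def fill(mol: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with the derivable `mass` field populated if it is missing."""
    full = dict(mol)
    if full.get("mass") is None:
        full["mass"] = [MASS_LOOKUP[s] for s in full["symbols"]]
    return full


def strip(mol: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy without `mass` when it is empty or matches the regenerated values."""
    sparse = dict(mol)
    regenerated = [MASS_LOOKUP[s] for s in sparse["symbols"]]
    if sparse.get("mass") is None or sparse["mass"] == regenerated:
        sparse.pop("mass", None)
    return sparse


sparse_mol = {"symbols": ["O", "H", "H"]}
full_mol = fill(sparse_mol)
print(strip(full_mol) == sparse_mol)  # True
```

A user-supplied mass that differs from the lookup (an isotope, for instance) survives `strip`, which is the behavior the thread wants for explicitly provided values.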
@loriab @mattwelborn