Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mass validation & mtol #262

Open
anabiman opened this issue Apr 28, 2021 · 4 comments
Open

Mass validation & mtol #262

anabiman opened this issue Apr 28, 2021 · 4 comments
Labels
invalid This doesn't seem right

Comments

@anabiman
Copy link
Collaborator

anabiman commented Apr 28, 2021

Describe the bug
Cannot construct a valid molecule with specified masses.

To Reproduce

oxy = qcelemental.models.Molecule(geometry=[0,0,0], symbols=["O"], masses=[15.996])
oxy.mass_numbers -> array([-1], dtype=int16)

QCElemental couldn't guess the mass number to be 16 likely due to the relatively small mtol (1e-3) set in reconcile_nucleus() and from_schema(), so if we specify the mass number, we get a validation error:

oxy = qcelemental.models.Molecule(geometry=[0,0,0], symbols=["O"], masses=[15.996], mass_numbers=[16]) 
-> qcelemental.exceptions.ValidationError: Inconsistent or unspecified mass: A: 16, Z: None, E: O, mass: 15.996, real: None, label: None

# mass = 15.994 or 15.995 would work with mtol=1e-3.

Expected behavior
To get the right mass number or avoid any errors, we can either turn off validation (which is not recommended), or add a keyword arg mtol to Molecule.__init__ and increase the value of this number, but neither approach is elegant or adequate for passing "oxy" to qcengine for e.g. (since mtol is not part of the Molecule schema or used by qcengine AFAIK).

What is the easiest way to solve this with minimal effort? Turn off mass validation by default? Make mtol adaptive?

Additional context
Passing the keyword nonphysical to Molecule does not solve the problem.
qcel v0.19.0.

@anabiman anabiman added the invalid This doesn't seem right label Apr 28, 2021
@anabiman
Copy link
Collaborator Author

Or at least maybe print a warning msg, or raise an exception when mass_number = -1 i.e. qcel could not guess the "most probable" mass number?

@loriab
Copy link
Collaborator

loriab commented Apr 28, 2021

Don't turn off validate. That's giving a generic per atom

oxy = qcelemental.models.Molecule(geometry=[0,0,0], symbols=["O"], masses=[12.999], validate=False) #mtol=1.e-6)

print(oxy.mass_numbers)
# returns [16], wrong!

What is the use case here? Are you populating from a program with an old, fixed set of atomic masses? and then mass number is also needed? If what you want is modern masses while still preserving isotope info, perhaps initialize with mass_number=int(old_mass).

@loriab
Copy link
Collaborator

loriab commented Apr 28, 2021

The correct way to make this work is with something like the below. (That gets you through creation, but more tweaks needed for to_string, etc.) I'd still like to hear more about how this situation is coming about.

oxy = qcelemental.models.Molecule(geometry=[0,0,0], symbols=["O"], masses=[15.999])
print(oxy.mass_numbers)  # [-1]
oxy = qcelemental.models.Molecule(geometry=[0,0,0], symbols=["O"], masses=[15.999], mtol=1.e-2)
print(oxy.mass_numbers)  # [16]
diff --git a/qcelemental/models/molecule.py b/qcelemental/models/molecule.py
index 72e3e53..9e160c2 100644
--- a/qcelemental/models/molecule.py
+++ b/qcelemental/models/molecule.py
@@ -333,8 +333,9 @@ class Molecule(ProtoModel):
             # original_keys = set(kwargs.keys())  # revive when ready to revisit sparsity
 
             nonphysical = kwargs.pop("nonphysical", False)
+            mtol = kwargs.pop("mtol", 1.e-3)
             schema = to_schema(
-                from_schema(kwargs, nonphysical=nonphysical), dtype=kwargs["schema_version"], copy=False, np_out=True
+                from_schema(kwargs, nonphysical=nonphysical, mtol=mtol), dtype=kwargs["schema_version"], copy=False, np_out=True
             )
             schema = _filter_defaults(schema)
 
diff --git a/qcelemental/molparse/from_schema.py b/qcelemental/molparse/from_schema.py
index d1775b4..aec087a 100644
--- a/qcelemental/molparse/from_schema.py
+++ b/qcelemental/molparse/from_schema.py
@@ -7,7 +7,7 @@ from ..util import provenance_stamp
 from .from_arrays import from_arrays
 
 
-def from_schema(molschema: Dict, *, nonphysical: bool = False, verbose: int = 1) -> Dict:
+def from_schema(molschema: Dict, *, nonphysical: bool = False, mtol: float = 1.e-3, verbose: int = 1) -> Dict:
     r"""Construct molecule dictionary representation from non-Psi4 schema.
 
     Parameters
@@ -85,7 +85,7 @@ def from_schema(molschema: Dict, *, nonphysical: bool = False, verbose: int = 1)
         # tooclose=tooclose,
         # zero_ghost_fragments=zero_ghost_fragments,
         nonphysical=nonphysical,
-        # mtol=mtol,
+        mtol=mtol,
         verbose=verbose,
     )

@anabiman
Copy link
Collaborator Author

What is the use case here? Are you populating from a program with an old, fixed set of atomic masses? and then mass number is also needed? If what you want is modern masses while still preserving isotope info, perhaps initialize with mass_number=int(old_mass).

Yes, that is exactly my case. I am using a molecule created by a bunch of different codes, convert it into qcelemental.Molecule, which is then passed to QCEngine. I don't need the mass numbers for now so I ended up removing them from my pipeline after I realized I was getting negative values.

Doing mtol = kwargs.pop("mtol", 1.e-3) in Molecule solves my problem within qcel but might require more changes for it to work with qcengine since mtol is lost (and I'm gonna guess qcengine re-instantiates a new Molecule object or maybe reruns the validation with the default mtol=1e-3). In any case, the simpler solution IMO would be to throw an exception here, so I am wondering why mass_number = -1 is allowed? If the caller is passing mass_numbers and an atomic mass outside the "mtol" range, why not consider this a failure? Or allow mass_number = -1 only when nonphysical=True?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
invalid This doesn't seem right
Projects
None yet
Development

No branches or pull requests

2 participants