Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Molecule masses from symbol issue #225

Closed
dgasmith opened this issue May 5, 2020 · 8 comments
Closed

Molecule masses from symbol issue #225

dgasmith opened this issue May 5, 2020 · 8 comments

Comments

@dgasmith
Copy link
Collaborator

dgasmith commented May 5, 2020

Describe the bug
Nucleon symbols and values are in the periodictable information, but do not propagate their values into a Molecules mass. For example symbol D or He3 do not correctly review masses of 2 or 3, respectively.

To Reproduce

>>> qcel.models.Molecule(symbols=["He3"], geometry=[0,0,0]).masses
array([4.00260325413])

Expected behavior
The mass of the Helium3 to be 3 and change.

Additional context
The issues comes from reconcile_nucleus where everything is validated off of atomic number rather than the input symbol. This looks like a non trivial fix to that logic.

@loriab
Copy link
Collaborator

loriab commented May 5, 2020

did you try 3He? that should recognize the isotope. He3 is read as a user-convenience tag of 3 on regular helium.

@dgasmith
Copy link
Collaborator Author

dgasmith commented May 5, 2020

qcelemental.exceptions.NotAnElementError: 3He

There are unambiguous issues as well with known isotopes:

>>> qcel.models.Molecule(symbols=["D"], geometry=[0,0,0]).masses
array([1.00782503])

@loriab
Copy link
Collaborator

loriab commented May 5, 2020

ok. it's probably in the handing over of field symbol to molparse -- may need a different field. it shouldn't be deep, though, since this is literally the problem reconcile_nucleus was built to handle. https://github.com/MolSSI/QCElemental/blob/master/qcelemental/molparse/nucleus.py#L381

@loriab
Copy link
Collaborator

loriab commented May 5, 2020

the below works (though it's not set up to recognize D, T). understandably you'd like to not have so many guts showing to make a simple deuterium.

>>> qcel.models.Molecule(**qcel.molparse.to_schema(qcel.molparse.from_arrays(elbl=["2H"], geom=[0,0,0]), dtype=2)).masses
array([2.01410177812])

The problem comes about pretty much because https://github.com/MolSSI/QCElemental/blob/master/qcelemental/molparse/from_schema.py#L82 . https://github.com/MolSSI/QCElemental/blob/master/qcelemental/molparse/from_arrays.py#L191-L195 I hesitate to flip it without really studying the consequences.

@dgasmith
Copy link
Collaborator Author

dgasmith commented May 5, 2020

Yea, I don't think this is either a priority or a quick fix. More of a "I was writing a test was and was somewhat surprised". Something to keep in mind I think.

@loriab
Copy link
Collaborator

loriab commented May 5, 2020

I don't know if it's so much a fix as just the current setup enforces schema field definitions. Below are a couple schema-approved ways to do your original problem. Redefining symbols where as input it's a spec label and as output it's a string is a new kettle of fish. But if that is wanted someday, the underlying tech is ready.

>>> qcel.models.Molecule(symbols=["He"], mass_numbers=[3], geometry=[0,0,0]).masses
array([3.01602932])
>>> qcel.models.Molecule.from_data("""3He 0 0 0""").masses
array([3.01602932])

@dgasmith
Copy link
Collaborator Author

dgasmith commented May 5, 2020

In this case perhaps symbols=["He3"] should throw an error that information in the symbols is overloaded?

@loriab
Copy link
Collaborator

loriab commented May 5, 2020

Sounds sensible at first. You have hit on an inconsistency, though. speclabel takes prefix mass number so that post can be label or mass, so 3He. meanwhile, the periodictable takes suffix mass number: He3.

So symbols is being broadly interpreted (6 lines below map to the same Mol) but only to extract the info that is properly contained in symbols. Probably something like an option on to_Z(..., strict=True) that restricts interpretation to element only.

# all same
qcel.models.Molecule(symbols=['He'], geometry=[0,0,0])
qcel.models.Molecule(symbols=['He3'], geometry=[0,0,0])
qcel.models.Molecule(symbols=['He4'], geometry=[0,0,0])
qcel.models.Molecule(symbols=['2'], geometry=[0,0,0])
qcel.models.Molecule(symbols=[2], geometry=[0,0,0])
qcel.models.Molecule(symbols=['helium'], geometry=[0,0,0])
# replies "you fool"
qcel.models.Molecule(symbols=['He300'], geometry=[0,0,0])

loriab added a commit to loriab/QCElemental that referenced this issue Jun 24, 2020
@loriab loriab closed this as completed in 2b11ac9 Jun 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants