Detailed molecular basis set specification (output of a calculation, not input) #12

tovrstra · 2017-08-18T08:42:00Z

Rough idea

The current schema does not yet have standardized method for specifying a Gaussian (or Slater) AO basis that was used in a calculation for a given molecule. Just to be clear: I'm looking for a way to fully specify a basis set in the output of a calculation, after the calculation has been carried out. This is different from specifying a basis set in the input for a calculation, because in the output you want to know how to interpret quantities expressed in an AO basis. For that you need more information, namely:

Order of the basis functions within one shell.
Normalization and sign conventions for the basis functions.
The center for each shell of basis functions.

Because basis functions are not necessarily centered on nuclei, an array with centers must be included as well.

I think we can mostly JSON-ize the molden format (see [GTO], [STO] and [MO] sections of http:https://www.cmbi.ru.nl/molden/molden_format.html) plus some improvements:

Make the ordering of the basis functions within one shell (and pure versus Cartesian) explicit as follows:
```
{"conventions": {
  "d": ["xx", "yy", "zz", "xy", "xz", "yz"],
  "f": ["f0c", "f1c", "f1s", "f2c", "f2s", "-f3s", "-f3c"]
}}
```
Cartesian functions always match the x*y*z* regular expression. For pure functions, the string consists of four parts: sign (optional), angular momentum (letter code), magnetic quantum number (absolute value), s or c for sine- or cosine-like. The optional sign is needed because some codes (e.g. ORCA) have unusual sign conventions, different from most other QC codes. The line for the f-functions in the example is how ORCA orders basis functions (and flips signs) in AO arrays in a Molden file. (This is different from what the Molden program actually expects.) To fix the meaning of the strings for the pure functions completely, we should write out the mathematical form in the JSON schema.
Allow very high angular momenta, exhausting the whole alphabet (except j, see Wikipedia, more authoritative reference always welcome): ["s", "p", "d", "f", "g", "h", "i", "k", "l", "m", "n", "o", "q", "r", "t", "u", "v", "w", "x", "y", "z", "a", "b", "c", "e"].
Assume contractions are normalized already, i.e. do not assume that the program reading the contraction coefficients will fix the normalization for you. Instead, the program writiing out the contractions should take care of that. (The Molden format assumes contractions need to be normalized upon reading, which is not suitable for all cases.)
Support (very) generalized contractions, i.e. also compatible with basis sets from CP2K. See https://github.com/cp2k/cp2k/blob/master/cp2k/data/BASIS_MOLOPT (It does not get more general than that.) For every shell we should have a list of angular momenta, e.g. ["s", "p"], ["s", "s"] or ["s", "s", "s", "p", "p", "d"]. The last example is something you could encounter in a molopt basis set.
Keep a list of centers for the shells, which can be different from atomic positions, e.g. when using ghost atoms or doing other funny things.
Include pseudopotential specification? (I'm not an expert.)
Details for STO basis functions need to be worked out. (I'm not an expert on that either.)

Example for NH3

{"orbital_basis": {
  "type": "gto",
  "conventions": {
    "p": ["x", "y", "z"],
    "d": ["d0c", "d1c", "d1s", "d2c", "d2s"],
  },
  "shells": [
    [0, 8, ["s", "s"],
     [0.9046E+4, 0.1357E+4, 0.3093E+3, 0.8773E+2, 0.2856E+2, 0.1021E+2, 0.3838E+1, 0.7466],
     [0.6996174134E-3, 0.538605463E-2, 0.2739102119E-1, 0.103150592, 0.2785706633, 0.4482948495, 0.2780859284, 0.1543156123E-1],
     [-0.304990096E-3, -0.2408026379E-2, -0.1194444873E-1, -0.489259929E-1, -0.1344727247, -0.3151125777, -0.2428578325, 0.1094382207E+1],
    ],
    [0, 1, ["s"], [0.2248], [1.0]],
    [0, 1, ["s"], [0.6124E-1], [1.0]],
    [0, 3, ["p"],
     [0.1355E+2, 0.2917E+1, 0.7973],
     [0.5890567677E-1, 0.3204611067, 0.7530420618]],
    [0, 1, ["p"], [0.2185], [1.0]],
    [0, 1, ["p"], [0.5611E-1], [1.0]],
    [0, 1, ["d"], [0.817], [1.0]],
    [0, 1, ["d"], [0.23], [1.0]],
    [1, 3, ["s"],
     [0.1301E+2, 0.1962E+1, 0.4446],
     [0.3349872639E-1, 0.2348008012, 0.8136829579]],
    [1, 1, ["s"], [0.122], [1.0]],
    [1, 1, ["s"], [0.2974E-1], [1.0]],
    [1, 1, ["p"], [0.727], [1.0]],
    [1, 1, ["p"], [0.141], [1.0]],
    [2, 3, ["s"],
     [0.1301E+2, 0.1962E+1, 0.4446],
     [0.3349872639E-1, 0.2348008012, 0.8136829579]],
    [2, 1, ["s"], [0.122], [1.0]],
    [2, 1, ["s"], [0.2974E-1], [1.0]],
    [2, 1, ["p"], [0.727], [1.0]],
    [2, 1, ["p"], [0.141], [1.0]],
    [3, 3, ["s"],
     [0.1301E+2, 0.1962E+1, 0.4446],
     [0.3349872639E-1, 0.2348008012, 0.8136829579]],
    [3, 1, ["s"], [0.122], [1.0]],
    [3, 1, ["s"], [0.2974E-1], [1.0]],
    [3, 1, ["p"], [0.727], [1.0]],
    [3, 1, ["p"], [0.141], [1.0]],
  ],
  "centers": [
    [-0.0140883131, 0.0845903925, 0.1037711513]
    [1.4952113836, 0.0214187375, 0.0445603623]
    [-0.5919457779, -1.6621666211, 0.5350312215]
    [-0.7075176488, 0.4654193413, -2.0214243076]
  ]
}}

Details

The shells field is a list, where each item represents one generalized contraction stored as a list with the following items:

center index (counting from zero?)
the number of primitives
the angular momenta for the generalized contraction
the Gaussian exponents
one or more lists of contraction coefficients. (More are present in case of generalized contractions.)

To do

Units
Support different normalization conventions. Furthermore, Cartesian functions have different normalization constants within one shell, which is causing some ambiguity.

The text was updated successfully, but these errors were encountered:

tovrstra · 2017-08-18T11:16:38Z

I've updated the text above. The changes were mostly related to pure functions.

berquist · 2017-08-18T12:05:26Z

Allow very high angular momenta: s, p, d, f, g, h, i, j, k, l, m, n, o. (Not sure what should be done after o?)

From Wikipedia:

The letters after the f sub-shell just follow letter f in alphabetical order except the letter j and those already used.

so o, q, r, t, u, v, w, x, y, z, ...

tovrstra · 2017-08-18T12:09:41Z

@berquist Thanks! I'll fix this (and other remarks later) in the top post.

dgasmith · 2017-08-18T12:14:38Z

@twindus @wadejong Do you guys have a link to the EMSL BSE XML spec?

tovrstra · 2017-08-18T12:23:07Z

I think I found it: https://bse.pnl.gov/bse/docs/schemas/gaussian.xsd I'll take a look. I think this paper explains the xml spec to some extent: http:https://pubs.acs.org/doi/pdf/10.1021/ci600510j Are there other references?

There is also https://bse.pnl.gov/bse/docs/schemas/ecp.xsd for effective core potentials.

gkc1000 · 2017-08-18T13:14:20Z

Core potentials as well?

tovrstra · 2017-08-18T13:18:19Z

@gkc1000 Core potentials are not as urgent because they are not needed to visualize orbitals or to compute densities. They could be useful for more specialized post-processing, e.g. follow-up calculations with other levels of theory

dgasmith · 2017-08-18T13:20:18Z

The EMSL BSE XML has core potentials AFAIK. If its not in the document its in another.

wadejong · 2017-08-18T13:32:52Z

There are discussions that have started regarding the BSE, it's XML, the need for extending the format, and possibly having JSON. Just FYI.

…

On Aug 18, 2017, at 6:20 AM, Daniel Smith ***@***.***> wrote: The EMSL BSE XML has core potentials AFAIK. If its not in the document its in another. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

tovrstra · 2017-08-18T13:33:42Z

@wadejong : are these discussion done on an online forum, so we can follow up on them?

wadejong · 2017-08-18T13:38:34Z

Not at this point. We had a first preliminary discussion yesterday related to long term support for BSE and talking about simplification of the architecture. I discussed this schema effort too, seeing if we can connect the efforts and unify things. I will suggest an online discussion forum for this.

…

On Aug 18, 2017, at 6:33 AM, Toon Verstraelen ***@***.***> wrote: @wadejong : are these discussion done on an online forum, so we can follow up on them? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

tovrstra · 2017-08-18T13:42:26Z

Thanks for the info. I looked at the EMSL schema for basis sets. These are my main thoughts. I hope it can be of use:

the EMSL scheme It serves a different purpose, namely to describe a basis set for a list of elements in the periodic table. It is not meant to describe molecular basis sets. Still there is a lot of overlap, so we could have some useful exchange of ideas.
The support for generalized contractions is rather limited in the EMSL XML scheme. The very general contractions as in CP2K are apparently not supported.
The EMSL scheme does not say anything about ordering of basis functions within one shell because it is not relevant in that context.
The EMSL scheme does not enforce any kind of normalization conventions for the contractions. (We ran into examples of unnormalized contractions at some point: Horton integrals are normalized-weirdly. theochem/horton#39.) We really need this to correctly interpret orbitals and other data. Enforcing normalization conventions is not very important for the EMSL scheme.

As it is, the EMSL scheme is not a one-stop solution here for two reasons: some features are missing and it is not meant to be used for describing molecular basis sets.

wadejong · 2017-08-18T13:43:00Z

In our extended Chemical JSON format currently implemented in a side version of NWChem we do have a complete format for Gaussian basis sets. Will drop it in when I get into the office. I do like some of the ideas. Especially dealing with the generalized contractions. My format simply unpacks them one shell at a time, which leads to some replication. Using matrix formulations might be a good approach.

…

On Aug 18, 2017, at 1:42 AM, Toon Verstraelen ***@***.***> wrote: Rough idea The current schema does not yet have standardized method for specifying a Gaussian (or Slater) AO basis for a given molecule in full detail. This is needed when interpreting atomic orbitals or density matrices in the output. A simple string like "3-21G" is not enough because different codes use different conventions needed for the interpretation and because some basis set strings have not such a well-defined meaning. I think we can mostly JSON-ize the molden format (see [GTO], [STO] and [MO] sections of http:https://www.cmbi.ru.nl/molden/molden_format.html) plus some improvements: Make the ordering of the basis functions within one shell (and pure versus Cartesian) explicit as follows: {"conventions": { "d": ["xx", "yy", "zz", "xy", "xz", "yz"], "f": ["-3", "-2", "-1", "0", "+1", "+2", "+3"] }} Allow very high angular momenta: s, p, d, f, g, h, i, j, k, l, m, n, o. (Not sure what should be done after o?) Assume contractions are normalized already with some convention, i.e. do not assume that the program reading the contraction coefficients will fix the normalization for you. Instead, the program writiing out the contractions should take care. (Different programs use different conventions, which is information that you do not want to discard. The Molden format assumes contractions need to be normalized upon reading, which is not suitable for all cases.) Support (very) generalized contractions, i.e. also compatible with basis sets from CP2K. See https://github.com/cp2k/cp2k/blob/master/cp2k/data/BASIS_MOLOPT (It does not get more general than that.) For every shell we should have a list of angular momenta, e.g. ["s", "p"], ["s", "s"] or ["s", "s", "s", "p", "p", "d"]. The last example is something you could encounter in a molopt basis set. Keep a list of centers for the shells, which can be different from atomic positions, e.g. when using ghost atoms or doing other funny things. Include pseudopotential specification? (I'm not an expert.) Details for STO basis functions need to be worked out. (I'm not an expert on that either.) Example for NH3 {"basis": { "type": "gto", "conventions": { "p": ["x", "y", "z"], "d": ["0", "+1", "-1", "+2", "-2"], }, "shells": [ [0, 8, ["s", "s"], [0.9046E+4, 0.1357E+4, 0.3093E+3, 0.8773E+2, 0.2856E+2, 0.1021E+2, 0.3838E+1, 0.7466], [0.6996174134E-3, 0.538605463E-2, 0.2739102119E-1, 0.103150592, 0.2785706633, 0.4482948495, 0.2780859284, 0.1543156123E-1], [-0.304990096E-3, -0.2408026379E-2, -0.1194444873E-1, -0.489259929E-1, -0.1344727247, -0.3151125777, -0.2428578325, 0.1094382207E+1], ], [0, 1, ["s"], [0.2248], [1.0]], [0, 1, ["s"], [0.6124E-1], [1.0]], [0, 3, ["p"], [0.1355E+2, 0.2917E+1, 0.7973], [0.5890567677E-1, 0.3204611067, 0.7530420618]], [0, 1, ["p"], [0.2185], [1.0]], [0, 1, ["p"], [0.5611E-1], [1.0]], [0, 1, ["d"], [0.817], [1.0]], [0, 1, ["d"], [0.23], [1.0]], [1, 3, ["s"], [0.1301E+2, 0.1962E+1, 0.4446], [0.3349872639E-1, 0.2348008012, 0.8136829579]], [1, 1, ["s"], [0.122], [1.0]], [1, 1, ["s"], [0.2974E-1], [1.0]], [1, 1, ["p"], [0.727], [1.0]], [1, 1, ["p"], [0.141], [1.0]], [2, 3, ["s"], [0.1301E+2, 0.1962E+1, 0.4446], [0.3349872639E-1, 0.2348008012, 0.8136829579]], [2, 1, ["s"], [0.122], [1.0]], [2, 1, ["s"], [0.2974E-1], [1.0]], [2, 1, ["p"], [0.727], [1.0]], [2, 1, ["p"], [0.141], [1.0]], [3, 3, ["s"], [0.1301E+2, 0.1962E+1, 0.4446], [0.3349872639E-1, 0.2348008012, 0.8136829579]], [3, 1, ["s"], [0.122], [1.0]], [3, 1, ["s"], [0.2974E-1], [1.0]], [3, 1, ["p"], [0.727], [1.0]], [3, 1, ["p"], [0.141], [1.0]], ], "centers": [ [-0.0140883131, 0.0845903925, 0.1037711513] [1.4952113836, 0.0214187375, 0.0445603623] [-0.5919457779, -1.6621666211, 0.5350312215] [-0.7075176488, 0.4654193413, -2.0214243076] ] }} Details The shells field is a list, where each item represents one generalized contraction stored as a list with the following items: center index (counting from zero?) the number of primitives the angular momenta for the generalized contraction the Gaussian exponents one or more lists of contraction coefficients. (More are present in case of generalized contractions.) To do units — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

tovrstra · 2017-08-22T15:35:31Z

@wadejong Could you post the Gaussian basis JSON format please? No rush. Thanks!

wadejong · 2017-08-22T15:55:50Z

Today.

…

Sent from my iPhone

On Aug 22, 2017, at 8:35 AM, Toon Verstraelen ***@***.***> wrote: @wadejong Could you post the Gaussian basis JSON format please? No rush. Thanks! — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

wadejong · 2017-08-23T00:53:31Z

I added a directory Basis_Sets with an example Gaussian basis JSON file and also an example that has an ECP in it to the QC_JSON repository.

…

On Tue, Aug 22, 2017 at 8:35 AM, Toon Verstraelen ***@***.***> wrote: @wadejong <https://github.com/wadejong> Could you post the Gaussian basis JSON format please? No rush. Thanks! — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#12 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AGa9cs-lOecMbsuiAWTC5lHyyRyZ6pXfks5savVGgaJpZM4O7Q7s> .

tovrstra · 2017-08-23T04:21:21Z

@wadejong Thanks!

It seems that these JSON files have a purpose comparable to the EMSL XML scheme, i.e. describing basis functions for atoms. This is typically the input one would give to a QC code, which is somewhat different from the problem I'm trying to address, how to represent the molecular basis set in the output of a QC calculation. (See comment above on XML scheme.)

After going through the https://github.com/MolSSI/QC_JSON_Schema/commit/ce3230167f4cc1276f1ff1128ff9c17b9759cdea, I wondered if it is preferable to work with dictionaries to represent the shells? I'm the idea above, I've just used a list in which the order of the elements fixes their meaning. That is not as self-explaining but it makes the JSON more compact.

FarnazH · 2019-04-07T16:06:43Z

@tovrstra I mihgt be missing sth, but didn't we decide to represent shells as a list of namedtuple('Shell', ['icenter', 'iatom', 'angmoms', 'exponents', 'contractions'])?

cryos · 2019-04-08T11:45:24Z

I have been working on this in Chemical JSON, and use a number of related arrays to arrange the data in linear arrays following the large array convention we decided to adopt. We have a basisSet object, and that has arrays for coefficients, primitivesPerShell, shellToAtomMap, shellTypes. There is an orbitals object too that has energies, moCoefficients (for alpha and beta optionally), occupations, and then we have an object for vibrations too. The example file has a perhaps too large molecule but a working implementation.

We have been able to use this along with coordinate sets and multiple MO coefficient arrays to animate a series of calculations where multiple calculations were performed on a system as it bonded.

tovrstra · 2019-04-08T12:15:09Z

@cryos Thanks for the information! I have a few small questions:

Is there a detailed description of this JSON format, e.g. also including normalization conventions of primitives and contracted basis functions?

Would you have a smaller example too? (E.g. for testing purposes.)

Which implementations can already write Chemical JSON or read it for visualization and post-processing?

How does Chemical JSON handle generalized contractions?

dgasmith · 2019-04-23T13:23:18Z

Coming back to this, we added a basis input specification from @bennybp which is a mimic of the data structure used in the new Basis Set Exchange. As a note this has been added here. This covers all use cases extant in the BSE and should provide good coverage of what is out there (@bennybp may have a few items to add).

I believe the only item that does not exist is ordering. For ordering I am leaning a bit towards simply providing ordering arrays as per CCLib. I believe the only other viable option would be the CCA standard?

twindus · 2019-04-23T14:22:07Z

Either of these options seems reasonable. There was a lot of support for just doing CCA at the recent ACS symposium - anecdotal, but....

tovrstra · 2019-04-23T14:38:47Z

@dgasmith Could you post a link to the CCA standard? Thanks.

dgasmith · 2019-04-23T14:58:54Z

The CCA paper can be found here. Same ordering as FCHK/Psi4 here. I think there may be some oddities in FCHK higher order cartesian IIRC.

tovrstra · 2019-04-23T15:08:22Z

Thanks. I'll come back to it later.

dgasmith · 2019-04-23T15:40:02Z

At the very least we could recommend the CCA ordering even if we support others. It would make things simpler in many cases.

tovrstra · 2019-04-24T08:15:55Z

I would not mind fixing the ordering of basis functions and related conventions in QCSchema, as long as it is all clearly defined. Something like "the same as in program ABC" is not good enough because most other programs never explicitly define their conventions. Here are some examples of potential sources of confusion from the codes mentioned above (not meant as criticism, just to illustrate):

The CCA paper only seems to mention the order of the Cartesian primitives, not pure ones, unless I missed something.
The FCHK format uses a different ordering than what PSI4 uses internally (for both pure and Cartesian functions), so I'm not sure what FCHK/PSI4 ordering means.
Libint can be configured to work with different orderings.
...

Can we include in the QCSchema documentation or definition the following information?

Standard ordering of Cartesian and pure basis functions within one shell, with equations to rule out any ambiguities. (For example, ORCA uses rather odd sign conventions for pure functions for m>=3. Such aspects should be fixed by convention.)
Standard normalization of primitives, with equations.
How to handle unnormalized contractions? (As far as I remember, not all basis sets on EMSL have normalized contractions. Most programs renormalize them when setting up the basis set.)

dgasmith · 2019-05-07T12:51:22Z

@twindus Do you have the CCA order for spherical basis sets? I think its 0, -/+ 1, ...

On that particular topic there is the fun question of what to do with p orbitals as the spherical transform does not effect the orbitals, just the order. Most programs do not bother with the transform.

I am leaning a bit back towards saying that the AO quantities must be in CCA order. It puts slightly more onus on the generation of the schema data, but makes everything downstream much easier to handle. We can supply translation routines in QCElemental/QCEngine that can handle this.

We can generate pure/cartesian orderings no problem. @tovrstra, can you expand a bit on the ORCA sign convention issue?
Do you have the normalization equations sitting around? I poked through libint's docs, but didn't find anything.
Does unnormalized contractions hurt us? The program should write out the basis and orbitals in whatever normalization was used internally and if they do/do not convert it should be encoded I believe.

tovrstra · 2019-05-07T14:01:43Z

Regarding ORCA, it could be specifically an issue in their routines to write out Molden files. When we read in Molden files with apparently incorrectly normalized orbitals, we run a series of checks to detect common deviations from the original file format. Orca has a sign change in pure AO functions from m>=3 on, which could (or not) reflect their internal conventions. Our settings for reading ORCA Molden files can be found here: https://github.com/theochem/iodata/blob/master/iodata/formats/molden.py#L393

dgasmith · 2019-06-04T19:13:38Z

From discussion with @wadejong and @cryos this week at Berkeley I think we are leaning towards having purely the CCA standard within the schema. This puts some additional effort upfront, but ultimately makes things far simpler.

Looking at the CCA standard (https://onlinelibrary.wiley.com/doi/epdf/10.1002/jcc.20815) the spherical ordering is L, ..., 0, ..., -L. However, the libint documentation appears to describe the CCA ordering as -L, ..., 0, ..., L.

@evaleev @twindus Is there a misunderstanding here?

evaleev · 2019-06-04T20:23:32Z

the CCA specification of the solid harmonics ordering in our paper is incomplete and thus I cannot conclude whether what libint implemented is “correct” or “not”, e.g. it does not specify what m_l means.

according to this libint implements what the de-facto CCA standard defining code (MPQC2) defined: https://github.com/ValeevGroup/mpqc/blame/19ad332791d5966c98420d926364d68aeec859ae/src/lib/chemistry/qc/basis/transform.cc#L101

so I think this is what the de-facto CCA order is, since the specification in the paper is (unfortunately) incomplete

tovrstra · 2019-06-05T07:43:47Z

@dgasmith @evaleev Thanks for sharing the info.

I agree, even when stating the ordering is m_l = -L ... L or m_l = L ... -L, it remains unclear what this actually means. For real (not complex) solid harmonics, the meaning of the sign of m_l is rather arbitrary. The only way to make it unambiguous is to just include the equations of the real solid spherical harmonics in the schema documentation. While doing so, one could also include equations for Cartesian ones and the normalization conventions, just to nail it down.

dgasmith · 2019-06-05T15:41:54Z

@evaleev Can we borrow these definitions from your living wikipedia page? @tovrstra could you work on writing up the format if this is the case?

evaleev · 2019-06-05T19:03:25Z

Sure. Attribution would be helpful. When are you planning to release the schema?

…

On Wed, Jun 5, 2019, 11:42 AM Daniel Smith ***@***.***> wrote: @evaleev <https://github.com/evaleev> Can we borrow these definitions from your living wikipedia page? @tovrstra <https://github.com/tovrstra> could you work on writing up the format if this is the case? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#12?email_source=notifications&email_token=AAQXIZ5GRL4XCGRD2XOBDMTPY7NELA5CNFSM4DXNB3WKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXAEB2I#issuecomment-499138793>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAQXIZZCA5EQEH45S7JYJK3PY7NELANCNFSM4DXNB3WA> .

tovrstra · 2019-06-06T07:10:38Z

I wouldn't mind and attribution is obviously not a problem. I'm not sure what is meant by "living wikipedia page"? Could someone post a link? Thanks.

dgasmith · 2019-06-06T17:30:44Z

That appears to be a autocorrection issue "libint wikipedia page", as seen here.

@evaleev We have had a minimal version 1 out which carries energy-based quantities and some properties-- we took some time to explore how that works in practice. You should be seeing a larger push from MolSSI to get more quantities into the schema.

loriab · 2019-10-03T21:37:51Z

@devinamatthews contributes a nice snippet for mapping basis function indices from Psi4 to CFOUR (C1 only).

    bas = wfn.basisset()

    with open("GENBAS", "w") as genbas:
        genbas.write(bas.genbas())

    maxl = bas.max_am()
    map = []
    lmap = [[0],
            [1,2,0],
            [0,4,1,3,2],
            [1,2,0,5,4,6,3],
            [0,4,1,7,6,3,8,5,2],
            [1,2,3,5,8,10,7,6,0,9,4],
            [11,4,9,7,10,3,12,5,8,0,6,2,1]]

    for i in range(mol.natom()):
        for l in range(maxl+1):
            for li in range(2*l+1):
                for j in range(bas.nshell()):
                    s = bas.shell(j)
                    if bas.shell_to_center(j) == i and s.am == l:
                        k = bas.shell_to_basis_function(j)
                        if k > 6:
                            m = l - li//2
                            k += 2*m + li%2
                        else:
                            k += lmap[l][li]
                        map.append(k)

devinamatthews · 2019-10-03T21:59:41Z

Thanks @loriab. While this is an example of a one-off mapping (and it was hard enough to figure out even in C1 with no reorientation), it seems like using the MOLDEN format or a JSON format as discussed here could make this much more portable.

Basically what I mean is that if someone were to write a tool that took two MOLDEN/JSON descriptions and generated the transformation matrix from one to the other, then reading in orbitals from another program would be as easy as reversing the MOLDEN-writer code and doing a matrix multiply. Such a tool would have to handle:

Rotation and possible inversion between the molecular frames.
Reordering of the atoms, contracted functions, and primitives.
Reuse of primitives between contracted functions. For example, CFOUR's cc-pVDZ basis for O has 9 primitives with non-zero coefficients for the 1s and 2s functions, where the 9th primitive is shared with the polarization function. Other programs only use the first 8 primitives for the 1s and 2s functions. You get the same energy but it's a different basis.
(optional maybe) Transformation between Cartesians and spherical harmonics

Maybe this is as easy as just evaluating a bunch of overlap integrals?

This was referenced Aug 18, 2017

Counting starts from 0 or 1? #13

Closed

Change ordering of basis functions within shells theochem/cgbasis#2

Open

Change ordering of basis functions within shells theochem/iodata#2

Closed

tovrstra changed the title ~~Detailed molecular basis set specification~~ Detailed molecular basis set specification (output of a calculation, not input) Aug 23, 2017

tovrstra mentioned this issue Feb 22, 2019

Basis issue orderings #45

Open

dgasmith mentioned this issue Jun 10, 2019

Wavefunction data #63

Closed

wilhadams mentioned this issue Jul 21, 2020

basisset renormalization MolSSI/QCElemental#230

Open

4 tasks

Detailed molecular basis set specification (output of a calculation, not input) #12

Detailed molecular basis set specification (output of a calculation, not input) #12

Comments

tovrstra commented Aug 18, 2017 • edited Loading

Rough idea

Example for NH3

Details

To do

tovrstra commented Aug 18, 2017

berquist commented Aug 18, 2017

tovrstra commented Aug 18, 2017

dgasmith commented Aug 18, 2017

tovrstra commented Aug 18, 2017

gkc1000 commented Aug 18, 2017

tovrstra commented Aug 18, 2017

dgasmith commented Aug 18, 2017

wadejong commented Aug 18, 2017 via email

tovrstra commented Aug 18, 2017

wadejong commented Aug 18, 2017 via email

tovrstra commented Aug 18, 2017

wadejong commented Aug 18, 2017 via email

tovrstra commented Aug 22, 2017

wadejong commented Aug 22, 2017 via email

wadejong commented Aug 23, 2017 via email

tovrstra commented Aug 23, 2017

FarnazH commented Apr 7, 2019

cryos commented Apr 8, 2019

tovrstra commented Apr 8, 2019

dgasmith commented Apr 23, 2019

twindus commented Apr 23, 2019

tovrstra commented Apr 23, 2019

dgasmith commented Apr 23, 2019

tovrstra commented Apr 23, 2019

dgasmith commented Apr 23, 2019

tovrstra commented Apr 24, 2019

dgasmith commented May 7, 2019

tovrstra commented May 7, 2019

dgasmith commented Jun 4, 2019

evaleev commented Jun 4, 2019

tovrstra commented Jun 5, 2019

dgasmith commented Jun 5, 2019

evaleev commented Jun 5, 2019 via email

tovrstra commented Jun 6, 2019

dgasmith commented Jun 6, 2019 • edited Loading

loriab commented Oct 3, 2019

devinamatthews commented Oct 3, 2019

tovrstra commented Aug 18, 2017 •

edited

Loading

dgasmith commented Jun 6, 2019 •

edited

Loading