program options beyond flat plain-old-data #70

loriab · 2019-05-21T19:29:46Z

At some point QCEngine will have to confront what options look like in non psi4/cfour/qchem-like programs. For example,

dft
  direct
  ...
end

is NWChem for a boolean direct-algorithm option for DFT. Another example is ESTATE=0/1/0/0 for an array variable in Cfour. Even if the user knew the option and value they wanted (respectively, dft direct on and B1 state only in C2v), the settings of the keywords block in qcschema would be very different depending on whether they knew Python best (dft_direct = True, estate=[0, 1, 0, 0]) or the target program's domain language best. Naturally, the input file must be formattable from the qcschema keywords dict.

My philosophy has been that the keyword RHS must be in natural Python format (True, [0, 1, 0, 0]) and the LHS must be predictable by someone who knows the program DSL (domain-specific language), with double underscore being any module separator, so dft__direct and estate. That way, we’re only transforming, not making a new DSL. Somehow, we’ll have to work Molpro into this.

This much, as I see it, is in the qcengine domain, not the qcdb (which is concerned with translating LHS options). Any concerns/disputes/that-doesn’t-belong-here-arguments before I act like this is qcng’s philosophy, too?

qcschema precedence is not fleshed out
On __-separated vs. nested dict: I used to use the nested dict but find the __ separator much easier on the user. Since nested dict is an intermediate in: __-sep-string --> nested-dict --> formatted-input, I can see allowing either at the qcng level.
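The first step of that chain (__-sep-string --> nested-dict) can be sketched as below. This is a minimal illustration, not QCEngine's actual code: the function name `expand_dunder` and its `sep` parameter are hypothetical, and the final nested-dict --> formatted-input step is left out because it is program-specific.

```python
from typing import Any, Dict


def expand_dunder(keywords: Dict[str, Any], sep: str = "__") -> Dict[str, Any]:
    """Expand flat 'module__option' keys into a nested dict (hypothetical helper)."""
    nested: Dict[str, Any] = {}
    for key, value in keywords.items():
        # Everything before the last separator is module hierarchy.
        *modules, option = key.split(sep)
        node = nested
        for module in modules:
            node = node.setdefault(module, {})
            if not isinstance(node, dict):
                raise ValueError(f"{key!r}: {module!r} already holds a value, not a module")
        node[option] = value
    return nested


flat = {"dft__direct": True, "estate": [0, 1, 0, 0]}
nested = expand_dunder(flat)
# nested == {"dft": {"direct": True}, "estate": [0, 1, 0, 0]}
```

Accepting either form at the qcng level would then amount to running this expansion only on keys that contain the separator.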

sjrl · 2019-05-21T20:20:06Z

How restrictive would this be for something like Molpro or Psi4? Would allowing the user to write custom inputs (handled through something like Jinja) go against the QCArchive framework?

loriab · 2019-05-21T20:50:48Z

I think jinja goes against QCA guarantees but not against QCA allowed entry points. So use at your own risk b/c not all input info is available for interpreting output.

This actually has to be the case b/c not all inputs fit in qcschema -- for example any multistage input violates the single driver field of qcschema.

What I personally would like to see is that no job that should be expressible in a single qcschema (like mp2c) needs jinja. But I acknowledge that it can be a step along the way for things that are easier to parse than express in an input.

sjrl · 2019-05-21T20:54:15Z

That sounds reasonable to me. For my projects (including my MolSSI Seed project) I'll probably need the ability to express a complicated input file, but then parse fairly standard items (energy and forces) from the output file.

dgasmith · 2019-05-21T20:58:41Z

Right, I think the idea is compute is and always will be pure schema. However, if you want to alter the steps of the object you should be allowed to. We have the following layers for every executor:

build_input - Takes in Schema and perhaps other data to form an input file.
execute - Takes in a command line and a dictionary of filename : file content pairs, returns the content of all generated files.
parse_output - Takes in a dictionary of filename : file content pairs and returns schema-like objects.
compute - Calls the other three sequentially.

@sjrl, your case would likely call execute and parse_output directly, as those should be canonical values (knowing that fields in the result might be slightly wonky).
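To make the layering concrete, here is a toy sketch of the four methods. It is an assumption-laden illustration, not QCEngine's real classes: `ToyExecutor`, the `toyprog` command, the file names, and the output format are all invented, and `execute` fakes the program run instead of launching a process.

```python
from typing import Dict, List


class ToyExecutor:
    """Hypothetical executor with the four layers described above."""

    def build_input(self, schema: dict) -> dict:
        # Schema in, command line plus input files out.
        lines = [f"{key} {value}" for key, value in schema["keywords"].items()]
        return {"command": ["toyprog", "input.dat"],
                "infiles": {"input.dat": "\n".join(lines)}}

    def execute(self, command: List[str], infiles: Dict[str, str]) -> Dict[str, str]:
        # Stand-in for running the program: returns canned file contents.
        return {"output.dat": "energy -1.5\n"}

    def parse_output(self, outfiles: Dict[str, str]) -> dict:
        # Pull a schema-like result out of the generated files.
        for line in outfiles["output.dat"].splitlines():
            if line.startswith("energy"):
                return {"success": True, "return_result": float(line.split()[1])}
        return {"success": False}

    def compute(self, schema: dict) -> dict:
        job = self.build_input(schema)
        outfiles = self.execute(job["command"], job["infiles"])
        return self.parse_output(outfiles)


ex = ToyExecutor()
full = ex.compute({"keywords": {"scf_type": "df"}})

# Custom-input route: skip build_input and supply a hand-written file.
outfiles = ex.execute(["toyprog", "input.dat"], {"input.dat": "hand-written input"})
partial = ex.parse_output(outfiles)
```

In the second route the parser never sees the schema that would normally accompany the input, which is one way fields in the result could end up incomplete.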

Allowing this to go through the full QCArchive framework (including Fractal) would take some thinking. From discussions elsewhere, what we would propose is something like the following (using embedding as a target):

Build a custom executor and call it something like molpro-embedded that has an expanded input syntax. This executor would subclass the current Molpro executor and modify the build_input command.
Register this with Engine (we need to add some hooks)
Submit tasks through Fractal with the extended input framework and submit these to the molpro-embedded program.
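A rough sketch of the first two steps, under heavy assumptions: the registration hooks do not exist yet, so `PROGRAMS` and `register_program` are hypothetical, `MolproExecutor` here is a simplified stand-in for the current executor, the `embedding_block` keyword is invented, and the generated text is placeholder formatting, not real Molpro syntax.

```python
from typing import Dict


class MolproExecutor:
    """Simplified stand-in for the existing executor (hypothetical)."""

    def build_input(self, schema: dict) -> str:
        # Placeholder formatting only; not real Molpro input syntax.
        return "\n".join(f"{key} = {value}" for key, value in schema["keywords"].items())


class MolproEmbeddedExecutor(MolproExecutor):
    """Same executor with an expanded input syntax via an overridden build_input."""

    def build_input(self, schema: dict) -> str:
        keywords = dict(schema["keywords"])  # copy so the caller's task is untouched
        raw_block = keywords.pop("embedding_block", "")  # hypothetical extra field
        base = super().build_input({**schema, "keywords": keywords})
        return base + ("\n" + raw_block if raw_block else "")


# Hypothetical registry standing in for the hooks Engine would need.
PROGRAMS: Dict[str, MolproExecutor] = {}


def register_program(name: str, executor: MolproExecutor) -> None:
    PROGRAMS[name] = executor


register_program("molpro", MolproExecutor())
register_program("molpro-embedded", MolproEmbeddedExecutor())

task = {"keywords": {"basis": "sto-3g",
                     "embedding_block": "<hand-written embedding section>"}}
text = PROGRAMS["molpro-embedded"].build_input(task)
```

Only build_input changes in the subclass, so execution and output parsing would be inherited unchanged.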

dgasmith · 2019-05-21T21:02:28Z

No real preference on the dunder keywords vs nested dictionaries. Curious if anyone else has thoughts here.

loriab · 2019-10-24T02:01:00Z

from nwchem QA/tests

(a)  grid fine
(b)  grid lebedev H 350 18 I 350 18
(c)  grid ssf euler lebedev 75 11

from nwchem docs

 GRID [(xcoarse||coarse||medium||fine||xfine) default medium] \
      [(gausleg||lebedev ) default lebedev ] \
      [(becke||erf1||erf2||ssf) default erf1] \
      [(euler||mura||treutler) default mura] \
      [rm <real rm default 2.0>] \
      [nodisk]

three schematizing principles

1. The keywords section shall be approximately Dict[str, Union[bool, str, int, float, List, Tuple, Dict[str, Any]]]. Any module hierarchy shall be represented by double underscore. Any direction to a particular QC program is by prefixing the keyword with the program name and an underscore (or two), so gamess_contrl__scftyp='rhf'.
2. Keywords should be independent and granular such that they're approximately 1:1 with other programs, not 5:1. That is, no single method option should cover convergence, algorithm, and target roots all wrapped together. Also, separate yet mutually exclusive option sets are a last resort (e.g., three independent boolean options for rhf/uhf/rohf).
3. A user familiar with the native QC program input deck and the schematizing principles should be able to readily write out the qcschema keywords section (say, 95% of the time) without consulting a glossary.
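The type in principle 1 can be written down and checked mechanically. The sketch below is illustrative only: the alias names and `check_keywords` are hypothetical, and it checks only the top-level value types, not the contents of lists or nested dicts.

```python
from typing import Any, Dict, List, Tuple, Union

# Alias spelling out the "approximate" type from principle 1.
KeywordValue = Union[bool, str, int, float, List, Tuple, Dict[str, Any]]
Keywords = Dict[str, KeywordValue]

_ALLOWED = (bool, str, int, float, list, tuple, dict)


def check_keywords(keywords: dict) -> List[str]:
    """Return a list of human-readable problems; empty means the dict conforms."""
    problems = []
    for key, value in keywords.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a str")
        elif not isinstance(value, _ALLOWED):
            problems.append(f"{key}: unsupported value type {type(value).__name__}")
        elif isinstance(value, dict) and not all(isinstance(k, str) for k in value):
            problems.append(f"{key}: dict values need str keys")
    return problems


good = {"gamess_contrl__scftyp": "rhf",
        "nwchem__dft__grid__lebedev": {"H": (350, 18), "I": (350, 18)}}
bad = {"nwchem__dft__grid": {"ssf", "euler"}}  # a set is outside the allowed types

good_problems = check_keywords(good)
bad_problems = check_keywords(bad)
```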

finally, the question

The one-word case (a) is simple: nwchem__dft__grid = 'fine'. Case (b) is also representable as nwchem__dft__grid__lebedev = {'H': (350, 18), 'I': (350, 18)}. This already works in the formatter. Case (c) is the troublemaker.

The repr most consistent with principle 3 would be nwchem__dft__grid = ['ssf', 'euler', {'lebedev': (75, 11)}] but this violates principles 1 and 2.
The repr most consistent with principle 2 would be the below but this invents "partition" and "radial_quadrature", thereby violating principle 3.

nwchem__dft__grid__partition = 'ssf'
nwchem__dft__grid__radial_quadrature = 'euler'
nwchem__dft__grid__lebedev = (75, 11)  # or {'': (75, 11)}

The compromises I can think of are:
- the tuple (understandable w/o consulting nwc manual?)

nwchem__dft__grid = ('ssf', 'euler')  # = 'ssf' would also work if only one needed
nwchem__dft__grid__lebedev = (75, 11)

- the booleans (discouraged in principle 2)

nwchem__dft__grid__ssf = True
nwchem__dft__grid__euler = True
nwchem__dft__grid__lebedev = (75, 11)

- the schism: throwing out principles 1 & 2 and optimizing for the user with principle 3, since QCSchema.keywords can hold it. This will break the 1:1 between qcng and qcdb representations.
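For comparison, a small sketch showing that the tuple and boolean compromises can both be turned back into the line from test (c). This is not the existing formatter: `render_grid` is a hypothetical helper, it relies on dict insertion order for token order, and it says nothing about whether NWChem cares about that order.

```python
def render_grid(keywords: dict, prefix: str = "nwchem__dft__grid") -> str:
    """Render flat grid keywords into one 'grid ...' line (hypothetical helper)."""
    tokens = ["grid"]
    for key, value in keywords.items():
        if key == prefix:
            # Bare words: a single string or a tuple of strings.
            tokens.extend([value] if isinstance(value, str) else list(value))
        elif key.startswith(prefix + "__"):
            name = key[len(prefix) + 2:]
            if value is False:
                continue  # option switched off, emit nothing
            tokens.append(name)
            if value is True:
                continue  # boolean flag, the bare name is enough
            if isinstance(value, dict):
                # Per-label values, e.g. {'H': (350, 18)}; '' means no label.
                for label, numbers in value.items():
                    tokens.extend(([label] if label else []) + [str(n) for n in numbers])
            else:
                values = value if isinstance(value, (list, tuple)) else [value]
                tokens.extend(str(v) for v in values)
    return " ".join(tokens)


tuple_form = {"nwchem__dft__grid": ("ssf", "euler"),
              "nwchem__dft__grid__lebedev": (75, 11)}
boolean_form = {"nwchem__dft__grid__ssf": True,
                "nwchem__dft__grid__euler": True,
                "nwchem__dft__grid__lebedev": (75, 11)}

line_from_tuple = render_grid(tuple_form)
line_from_booleans = render_grid(boolean_form)
# both are "grid ssf euler lebedev 75 11"
```

The same helper also reproduces cases (a) and (b) from their representations above, so the choice between the compromises really is about the user-facing repr rather than formatting.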

Personally, I'd like to avoid the schism since I think crafting the qcschema keywords repr with the three principles in mind hits a lot of use cases. But I'm glad to hear other thoughts. Especially with Molpro lurking. :-)

P.S. In all this, ignore the formatting of the repr back to an input file. That tends to be straightforward and general. In this post, only concerned with the user representation of instructions to the qcprog through schema.

@vivacebelles

dgasmith · 2019-10-24T13:13:15Z

@mattwelborn @sjrl Good to get comments from you as well I think.

mattwelborn · 2019-10-24T13:26:47Z

How would this proposal interact with nested commands found in entos? e.g.

dft(
  structure( molecule = methanol )
  xc = PBE
  ao = 'Def2-SVP'
  df = 'Def2-SVP-JFIT'
)

versus

optimize(
  structure( molecule = methanol )
  dft(
    xc = PBE
    ao = 'Def2-SVP'
    df = 'Def2-SVP-JFIT'
  )
)

vivacebelles · 2019-10-24T15:20:58Z

Thinking about the first compromise, it may break principle 3, since some options would be nested while others are allowed to be listed together (i.e., nwchem_dft__grid__lebedev vs. nwchem_dft__grid), and it may not be intuitive to native users how that split happens.

The booleans appeal to me, but that's also probably due to familiarity in some of the options we've been working on in qcdb.

loriab · 2019-10-24T16:51:25Z

@mattwelborn, I think since qcschema is single-calc focused, there isn't the possibility of that dft(...) section appearing twice, once in an opt and once in a sp, right? In that case, both inputs you mentioned reduce to the below. Is PBE an object or a str?

entos__dft__xc = PBE
entos__dft__ao = 'Def2-SVP'
entos__dft__df = 'Def2-SVP-JFIT'

mattwelborn · 2019-12-18T22:25:16Z

entos EMFT won't work with this strategy:

emft(
  structure( molecule = methanol )
  active = [1,2]
  dft(
    xc = PBE
    ao = 'Def2-SVP'
  )
  dft(
    xc = LDA
    ao = 'STO-3G'
  )
)

mattwelborn · 2019-12-18T22:25:52Z

See also the nightmare that is Molpro: #198

loriab · 2019-12-19T03:39:29Z

Is the emft two calculations or one?

mattwelborn · 2019-12-19T12:44:33Z

It's one calculation with two coupled subsystems whose Fock builders are specified by the dft commands.
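Why the flat strategy breaks here can be shown with a short sketch. Everything in it is an assumption for illustration: `Command` and `flatten` are hypothetical, xc values are treated as plain strings (the object-or-str question above is open), and the structure(...) sub-command is omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Command:
    """Hypothetical tree node for a nested command such as dft(...) or emft(...)."""
    name: str
    options: Dict[str, Any] = field(default_factory=dict)
    children: List["Command"] = field(default_factory=list)


def flatten(cmd: Command, prefix: str = "entos") -> Dict[str, Any]:
    """Flatten a command tree into double-underscore keywords."""
    path = f"{prefix}__{cmd.name}"
    flat = {f"{path}__{opt}": val for opt, val in cmd.options.items()}
    for child in cmd.children:
        for key, val in flatten(child, path).items():
            if key in flat:
                # Two sibling commands with the same name map to the same key.
                raise ValueError(f"keyword collision on {key!r}")
            flat[key] = val
    return flat


single = Command("dft", {"xc": "PBE", "ao": "Def2-SVP", "df": "Def2-SVP-JFIT"})
single_flat = flatten(single)  # unique keys, so the flat form works

emft = Command("emft", {"active": [1, 2]}, [
    Command("dft", {"xc": "PBE", "ao": "Def2-SVP"}),
    Command("dft", {"xc": "LDA", "ao": "STO-3G"}),
])
try:
    flatten(emft)
    collision = None
except ValueError as err:
    collision = str(err)
```

The two dft commands are distinguished only by position, and a dict keyed on dft__xc has no place to record position, so some ordered or indexed representation would be needed.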

loriab mentioned this issue Dec 19, 2019

Complex, ordered multi-command input for Molpro #198

Open
