Skip to content
Julien Steffen edited this page Dec 18, 2023 · 48 revisions

Introduction

This program generates and optimizes the parameters of EVB coupling terms. The coupling smoothly connects two QMDFFs, therefore these need to be present (generated with qmdffgen) when an EVB coupling term is optimized. In addition, some reference data describing the reaction must be present, serving as the training set for the optimization routine. The xyz-coordinates of the structures and the respective reference energies in a linewise format must be present. Usually, a reaction path, generated from a NEB or IRC calculation, would be used for this. For the RP-EVB and the related TREQ method, a reaction path is mandatory.

List of examples:

Example 1: dE-EVB

Folder: evbopt/dE-EVB

The simplest EVB coupling version is the energy-gap (dE)-EVB coupling. Here, the off-diagonal coupling term depends only on the energetic distances of both QMDFFs at the current coordinates (delta E or short dE). A bunch of different functions depending on dE can be used, the easiest are the constant coupling a with no dE-dependence at all and the 1g coupling C= a exp(-b deltaE^2) with a Gaussian shape.

The main input file is the \texttt{evbopt_de_1g.key} file, where all essentially all control parameters are listed:

pes de_evb
energies_ref ref.dat
coords_ref struc.xyz

qmdff {
   ffnames min1.qmdff min2.qmdff
   eshift  -272.22659664700754  -272.16798367875066
   shift_manual
}

de_evb{
   coupling sp
}

The first keyword is always needed and specifies the used class of PES functions. Here, we want to use the dE-EVB coupling (de_evb). Other options (for evbopt) are dQ-EVB (dq_evb) and DG-EVB (dg_evb), see below.

Both the energies_ref and coords_ref keywords are needed to provide the needed reference data for the reaction of interest that shall be described with EVB. Usually this would be a one-dimensional reaction path, but higher dimensional sampling data could be used as well.

The keywords in the qmdff section: ffnames and eshift specifiy the properties of the single QMDFFs, both their filenames and the energy shifts, where shift_manual prevents the automatic correction of QMDFF shifts to the first/last given energy point of the reaction path. These parameters are written automatically if qmdffgen is executed first (see section qmdffgen).

The optimization is invoked by typing:

evbopt.x evbopt.key

A multi-start local-search Levenberg-Marquardt optimization is executed with the objective to minimize the error sum between the EVB-QMDFF energies and the reference energies listed in ref.dat, an usual output is:

...
Used off-diagonal-Term:
a*exp(-b*deltaE^2)+c*deltaE*exp(-d*deltaE^2)   (sp)
After Multi-Start-Local-Search step:           1
Local optimization finished, new best fitness:   8.3897511547073188E-004
After Multi-Start-Local-Search step:          15
Local optimization finished, new best fitness:   8.3877256721804922E-004
After Multi-Start-Local-Search step:          35
Local optimization finished, new best fitness:   8.3632257233213895E-004
.. .----. -- -.. --- -. . 
Multi start local optimization finished!
The best fitness-value is:   8.3632257233213895E-004
Look into evbopt.log for further details.
...

The total fitness is the central indicator of how good the reference reaction path is reproduced with the EVB-QMDFF model, however, in order to gain more detailed knowledge of the quality, one can look at the additional output, namely:

  • energies_final.dat: The EVB-QMDFF energies of all reference structures
  • single_qmdff.dat: The energies of both QMDFFs of all reference structure

Plotting the files together with the reference energies in ref.dat results in the figure shown in the figure:

drawing

The optimized EVB parameters (which are needed for subsequent calculations with other programs like dynamic), are written to file evb_pars.dat. The resulting file will look something like this (the exact values differ in each run due to the randomized parameter search algorithm):

** This file contains parameters for a EVB-dE-calculation **
  0.11580080347625064     
  5.7673300124919509     
  1.9401589887079447E-004
 -4.7760155261137269

Example 2: dQ-EVB

Folder: evbopt/dQ-EVB

Within coordinate-dependent or dQ-EVB, the coupling strength depends on the distance to a chosen point within the configuation space of the system. Usually, the transition state (TS) of the reaction (or a structure nearby) should be chosen to be that point. In order to get a rotationally invariant description, the distance to the TS is measured in internal coordinates. It is not required to define a full set of 3N-6 internal coordinates for this, instead, a small subset being representative for the reaction is usually sufficient and more effective.

The main input file is again the evbopt.key file, containing all general settings:

pes dq_evb
energies_ref energies_ref.dat
coords_ref path.xyz

qmdff {
   ffnames min1.qmdff  min2.qmdff
   eshift -195.19018649184099  -195.19422087959535
}

dq_evb {
   ts_file ts.xyz
   coupling sd2
}

Its structure is mostly identical to that for the dE optimization. The dQ coupling term is activated with the pes dq_evb keyword. Names and energy shifts of QMDFFs are read in from the qmdff section, as well as the structures and the energies of the reaction path serving as reference for the parameter optimization (energies_ref energies_ref.dat and coords_ref path.xyz).

The dq_evb section provides specific settings for the dQ coupling term: the coupling function (sd2) and the structure of the reference point, in this case also the TS: ts_file ts.xyz.

Besides evbopt.key and the other files speficied in it (min1.qmdff, min2.qmdff, energies_ref.dat, path.xyz and ts.xyz), the set of internal coordinates to be used must be defined manually and written to the file coord_def.inp, where one internal coordinate is defined in each line:

3 7
7 10
3 10
1 7

In this case, four bond lengths are the active set of internal coordinates, referring to the region most interesting for the reaction:

drawing

The optimization can be invoked as usual by typing

evbopt.x evbopt.key

The optimization will be finished after a fraction of a second, printing something like this:

...
Used off-diagonal-Term:
a*exp(-b*deltaQ^2)+c*deltaQ^2*exp(-d*deltaQ^2)+
              e*deltaQ^2*exp(-f*deltaQ^2)   (sd2)
 . ...- -... --.- -- -.. ..-. ..-.
After Multi-Start-Local-Search step:           1
Local optimization finished, new best fitness:   1.4688854696209078E-003
After Multi-Start-Local-Search step:           2   
Local optimization finished, new best fitness:   1.4679300705026375E-003
After Multi-Start-Local-Search step:           5
Local optimization finished, new best fitness:   1.4677972875365361E-003
After Multi-Start-Local-Search step:           7
Local optimization finished, new best fitness:   1.4675796326565325E-003
After Multi-Start-Local-Search step:          26
Local optimization finished, new best fitness:   1.4675447295631309E-003
.. .----. -- -.. --- -. . 
Multi start local optimization finished!
The best fitness-value is:   1.4675447295631309E-003
Look into evbopt.log for further details.
...

Besides the file energies_final.dat with the (optimized) EVB-QMDFF energies of the structures in path.xyz and single_qmdff.dat containing the energies of both QMDFFs for those structures, the optimized coupling parameters are again written to evb_pars.dat. The quality of the optimized coupling can again be accessed by plotting QMDFF and EVB-QMDFF energies besides the reference energies:

drawing

Example 3: DG-EVB

Folder: evbopt/DG-EVB

The distributed Gaussian (DG-EVB) coupling term enables the inclusion of QM-reference gradients and frequencies into the formulation of the transition region, offering a much better quality than dE-EVB or dQ-EVB.

The following explanation and the example assumes that an optimized 1D reaction path of the reaction of interest is given. In principle, however, arbitrary collections of reference data, e.g. from MD samplings, can be used as well.

Including gradients and frequencies to the used reference set needs somewhat more preparational work than using only energies of all structures of the reaction path. The first question is: how many and which points along the reaction path shall be chosen. In principle, we could simply choose all given structures and calculate gradients and frequencies on it. Depending on the resolution of the path, dozens to hundreds of frequency calculations might be needed, which almost always implies heavy computational demands. Further, the separate optimization of coupling basis function widths needs an additional set of energies along the path. In this example, the resulting input files are provided, such that this preparational part might be skipped. It serves, however, as a template for how to prepare an DG-EVB-QMDFF surface for other reactions of interest.... bla bla

For the manual preparation, an utility script for the selection and preparation of gradient/frequency reference is given in the scripts main folder. Copy the script dg_evb_ref.sh to the actual folder if you want to redo the reference calculations. It requires the reaction path stored in struc.xyz and a list of structure numbers on which the reference data shall be calculated: struc_num.txt. A file containing the header of the desired QM input (for Gaussan: gauss.com, for orca: orca.inp, for CP2K: run.inp) must be provided as well.

For convenience, the folder structure was already prepared for seven reference points distributed along the path such that the transition state and the important transition region is covered well. Gaussian is here the chosen QM program, the frequency calculations can be executed for all seven folders.

After this, the needed reference input file dg_ref.dat can be generated by executing the program gen_dg_evb.

The main input file for EVB optimizations is evbopt.key. It has the following entries:

pes dg_evb
energies_ref ref.dat
coords_ref struc.xyz

qmdff {
   ffnames min1.qmdff  min2.qmdff
   eshift  -132.2974220  -132.3079820
   shift_manual
}

dg_evb{
   mode 2
   points 7
   point_ref dg_ref.dat
   read_coord 
}

After specification of a DG-EVB potential energy surfaces, the information concerning the single QMDFFs and the used reference data for optimization is listed. All settings concerning the DG-EVB coupling term in particular are listed inside the dg_evb block. They have the following meaning:

  • mode: Which type of DG-EVB function built upon which kind of ererence shall be used. 1: only energies, 2: energies and gradients, 3 energies, gradients and Hessians.
  • points: Number of DG-EVB reference points on which the energies, gradients and Hessians were calculated.
  • point_ref: File in which the reference information for the DG-EVB points is stored.
  • read_coord: If given, the defined internal coordinates are read in from the file coord_def.inp, else, they are determined automatically.

Altogether, 4 or 5 different input files are needed to perform the optimization of a DG-EVB coupling term (named as above):

  • evbopt.key: Main input file
  • ref.dat: File with energies of the reaction path for Gaussian exponent optimization.
  • struc.xyz: File with structures of the reaction path.
  • dg_ref.dat: Reference information for the DG-EVB points.
  • coord_def.inp: Definition of internal coordinates for DG-EVB coupling (optional).

The dg_ref.dat file needs to have a predefined formate, which is generated by the gen_dg_ref program, given in the scripts folder. Essentially, energies, geometries, gradients and Hessians of all reference points are there listed in order, here for seven reference points along the path:

NPOINTS 7

NATOM  6

*POINT 1

ENERGY    -132.073392

GEOMETRY
  -0.2598966E+01  -0.1689562E+00  -0.2337807E-02  -0.2485496E+01   0.1654694E+01
   0.3883103E-01   0.2523122E+01  -0.1090708E+00  -0.8583873E-02   0.1909804E+01
   0.1018732E+00   0.1786691E+01   0.1266274E+01  -0.1242103E+01  -0.8934828E+00
   0.2439287E+01   0.1600681E+01  -0.8532495E+00
END

GRADIENT
   0.3391129E-02   0.5053883E-05  -0.3630294E-03   0.1683734E-03  -0.3711497E-05
   0.1950042E-03  -0.2993883E-02  -0.4142712E-05  -0.4566316E-04   0.3697710E-04
   0.4438535E-04   0.7573431E-04  -0.3227359E-03  -0.1882671E-04   0.1766393E-03
  -0.2798607E-03  -0.2275832E-04  -0.3868530E-04
END

HESSIAN
   0.1371910E-01   0.3332208E-01   0.2103093E-03  -0.4587440E-02  -0.3262215E-01
  -0.6211021E-03  -0.5506326E-02  -0.1298989E-02  -0.9317258E-05  -0.4769846E-03
....
END

*POINT 2
...

Depending on the chosen DG-EVB coupling mode, energies (and geometries) might be sufficient as well.

If internal coordinates for the coupling are given in the coord_def.inp file, a predefined format must be obeyed as well: One coordinate per line, each defined by the involved atom indices: Two for a bond, three for an angle, and four for dihedral/out of plane. For the latter one, a fifth number indicating if dihedral or out of plane must be added after the atom indices, such that each dihedral/out of plane line consists of five numbers.

In the example system, the internal coordinate set defined in coord\_def.inp consists of four bonds:

1 5
3 5
1 3
1 2

The choice of internal coupling coordinates is not exactly black-box. The usual rule is that coordinates involved in the reaction of interest are the most important, but the overall quality might significantly raise or fall if another assumingly unimportant coordinate is added or removed. In the evb_kt_driver program explained below a more systematic and quite well working approach of finding the internal coordinates for the there used TREQ PES is implemented.

The actual optimization of the DG-EVB coupling term is very simple: Just type

evbopt.x evbopt.key

and the MSLS-LM algorithm is executed to optimize the parameter. This will take some seconds to minutes.

The program output should look something like this:

...
Used DG-EVB mode: 3 - energies, gradients and hessians (E+G+H)
The dimension of the linear equation F=DB is          105
##################  QMDFFs!
 . ...- -... --.- -- -.. ..-. ..-.
DG-EVB-parameters are optimized with Multi Start Local Search.
We will start         100 single optimization runs.
The coefficient matrix D for DG-EVB will be calculated analytically!
Multi Start Local Search algorithm:
---------------------------------------------------
After Multi-Start-Local-Search step:           1
Local optimization not converged, new best fitness:   1.5411906000070418E-006
After Multi-Start-Local-Search step:           2
Local optimization finished, new best fitness:   1.5075939192383248E-006
After Multi-Start-Local-Search step:          10
Local optimization finished, new best fitness:   1.3147886030294805E-006
After Multi-Start-Local-Search step:          11
Local optimization finished, new best fitness:   1.2190281156368069E-006
After Multi-Start-Local-Search step:          16
Local optimization finished, new best fitness:   1.1965456308064260E-006
After Multi-Start-Local-Search step:          30
Local optimization finished, new best fitness:   1.1388462232211951E-006
After Multi-Start-Local-Search step:          78
Local optimization finished, new best fitness:   1.1059136690605482E-006
.. .----. -- -.. --- -. . 
Multi start local optimization finished!
The best fitness-value is:   1.1059136690605482E-006
Look into evbopt.log for further details.
...

The total fitness is the central indicator of how good the reference reaction path is reproduced with the EVB-QMDFF model, however, in order to gain more detailed knowledge of the quality, one can look at the additional output, namely:

  • energies.qmdff: The EVB-QMDFF energies of all reference structures
  • single_qmdff.dat: The energies of both QMDFFs of all reference structure

Plotting the files together with the reference energies in ref.dat results in the figure shown in the figure below.

drawing