
Bad wB97X-D3BJ energies with MKL & in-core DFT #2279

Open
jminuse opened this issue Aug 17, 2021 · 13 comments · Fixed by #2283
Labels: bug · correctness-error (for issues where Psi4 gives answers that are wrong) · scf (involves general SCF: convergence algorithms, RHF/UHF/ROHF/CUHF...)

Comments

@jminuse

jminuse commented Aug 17, 2021

I am seeing large differences in wB97X-D3BJ energy between 1.4.0 and 1.3.2, and between different 1.4.0 installations. It seems that installing 1.4.0 with -c anaconda can cause the differences between 1.4.0 installations, possibly because it replaces the default linear algebra libraries with MKL versions. Such an installation runs 50% faster, but also gives wrong energies in some situations, sometimes by more than a Hartree.

I've only seen the problem with clusters and large basis sets, which suggests it's a numerical issue. I've tested PBE, M06-2X, and wB97X-D3BJ, and so far it only appears in wB97X-D3BJ. Also, the error goes away if less RAM is provided (say, 10 GB instead of 32 GB). This suggests it may be related to the new ability of Psi4 1.4.0 to do in-core omega integrals (#1749).

Working env: conda create --name psi4_v1.4.0 python=3.8 psi4 psi4-rt -c psi4 -y

Broken env: conda create --name psi4_v1.4.0_mkl python=3.8 psi4 psi4-rt -c psi4 -c anaconda -y

Example script: https://drive.google.com/file/d/1c0wZO47h9ooRXQMzTW9eETLWozo4MT_O/view?usp=sharing

To reproduce: install psi4 via conda with -c anaconda as shown, activate the env, then run the provided script: python psi4_1.4.0_omega_issue.py. The energy should be approximately -1965.2319, but will instead give something like -1963.3023.
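For context, a minimal sketch of the kind of driver call involved; the geometry, basis, and memory values here are placeholders and assumptions, not the contents of the linked script:

```python
import psi4

# Placeholder geometry: the real test case is a much larger cluster (assumption).
mol = psi4.geometry("""
0 1
O   0.000000   0.000000   0.117790
H   0.000000   0.755453  -0.471161
H   0.000000  -0.755453  -0.471161
""")

psi4.set_num_threads(8)
psi4.set_memory("32 GB")              # ample memory -> the in-core (mem_DF) path gets chosen
psi4.set_options({"scf_type": "df"})

# wB97X-D3BJ is the functional where the discrepancy shows up; the basis is an assumption.
energy = psi4.energy("wB97X-D3BJ/def2-TZVPD", molecule=mol)
print(energy)
```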

@susilehtola
Member

susilehtola commented Aug 17, 2021

> The energy should be approximately -1965.2319, but will instead give something like -1963.3023.

How do you know the correct energy? Have you run the calculation with a completely independent implementation?

I think there might also be a difference in libxc between Psi4 1.3.2 and 1.4.0: range separated hybrids were basically fully rewritten in libxc 5.1.0, and should now be correct.

Addendum: of course, if the calculations match in small basis sets, it can't be a libxc issue.

@jminuse
Author

jminuse commented Aug 18, 2021

With wB97X-D3BJ/6-31G* on the same system, the non-MKL install gives -1964.4305 Hartree, and the MKL version blows up, not converging within 100 SCF iterations (the non-converged energies are around 66325650 Hartree). So it looks like you don't need a big basis to observe this instability, which is good for testing. But this is more evidence that the MKL install has a problem.

@jminuse
Author

jminuse commented Aug 18, 2021

If the argument to psi4.set_memory() is reduced to 2 GB (forcing the disk algorithm), the MKL install gives a wB97X-D3BJ/6-31G* energy of -1964.4297 Hartree, which is reasonable. So I continue to believe that the in-core algorithm is implicated, or is at least magnifying an existing problem.
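To take the memory heuristic out of the picture, the two code paths can also be selected explicitly via scf_type. A sketch, again with a placeholder geometry standing in for the real cluster:

```python
import psi4

# Placeholder geometry (assumption); the real test case is the cluster from the script.
mol = psi4.geometry("""
0 1
O   0.000000   0.000000   0.117790
H   0.000000   0.755453  -0.471161
H   0.000000  -0.755453  -0.471161
""")

psi4.set_num_threads(8)
psi4.set_memory("32 GB")

for scf_type in ("disk_df", "mem_df"):
    psi4.core.clean()                                 # clear scratch between runs
    psi4.set_options({"scf_type": scf_type})
    e = psi4.energy("wB97X-D3BJ/6-31G*", molecule=mol)
    print(scf_type, e)                                # the two values should agree closely
```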

@susilehtola
Member

susilehtola commented Aug 18, 2021 via email

@loriab
Member

loriab commented Aug 18, 2021

mem_df is definitely a culprit. Thanks for the nice (and alarming) bug report.

import numpy as np
import os
import time
import psi4

if True:
    print('PSI_SCRATCH =', os.environ['PSI_SCRATCH'])
    print('psi4.__version__ =', psi4.__version__, flush=True)
    psi4.set_num_threads(8)

    name = 'VC3_PF6-disk-hcccl-'
    psi4.core.set_output_file(name + ".psi4", False)
    atomic_numbers = np.array([ 6,  8,  6,
        8,  6,  8,  1,  1, 
      #15,  9,  9,  9,  9,  9,  9,  
        6,  8,
        6,  8,  6,  8,  1,  1,  6,  8,  6,  8,  6,  8,  1,  1])
    xyz = np.array([
       [-2.12259645, -4.46179193, -4.79094066],
       [-1.36057145, -5.64933493, -5.09984066],
       [-0.06176545, -5.40039593, -4.86425666],
       [ 0.05314155, -4.09660393, -4.45182566],
       [-1.21369645, -3.60166993, -4.35368166],
       [ 0.86471855, -6.13126393, -4.96847966],
       [-1.29658945, -2.54273193, -4.07919166],
       [-3.17198045, -4.54625893, -4.99673066],
    #   [-2.26513845,  1.50526007,  2.66812335],
    #   [-1.97795045,  2.89970607,  3.70833935],
    #   [-2.73448445,  0.36314907,  1.64665735],
    #   [-3.68915045,  1.37638107,  3.54352335],
    #   [-3.10364045,  2.56795607,  1.63796935],
    #   [-1.58272645,  0.50145207,  3.74767835],
    #   [-0.85552445,  1.79343707,  1.96334535],
       [ 1.55245855, -0.55730393,  0.79347735],
       [ 1.57807055, -0.59638793, -0.58135765],
       [ 0.33353455, -1.00837593, -0.99842865],
       [-0.41241445, -1.33766393,  0.12858235],
       [ 0.35232355, -0.92987693,  1.24440635],
       [-0.00657745, -1.20355993, -2.13075665],
       [-0.14947445, -1.00267793,  2.20235835],
       [ 2.42478555, -0.18029193,  1.17515335],
       [ 2.85402655,  2.28807407,  1.17704435],
       [ 3.91722855,  3.22559407,  1.59750135],
       [ 4.48916555,  3.69442507,  0.51191135],
       [ 3.90246455,  3.14933007, -0.59139665],
       [ 2.91594555,  2.34754307, -0.13364465],
       [ 5.41812755,  4.46140507,  0.48393635],
       [ 2.40009555,  1.78156907, -0.90001365],
       [ 2.44725055,  1.74621807,  2.02374335]])
#    net_charge = -1
    net_charge = 0

    bigmol = psi4.core.Molecule.from_arrays(geom=xyz, molecular_charge=net_charge, elez=atomic_numbers)

    theory = "wB97X/6-31G*"
    psi4.set_options({
        'scf_type': 'disk_DF',   # reference path; matches the value checked below
        #'scf_type': 'mem_DF',   # switch to this to reproduce the bad energies
        "guess": "sad",
        "maxiter": 20,
    })
    psi4.set_memory("10.0 gib")
    E = psi4.energy(theory, molecule=bigmol)
    print(E)
    psi4.compare_values(-1023.23875023455, E, 6, "matches disk_df")

Output with scf_type mem_DF (8 threads):

                           Total Energy        Delta E     RMS |[F,P]|

   @DF-RKS iter SAD:  -946.81473095199294   -9.46815e+02   0.00000e+00
   @DF-RKS iter   1:  -853.16313969519842    9.36516e+01   2.49164e-02 DIIS
   @DF-RKS iter   2:   205.11141330156607    1.05827e+03   9.66100e-02 DIIS
   @DF-RKS iter   3:    51.44551681524645   -1.53666e+02   8.25708e-02 DIIS
   @DF-RKS iter   4:    32.72889827880818   -1.87166e+01   8.07939e-02 DIIS
   @DF-RKS iter   5:    30.01651359226875   -2.71238e+00   8.03146e-02 DIIS
   @DF-RKS iter   6:    20.71735979257003   -9.29915e+00   7.94275e-02 DIIS
   @DF-RKS iter   7:    20.19073835153742   -5.26621e-01   7.92802e-02 DIIS
   @DF-RKS iter   8:    19.59649838936669   -5.94240e-01   7.91696e-02 DIIS
   @DF-RKS iter   9:    18.59642511624830   -1.00007e+00   7.89379e-02 DIIS
   @DF-RKS iter  10:    18.52526838311312   -7.11567e-02   7.89828e-02 DIIS
   @DF-RKS iter  11:    17.83802341987538   -6.87245e-01   7.89175e-02 DIIS
   @DF-RKS iter  12:    36.67699286161224    1.88390e+01   8.41411e-02 DIIS
   @DF-RKS iter  13:    40.04638640903951    3.36939e+00   8.47035e-02 DIIS
   @DF-RKS iter  14:    40.73405734710366    6.87671e-01   8.49685e-02 DIIS
   @DF-RKS iter  15:    40.85000106967441    1.15944e-01   8.49035e-02 DIIS
   @DF-RKS iter  16:    40.80704656058642   -4.29545e-02   8.49241e-02 DIIS
   @DF-RKS iter  17:    40.82344910011258    1.64025e-02   8.49174e-02 DIIS
   @DF-RKS iter  18:    40.81726341428607   -6.18569e-03   8.49197e-02 DIIS
   @DF-RKS iter  19:    40.81944989569433    2.18648e-03   8.49189e-02 DIIS
   @DF-RKS iter  20:    40.81866829927370   -7.81596e-04   8.49192e-02 DIIS

@loriab added the bug, correctness-error, and scf labels on Aug 18, 2021
@loriab
Member

loriab commented Aug 18, 2021

Whoa, and threading matters. The run above does not converge with 8 threads, but it converges and differs from disk_df by only 7e-5 with 1 thread.
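For anyone bisecting locally, the threading check is just a loop over thread counts around the script above; a sketch, assuming that script has been run (it defines theory and bigmol) and that scf_type has been switched to mem_DF:

```python
# Threading check: same input as the mem_DF reproduction above, only the
# thread count changes between runs.
for nthreads in (1, 8):
    psi4.core.clean()
    psi4.set_num_threads(nthreads)
    E = psi4.energy(theory, molecule=bigmol)
    print(nthreads, E)   # 1 thread matches disk_DF to ~7e-5; 8 threads diverges
```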

@susilehtola
Member

> Whoa, and threading matters.

Which could be caused by a memory bug, since you need more memory for more threads...

@andysim
Member

andysim commented Aug 18, 2021

Thanks so much for the detailed error report. I can reproduce this error and am getting closer to finding the culprit. In the meantime, if you add the option wcombine=False, the error should disappear. If my initial tests are correct, the new in-memory algorithm that combines the omega exchange and conventional exchange terms into one calculation is incorrect when more than one thread is used.
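For anyone applying the workaround, it is a single option. A sketch of how the wcombine option named above would be set via the Python API (in a PSIthon input it is the set wcombine false shown later in this thread):

```python
import psi4

# Workaround suggested above: disable the combined omega + conventional
# exchange in-core algorithm; the rest of the input is unchanged.
psi4.set_options({"wcombine": False})
```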

@tallakahath
Contributor

Hi everyone,

I'm on 1.4 and have cherry-picked in the changes from 9163cbd. I'm building from source, which may be relevant to the MKL issues listed in #2283. I'm building with mkl/2019.0.117 (and am stuck there for a bit because of my need for MKL_DEBUG_CPU_TYPE to continue to work). In addition, I have gcc/9.2.0 and icc/2020.2-108 in my build env, with the C, CXX, and Fortran compilers set to the Intel compilers in my cmake config options.

I'm running calculations with wB97M-V and was noticing the same issues @jminuse was. After cherry-picking and recompiling, the issue persists. Adding set wcombine false, as suggested by @andysim, makes the issue go away.

Without set wcombine false:

    ==> Integral Setup <==
  
    DFHelper Memory: AOs need 4.329 GiB; user supplied 4.329 GiB. Using in-core AOs.
...
    ==> Iterations <==
  
                             Total Energy        Delta E     RMS |[F,P]|
  
     @DF-RKS iter SAD:  -306.90903258181044   -3.06909e+02   0.00000e+00 
     @DF-RKS iter   1:  -308.38044687527724   -1.47141e+00   1.27547e-03 DIIS
     @DF-RKS iter   2:  -308.40241691478502   -2.19700e-02   1.31335e-03 DIIS
     @DF-RKS iter   3:  -308.53663731037796   -1.34220e-01   4.92966e-04 DIIS
     @DF-RKS iter   4:  -308.55582354785412   -1.91862e-02   1.78690e-04 DIIS
     @DF-RKS iter   5:  -308.55841981379780   -2.59627e-03   4.24843e-05 DIIS
     @DF-RKS iter   6:  -308.55870799442886   -2.88181e-04   2.96248e-05 DIIS
     @DF-RKS iter   7:  -308.55881660865606   -1.08614e-04   1.50957e-05 DIIS
     @DF-RKS iter   8:  -308.55887443471090   -5.78261e-05   8.58740e-06 DIIS
     @DF-RKS iter   9:  -308.55891429039167   -3.98557e-05   5.23007e-06 DIIS                                                                                                                 
     @DF-RKS iter  10:  -308.55893438187678   -2.00915e-05   3.97102e-06 DIIS                                                                                                                 
     @DF-RKS iter  11:  -308.55897076821287   -3.63863e-05   3.09810e-06 DIIS                                                                                                                 
     @DF-RKS iter  12:  -308.55899251437626   -2.17462e-05   1.29222e-06 DIIS                                                                                                                 
     @DF-RKS iter  13:  -308.55899611302931   -3.59865e-06   5.63064e-07 DIIS                                                                                                                 
     @DF-RKS iter  14:  -308.55899645885273   -3.45823e-07   2.29741e-07 DIIS                                                                                                                 
    Energy and wave function converged.

And with set wcombine false:

    ==> Integral Setup <==
    
    DFHelper Memory: AOs need 6.464 GiB; user supplied 6.464 GiB. Using in-core AOs.
...
   ==> Iterations <==
 
                            Total Energy        Delta E     RMS |[F,P]|
 
    @DF-RKS iter SAD:  -306.90916690803959   -3.06909e+02   0.00000e+00 
    @DF-RKS iter   1:  -306.98727406041530   -7.81072e-02   1.83756e-03 DIIS
    @DF-RKS iter   2:  -307.13667252428678   -1.49398e-01   1.47236e-03 DIIS
    @DF-RKS iter   3:  -307.32880507487312   -1.92133e-01   1.25216e-04 DIIS
    @DF-RKS iter   4:  -307.32990958370664   -1.10451e-03   7.65745e-05 DIIS
    @DF-RKS iter   5:  -307.33036354773054   -4.53964e-04   1.70705e-05 DIIS
    @DF-RKS iter   6:  -307.33039129462924   -2.77469e-05   5.09682e-06 DIIS
    @DF-RKS iter   7:  -307.33039390006070   -2.60543e-06   1.20852e-06 DIIS
    @DF-RKS iter   8:  -307.33039428809019   -3.88029e-07   4.42881e-07 DIIS
   Energy and wave function converged.

Both jobs were run with 8 threads and 29337 MB of memory, on the same machine (an Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz). As may be relevant, other settings are:

 set {
   guess sad
   fail_on_maxiter false
   dft_nuclear_scheme SBECKE
   dft_radial_scheme EM
   dft_radial_points 99
   dft_spherical_points 590
   dft_vv10_radial_points 50
   dft_vv10_spherical_points 194
   dft_pruning_scheme robust
 }

Should I have expected cherry-picking 9163cbd into 1.4 to have resolved this discrepancy? Or should I be making set wcombine false a standard part of my scripts for now?

Thanks!

@loriab
Member

loriab commented Oct 7, 2021

@andysim's investigation yielded further oddities that Intel is working on. But yours is the first case that the cherry-pick hasn't fixed, @tallakahath . Would you post the input, please? Yes, it might be advisable to set wcombine false for now to be confident.

@tallakahath
Contributor

Ha, I was trying to come up with a minimal reproducible example and found the story somewhat more complicated. We're using a custom basis set, and its nearest sibling (aug-cc-pvdz) doesn't reproduce the bug. Further, if I remove the mol.set_geometry call (which just reloads the same geometry, modulo some flutter from unit conversion and digit truncation; this was part of a bigger calc that keeps updating the geometry, but only this section was required for reproduction), the bug goes away...

Here is the file, attached so it's not giant in the field here, because it contains the custom basis spec:

run.txt

And here are the results I got running with and without wcombine:
run_wcombine_out.txt
run_nowcombine_out.txt

I'm stumped here, especially by how hard it turned out to be to get the bug to happen. But it happens reliably if I run both of those files with --memory 27.3GB -n8. So... yeah.

I've been running a few thousand jobs through now with set wcombine false and haven't seen any suspicious energies (I'm comparing to PNO-LCCSD(T*)-F12b in a comparable basis set; the bug was producing several kcal of difference vs. the usual sub-kcal agreement I saw in cases that didn't run into it), so it looks like a valid workaround for me. But yeah, I'm a little... baffled.

@tallakahath
Contributor

An update: after a colleague rebooted the node I'd been using for testing with the noxsave kernel parameter (disabling AVX capability visibility in the CPU flags), I retried the same test files with the same settings and env on the same machine.

...and now I get the "correct" answer -- the one I'd get with set wcombine false.

So I think that, despite 9163cbd, there is still a sketchy use of DGETRI or DGETRF somewhere getting pulled in by an edge case I'm hitting (because, again, if I tweak the number of processes, the memory, the basis set, or the geometry ever so slightly, it goes away!).

I think I should flag @andysim here?

I'll continue with wcombine false as my generalized fix, and I'm happy to keep testing this weird job on the same machine as patches come down the line.

@hokru hokru mentioned this issue Nov 22, 2021
