
Bad wB97X-D3BJ energies with MKL & in-core DFT #2279

Open
jminuse opened this issue Aug 17, 2021 · 13 comments · Fixed by #2283
Labels: bug · correctness-error (for issues where Psi4 gives answers that are wrong) · scf (involves general SCF: convergence algorithms, RHF/UHF/ROHF/CUHF...)

Comments

@jminuse

jminuse commented Aug 17, 2021

I am seeing large differences in wB97X-D3BJ energy between 1.4.0 and 1.3.2, and between different 1.4.0 installations. It seems that installing 1.4.0 with -c anaconda can cause the differences between 1.4.0 installations, possibly because it replaces the default linear algebra libraries with MKL versions. Such an installation runs 50% faster, but also gives wrong energies in some situations, sometimes by more than a Hartree.

I've only seen the problem with clusters and large basis sets, which suggests it's a numerical issue. I've tested PBE, M06-2X, and wB97X-D3BJ, and so far it only appears in wB97X-D3BJ. Also, the error goes away if less RAM is provided (say, 10 GB instead of 32 GB). This suggests it may be related to the new ability of Psi4 1.4.0 to do in-core omega integrals (#1749).

Working env: conda create --name psi4_v1.4.0 python=3.8 psi4 psi4-rt -c psi4 -y

Broken env: conda create --name psi4_v1.4.0_mkl python=3.8 psi4 psi4-rt -c psi4 -c anaconda -y

Example script: https://drive.google.com/file/d/1c0wZO47h9ooRXQMzTW9eETLWozo4MT_O/view?usp=sharing

To reproduce: install psi4 via conda with -c anaconda as shown, activate the env, then run the provided script: python psi4_1.4.0_omega_issue.py. The energy should be approximately -1965.2319, but will instead give something like -1963.3023.
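For context, a minimal sketch of the kind of driver call involved; the geometry, basis, and memory values here are placeholders and assumptions, not the contents of the linked script:

```python
import psi4

# Placeholder geometry: the real test case is a much larger cluster (assumption).
mol = psi4.geometry("""
0 1
O   0.000000   0.000000   0.117790
H   0.000000   0.755453  -0.471161
H   0.000000  -0.755453  -0.471161
""")

psi4.set_num_threads(8)
psi4.set_memory("32 GB")              # ample memory -> the in-core (mem_DF) path gets chosen
psi4.set_options({"scf_type": "df"})

# wB97X-D3BJ is the functional where the discrepancy shows up; the basis is an assumption.
energy = psi4.energy("wB97X-D3BJ/def2-TZVPD", molecule=mol)
print(energy)
```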

@susilehtola
Member

susilehtola commented Aug 17, 2021

> The energy should be approximately -1965.2319, but will instead give something like -1963.3023.

How do you know the correct energy? Have you run the calculation with a completely independent implementation?

I think there might also be a difference in libxc between Psi4 1.3.2 and 1.4.0: range separated hybrids were basically fully rewritten in libxc 5.1.0, and should now be correct.

Addendum: of course, if the calculations match in small basis sets, it can't be a libxc issue.

@jminuse
Author

jminuse commented Aug 18, 2021

With wB97X-D3BJ/6-31G* on the same system, the non-MKL install gives -1964.4305 Hartree, and the MKL version blows up, not converging within 100 SCF iterations (the non-converged energies are around 66325650 Hartree). So it looks like you don't need a big basis to observe this instability, which is good for testing. But this is more evidence that the MKL install has a problem.

@jminuse
Author

jminuse commented Aug 18, 2021

If the argument to psi4.set_memory() is reduced to 2 GB (forcing the disk algorithm), the MKL install gives a wB97X-D3BJ/6-31G* energy of -1964.4297 Hartree, which is reasonable. So I continue to believe that the in-core algorithm is implicated, or is at least magnifying an existing problem.
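To take the memory heuristic out of the picture, the two code paths can also be selected explicitly via scf_type. A sketch, again with a placeholder geometry standing in for the real cluster:

```python
import psi4

# Placeholder geometry (assumption); the real test case is the cluster from the script.
mol = psi4.geometry("""
0 1
O   0.000000   0.000000   0.117790
H   0.000000   0.755453  -0.471161
H   0.000000  -0.755453  -0.471161
""")

psi4.set_num_threads(8)
psi4.set_memory("32 GB")

for scf_type in ("disk_df", "mem_df"):
    psi4.core.clean()                                 # clear scratch between runs
    psi4.set_options({"scf_type": scf_type})
    e = psi4.energy("wB97X-D3BJ/6-31G*", molecule=mol)
    print(scf_type, e)                                # the two values should agree closely
```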

@susilehtola
Member

susilehtola commented Aug 18, 2021 via email

@loriab
Member

loriab commented Aug 18, 2021

mem_df is definitely a culprit. Thanks for the nice (and alarming) bug report.

import numpy as np
import os
import time
import psi4

if True:
    print('PSI_SCRATCH =', os.environ['PSI_SCRATCH'])
    print('psi4.__version__ =', psi4.__version__, flush=True)
    psi4.set_num_threads(8)

    name = 'VC3_PF6-disk-hcccl-'
    psi4.core.set_output_file(name + ".psi4", False)
    atomic_numbers = np.array([ 6,  8,  6,
        8,  6,  8,  1,  1, 
      #15,  9,  9,  9,  9,  9,  9,  
        6,  8,
        6,  8,  6,  8,  1,  1,  6,  8,  6,  8,  6,  8,  1,  1])
    xyz = np.array([
       [-2.12259645, -4.46179193, -4.79094066],
       [-1.36057145, -5.64933493, -5.09984066],
       [-0.06176545, -5.40039593, -4.86425666],
       [ 0.05314155, -4.09660393, -4.45182566],
       [-1.21369645, -3.60166993, -4.35368166],
       [ 0.86471855, -6.13126393, -4.96847966],
       [-1.29658945, -2.54273193, -4.07919166],
       [-3.17198045, -4.54625893, -4.99673066],
    #   [-2.26513845,  1.50526007,  2.66812335],
    #   [-1.97795045,  2.89970607,  3.70833935],
    #   [-2.73448445,  0.36314907,  1.64665735],
    #   [-3.68915045,  1.37638107,  3.54352335],
    #   [-3.10364045,  2.56795607,  1.63796935],
    #   [-1.58272645,  0.50145207,  3.74767835],
    #   [-0.85552445,  1.79343707,  1.96334535],
       [ 1.55245855, -0.55730393,  0.79347735],
       [ 1.57807055, -0.59638793, -0.58135765],
       [ 0.33353455, -1.00837593, -0.99842865],
       [-0.41241445, -1.33766393,  0.12858235],
       [ 0.35232355, -0.92987693,  1.24440635],
       [-0.00657745, -1.20355993, -2.13075665],
       [-0.14947445, -1.00267793,  2.20235835],
       [ 2.42478555, -0.18029193,  1.17515335],
       [ 2.85402655,  2.28807407,  1.17704435],
       [ 3.91722855,  3.22559407,  1.59750135],
       [ 4.48916555,  3.69442507,  0.51191135],
       [ 3.90246455,  3.14933007, -0.59139665],
       [ 2.91594555,  2.34754307, -0.13364465],
       [ 5.41812755,  4.46140507,  0.48393635],
       [ 2.40009555,  1.78156907, -0.90001365],
       [ 2.44725055,  1.74621807,  2.02374335]])
#    net_charge = -1
    net_charge = 0

    bigmol = psi4.core.Molecule.from_arrays(geom=xyz, molecular_charge=net_charge, elez=atomic_numbers)

    theory = "wB97X/6-31G*"
    psi4.set_options({
        'scf_type': 'disk_DF',   # reference path; matches the value checked below
        #'scf_type': 'mem_DF',   # switch to this to reproduce the bad energies
        "guess": "sad",
        "maxiter": 20,
    })
    psi4.set_memory("10.0 gib")
    E = psi4.energy(theory, molecule=bigmol)
    print(E)
    psi4.compare_values(-1023.23875023455, E, 6, "matches disk_df")

Output with scf_type mem_DF (8 threads):

                           Total Energy        Delta E     RMS |[F,P]|

   @DF-RKS iter SAD:  -946.81473095199294   -9.46815e+02   0.00000e+00
   @DF-RKS iter   1:  -853.16313969519842    9.36516e+01   2.49164e-02 DIIS
   @DF-RKS iter   2:   205.11141330156607    1.05827e+03   9.66100e-02 DIIS
   @DF-RKS iter   3:    51.44551681524645   -1.53666e+02   8.25708e-02 DIIS
   @DF-RKS iter   4:    32.72889827880818   -1.87166e+01   8.07939e-02 DIIS
   @DF-RKS iter   5:    30.01651359226875   -2.71238e+00   8.03146e-02 DIIS
   @DF-RKS iter   6:    20.71735979257003   -9.29915e+00   7.94275e-02 DIIS
   @DF-RKS iter   7:    20.19073835153742   -5.26621e-01   7.92802e-02 DIIS
   @DF-RKS iter   8:    19.59649838936669   -5.94240e-01   7.91696e-02 DIIS
   @DF-RKS iter   9:    18.59642511624830   -1.00007e+00   7.89379e-02 DIIS
   @DF-RKS iter  10:    18.52526838311312   -7.11567e-02   7.89828e-02 DIIS
   @DF-RKS iter  11:    17.83802341987538   -6.87245e-01   7.89175e-02 DIIS
   @DF-RKS iter  12:    36.67699286161224    1.88390e+01   8.41411e-02 DIIS
   @DF-RKS iter  13:    40.04638640903951    3.36939e+00   8.47035e-02 DIIS
   @DF-RKS iter  14:    40.73405734710366    6.87671e-01   8.49685e-02 DIIS
   @DF-RKS iter  15:    40.85000106967441    1.15944e-01   8.49035e-02 DIIS
   @DF-RKS iter  16:    40.80704656058642   -4.29545e-02   8.49241e-02 DIIS
   @DF-RKS iter  17:    40.82344910011258    1.64025e-02   8.49174e-02 DIIS
   @DF-RKS iter  18:    40.81726341428607   -6.18569e-03   8.49197e-02 DIIS
   @DF-RKS iter  19:    40.81944989569433    2.18648e-03   8.49189e-02 DIIS
   @DF-RKS iter  20:    40.81866829927370   -7.81596e-04   8.49192e-02 DIIS

@loriab added the bug, correctness-error, and scf labels on Aug 18, 2021
@loriab
Member

loriab commented Aug 18, 2021

Whoa, and threading matters. The run above does not converge with 8 threads, but it converges and differs from disk_df by only 7e-5 with 1 thread.
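For anyone bisecting locally, the threading check is just a loop over thread counts around the script above; a sketch, assuming that script has been run (it defines theory and bigmol) and that scf_type has been switched to mem_DF:

```python
# Threading check: same input as the mem_DF reproduction above, only the
# thread count changes between runs.
for nthreads in (1, 8):
    psi4.core.clean()
    psi4.set_num_threads(nthreads)
    E = psi4.energy(theory, molecule=bigmol)
    print(nthreads, E)   # 1 thread matches disk_DF to ~7e-5; 8 threads diverges
```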

@susilehtola
Member

> Whoa, and threading matters.

Which could be caused by a memory bug, since you need more memory for more threads...

@andysim
Member

andysim commented Aug 18, 2021

Thanks so much for the detailed error report. I can reproduce this error and am getting closer to finding the culprit. In the meantime, if you add the option wcombine=False, the error should disappear. If my initial tests are correct, the new in-memory algorithm that combines the omega exchange and conventional exchange terms into one calculation is incorrect when more than one thread is used.
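For anyone applying the workaround, it is a single option. A sketch of how the wcombine option named above would be set via the Python API (in a PSIthon input it is the set wcombine false shown later in this thread):

```python
import psi4

# Workaround suggested above: disable the combined omega + conventional
# exchange in-core algorithm; the rest of the input is unchanged.
psi4.set_options({"wcombine": False})
```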

@tallakahath
Contributor

Hi everyone,

I'm on 1.4 and have cherry-picked in the changes from 9163cbd. I'm building from source, which may be relevant to the MKL issues listed in #2283. I'm building with mkl/2019.0.117 (and am stuck there for a bit because of my need for MKL_DEBUG_CPU_TYPE to continue to work). In addition, I have gcc/9.2.0 and icc/2020.2-108 in my build env, with the C, CXX, and Fortran compilers set to the Intel compilers in my cmake config options.

I'm running calculations with wB97M-V and was noticing the same issues @jminuse was. After cherry-picking and recompiling, the issue persists. Adding set wcombine false, as suggested by @andysim, makes the issue go away.

Without set wcombine false:

    ==> Integral Setup <==
  
    DFHelper Memory: AOs need 4.329 GiB; user supplied 4.329 GiB. Using in-core AOs.
...
    ==> Iterations <==
  
                             Total Energy        Delta E     RMS |[F,P]|
  
     @DF-RKS iter SAD:  -306.90903258181044   -3.06909e+02   0.00000e+00 
     @DF-RKS iter   1:  -308.38044687527724   -1.47141e+00   1.27547e-03 DIIS
     @DF-RKS iter   2:  -308.40241691478502   -2.19700e-02   1.31335e-03 DIIS
     @DF-RKS iter   3:  -308.53663731037796   -1.34220e-01   4.92966e-04 DIIS
     @DF-RKS iter   4:  -308.55582354785412   -1.91862e-02   1.78690e-04 DIIS
     @DF-RKS iter   5:  -308.55841981379780   -2.59627e-03   4.24843e-05 DIIS
     @DF-RKS iter   6:  -308.55870799442886   -2.88181e-04   2.96248e-05 DIIS
     @DF-RKS iter   7:  -308.55881660865606   -1.08614e-04   1.50957e-05 DIIS
     @DF-RKS iter   8:  -308.55887443471090   -5.78261e-05   8.58740e-06 DIIS
     @DF-RKS iter   9:  -308.55891429039167   -3.98557e-05   5.23007e-06 DIIS                                                                                                                 
     @DF-RKS iter  10:  -308.55893438187678   -2.00915e-05   3.97102e-06 DIIS                                                                                                                 
     @DF-RKS iter  11:  -308.55897076821287   -3.63863e-05   3.09810e-06 DIIS                                                                                                                 
     @DF-RKS iter  12:  -308.55899251437626   -2.17462e-05   1.29222e-06 DIIS                                                                                                                 
     @DF-RKS iter  13:  -308.55899611302931   -3.59865e-06   5.63064e-07 DIIS                                                                                                                 
     @DF-RKS iter  14:  -308.55899645885273   -3.45823e-07   2.29741e-07 DIIS                                                                                                                 
    Energy and wave function converged.

And with set wcombine false:

    ==> Integral Setup <==
    
    DFHelper Memory: AOs need 6.464 GiB; user supplied 6.464 GiB. Using in-core AOs.
...
   ==> Iterations <==
 
                            Total Energy        Delta E     RMS |[F,P]|
 
    @DF-RKS iter SAD:  -306.90916690803959   -3.06909e+02   0.00000e+00 
    @DF-RKS iter   1:  -306.98727406041530   -7.81072e-02   1.83756e-03 DIIS
    @DF-RKS iter   2:  -307.13667252428678   -1.49398e-01   1.47236e-03 DIIS
    @DF-RKS iter   3:  -307.32880507487312   -1.92133e-01   1.25216e-04 DIIS
    @DF-RKS iter   4:  -307.32990958370664   -1.10451e-03   7.65745e-05 DIIS
    @DF-RKS iter   5:  -307.33036354773054   -4.53964e-04   1.70705e-05 DIIS
    @DF-RKS iter   6:  -307.33039129462924   -2.77469e-05   5.09682e-06 DIIS
    @DF-RKS iter   7:  -307.33039390006070   -2.60543e-06   1.20852e-06 DIIS
    @DF-RKS iter   8:  -307.33039428809019   -3.88029e-07   4.42881e-07 DIIS
   Energy and wave function converged.

Both jobs were run with 8 threads and 29337 MB of memory, on the same machine (an Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz). As may be relevant, other settings are:

 set {
   guess sad
   fail_on_maxiter false
   dft_nuclear_scheme SBECKE
   dft_radial_scheme EM
   dft_radial_points 99
   dft_spherical_points 590
   dft_vv10_radial_points 50
   dft_vv10_spherical_points 194
   dft_pruning_scheme robust
 }

Should I have expected cherry-picking 9163cbd into 1.4 to have resolved this discrepancy? Or should I be making set wcombine false a standard part of my scripts for now?

Thanks!

@loriab
Member

loriab commented Oct 7, 2021

@andysim's investigation yielded further oddities that Intel is working on. But yours is the first case that the cherry-pick hasn't fixed, @tallakahath . Would you post the input, please? Yes, it might be advisable to set wcombine false for now to be confident.

@tallakahath
Contributor

Ha, I was trying to come up with a minimal reproducible example and found the story somewhat more complicated. We're using a custom basis set, and its nearest sibling (aug-cc-pvdz) doesn't reproduce the bug. Further, if I remove the mol.set_geometry call (which just reloads the same geometry, modulo some flutter from unit conversion and digit truncation; this was part of a bigger calc that keeps updating the geometry, but only this section was required for reproduction), the bug goes away...

Here is the file, attached so it's not giant in the field here, because it contains the custom basis spec:

run.txt

And here are the results I got running with and without wcombine:
run_wcombine_out.txt
run_nowcombine_out.txt

I'm stumped here, especially by how hard it turned out to be to get the bug to happen. But it happens reliably if I run both of those files with --memory 27.3GB -n8. So... yeah.

I've been running a few thousand jobs through now with set wcombine false and haven't seen any suspicious energies (I'm comparing to PNO-LCCSD(T*)-F12b in a comparable basis set; the bug was producing several kcal of difference vs. the usual sub-kcal agreement I saw in cases that didn't run into it), so it looks like a valid workaround for me. But yeah, I'm a little... baffled.

@tallakahath
Contributor

An update: after a colleague rebooted the node I'd been using for testing with the noxsave kernel parameter (disabling AVX capability visibility in the CPU flags), I retried the same test files with the same settings and env on the same machine.

...and now I get the "correct" answer -- the one I'd get with set wcombine false.

So I think that, despite 9163cbd, there is still a sketchy use of DGETRI or DGETRF somewhere getting pulled in by an edge case I'm hitting (because, again, if I tweak the number of processes, the memory, the basis set, or the geometry ever so slightly, it goes away!).

I think I should flag @andysim here?

I'll continue with wcombine false as my generalized fix, and I'm happy to keep testing this weird job on the same machine as patches come down the line.

@hokru hokru mentioned this issue Nov 22, 2021
