File Converter Functionality #139

Closed
PaulWAyers opened this issue Apr 22, 2020 · 14 comments

Comments

@PaulWAyers
Member

Recently some of us have been talking about supporting file conversion, i.e., the ability to write different file types, so that one could load a file into iodata and then print it back out in a different format. Obviously not all functionality is supported by all file types.

I think we already support writing Molden files. It would be nice to support the JSON schema (already an issue).
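Assuming "the JSON schema" here means MolSSI's QCSchema, a molecule-level dump could look like the minimal sketch below. The function name `dump_qcschema_molecule` is hypothetical (not iodata's API), and the field names follow our understanding of the QCSchema molecule layout; they should be checked against the schema itself before relying on them.

```python
import json


def dump_qcschema_molecule(symbols, coords_bohr, charge=0, multiplicity=1):
    """Serialize one molecule as QCSchema-style JSON (hypothetical sketch)."""
    doc = {
        "schema_name": "qcschema_molecule",
        "schema_version": 2,
        "symbols": list(symbols),
        # Assumed QCSchema convention: a flat [x1, y1, z1, x2, ...] list in bohr.
        "geometry": [float(c) for xyz in coords_bohr for c in xyz],
        "molecular_charge": charge,
        "molecular_multiplicity": multiplicity,
    }
    return json.dumps(doc, indent=2)
```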

Some key formats for output are:

  1. *.fchk file specification
    https://wild.life.nctu.edu.tw/~jsyu/compchem/g09/g09ur/f_formchk.htm

  2. *.wfx format specification
    https://aim.tkgristmill.com/wfxformat.html

  3. *.wfn file specification: the following text is a routine from AIMPAC that reads *.wfn files; its FORMAT statements are effectively the specification for that file type. You can also look at my old *.f90 code for some format specifications (see the following link).
    https://www.dropbox.com/sh/5thdl9pb3rx63wk/AADHqaGHwdz9qmTlMoPcCocha?dl=0
    I can help people wade through the (old-style Fortran) formatted read statements.

```fortran
      SUBROUTINE RDPSI
C
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
C
      CHARACTER*80 WFNTTL,JOBTTL,LINE
      CHARACTER*8 ATNAM
      PARAMETER (MCENT=50, MMO=300, MPRIMS0=800, NTYPE=20)
      COMMON /ATOMS/ XC(MCENT),YC(MCENT),ZC(MCENT),CHARG(MCENT),NCENT
      COMMON /ORBTL/ EORB(MMO),PO(MMO),ROOTPO(MMO),NMO
      COMMON /STRING/ WFNTTL,JOBTTL,ATNAM(MCENT),NAT
      COMMON /UNITS/ ISRF,INPT,IOUT,IWFN,IDBG
      COMMON /VALUES/ THRESH1,THRESH2,GAMMA,TOTE
      COMMON /PRIMS0/ COOMX(MMO),SUM(MPRIMS0),DIV(MPRIMS0),
     +COO(MPRIMS0,MMO),EXX(MPRIMS0),ICT(MPRIMS0),ITP(NTYPE),NPRIMS,
     +ITYPO(MPRIMS0)
      DIMENSION ITYPE(MPRIMS0),CO(MPRIMS0,MMO),ICENT(MPRIMS0),
     $EX(MPRIMS0)
      Save Zero,One,Two
      Data Zero/0.0d0/,One/1.0d0/,Two/2.0d0/
C
C THE ITYPE ARRAY REPRESENTS THE FOLLOWING GAUSSIAN ORBITAL TYPES:
C S
C PX, PY, PZ
C DXX, DYY, DZZ, DXY, DXZ, DYZ
C FXXX, FYYY, FZZZ, FXXY, FXXZ, FYYZ, FXYY, FXZZ, FYZZ, FXYZ
C
      READ(IWFN,101) WFNTTL
C
      READ(IWFN,102) MODE,NMO,NPRIMS,NCENT
C
      IF (NMO .GT. MMO) THEN
        WRITE(IOUT,3300) NMO
        STOP 'Too many molecular orbitals'
      ENDIF
      IF (NPRIMS .GT. MPRIMS0) THEN
        WRITE(IOUT,3301) NPRIMS
        STOP 'Too many total primitives'
      ENDIF
      IF (NCENT .GT. MCENT) THEN
        WRITE(IOUT,3302) NCENT
        STOP 'Too many atoms'
      ENDIF
C
      DO 100 I = 1,NCENT
        READ(IWFN,103) ATNAM(I),J,XC(J),YC(J),ZC(J),CHARG(J)
  100 CONTINUE
C
      READ(IWFN,104) (ICENT(I),I=1,NPRIMS)
      READ(IWFN,104) (ITYPE(I),I=1,NPRIMS)
      READ(IWFN,105) (EX(I),I=1,NPRIMS)
C
      DO 110 I = 1,NMO
        READ(IWFN,106) PO(I),EORB(I)
        READ(IWFN,107) (CO(J,I),J=1,NPRIMS)
  110 CONTINUE
C
      READ(IWFN,101) LINE
C
      READ(IWFN,109) TOTE,GAMMA
C
      K=0
      DO 145 J=1,NPRIMS
        ICNT=0
        DO 146 L=1,NMO
          IF(CO(J,L).EQ.ZERO)ICNT=ICNT+1
  146   CONTINUE
        IF(ICNT.NE.NMO)THEN
          K=K+1
          ITYPE(K)=ITYPE(J)
          ICENT(K)=ICENT(J)
          EX(K)=EX(J)
          DO 147 LL=1,NMO
            CO(K,LL)=CO(J,LL)
  147     CONTINUE
        ENDIF
  145 CONTINUE
      NPRIMS=K
C
      N=0
      DO 160 K=1,NTYPE
        DO 170 J=1,NPRIMS
          IF(ITYPE(J).EQ.K) THEN
            N=N+1
            EXX(N)=EX(J)
            ICT(N)=ICENT(J)
            SUM(N)=-Two*EX(J)
            DIV(N)=One/SUM(N)
            DO 180 L=1,NMO
              COO(N,L)=CO(J,L)
              ITYPO(N)=K
  180       CONTINUE
          ENDIF
  170   CONTINUE
        ITP(K)=N
  160 CONTINUE
C
      Do 148 L=1,NMO
        temp=zero
        temp1=dabs(po(l))
        rootpo(l)=dsqrt(temp1)
        Do 149 J=1,Nprims
          if(dabs(coo(j,l)).gt.temp)temp=dabs(coo(j,l))
  149   continue
        coomx(l)=temp
  148 continue
C
  101 FORMAT (1A80)
  102 FORMAT (4X,A4,12X,3(I3,17X))
  103 FORMAT (1A8,11X,1I3,2X,3F12.8,10X,F5.1)
  104 FORMAT (20X,20I3)
  105 FORMAT (10X,5E14.7)
  106 FORMAT (35X,F12.8,15X,F12.8)
  107 FORMAT (5E16.8)
  109 FORMAT (17X,F20.12,18X,F13.8)
 3300 FORMAT(1I3,' molecular orbitals Redimension MMO')
 3301 FORMAT(1I3,' total primitives Redimension MPRIMS0')
 3302 FORMAT(1I3,' atoms in molecule Redimension MCENT')
      RETURN
      END
```
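Porting RDPSI to Python mostly means translating its FORMAT statements into fixed column slices. A minimal sketch that expands the edit descriptors RDPSI uses (nX, Aw, Iw, Fw.d, Ew.d/Dw.d, repeat counts and parenthesized groups) and reads a single line accordingly; `expand_format` and `read_record` are hypothetical helpers. It assumes real fields carry an explicit decimal point and does not implement format reversion, so a list like `(EX(I),I=1,NPRIMS)` under FORMAT 105 would need one call per line.

```python
import re

_COUNT = re.compile(r"\d*")
_DESC = re.compile(r"([AIFEDX])(\d*)(?:\.\d+)?")


def _parse(fmt, pos):
    """Parse descriptors from fmt[pos:] until ')' or the end; return (items, pos)."""
    items = []
    while pos < len(fmt):
        if fmt[pos] == ",":
            pos += 1
        elif fmt[pos] == ")":
            return items, pos + 1
        else:
            count = _COUNT.match(fmt, pos).group()
            repeat = int(count) if count else 1
            pos += len(count)
            if fmt[pos] == "(":  # repeated group such as 3(I3,17X)
                group, pos = _parse(fmt, pos + 1)
                items.extend(group * repeat)
                continue
            m = _DESC.match(fmt, pos)
            if m is None:
                raise ValueError(f"unsupported descriptor at {fmt[pos:]!r}")
            kind, width = m.group(1), m.group(2)
            pos = m.end()
            # nX skips n columns; for A/I/F/E/D the count is a repeat factor.
            if kind == "X":
                items.append(("X", repeat))
            else:
                items.extend([(kind, int(width))] * repeat)
    return items, pos


def expand_format(fmt):
    """Expand e.g. '4X,A4,12X,3(I3,17X)' into a flat list of (kind, width) pairs."""
    items, _ = _parse(fmt.replace(" ", "").upper(), 0)
    return items


def read_record(line, fmt):
    """Read one line roughly the way a Fortran formatted READ would.

    Simplifications: single record only, blank or missing numeric fields
    become zero, and the implied-decimal rule of Fw.d/Ew.d is ignored.
    """
    values, col = [], 0
    for kind, width in expand_format(fmt):
        field = line[col:col + width]
        col += width
        if kind == "X":
            continue
        if kind == "A":
            values.append(field)
        elif kind == "I":
            values.append(int(field) if field.strip() else 0)
        else:  # F, E or D; Fortran output often uses D exponents (1.0D+00)
            text = field.strip().upper().replace("D", "E")
            values.append(float(text) if text else 0.0)
    return values
```

For example, with FORMAT 102 the `4X` skips the first four characters of the header, so a header starting with "GAUSSIAN" yields MODE = "SIAN"; the three counts come from columns 21-23, 41-43 and 61-63.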

@tovrstra
Member

A conversion script as such is present, see https://github.com/theochem/iodata/blob/master/iodata/__main__.py. (After install, it is available as a command-line tool iodata-convert.) It is just a matter of adding dump routines for some file formats and then this script will automatically pick them up.

@tovrstra
Member

Well, "just" is maybe an understatement. It takes quite a bit of work to get these dump functions implemented and tested thoroughly.
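A common way to test a dump function is a round trip: write the data, read it back, and compare. A minimal sketch using a toy XYZ writer/reader pair; `dump_xyz`, `load_xyz` and `check_roundtrip` are hypothetical helpers, not iodata's xyz module, and coordinates are written in Ångström exactly as given.

```python
import io
import math


def dump_xyz(f, title, symbols, coords):
    """Write a minimal XYZ file: atom count, title line, one line per atom."""
    f.write(f"{len(symbols)}\n{title}\n")
    for sym, (x, y, z) in zip(symbols, coords):
        f.write(f"{sym:2s} {x:15.10f} {y:15.10f} {z:15.10f}\n")


def load_xyz(f):
    """Read back what dump_xyz wrote."""
    natom = int(f.readline())
    title = f.readline().rstrip("\n")
    symbols, coords = [], []
    for _ in range(natom):
        words = f.readline().split()
        symbols.append(words[0])
        coords.append(tuple(float(w) for w in words[1:4]))
    return title, symbols, coords


def check_roundtrip(title, symbols, coords, tol=1e-9):
    """Dump to an in-memory file, load it again and compare within tol."""
    buf = io.StringIO()
    dump_xyz(buf, title, symbols, coords)
    buf.seek(0)
    title2, symbols2, coords2 = load_xyz(buf)
    assert title2 == title and symbols2 == symbols
    for a, b in zip(coords, coords2):
        assert all(math.isclose(u, v, abs_tol=tol) for u, v in zip(a, b))
```

A real test suite would also compare against reference files written by the original program, since a writer and reader sharing the same bug can still round-trip cleanly.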

@FarnazH
Member

FarnazH commented Apr 24, 2020

I would be happy to help with this. The formats to consider for dump functionality (at least as a good start): fchk, wfn, wfx, mkl, JSON. Here is a sample implementation of dump for the Molden format: https://github.com/theochem/iodata/blob/master/iodata/formats/molden.py#L574

@evohringer
Contributor

I think the "easy" file formats such as xyz, pdb, etc. already have a dump function. But I don't think that is what you are after, right?

@PaulWAyers
Member Author

Michael found a useful link for outputting files (including *.wfn and *.wfx):
https://github.com/zorkzou/Molden2AIM/blob/master/src/molden2aim.f90
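Unlike *.wfn, the *.wfx format is tag-delimited: each data section sits between an opening `<Tag>` line and a closing `</Tag>` line, with free-format values in between. A minimal sketch of a section writer and a matching reader; `wfx_section` and `read_wfx_sections` are hypothetical helpers, and the tag names in the test should be checked against the wfx specification linked at the top of this issue. Nested tags are returned as plain body lines rather than parsed.

```python
def wfx_section(tag, values, per_line=5):
    """Format one wfx data section: opening tag, values, closing tag."""
    lines = [f"<{tag}>"]
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        # Reals in scientific notation, everything else via str().
        lines.append(" ".join(f"{v: .10E}" if isinstance(v, float) else str(v) for v in chunk))
    lines.append(f"</{tag}>")
    return "\n".join(lines) + "\n"


def read_wfx_sections(text):
    """Map each top-level <tag> to the stripped lines between it and </tag>."""
    sections, tag, body = {}, None, []
    for line in text.splitlines():
        stripped = line.strip()
        if tag is None:
            if stripped.startswith("<") and not stripped.startswith("</") and stripped.endswith(">"):
                tag, body = stripped[1:-1], []
        elif stripped == f"</{tag}>":
            sections[tag] = body
            tag = None
        else:
            body.append(stripped)
    return sections
```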

@PaulWAyers
Member Author

There doesn't seem to be a clean and well-documented reference for the *.wfn format, so we should write a file specification for it.
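As a starting point for such a specification, the FORMAT statements in RDPSI pin down the column layout of the header (FORMAT 102) and nucleus lines (FORMAT 103). A minimal writer sketch derived only from those two statements; `wfn_header` and `wfn_center` are hypothetical names, and the `MOL ORBITALS`, `(CENTRE n)` and `CHARGE =` filler text mirrors what wfn files commonly contain in the skipped columns rather than anything RDPSI checks. RDPSI's I3 fields also cap every count at 999, which other readers may relax.

```python
def wfn_header(nmo, nprims, ncent, program="GAUSSIAN"):
    """Header line matching RDPSI FORMAT 102: (4X,A4,12X,3(I3,17X))."""
    for count in (nmo, nprims, ncent):
        if not 0 <= count <= 999:
            raise ValueError(f"{count} does not fit an I3 field")
    # The counts end in columns 23, 43 and 63 (1-based), as FORMAT 102 expects.
    return (f"{program:8.8s}{nmo:15d} MOL ORBITALS{nprims:7d} PRIMITIVES"
            f"{ncent:9d} NUCLEI")


def wfn_center(symbol, index, xyz, charge):
    """Nucleus line matching RDPSI FORMAT 103: (A8,11X,I3,2X,3F12.8,10X,F5.1)."""
    x, y, z = xyz
    # Columns 1-8 hold the atom label that RDPSI reads into ATNAM.
    label = f"{symbol:>3s}{index:5d}"
    # F12.8 leaves only three characters (including a minus sign) before the
    # decimal point, so very large coordinates would overflow their field.
    return (f"{label:8.8s}    (CENTRE{index:3d}) "
            f"{x:12.8f}{y:12.8f}{z:12.8f}  CHARGE ={charge:5.1f}")
```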

@PaulWAyers
Member Author

There is some recent work by Gaussian to enhance interfacing. It may be interesting for *.fchk support, among other reasons.

https://gaussian.com/interfacing/

@PaulWAyers
Member Author

PaulWAyers commented Jun 27, 2020

The MultiWFN format specification is also interesting. Not sure if we should support it...

https://chemrxiv.org/articles/Mwfn_A_Strict_Concise_and_Extensible_Format_for_Electronic_Wavefunction_Storage_and_Exchange/11872524

This should be easy to support, as MultiWFN is free and open source, and there are (supposedly, according to the paper) *.f90 files for this conversion. However, after downloading three versions of Multiwfn, it's clear that the current online source code is not up to date. If someone is interested, I can e-mail Tian Lu and ask for the code and example files. As it's a Fortran-to-Python conversion (not a painful one: free format, sort of *.wfx-like), it is not too hard. On the other hand, most (all?) of the objectives stated in the paper above are subsumed by JSON...

Edit: Just now (5 hours after I wrote this), the new MultiWFN came online. And there it is. I'll drag the relevant file here FYI. May be useful for various reasons.

(GitHub doesn't support *.f90 attachments, so I renamed it as a *.txt file.)
fileIO.txt
[downloaded from https://sobereva.com/multiwfn/ ]
Hat-tip (obviously) Tian Lu at MultiWFN

@FarnazH
Member

FarnazH commented Jul 3, 2020

@rayhe88 please be sure to support dump_many for the fchk format when you are done adding dump_one. Thanks.
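For the fchk dump_one, each section follows the layout in the formchk specification linked at the top of this issue. As we read it: a 40-character name, the type letter (I or R) in column 44, scalars as I12 or E22.15, array headers with `N=` and an I12 count, then data lines of 6I12 (integers) or 5E16.8 (reals). A minimal sketch of line formatters with hypothetical names `fchk_scalar` and `fchk_array`; character and logical entries are not handled.

```python
def fchk_scalar(name, value):
    """One fchk scalar line: name (A40), 3 spaces, type I or R, then the value."""
    if isinstance(value, int):
        return f"{name:40.40s}   I     {value:12d}"
    return f"{name:40.40s}   R     {value:22.15E}"


def fchk_array(name, values):
    """Array header line plus data lines: 6I12 for integers, 5E16.8 for reals."""
    is_int = all(isinstance(v, int) for v in values)
    lines = [f"{name:40.40s}   {'I' if is_int else 'R'}   N={len(values):12d}"]
    per_line, fmt = (6, "{:12d}") if is_int else (5, "{:16.8E}")
    for start in range(0, len(values), per_line):
        lines.append("".join(fmt.format(v) for v in values[start:start + per_line]))
    return lines
```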

@PaulWAyers
Member Author

We should probably support the Matrix Element File from Gaussian 16. This seems to be a replacement (to some extent) for the old formatted checkpoint file.
https://gaussian.com/interfacing/

@PaulWAyers
Copy link
Member Author

A few useful IOps that ensure interesting data end up in fchk files:
IOp(9/11=1) [store CC amplitudes]. Probably also MP amplitudes and CISD coefficients, but the documentation is ambiguous (= bad).

Unfortunately I could not find an option to print out MOs and DMs all along a trajectory/IRC.

@tovrstra
Member

@PaulWAyers It seems the matrix element file is binary ("unformatted") and architecture-dependent. That said, it contains a lot of potentially interesting data that is not present in an FCHK file. Example code to read these files can be found here: gauopen_v2.zip. It seems to be a Fortran routine compiled into a Python extension. A pure Python interface would be nice but may be difficult because the binary format depends on the architecture (and maybe also on the Fortran compiler?).

@PaulWAyers
Member Author

I doubt the binary format is compiler-dependent, but I expect it is architecture-dependent. I'm sure there is something online about reading binary files with Python; I'd be surprised if it weren't possible. But thanks for the analysis, @tovrstra; we should definitely move this down the priority list, and at this stage we should probably be focusing on finishing the things we already have. I think the GAMESS file in either tamkin or molmod was the last one I'd identified as reasonably high priority (partly because it was easy). However, if someone wants to take a crack at the Gaussian matrix element file, I'm all for it. Given that the reference implementation is in Fortran, it would probably be @msricher or @BradenDKelly.
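Assuming the matrix element file is written as Fortran unformatted sequential records, the layout details that vary are the byte order of the numbers and the size of the record-length markers, which the Fortran standard leaves to the compiler (common compilers such as gfortran write a marker before and after each record). A minimal pure-Python record splitter that takes both as parameters; `iter_fortran_records` and `pack_fortran_record` are hypothetical helpers, and the sketch knows nothing about what Gaussian actually stores in each record, which would still have to be worked out from the gauopen code. Very large records that a compiler splits into subrecords are not handled.

```python
import struct
from typing import Iterator


def iter_fortran_records(data: bytes, endian: str = "<", marker: str = "i") -> Iterator[bytes]:
    """Yield the payload of each unformatted sequential record in data.

    endian: '<' little-endian (e.g. x86), '>' big-endian.
    marker: 'i' for 4-byte length markers, 'q' for 8-byte markers.
    """
    fmt = endian + marker
    size = struct.calcsize(fmt)
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack_from(fmt, data, pos)
        payload = data[pos + size:pos + size + length]
        (trailer,) = struct.unpack_from(fmt, data, pos + size + length)
        if trailer != length:
            raise ValueError(f"record marker mismatch at byte {pos}")
        yield payload
        pos += 2 * size + length


def pack_fortran_record(payload: bytes, endian: str = "<", marker: str = "i") -> bytes:
    """Frame a payload the same way, e.g. to build test files."""
    head = struct.pack(endian + marker, len(payload))
    return head + payload + head
```

Each payload can then be decoded with `struct.unpack` (or `numpy.frombuffer`) once the record contents are known.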

@tovrstra
Member

I'll make a separate issue for this format, so as not to overload this issue.


4 participants