Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interfacing extended XYZ readers and writers for AtomsBase #40

Closed
jameskermode opened this issue Mar 24, 2022 · 12 comments
Closed

Interfacing extended XYZ readers and writers for AtomsBase #40

jameskermode opened this issue Mar 24, 2022 · 12 comments

Comments

@jameskermode
Copy link

Hello! I'd like to contribute extended XYZ readers and writers to AtomsBase. I have an existing Julia package, ExtXYZ.jl, which calls a C library containing parser that is auto-generated from a formal grammar for the extended XYZ format that we've been working on recently, as well as a C writer.

There are two things I'd like to discuss before starting work ob this:

  • Package Dependencies. ExtXYZ.jl is already interfaced to @cortner's JuLIP.jl. There we made JuLIP depend on ExtXYZ, but since it seems AtomsBase is designed to contain the basic interface only I'm wondering if it would be better for me to make my ExtXYZ package depend on AtomsBase instead? Or perhaps there should be a new package which uses both of them?

  • Arbitrary Data. Extended XYZ format allows for arbitrary per-atom and per-config data. This is useful e.g. to store (perhaps multiple sets of) energies and forces associated with a structure when training machine learning potentials, or for visualisation of per-atom quantities with OVITO which can also read extxyz format. In ASE these are stored in the Atoms.arrays and Atoms.info dictionaries, respectively. In JuLIP we did something similar, with a single data dictionary. I see there is a data dict in the Atom type, but from what I can tell storing additional data associated with each structure is not included in the abstract AbstractSystem API. Is this something you've discussed already and ruled out? If so should I add interface routines which convert to/from (AtomsBase.AbstractSystem, Dict) pairs and the dictionaries that ExtXYZ works with? Or would it be better to define my own concrete subtype of AbstractSystem which includes e.g. per_atom_properties::Dict{Symbol,Any} and per_structure_properties::Dict{Symbol,Any} dictionaries, or similar?

@rkurchin
Copy link
Collaborator

Thanks for your interest in supporting AtomsBase, this is awesome!

To respond to your first point: our general vision is that other packages would depend on AtomsBase. The goal is for this package to stay very lightweight and provide a minimal set of functions that allow easy "translation" between different structure representations.

To your second point, there are two general "paradigms" by which you can introduce this dependency: by either using one of our two built-in system types (for an example of this very analogous to what you're doing, check out AtomIO.jl), or by defining a new system type and dispatching the necessary interface functions on it. Given your desire to support these extra data fields, I think this second approach (as you allude to) makes the most sense in the immediate term.

THAT BEING SAID, we have definitely discussed the possibility of something like this at the level of the interface, but hadn't (yet) converged upon a solution we felt was generic enough. So we would definitely be open to discussing introducing new interface functions to access these more "arbitrary" data fields in the future, but would want to get input from other folks to make sure that the way that we do it is sufficiently general.

@mfherbst and/or @jgreener64, feel free to chime in with further opinions! I suspect @cortner may have thoughts as well. :)

@cortner
Copy link
Member

cortner commented Mar 24, 2022

@jameskermode maybe we can play with a FlexAtom struct building on the ACE.State implementation. This would allow for fully flexible fields, we could even distinguish static and dynamic fields, but I'm a bit worried about dynamic fields re performance. This struct could then implement the AtomsBase interface. And for us it would be the first step towards retiring JuLIP and starting to incorporate the JuLIP functionality into other packages.

If you haven't seen the ACE.State implementation yet, here it is

https://github.com/ACEsuit/ACE.jl/blob/main/src/states.jl

There are a few minor changes I want to make, but overall it has proven remarkably flexible and the performance is more than good enough for ACE purposes at least.

@mfherbst
Copy link
Member

mfherbst commented Mar 24, 2022

@jameskermode many thanks for reaching out!

What you are describing is more or less what I had in mind with AtomIO.jl, namely to collect (under a common interface, which integrates with Julia's FileIO etc) parser packages for atomistic structures and expose the parsed data as an AtomsBase-compatible object. I have not really gotten started with it due to a lack of time, but if you are interested I'm happy to bounce ideas and hear your thoughts about it.
Also: In either case this does of course not prevent you to directly support the AtomsBase interface in ExtXYZ. In fact it might even make it easier at the AtomIO level.

About the arbitrary data. Yes, this was indeed the idea of the Atom data structure and the therein contained dictionary. We did not want to force this upon the generic interface just yet (to keep it lightweight), but in principle users are free to choose "their" atom type and Atom is offered as an option to attach arbitrary data.
I think having something similar for the AbstractSystem could indeed be useful, too. I am thinking about total charge or multiplicity, which for molecular systems is usually stored at the level of the system and not at the level of the atom as we enforce it right now. As far as I remember we have not really discussed this conclusively and I think it is worth to pick this up.

@rkurchin
Copy link
Collaborator

This probably veers towards "broader discussion" and might belong in another issue, but another related package that would definitely need support for "arbitrary data" type of fields (see e.g. this list) is Cclib.jl.

@mfherbst
Copy link
Member

I'm starting to collect these ideas at https://github.com/mfherbst/AtomIO.jl/issues. Feel free to add more packages if you know them.

@jameskermode
Copy link
Author

Thanks all for the encouraging responses. I’ll give @cortner’s suggestion of defining a flexible Atoms struct similar to ACE.State but supporting the AtomsBase interface a go within ExtXYZ.jl in the first instance and report back here.

I came up with the arbitrary per-atom and per-config data paradigm that the extended XYZ format represents in about 2005 and it’s served us remarkably well since then, so I’d be an advocate of finding a way to fit this into the AbstractSystem if it can be done without compromising efficiency.

In ExtXYZ, per-atom quantities can be scalar or vector, while per-config quantities can be scalar, vector or matrix. This allows many quantities including total charge, forces and energies, dipole moment, atomic stresses to be represented. It would potentially be nice to be even more general than this and allow e.g. per-atom matrix or tensor quantities as well.

@cortner
Copy link
Member

cortner commented Mar 24, 2022

@jameskermode - do you mean you want to just try locally whether my approach is sufficiently flexible for XYZ? Otherwise we could collaborate on a separate package that implements the Atom and System interfaces.

@jameskermode
Copy link
Author

There's an experimental implementation of an AtomsBase interface for ExtXYZ.jl here:

https://github.com/libAtoms/ExtXYZ.jl/blob/AtomsBase/src/atoms.jl

It includes a "fast and flexible" Atoms structure based on @cortner's suggestion from ACE.State, which is parameterised by two named tuples, one for the per-atom data and the other for the per-system data. I've hopefully implemented the AtomsBase interface on the new struct, and conversion routines to/from the internal dictionary data structure used to read/write extended XYZ files. I now provide ExtXYZ.load() and ExtXYZ.save() functions should be useable from AtomsIO in a way that is compatible with the FileIO package.

This idea may be relevant to the discussion in #43 as well. I'd welcome any feedback or suggestions and would be happy to contribute the Atoms struct (name is up for discussion too, I'm just use to ASE/JuLIP nomenclature) via a pull request here instead if it would be of wider use. If it looks generally OK I can make a new release of ExtXYZ to make it easier for others to try out.

@cortner
Copy link
Member

cortner commented Apr 1, 2022

James - why not publish this as a separate package independent of XYZ? We could start making ACE compatible with that, and then try it out with Molly etc? If all this goes well then we can start thinking about which parts of JuLIP to get rid of.

@jameskermode
Copy link
Author

Happy to do that. Any good name suggestions?

@cortner
Copy link
Member

cortner commented Sep 2, 2024

Can this be closed?

@jameskermode
Copy link
Author

Yes, ok for me to close

@cortner cortner closed this as completed Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants