Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Planning the features of HighFive v3. #864

Closed
1uc opened this issue Nov 24, 2023 · 11 comments
Closed

Planning the features of HighFive v3. #864

1uc opened this issue Nov 24, 2023 · 11 comments
Labels
v3 Anything that needs to be resolved before `v3`.

Comments

@1uc
Copy link
Collaborator

1uc commented Nov 24, 2023

This issue collects the features that we would like HighFive to have, but we can only do so by breaking API or build-systems. It is also an invitation for feedback from users of HighFive.

  1. Improve the CMake build-system. I see two paths:
  2. Restructure headers into forward declarations, declarations and definitions. This should allow us to avoid the cyclic structure. This might slightly change what each header implicitly provides.

Small decided changes:

Changes that need review/discussion:

@1uc 1uc added the v3 Anything that needs to be resolved before `v3`. label Nov 24, 2023
@1uc
Copy link
Collaborator Author

1uc commented Dec 13, 2023

Here's a sketch of how we'd need to reorganize the header for the first approach: #892

@tdegeus
Copy link
Collaborator

tdegeus commented Dec 18, 2023

  1. (CMake) I'm all for it! (I have positioned myself in the past, so I will not repeat my arguments/preference here).
  2. (Style) I would rather have as little as possible forward declarations, and just put the implementation with the declarations where possible (and use forward declarations only if absolutely necessary). I find this way more maintainer friendly. Instead, the code structure can easily be grasped from the documentation, such as the doxygen docs as available online.

A thing to discuss is if you want to include H5Easy implemenations for Eigen, Boost, xtensor, openCV deeper in the core or not.

@1uc
Copy link
Collaborator Author

1uc commented Dec 18, 2023

A thing to discuss is if you want to include H5Easy implemenations for Eigen, Boost, xtensor, openCV deeper in the core or not.

Yes, this is one of the outstanding chores. It starts with #871 so that we can systematically test all combinations and can't make the Eigen mistake again. After that we reduce duplication in the tests #868 because I fear we'll soon slip into unmaintainable. Then one could start extending the trait HighFive::details::inspector until it covers everything in Easy. I've not included this, because I doubt it'll break API. The inspector isn't part of the public API and in HighFive this trait is the only way we inject container specific code. Hence, we should be free to do whatever we like. I'm not very familiar with Easy, but I wasn't expecting it to expose details about how it does these types of things. Are you aware of anything? There was something about Eigen column and row vectors, either Easy makes the distinction or HighFive does, but I don't think both do. Even if that were the case I'm not sure unifying the behaviour is worth the trouble it causes our users.

So far we're not planning on anything too disruptive. Just rectifying error prone or quirky behaviour and fixing the build-system. Sometimes these can be fixed without actually breaking well-behaved code.

@1uc
Copy link
Collaborator Author

1uc commented Dec 18, 2023

Regarding Style. I much prefer splitting definition and declaration. However, the point isn't cosmetic. If one's unlucky and one needs to change which files includes which other files, then HighFive can be quite uncompromising. Since C++ does really allow fully cyclic code, it should always be possible to have the preprocessor figure out what to include in which order. However, the requirement for this work, is to split declaration from definition, like one does in regular C++ code (decl in .h and defn in .cpp). In some unlucky cases one may need forward declarations. However, I think that choice has been made for us. Either the code needs the forward declaration or it doesn't. Shuffling around code will probably not help us.

Additionally, splitting the two will enable users to write code such as:

// file: write_stuff.hpp
#include <highfive/dataset_decl.hpp>
void write_data(const HighFive::DataSet& dset, const Container& containter);

// file: write_stuff.cpp
void write_data(...) { ... }

If someone really wants to reduce compile times it might help. It would also allow users to decide that they want to explicitly instantiate some of our templates and link to a "pre-compiled" HighFive.

@tdegeus
Copy link
Collaborator

tdegeus commented Dec 19, 2023

A thing to discuss is if you want to include H5Easy implemenations for Eigen, Boost, xtensor, openCV deeper in the core or not.

Yes, this is one of the outstanding chores. It starts with #871 so that we can systematically test all combinations and can't make the Eigen mistake again. After that we reduce duplication in the tests #868 because I fear we'll soon slip into unmaintainable. Then one could start extending the trait HighFive::details::inspector until it covers everything in Easy. I've not included this, because I doubt it'll break API. The inspector isn't part of the public API and in HighFive this trait is the only way we inject container specific code. Hence, we should be free to do whatever we like. I'm not very familiar with Easy, but I wasn't expecting it to expose details about how it does these types of things. Are you aware of anything? There was something about Eigen column and row vectors, either Easy makes the distinction or HighFive does, but I don't think both do. Even if that were the case I'm not sure unifying the behaviour is worth the trouble it causes our users.

  • H5Easy does not expose the user to any details concerning the HighFive implementation. I would say that that is the selling point: a single line, very generic API (similar to what can be done with h5py). I would not mind going even close to that API in the future (for example file["/path/to/dataset"] = variable for automatic dataset generation with a templated function). However, at the time that I wrote H5Easy there was a desire from the HighFive maintainers to have only free-functions on top of HighFive.
  • H5Easy explicitly only reads/writes data in row-major as expected from HDF5. I.e. the bug in HighFive did not affect H5Easy.

So far we're not planning on anything too disruptive. Just rectifying error prone or quirky behaviour and fixing the build-system. Sometimes these can be fixed without actually breaking well-behaved code.

OK! However, making a major update is also a nice time to reflect on API changes if needed or desired.

@tdegeus
Copy link
Collaborator

tdegeus commented Dec 19, 2023

P.S. Concerning CMake I also link conda-forge/highfive-feedstock#41

@1uc
Copy link
Collaborator Author

1uc commented Dec 19, 2023

A prime example why we need a relatively unexciting version 3.0.0 that just fixes the way dependencies are handled, fixes the CMake and doesn't break too much else. Moreover, it should be feasible with limited resources. Not so much a new version, but rather a set of improvements that unfortunately can't be done without them being breaking, and as a consequence the major version number gets bumped.

@gouarin
Copy link

gouarin commented Dec 19, 2023

if I've read everything correctly, splitting definitions and declarations will mean that HighFive will no longer be a header-only library, right?

Why this choice?

Do you have a date for the finalization of the v3?

Let me know if I can help for CMake.

@1uc
Copy link
Collaborator Author

1uc commented Dec 19, 2023

if I've read everything correctly, splitting definitions and declarations will mean that HighFive will no longer be a header-only library, right?

HighFive will continue to be a header only library. In fact it'll make use of the fact that it's header-only more. Especially, in the CMake code.

The splitting happens into separate header files:

// file: highfive/H5File.hpp
#include "H5File_decl.hpp"
#include "H5File_impl.hpp"

and:

// file: highfive/H5File_decl.hpp
class File {
  DataSet getDataSet(...);
};

// file highfive/H5File_impl.hpp
DataSet File::getDataSet(...) { std::cout << "Yay!"; }

Then for extra points in pedantry, forward declarations (if added) need to be in a separate file (it's kinda uncommon though, so maybe we leave it off), and implementations of templates behave slightly differently from implementations of non-template functions. Hence, at it's most pedantic they'd get split into separate files too.

The only difference to what we do now is that we stop tail including:

// file: highfive/H5File.hpp
class File {
  DataSet getDataSet(...);
};

#include "highfive/H5File_defn.hpp"

The problem with this type of tail inclusion is that we can get only the declaration of File without it's implementation.

Let me know if I can help for CMake.

Thank you. We'll need testing in environments I'm unaware of and don't have easy access to via Github Actions. If you want to look at things way before they're ready for review see #897. In particular "Second:" in:
#897 (comment)
would benefit from user feedback.

No, HighFive doesn't have a schedule. We work on it whenever other projects take forever to compile.

@gouarin
Copy link

gouarin commented Dec 19, 2023

Thank you for these clarifications.
I will take a look at #897.

@1uc
Copy link
Collaborator Author

1uc commented May 17, 2024

We're done planning.

@1uc 1uc closed this as completed May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v3 Anything that needs to be resolved before `v3`.
Projects
None yet
Development

No branches or pull requests

3 participants