Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVX512 Support #3776

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

theIDinside
Copy link
Contributor

@theIDinside theIDinside commented Jul 3, 2024

Fixes #2974

This PR introduces AVX512 support in replays.

CMake Changes:
Added the AVX512F_SUPPORT flag to CMake. Set by checking
if system supports AVX512F. If it doesn't it will not build (and therefore run) any of the AVX512 tests

Added the target descriptions. Assumed that PKRU support also implies AVX512 support as I could find no information that said otherwise.

Added gdb_avx512 test - sets 3 ZMM registers and reads them back in a gdb session of a replay to verify the contents.

We place mask registers K0-K7 after ZMM31H internally, because this simplifies the logic that parses the XSAVE area. The target description we provide, takes this into account (and also places them last) so they come after the ZMM registers in the 'g' packet. GDB doesn't care so long as we do what we say in the target desc

Changed test harness util.sh to also export TESTNAME variable as we may (this one test do at least) want to query from the python script what mode we are running in (64-bit/32-bit). See gdb_avx512.py for further info.

Not tested or developed for:

  • Being able to write to / alter AVX512 registers during replay/diversion.

CMake Changes:
Added the AVX512F_SUPPORT flag to CMake. Set by checking
if system supports AVX512F. If it doesn't it will not build (and therefore run) any of the AVX512 tests

Added the target descriptions. Assumed that PKRU support also implies AVX512 support as I could find no information that said otherwise.

Added gdb_avx512 test - sets 3 ZMM registers and reads them back in a gdb session of a replay to verify the contents.

We place mask registers K0-K7 after ZMM31H internally, because this simplifies the logic that parses the XSAVE area. The target description we provide, takes this into account (and also places them last) so they come after the ZMM registers in the 'g' packet. GDB doesn't care so long as we do what we say in the target desc

Changed test harness util.sh to also export TESTNAME variable as we may (I do at least) want to query from the python script what mode we are running in (64-bit/32-bit). See gdb_avx512.py for further info.
AVX512F support always returned true due to using -mavx512f as flag instead of -march=native to `try_compile`
@khuey
Copy link
Collaborator

khuey commented Jul 7, 2024

I'll look at this in detail this week.

The assumption that AVX-512 implies PKRU is not necessarily correct because these two features can be controlled independently by the kernel. So it's possible to have a system with AVX-512 available but PKRU unavailable even if the silicon supports both. I do wonder if we could start pretending that certain things are available when they're not (e.g. for reads returning default register values and for writes returning errors) to avoid a combinatorial explosion in the number of target description XML files we need to ship. We're already up to 20 files. Any thoughts @rocallahan?

@theIDinside
Copy link
Contributor Author

theIDinside commented Jul 8, 2024

I'll look at this in detail this week.

The assumption that AVX-512 implies PKRU is not necessarily correct because these two features can be controlled independently by the kernel. So it's possible to have a system with AVX-512 available but PKRU unavailable even if the silicon supports both. I do wonder if we could start pretending that certain things are available when they're not (e.g. for reads returning default register values and for writes returning errors) to avoid a combinatorial explosion in the number of target description XML files we need to ship. We're already up to 20 files. Any thoughts @rocallahan?

GDB has a way to signal "could not read", which means, we could potentially do something like this:

Have a "one description to rule them all" (ish, or just remove the edge case ones like PKRU) kind of thing, and then just have RR splat 'x' (representing "could not read"-value in the g-packet) in all the right places. Sure, it will mean that when reporting registers an architecture doesn't even have, we are doing the extra work of filling (N*regWidth + ...) bytes of 'x' into the g-packet, so that has to be taken into consideration.

At least this could be done for the PKRU (and potentially similar CPU features) feature, as it doesn't involve all that much unnecessary data.

(the g-packet is the packet a debugger client requests from a gdb server implementation. So PKRU being an 8 byte register, on an architecture that doesn't have it or it isn't enabled would have xxxxxxxxxxxxxxxx filled in as a response)

@theIDinside
Copy link
Contributor Author

theIDinside commented Jul 8, 2024

I guess we can assume the presence of AVX & AVX2 too (in the target descriptions), since these are very common features these days anyhow.

@rocallahan
Copy link
Collaborator

rocallahan commented Jul 8, 2024

We're already up to 20 files.

We could dynamically assemble the top-level XML file. I.e. given the base CPU type (aarch64, i386, amd64) and some set of optional features (a subset of avx, pkeys, soon avx512), we report a register description file name of <cpu>-<feature1>-...<featureN>-linux.xml but don't actually ship all those files with rr. Instead when the client requests a file of that form, we can send dynamically generated contents with the xi:includes corresponding to the feature tags in the file name.

@rocallahan rocallahan closed this Jul 8, 2024
@rocallahan rocallahan reopened this Jul 8, 2024
@theIDinside
Copy link
Contributor Author

theIDinside commented Jul 9, 2024

We're already up to 20 files.

We could dynamically assemble the top-level XML file. I.e. given the base CPU type (aarch64, i386, amd64) and some set of optional features (a subset of avx, pkeys, soon avx512), we report a register description file name of <cpu>-<feature1>-...<featureN>-linux.xml but don't actually ship all those files with rr. Instead when the client requests a file of that form, we can send dynamically generated contents with the xi:includes corresponding to the feature tags in the file name.

Great idea. I think GDB does something similar under the hood. I can add this feature to this PR, or in a later PR, if we're all in agreement. What say you @khuey || @rocallahan ?

The feature would basically involve:

  • Do the feature checking we already do
  • Change target_description_name to target_feature_set to instead return a Set<CPUFeature /* the GdbServerConnection::enum */> as these are what determines what features are represented in the final target.xml
  • Using the returned set, hand-serialize a target.xml with all the <xi:include href="feature.xml">
  • send over the wire

@rocallahan
Copy link
Collaborator

I think GDB does something similar under the hood. I can add this feature to this PR, or in a later PR, if we're all in agreement.

I think we should do it before this PR.

@theIDinside
Copy link
Contributor Author

I think GDB does something similar under the hood. I can add this feature to this PR, or in a later PR, if we're all in agreement.

I think we should do it before this PR.

Ok, can I wrap it up in the same PR or create a separate PR for it?

@rocallahan
Copy link
Collaborator

Separate PR I think, thanks

theIDinside added a commit to theIDinside/rr that referenced this pull request Jul 11, 2024
TargetDescription dynamically generates the "master" target description
contents (target.xml), so that we only have to manage the individual
concrete CPU features (like foo-sse.xml, for instance). TargetDescription
will generate the xml contents that includes the used features.

Removed combinatorial target.xml files

These are now handled by `TargetDescription` instead.

This unblocks the rr-debugger#3776 PR (AVX512 support) from getting pulled in,
though that PR will need to take this PR into account.
theIDinside added a commit to theIDinside/rr that referenced this pull request Jul 11, 2024
TargetDescription dynamically generates the "master" target description
contents (target.xml), so that we only have to manage the individual
concrete CPU features (like foo-sse.xml, for instance). TargetDescription
will generate the xml contents that includes the used features.

Removed combinatorial target.xml files

These are now handled by `TargetDescription` instead.

This unblocks the rr-debugger#3776 PR (AVX512 support) from getting pulled in,
though that PR will need to take this PR into account.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gdb integration doesn't support AVX-512
3 participants