Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase foldability of various templates #1338

Merged
merged 16 commits into from
Aug 10, 2023

Conversation

jonwis
Copy link
Member

@jonwis jonwis commented Aug 7, 2023

SizeBench revealed heavily non-foldability of certain C++/WinRT templates contributing to increased Windows binary sizes. Some small tweaks dropped the size of test.exe (release, x64) by ~132kb. (5180416 -> 5030400, or 2.8% ... which isn't a super lot, but it's nonzero.

Fixes #1334 and related

Changes

Enable nuspec iteration - Bump the default nuspec version to 3.0.0.0 so build_nuget.cmd and iterating with a local nuget source is easier.

Add solution files - Niceness for people who live in Visual Studio.

Fold delegate bases - Forced the layout of the delegate types to be regular by moving the reference count into a base before the non-ABI types. This folds nearly all the AddRef/QueryInterface instances into one. Releases are still unique because they call up into dtors.

Fold "report" in invoke - The invoke helper is a try/catch{report} but the body of "report" is identical on each invoke specialization. Move it to a unique non-inlineable method.

Fold event::add - Make the delegate first, then all the "expand the collection" instances are identical.

Further Work

SizeBench reports the following additional foldability wastage, quoted from the same test.exe:

  • make_agile_delegate<> - 3.2kb; the agile thunk is unique basically per delegate ::Invoke vtable and offset adjusters.
  • root_implements::<T1,T2,T3>::query_interface - 4.1kb; again unique per vtable instance, also related to inlining choices
  • winrt::event::add - 3.2kb; all cost related to inlined use of make_agile_delegate
  • winrt::impl::runtime_class_name<T1,T2>::get() - 1.4kb; should probably switch to std::wstring_view but this may be a breaking change; currently induces an hstring construction.

Testing

Tested by build_and_test.cmd - all built, all passed

dmachaj
dmachaj previously approved these changes Aug 7, 2023
Copy link
Contributor

@dmachaj dmachaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

strings/base_delegate.h Outdated Show resolved Hide resolved
strings/base_delegate.h Outdated Show resolved Hide resolved
@jonwis
Copy link
Member Author

jonwis commented Aug 8, 2023

For posterity, switching name_of<> and get_name<> to use std::wstring_view was basically a net wash. No test code compilation errors that would signal a breaking change, overall binary increased by about 512 bytes.

One area of interest is in GetRuntimeClassName. The inliner is super aggressive about splatting winrt::hstring(anything) inline (and ~hstring) rather than hoisting them out. In PGO-optimized binaries this ctor is normally de-inlined since it's replicated countless times throughout the code. Making a custom winrt::hstring get_runtime_class_name(std::wstring_view) would likely not really help, as PGO should automatically produce that that when used.

@jonwis
Copy link
Member Author

jonwis commented Aug 8, 2023

It looks like I can get another 8kb by regularizing the tail of query_interface. Running total after that is ~140kb.

It also looks like the cluster around make_agile_delegate could be connected to a simplification on how delegates are generated. Delegate ctors have a lot of overloads - lambda, function, raw-pointer member, com_ptr member, weak_ref member, shared_ptr member, weak_ptr member. Under the covers they all route through the lambda form, doing various forms of "lock" and "call method." I wonder if there's a simplification of those to a two-form version. Unless we were willing to try some stronger type-erasure through a std::function-alike, I'm not sure this can be reduced further. It's probably another 3.2kb on this test binary, so under 0.5%.

@jonwis
Copy link
Member Author

jonwis commented Aug 8, 2023

/azp run

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

strings/base_delegate.h Outdated Show resolved Hide resolved
strings/base_delegate.h Outdated Show resolved Hide resolved
strings/base_delegate.h Outdated Show resolved Hide resolved
strings/base_events.h Outdated Show resolved Hide resolved
@kennykerr
Copy link
Collaborator

Overall looks like goodness. Just a few comments.

@kennykerr
Copy link
Collaborator

One area of interest is in GetRuntimeClassName. The inliner is super aggressive about splatting winrt::hstring(anything) inline (and ~hstring) rather than hoisting them out. In PGO-optimized binaries this ctor is normally de-inlined since it's replicated countless times throughout the code. Making a custom winrt::hstring get_runtime_class_name(std::wstring_view) would likely not really help, as PGO should automatically produce that that when used.

For windows-rs I managed to write a "constexpr" version of hstring so that it can be declared as a compile-time constant. This is mainly used for literals, but it also turns these GetRuntimeClassName into a cheap pointer copy. There are some limitations, so I'm not sure its super practical for cppwinrt to adopt though.

@sylveon
Copy link
Contributor

sylveon commented Aug 8, 2023

Making the constructor of param::hstring constexpr seems pretty trivial, and it would enable constexpr fast-pass strings.

@kennykerr
Copy link
Collaborator

Making the constructor of param::hstring constexpr seems pretty trivial, and it would enable constexpr fast-pass strings.

I was referring to return values, which has a lot more optimizing value.

Copy link
Collaborator

@kennykerr kennykerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jonwis jonwis merged commit 4196e08 into microsoft:master Aug 10, 2023
96 checks passed
@jonwis jonwis deleted the user/jonwis/template-fold branch August 10, 2023 06:40
DefaultRyan added a commit that referenced this pull request Jan 12, 2024
#1338 changed the NuGet version to a property, but didn't update the build.yml to do the same thing
DefaultRyan added a commit that referenced this pull request Jan 12, 2024
#1338 changed the NuGet version to a property, but didn't update the build.yml to do the same thing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: implements_delegate non-fold-ability
4 participants