Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some refactors to WikiEngine, attempting to switch to ReadOnlySpan<char> #1906

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

YoshiRulz
Copy link
Collaborator

@YoshiRulz YoshiRulz commented Jun 30, 2024

I'll clean up the diffs when I'm not so sleep-deprived. At this stage, I'd like to know whether you think this is a good idea at all, whether the final signatures are good, and if you'd like any of the added types/methods placed somewhere else. Oh, and if you have any ideas for removing one of the remaining ToString calls. I think this is finished now

I still hate the StyleCop settings.

@YoshiRulz YoshiRulz marked this pull request as ready for review July 3, 2024 22:59
@YoshiRulz YoshiRulz changed the title [WIP] Some refactors to WikiEngine, attempting to switch to ReadOnlySpan<char> Some refactors to WikiEngine, attempting to switch to ReadOnlySpan<char> Jul 3, 2024
@adelikat adelikat requested a review from nattthebear July 5, 2024 19:43
Element div = new(
_charRange,
"div",
attributes: [new("class", "module-error")],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates an extra object for the attributes collection compared to the old version of the code. This isn't entirely a new problem, as there were some other callsites that wasted objects before, but it's worse now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't it allocating the same number of collections before, just in the Element ctor? Ah I see, this is an array + ToDictionary. Should I go back to inserting after init, or should I add constructors which take a Dictionary?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing a Dictionary seems good so long as we can make sure we never accidentally share those dictionaries (the passed object should be owned by the element that created it)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, do you mean owned by the Element or owned by the call-site? Not that I know how to put either into code—mutation of Element.Attributes is relied on by a few call-sites, so it can't simply be changed to ImmutableDictionary for example.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Owned by the element; I don't want to accidentally share a single Dictionary across multiple elements, as that would be too confusing for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are actually fewer mutations than I thought (must have confused it with Children), but even after encapsulating them, I can't think of a good way to enforce that invariant that's not just defensive copying.
I tried [CallerArgumentExpression] + checking for new Dictionary or .ToDictionary(, but while it may have worked, it also would have made every callsite verbose. Plus, half the callsites use the Attr method, so they would have to be new Dictionary(new[] { Attr(..., ...) }) which I believe allocates no less than new KVP<string, string>[] { Attr(..., ...) }.ToDictionary().
Unless you have any ideas, I'll go ahead with replacing IEnumerables with Dictionarys, and it will just have to rely on humans not misusing references.

: this((charStart, default), tag, attributes, children: children.AsEnumerable()) { }

public Element(StringIndices range, string tag)
: this(range, tag, [], children: (IEnumerable<INode>)[]) { }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the use of [] here automatically pool a single Enumerable.Empty or are we making new objects every time?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just uses Array.Empty.

@adelikat adelikat requested a review from nattthebear July 7, 2024 23:22
@nattthebear
Copy link
Collaborator

I feel lukewarm on this PR as a whole. Avoiding needless copies is good for speedup, and that certainly includes ReadOnlySpan<char> usage. But it makes some complex code even more complex, and small copies don't have a large perf impact.

Do we have profiling evidence from production that wiki rendering is slow enough to be a problem? What money are we trying to save here anyway; I don't have any access to the cost breakdown of running production?

If wiki rendering is the problem, then we could likely save more speed, and with simpler code, by caching partially hydrated pages.

@Masterjun3
Copy link
Collaborator

So to me neither the PR title, nor the PR description explain what this does and what its purpose is.
By looking at the diff it seems to me it does 3 things at the same time?

  1. Combine the CharStart and CharEnd ints into a single Range variable.
  2. Replace strings with ReadOnlySpan. (Why?)
  3. Do something with Regex Shim. (What is Shim, what does this do?)

So basically I have the following questions:

  • What do these things do?
  • What is their purpose? (It doesn't seem to be code-simplification or code-reduction, which is usually what I know from refactors.)
  • Do they have to be done all at the same time in the same PR?

@YoshiRulz
Copy link
Collaborator Author

I'll try to explain further, though as natt noted, the key change of ReadOnlySpan<char> is inherently beneficial but of questionable impact without performance data.

Merging the start and end indices into a tuple is just a simplification. It also highlights the call-sites where CharEnd wasn't being set, as they now use separate constructor overloads. (I originally looked to use the BCL Range for this, but that would have required extra checks on init and extra processing to keep compatibility with the Char{Start,End} props.)

You didn't ask about the Element constructor refactor, but that was to reduce mutation, and also setup for the subsequent commits to be smaller.

The RegexMatchShim struct, as the name suggests, is a lightweight wrapper over the BCL's ReadOnlySpan<char> RegEx API to make it resemble the string one. Again this was to keep the diff small.

You also didn't ask about the first commit. That contains all the changes I would have had to do in the last commit, but doing them first reduced churn. Some of that I think would be a good idea outside of this PR, namely passing an IEnumerable<string> to MakeImage, using the correct RegEx method, a dedup in MakeLinkOrImage, and lifting the ternaries outside string.Concat.

So yes, with those few exceptions it does all have to be done in 1 PR, since the later commits depend on the previous, and merging only some without actually switching to ReadOnlySpan<char> somewhat defeats the purpose.

public int CharEnd { get; set; }
public Module(int charStart, int charEnd, string text)

private StringIndices _charRange = (default, default);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field init was necessary at one point, but in the end only the tuple ctor is left, so it can be removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants