Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-22582 Avoid synchronizing in RuleBasedBreakIterator and ULocale unless strictly necessary #2775

Merged
merged 1 commit into from
Feb 21, 2024

Conversation

Diego1149
Copy link
Contributor

https://unicode-org.atlassian.net/browse/ICU-22582

Optimize synchronized code which is causing contention which leads to latency degradation.

RuleBasedBreakIterator: Remove fBreakEngines and instead read from the static gAllBreakEngines, this of course has issues, which we fix by using a ConcurrentLinkedQueue and read as normal, but write within a synchronized context, we will only iterate over the LanguageBreakEngines we haven't iterated over and then add a new one if needed.

ULocale: If the Locale hasn't changed then use the cached value, else go through the code path that modifies it. (Similar solution to what Locale is doing), now both defaultLocale and defaultULocale are volatile to prevent CPU caching from leading to wrong results.

Tests: I performed some manual tests and the code still behaves the same it's just that now in an environment with 1 CPU with ~6 threads it finishes 2x as fast.

@CLAassistant
Copy link

CLAassistant commented Jan 6, 2024

CLA assistant check
All committers have signed the CLA.

Diego1149 added a commit to Diego1149/icu that referenced this pull request Jan 6, 2024
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@richgillam
Copy link
Contributor

Can you please sign the CLA so we can look at this?

@Diego1149
Copy link
Contributor Author

Hi Rich, thanks for the reminder, this slipped my mind, I've signed the CLA already, so it should be possible for you to look at the contents of this PR.

@markusicu markusicu self-assigned this Jan 18, 2024
richgillam
richgillam previously approved these changes Jan 18, 2024
Copy link
Contributor

@richgillam richgillam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little uncomfortable giving a ship-it here because I'm not longer familiar enough with either Java or the internals of ULocale to be sure I know what's going on. That said, all of this looks pretty reasonable to me. But it'd be nice to have a second pair of eyes on this (@markusicu ?).

@markusicu
Copy link
Member

@yumaoka and maybe @macchiati could you please also take a look?

@markusicu
Copy link
Member

PS: Consider separate pull requests for the changes in the two classes.

@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@roubert roubert removed their request for review January 30, 2024 14:06
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@markusicu
Copy link
Member

I suggest you don't amend/squash right away, so that it's easier to track comments vs. changes. Ignore the one failing check about multiple commits -- we can squash later.

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes lgtm tnx
I will start the CI checks

@markusicu markusicu dismissed their stale review February 21, 2024 00:07

looks like good changes now; plan to approve when CI is happy modulo squashing

@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@Diego1149
Copy link
Contributor Author

changes lgtm tnx I will start the CI checks

looks like good changes now; plan to approve when CI is happy modulo squashing

I wish had seen that cause I just squashed it :/

@Diego1149
Copy link
Contributor Author

Adding my results from my own tests, they still pass as expected:

Test code looks something like this (In Kotlin):

val executor = Executors.newFixedThreadPool(CONCURRENT_FLOWS)
val list: MutableList<Deferred<Long>> = mutableListOf()
for (i in 1..CONCURRENT_FLOWS) {
     list.add(async(executor.asCoroutineDispatcher()) { 
       for (i in 1..NUM_ITERATIONS) {
          // Assert that output of this is correct.
          someChromeOsFunction(INPUT_STRING + i)
        }
     })
}
list.awaitAll()

CONCURRENT_FLOWS=10000
NUM_ITERATIONS=10000
INPUT_STRING="some_input"

I can't append the someChromeOsFunction for obvious reasons, although the code looks something like this:

    BreakIterator breakIterator = BreakIterator.getCharacterInstance();
    breakIterator.setText(input);
    while (breakIterator.current() != BreakIterator.DONE) {
      doSomething(someVariables, breakIterator);
    }

With Optimization:
Took 23258 milliseconds.
Took 22634 milliseconds.
Took 23914 milliseconds.

Without Optimization:
Took 71486 milliseconds.
Took 70137 milliseconds.
Took 75051 milliseconds.

So it's ~3x as fast, although I used 10x the number of virtual threads compared with last time, still same amount of physical threads, so result should be pretty much the same, most of the comments in the cl shouldn't affect perf, but rather were around edge cases in concurrency, so I expect results to stay very similar.

@markusicu markusicu merged commit f3e50a7 into unicode-org:main Feb 21, 2024
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants