Improve relevance scoring for titles and object-name matches in search results #12441

jayaddison · 2024-06-19T10:01:30Z

Feature or Bugfix

Bugfix

Purpose

Resolve some sub-optimal search result ranking/scoring reported in [search] issues with the new HTML search algorithm #12391.

Detail

- Add an example project to the JavaScript tests that exhibits the sub-optimal search result behaviour.
- Add JavaScript test coverage to assert on the expected, improved relative ordering of query results.
- Implement changes to the indexing/query algorithms to improve the query results without regressing other JavaScript search tests.
  - Merged Boost title-matching scores if main title #12393 up to commit fbb62cf (thanks @wlach!), then applied some suggested refactoring on top of it.
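The relative-ordering assertions described above can be sketched roughly as follows. This is a minimal illustration, not the test code from this PR: the `checkRanking` name, the `[page, title, target]` tuple shape, the fixture data, and the assumption that the raw results array is ordered lowest-score-first (so it is reversed before checking) are all hypothetical here.

```javascript
// Hypothetical helper: returns true only if every entry of expectedRanking
// appears in results, in the same relative order (other results may be
// interleaved between them).
// Assumption: results are ordered lowest-score-first, so a reversed copy is
// walked from best match to worst; the caller's array is not mutated.
function checkRanking(expectedRanking, results) {
  let [nextExpected, ...remaining] = expectedRanking;

  for (const [page, title, target] of [...results].reverse()) {
    if (nextExpected === undefined) break; // early exit: all entries matched
    const [expPage, expTitle, expTarget] = nextExpected;
    if (page === expPage && title === expTitle && target === expTarget) {
      [nextExpected, ...remaining] = remaining; // advance to the next expectation
    }
  }
  // Any unmatched expectation means an entry was missing or out of order.
  return nextExpected === undefined;
}

// Usage with made-up fixture data: [page, title, target, description, score].
const results = [
  ["relevance", "Relevance > Example", "#example", null, 5],
  ["index", "Main Page", "", null, 9],
  ["relevance", "Relevance", "", null, 16],
];
console.log(checkRanking(
  [["relevance", "Relevance", ""], ["index", "Main Page", ""]],
  results,
)); // true
```

Because the check only requires the expected entries to appear in order, unrelated results can sit between them without breaking the test; returning false when an expectation is never reached also makes a missing entry fail.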

Relates

Resolves [search] issues with the new HTML search algorithm #12391.


jayaddison · 2024-06-19T21:20:47Z

Commit cb0f6e7 -- regenerating a search index file from scratch -- seems to have been necessary because I had a stale _build directory on my local machine for the relevant input project for the fixture.

That seems like a bug; re-using an existing _build directory should be valid behaviour and should produce the same search index output as a fresh build. I'll file that as a separate issue within the next few days.


jayaddison · 2024-06-24T17:44:14Z

I have a slight preference for #12047 to be merged before this, to make the code and diff history easier to follow, if and when either of them is considered ready.

wlach

Looks good to me! Some minor optional comments.

sphinx/themes/basic/static/searchtools.js

tests/js/searchtools.js


jayaddison · 2024-07-08T13:48:10Z

This should be ready for further review / merge; I've no changes planned on this branch.

wlach

I have a couple of extra comments, but in general this LGTM. I think this will be a nice incremental improvement. @picnixz maybe you could take a look too (and merge when you think it's ready)?

wlach · 2024-07-09T01:16:45Z

sphinx/themes/basic/static/searchtools.js

+ let score = Math.round(Scorer.title * queryLower.length / title.length);
+ let boost = titles[file] === title ? 1 : 0; // add a boost for document titles


On second look, could these be declared as const?

Suggested change

- let score = Math.round(Scorer.title * queryLower.length / title.length);
- let boost = titles[file] === title ? 1 : 0; // add a boost for document titles
+ const score = Math.round(Scorer.title * queryLower.length / title.length);
+ const boost = titles[file] === title ? 1 : 0; // add a small boost for document titles

I think it's better to use a const as well. But on second thought, I'm wondering whether a +1 is sufficient.

Previously a title and subsection title with the same text would have equal scores, leaving their relative ranking undefined.

Any positive value here should have the effect of elevating the main-document titles above same-named subsection titles in the search results.

A single-integer increment is used because ideally we don't want the main document titles to move up in the rankings 'too much' and overtake other matches. That is possible, though, especially given that some scores are fractional. So I have the opposite worry: that +1 might be too much.

(a good way to figure these out could be to develop counterexamples and add test cases for them)
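To make the trade-off concrete, here is a minimal sketch of the title-scoring expressions from the suggested change. It assumes the boost is simply added to the title score and that `Scorer.title` is 15; the weight, the `scoreTitleMatch` wrapper, the page names, and the competing fractional score of 15.5 are all hypothetical. It shows both effects discussed above: the boost breaks the tie between a main title and an identically-named subsection title, but it can also lift the main title past a different match whose score is less than one point higher.

```javascript
// Hypothetical stand-in for Sphinx's Scorer object; the weight is an assumed value.
const Scorer = { title: 15 };

// Mirrors the expressions from the suggested change; adding `boost` to
// `score` is an assumption made for this sketch.
function scoreTitleMatch(query, title, file, titles) {
  const queryLower = query.toLowerCase();
  const score = Math.round(Scorer.title * queryLower.length / title.length);
  const boost = titles[file] === title ? 1 : 0; // is this the main document title?
  return score + boost;
}

// Hypothetical index data: titles maps each file to its main (document) title.
const titles = { relevance: "Relevance", other: "Other Page" };

// "Relevance" is the main title of one page and a subsection title on another.
const mainTitle = scoreTitleMatch("relevance", "Relevance", "relevance", titles);
const subsection = scoreTitleMatch("relevance", "Relevance", "other", titles);
console.log(mainTitle, subsection); // 16 15

// A hypothetical competing match with a fractional score:
const otherMatch = 15.5;
console.log(subsection < otherMatch, mainTitle > otherMatch); // true true
```

Without the boost, both title matches score 15 and both trail the 15.5 match; with it, the main title moves ahead of that match as well as the subsection. Whether +1 is the right magnitude depends on how closely other score sources cluster around title scores, which is the kind of counterexample a dedicated test case could pin down.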

tests/js/searchtools.js

picnixz

It'd be good if we have a more complete example where you have a lot of multiple matches of the same kind. Does it cover the issue with the asyncio module that we described?

CHANGES.rst

picnixz · 2024-07-09T06:53:29Z

sphinx/themes/basic/static/searchtools.js


tests/js/roots/titles/relevance.py

tests/js/searchtools.js

jayaddison · 2024-07-09T10:20:34Z

> It'd be good if we have a more complete example where you have a lot of multiple matches of the same kind. Does it cover the issue with the asyncio module that we described?

Yep, the thinking here was to replicate the asyncio relevance ordering problem using a minimal test case, and then to adjust the code to fix it; attempting to apply (and demonstrate) a Test-Driven-Development approach to search ranking fixups. I'll investigate expanding the test fixture data to add more results.

wlach · 2024-07-09T11:37:26Z

> It'd be good if we have a more complete example where you have a lot of multiple matches of the same kind. Does it cover the issue with the asyncio module that we described?

This PR uses a similar approach to what was described/shown in #12393 (comment) (edit: original link was wrong) so it should.

However, it would be good to run another test before landing, to be sure. I tested there by checking out the cpython repository and rebuilding its Doc/ directory using a virtualenv with my development version of Sphinx.


tests/js/searchtools.js


AA-Turner · 2024-07-10T22:17:39Z

@jayaddison are you happy with this, i.e. is it ready to review and merge?


jayaddison · 2024-07-11T10:36:19Z

@AA-Turner yep, I think this is ready.

AA-Turner · 2024-07-11T10:55:50Z

Thanks all!


jayaddison · 2024-07-11T11:10:55Z

Thank you @AA-Turner!

wlach and others added 2 commits May 25, 2024 12:23

Boost title-matching scores if main title

fbb62cf

Related to sphinx-doc#12391. Intended as a proof of concept, not a full solution.

[search] Add fixture data for use with title relevance test cases.

96e2894

jayaddison added the html search and javascript labels Jun 19, 2024

jayaddison added 2 commits June 19, 2024 22:10

[search] tests: add test coverage for title-related relevance scoring.

a2a4b60

[search] regenerate JS fixture root titles searchindex from fresh/clean build.

cb0f6e7

jayaddison mentioned this pull request Jun 19, 2024

CI: GitHub Actions: Modification condition for JS test directories seems to be too precise. #12444

Closed

jayaddison and others added 3 commits June 23, 2024 17:47

Merge branch 'master' into issue-12391/subtitle-search-scoring-adjustment

75eaf81

Merge branch 'boost-scores-if-main-title' into issue-12391/subtitle-search-scoring-adjustment

0f0624e

Conflicts: tests/js/searchtools.js (no edit conflict; unit test failure)

[search] Refactor scoring logic adjustment:

5eaea64

* Minimize diff relative to mainline codebase to ease review/historic viewing.
* Move updated main-title score variable into a ``Scorer`` constant.

jayaddison marked this pull request as ready for review June 24, 2024 10:32

jayaddison mentioned this pull request Jun 24, 2024

[search] issues with the new HTML search algorithm #12391

Closed

jayaddison added 2 commits June 24, 2024 12:04

Add CHANGES.rst entry.

afb1685

Fixup: add missing credit to CHANGES.rst

5a5e271

jayaddison mentioned this pull request Jun 24, 2024

Boost title-matching scores if main title #12393

Closed

Merge branch 'master' into issue-12391/subtitle-search-scoring-adjustment

5c106c8

jayaddison requested a review from wlach June 24, 2024 17:44

Fixup: use intended operator precedence.

7418a71

wlach approved these changes Jun 25, 2024


sphinx/themes/basic/static/searchtools.js (outdated, resolved)

sphinx/themes/basic/static/searchtools.js (outdated, resolved)

tests/js/searchtools.js (outdated, resolved)

jayaddison and others added 9 commits June 25, 2024 11:04

Revert "Fixup: use intended operator precedence."

17367eb

This reverts commit 7418a71.

Revert "[search] Refactor scoring logic adjustment:"

96526a9

This reverts commit 5eaea64.

[search] Refactor scoring logic adjustments (second attempt/suggestion)

6c3ffa2

Fixup: adjust test search result score expectation.

fd36010

Relates-to commit 6c3ffa2.

Merge branch 'master' into issue-12391/subtitle-search-scoring-adjustment

c259b1c

Merge branch 'master' into issue-12391/subtitle-search-scoring-adjustment

7f8a5f6

Conflicts: CHANGES.rst tests/js/searchtools.js

Tests: refactor checkRanking function to use JavaScript array-unpacking.

2f0cbe1

Tests: refactor checkRanking function to use JavaScript for...of loop.

bf576cd

Tests: add early-exit path to checkRanking function.

d1a7197

jayaddison added 4 commits July 8, 2024 14:23

Tests: extract two distinct relevance-related test cases from existing single test case.

5d5b079

Tests: nitpick: reverse the results array instead of reversing the ``expectedRanking`` array.

e9bdf2f

Regenerate JS test search fixtures using ``utils/generate_js_fixtures.py``

e75891e

Relates-to merge commit 7f8a5f6.

Tests: update expectations since main titles now have empty anchor targets.

d5d8717

Relates-to merge commit 7f8a5f6 and test index regeneration commit e75891e.

wlach reviewed Jul 9, 2024


picnixz reviewed Jul 9, 2024


jayaddison and others added 3 commits July 9, 2024 15:51

Code review: apply phrasing/typo-fixup suggestions.

388aef3

Co-authored-by: Will Lachance <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]>

Value-safety: use const for scoring variables.

4d819cc

Co-authored-by: Will Lachance <[email protected]>

Tests: self-documentation: add comment to describe the purpose of the search relevance tests.

9d59eaf

jayaddison commented Jul 9, 2024


tests/js/searchtools.js (resolved)

jayaddison and others added 3 commits July 9, 2024 17:21

Code review feedback: add class-level attribute with same name as a (main) title.

7b772a2

Tests: split object-vs-title matching into two distinct rules.

4e07078

Merge branch 'master' into issue-12391/subtitle-search-scoring-adjustment

1b42e80

AA-Turner changed the title from "[HTML search] Improve relevance scoring for titles and object-name matches." to "Improve relevance scoring for titles and object-name matches in search results" Jul 11, 2024

AA-Turner merged commit 91c5cd3 into sphinx-doc:master Jul 11, 2024
23 checks passed

jayaddison deleted the issue-12391/subtitle-search-scoring-adjustment branch July 11, 2024 11:10

AA-Turner added this to the 7.4.0 milestone Jul 13, 2024

jayaddison mentioned this pull request Jul 17, 2024

[HTML search] Test suite: ranking check should ensure that all entries are found. #12607

Closed
