Add to onboarding reproduction logs #2492

Merged: 6 commits merged on May 10, 2024

KenWuqianghao (Contributor)

System setup:

OS: macOS Sonoma 14.4.1
Memory: 16GB
Chip: Apple M1 Pro
Python Version: 3.12.3
Java Version: 21.0.3
Maven: 3.9.6

Suggestion: I think linking to the "Try it" section (`#-try-it`) instead of "Getting started" (`#-getting-started`) is more suitable, as I don't see a "Getting started" section in the README.

@@ -322,8 +316,8 @@ It turns out that optimizing for MRR@10 and MAP yields the same settings.

Here's the comparison between the Anserini default and optimized parameters:

| Setting | MRR@10 | MAP | Recall@1000 |
|:------------------------------------------------|-------:|-------:|------------:|
| Setting | MRR@10 | MAP | Recall@1000 |
Member

I think you introduced inconsistencies here? Old table seems fine to me?

Contributor Author

Oops, I clicked on the file and my editor did some formatting automatically. I will fix that.

@@ -20,8 +20,8 @@ What's the problem we're trying to solve?

This is the definition I typically give:

> Given an information need expressed as a query _q_, the text retrieval task is to return a ranked list of _k_ texts {_d<sub>1</sub>_, _d<sub>2</sub>_ ... _d<sub>k</sub>_} from an arbitrarily large but finite collection
of texts _C_ = {_d<sub>i</sub>_} that maximizes a metric of interest, for example, nDCG, AP, etc.
> Given an information need expressed as a query _q_, the text retrieval task is to return a ranked list of _k_ texts {_d`<sub>`1`</sub>`_, _d`<sub>`2`</sub>`_ ... _d`<sub>`k`</sub>`_} from an arbitrarily large but finite collection
Member

nah, I think I like original better...

Contributor Author

Yeah same here, I didn't mean to change that. My editor did that automatically for some reason. Will fix.

lintool (Member) commented May 10, 2024

thanks for the de-linting. left comments.

@@ -89,7 +86,7 @@ On the other hand, retrieval needs to be fast, i.e., low latency, high throughpu

With the data prep above, we can now index the MS MARCO passage collection in `collections/msmarco-passage/collection_jsonl`.

If you haven't built Anserini already, build it now using the instructions in [anserini#-getting-started](https://github.com/castorini/anserini#-getting-started).
If you haven't built Anserini already, build it now using the instructions in [anserini#-try-it](https://github.com/castorini/anserini?tab=readme-ov-file#-try-it).
Member

I think "Installation" is the better link?

KenWuqianghao (Contributor Author)

@lintool I have made the changes accordingly. Please let me know if anything I didn't expect breaks lol

lintool merged commit 6c6d2d0 into castorini:master on May 10, 2024
1 check passed
2 participants