-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(fulltext_index): integrate full-text indexer with sst writer #4302
Conversation
Signed-off-by: Zhenchi <[email protected]>
Note Reviews pausedUse the following commands to manage reviews:
WalkthroughThe changes introduce and integrate comprehensive full-text indexing capabilities into the Mito engine. Configurations for index creation during flush, compaction, and querying, along with memory thresholds, are introduced. New public constants and methods handle the configuration schema and error-handling mechanisms. The full-text index functionalities are integrated across key components, including compaction, flushing, and caching processes, enhancing the overall indexing and search capabilities of the engine. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Configuration
participant MitoEngine
participant Indexer
participant Storage
User->>Configuration: Configure full-text index settings
Configuration->>MitoEngine: Pass settings
MitoEngine->>Indexer: Initialize with full-text index config
Indexer->>Storage: Create/Update indexes during flush or compaction
Storage-->>Indexer: Return index metadata
Indexer-->>MitoEngine: Update active index
MitoEngine-->>User: Provide search capabilities
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
@coderabbitai pause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range and nitpick comments (12)
config/standalone.example.toml (1)
460-480
: Add documentation for new configuration options.The added configuration options for the full-text index should be documented to explain their purpose and usage.
Consider adding comments to describe each option, similar to other sections in the file.
src/mito2/src/sst/index.rs (3)
15-15
: Add documentation for the new module.The
fulltext_index
module is added. Ensure that it is well-documented to explain its purpose and usage.Consider adding module-level comments to describe the functionality of the
fulltext_index
module.
49-50
: Add documentation forFulltextIndexOutput
.The
FulltextIndexOutput
struct is added. Ensure that it is well-documented to explain its purpose and usage.Consider adding comments to describe each field in the
FulltextIndexOutput
struct.
332-332
: Add test cases forFulltextIndexConfig
.The test cases now include
FulltextIndexConfig
. Ensure that all necessary test cases are added and cover all possible scenarios.Consider adding more test cases to cover edge cases and potential issues.
src/datatypes/src/schema/column_schema.rs (3)
35-36
: Add documentation forFULLTEXT_KEY
.The
FULLTEXT_KEY
constant is added. Ensure that it is well-documented to explain its purpose and usage.Consider adding comments to describe the purpose of
FULLTEXT_KEY
.
313-322
: Add documentation forFulltextOptions
.The
FulltextOptions
struct is added. Ensure that it is well-documented to explain its purpose and usage.Consider adding comments to describe each field in the
FulltextOptions
struct.
324-330
: Add documentation forFulltextAnalyzer
.The
FulltextAnalyzer
enum is added. Ensure that it is well-documented to explain its purpose and usage.Consider adding comments to describe each variant in the
FulltextAnalyzer
enum.config/config.md (5)
130-130
: Add a description for theregion_engine.mito.fulltext_index
section.The new section
region_engine.mito.fulltext_index
lacks a description. Adding a brief description will improve clarity.| `region_engine.mito.fulltext_index` | -- | -- | The options for full-text index in Mito engine. | + | `region_engine.mito.fulltext_index` | -- | -- | Configuration options for full-text indexing in the Mito engine. |
131-131
: Clarify the description forcreate_on_flush
.The description for
create_on_flush
should specify what "automatically" means in this context.| `region_engine.mito.fulltext_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically<br/>- `disable`: never | + | `region_engine.mito.fulltext_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically based on internal logic<br/>- `disable`: never |
132-132
: Clarify the description forcreate_on_compaction
.The description for
create_on_compaction
should specify what "automatically" means in this context.| `region_engine.mito.fulltext_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically<br/>- `disable`: never | + | `region_engine.mito.fulltext_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically based on internal logic<br/>- `disable`: never |
133-133
: Clarify the description forapply_on_query
.The description for
apply_on_query
should specify what "automatically" means in this context.| `region_engine.mito.fulltext_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically<br/>- `disable`: never | + | `region_engine.mito.fulltext_index.apply_on_query` | String | `auto` | Whether to apply the index on query.<br/>- `auto`: automatically based on internal logic<br/>- `disable`: never |
134-134
: Clarify the description formem_threshold_on_create
.The description for
mem_threshold_on_create
should specify the unit of memory (e.g., bytes, MB).| `region_engine.mito.fulltext_index.mem_threshold_on_create` | String | `64M` | Memory threshold for index creation. | + | `region_engine.mito.fulltext_index.mem_threshold_on_create` | String | `64M` | Memory threshold for index creation (e.g., 64MB). |
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (21)
- config/config.md (2 hunks)
- config/datanode.example.toml (1 hunks)
- config/standalone.example.toml (1 hunks)
- src/datatypes/src/schema.rs (1 hunks)
- src/datatypes/src/schema/column_schema.rs (3 hunks)
- src/mito2/src/access_layer.rs (3 hunks)
- src/mito2/src/cache/write_cache.rs (3 hunks)
- src/mito2/src/compaction/compactor.rs (3 hunks)
- src/mito2/src/config.rs (3 hunks)
- src/mito2/src/error.rs (2 hunks)
- src/mito2/src/flush.rs (2 hunks)
- src/mito2/src/sst/file.rs (1 hunks)
- src/mito2/src/sst/index.rs (14 hunks)
- src/mito2/src/sst/index/fulltext_index.rs (1 hunks)
- src/mito2/src/sst/index/fulltext_index/creator.rs (1 hunks)
- src/mito2/src/sst/index/indexer/abort.rs (2 hunks)
- src/mito2/src/sst/index/indexer/finish.rs (4 hunks)
- src/mito2/src/sst/index/indexer/update.rs (2 hunks)
- src/mito2/src/sst/index/intermediate.rs (5 hunks)
- src/mito2/src/test_util.rs (2 hunks)
- tests-integration/tests/http.rs (1 hunks)
Files skipped from review due to trivial changes (1)
- src/mito2/src/test_util.rs
Additional comments not posted (47)
src/mito2/src/sst/index/fulltext_index.rs (2)
15-15
: Module declaration looks good.The
creator
module is declared correctly.
17-17
: Constant definition looks good.The
INDEX_BLOB_TYPE
constant is defined correctly.src/mito2/src/sst/index/indexer/update.rs (2)
29-31
: Integration ofdo_update_fulltext_index
looks good.The
do_update
method correctly integrates the call todo_update_fulltext_index
.
59-82
:do_update_fulltext_index
method looks good.The method correctly updates the full-text index and handles errors appropriately.
src/mito2/src/sst/index/indexer/abort.rs (2)
23-23
: Integration ofdo_abort_fulltext_index
looks good.The
do_abort
method correctly integrates the call todo_abort_fulltext_index
.
48-67
:do_abort_fulltext_index
method looks good.The method correctly aborts the full-text index creation and handles errors appropriately.
src/mito2/src/sst/index/indexer/finish.rs (3)
40-46
: Integration ofdo_finish_fulltext_index
looks good.The
do_finish
method correctly integrates the call todo_finish_fulltext_index
.
111-146
:do_finish_fulltext_index
method looks good.The method correctly finishes the full-text index creation and handles errors appropriately.
165-180
:fill_fulltext_index_output
method looks good.The method correctly fills the output with full-text index details.
src/mito2/src/sst/index/intermediate.rs (2)
32-32
: Addition ofbase_dir
field inIntermediateManager
struct.The addition of the
base_dir
field is appropriate for managing intermediate file paths.
65-79
: Addition offulltext_path
function and corresponding test case.The addition of the
fulltext_path
function and the corresponding test case is well-implemented and ensures proper path construction for fulltext index intermediate files.Also applies to: 190-214
src/mito2/src/access_layer.rs (2)
25-25
: Addition offulltext_index_config
field and modifications towrite_sst
method inAccessLayer
struct.The addition of the
fulltext_index_config
field and the modifications to thewrite_sst
method are appropriate for handling fulltext index configurations.Also applies to: 156-156
208-208
: Addition offulltext_index_config
field inSstWriteRequest
struct.The addition of the
fulltext_index_config
field is necessary for passing fulltext index configurations to thewrite_sst
method.src/mito2/src/sst/file.rs (2)
131-132
: Addition ofFulltextIndex
variant toIndexType
enum.The addition of the
FulltextIndex
variant is necessary for representing fulltext indexes.
139-141
: Addition offulltext_index_available
method toFileMeta
struct.The addition of the
fulltext_index_available
method is necessary for checking the availability of fulltext indexes.src/mito2/src/sst/index/fulltext_index/creator.rs (4)
40-48
: Introduction ofSstIndexCreator
struct.The introduction of the
SstIndexCreator
struct and its fields is necessary for managing fulltext index creation.
50-109
: Addition ofnew
method toSstIndexCreator
struct.The addition of the
new
method is well-implemented and initializes theSstIndexCreator
with the necessary configurations and intermediate paths.
111-127
: Addition ofupdate
method toSstIndexCreator
struct.The addition of the
update
method is well-implemented and updates the fulltext index with the given batch of data.
129-209
: Addition offinish
,abort
,memory_usage
,column_ids
,is_empty
, and other helper methods toSstIndexCreator
struct.The addition of the
finish
,abort
,memory_usage
,column_ids
,is_empty
, and other helper methods is well-implemented and necessary for managing the lifecycle and operations of the fulltext index creation process.src/datatypes/src/schema.rs (1)
28-30
: Exporting new entities for full-text indexing.The additions of
FulltextAnalyzer
,FulltextOptions
, andFULLTEXT_KEY
are appropriate given the context of integrating full-text indexing. Ensure that these entities are properly defined in thecolumn_schema
module.src/mito2/src/cache/write_cache.rs (3)
133-133
: Integrate full-text index configuration in IndexerBuilder.The addition of
fulltext_index_config
to theIndexerBuilder
is properly done. Ensure that thefulltext_index_config
is correctly populated in thewrite_request
and utilized in the indexing process.
311-311
: Integrate full-text index configuration in write request.The
fulltext_index_config
addition to thewrite_request
in the test is appropriate. Ensure that tests cover the full-text indexing scenarios once the search functionality is available.
396-396
: Integrate full-text index configuration in another write request.The
fulltext_index_config
addition to thewrite_request
in another test is appropriate. Ensure that tests cover the full-text indexing scenarios once the search functionality is available.config/datanode.example.toml (1)
437-457
: Add configuration options for full-text indexing.The new configuration section for full-text indexing is well-structured and covers necessary options like
create_on_flush
,create_on_compaction
,apply_on_query
, andmem_threshold_on_create
. Ensure that these configurations are documented and properly utilized in the codebase.src/mito2/src/config.rs (4)
112-113
: Integrate full-text index configuration in MitoConfig.The addition of
fulltext_index
toMitoConfig
is appropriate. Ensure that the new configuration is correctly utilized throughout the codebase.
144-144
: Add default value for full-text index configuration.The default value for
fulltext_index
inMitoConfig
is correctly set. Ensure that the default values align with the intended behavior for full-text indexing.
388-404
: Define FulltextIndexConfig struct.The
FulltextIndexConfig
struct is well-defined and covers necessary options. Ensure that these configurations are used correctly in the full-text indexing process.
405-415
: Implement default for FulltextIndexConfig.The default implementation for
FulltextIndexConfig
is appropriate. Ensure that the default values are suitable for typical use cases.src/mito2/src/compaction/compactor.rs (3)
278-278
: Ensure full-text index configuration is correctly cloned.The full-text index configuration is cloned from the engine configuration. Ensure that the cloning operation is correct and the configuration is properly initialized.
303-303
: Verify full-text index configuration inSstWriteRequest
.The
SstWriteRequest
now includesfulltext_index_config
. Ensure that all necessary configurations are correctly passed and used in the request.
319-321
: Check for potential issues in index availability check.The code checks if the full-text index is available. Ensure that the logic correctly identifies the availability of the full-text index.
src/mito2/src/sst/index.rs (9)
33-33
: Ensure correct usage ofFulltextIndexConfig
.The
FulltextIndexConfig
is used in the file. Ensure that it is correctly initialized and used throughout the code.
38-38
: Check for potential issues inFulltextIndexer
usage.The
FulltextIndexer
is used in the code. Ensure that it is correctly initialized and used in all necessary places.
64-73
: Ensure consistent implementation ofFulltextIndexOutput
.The
FulltextIndexOutput
struct is implemented. Ensure that its implementation is consistent with other similar structs in the codebase.
81-85
: Check for potential issues inis_available
method.The
is_available
method is added to theFulltextIndexOutput
struct. Ensure that the logic correctly identifies the availability of the full-text index.
95-95
: Ensure correct initialization offulltext_indexer
.The
fulltext_indexer
field is added to theIndexer
struct. Ensure that it is correctly initialized and used in all necessary places.
130-132
: Check for potential issues in memory usage calculation.The memory usage calculation now includes
fulltext_indexer
. Ensure that the logic correctly calculates the memory usage.
146-146
: Ensure correct usage offulltext_index_config
inIndexerBuilder
.The
fulltext_index_config
is used in theIndexerBuilder
. Ensure that it is correctly initialized and used throughout the code.
161-162
: Check for potential issues inbuild_fulltext_indexer
method.The
build_fulltext_indexer
method is added to theIndexerBuilder
. Ensure that the logic correctly builds the full-text indexer.
239-295
: Ensure correct implementation ofbuild_fulltext_indexer
method.The
build_fulltext_indexer
method is implemented. Ensure that its implementation is correct and consistent with other similar methods in the codebase.src/datatypes/src/schema/column_schema.rs (1)
246-256
: Ensure correct implementation offulltext_options
method.The
fulltext_options
method is added to theColumnSchema
struct. Ensure that its implementation is correct and consistent with other similar methods in the codebase.src/mito2/src/error.rs (3)
787-793
: New error variant for full-text options retrieval.The
FulltextOptions
error variant is correctly defined to handle errors related to retrieving full-text options from column metadata.
795-823
: New error variants for full-text index creation and management.The error variants
CreateFulltextCreator
,CastVector
,FulltextPushText
, andFulltextFinish
are correctly defined to handle errors related to full-text index creation and management.
946-949
: Updated status codes for new error variants.The
status_code
method has been correctly updated to handle the new error variants related to full-text indexing.src/mito2/src/flush.rs (2)
335-335
: Addedfulltext_index_config
toSstWriteRequest
.The
fulltext_index_config
field has been correctly added to theSstWriteRequest
to handle full-text indexing configurations.
358-360
: Addedfulltext_index
availability check toFileMeta
.The
fulltext_index
availability check has been correctly added to theavailable_indexes
field inFileMeta
to track the availability of full-text indexes.tests-integration/tests/http.rs (1)
843-849
: Verify the correctness of the added full-text index configuration.The added configuration section for full-text indexing should be verified for correctness and consistency with the existing configuration structure.
Actions performedReviews paused. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4302 +/- ##
==========================================
- Coverage 85.29% 84.88% -0.41%
==========================================
Files 1066 1060 -6
Lines 188636 187534 -1102
==========================================
- Hits 160889 159188 -1701
- Misses 27747 28346 +599 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Zhenchi <[email protected]>
Signed-off-by: Zhenchi <[email protected]>
Signed-off-by: Zhenchi <[email protected]>
Signed-off-by: Zhenchi <[email protected]>
Signed-off-by: Zhenchi <[email protected]>
Signed-off-by: Zhenchi <[email protected]>
1dd999c
to
cd50c62
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Zhenchi <[email protected]>
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
#4246
What's changed and what's your intention?
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Chores