feat(fulltext_index): integrate full-text indexer with sst writer #4302

Merged: 8 commits into GreptimeTeam:main from zhongzc/fulltext-build-4, Jul 7, 2024

Conversation

zhongzc (Contributor) commented Jul 5, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#4246

What's changed and what's your intention?

  • Add config for full-text index
  • Integrate full-text index with sst writer
  • TODO: add tests once search is available

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

Summary by CodeRabbit

  • New Features

    • Introduced new configuration options for full-text indexing in Mito engine, including settings for index creation, query application, and memory thresholds.
  • Bug Fixes

    • Enhanced error handling capabilities for full-text index operations.
  • Documentation

    • Updated configuration files with new full-text indexing options.
  • Tests

    • Added new test cases to validate full-text indexing configurations and operations.
  • Chores

    • Refined internal data structures and methods to support full-text indexing functionality.

@zhongzc zhongzc requested review from evenyag, v0y4g3r, waynexia and a team as code owners July 5, 2024 09:57
coderabbitai bot (Contributor) commented Jul 5, 2024

Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

The changes introduce and integrate comprehensive full-text indexing capabilities into the Mito engine. Configurations for index creation during flush, compaction, and querying, along with memory thresholds, are introduced. New public constants and methods handle the configuration schema and error-handling mechanisms. The full-text index functionalities are integrated across key components, including compaction, flushing, and caching processes, enhancing the overall indexing and search capabilities of the engine.

Changes

Files | Change Summary
config/config.md, config/datanode.example.toml, config/standalone.example.toml | Introduced configuration options for full-text indexing, including settings for index creation on flush, compaction, and query, and memory threshold for index creation.
src/datatypes/src/schema.rs, src/datatypes/src/schema/column_schema.rs | Added FulltextAnalyzer and FulltextOptions, constants for full-text keys, and methods in ColumnSchema for retrieving full-text options.
src/mito2/src/access_layer.rs, src/mito2/src/cache/write_cache.rs | Incorporated fulltext_index_config into the AccessLayer and WriteCache structs.
src/mito2/src/compaction/compactor.rs, src/mito2/src/flush.rs | Added handling of fulltext_index_config in compaction and flush tasks, integrating full-text indexing into these processes.
src/mito2/src/config.rs, src/mito2/src/error.rs | Added FulltextIndexConfig struct with configurations and new error variants for full-text indexing.
src/mito2/src/sst/file.rs, src/mito2/src/sst/index.rs, src/mito2/src/sst/index/fulltext_index.rs | Included FulltextIndex in FileMeta, IndexType, and added methods for full-text index availability and creation.
src/mito2/src/sst/index/fulltext_index/creator.rs | Introduced SstIndexCreator and SingleCreator structs for creating and managing full-text indexes, with methods for updating, finishing, and aborting index creation.
src/mito2/src/sst/index/indexer/abort.rs, src/mito2/src/sst/index/indexer/finish.rs, src/mito2/src/sst/index/indexer/update.rs | Added functions for aborting, finishing, and updating full-text indexes within the Indexer implementation.
src/mito2/src/sst/index/intermediate.rs, src/mito2/src/test_util.rs | Added base_dir to IntermediateManager and methods for generating fulltext index paths. Adjusted test utilities for new full-text configurations.
tests-integration/tests/http.rs | Added configuration section [region_engine.mito.fulltext_index] with settings for full-text indexing in integration tests.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Configuration
    participant MitoEngine
    participant Indexer
    participant Storage

    User->>Configuration: Configure full-text index settings
    Configuration->>MitoEngine: Pass settings
    MitoEngine->>Indexer: Initialize with full-text index config
    Indexer->>Storage: Create/Update indexes during flush or compaction
    Storage-->>Indexer: Return index metadata
    Indexer-->>MitoEngine: Update active index
    MitoEngine-->>User: Provide search capabilities

Poem

In the engine, new indexes bloom,
Full-text now pierces the query's gloom.
On flush and compaction, they take their place,
Enhancing search with elegant grace.
Memory thresholds guide the way,
For faster finds, without delay.
Here's to data, swift and bright,
🐰 A joyful hop to indexing's new height! 📚✨


@github-actions github-actions bot added the docs-not-required label (This change does not impact docs.) Jul 5, 2024
@zhongzc zhongzc requested a review from fengjiachun July 5, 2024 09:58
zhongzc (Contributor, Author) commented Jul 5, 2024

@coderabbitai pause

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Outside diff range and nitpick comments (12)
config/standalone.example.toml (1)

460-480: Add documentation for new configuration options.

The added configuration options for the full-text index should be documented to explain their purpose and usage.

Consider adding comments to describe each option, similar to other sections in the file.

src/mito2/src/sst/index.rs (3)

15-15: Add documentation for the new module.

The fulltext_index module is added. Ensure that it is well-documented to explain its purpose and usage.

Consider adding module-level comments to describe the functionality of the fulltext_index module.


49-50: Add documentation for FulltextIndexOutput.

The FulltextIndexOutput struct is added. Ensure that it is well-documented to explain its purpose and usage.

Consider adding comments to describe each field in the FulltextIndexOutput struct.


332-332: Add test cases for FulltextIndexConfig.

The test cases now include FulltextIndexConfig. Ensure that all necessary test cases are added and cover all possible scenarios.

Consider adding more test cases to cover edge cases and potential issues.

src/datatypes/src/schema/column_schema.rs (3)

35-36: Add documentation for FULLTEXT_KEY.

The FULLTEXT_KEY constant is added. Ensure that it is well-documented to explain its purpose and usage.

Consider adding comments to describe the purpose of FULLTEXT_KEY.


313-322: Add documentation for FulltextOptions.

The FulltextOptions struct is added. Ensure that it is well-documented to explain its purpose and usage.

Consider adding comments to describe each field in the FulltextOptions struct.


324-330: Add documentation for FulltextAnalyzer.

The FulltextAnalyzer enum is added. Ensure that it is well-documented to explain its purpose and usage.

Consider adding comments to describe each variant in the FulltextAnalyzer enum.

config/config.md (5)

130-130: Add a description for the region_engine.mito.fulltext_index section.

The new section region_engine.mito.fulltext_index lacks a description. Adding a brief description will improve clarity.

- | `region_engine.mito.fulltext_index` | -- | -- | The options for full-text index in Mito engine. |
+ | `region_engine.mito.fulltext_index` | -- | -- | Configuration options for full-text indexing in the Mito engine. |

131-131: Clarify the description for create_on_flush.

The description for create_on_flush should specify what "automatically" means in this context.

- | `region_engine.mito.fulltext_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically<br/>- `disable`: never |
+ | `region_engine.mito.fulltext_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically based on internal logic<br/>- `disable`: never |

132-132: Clarify the description for create_on_compaction.

The description for create_on_compaction should specify what "automatically" means in this context.

- | `region_engine.mito.fulltext_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically<br/>- `disable`: never |
+ | `region_engine.mito.fulltext_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically based on internal logic<br/>- `disable`: never |

133-133: Clarify the description for apply_on_query.

The description for apply_on_query should specify what "automatically" means in this context.

- | `region_engine.mito.fulltext_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically<br/>- `disable`: never |
+ | `region_engine.mito.fulltext_index.apply_on_query` | String | `auto` | Whether to apply the index on query.<br/>- `auto`: automatically based on internal logic<br/>- `disable`: never |

134-134: Clarify the description for mem_threshold_on_create.

The description for mem_threshold_on_create should specify the unit of memory (e.g., bytes, MB).

- | `region_engine.mito.fulltext_index.mem_threshold_on_create` | String | `64M` | Memory threshold for index creation. |
+ | `region_engine.mito.fulltext_index.mem_threshold_on_create` | String | `64M` | Memory threshold for index creation (e.g., 64MB). |
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between b1219fa and 58eaeb5.

Files selected for processing (21)
  • config/config.md (2 hunks)
  • config/datanode.example.toml (1 hunks)
  • config/standalone.example.toml (1 hunks)
  • src/datatypes/src/schema.rs (1 hunks)
  • src/datatypes/src/schema/column_schema.rs (3 hunks)
  • src/mito2/src/access_layer.rs (3 hunks)
  • src/mito2/src/cache/write_cache.rs (3 hunks)
  • src/mito2/src/compaction/compactor.rs (3 hunks)
  • src/mito2/src/config.rs (3 hunks)
  • src/mito2/src/error.rs (2 hunks)
  • src/mito2/src/flush.rs (2 hunks)
  • src/mito2/src/sst/file.rs (1 hunks)
  • src/mito2/src/sst/index.rs (14 hunks)
  • src/mito2/src/sst/index/fulltext_index.rs (1 hunks)
  • src/mito2/src/sst/index/fulltext_index/creator.rs (1 hunks)
  • src/mito2/src/sst/index/indexer/abort.rs (2 hunks)
  • src/mito2/src/sst/index/indexer/finish.rs (4 hunks)
  • src/mito2/src/sst/index/indexer/update.rs (2 hunks)
  • src/mito2/src/sst/index/intermediate.rs (5 hunks)
  • src/mito2/src/test_util.rs (2 hunks)
  • tests-integration/tests/http.rs (1 hunks)
Files skipped from review due to trivial changes (1)
  • src/mito2/src/test_util.rs
Additional comments not posted (47)
src/mito2/src/sst/index/fulltext_index.rs (2)

15-15: Module declaration looks good.

The creator module is declared correctly.


17-17: Constant definition looks good.

The INDEX_BLOB_TYPE constant is defined correctly.

src/mito2/src/sst/index/indexer/update.rs (2)

29-31: Integration of do_update_fulltext_index looks good.

The do_update method correctly integrates the call to do_update_fulltext_index.


59-82: do_update_fulltext_index method looks good.

The method correctly updates the full-text index and handles errors appropriately.

src/mito2/src/sst/index/indexer/abort.rs (2)

23-23: Integration of do_abort_fulltext_index looks good.

The do_abort method correctly integrates the call to do_abort_fulltext_index.


48-67: do_abort_fulltext_index method looks good.

The method correctly aborts the full-text index creation and handles errors appropriately.

src/mito2/src/sst/index/indexer/finish.rs (3)

40-46: Integration of do_finish_fulltext_index looks good.

The do_finish method correctly integrates the call to do_finish_fulltext_index.


111-146: do_finish_fulltext_index method looks good.

The method correctly finishes the full-text index creation and handles errors appropriately.


165-180: fill_fulltext_index_output method looks good.

The method correctly fills the output with full-text index details.

src/mito2/src/sst/index/intermediate.rs (2)

32-32: Addition of base_dir field in IntermediateManager struct.

The addition of the base_dir field is appropriate for managing intermediate file paths.


65-79: Addition of fulltext_path function and corresponding test case.

The addition of the fulltext_path function and the corresponding test case is well-implemented and ensures proper path construction for fulltext index intermediate files.

Also applies to: 190-214
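
To make the path-construction comment above concrete, here is a minimal sketch of how a manager holding a base_dir might derive a per-column intermediate path. The directory layout, the parameter types, and the `fulltext-<column_id>` file name are assumptions chosen for illustration; the PR's actual `fulltext_path` signature and layout are not shown in this conversation.

```rust
use std::path::PathBuf;

/// Simplified stand-in for the PR's `IntermediateManager`.
pub struct IntermediateManager {
    /// Root directory under which all intermediate index files live.
    base_dir: PathBuf,
}

impl IntermediateManager {
    pub fn new(base_dir: impl Into<PathBuf>) -> Self {
        Self { base_dir: base_dir.into() }
    }

    /// Hypothetical layout: `<base_dir>/<region_id>/<sst_file_id>/fulltext-<column_id>`.
    pub fn fulltext_path(&self, region_id: u64, sst_file_id: &str, column_id: u32) -> PathBuf {
        self.base_dir
            .join(region_id.to_string())
            .join(sst_file_id)
            .join(format!("fulltext-{column_id}"))
    }
}
```

Keeping base_dir on the manager means callers only supply identifiers, so every intermediate file for a region stays under one root that can be cleaned up together.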

src/mito2/src/access_layer.rs (2)

25-25: Addition of fulltext_index_config field and modifications to write_sst method in AccessLayer struct.

The addition of the fulltext_index_config field and the modifications to the write_sst method are appropriate for handling fulltext index configurations.

Also applies to: 156-156


208-208: Addition of fulltext_index_config field in SstWriteRequest struct.

The addition of the fulltext_index_config field is necessary for passing fulltext index configurations to the write_sst method.

src/mito2/src/sst/file.rs (2)

131-132: Addition of FulltextIndex variant to IndexType enum.

The addition of the FulltextIndex variant is necessary for representing fulltext indexes.


139-141: Addition of fulltext_index_available method to FileMeta struct.

The addition of the fulltext_index_available method is necessary for checking the availability of fulltext indexes.
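
As a rough illustration of the availability check described above, the sketch below models an index-type enum and a metadata struct that records which indexes were built for a file. The field name `available_indexes` appears later in this review; using a plain `Vec` and the exact method bodies are assumptions, and the real `FileMeta` carries many more fields.

```rust
/// Index kinds an SST file can carry (simplified stand-in for the PR's `IndexType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexType {
    InvertedIndex,
    FulltextIndex,
}

/// Trimmed-down file metadata for illustration only.
#[derive(Debug, Default)]
pub struct FileMeta {
    /// Indexes that were successfully built for this file.
    /// Assumption: a plain `Vec` is used here for brevity.
    pub available_indexes: Vec<IndexType>,
}

impl FileMeta {
    /// Returns true if an inverted index was built for this file.
    pub fn inverted_index_available(&self) -> bool {
        self.available_indexes.contains(&IndexType::InvertedIndex)
    }

    /// Returns true if a full-text index was built for this file.
    pub fn fulltext_index_available(&self) -> bool {
        self.available_indexes.contains(&IndexType::FulltextIndex)
    }
}
```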

src/mito2/src/sst/index/fulltext_index/creator.rs (4)

40-48: Introduction of SstIndexCreator struct.

The introduction of the SstIndexCreator struct and its fields is necessary for managing fulltext index creation.


50-109: Addition of the `new` constructor to SstIndexCreator struct.

The `new` constructor is well-implemented and initializes the SstIndexCreator with the necessary configurations and intermediate paths.


111-127: Addition of update method to SstIndexCreator struct.

The addition of the update method is well-implemented and updates the fulltext index with the given batch of data.


129-209: Addition of finish, abort, memory_usage, column_ids, is_empty, and other helper methods to SstIndexCreator struct.

The addition of the finish, abort, memory_usage, column_ids, is_empty, and other helper methods is well-implemented and necessary for managing the lifecycle and operations of the fulltext index creation process.
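
The update, finish, and abort lifecycle reviewed above can be sketched with a toy creator that only buffers text in memory. Everything below is a simplified assumption: the real SstIndexCreator works on batches and writes index blobs, and its signatures, error types, and the internals of SingleCreator are not shown in this conversation.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-column creator that just buffers text.
#[derive(Debug, Default)]
struct SingleCreator {
    texts: Vec<String>,
    bytes: usize,
}

/// Toy stand-in for the PR's `SstIndexCreator`: one creator per indexed column.
#[derive(Debug, Default)]
pub struct SstIndexCreator {
    creators: BTreeMap<u32, SingleCreator>,
    aborted: bool,
}

impl SstIndexCreator {
    pub fn new(column_ids: &[u32]) -> Self {
        let creators: BTreeMap<u32, SingleCreator> = column_ids
            .iter()
            .map(|id| (*id, SingleCreator::default()))
            .collect();
        Self { creators, aborted: false }
    }

    /// Feeds one text value for a column; columns without a creator are ignored.
    pub fn update(&mut self, column_id: u32, text: &str) -> Result<(), String> {
        if self.aborted {
            return Err("creator already aborted".to_string());
        }
        if let Some(creator) = self.creators.get_mut(&column_id) {
            creator.bytes += text.len();
            creator.texts.push(text.to_string());
        }
        Ok(())
    }

    /// Bytes of text currently buffered across all columns.
    pub fn memory_usage(&self) -> usize {
        self.creators.values().map(|c| c.bytes).sum()
    }

    pub fn column_ids(&self) -> Vec<u32> {
        self.creators.keys().copied().collect()
    }

    pub fn is_empty(&self) -> bool {
        self.creators.is_empty()
    }

    /// Drains all creators and reports how many values each column received.
    pub fn finish(&mut self) -> Result<BTreeMap<u32, usize>, String> {
        if self.aborted {
            return Err("creator already aborted".to_string());
        }
        let done = std::mem::take(&mut self.creators);
        Ok(done.into_iter().map(|(id, c)| (id, c.texts.len())).collect())
    }

    /// Drops all buffered state; later updates are rejected.
    pub fn abort(&mut self) {
        self.creators.clear();
        self.aborted = true;
    }
}
```

In this sketch both finish and abort leave the creator empty, so a caller can check is_empty afterwards regardless of which path was taken.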

src/datatypes/src/schema.rs (1)

28-30: Exporting new entities for full-text indexing.

The additions of FulltextAnalyzer, FulltextOptions, and FULLTEXT_KEY are appropriate given the context of integrating full-text indexing. Ensure that these entities are properly defined in the column_schema module.

src/mito2/src/cache/write_cache.rs (3)

133-133: Integrate full-text index configuration in IndexerBuilder.

The addition of fulltext_index_config to the IndexerBuilder is properly done. Ensure that the fulltext_index_config is correctly populated in the write_request and utilized in the indexing process.


311-311: Integrate full-text index configuration in write request.

The fulltext_index_config addition to the write_request in the test is appropriate. Ensure that tests cover the full-text indexing scenarios once the search functionality is available.


396-396: Integrate full-text index configuration in another write request.

The fulltext_index_config addition to the write_request in another test is appropriate. Ensure that tests cover the full-text indexing scenarios once the search functionality is available.

config/datanode.example.toml (1)

437-457: Add configuration options for full-text indexing.

The new configuration section for full-text indexing is well-structured and covers necessary options like create_on_flush, create_on_compaction, apply_on_query, and mem_threshold_on_create. Ensure that these configurations are documented and properly utilized in the codebase.

src/mito2/src/config.rs (4)

112-113: Integrate full-text index configuration in MitoConfig.

The addition of fulltext_index to MitoConfig is appropriate. Ensure that the new configuration is correctly utilized throughout the codebase.


144-144: Add default value for full-text index configuration.

The default value for fulltext_index in MitoConfig is correctly set. Ensure that the default values align with the intended behavior for full-text indexing.


388-404: Define FulltextIndexConfig struct.

The FulltextIndexConfig struct is well-defined and covers necessary options. Ensure that these configurations are used correctly in the full-text indexing process.


405-415: Implement default for FulltextIndexConfig.

The default implementation for FulltextIndexConfig is appropriate. Ensure that the default values are suitable for typical use cases.
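
For readers without the diff open, the following is a minimal sketch of what a config struct with these four options and their documented defaults (`auto`, `auto`, `auto`, `64M`) could look like. The `Mode` enum, its `parse` helper, and storing the threshold as a byte count are assumptions; the sketch also assumes `64M` means 64 MiB, which the conversation does not state.

```rust
/// Whether an index operation runs automatically or is turned off.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Mode {
    #[default]
    Auto,
    Disable,
}

impl Mode {
    /// Parses the string form used in the config files ("auto" or "disable").
    pub fn parse(s: &str) -> Option<Mode> {
        match s {
            "auto" => Some(Mode::Auto),
            "disable" => Some(Mode::Disable),
            _ => None,
        }
    }
}

/// Simplified stand-in for the PR's `FulltextIndexConfig`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FulltextIndexConfig {
    pub create_on_flush: Mode,
    pub create_on_compaction: Mode,
    pub apply_on_query: Mode,
    /// Memory threshold for index creation, in bytes (assumption: `64M` = 64 MiB).
    pub mem_threshold_on_create: u64,
}

impl Default for FulltextIndexConfig {
    fn default() -> Self {
        Self {
            create_on_flush: Mode::Auto,
            create_on_compaction: Mode::Auto,
            apply_on_query: Mode::Auto,
            mem_threshold_on_create: 64 * 1024 * 1024,
        }
    }
}
```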

src/mito2/src/compaction/compactor.rs (3)

278-278: Ensure full-text index configuration is correctly cloned.

The full-text index configuration is cloned from the engine configuration. Ensure that the cloning operation is correct and the configuration is properly initialized.


303-303: Verify full-text index configuration in SstWriteRequest.

The SstWriteRequest now includes fulltext_index_config. Ensure that all necessary configurations are correctly passed and used in the request.


319-321: Check for potential issues in index availability check.

The code checks if the full-text index is available. Ensure that the logic correctly identifies the availability of the full-text index.

src/mito2/src/sst/index.rs (9)

33-33: Ensure correct usage of FulltextIndexConfig.

The FulltextIndexConfig is used in the file. Ensure that it is correctly initialized and used throughout the code.


38-38: Check for potential issues in FulltextIndexer usage.

The FulltextIndexer is used in the code. Ensure that it is correctly initialized and used in all necessary places.


64-73: Ensure consistent implementation of FulltextIndexOutput.

The FulltextIndexOutput struct is implemented. Ensure that its implementation is consistent with other similar structs in the codebase.


81-85: Check for potential issues in is_available method.

The is_available method is added to the FulltextIndexOutput struct. Ensure that the logic correctly identifies the availability of the full-text index.


95-95: Ensure correct initialization of fulltext_indexer.

The fulltext_indexer field is added to the Indexer struct. Ensure that it is correctly initialized and used in all necessary places.


130-132: Check for potential issues in memory usage calculation.

The memory usage calculation now includes fulltext_indexer. Ensure that the logic correctly calculates the memory usage.
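
One way to picture the calculation being reviewed: each optional sub-indexer contributes its own usage, and a missing indexer contributes zero. The types below are hypothetical placeholders, not the PR's Indexer; only the field name fulltext_indexer comes from this review.

```rust
/// Hypothetical sub-indexer that reports a fixed amount of memory.
pub struct FixedUsage(pub usize);

impl FixedUsage {
    pub fn memory_usage(&self) -> usize {
        self.0
    }
}

/// Simplified indexer holding optional sub-indexers.
pub struct Indexer {
    pub inverted_indexer: Option<FixedUsage>,
    pub fulltext_indexer: Option<FixedUsage>,
}

impl Indexer {
    /// Sums the usage of every sub-indexer that is present.
    pub fn memory_usage(&self) -> usize {
        let inverted = self.inverted_indexer.as_ref().map_or(0, |i| i.memory_usage());
        let fulltext = self.fulltext_indexer.as_ref().map_or(0, |f| f.memory_usage());
        inverted + fulltext
    }
}
```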


146-146: Ensure correct usage of fulltext_index_config in IndexerBuilder.

The fulltext_index_config is used in the IndexerBuilder. Ensure that it is correctly initialized and used throughout the code.


161-162: Check for potential issues in build_fulltext_indexer method.

The build_fulltext_indexer method is added to the IndexerBuilder. Ensure that the logic correctly builds the full-text indexer.


239-295: Ensure correct implementation of build_fulltext_indexer method.

The build_fulltext_indexer method is implemented. Ensure that its implementation is correct and consistent with other similar methods in the codebase.

src/datatypes/src/schema/column_schema.rs (1)

246-256: Ensure correct implementation of fulltext_options method.

The fulltext_options method is added to the ColumnSchema struct. Ensure that its implementation is correct and consistent with other similar methods in the codebase.

src/mito2/src/error.rs (3)

787-793: New error variant for full-text options retrieval.

The FulltextOptions error variant is correctly defined to handle errors related to retrieving full-text options from column metadata.


795-823: New error variants for full-text index creation and management.

The error variants CreateFulltextCreator, CastVector, FulltextPushText, and FulltextFinish are correctly defined to handle errors related to full-text index creation and management.


946-949: Updated status codes for new error variants.

The status_code method has been correctly updated to handle the new error variants related to full-text indexing.

src/mito2/src/flush.rs (2)

335-335: Added fulltext_index_config to SstWriteRequest.

The fulltext_index_config field has been correctly added to the SstWriteRequest to handle full-text indexing configurations.


358-360: Added fulltext_index availability check to FileMeta.

The fulltext_index availability check has been correctly added to the available_indexes field in FileMeta to track the availability of full-text indexes.

tests-integration/tests/http.rs (1)

843-849: Verify the correctness of the added full-text index configuration.

The added configuration section for full-text indexing should be verified for correctness and consistency with the existing configuration structure.

coderabbitai bot (Contributor) commented Jul 5, 2024

Actions performed

Reviews paused.

codecov bot commented Jul 5, 2024

Codecov Report

Attention: Patch coverage is 61.34615% with 201 lines in your changes missing coverage. Please review.

Project coverage is 84.88%. Comparing base (f71b7b9) to head (5189593).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4302      +/-   ##
==========================================
- Coverage   85.29%   84.88%   -0.41%     
==========================================
  Files        1066     1060       -6     
  Lines      188636   187534    -1102     
==========================================
- Hits       160889   159188    -1701     
- Misses      27747    28346     +599     

fengjiachun (Collaborator) left a comment

LGTM

Resolved review threads:

  • config/datanode.example.toml
  • config/standalone.example.toml
  • src/mito2/src/sst/index/indexer/finish.rs (outdated)
  • src/mito2/src/sst/index.rs

killme2008 (Contributor) left a comment

LGTM

@zhongzc zhongzc enabled auto-merge July 7, 2024 03:47
@zhongzc zhongzc added this pull request to the merge queue Jul 7, 2024
Merged via the queue into GreptimeTeam:main with commit a710676 Jul 7, 2024
60 checks passed
@zhongzc zhongzc deleted the zhongzc/fulltext-build-4 branch July 7, 2024 04:34
Labels
docs-not-required: This change does not impact docs.

3 participants