Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: register & deregister region failure detectors actively #4223

Merged
merged 17 commits into from
Jul 1, 2024

Conversation

WenyXu
Copy link
Member

@WenyXu WenyXu commented Jun 27, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#4161

What's changed and what's your intention?

  • Register failure detector during create table procedure. The datanode may crash without sending a heartbeat that contains information about newly created regions, which may prevent the RegionSupervisor from detecting failures in these newly created regions.
  • Register failure detector during rollback region migration procedure. The original failure detector was removed once the procedure was triggered. Therefore, we need to register the failure detector for the failed region during rollback.
  • Deregister failure detectors during drop table procedure. Once the regions were dropped, subsequent heartbeats no longer include these regions. Therefore, we need to remove the failure detectors for these dropped regions.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

Summary by CodeRabbit

  • New Features

    • Introduced region failure detection for enhanced table creation and region management processes.
  • Improvements

    • Updated handling of region failure detectors across various components.
    • Added initialization and registration of failure detectors in multiple workflows.
  • Configuration

    • Adjusted acceptable_heartbeat_pause value from "3000ms" to "10000ms" for better failure detection timing.
  • Tests

    • Updated test setups to include failure detector initialization and validated event handling.
    • Modified random generation parameters in failover tests for optimized data handling.

@WenyXu WenyXu requested review from MichaelScofield and a team as code owners June 27, 2024 10:59
Copy link
Contributor

coderabbitai bot commented Jun 27, 2024

Walkthrough

The changes introduce a RegionFailureDetectorController to manage region failure detection across various parts of the system. Modifications include adding and initializing the controller, updating procedures to use the controller, and adjusting configurations and tests to incorporate the new failure detection mechanism.

Changes

Files Change Summaries
.../standalone.rs, .../ddl.rs, .../ddl/create_table.rs Added NoopRegionFailureDetectorControl and updates to initialization, import adjustments, and inclusion in various procedures related to region failure detection.
.../drop_database.rs, .../drop_table.rs Added ClusterId entity in several structs, modified procedure implementations to utilize ClusterId, and updated drop actions to include failure detector handling.
.../executor.rs, .../cursor.rs, .../metadata.rs Added cluster_id to context structs and procedure handling, modified method signatures and initialization to include this new field.
.../failure_handler.rs, .../failure_detector.rs, Added heartbeat_acceptor parameter to method signatures, updated methods to handle new failure detection logic, refactored detection methods with DetectingRegion.
config/metasrv.example.toml, config/config.md Updated acceptable_heartbeat_pause value to 10000ms.
tests-fuzz/.../fuzz_failover_mito_regions.rs Modified random generation parameters for rows, inserts, and parallelism in the fuzzing test to reduce the range and increase test efficiency.
.../rollback_downgraded_region.rs, .../supervisor.rs Introduced RegionFailureDetectorControl for managing failure detectors, added event handling for registering/deregistering detectors, and enhanced related test cases.

In meadows wide, regions bloom anew,

With failure detectors, robust and true.

Procedures align, clusters refined,

Heartbeats steady, no regions left behind.

Code sings in brevity, a harmonious cue,

As systems evolve, their strengths accrue.


Tip

Early access features: disabled

We are currently testing the following features in early access:

  • OpenAI gpt-4o model for code reviews and chat: OpenAI claims that this model is better at understanding and generating code than the previous models. We seek your feedback over the next few weeks before making it generally available.

Note:

  • You can enable or disable early access features from the CodeRabbit UI or by updating the CodeRabbit configuration file.
  • Please join our Discord Community to provide feedback and report issues.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Jun 27, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Outside diff range and nitpick comments (1)
src/meta-srv/src/procedure/region_migration.rs (1)

Line range hint 158-203: Assess the impact of the failure detector on region migration logic.

The integration of the region_failure_detector_controller into the region migration logic is a significant change. It is essential to conduct thorough testing, particularly integration tests, to ensure that the failure detection mechanisms do not adversely affect the migration processes.

Would you like help in setting up the integration tests or reviewing the existing test scenarios to ensure comprehensive coverage?

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 10b7a3d and 45a2ac8.

Files selected for processing (22)
  • src/cmd/src/standalone.rs (2 hunks)
  • src/common/meta/src/ddl.rs (4 hunks)
  • src/common/meta/src/ddl/create_table.rs (1 hunks)
  • src/common/meta/src/ddl/drop_database.rs (4 hunks)
  • src/common/meta/src/ddl/drop_database/cursor.rs (3 hunks)
  • src/common/meta/src/ddl/drop_database/executor.rs (7 hunks)
  • src/common/meta/src/ddl/drop_database/metadata.rs (2 hunks)
  • src/common/meta/src/ddl/drop_database/start.rs (3 hunks)
  • src/common/meta/src/ddl/drop_table.rs (1 hunks)
  • src/common/meta/src/ddl/drop_table/executor.rs (7 hunks)
  • src/common/meta/src/ddl_manager.rs (2 hunks)
  • src/common/meta/src/test_util.rs (2 hunks)
  • src/meta-srv/src/failure_detector.rs (4 hunks)
  • src/meta-srv/src/handler/failure_handler.rs (2 hunks)
  • src/meta-srv/src/metasrv/builder.rs (4 hunks)
  • src/meta-srv/src/procedure/region_migration.rs (4 hunks)
  • src/meta-srv/src/procedure/region_migration/test_util.rs (2 hunks)
  • src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs (5 hunks)
  • src/meta-srv/src/procedure/utils.rs (2 hunks)
  • src/meta-srv/src/region/failure_detector.rs (2 hunks)
  • src/meta-srv/src/region/supervisor.rs (13 hunks)
  • tests-integration/src/standalone.rs (2 hunks)
Files not reviewed due to errors (1)
  • src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs (no review received)
Additional comments not posted (34)
src/meta-srv/src/handler/failure_handler.rs (2)

29-32: Constructor updated to include HeartbeatAcceptor.

The addition of HeartbeatAcceptor as a parameter in the constructor is aligned with the PR's goal to enhance failure detection. Ensure that all instances of RegionFailureHandler creation are updated accordingly.


76-82: Updated test method to include HeartbeatAcceptor.

The test method test_handle_heartbeat now properly initializes HeartbeatAcceptor, which is crucial for testing the new functionality. Ensure the rest of the test suite is similarly updated to cover all new changes.

src/meta-srv/src/region/failure_detector.rs (2)

54-54: Method region_failure_detector correctly handles Ident.

This method efficiently handles the retrieval or creation of PhiAccrualFailureDetector instances, ensuring that failure detection can be dynamically managed for each Ident.


64-76: Method maybe_init_region_failure_detector efficiently handles initialization.

This method correctly initializes a PhiAccrualFailure Detector with a timestamp, which is crucial for accurate failure detection based on recent activity.

src/common/meta/src/ddl/drop_database/start.rs (1)

92-92: Proper initialization of cluster_id in DropDatabaseContext.

The cluster_id field is consistently initialized across different test scenarios. Ensure that the actual usage of cluster_id in production scenarios is similarly consistent and correct.

Also applies to: 108-108, 131-131

src/common/meta/src/ddl/drop_database/metadata.rs (1)

121-121: Proper initialization of cluster_id in DropDatabaseContext.

The cluster_id field is consistently initialized across different test scenarios. Ensure that the actual usage of cluster_id in production scenarios is similarly consistent and correct.

Also applies to: 148-148

src/common/meta/src/ddl.rs (1)

105-125: Introduction of RegionFailureDetectorController and its no-op implementation.

The new trait RegionFailureDetectorController and its no-operation implementation NoopRegionFailureDetectorControl are well-defined. They provide a clear interface for failure detection management, enhancing the system's modularity and extensibility.

src/common/meta/src/ddl/drop_database.rs (1)

38-38: Ensure appropriate imports for ClusterId.

The file now uses ClusterId, but it's unclear if it's properly imported. This could potentially lead to compilation errors if not handled correctly.

Verification successful

The import of ClusterId is correctly handled.

The ClusterId type is defined within the crate as a type alias for u64, ensuring that the import statement use crate::ClusterId; in src/common/meta/src/ddl/drop_database.rs is appropriate and will not cause any compilation issues.

  • src/common/meta/src/lib.rs: pub type ClusterId = u64;
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify if `ClusterId` is imported in the file.

# Test: Search for the import statement for `ClusterId`.
rg --type rust 'use crate::ClusterId;' src/common/meta/src/ddl/drop_database.rs

Length of output: 103



Script:

#!/bin/bash
# Description: Verify if `ClusterId` is defined within the crate.

# Test: Search for the definition of `ClusterId` within the crate.
ast-grep --lang rust --pattern 'struct ClusterId { $$$ }' 

Length of output: 59



Script:

#!/bin/bash
# Description: Verify if `ClusterId` is defined within the crate.

# Test: Search for any definition of `ClusterId` within the crate.
rg --type rust 'ClusterId' 

Length of output: 10475

src/common/meta/src/test_util.rs (2)

27-27: Addition of NoopRegionFailureDetectorControl for testing purposes.

The introduction of NoopRegionFailureDetectorControl in the testing utilities is appropriate for simulating behavior without actual side effects. Ensure that this aligns with the testing strategies and that it is used consistently across tests.


185-185: Usage of NoopRegionFailureDetectorControl in new_ddl_context.

This ensures that the testing environment simulates the failure detection mechanism without performing real operations, which is crucial for unit tests that don't require actual side effects.

src/meta-srv/src/procedure/utils.rs (2)

117-117: Addition of NoopRegionFailureDetectorControl in procedural utilities.

The inclusion of NoopRegionFailureDetectorControl in procedural utilities is vital for testing and simulating procedural operations without real side effects. This is particularly important for unit tests and integration tests where actual failure detection is not required.


229-229: Consistent use of NoopRegionFailureDetectorControl in new_ddl_context.

The consistent use of NoopRegionFailureDetectorControl across different utility functions ensures that the testing environment is uniform and predictable, which is crucial for reliable testing outcomes.
[APROVED]

tests-integration/src/standalone.rs (2)

26-26: Inclusion of NoopRegionFailureDetectorControl in integration tests.

The use of NoopRegionFailureDetectorControl in integration tests is appropriate for simulating the behavior of the region failure detection without actual network or hardware interaction, which is crucial for isolating tests from external dependencies.


202-202: Proper integration of NoopRegionFailureDetectorControl in DdlContext.

This ensures that the integration tests simulate the region failure detection mechanism accurately without performing real operations, which is essential for testing in a controlled environment.

src/common/meta/src/ddl/drop_table.rs (1)

282-282: Ensure consistent usage of cluster_id across related methods.

The addition of cluster_id to the DropTableExecutor constructor is consistent with the changes made in other parts of the system to enhance region failure handling. Confirm that cluster_id is being used appropriately within the DropTableExecutor methods to manage region-specific operations effectively.

Verification successful

Ensure consistent usage of cluster_id across related methods.

The cluster_id is consistently used in the DropTableExecutor constructor and within a method to manage region-specific operations. The usage appears to be appropriate for the intended functionality.

  • DropTableExecutor constructor: Initializes cluster_id.
  • Method usage: Utilizes self.cluster_id in region-related operations.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify consistent usage of `cluster_id` in related methods.

# Test: Search for `cluster_id` usage in `DropTableExecutor` methods. Expect: Consistent and correct usage.
rg --type rust --context 5 $'cluster_id' src/common/meta/src/ddl/drop_table/executor.rs

Length of output: 1090



Script:

#!/bin/bash
# Description: Verify consistent usage of `cluster_id` in all methods of `DropTableExecutor`.

# Step 1: List all methods within the `DropTableExecutor` implementation.
ast-grep --lang rust --pattern 'impl DropTableExecutor { $$$ }' src/common/meta/src/ddl/drop_table/executor.rs

# Step 2: Check for `cluster_id` usage within each method of `DropTableExecutor`.
rg --type rust 'impl DropTableExecutor' -A 1000 src/common/meta/src/ddl/drop_table/executor.rs | rg 'fn ' | sed 's/^/rg cluster_id -A 5 -B 5 /' | bash

Length of output: 15703



Script:

#!/bin/bash
# Description: Verify consistent usage of `cluster_id` in all methods of `DropTableExecutor`.

# Step 1: List all methods within the `DropTableExecutor` implementation.
ast-grep --lang rust --pattern 'impl DropTableExecutor { $$$ }' src/common/meta/src/ddl/drop_table/executor.rs

# Step 2: Check for `cluster_id` usage within each method of `DropTableExecutor`.
rg --type rust 'impl DropTableExecutor' -A 1000 src/common/meta/src/ddl/drop_table/executor.rs | rg 'fn ' | awk '{print $2}' | xargs -I {} rg 'cluster_id' -A 5 -B 5 src/common/meta/src/ddl/drop_table/executor.rs | grep -B 5 -A 5 -E 'fn {}'

Length of output: 15713



Script:

#!/bin/bash
# Description: Verify consistent usage of `cluster_id` in all methods of `DropTableExecutor`.

# Search for `cluster_id` within the `DropTableExecutor` implementation block.
rg 'impl DropTableExecutor' -A 1000 src/common/meta/src/ddl/drop_table/executor.rs | rg 'cluster_id' -A 5 -B 5

Length of output: 1111

src/common/meta/src/ddl/drop_table/executor.rs (1)

54-59: Proper integration of cluster_id in DropTableExecutor.

The addition of cluster_id to the DropTableExecutor constructor is correctly implemented. Ensure that cluster_id is utilized effectively within the executor's methods to manage cluster-specific operations.

Verification successful

Proper integration of cluster_id in DropTableExecutor.

The addition of cluster_id to the DropTableExecutor constructor is correctly implemented and effectively used within its methods.

  • DropTableExecutor constructor initializes cluster_id properly.
  • cluster_id is utilized in a method that processes region leaders.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the effective usage of `cluster_id` within `DropTableExecutor` methods.

# Test: Search for `cluster_id` usage in `DropTableExecutor` methods. Expect: Effective and correct usage.
rg --type rust --context 5 $'cluster_id' src/common/meta/src/ddl/drop_table/executor.rs

Length of output: 1090

src/common/meta/src/ddl/drop_database/executor.rs (1)

190-190: Consistency check for cluster_id default value in DropDatabaseContext.

The repeated initialization of DropDatabaseContext with a default cluster_id of 0 across multiple methods raises concerns about the appropriateness of this default value. Ensure that cluster_id is set dynamically based on operational requirements or a meaningful default is provided.

Also applies to: 203-203, 257-257, 347-347, 377-377

src/common/meta/src/ddl/create_table.rs (1)

293-296: Add logging for failure detector registration.

Consider adding a log statement after registering the failure detectors to improve traceability and help in debugging.

+        info!("Registered failure detectors for table {table_id} with identifiers {ident:?}");
         self.context
             .region_failure_detector_controller
             .register_failure_detectors(ident)
             .await;
src/meta-srv/src/metasrv/builder.rs (3)

25-27: Check usage of newly imported modules.

The imports NoopRegionFailureDetectorControl and RegionFailureDetectorControllerRef are part of the new failure detection mechanism. Ensure that these are utilized effectively in the server initialization and configuration.


290-311: Ensure correct conditional logic for failure detection setup.

The conditional setup for the region_failure_detector_controller and region_supervisor_ticker is crucial for the correct functioning of the failure detection mechanism. Make sure that the conditions (options.enable_region_failover and is_remote_wal) and their implications are correctly implemented and tested.

- if !is_remote_wal && options.enable_region_failover {
+ if options.enable_region_failover && !is_remote_wal {
    return error::UnexpectedSnafu {
        violated: "Region failover is not supported in the local WAL implementation!",
    }
    .fail();
}
Verification successful

Conditional logic for failure detection setup is correctly implemented.

The condition if !is_remote_wal && options.enable_region_failover is present and correctly handles the scenario where region failover is not supported in the local WAL implementation.

  • src/meta-srv/src/metasrv/builder.rs: Line 290
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the conditional logic in failure detection setup.

# Test: Check if the conditions are correctly implemented.
rg --type rust $'if options.enable_region_failover && !is_remote_wal'

Length of output: 70



Script:

#!/bin/bash
# Description: Verify the usage of `is_remote_wal` and `options.enable_region_failover` in the codebase.

# Search for the presence and usage of `is_remote_wal` in the codebase.
rg --type rust 'is_remote_wal'

# Search for the presence and usage of `options.enable_region_failover` in the codebase.
rg --type rust 'options.enable_region_failover'

Length of output: 920


Line range hint 325-355: Validate final server assembly in the MetasrvBuilder build method.

The final assembly of the Metasrv object involves integrating various components such as region_failure_detector_controller, region_migration_manager, and others. Ensure that all components are correctly initialized and integrated, and that the configuration options are respected throughout the process.

src/meta-srv/src/failure_detector.rs (1)

Line range hint 18-88: Review enhancements to PhiAccrualFailureDetector.

The enhancements to the PhiAccrualFailureDetector, including new traits (Debug, Clone, PartialEq) and configuration options (PhiAccrualFailureDetectorOptions), are crucial for the flexibility and testability of the failure detection mechanism. Ensure that these enhancements are correctly implemented and that they integrate seamlessly with the existing logic.

src/meta-srv/src/procedure/region_migration/test_util.rs (2)

23-23: Added import of NoopRegionFailureDetectorControl.

This import is essential for the new testing environment setup that includes the failure detector control. Ensure that this class is appropriately utilized within the test suites to simulate the behavior of the region failure detectors.


154-154: Integration of NoopRegionFailureDetectorControl into the TestingEnv context factory.

The integration of NoopRegionFailureDetectorControl into the context_factory method is crucial for ensuring that the region failure detection mechanism is appropriately simulated during testing. This change helps maintain the fidelity of the test environment relative to the production environment where such detectors are active.

src/meta-srv/src/region/supervisor.rs (4)

19-19: Import of RegionFailureDetectorController.

The import of RegionFailureDetectorController is critical for the new functionality that allows dynamic management of region failure detectors. This aligns with the PR's objectives of enhancing the system's reliability and fault tolerance.


79-110: Addition of new Event variants for failure detector management.

The introduction of RegisterFailureDetectors and DeregisterFailureDetectors as new event types in the Event enum is a significant enhancement. These events facilitate the dynamic management of failure detectors, which is central to the functionality introduced in this PR. Ensure that these events are handled appropriately in the system's event loop to maintain robustness.


133-139: Initialization of RegionSupervisorTicker.

The setup of RegionSupervisorTicker with a configurable tick interval is a good practice, allowing for flexibility in how frequently the system checks for region failures. This setup is crucial for maintaining the system's responsiveness to failure conditions.


307-320: Methods for registering and deregistering failure detectors in RegionSupervisor.

These methods are crucial for managing the state of region failure detectors dynamically. The implementation is straightforward and aligns with the system's requirements for handling region failures proactively.

src/cmd/src/standalone.rs (1)

28-28: Approved import of failure detector related entities.

The import of DdlContext, NoopRegionFailureDetectorControl, and ProcedureExecutorRef aligns with the PR's goal to enhance region failure detection capabilities.

src/common/meta/src/ddl_manager.rs (1)

Line range hint 808-859: Approved the import and initialization of NoopRegionFailureDetectorControl

The import and initialization of NoopRegionFailureDetectorControl in the test setup are correctly handled. This ensures that the new region failure detection feature can be adequately tested.

src/meta-srv/src/procedure/region_migration.rs (4)

158-158: Integration of region_failure_detector_controller in structs and methods.

The region_failure_detector_controller is added to multiple structs (DefaultContextFactory, Context) and is used in method implementations. Ensure that this integration does not interfere with existing functionalities and adheres to the intended use cases for region failure management.

Also applies to: 164-164, 168-168, 176-176, 190-190, 203-203


158-176: Proper initialization of region_failure_detector_controller.

The initialization in DefaultContextFactory appears correct. It is important to ensure that the region_failure_detector_controller is properly passed to each new context created, maintaining thread safety and consistency.


32-32: Ensure appropriate usage of imported RegionFailureDetectorControllerRef.

It's crucial to verify that the imported RegionFailureDetectorControllerRef is used correctly throughout the file. This is part of the new mechanism to actively manage region failure detection.


190-190: Verify the propagation of region_failure_detector_controller in context creation.

The region_failure_detector_controller is cloned and passed to the Context struct in the new_context method. This is crucial for ensuring that each context has its own reference to the controller, which is necessary for correct failure detection operations.

Also applies to: 203-203

Verification successful

The region_failure_detector_controller is correctly cloned and propagated in the new_context method.

The new_context method in src/meta-srv/src/procedure/region_migration.rs correctly clones the region_failure_detector_controller and passes it to the Context struct, ensuring that each context has its own reference to the controller.

  • Verified at:
    • src/meta-srv/src/procedure/region_migration.rs:190
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the correct propagation and cloning of `region_failure_detector_controller` in context creation.

# Test: Search for the `new_context` method calls and verify the propagation of the controller.
rg --type rust $'new_context'

Length of output: 7604



Script:

#!/bin/bash
# Description: Extract the implementation of the `new_context` method to verify the propagation and cloning of `region_failure_detector_controller`.

# Find the implementation of `new_context` method.
ast-grep --lang rust --pattern 'fn new_context($_, $_) -> $_ {
  $$$
}' src/meta-srv/src/procedure/region_migration.rs

Length of output: 1164

src/common/meta/src/ddl/create_table.rs Outdated Show resolved Hide resolved
src/meta-srv/src/region/supervisor.rs Outdated Show resolved Hide resolved
src/cmd/src/standalone.rs Show resolved Hide resolved
src/common/meta/src/ddl/drop_database.rs Show resolved Hide resolved
@WenyXu WenyXu force-pushed the feat/register-failure-detector branch from 45a2ac8 to d0f2134 Compare June 27, 2024 14:25
@WenyXu
Copy link
Member Author

WenyXu commented Jun 27, 2024

@coderabbitai resolve

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 45a2ac8 and d0f2134.

Files selected for processing (14)
  • src/common/meta/src/ddl/create_table.rs (1 hunks)
  • src/common/meta/src/ddl/drop_database.rs (4 hunks)
  • src/common/meta/src/ddl/drop_database/cursor.rs (3 hunks)
  • src/common/meta/src/ddl/drop_database/executor.rs (7 hunks)
  • src/common/meta/src/ddl/drop_database/metadata.rs (2 hunks)
  • src/common/meta/src/ddl/drop_database/start.rs (3 hunks)
  • src/common/meta/src/ddl/drop_table.rs (1 hunks)
  • src/common/meta/src/ddl/drop_table/executor.rs (7 hunks)
  • src/meta-srv/src/handler/failure_handler.rs (2 hunks)
  • src/meta-srv/src/metasrv/builder.rs (4 hunks)
  • src/meta-srv/src/procedure/region_migration.rs (4 hunks)
  • src/meta-srv/src/procedure/region_migration/test_util.rs (2 hunks)
  • src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs (5 hunks)
  • src/meta-srv/src/region/supervisor.rs (13 hunks)
Files skipped from review as they are similar to previous changes (13)
  • src/common/meta/src/ddl/create_table.rs
  • src/common/meta/src/ddl/drop_database.rs
  • src/common/meta/src/ddl/drop_database/cursor.rs
  • src/common/meta/src/ddl/drop_database/executor.rs
  • src/common/meta/src/ddl/drop_database/metadata.rs
  • src/common/meta/src/ddl/drop_database/start.rs
  • src/common/meta/src/ddl/drop_table.rs
  • src/common/meta/src/ddl/drop_table/executor.rs
  • src/meta-srv/src/handler/failure_handler.rs
  • src/meta-srv/src/metasrv/builder.rs
  • src/meta-srv/src/procedure/region_migration.rs
  • src/meta-srv/src/procedure/region_migration/test_util.rs
  • src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs
Additional comments not posted (7)
src/meta-srv/src/region/supervisor.rs (7)

Line range hint 19-39: Ensure appropriate use of newly added imports and declarations.

The imports and global declarations related to RegionFailureDetectorController and related entities are correctly added. These are necessary for the new functionality around region failure detection. Ensure these are used consistently throughout the file.


39-39: Review the DatanodeHeartbeat struct.

The DatanodeHeartbeat struct is well-defined and includes necessary fields for handling heartbeats. This struct will play a critical role in the new failure detection mechanism by providing timely data from data nodes.


79-110: Ensure correct integration of new events in the Event enum.

The addition of RegisterFailureDetectors and DeregisterFailureDetectors to the Event enum is crucial for the dynamic management of failure detectors. Ensure that these events are handled appropriately in all relevant parts of the system to maintain robustness.
[APROVED]


133-140: Review modifications in the RegionSupervisorTicker struct.

The modifications to RegionSupervisorTicker are appropriate and support the new functionality of handling tick events for failure detection. Ensure that the implementation does not introduce any performance bottlenecks or race conditions.


249-252: Review the HeartbeatAcceptor implementation.

The HeartbeatAcceptor is correctly implemented to forward heartbeat signals. Ensure that this implementation is efficient and error-resistant, particularly in high-load scenarios.


290-295: Ensure robust handling of failure detector events in the main loop.

The handling of RegisterFailureDetectors and DeregisterFailureDetectors events in the main loop of RegionSupervisor is crucial for dynamic failure management. Ensure that these events are processed efficiently and correctly.


Line range hint 456-622: Review the test implementations for RegionSupervisor.

The tests appear to be comprehensive and effectively simulate various scenarios related to region failure detection. Ensure that all new functionalities are covered by the tests to prevent regressions.

Copy link
Contributor

coderabbitai bot commented Jun 27, 2024

Actions performed

Comments resolved.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d0f2134 and e03ba5a.

Files selected for processing (1)
  • config/metasrv.example.toml (1 hunks)
Files skipped from review due to trivial changes (1)
  • config/metasrv.example.toml

Copy link

codecov bot commented Jun 28, 2024

Codecov Report

Attention: Patch coverage is 83.94649% with 48 lines in your changes missing coverage. Please review.

Project coverage is 84.59%. Comparing base (a7aa556) to head (cef5b65).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4223      +/-   ##
==========================================
- Coverage   84.86%   84.59%   -0.28%     
==========================================
  Files        1045     1046       +1     
  Lines      184265   184597     +332     
==========================================
- Hits       156384   156158     -226     
- Misses      27881    28439     +558     

@WenyXu WenyXu force-pushed the feat/register-failure-detector branch from e03ba5a to e4f318e Compare June 28, 2024 03:30
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between e03ba5a and e4f318e.

Files selected for processing (2)
  • config/config.md (1 hunks)
  • config/metasrv.example.toml (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • config/metasrv.example.toml
Additional context used
LanguageTool
config/config.md

[uncategorized] ~31-~31: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload.
For now, gRPC tls config does not ...

(AUTO_HYPHEN)


[uncategorized] ~31-~31: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...r now, gRPC tls config does not support auto reload. | | mysql | -- | -- | MySQL server o...

(AUTO_HYPHEN)


[uncategorized] ~40-~40: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload | | postgres | -- | -- | PostgresSQL ...

(AUTO_HYPHEN)


[uncategorized] ~49-~49: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload | | opentsdb | -- | -- | OpenTSDB pro...

(AUTO_HYPHEN)


[grammar] ~93-~93: “Google” is a proper noun and needs to be capitalized.
Context: ...e| String |None` | The scope of the google cloud storage.
**It's only used whe...

(A_GOOGLE)


[grammar] ~94-~94: “Google” is a proper noun and needs to be capitalized.
Context: ...g | None | The credential path of the google cloud storage.
**It's only used whe...

(A_GOOGLE)


[style] ~134-~134: To form a complete sentence, be sure to include a subject.
Context: ...vel| String |None| The log level. Can beinfo/debug/warn/error. | | ...

(MISSING_IT_THERE)


[grammar] ~143-~143: Consider using either the past participle “recommended” or the present participle “recommending” here.
Context: ...For standalone mode, self_import is recommend to collect metrics generated by itself ...

(BEEN_PART_AGREEMENT)


[uncategorized] ~179-~179: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload.
For now, gRPC tls config does not ...

(AUTO_HYPHEN)


[uncategorized] ~179-~179: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...r now, gRPC tls config does not support auto reload. | | mysql | -- | -- | MySQL server o...

(AUTO_HYPHEN)


[uncategorized] ~188-~188: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload | | postgres | -- | -- | PostgresSQL ...

(AUTO_HYPHEN)


[uncategorized] ~197-~197: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload | | opentsdb | -- | -- | OpenTSDB pro...

(AUTO_HYPHEN)


[style] ~221-~221: To form a complete sentence, be sure to include a subject.
Context: ...vel| String |None| The log level. Can beinfo/debug/warn/error. | | ...

(MISSING_IT_THERE)


[grammar] ~230-~230: Consider using either the past participle “recommended” or the present participle “recommending” here.
Context: ...For standalone mode, self_import is recommend to collect metrics generated by itself ...

(BEEN_PART_AGREEMENT)


[style] ~283-~283: To form a complete sentence, be sure to include a subject.
Context: ...vel| String |None| The log level. Can beinfo/debug/warn/error. | | ...

(MISSING_IT_THERE)


[grammar] ~292-~292: Consider using either the past participle “recommended” or the present participle “recommending” here.
Context: ...For standalone mode, self_import is recommend to collect metrics generated by itself ...

(BEEN_PART_AGREEMENT)


[uncategorized] ~326-~326: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload.
For now, gRPC tls config does not ...

(AUTO_HYPHEN)


[uncategorized] ~326-~326: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...r now, gRPC tls config does not support auto reload. | | runtime | -- | -- | The runtime ...

(AUTO_HYPHEN)


[grammar] ~374-~374: “Google” is a proper noun and needs to be capitalized.
Context: ...e| String |None` | The scope of the google cloud storage.
**It's only used whe...

(A_GOOGLE)


[grammar] ~375-~375: “Google” is a proper noun and needs to be capitalized.
Context: ...g | None | The credential path of the google cloud storage.
**It's only used whe...

(A_GOOGLE)


[style] ~415-~415: To form a complete sentence, be sure to include a subject.
Context: ...vel| String |None| The log level. Can beinfo/debug/warn/error. | | ...

(MISSING_IT_THERE)


[grammar] ~424-~424: Consider using either the past participle “recommended” or the present participle “recommending” here.
Context: ...For standalone mode, self_import is recommend to collect metrics generated by itself ...

(BEEN_PART_AGREEMENT)

Markdownlint
config/config.md

5-5: Expected: 2; Actual: 4
Unordered list indentation

(MD007, ul-indent)


6-6: Expected: 2; Actual: 4
Unordered list indentation

(MD007, ul-indent)


7-7: Expected: 2; Actual: 4
Unordered list indentation

(MD007, ul-indent)


151-151: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


238-238: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


300-300: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


37-37: null
Bare URL used

(MD034, no-bare-urls)


185-185: null
Bare URL used

(MD034, no-bare-urls)


247-247: null
Bare URL used

(MD034, no-bare-urls)

Additional comments not posted (1)
config/config.md (1)

262-262: Updated Heartbeat Pause Configuration

The acceptable_heartbeat_pause setting was increased from 3000ms to 10000ms. This change should help in environments with variable network conditions but might delay the detection of node failures. It's crucial to monitor the effects of this change on the system's ability to detect and respond to failures in a timely manner.

src/common/meta/src/ddl.rs Outdated Show resolved Hide resolved
src/common/meta/src/ddl.rs Outdated Show resolved Hide resolved
src/common/meta/src/ddl.rs Outdated Show resolved Hide resolved
src/common/meta/src/ddl.rs Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between e4f318e and 6e3bcb2.

Files selected for processing (4)
  • src/common/meta/src/ddl.rs (4 hunks)
  • src/common/meta/src/ddl/create_table.rs (1 hunks)
  • src/common/meta/src/ddl/drop_table/executor.rs (7 hunks)
  • src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs (5 hunks)
Files skipped from review as they are similar to previous changes (3)
  • src/common/meta/src/ddl/create_table.rs
  • src/common/meta/src/ddl/drop_table/executor.rs
  • src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs
Additional comments not posted (2)
src/common/meta/src/ddl.rs (2)

149-150: Addition of region_failure_detector_controller in DdlContext.

The addition of region_failure_detector_controller to DdlContext aligns with the PR's objective to enhance region failure detection. This change integrates the newly defined controller into the system context, allowing for broader access and management within DDL operations. Ensure that all usages of DdlContext correctly initialize this new field.


105-129: Review of the RegionFailureDetectorController interface and its NoopRegionFailureDetectorControl implementation.

  1. Interface Definition (Lines 107-117): The interface and its methods are well-defined for async operations. The method parameters use tuples, which align with previous suggestions by fengjiachun. However, detailed Rustdoc comments are needed to explain the parameters and expected behavior.
  2. Noop Implementation (Lines 119-128): The noop implementation is clear and adheres to the async trait. It correctly implements the interface with no operations, which is typical for a noop pattern. Ensure that this implementation is only used where appropriate, as it effectively disables failure detection.

: The method signatures match the suggestions by fengjiachun.

Would you like me to add detailed Rustdoc comments to these methods?

src/common/meta/src/ddl.rs Show resolved Hide resolved
@WenyXu
Copy link
Member Author

WenyXu commented Jun 28, 2024

@fengjiachun PTAL

@fengjiachun
Copy link
Collaborator

@coderabbitai Hi

Copy link
Contributor

coderabbitai bot commented Jun 28, 2024

Tip

For best results, initiate chat on the files or code changes.

@fengjiachun Hi! How can I assist you today?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6e3bcb2 and bf1f5ca.

Files selected for processing (1)
  • src/meta-srv/src/procedure/region_migration.rs (4 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/meta-srv/src/procedure/region_migration.rs

Copy link
Collaborator

@fengjiachun fengjiachun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/common/meta/src/ddl.rs Show resolved Hide resolved
src/common/meta/src/ddl/create_table.rs Outdated Show resolved Hide resolved
src/common/meta/src/ddl/create_table.rs Outdated Show resolved Hide resolved
src/common/meta/src/ddl/drop_table/executor.rs Outdated Show resolved Hide resolved
src/common/meta/src/ddl/drop_table/executor.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between bf1f5ca and b240e93.

Files selected for processing (2)
  • src/common/meta/src/ddl/create_table.rs (1 hunks)
  • src/common/meta/src/ddl/drop_table/executor.rs (7 hunks)
Files skipped from review as they are similar to previous changes (2)
  • src/common/meta/src/ddl/create_table.rs
  • src/common/meta/src/ddl/drop_table/executor.rs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between b240e93 and 94e24fe.

Files selected for processing (1)
  • tests-fuzz/targets/failover/fuzz_failover_mito_regions.rs (2 hunks)
Additional comments not posted (3)
tests-fuzz/targets/failover/fuzz_failover_mito_regions.rs (3)

267-267: Reduced parallelism in fuzz tests.

The parallelism setting has been reduced from 8 to 4. While this might stabilize the tests, consider verifying if it still effectively uncovers concurrency-related issues.


87-87: Reduced range for row generation in fuzz tests.

The maximum limit for rows has been halved from 4096 to 2048. This reduction could make the tests less stressful and more focused, but it's important to ensure that it still meets the testing requirements for edge cases.


89-89: Reduced range for insert generation in fuzz tests.

The maximum limit for inserts has been reduced from 16 to 8. This change might focus the testing on more typical scenarios, but ensure it still adequately tests the system's handling of multiple concurrent inserts.

src/common/meta/src/ddl.rs Outdated Show resolved Hide resolved
src/common/meta/src/ddl/create_table.rs Outdated Show resolved Hide resolved
src/common/meta/src/ddl/drop_table/executor.rs Outdated Show resolved Hide resolved
src/common/meta/src/ddl/create_table.rs Outdated Show resolved Hide resolved
src/meta-srv/src/region/failure_detector.rs Outdated Show resolved Hide resolved
src/meta-srv/src/region/failure_detector.rs Outdated Show resolved Hide resolved
src/meta-srv/src/region/supervisor.rs Outdated Show resolved Hide resolved
src/meta-srv/src/region/supervisor.rs Outdated Show resolved Hide resolved
src/meta-srv/src/region/supervisor.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@killme2008 killme2008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another suggestion: can we add some tests to cover these cases?

@WenyXu
Copy link
Member Author

WenyXu commented Jul 1, 2024

Another suggestion: can we add some tests to cover these cases?

Yes, tests will be introduced in following PRs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 94e24fe and 62d1076.

Files selected for processing (8)
  • src/common/meta/src/ddl.rs (4 hunks)
  • src/common/meta/src/ddl/create_table.rs (2 hunks)
  • src/common/meta/src/ddl/drop_table/executor.rs (7 hunks)
  • src/common/meta/src/ddl/utils.rs (2 hunks)
  • src/meta-srv/src/procedure/region_migration.rs (5 hunks)
  • src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs (5 hunks)
  • src/meta-srv/src/region/failure_detector.rs (5 hunks)
  • src/meta-srv/src/region/supervisor.rs (12 hunks)
Files skipped from review as they are similar to previous changes (7)
  • src/common/meta/src/ddl.rs
  • src/common/meta/src/ddl/create_table.rs
  • src/common/meta/src/ddl/drop_table/executor.rs
  • src/meta-srv/src/procedure/region_migration.rs
  • src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs
  • src/meta-srv/src/region/failure_detector.rs
  • src/meta-srv/src/region/supervisor.rs
Additional comments not posted (1)
src/common/meta/src/ddl/utils.rs (1)

134-147: Validate the new utility function for converting region routes.

The new function convert_region_routes_to_detecting_regions correctly maps RegionRoute objects to DetectingRegion tuples. This is crucial for the new failure detection features. However, ensure that all callers handle the output correctly, especially regarding empty or null leader peers.

src/common/meta/src/ddl/utils.rs Show resolved Hide resolved
@WenyXu
Copy link
Member Author

WenyXu commented Jul 1, 2024

@killme2008 @MichaelScofield PTAL

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 62d1076 and 40b95f3.

Files selected for processing (2)
  • src/meta-srv/src/metasrv/builder.rs (4 hunks)
  • src/meta-srv/src/region/supervisor.rs (12 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/meta-srv/src/region/supervisor.rs
Additional comments not posted (2)
src/meta-srv/src/metasrv/builder.rs (2)

25-27: Approved import changes for region failure detection.

The newly imported NoopRegionFailureDetectorControl and RegionFailureDetectorControllerRef are crucial for the enhancements related to region failure detection as described in the PR. Ensure these are integrated and utilized correctly in the subsequent code.


290-311: Refine error handling and verify logical conditions for region failure detection setup.

The conditional logic for setting up the region failure detectors seems appropriate. However, consider using a more descriptive error type or message than UnexpectedSnafu to clarify the specific conditions under which the error occurs.

- return error::UnexpectedSnafu {
-     violated: "Region failover is not supported in the local WAL implementation!",
- }
- .fail();
+ return Err(MetasrvError::ConfigurationError {
+     message: "Region failover requires a remote WAL setup. Please configure your WAL settings appropriately.",
+ });

Ensure the conditions is_remote_wal and enable_region_failover are documented and clearly explained in the system configuration or setup documentation.

@WenyXu WenyXu force-pushed the feat/register-failure-detector branch from 40b24d5 to cef5b65 Compare July 1, 2024 04:17
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 40b95f3 and cef5b65.

Files selected for processing (5)
  • src/common/meta/src/ddl.rs (4 hunks)
  • src/common/meta/src/ddl/drop_table/executor.rs (7 hunks)
  • src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs (5 hunks)
  • src/meta-srv/src/region/failure_detector.rs (4 hunks)
  • src/meta-srv/src/region/supervisor.rs (13 hunks)
Files skipped from review as they are similar to previous changes (3)
  • src/common/meta/src/ddl/drop_table/executor.rs
  • src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs
  • src/meta-srv/src/region/failure_detector.rs
Additional comments not posted (6)
src/common/meta/src/ddl.rs (3)

19-19: Imports are appropriate and well-organized.

These imports are necessary for the new functionality related to region failure detection and handling.

Also applies to: 33-33


151-153: Integration of failure detection into DDL operations is well executed.

The addition of region_failure_detector_controller in DdlContext struct allows for seamless integration of the new failure detection functionality.


160-164: Effective management of failure detectors through new methods in DdlContext.

The methods register_failure_detectors and deregister_failure_detectors are well-implemented, providing essential functionality for managing region failure detectors dynamically.

Also applies to: 170-174

src/meta-srv/src/region/supervisor.rs (3)

19-19: Imports are correctly updated to support new functionality.

The imports are necessary for the new event types and failure detector management functionality introduced in this file.


79-80: New event types for failure detectors are well-designed and implemented.

The addition of RegisterFailureDetectors and DeregisterFailureDetectors event types allows for effective asynchronous management of failure detectors, enhancing the system's resilience and adaptability.

Also applies to: 103-110


310-323: Effective event handling for failure detectors in RegionSupervisor.

The methods register_failure_detectors and deregister_failure_detectors are correctly implemented to handle respective events, ensuring robust management of region failure detectors.

src/common/meta/src/ddl.rs Show resolved Hide resolved
src/meta-srv/src/region/supervisor.rs Show resolved Hide resolved
Copy link
Contributor

@killme2008 killme2008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@WenyXu WenyXu enabled auto-merge July 1, 2024 05:57
@WenyXu WenyXu added this pull request to the merge queue Jul 1, 2024
Merged via the queue into GreptimeTeam:main with commit 6a634f8 Jul 1, 2024
60 checks passed
@WenyXu WenyXu deleted the feat/register-failure-detector branch July 1, 2024 06:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants