Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: region_peers returns same region_id for multi logical tables #4190

Merged
merged 4 commits into from
Jun 24, 2024

Conversation

realtaobo
Copy link
Contributor

@realtaobo realtaobo commented Jun 23, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

Resolve #4157

What's changed and what's your intention?

as title.
image

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

Summary by CodeRabbit

  • New Features

    • Introduced new tables in the INFORMATION_SCHEMA database for storing and managing region peer information.
    • Added support for querying region peer data, including distinct counts of region IDs.
  • Bug Fixes

    • Enhanced the method for adding region peers to handle additional parameters for better data management.
  • Tests

    • Included comprehensive test cases covering the creation, querying, and deletion of region peer tables in both distributed and standalone environments.

@realtaobo realtaobo requested a review from a team as a code owner June 23, 2024 17:03
Copy link
Contributor

coderabbitai bot commented Jun 23, 2024

Walkthrough

The changes optimize how region and peer information is collected and stored in the information_schema.region_peers table of the INFORMATION_SCHEMA database. Key modifications include adjustments to the add_region_peers method to accept a table_id parameter and the introduction of new SQL test files illustrating the creation, querying, and dropping of region peer-related tables.

Changes

File/Directory Path Change Summary
src/catalog/src/information_schema/region_peers.rs Modified the add_region_peers method to accept a table_id parameter and adjusted its logic for handling region and peer information.
.../distributed/information_schema/region_peers.result Introduced the creation of multiple tables related to region peers and included a sample SELECT query on the region_peers table and subsequent table drops.
.../standalone/information_schema/region_peers.result Included the creation of several region peer-related tables, data selection, and table drops before switching back to the public schema.
.../distributed/information_schema/region_peers.sql Added definitions for multiple tables in the INFORMATION_SCHEMA database related to region peers, partitioning logic, and a SELECT query on region peers.
.../standalone/information_schema/region_peers.sql Defined the structure of region peer-related tables and included partitioning, data selection logic, and table drops in the INFORMATION_SCHEMA database.
.../standalone/common/information_schema/region_peers.result Added logic for creating/manipulating tables in the information_schema database and a query for counting distinct region_id values, followed by table drops.
.../standalone/common/information_schema/region_peers.sql Introduced table definitions related to region peers in the INFORMATION_SCHEMA database, including a count query and table dropping logic.

Assessment against linked issues

Objective Addressed Explanation
information_schema.region_peers returns same region_id multiple times (#4157)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Jun 23, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 5566dd7 and d12200f.

Files selected for processing (1)
  • src/catalog/src/information_schema/region_peers.rs (2 hunks)
Additional comments not posted (2)
src/catalog/src/information_schema/region_peers.rs (2)

34-34: Updated import statement to include RegionId.

The inclusion of RegionId in the import statement supports the changes in the add_region_peers method, which now utilizes RegionId to ensure unique identifiers for different logical tables.


217-222: Refactored add_region_peers to accept table_id.

The addition of the table_id parameter is a significant and necessary change to ensure unique region_id values across different logical tables. This directly addresses the bug reported in issue #4157. Consider adding a comment here explaining how table_id is used to generate unique region_ids.

Copy link

codecov bot commented Jun 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.85%. Comparing base (07cbaba) to head (c4f26cc).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4190      +/-   ##
==========================================
- Coverage   85.11%   84.85%   -0.27%     
==========================================
  Files        1028     1028              
  Lines      180903   180908       +5     
==========================================
- Hits       153970   153503     -467     
- Misses      26933    27405     +472     

Copy link
Collaborator

@fengjiachun fengjiachun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fengjiachun
Copy link
Collaborator

PTAL @WenyXu

@WenyXu
Copy link
Member

WenyXu commented Jun 24, 2024

@realtaobo, that is nicely done. Would you like to add a sqlness test for this case?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d12200f and 69d6335.

Files selected for processing (4)
  • tests/cases/distributed/information_schema/region_peers.result (1 hunks)
  • tests/cases/distributed/information_schema/region_peers.sql (1 hunks)
  • tests/cases/standalone/information_schema/region_peers.result (1 hunks)
  • tests/cases/standalone/information_schema/region_peers.sql (1 hunks)
Additional comments not posted (9)
tests/cases/standalone/information_schema/region_peers.sql (2)

4-26: Ensure proper configuration for metric tables and partitioning.

The tables are configured with the metric engine, which is suitable for time-series data. The partitioning logic in region_peers_test appears well-thought-out for scalability. However, ensure that the partition ranges (a < 10, a >= 10 AND a < 20, a >= 20) cover all potential values without overlaps or gaps.


32-32: Proper schema context switch.

Switching back to the public schema at the end of the script is a good practice to maintain a clean testing environment.

tests/cases/distributed/information_schema/region_peers.sql (3)

28-28: Optimized SELECT statement for distributed environments.

Specifying columns (region_id, is_leader, status, down_seconds) instead of using * is a good practice, especially in distributed environments where minimizing data transfer is crucial.


32-32: Consistent schema context switching across test environments.

Maintaining consistency in switching back to the public schema at the end of the script is crucial for ensuring a clean state across different test environments.


4-26: Check distributed-specific configurations for metric tables.

The CREATE TABLE statements are identical to the standalone version. Ensure that any distributed-specific configurations or optimizations are considered and applied if necessary.

Verification successful

Check distributed-specific configurations for metric tables.

The CREATE TABLE statements in the distributed environment are consistent with those in the standalone environment. The tables are configured with engine = metric and PARTITION ON COLUMNS, which are typical for distributed environments. However, ensure that any additional distributed-specific optimizations or configurations are considered and applied if necessary.

  • region_peers_phy: engine = metric with ("physical_metric_table" = "")
  • region_peers_t1: engine = metric with ("on_physical_table" = "region_peers_phy")
  • region_peers_t2: engine = metric with ("on_physical_table" = "region_peers_phy")
  • region_peers_test: PARTITION ON COLUMNS (a)
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Check for distributed-specific configurations in CREATE TABLE statements.

# Test: Search for distributed-specific configurations in SQL files.
rg --type sql $'CREATE TABLE' --context 10

Length of output: 201978

tests/cases/distributed/information_schema/region_peers.result (2)

40-49: Correct expected results reflecting unique region_id.

The expected results correctly show unique region_id values, which is the main objective of the PR to address the issue of duplicate region_ids in different logical tables.


2-57: Good use of affected rows messages for clarity in test results.

Including messages about affected rows after each operation provides clear feedback on the impact of each SQL statement, which is beneficial for debugging and understanding the flow of the test.

tests/cases/standalone/information_schema/region_peers.result (2)

40-49: Correct expected results reflecting unique region_id.

The expected results correctly show unique region_id values, which is the main objective of the PR to address the issue of duplicate region_ids in different logical tables.


2-57: Consistent use of affected rows messages across test environments.

The inclusion of messages about affected rows after each operation is consistent with the distributed environment tests and provides clear feedback, which is beneficial for debugging and understanding the flow of the test.

tests/cases/standalone/information_schema/region_peers.sql Outdated Show resolved Hide resolved
@WenyXu WenyXu enabled auto-merge June 24, 2024 14:02
@WenyXu
Copy link
Member

WenyXu commented Jun 24, 2024

Thanks!

@WenyXu WenyXu added this pull request to the merge queue Jun 24, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 69d6335 and c4f26cc.

Files selected for processing (2)
  • tests/cases/standalone/common/information_schema/region_peers.result (1 hunks)
  • tests/cases/standalone/common/information_schema/region_peers.sql (1 hunks)
Additional comments not posted (5)
tests/cases/standalone/common/information_schema/region_peers.sql (4)

32-32: Validate context switch to public schema.

Switching back to the public schema at the end of the script is a good practice to reset the testing environment. Confirm that this switch effectively resets the session to avoid any unintended side effects in other tests.


4-10: Verify table creation and partitioning logic.

The SQL script correctly creates the tables region_peers_phy, region_peers_t1, and region_peers_t2 with appropriate engine settings and primary keys. The partitioning logic for region_peers_test is well-defined. Ensure that the partitioning ranges meet the expected use cases and data distribution.

Also applies to: 12-16, 18-26

Verification successful

Verify the partitioning logic in the region_peers_test table.

The partitioning logic for the region_peers_test table needs to be verified to ensure it meets the expected use cases and data distribution. The search results indicate the presence of partitioning logic in the region_peers.sql file. Let's extract the relevant partitioning details from this file.


Verified partitioning logic in the region_peers_test table.

The partitioning logic for the region_peers_test table is correctly defined with appropriate ranges for column a.

  • a < 10
  • a >= 10 AND a < 20
  • a >= 20

This meets the expected use cases and data distribution requirements.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the partitioning logic in the `region_peers_test` table.

# Test: Search for the partitioning logic in the SQL files.
rg --type sql $'PARTITION ON COLUMNS'

Length of output: 797



Script:

#!/bin/bash
# Extract the partitioning logic from the `region_peers.sql` file.
rg --type sql --context 5 'PARTITION ON COLUMNS' tests/cases/standalone/common/information_schema/region_peers.sql

Length of output: 298


28-28: Ensure accuracy of the SELECT query.

The query aims to count distinct region_id values from the region_peers table. This is crucial for verifying that the issue #4157 is resolved. Consider adding more comprehensive checks to ensure the correctness of the returned data.


30-30: Check the cleanup process.

The script drops all created tables at the end, which is good for cleanup. Ensure that no residual data or metadata persists after the tables are dropped, which could affect subsequent tests.

tests/cases/standalone/common/information_schema/region_peers.result (1)

4-4: Check consistency of affected rows messages.

The result file consistently reports 'Affected Rows: 0' for each table operation. Verify that these messages accurately reflect the operations performed, especially considering the creation and dropping of tables should affect rows.

Also applies to: 8-8, 16-16, 24-24, 36-36, 48-48, 52-52

Verification successful

Check consistency of affected rows messages.

The 'Affected Rows: 0' messages in the result file accurately reflect the operations performed in the region_peers.sql script. The creation and dropping of tables typically do not affect rows, and the SELECT query returns a count without affecting rows.

  • CREATE TABLE operations: Expected affected rows = 0
  • SELECT COUNT query: Expected affected rows = 0
  • DROP TABLE operations: Expected affected rows = 0

Thus, the 'Affected Rows: 0' messages are consistent with the SQL operations performed.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the affected rows messages in the result file.

# Test: Check the logs or output directly after running the SQL script to ensure it matches the expected 'Affected Rows: 0'.
echo "Check your database logs or direct output for consistency with affected rows."

Length of output: 164



Script:

#!/bin/bash
# Locate the SQL script associated with the result file.
fd -e sql region_peers

Length of output: 90



Script:

#!/bin/bash
# Display the contents of the SQL script to inspect the operations performed.
cat tests/cases/standalone/common/information_schema/region_peers.sql

Length of output: 938

Merged via the queue into GreptimeTeam:main with commit 77904ad Jun 24, 2024
51 checks passed
@realtaobo realtaobo deleted the region-peers branch June 24, 2024 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

information_schema.region_peers returns same region_id multiple times
3 participants