Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid populating _schema if the collection is empty in MongoDB connector #21236

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

build2create
Copy link

@build2create build2create commented Mar 25, 2024

Description

This PR is to fix the [issue][1]. Whenever a empty collection was created an entry was populated in default collection _schema. Further upon populating the collection with documents the fields in _schema remained empty. So a workaround there is a check introduced to avoid populating _schema if collection is empty.

Fixes #20972

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

Copy link

cla-bot bot commented Mar 25, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Mithilesh Wachasunder.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@@ -878,7 +878,7 @@ private List<Document> guessTableFields(String schemaName, String tableName)
{
MongoDatabase db = client.getDatabase(schemaName);
Document doc = db.getCollection(tableName).find().first();
if (doc == null) {
if (doc == null || doc.isEmpty()) {
Copy link
Member

@ebyhr ebyhr Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test to TestMongoConnectorTest. We can use MongoClient for preparing an empty collection in MongoDB server.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure will do.

Copy link

cla-bot bot commented Apr 1, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Mithilesh Wachasunder.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@github-actions github-actions bot added release-notes docs tests:hive iceberg Iceberg connector delta-lake Delta Lake connector hive Hive connector bigquery BigQuery connector labels Apr 1, 2024
@build2create build2create marked this pull request as draft April 1, 2024 06:50
Copy link

cla-bot bot commented Apr 1, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Mithilesh Wachasunder.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

Comment on lines +167 to +168
database.getCollection(tableName).insertOne(new Document(ImmutableMap.of("k", "v")));
MaterializedResult result = getQueryRunner().execute(getSession(), "SELECT * FROM mongodb.test." + tableName);
Copy link
Member

@ebyhr ebyhr Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add another SELECT statement before inserting a document. That's the original scenario we want to fix.

Also, this test doesn't reach doc.isEmpty() as far as I confirmed.

@build2create build2create marked this pull request as ready for review April 1, 2024 06:51
@Test
public void testEmptyCollection()
{
String tableName = "test_empty";
Copy link
Member

@ebyhr ebyhr Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please append suffix with randomNameSuffix method.

database.createCollection(tableName);
database.getCollection(tableName).insertOne(new Document(ImmutableMap.of("k", "v")));
MaterializedResult result = getQueryRunner().execute(getSession(), "SELECT * FROM mongodb.test." + tableName);
assertThat(result.getRowCount()).isEqualTo(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop the table.

@ebyhr
Copy link
Member

ebyhr commented Apr 4, 2024

@cla-bot check

Copy link

cla-bot bot commented Apr 4, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Mithilesh Wachasunder.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

Copy link

cla-bot bot commented Apr 4, 2024

The cla-bot has been summoned, and re-checked this pull request!

Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label May 10, 2024
@mosabua
Copy link
Member

mosabua commented May 10, 2024

👋 @build2create - are you still interested in progressing this PR? If yes, could you address the comments from @ebyhr and fix any build failures?

@github-actions github-actions bot removed the stale label May 13, 2024
Copy link

github-actions bot commented Jun 4, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jun 4, 2024
@build2create
Copy link
Author

are you still interested in progressing this PR? If yes, could you address the comments from @ebyhr and fix any build failures?

Yes sorry its been a while will resume on this. Based on the offline chat with @ebyhr seems like a different behaviour with Mongo Compass and Java Client so the test results are different.

@github-actions github-actions bot removed the stale label Jun 11, 2024
String tableName = "test_empty";
MongoDatabase database = client.getDatabase("test");
database.createCollection(tableName);
database.getCollection(tableName).insertOne(new Document(ImmutableMap.of("k", "v")));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#20972 issue can be reproduced once you replace the data with new Document().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery BigQuery connector delta-lake Delta Lake connector docs hive Hive connector iceberg Iceberg connector mongodb MongoDB connector release-notes
Development

Successfully merging this pull request may close these issues.

Don't populate _schema if the collection is empty in MongoDB connector
4 participants