[CI] MixedClusterClientYamlTestSuiteIT test {p0=indices.stats/13_fields/Fielddata fields - all metric} failing #96711

Open
idegtiarenko opened this issue Jun 8, 2023 · 12 comments
Assignees
Labels
:Data Management/Stats Statistics tracking and retrieval APIs medium-risk An open issue or test failure that is a medium risk to future releases Team:Data Management Meta label for data/management team >test-failure Triaged test failures from CI

Comments

@idegtiarenko
Contributor

Build scan:
https://gradle-enterprise.elastic.co/s/zlgx2yuikqnp6/tests/:qa:mixed-cluster:v8.4.0%23mixedClusterTest/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=indices.stats%2F13_fields%2FFielddata%20fields%20-%20all%20metric%7D

Reproduction line:

./gradlew ':qa:mixed-cluster:v8.4.0#mixedClusterTest' -Dtests.class="org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=indices.stats/13_fields/Fielddata fields - all metric}" -Dtests.seed=F8C1BF871B7A72B7 -Dtests.bwc=true -Dtests.locale=tr -Dtests.timezone=Asia/Hong_Kong -Druntime.java=20

Applicable branches:
main

Reproduces locally?:
Didn't try

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT&tests.test=test%20%7Bp0%3Dindices.stats/13_fields/Fielddata%20fields%20-%20all%20metric%7D

Failure excerpt:

java.lang.AssertionError: Failure at [indices.stats/13_fields:232]: value of [_all.total.fielddata.fields.bar.memory_size_in_bytes] is not comparable (got [null])
Expected: an instance of java.lang.Comparable
     but: null

  at __randomizedtesting.SeedInfo.seed([F8C1BF871B7A72B7:7095805DB5861F4F]:0)
  at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:572)
  at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:524)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
  at java.lang.reflect.Method.invoke(Method.java:578)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1623)

  Caused by: java.lang.AssertionError: value of [_all.total.fielddata.fields.bar.memory_size_in_bytes] is not comparable (got [null])
  Expected: an instance of java.lang.Comparable
       but: null

    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
    at org.junit.Assert.assertThat(Assert.java:956)
    at org.elasticsearch.test.rest.yaml.section.GreaterThanAssertion.doAssert(GreaterThanAssertion.java:50)
    at org.elasticsearch.test.rest.yaml.section.Assertion.execute(Assertion.java:65)
    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:552)
    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:524)
    at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.lang.reflect.Method.invoke(Method.java:578)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
    at java.lang.Thread.run(Thread.java:1623)

@idegtiarenko idegtiarenko added :Data Management/Stats Statistics tracking and retrieval APIs >test-failure Triaged test failures from CI labels Jun 8, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Jun 8, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@masseyke
Member

masseyke commented Jun 8, 2023

The fielddata fields are definitely not in the result. For example we get:

            "fielddata" : {
              "memory_size_in_bytes" : 0,
              "evictions" : 0,
              "fields" : { },
              "global_ordinals" : {
                "build_time_in_millis" : 0
              }
            },

instead of something more like:

          "fielddata" : {
            "memory_size_in_bytes" : 584,
            "evictions" : 0,
            "fields" : {
              "bar" : {
                "memory_size_in_bytes" : 292
              }
            },
            "global_ordinals" : {
              "build_time_in_millis" : 0
            }
          },

So far I haven't been able to reproduce it though.
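To make the link between the empty `fields` object and the assertion message concrete, here is a minimal Python sketch. It is not the YAML test runner's code: `resolve_path` and `check_gt` are hypothetical helpers, the path lookup is a simplified dotted-key walk, and the `_all.total` nesting around the two `fielddata` sections is inferred from the asserted path.

```python
from typing import Any, Optional

PATH = "_all.total.fielddata.fields.bar.memory_size_in_bytes"


def resolve_path(body: dict, dotted_path: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; return None if a key is missing."""
    current: Any = body
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def check_gt(body: dict, dotted_path: str, bound: int) -> None:
    """Hypothetical stand-in for a 'gt' assertion: null is reported before comparing."""
    value = resolve_path(body, dotted_path)
    if value is None:
        raise AssertionError(
            f"value of [{dotted_path}] is not comparable (got [null])"
        )
    assert value > bound, f"[{dotted_path}] was {value}, expected > {bound}"


# The "fielddata" sections are copied from the two responses above;
# the surrounding "_all" / "total" wrapper is an assumption.
failing = {"_all": {"total": {"fielddata": {
    "memory_size_in_bytes": 0,
    "evictions": 0,
    "fields": {},
    "global_ordinals": {"build_time_in_millis": 0},
}}}}

passing = {"_all": {"total": {"fielddata": {
    "memory_size_in_bytes": 584,
    "evictions": 0,
    "fields": {"bar": {"memory_size_in_bytes": 292}},
    "global_ordinals": {"build_time_in_millis": 0},
}}}}

check_gt(passing, PATH, 0)  # passes silently
try:
    check_gt(failing, PATH, 0)
except AssertionError as err:
    print(err)
```

With the empty `fields` object the lookup stops at `bar`, so the value is null before any comparison takes place, which matches the "is not comparable" wording in the failure excerpt.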

@masseyke
Member

masseyke commented Jun 8, 2023

Also, I don't see it in the gradle history from June 1, 2022 through April 2, 2023. But I see it 7 times from April 3 to now.

@dakrone dakrone added low-risk An open issue or test failure that is a low risk to future releases medium-risk An open issue or test failure that is a medium risk to future releases and removed low-risk An open issue or test failure that is a low risk to future releases labels Oct 12, 2023
@bpintea
Contributor

bpintea commented Dec 14, 2023

One more similar failure https://gradle-enterprise.elastic.co/s/y7rbb75mfijvy

@kingherc
Contributor

@gmarouli
Contributor

Every test case first asserts that there are no shard failures, which makes sense: if a shard does not respond, we are probably missing stats. What I find weird in this test response is the following:

      "_shards" : {	
        "total" : 4,	
        "successful" : 3,	
        "failed" : 0	
      }

So, there are 4 total shards but only 3 were successful. I am following up to see when this happens.
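A small Python sketch of why a "no shard failures" check does not catch this response. The helper name `shard_summary` is hypothetical; it only does arithmetic on the three `_shards` counters shown above.

```python
def shard_summary(shards: dict) -> dict:
    """Summarise a `_shards` header: explicit failures vs. shards not accounted for."""
    total = shards["total"]
    successful = shards["successful"]
    failed = shards["failed"]
    return {
        "has_failures": failed > 0,
        # Shards that neither succeeded nor reported a failure.
        "unaccounted": total - successful - failed,
    }


# Values copied from the response above.
summary = shard_summary({"total": 4, "successful": 3, "failed": 0})
print(summary)
```

For this response `has_failures` is false while `unaccounted` is 1, so an assertion that only looks at `failed` passes even though one shard copy contributed no stats.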

@gmarouli
Contributor

gmarouli commented Mar 29, 2024

When running it locally, I often see a difference between the number of successful shards and the total, so it's probably not related.

My guess is that this happens because not all replicas are allocated: if I add a wait for green status, the difference goes away.

Update
Looking at previous commits, waiting for green status was removed intentionally, because in a single-node cluster the replicas are never allocated.
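For reference, a hedged Python sketch of what "wait for green status" amounts to against the cluster health API, and why it cannot succeed on a single node. `wait_for_status`, `timeout`, `status` and `timed_out` are real parameters and fields of `GET _cluster/health`; the base URL, the index name `test1`, the helper names and the sample response are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Health levels in increasing order of "goodness".
STATUS_ORDER = {"red": 0, "yellow": 1, "green": 2}


def health_url(base_url: str, index: str,
               status: str = "green", timeout: str = "30s") -> str:
    """Build a cluster health request that blocks until `status` or `timeout`."""
    query = urlencode({"wait_for_status": status, "timeout": timeout})
    return f"{base_url}/_cluster/health/{index}?{query}"


def reached_status(health: dict, wanted: str = "green") -> bool:
    """True if the health response reached at least `wanted` without timing out."""
    if health.get("timed_out", False):
        return False
    return STATUS_ORDER[health["status"]] >= STATUS_ORDER[wanted]


# Hypothetical base URL and index name.
print(health_url("http://localhost:9200", "test1"))

# Hypothetical response from a single-node cluster: the replica has no second
# node to go to, so the index stays yellow and the wait times out.
single_node = {"status": "yellow", "timed_out": True, "unassigned_shards": 1}
print(reached_status(single_node))
```

On a single node the replica can never be assigned, so a wait for green only ends at the timeout, which is consistent with the wait having been removed from the test.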

@gmarouli
Contributor

Another (potentially irrelevant) observation:

These failures happen on bwc tests with versions before 8.5:

[Screenshot 2024-03-29 at 14 34 55]

So another idea is to check whether something was "fixed" in 8.5.
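One way to check the "before 8.5" observation against a list of failing Gradle tasks is to parse the bwc version out of each task name. This is only a sketch: the regular expression and helper names are hypothetical, the first two task names appear in this issue, and the third is made up to show the negative case.

```python
import re
from typing import Optional, Tuple

# Matches task names such as ":qa:mixed-cluster:v8.4.0#mixedClusterTest".
TASK_RE = re.compile(r":qa:mixed-cluster:v(\d+)\.(\d+)\.(\d+)#mixedClusterTest")


def bwc_version(task: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from a mixed-cluster task name, if present."""
    match = TASK_RE.search(task)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_before_8_5(task: str) -> bool:
    """True if the task targets a bwc version older than 8.5.0."""
    version = bwc_version(task)
    return version is not None and version < (8, 5, 0)


tasks = [
    ":qa:mixed-cluster:v8.4.0#mixedClusterTest",    # from the reproduction line
    ":qa:mixed-cluster:v7.17.19#mixedClusterTest",  # from a later build scan
    ":qa:mixed-cluster:v8.6.0#mixedClusterTest",    # hypothetical, for contrast
]
for task in tasks:
    print(task, is_before_8_5(task))
```

Comparing versions as integer tuples avoids the string-comparison trap where "7.17.19" would sort after "7.2.0".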

@gmarouli
Contributor

I have found nothing useful in the logs so far, apart from cluster instability in https://gradle-enterprise.elastic.co/s/loyijho4swg4w/tests/task/:qa:mixed-cluster:v7.17.19%23mixedClusterTest/details/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0%3Dindices.stats%2F13_fields%2FFielddata%20fields%20-%20pattern%7D?top-execution=1.

It appears that only the master node was available during this test, but that did not change during the test; it happened much earlier. On the other hand, the test still manages to allocate 3 shards, which is confusing since only one node is available.

9 participants