[Failing Test]: BigtableIOWriteTest::test_bigtable_write is about 50% flaky #30927

Closed
2 of 16 tasks
tvalentyn opened this issue Apr 10, 2024 · 3 comments

Comments

@tvalentyn
Contributor

tvalentyn commented Apr 10, 2024

What happened?

From #30867 (comment) and below, it appears that Bigtable client initialization sometimes gets stuck and holds the GIL indefinitely in:

Traceback for thread 100 (python) [Has the GIL] (most recent call last):
    (Python) File "/usr/local/lib/python3.8/threading.py", line 890, in _bootstrap
        self._bootstrap_inner()
    (Python) File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
        self.run()
    (Python) File "/usr/local/lib/python3.8/threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
    (Python) File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 80, in _worker
        work_item.run()
    (Python) File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
        result = self.fn(*self.args, **self.kwargs)
    (Python) File "/usr/local/lib/python3.8/site-packages/google/cloud/bigtable/batcher.py", line 385, in _flush_rows
        response = self.table.mutate_rows(rows_to_flush)
    (Python) File "/usr/local/lib/python3.8/site-packages/google/cloud/bigtable/table.py", line 724, in mutate_rows
        self.name,
    (Python) File "/usr/local/lib/python3.8/site-packages/google/cloud/bigtable/table.py", line 160, in name
        table_client = self._instance._client.table_data_client
    (Python) File "/usr/local/lib/python3.8/site-packages/google/cloud/bigtable/client.py", line 332, in table_data_client
        transport = self._create_gapic_client_channel(
    (Python) File "/usr/local/lib/python3.8/site-packages/google/cloud/bigtable/client.py", line 285, in _create_gapic_client_channel
        channel = grpc_transport.create_channel(
    (Python) File "/usr/local/lib/python3.8/site-packages/google/cloud/bigtable_v2/services/bigtable/transports/grpc.py", line 217, in create_channel
        return grpc_helpers.create_channel(
    (Python) File "/usr/local/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 386, in create_channel
        return grpc.secure_channel(
    (Python) File "/usr/local/lib/python3.8/site-packages/grpc/__init__.py", line 2119, in secure_channel
        return _channel.Channel(
    (Python) File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 2046, in __init__
        self._channel = cygrpc.Channel(
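
The dump above shows a thread that holds the GIL while it sits inside channel creation. A stall like this is hard to observe from Python code, because other Python threads cannot make progress. One option, sketched here, is the standard library's `faulthandler.dump_traceback_later`, which the Python documentation describes as implemented with a watchdog thread. `run_with_stall_dump` and `slow_init` are hypothetical names used only for illustration; `slow_init` merely sleeps as a stand-in for a client initialization and does not reproduce the gRPC hang.

```python
import faulthandler
import tempfile
import time


def run_with_stall_dump(fn, timeout_s, dump_file):
    """Call fn(); if it has not returned after timeout_s seconds,
    write the stacks of all threads to dump_file and keep waiting."""
    # exit=False: only dump the stacks, do not terminate the process.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=dump_file)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn() has returned or raised.
        faulthandler.cancel_dump_traceback_later()


def slow_init():
    # Hypothetical stand-in for an initialization call that stalls.
    time.sleep(0.5)
    return "ready"


# dump_file must expose fileno(); a temporary file keeps the sketch self-contained.
with tempfile.TemporaryFile(mode="w+") as dump_file:
    result = run_with_stall_dump(slow_init, timeout_s=0.1, dump_file=dump_file)
    dump_file.seek(0)
    dump_text = dump_file.read()

print(dump_text)
```

In a test harness, the same wrapper could be placed around the suspect call with `dump_file=sys.stderr`, so that the stacks end up in the worker logs if the call stalls.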

Repro:

python -m pytest -o log_cli=True -o log_level=Info apache_beam/examples/cookbook/bigtableio_it_test.py::BigtableIOWriteTest::test_bigtable_write --test-pipeline-options='--runner=TestDataflowRunner --project=apache-beam-testing --temp_location=valentyn-testing --region=us-central1 --wait_until_finish_duration=36000000' --timeout=36000

I wasn't able to reproduce the error locally, outside of the Dataflow context.

cc: @mutianf FYI given that BT client is involved. It might be a wider issue, as discussed in #30867 and grpc/grpc#36256.

Issue Failure

Failure: Test is flaky

Issue Priority

Priority: 2 (backlog / disabled test but we think the product is healthy)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@liferoad
Collaborator

@mutianf Can you check this?

@tvalentyn
Contributor Author

gdb backtrace of the stuck thread that holds the GIL: grpc/grpc#36256 (comment)

damccorm pushed a commit that referenced this issue Apr 17, 2024
* Exclude currently available GAPIC versions affected by a GRPC regression

* Regenerate dependencies for Python containers.
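
The commit message describes excluding specific affected releases rather than capping the upper version bound. A minimal sketch of that idea, using the `!=` exclusion clauses of the PEP 440 requirement syntax, is shown here. The package name and every version number are hypothetical placeholders and are not the versions excluded by the actual commit.

```python
# Hypothetical package name and versions, for illustration only.
PACKAGE = "example-gapic-lib"
LOWER_BOUND = "9.0.0"
EXCLUDED_VERSIONS = {"9.9.2", "9.9.1"}


def to_requirement(name, lower_bound, excluded):
    """Build a requirement string that keeps the lower bound and
    excludes each listed release with a `!=` clause."""
    clauses = [f">={lower_bound}"] + [f"!={v}" for v in sorted(excluded)]
    return name + ",".join(clauses)


requirement = to_requirement(PACKAGE, LOWER_BOUND, EXCLUDED_VERSIONS)
print(requirement)
```

Excluding individual releases lets later, fixed releases be picked up without another change to the requirement.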
@tvalentyn
Contributor Author

fixed in #31044

@github-actions github-actions bot added this to the 2.57.0 Release milestone Apr 22, 2024