[FLINK-35232] Add retry settings for GCS connector #24753

Closed
wants to merge 2 commits into from
[FLINK-35232] Add retry settings for GCS connector
JTaky committed May 3, 2024
commit 07f00e1780b272d42f2d0d0a6df17925f5e08f5e
17 changes: 11 additions & 6 deletions docs/content.zh/docs/deployment/filesystems/gcs.md
@@ -79,13 +79,18 @@ You can also set `gcs-connector` options directly in the Hadoop `core-site.xml`

`flink-gs-fs-hadoop` can also be configured by setting the following options in the [Flink configuration file]({{< ref "docs/deployment/config#flink-配置文件" >}}):

| Key | Description |
|---------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| gs.writer.temporary.bucket.name | Set this property to choose a bucket to hold temporary blobs for in-progress writes via `RecoverableWriter`. If this property is not set, temporary blobs will be written to the same bucket as the final file being written. In either case, temporary blobs are written with the prefix `.inprogress/`. <br><br> It is recommended to choose a separate bucket in order to [assign it a TTL](https://cloud.google.com/storage/docs/lifecycle), to provide a mechanism to clean up orphaned blobs that can occur when restoring from check/savepoints.<br><br>If you do use a separate bucket with a TTL for temporary blobs, attempts to restart jobs from check/savepoints after the TTL interval expires may fail. |
| gs.writer.chunk.size | Set this property to [set the chunk size](https://cloud.google.com/java/docs/reference/google-cloud-core/latest/com.google.cloud.WriteChannel#com_google_cloud_WriteChannel_setChunkSize_int_) for writes via `RecoverableWriter`. <br><br>If not set, a Google-determined default chunk size will be used. |
| gs.filesink.entropy.enabled | Set this property to mitigate hotspotting issues on GCS and thereby improve performance. This option defines whether to enable entropy injection in the filesink GCS path. If it is enabled, entropy in the form of a temporary object id is injected at the beginning of the GCS path of the temporary objects. The final object path remains unchanged. |
| gs.http.connect-timeout | Set this property to [set the connection timeout](https://cloud.google.com/java/docs/reference/google-cloud-core/latest/com.google.cloud.http.HttpTransportOptions.Builder#com_google_cloud_http_HttpTransportOptions_Builder_setConnectTimeout_int_) for the java-storage client. |
| gs.http.read-timeout | Set this property to [set the content read timeout](https://cloud.google.com/java/docs/reference/google-cloud-core/latest/com.google.cloud.http.HttpTransportOptions.Builder#com_google_cloud_http_HttpTransportOptions_Builder_setReadTimeout_int_) for connections established via the java-storage client. |
| gs.retry.max-attempt | Set this property to [define the maximum number of retry attempts](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getMaxAttempts__) to perform. |
| gs.retry.init-rpc-timeout | Set this property to [set the timeout](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getInitialRpcTimeout__) for the initial RPC. Subsequent calls will use this value adjusted according to `gs.retry.rpc-timeout-multiplier`. |
| gs.retry.rpc-timeout-multiplier | Set this property to [control the change in RPC timeout](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getRpcTimeoutMultiplier__). The timeout of the previous call is multiplied by this value to calculate the timeout for the next call. |
| gs.retry.max-rpc-timeout | Set this property to [put a limit](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getMaxRpcTimeout__) on the value of the RPC timeout, so that `gs.retry.rpc-timeout-multiplier` can't increase the RPC timeout beyond this amount. |
| gs.retry.total-timeout | Set this property to [change the total duration](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getTotalTimeout__) during which retries can be attempted. |
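
To make the interaction between `gs.retry.init-rpc-timeout`, `gs.retry.rpc-timeout-multiplier` and `gs.retry.max-rpc-timeout` concrete, here is a minimal Python sketch of the arithmetic the table describes. It is only an illustration, not the connector's or the client library's code: the function name `rpc_timeouts`, its parameters, and the sample values (10 s initial timeout, multiplier 2, 60 s cap) are assumptions chosen for the example.

```python
def rpc_timeouts(init_timeout_s, multiplier, max_timeout_s, attempts):
    """Return the RPC timeout (in seconds) that each attempt would use.

    Hypothetical helper: it mirrors the description in the table, where
    each timeout is the previous one times the multiplier, capped at the
    maximum RPC timeout.
    """
    timeouts = []
    current = init_timeout_s
    for _ in range(attempts):
        # The cap keeps the multiplier from growing the timeout without bound.
        timeouts.append(min(current, max_timeout_s))
        current = current * multiplier
    return timeouts


# Illustrative values only; they are not defaults of the connector.
schedule = rpc_timeouts(init_timeout_s=10.0, multiplier=2.0,
                        max_timeout_s=60.0, attempts=5)
print(schedule)  # [10.0, 20.0, 40.0, 60.0, 60.0]
```

With a multiplier of 1.0 every attempt would use the initial timeout unchanged.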

### Authentication to access GCS

5 changes: 5 additions & 0 deletions docs/content/docs/deployment/filesystems/gcs.md
@@ -86,6 +86,11 @@ You can also set `gcs-connector` options directly in the Hadoop `core-site.xml`
| gs.filesink.entropy.enabled | Set this property to mitigate hotspotting issues on GCS and thereby improve performance. This option defines whether to enable entropy injection in the filesink GCS path. If it is enabled, entropy in the form of a temporary object id is injected at the beginning of the GCS path of the temporary objects. The final object path remains unchanged. |
| gs.http.connect-timeout | Set this property to [set the connection timeout](https://cloud.google.com/java/docs/reference/google-cloud-core/latest/com.google.cloud.http.HttpTransportOptions.Builder#com_google_cloud_http_HttpTransportOptions_Builder_setConnectTimeout_int_) for the java-storage client. |
| gs.http.read-timeout | Set this property to [set the content read timeout](https://cloud.google.com/java/docs/reference/google-cloud-core/latest/com.google.cloud.http.HttpTransportOptions.Builder#com_google_cloud_http_HttpTransportOptions_Builder_setReadTimeout_int_) for connections established via the java-storage client. |
| gs.retry.max-attempt | Set this property to [define the maximum number of retry attempts](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getMaxAttempts__) to perform. |
| gs.retry.init-rpc-timeout | Set this property to [set the timeout](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getInitialRpcTimeout__) for the initial RPC. Subsequent calls will use this value adjusted according to `gs.retry.rpc-timeout-multiplier`. |
| gs.retry.rpc-timeout-multiplier | Set this property to [control the change in RPC timeout](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getRpcTimeoutMultiplier__). The timeout of the previous call is multiplied by this value to calculate the timeout for the next call. |
| gs.retry.max-rpc-timeout | Set this property to [put a limit](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getMaxRpcTimeout__) on the value of the RPC timeout, so that `gs.retry.rpc-timeout-multiplier` can't increase the RPC timeout beyond this amount. |
| gs.retry.total-timeout | Set this property to [change the total duration](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#com_google_api_gax_retrying_RetrySettings_getTotalTimeout__) during which retries can be attempted. |
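
The two limits `gs.retry.max-attempt` and `gs.retry.total-timeout` both bound how long retrying continues. The Python sketch below shows one plausible way they combine, assuming that retrying stops as soon as either limit is reached and that a non-positive attempt limit means "no attempt limit". The helper `should_retry` and its parameters are hypothetical and written only for illustration; consult the linked `RetrySettings` reference for the authoritative behavior.

```python
def should_retry(attempts_made, elapsed_s, max_attempts, total_timeout_s):
    """Return True if another attempt is allowed under both limits.

    Hypothetical helper, assuming:
      * max_attempts <= 0 disables the attempt limit, and
      * retries stop once the elapsed time reaches the total timeout.
    """
    if max_attempts > 0 and attempts_made >= max_attempts:
        return False  # attempt budget exhausted
    return elapsed_s < total_timeout_s  # time budget still available?


# Illustrative values only; they are not defaults of the connector.
print(should_retry(attempts_made=1, elapsed_s=5.0,
                   max_attempts=3, total_timeout_s=50.0))
```

Under these assumptions, a job that has already made 3 of 3 allowed attempts gives up even if most of the total timeout is still unused.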

### Authentication to access GCS
