-
Notifications
You must be signed in to change notification settings - Fork 792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gcs-pubsub pr #1658
base: main
Are you sure you want to change the base?
gcs-pubsub pr #1658
Conversation
Pardon my mediocre diagram, but this is the scenario I am using to test an MVP. Read it from the bottom up, wherein logs are generated and flushed periodically as NLD-JSON files to a GCS bucket. This triggers a bucket event notification on to a Pub/Sub topic and then dropped on a subscription Benthos watches. Benthos then reads event notification information from the message to determine the bucket and object name and the input then proceeds to download the referenced file. See this repository for Pulumi code which builds the aforementioned scenario: https://github.com/mhite/benthos-gcp-dev/tree/main/input_cloud_storage |
Thank you for working on this, I have exactly this use case and was thinking about an implementation, glad you got to it first. |
@mhite is this already functional? What parts are still in progress, is it only the TODO items in the comments? I'm testing on a local fork on my machine and I think you can do the above like this:
|
Yes, I've noted stuff that I don't understand/want to discuss/need to fix in the TODO items. It does run and "works" but is not merge worthy yet due to some of the outstanding issues I've noted.
Yes, would take a similar approach, although you won't find |
Maybe it makes sense to check for What do you think about a config option to point to a cache resource to de-duplicate on gcs_key? At work we built an event-based log ingestion pipeline and one of the pain points was handling when a production system uploads the same log file twice. I think it would make sense to de-duplicate based on the pubsub message content before the object is even retrieved. |
Well, the
Oh, that's an interesting idea. But it should probably be captured in a different issue/PR for tracking / simplicity sake. |
I think I have an approach that addresses the problematic gcsfuse behavior[1]. By checking the object generation before download, we can avoid the race condition of potentially consuming the non-zero byte version of a file twice. Additionally, we will refuse to process 0 byte files [if the user has json object payload enabled for the notification config]. I'll do a bit of testing this weekend and update the PR. |
@loicalleyne - want to give this latest version a test? |
@mhite ran a short smoke test, everything appeared to work ok. |
@mhite is this PR still a WIP? |
@loicalleyne - Should be ready for a review. I will update PR title. At some point I want to add functionality that allows a user to choose (by configuration) to acknowledge (ie. delete) bad/unsupported Pub/Sub messages. For now, you very much need to have a dead letter queue setup. |
This looks like a good start. I did a bit of search and it looks like there’s an official PubSub emulator that can be run via gcloud to add integration tests. Annoyingly, in Google fashion, they don’t offer a Docker container for this, but I see there’s a maintained 3rd party one over at https://hub.docker.com/r/thekevjames/gcloud-pubsub-emulator. I’ll defer to someone else for a proper review, since I’m not very familiar with PubSub |
@mihaitodor - There is unfortunately no official Cloud Storage emulator which also makes things tricky. I did come across this, though - https://github.com/fsouza/fake-gcs-server I think it might also support publishing cloud storage trigger notifications to Pub/Sub, so perhaps it could all be wired up for something simulating end-to-end environment. |
@mhite did you ever make this as a plugin? |
@loicalleyne - The PR should work but needs merge conflicts resolved along with integration tests authored. |
A work in progress MVP for #1034