Feature Description
A clear and concise description of the feature you're requesting.
Add parameters:

- `batch-size`: an optional integer used to iteratively slice the gathered events and run the pipeline on that many events at a time. For example, if the gather for the specified time range finds 50 events but the batch size is 10, the pipeline runs 5 independent times, each processing 10 events.
- `skip-errored-events-during-processing`: ignore events that raise an error during processing. Enough debug info should be gathered / kept that the log printed after the pipeline finishes contains the event details and "the thing that errored".
- `skip-errored-events-during-gather`: ignore events that fail to scrape / gather. Similar to the parameter above, enough debug info should be printed after scraping, e.g. "Found 20 events, skipping 2 due to errors".

It would also be really interesting to see if I can allow retrying certain errors, e.g. `retry-errors=[ConnectionError]`.
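Roughly, the combined behavior I'm imagining looks like the sketch below. This is only an illustration, not the pipeline's real API: `run_pipeline`, `process`, and the event values are hypothetical stand-ins, and the retry count is an assumed default.

```python
from typing import Callable, Iterable, List, Optional, Tuple, Type


def batched(items: List, batch_size: Optional[int]) -> Iterable[List]:
    """Yield slices of at most batch_size items (one slice if batch_size is None)."""
    if batch_size is None:
        yield items
        return
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_pipeline(
    events: List,
    process: Callable,
    batch_size: Optional[int] = None,
    skip_errored_events_during_processing: bool = False,
    retry_errors: Tuple[Type[BaseException], ...] = (),
    max_retries: int = 3,  # assumed default, not from the source
) -> Tuple[List, List]:
    """Run `process` over events in independent batches.

    Returns (succeeded results, errored (event, error) pairs) so the
    errored list can be printed as a summary after the run.
    """
    succeeded, errored = [], []
    for batch in batched(events, batch_size):
        # Each batch is an independent pipeline run.
        for event in batch:
            for attempt in range(1, max_retries + 1):
                try:
                    succeeded.append(process(event))
                    break
                except Exception as exc:
                    # Retry only allow-listed error types, up to max_retries.
                    if isinstance(exc, retry_errors) and attempt < max_retries:
                        continue
                    if skip_errored_events_during_processing:
                        # Keep the event details and "the thing that errored"
                        # for the post-run log, then move on.
                        errored.append((event, repr(exc)))
                        break
                    raise
    return succeeded, errored
```

The key design point is that each batch (and each event) fails independently, so one bad video page no longer aborts a whole week of backfill.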
Use Case
Please provide a use case to help us understand your request in context.
I am backfilling a lot of data for certain instances and it is becoming annoying to process week by week. This is generally required for a couple of reasons:

- storage space on the machine (GHA runners only have 16 GB of disk, so I can't download and process more than ~4 meeting videos at a time) -- hence the batch size
- fewer than 1% of events hit errors that aren't transient connection errors. These are things like the video page being parsed incorrectly.