Skip deleting files that we just deleted #2185
We see this happening with Swift. Because Swift is only eventually consistent, it sometimes has not yet processed the deletion of the meta file, so the file still turns up in bkt.Iter(). The second deletion then returns a 404 and compaction fails. Signed-off-by: Wim Fournier <[email protected]>
Wow, this eventual consistency is really bad... (: But good to know! @khyatisoneji, do you mind taking a look at this PR and how we can fix this? You have quite good experience working with eventually consistent systems so far ❤️! Looks like yet another reason to finish this: https://thanos.io/proposals/201901-read-write-operations-bucket.md/
However, I still can't see the problem. Even with inconsistency, Iter should not repeat the same file 🤔 Is that really the case now?
The Swift logs tell us so; I'll ask our cloud team for some logs to add to this PR.
Signed-off-by: Wim Fournier <[email protected]>
The problem is that the code first deletes meta.json outside of the Iter loop in Delete(). Note that Amazon S3 will have the exact same issue, per the Amazon docs:
(Stripped) logs from Swift:
I think having this extra check would be good since we are deleting the
I mean... to me it looks like we actually want to always use Delayed Delete? @khyatisoneji
Just for clarification: by delayed delete, do you mean that we don't delete the block straight away, right? This case is about So this particular concern is not about deleting the blocks after some delay. Even if we delete the block after some delay, this scenario can still happen.
Signed-off-by: Wim Fournier <[email protected]>
Signed-off-by: Wim Fournier <[email protected]>
Just tested this PR in our integration environment: the errors we were seeing almost every hour are now gone.
Thanks for debugging this!
I explained below why the current fix is a bit unsafe and may introduce bugs someday, and gave a suggestion on how to avoid it.
Hope that makes sense (:
…on that allows to keep certain files. Signed-off-by: Wim Fournier <[email protected]>
Signed-off-by: Wim Fournier <[email protected]>
Thank you! Small nit only 👍
LGTM
pkg/block/block.go
```go
		return deleteDirRec(ctx, logger, bkt, name, keep)
	}
	if keep(name) {
		level.Debug(logger).Log("msg", "skipping deletion of object, as requested by keep()", "file", name, "bucket", bkt.Name())
```
This works as intended, so I would prefer no log line at all.
pkg/block/block.go
```diff
 }

-// deleteDir removes all objects prefixed with dir from the bucket.
+// deleteDirRec removes all objects prefixed with dir from the bucket. It skips objects that return true for the passed keep func
```
Suggested change:
```diff
-// deleteDirRec removes all objects prefixed with dir from the bucket. It skips objects that return true for the passed keep func
+// deleteDirRec removes all objects prefixed with dir from the bucket. It skips objects that return true for the passed keep function.
```
pkg/block/block.go
```diff
@@ -145,16 +145,23 @@ func Delete(ctx context.Context, logger log.Logger, bkt objstore.Bucket, id ulid
 		level.Debug(logger).Log("msg", "deleted file", "file", metaFile, "bucket", bkt.Name())
 	}

-	return deleteDir(ctx, logger, bkt, id.String())
+	// Delete the bucket, but skip the metaFile, if found. As we just deleted that.
```
Suggested change:
```diff
-	// Delete the bucket, but skip the metaFile, if found. As we just deleted that.
+	// Delete the bucket, but skip the metaFile as we just deleted that. This is required for eventual object storages (list after write).
```
Signed-off-by: Wim Fournier <[email protected]>
🎉
Thanks for fixing this 💪
... and sorry for being so strict here. I hope you felt supported and I am sure next PRs will be smoother! (:
* Removed dependency on Cortex fork; Moved to official one. (#2199) Signed-off-by: Bartlomiej Plotka <[email protected]> * Typo corrections quick-tutorial.md (#2196) * Corrected all Prometheus possessives to read `Prometheus's`, this matches Prometheus's own documentation. * Corrected `simple` to `simply` when describing compactor scanning behaviour Signed-off-by: Peter Avdjian <[email protected]> * tracing: Simplified creation of spans. (#2202) Signed-off-by: Bartlomiej Plotka <[email protected]> * Fixed links to dashboards json files. (#2203) Signed-off-by: Roman Grytskiv <[email protected]> * Skip deleting files that we just deleted (#2185) * Skip deleting files that we just deleted We see this happening with Swift. Because the consistency of swift is eventual, swift sometimes didn't process the deletion of the meta file yet, and so it turns up in the bkt.Iter(). The second deletion then causes a 404 and compaction fails. Signed-off-by: Wim Fournier <[email protected]> * return, as this is a func. Add debug log and comment Signed-off-by: Wim Fournier <[email protected]> * fixing build: wrong parameter name Signed-off-by: Wim Fournier <[email protected]> * fix lint Signed-off-by: Wim Fournier <[email protected]> * Refactor deleteDir into deleteDirRec and add a parameter for a function that allows to keep certain files. Signed-off-by: Wim Fournier <[email protected]> * Fix lint Signed-off-by: Wim Fournier <[email protected]> * implementing suggested fixes Signed-off-by: Wim Fournier <[email protected]> * improve web.route-prefix handling (#2208) This makes the handling of web.route-prefix more similar to the behavior in Prometheus. Correctly handles '/' and prefixes which do not begin with a '/'. 
Signed-off-by: Paul Gier <[email protected]> * Merge release-0.11 back into master (#2212) * Create release v0.11.0-rc.0 (#2156) * Update version to v0.11.0-rc.0 * Update CHANGELOG with all PRs for v0.11 * Improve CHANGELOG by being more explicit * Bumped minio-go library to v6.0.49, fixing an IAM bug in v6.0.45 (#2189) Signed-off-by: Kraig Amador <[email protected]> * Create release candidate v0.11.0-rc.1 (#2192) Signed-off-by: Matthias Loibl <[email protected]> * Release v0.11.0 (#2205) Signed-off-by: Matthias Loibl <[email protected]> * Update VERSION to 0.12.0-dev Signed-off-by: Matthias Loibl <[email protected]> * Resolve go.sum merge conflict and run go mod tidy Signed-off-by: Matthias Loibl <[email protected]> Co-authored-by: Kraig Amador <[email protected]> * returns error messages when trigger reload with http (#1848) * returns error messages when trigger reload with http Signed-off-by: arthur yang <[email protected]> * use simple reloadRules function instead of magic chan error error Signed-off-by: yapo.yang <[email protected]> * add tailing period for comment Signed-off-by: yapo.yang <[email protected]> * fix comment Signed-off-by: arthur yang <[email protected]> * add white space for better code reading Signed-off-by: arthur yang <[email protected]> * collect thanos rule metrics into one struct Signed-off-by: arthur yang <[email protected]> * remove termination logic and keep log only Signed-off-by: arthur yang <[email protected]> * update changelog for #1848 Signed-off-by: arthur yang <[email protected]> * add tailing period Signed-off-by: arthur yang <[email protected]> * check whether registry is nil Signed-off-by: arthur yang <[email protected]> * tailing period in metrics Signed-off-by: arthur yang <[email protected]> * cancel with context Signed-off-by: arthur yang <[email protected]> * return ctx.Err() instead of errors.New Signed-off-by: arthur yang <[email protected]> * register thanos rule metrics with promauto Signed-off-by: arthur yang 
<[email protected]> * return errs before set success related metrics Signed-off-by: arthur yang <[email protected]> * revert go.sum go.mod change Signed-off-by: arthur yang <[email protected]> * reload webhandler/sighup in one for loop Signed-off-by: arthur yang <[email protected]> * reload with chan chan error Signed-off-by: yapo.yang <[email protected]> * Fix error in component status help message (#2216) Signed-off-by: mcsammac Date: Wed Mar 4 13:50:17 2020 -0500 On branch master Changes to be committed: modified: pkg/prober/intrumentation.go Signed-off-by: s320009 <[email protected]> * tutorials: fix typo in image version (#2223) Signed-off-by: Paul Gier <[email protected]> * Blocked classic prometheus constructors, moved all to promauto; Removed unnecessary printfs. (#2228) Fixes: https://github.com/thanos-io/thanos/issues/2102 Also blocked them on CI side, thanks to https://github.com/fatih/faillint/pull/8 Signed-off-by: Bartlomiej Plotka <[email protected]> * ruler: Fix #2204 bug where alert queue is unpoppable causing full queue and dropped alerts (#2238) * Add test for alert queue Pop after multiple Push Signed-off-by: Robin Clarke-Williams <[email protected]> * Fix alert queue bug by resignal after Pop (#2204) Signed-off-by: Robin Clarke-Williams <[email protected]> * Fix alert queue test and simplify Signed-off-by: Robin Clarke-Williams <[email protected]> * Update CHANGELOG.md Signed-off-by: Robin Clarke-Williams <[email protected]> * Link to thanos-io/thanos PR in CHANGELOG.md Signed-off-by: Robin Clarke-Williams <[email protected]> * bucket: improve shard label handling (#2219) Signed-off-by: Jacob Colvin <[email protected]> * fixing querier deployment kube manifest example 404 error (#2229) Signed-off-by: Rajesh Rajendran <[email protected]> * *: Fix misuse of pkg/errors.Errorf and error directive (#2253) * Fix pkg/errors error directive issues Signed-off-by: Kemal Akkoyun <[email protected]> * Fix misuse of Errorf Signed-off-by: Kemal Akkoyun 
<[email protected]> * Fix false metric name in Store GW e2e test (#2256) Signed-off-by: Kemal Akkoyun <[email protected]> * Add scheme to the alertmanagers.url in ruler example (#2255) Signed-off-by: gitlawr <[email protected]> * Sort chunks by thanos.downsample.resolution for better grouping (#2231) Signed-off-by: Paul Traylor <[email protected]> * Remove duplicate log.level arg in quickstart.sh (#2148) Signed-off-by: Richard Poole <[email protected]> * tutorials: fix incorrect query (#2239) You would have to query `prometheus_tsdb_head_series` instead of `sum(prometheus_tsdb_head_series)` in order to get the 5 results when deduplicating. Signed-off-by: John Chen <[email protected]> * Use new go jsonnet formatter (#2258) Signed-off-by: Kemal Akkoyun <[email protected]> * docs: Document Thanos Sharding (#1922) * docs: Document Thanos Sharding Signed-off-by: Xiang Dai <[email protected]> * Add time partitioning Signed-off-by: Xiang Dai <[email protected]> * feedback Signed-off-by: Xiang Dai <[email protected]> * Sharding: document supported relabel action and add store gateway backgroud (#2272) * Sharding: document supported relabel action and add store gateway background Signed-off-by: Xiang Dai <[email protected]> * add hashmod Signed-off-by: Xiang Dai <[email protected]> * Add wait-interval flag (#2265) Signed-off-by: Kemal Akkoyun <[email protected]> * store: Optimized labels conversion on store.Series; Added unsafe labels conversion. (#2230) ## Changes * method TranslateLables CPU Optimized (streamed sorting). * All store GW label conversation to []storepb.Label are now alloc-less. ``` go test -bench=BenchmarkUnsafeVSSafeLabelsConversion -run=^$ -benchmem -timeout 2h -benchtime 10s ./pkg/store/storepb/... 
goos: linux goarch: amd64 pkg: github.com/thanos-io/thanos/pkg/store/storepb BenchmarkUnsafeVSSafeLabelsConversion/safe-12 34822 339076 ns/op 655368 B/op 2 allocs/op BenchmarkUnsafeVSSafeLabelsConversion/unsafe-12 1000000000 2.32 ns/op 0 B/op 0 allocs/op PASS ``` TODO: Do the same on Querier. Signed-off-by: Bartlomiej Plotka <[email protected]> * fix: Ignore the OS-X Trash (#2274) Signed-off-by: kushthedude <[email protected]> * docs/sharding.md: fix a typo (#2273) Signed-off-by: Xiang Dai <[email protected]> * fix replicate duplicate metrics (#2254) Signed-off-by: yeya24 <[email protected]> * Document downsample component (#2090) * scripts/genflagdocs.sh: Generate downsample flag Signed-off-by: Xiang Dai <[email protected]> * Document downsample component Signed-off-by: Xiang Dai <[email protected]> * Move downsample as bucket sub-command Signed-off-by: Xiang Dai <[email protected]> * update docs Signed-off-by: Xiang Dai <[email protected]> * feedback Signed-off-by: Xiang Dai <[email protected]> * Crashing error messages now will print stacktrace. (#2277) Signed-off-by: Bartlomiej Plotka <[email protected]> * Downsample: update changelog (#2278) * Downsample: update changelog Signed-off-by: Xiang Dai <[email protected]> * feedback Signed-off-by: Xiang Dai <[email protected]> * thanos-mixin: clear units/axis (#2279) * thanos-mixin: clear units/axis Signed-off-by: Xiang Dai <[email protected]> * fix nits Signed-off-by: Xiang Dai <[email protected]> * store, compact, bucket: Delay deletes by scheduling block deletion with deletion-mark.json file (#2136) Signed-off-by: khyatisoneji <[email protected]> * Use maxInt instead of math.MaxInt64 (#2268) math.MaxInt64 doesn't work on 32-bit systems (like linux/arm builds) Signed-off-by: Peter Štibraný <[email protected]> * Replace objstore.Exists function calls with bkt.Exists (#2284) Signed-off-by: khyatisoneji <[email protected]> * Added Xiang to Triage Role. 
(#2289) Signed-off-by: Bartlomiej Plotka <[email protected]> * Enrich Memcached client logs (#2292) * Enrich Memcached client logs Signed-off-by: Marco Pracucci <[email protected]> * Update pkg/cacheutil/memcached_client.go Signed-off-by: Marco Pracucci <[email protected]> Co-Authored-By: Bartlomiej Plotka <[email protected]> * Update pkg/cacheutil/memcached_client.go Signed-off-by: Marco Pracucci <[email protected]> Co-Authored-By: Bartlomiej Plotka <[email protected]> Co-authored-by: Bartlomiej Plotka <[email protected]> * Added Kemal to Triage Role. (#2293) Signed-off-by: Bartlomiej Plotka <[email protected]> * bucket: handle instances where no blocks are loaded (#2271) * bucket: handle instances where no blocks are loaded Signed-off-by: Jacob Colvin <[email protected]> * bucket: reject all falsy label values Signed-off-by: Jacob Colvin <[email protected]> * bucket: update changelog Signed-off-by: Jacob Colvin <[email protected]> * docs/sharding.md: Replace example floating link with permalink (#2296) Signed-off-by: Frederic Branczyk <[email protected]> * Added latest release badge. (#2300) I think there are NOT enough badges, so added one more! Signed-off-by: Bartlomiej Plotka <[email protected]> * store: Postings fetching optimizations (#2294) * Avoid fetching duplicate keys. Simplified groups with add/remove keys. Signed-off-by: Peter Štibraný <[email protected]> * Added shortcuts Signed-off-by: Peter Štibraný <[email protected]> * Optimize away fetching of ALL postings, if possible. Only remove postings for each key once. Signed-off-by: Peter Štibraný <[email protected]> * Don't do individual index.Without, but merge them first. Signed-off-by: Peter Štibraný <[email protected]> * Don't use map for fetching postings, but return slice instead. This is in line with original code. Using a map was nicer, but more expensive in terms of allocations and hashing labels. Signed-off-by: Peter Štibraný <[email protected]> * Renamed 'all' to 'allRequested'. 
Signed-off-by: Peter Štibraný <[email protected]> * Typo Signed-off-by: Peter Štibraný <[email protected]> * Make linter happy. Signed-off-by: Peter Štibraný <[email protected]> * Added comment to fetchPostings. Signed-off-by: Peter Štibraný <[email protected]> * Group vars Signed-off-by: Peter Štibraný <[email protected]> * Comments Signed-off-by: Peter Štibraný <[email protected]> * Use allPostings and emptyPostings variables for special cases. Signed-off-by: Peter Štibraný <[email protected]> * Unify terminology to "special All postings" Signed-off-by: Peter Štibraný <[email protected]> * Address feedback. Signed-off-by: Peter Štibraný <[email protected]> * Added CHANGELOG.md entry. Signed-off-by: Peter Štibraný <[email protected]> * Fix check for empty group. Signed-off-by: Peter Štibraný <[email protected]> * Comment Signed-off-by: Peter Štibraný <[email protected]> * Special All postings is now added as a new group No special handling required anymore. Signed-off-by: Peter Štibraný <[email protected]> * Updated comment Signed-off-by: Peter Štibraný <[email protected]> * cmd/thanos/receive: Remove unused TLSClientConfig from Options (#2299) Signed-off-by: mrIncompetent <[email protected]> * compactor: Add ReplicaLabelRemover as MetaFetcher filter to enable offline vertical compaction/deduplication for replicated data (#2250) * Create ReplicaLabelsFilter to allow for offline deduplication Signed-off-by: Matthias Loibl <[email protected]> * Start adding a e2e test for offline-deduplication with Thanos compact Signed-off-by: Matthias Loibl <[email protected]> * Address issues that have discovered after review Signed-off-by: Kemal Akkoyun <[email protected]> * Fix e2e test service issue Signed-off-by: Kemal Akkoyun <[email protected]> * Improve fetcher unit tests Signed-off-by: Kemal Akkoyun <[email protected]> * Add simple compactor e2e tests with replica remover Signed-off-by: Kemal Akkoyun <[email protected]> * Remove unnecessary interface Signed-off-by: Kemal 
Akkoyun <[email protected]> * Address review issues Signed-off-by: Kemal Akkoyun <[email protected]> * Add more test cases Signed-off-by: Kemal Akkoyun <[email protected]> * Improve and stabilize e2e tests Signed-off-by: Kemal Akkoyun <[email protected]> * Address review issues Signed-off-by: Kemal Akkoyun <[email protected]> * Increase ruler sd refresh interval Signed-off-by: Kemal Akkoyun <[email protected]> * Address review issues Signed-off-by: Kemal Akkoyun <[email protected]> * Separate filters and modifiers Signed-off-by: Kemal Akkoyun <[email protected]> Co-authored-by: Matthias Loibl <[email protected]> * docs/release: squat to release v0.12.0 (#2312) Signed-off-by: Lucas Servén Marín <[email protected]> * cmd/thanos/receive: Serve TLS when TLSConfig is given (#2311) Signed-off-by: mrIncompetent <[email protected]> Signed-off-by: Lucas Servén Marín <[email protected]> Co-authored-by: mrIncompetent <[email protected]> * cmd/thanos/compact: add bucket UI (#1714) This commit enhances the compact component so that it runs the bucket UI whenever the --wait flag is also passed. In order to reduce the overhead of running the UI in addition to the compactor, this commit also refactors the compactor and bucket commands a bit in order to re-use a single meta fetcher. Signed-off-by: Lucas Servén Marín <[email protected]> * reloadRules initlialization should fail (#2301) Signed-off-by: arthur yang <[email protected]> * Fixed inconsistent metrics and methods (#2319) Signed-off-by: jojohappy <[email protected]> * e2e: Refactored compactor test; Fixed flakiness. (#2313) Also: * Reduced number of services for e2e for latency * Fixed halting * Improved logging. 
* Improved test cases (e.g added test for compaction and halting) Signed-off-by: Bartlomiej Plotka <[email protected]> * pkg/store: Report no data if no stores discovered (#2310) * pkg/store: Report no data if no stores discovered Signed-off-by: Frederic Branczyk <[email protected]> * CHANGELOG.md: Add timespan reported on empty stores Signed-off-by: Frederic Branczyk <[email protected]> * Added max_item_size to Memcached client (#2304) * Added max_item_size to Memcached client Signed-off-by: Marco Pracucci <[email protected]> * Changed imports order and splitted tests Signed-off-by: Marco Pracucci <[email protected]> * Fixed type casting Signed-off-by: Marco Pracucci <[email protected]> * Changed imports grouping Signed-off-by: Marco Pracucci <[email protected]> * Changed memcached max_item_size default from 0 to 1MB Signed-off-by: Marco Pracucci <[email protected]> * Increased e2e tests timeout Signed-off-by: Marco Pracucci <[email protected]> * Fixed typo in CHANGELOG Signed-off-by: Marco Pracucci <[email protected]> * Reverted Makefile changes Signed-off-by: Marco Pracucci <[email protected]> * tesutil: Enchanced testutil, refactored for our needs. (#2325) Changed LICENSE as we no longer use version we copied back then. Most of it was reimplemented. Why? * Much richer diff (inspired by testify packages * Consistent API * Less indentation. Signed-off-by: Bartlomiej Plotka <[email protected]> * make, ci: Check example alerts and rules in CI (#2318) * Check example alerts and rules in CI Signed-off-by: Kemal Akkoyun <[email protected]> * Add require clean tree Signed-off-by: Kemal Akkoyun <[email protected]> * Fix latency alerts (#2316) Signed-off-by: Kemal Akkoyun <[email protected]> * Fixed e2e. (#2327) Sorry, was late when we merged the fix. Funny bug: It would start to fail exactly 12h AFTER 25.03 8:00 GMT Should be fine now... 
and in future until changed ;p Signed-off-by: Bartlomiej Plotka <[email protected]> * store: added option to reencode and compress postings before storing them to the cache (#2297) * Added "diff+varint+snappy" codec for postings. Signed-off-by: Peter Štibraný <[email protected]> * Added option to reencode and compress postings stored in cache Signed-off-by: Peter Štibraný <[email protected]> * Expose enablePostingsCompression flag as CLI parameter. Signed-off-by: Peter Štibraný <[email protected]> * Use "github.com/pkg/errors" instead of "errors" package. Signed-off-by: Peter Štibraný <[email protected]> * remove break Signed-off-by: Peter Štibraný <[email protected]> * Removed empty branch Signed-off-by: Peter Štibraný <[email protected]> * Added copyright headers. Signed-off-by: Peter Štibraný <[email protected]> * Added CHANGELOG.md entry Signed-off-by: Peter Štibraný <[email protected]> * Added comments. Signed-off-by: Peter Štibraný <[email protected]> * Use Encbuf and Decbuf. Signed-off-by: Peter Štibraný <[email protected]> * Fix comments in test file. Signed-off-by: Peter Štibraný <[email protected]> * Another comment... Signed-off-by: Peter Štibraný <[email protected]> * Removed diffVarintSnappyEncode function. Signed-off-by: Peter Štibraný <[email protected]> * Comment on usage with in-memory cache. Signed-off-by: Peter Štibraný <[email protected]> * var block Signed-off-by: Peter Štibraný <[email protected]> * Removed extra comment. Signed-off-by: Peter Štibraný <[email protected]> * Move comment to error message. Signed-off-by: Peter Štibraný <[email protected]> * Separated snappy compression and postings reencoding into two functions. There is now header only for snappy-compressed postings. Signed-off-by: Peter Štibraný <[email protected]> * Added comment on using diff+varint+snappy. Signed-off-by: Peter Štibraný <[email protected]> * Shorten header Signed-off-by: Peter Štibraný <[email protected]> * Lint... 
Signed-off-by: Peter Štibraný <[email protected]> * Changed experimental.enable-postings-compression to experimental.enable-index-cache-postings-compression Signed-off-by: Peter Štibraný <[email protected]> * Added metrics for postings compression Signed-off-by: Peter Štibraný <[email protected]> * Added metrics for postings decompression Signed-off-by: Peter Štibraný <[email protected]> * Reorder metrics Signed-off-by: Peter Štibraný <[email protected]> * Fixed comment. Signed-off-by: Peter Štibraný <[email protected]> * Fixed comment. Signed-off-by: Peter Štibraný <[email protected]> * Use encode/decode labels. Signed-off-by: Peter Štibraný <[email protected]> * mixin: Make alert threshold values parametric (#2317) * Make alert threshold values parametric Signed-off-by: Kemal Akkoyun <[email protected]> * Rename variable Signed-off-by: Kemal Akkoyun <[email protected]> * Adjsut default values for latency thresholds Signed-off-by: Kemal Akkoyun <[email protected]> * Update UW logo (#2329) Signed-off-by: Povilas Versockas <[email protected]> * block fetcher with errgroup (#2309) * block fetcher with errgroup Signed-off-by: arthur yang <[email protected]> * errorgroup goroutine defer close Signed-off-by: arthur yang <[email protected]> * website: fix 404 on root of sections (#2328) Signed-off-by: Prem Kumar <[email protected]> * Add mallgroup.com to adopters (#2331) Signed-off-by: Daniel Rataj <[email protected]> Co-authored-by: Daniel Rataj <[email protected]> * store: Binary index header is now production ready and enabled by default (#2330) * store: Binary index header is now production ready and enabled by default. Signed-off-by: Bartlomiej Plotka <[email protected]> * Fixed typo. 
Signed-off-by: Bartlomiej Plotka <[email protected]> * Add leboncoin company as adopter (#2333) Signed-off-by: Guillaume Chenuet <[email protected]> * website: Collapsible menu sections (#2336) * website: make sidemenu collapsed by default Signed-off-by: Prem Kumar <[email protected]> * website: add caret svg in expandble sidemenu Signed-off-by: Prem Kumar <[email protected]> * website: expand current section's sidemenu by default Signed-off-by: Prem Kumar <[email protected]> * ui: fix store never removed from /stores page bug (#2339) * ui: fix store never removed from /stores page bug We need to update `LastCheck` only if the error is non-nil. That field is used in the cleanup function to know when to remove the StoreAPI from the UI. If we always update it, even if an error has happened, that means that `--store.unhealthy-timeout` is never respected. Signed-off-by: Giedrius Statkevičius <[email protected]> * query: fix storeset Update() test Now let's start with a proper state where LastCheck is not 0 at the beginning and we have 2 active stores, 3 store statuses just like the original author had intended. Signed-off-by: Giedrius Statkevičius <[email protected]> * fix typo in readme (#2342) data -> date Signed-off-by: afirth <[email protected]> * query: add --store-strict flag (#2337) * query: add --store-strict flag Add a new flag called `--store-strict` as agreed per https://thanos.io/proposals/202001_thanos_query_health_handling.md/ I have updated the proposal to reflect the reality. Third time's the charm, I believe it :-) Now the flag is called `--store-strict` which only accepts statically defined nodes. I guess the code is even simpler now. I have also fixed one small issue where `%w` was used in `errors.Errorf`. Couldn't compile Thanos locally with Go 1.14 without this fix. 
Signed-off-by: Giedrius Statkevičius <[email protected]> * CHANGELOG: fix changelog item Signed-off-by: Giedrius Statkevičius <[email protected]> * Register grpc prometheus middleware metrics (#2347) Signed-off-by: Kemal Akkoyun <[email protected]> * website: Enabled two scripts to fix Google analytics. (#2346) * website: Enabled two scripts to fix Google analytics. Signed-off-by: Bartlomiej Plotka <[email protected]> * Fixed also inline style. Signed-off-by: Bartlomiej Plotka <[email protected]> * Added Workfront as adopter (#2351) Signed-off-by: Ryan Orth <[email protected]> Co-authored-by: Ryan Orth <[email protected]> * compact: Fixed minor logging issues. (#2353) Fixes: https://github.com/thanos-io/thanos/issues/2322 Signed-off-by: Bartlomiej Plotka <[email protected]> * fetcher: Made metaFetcher go routine safe; Fixed multiple bucket UI + fetcher issues. (#2354) Fixed https://github.com/thanos-io/thanos/issues/2349 Fixed races (we were reusing fetcher by both bucket UI and compaction syncs... Fixed logging Added singleflight to ensure we don't synchronize too often. 
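We see this happening with Swift. Block deletion removes the meta file first and then iterates over the remaining objects in the block directory with `bkt.Iter()`. Because Swift is only eventually consistent, it sometimes has not yet processed the deletion of the meta file when the listing is produced, so the already-deleted file still turns up in `bkt.Iter()`. Deleting it a second time then returns a 404 and compaction fails.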
Changes
Verification