S3Cache improperly requires ListBucket permission (incurs higher cost) #609

Open
daniel-keller opened this issue Sep 28, 2022 · 2 comments

Comments


TL;DR:
I don't think Cantaloupe should use or require the "ListBucket" AWS S3 permission for the S3 Cache or S3 Source (List requests cost more than Get requests). Without the "ListBucket" permission, Cantaloupe can read images from an S3 Source and read/write manifests to the S3 Cache, but it breaks when writing images to the S3 Cache. Any insight into why this is?

Evidence of high ListBucket calls when implementing Cantaloupe.
Cantaloupe makes 100K ListBucket API calls but only 800 GetObject API calls.
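
To put the two request counts in cost terms, here is a rough sketch. The per-1,000-request prices are assumptions for illustration (they vary by region and storage class and change over time), and `request_cost` is a hypothetical helper, not part of Cantaloupe or the AWS SDK.

```python
# Assumed prices in USD per 1,000 requests (illustrative only; check current AWS pricing).
ASSUMED_PRICE_PER_1000 = {"LIST": 0.005, "GET": 0.0004}


def request_cost(kind: str, count: int) -> float:
    """Return the estimated cost in USD of `count` requests of the given kind."""
    return ASSUMED_PRICE_PER_1000[kind] * count / 1000


# Request counts taken from the figures reported above.
list_cost = request_cost("LIST", 100_000)
get_cost = request_cost("GET", 800)
print(f"ListBucket: ${list_cost:.5f}  GetObject: ${get_cost:.5f}")
```

Under these assumed prices, a single List request costs roughly an order of magnitude more than a single Get request, on top of the much higher call count.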
[Two screenshots attached: "Screen Shot 2022-09-28 at 4 24 11 PM" and "Screen Shot 2022-09-28 at 4 23 56 PM"]

Long version:
I recently set up Cantaloupe in a Docker container on AWS, using two S3 buckets as source and cache. The setup works well and I'm happy with the performance. However, I've noticed an unusually high number of "ListBucket" S3 API requests being made by Cantaloupe, and these are relatively expensive. I'm wondering why Cantaloupe needs "ListBucket" at all, since I am providing the bucket name and region for both the cache and the source.

When I disable the ListBucket permission on my AWS user (keeping other permissions, e.g. ReadObject, WriteObject, DeleteObject, etc.), Cantaloupe successfully reads images/manifests from the source and writes manifests to the cache, but fails to write images to the cache.
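
For reference, a policy of the kind described might look like the sketch below. The bucket names are hypothetical, and mapping the permissions named above onto the IAM actions `s3:GetObject`, `s3:PutObject` and `s3:DeleteObject` is my assumption; the point is only that `s3:ListBucket` is absent.

```python
import json


def object_only_policy(bucket_names):
    """Build an IAM policy document granting object-level access only (no s3:ListBucket)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                # Object-level actions apply to keys inside the bucket, hence the "/*".
                "Resource": [f"arn:aws:s3:::{name}/*" for name in bucket_names],
            }
        ],
    }


# Hypothetical bucket names, for illustration only.
policy = object_only_policy(["example-source-bucket", "example-cache-bucket"])
print(json.dumps(policy, indent=2))
```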

Examples:

Going to https://localhost:8182/iiif/3/my_test_image.jpg/info.json logs a "warning" (as I expect if Cantaloupe is calling ListBucket and being denied) but the manifest.json file is still delivered AND cached.

cantaloupe-iiif-server-cantaloupe-1  | 20:40:44.756 [qtp593415583-12] WARN  e.i.l.c.c.InfoService - getOrReadInfo(): software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: 2KSGKZVB6HCCD33R, Extended Request ID: zpfKiHuSviC0T3RnGY5nZwQgKkiDruOicV9v6oGe28KWnhdblj1hk0CGwU9jwdnh/n2qQjZpKxg=)

However, going to https://localhost:8182/iiif/3/my_test_image.jpg/full/max/0/default.jpg logs an access denied "error", and the image is delivered from the source but not written to the cache.

cantaloupe-iiif-server-cantaloupe-1  | 20:44:15.545 [qtp593415583-15] ERROR e.i.l.c.r.ImageRequestHandler - software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: DA0RZSGY0BSNAKHW, Extended Request ID: WAt93dcyza4WtIxRFkrRPI+g9qGNFLCHqA7l4BQdodbq7JYM+h5kUO5N0Adr/auBhgluZ3+g8eQ=)

Since I can't find "ListBucket" in the Cantaloupe source code, I assume it's being used internally by the AWS SDK, and I suspect it has something to do with the "GetObjectRequest" API call made here.
The AWS SDK docs for "GetObjectRequest" note that if ListBucket is not permitted, "Access Denied" will be thrown instead of "No Such Key" when the requested object does not exist.

My theory is this: Cantaloupe queries the cache for the key and, not finding it, receives "Access Denied" when it expects to receive "No Such Key". "Access Denied" doesn't necessarily imply "no access to the bucket"; it could also mean "no permission to list the bucket".
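
The theory can be modelled with a small simulation. This is not Cantaloupe or AWS SDK code: `get_object_status` is a hypothetical stand-in for how S3 answers a GetObject call from a caller that may read objects, and `is_cache_miss_strict` is a hypothetical cache check that only recognises 404.

```python
def get_object_status(bucket: dict, key: str, can_list: bool) -> int:
    """Model the HTTP status S3 returns for GetObject when the caller may read objects.

    For a missing key, S3 reveals "No Such Key" (404) only if the caller also
    holds ListBucket; otherwise it answers "Access Denied" (403).
    """
    if key in bucket:
        return 200
    return 404 if can_list else 403


def is_cache_miss_strict(status: int) -> bool:
    """Hypothetical cache check that treats only 404 as 'not cached yet'."""
    return status == 404


# A cache bucket holding one manifest but no derivative image yet.
cache_bucket = {"info/my_test_image.json": b"{}"}
status = get_object_status(cache_bucket, "image/my_test_image.jpg", can_list=False)
print(status, is_cache_miss_strict(status))
```

With `can_list=False`, the missing image yields 403, so the strict check does not see an ordinary cache miss.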

Author

daniel-keller commented Sep 28, 2022

Possible Solution:
I think the root cause is here. In this if statement, a "403" response could also mean "No Such Key" rather than a genuine "Access Denied". I believe 403 should be added as an exception to this if statement. If the user really doesn't have permission, another error will be thrown when reading from the source or writing to the cache, which still prevents malicious behavior.
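
A minimal sketch of the proposed change, in Python rather than Cantaloupe's Java. `S3Error` is a hypothetical stand-in for the SDK's exception type, and `read_from_cache` and `fake_fetch` are illustrative names, not Cantaloupe's actual methods.

```python
class S3Error(Exception):
    """Hypothetical stand-in for an SDK exception carrying an HTTP status code."""

    def __init__(self, status_code: int):
        super().__init__(f"S3 error {status_code}")
        self.status_code = status_code


def read_from_cache(fetch, key):
    """Return the cached bytes, or None when the object is (probably) not cached.

    Both 404 and 403 count as a miss, because without ListBucket a missing
    key is reported as 403. Any other error is re-raised.
    """
    try:
        return fetch(key)
    except S3Error as err:
        if err.status_code in (403, 404):
            return None
        raise


def fake_fetch(key):
    """Simulated GetObject against a cache where only one key exists."""
    store = {"info/my_test_image.json": b"{}"}
    if key not in store:
        raise S3Error(403)  # what S3 reports for a missing key without ListBucket
    return store[key]


print(read_from_cache(fake_fetch, "image/my_test_image.jpg"))
```

The trade-off is that a genuine permissions problem on reads would also look like a miss here, which is why the argument above relies on the later read or write failing loudly.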

@daniel-keller changed the title from "S3Cache incurs" to "S3Cache improperly requires ListBucket permission (incurs higher cost)" Sep 28, 2022
@DiegoPino
Contributor

@daniel-keller this is still an issue. We have made some inquiries to AWS about this (and are waiting for an official response about the inner workings), but after reading through all the code here and checking the SDK properly, my current perception is that the issue might actually be related to the walkObjects method in the S3 utils class, which is used not for getting but for purging. Since Cantaloupe does not keep any local state about which cached objects it generated in the past, and depends on the time each object was last changed to decide what to purge, there is a BIG chance that it is overdoing the listing of objects under the cache/image cache prefix (and it also does not cache that response once it has traversed the whole bucket). And thus the costs are going nuts.
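
To illustrate why repeated full traversals add up, here is a back-of-the-envelope sketch. It assumes each traversal pages through the whole prefix and that a list response holds at most 1,000 keys; the object count and traversal frequency are made-up numbers, and `list_requests_for_walks` is a hypothetical helper.

```python
import math

MAX_KEYS_PER_LIST_PAGE = 1000  # a single S3 list response holds at most 1,000 keys


def list_requests_for_walks(object_count: int, walks: int) -> int:
    """Estimate the List requests needed to traverse `object_count` keys `walks` times."""
    # Even an empty prefix costs one request per traversal.
    pages = max(1, math.ceil(object_count / MAX_KEYS_PER_LIST_PAGE))
    return pages * walks


# Hypothetical workload: 250,000 cached objects, traversed 24 times a day.
daily_list_requests = list_requests_for_walks(250_000, 24)
print(daily_list_requests)
```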

We have many Cantaloupes running and this is becoming an issue. Sadly, you cannot deny the s3:ListBucket permission if you want cache expiration at all, even if we code around the 403 response in the GET part (the line you pointed to). I would love to explore optimizing that listing (even if only as an option), maybe by keeping a local cache of generated derivative prefixes so the listing is less expensive. We will run a few servers without cache expiration for a month to check whether that is the case.
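
One possible shape for the local-state idea, as a rough sketch only. `LocalCacheIndex` and its methods are hypothetical and not part of Cantaloupe; it assumes a single server records every key it writes, so that expired keys can be found without listing the bucket.

```python
import time


class LocalCacheIndex:
    """Hypothetical in-memory record of cache keys written by this server."""

    def __init__(self):
        self._written_at = {}  # key -> UNIX timestamp of the last write

    def record_write(self, key, now=None):
        """Remember that `key` was just written to the cache bucket."""
        self._written_at[key] = time.time() if now is None else now

    def expired_keys(self, ttl_seconds, now=None):
        """Return keys older than the TTL, found without any List request."""
        now = time.time() if now is None else now
        return sorted(k for k, t in self._written_at.items() if now - t > ttl_seconds)

    def forget(self, key):
        """Drop a key after it has been deleted from the bucket."""
        self._written_at.pop(key, None)


index = LocalCacheIndex()
index.record_write("image/a.jpg", now=0)
index.record_write("image/b.jpg", now=90)
print(index.expired_keys(ttl_seconds=60, now=100))
```

A real implementation would need to persist this index and reconcile it across multiple servers, which this sketch does not attempt.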
