Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow filesystem to eager load select paths #3630

Draft
wants to merge 6 commits into
base: 7.dev
Choose a base branch
from

Conversation

bryannielsen
Copy link
Contributor

Update filemanager to use this functionality instead of attempting to eager load the entire contents. The old behavior was problematic with remote filesystems containing many files. Related to #2564

This change allows adapters to implement functionality in an eagerLoadPaths($paths = []) method when applicable.

CachedAdapter contains a lot of duplicate code from Flysystem's Cached\CachedAdapter which it extends because the parent's private properties do not lend themselves to a simpler subclass.

Update filemanager to use this functionality instead of attempting to eager load the entire contents.  The old behavior was problematic with remote filesystems containing many files.  Related to #2564
@bryannielsen bryannielsen added the enhancement New feature or request label Jul 24, 2023
@bryannielsen bryannielsen added this to the 7.3.9 milestone Jul 24, 2023
Copy link
Contributor

@intoeetive intoeetive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryannielsen - see the couple of comments I had

Comment on lines 298 to 299
if ($file->UploadDestination->adapter != 'local' && $file->UploadDestination->exists()) {
if(!array_key_exists($file->upload_location_id, $carry)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to swap these lines? I think most of the times the files would be sharing upload destination, so checking whether we already have the info before going into UploadDestination details should be more effective. Also might be worth checking if UploadDestination is not null - for edge cases when the folder record has been removed, but the file record stays - just so it wouldn't bomp out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine the way it is, that line 298 is just the old line 306 (https://github.com/ExpressionEngine/ExpressionEngine/pull/3630/files/6135f94929ff29b3d01e8a8b44d200d8a65e8b75#diff-cd5b889f5e6fcff6954cdafc0347f4ecdc532ad6e03ce5c5d115ce956fc32644L306) and the UploadDestination relationship gets eager loaded so the performance impact should be trivial. We would also have to do this same check later when $carry hasn't been set yet and I think this way is a little more readable.

$paths[] = $file->getAbsoluteManipulationPath('thumbs');
}

$uploadDestination->getFilesystem()->eagerLoadPaths($paths);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to check if eagerLoadPath is available method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filesystem has an implementation for eagerLoadPaths() so we don't need to check here, but inside that function it checks to see whether the current adapter has an implementation before calling it

@intoeetive intoeetive modified the milestones: 7.3.9, 7.3.10 Jul 28, 2023
Copy link
Contributor

@intoeetive intoeetive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using CloudFiles 1.0 (did not test 1.1) the subfolders don't show up. They work fine on current 7.dev

@bryannielsen
Copy link
Contributor Author

Good catch Yuri, this latest push should resolve that issue

Copy link
Contributor

@intoeetive intoeetive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryannielsen on my test, I has S3 directory with 5 images and a subfolder. Without your changes, the load time is ~3 sec. With your changes, it's 5-6 sec. Tested multiple times, numbers staid the same. So IDK if it really makes any good, at least without hundreds of files in the directory (which is rather edge case).
I don't think we should make it better for 5% of people and worse for everyone else.
Also - unrelated - your code is missing some spaces (see PSR-12 notes on Files tab)

@bryannielsen
Copy link
Contributor Author

bryannielsen commented Aug 9, 2023

@intoeetive did you upgrade to the latest Cloud Files? The performance should be better and more consistent in 1.1.0+ whether you have 5 or 5,000 files in a bucket. Without this change and the updated version of Cloud Files we can have control panel timeouts on the file listing page. The experience may be slightly degraded with older versions of Cloud Files that aren't setup to run parallel requests.

I think I can make a change to eagerLoadPaths() so that if an adapter does not support this we do the old eager load behavior of listing the entire directory. Which will preserve better load times for smaller buckets but still break things on large buckets.

@intoeetive
Copy link
Contributor

@bryannielsen I updated to CP 1.1 but the picture did not change.

I then tried uploading more files (~80) and the load time increased for current 7.dev (4 sec) but still remains 5-6 sec when using this branch.
So perhaps this change only makes difference when you have large number of files (100+). If that is so, I think we need to check for files number and only apply this method for that case

intoeetive
intoeetive previously approved these changes Aug 9, 2023
Copy link
Contributor

@intoeetive intoeetive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking about if, if people are using S3, they probably have large number of files - otherwise they have no reason to use it

so this change is probably fine, as long as it does not affect local adapter

@matthewjohns0n
Copy link
Member

matthewjohns0n commented Aug 14, 2023

My tests were consistent with Yuri's. I have 13 images in an S3 bucket. (Cloud Files 1.1.0).

7.x: ~1.9s average
this branch: 3.0s average

@bryannielsen
Copy link
Contributor Author

Interesting, I'll add mine here too for comparison

28 files in S3
7.x - 1.2 seconds
this branch - 1.5 seconds

@robinsowell maybe we can bother you for another data point

@bryannielsen bryannielsen modified the milestones: 7.3.11, 7.3.12 Aug 14, 2023
@robinsowell
Copy link
Contributor

My results don't show a lot of difference between the two, but the older version is a touch faster on average. Mostly focused on my largest directory- around 200 image files.

'Finish' old version ~ 1.8
'Finish' new version ~ 2.2s

New was faster a few times, tried in FF and FF incognito. So.... IDK.

@robinsowell
Copy link
Contributor

Just pondering, I keep browser cache disabled when testing. I wonder if that messes with the results.

@robinsowell
Copy link
Contributor

@bryannielsen totally unrelated to testing this, I put a channel form on the frontend to test some wygwam thing and I'm getting a php memory error- my memory set to 128.

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /Users/robinsowell/Sites/738/system/user/addons/cloud_files/vendor-build/aws/aws-sdk-php/src/data/s3/2006-03-01/endpoint-rule-set-1.json.php on line 6```

@robinsowell
Copy link
Contributor

Circling back, it looks like it's only happening with my wygwam field and only in the channel entry form. And it happens in the currently released v7 as well. SO nothing to do with this pr. Looks like a wygwam issue.

@intoeetive intoeetive marked this pull request as draft September 6, 2023 13:11
@intoeetive intoeetive modified the milestones: 7.3.12, 7.x Sep 6, 2023
Eager load an entire destination if using large pages or only one destination otherwise load specific files.  Improve performance of remote filesystem sync by moving mimetype filtering from initial dispatch to individual sync jobs and use eager loading in job.
@intoeetive
Copy link
Contributor

@bryannielsen there's now a code conflict

@intoeetive intoeetive modified the milestones: 7.x, 7.4.3 Feb 19, 2024
@intoeetive intoeetive modified the milestones: 7.4.4, 7.4.5 Mar 11, 2024
@bryannielsen bryannielsen marked this pull request as draft March 18, 2024 13:51
@intoeetive intoeetive modified the milestones: 7.4.5, 7.4.6 Mar 19, 2024
@bryannielsen bryannielsen modified the milestones: 7.4.6, 7.x Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants