Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store files with both Paperclip and ActiveStorage #4598

Merged
merged 18 commits into from
Sep 24, 2021
Merged

Conversation

javierm
Copy link
Member

@javierm javierm commented Jul 26, 2021

References

Objectives

  • Make it possible to replace Paperclip with Active Storage in the future

Notes

In order to migrate existing files from Paperclip to ActiveStorage, we need Paperclip to find out the files associated to existing database records. So we can't simply replace Paperclip with ActiveStorage.

That's why it's usually recommended to first run the migration and then replace Paperclip with ActiveStorage using two consecutive deployments.

However, in our case we can't rely on two consecutive deployments because we have to make an easy process so existing CONSUL installations don't run into any issues. We can't just release version 1.4.0 and 1.5.0 the same day and ask everyone to upgrade twice in a row.

Instead, we're following a different plan:

  • We provide a Rake task (which will require Paperclip) to migrate existing files
  • We still use Paperclip to generate link and image tags
  • New files are handled using both Paperclip and ActiveStorage; that way, when we make the switch, we won't have to migrate them, and in the meantime they'll be accessible thanks to Paperclip
  • After we make the switch (which will happen after releasing version 1.4.0), we'll update the name column in the active storage attachments tables in order to remove the storage_ prefix

Testing this pull request

Before merging this pull request, we need to test this scenario:

  • Make sure we've got records with attachments for all models handling attachments
  • Run the migrate_from_paperclip task
  • Attach files to both new and existing records from all models handling attachments
  • Apply the changes in Remove Paperclip and use just Active Storage #4600 and run the remove_paperclip_compatibility_in_existing_attachments task
  • Check images and documents attached before and after running the migrate_from_paperclip are correctly displayed

We need to do so with both a local storage and a remote storage.

Release Notes

⚠️ If you've added custom attachments, make sure to change these attachments so they use has_attachment instead of has_attached_file. Otherwise you might run into missing files when upgrading to version 1.5.0.

In version 1.5.0 we'll change the way we store attachments from Paperclip to Active Storage. In order to make the migration painless, version 1.4.0 stores new files using both systems. If you've changed anything related to the way attachments are handled, there's a chance you might run into some issues when executing consul:execute_release_tasks (which includes a task to copy existing files to Active Storage). In version 1.5.0 we'll drop Paperclip and use just Active Storage, so in the case you run into problems when running the migration, make sure you fix them or report them; otherwise you might run into missing files when upgrading to version 1.5.0.

After copying the existing files to Active Storage, keep both the Active Storage files and the Paperclip files. The Paperclip files are needed in version 1.4.0, and the Active Storage files will be needed in version 1.5.0.

If you're storing files using an external service (such as S3), before executing the release tasks edit the config/storage.yml file and configure the service. Sample configurations are provided for S3, GCS and Azure; uncomment them where appropriate and then edit your config/secrets.yml file with your credentials. Then edit your config/environments/production.rb file and add config.active_storage.service = :s3 (or :gcs or :azure, depending on the service you're using).

@javierm javierm self-assigned this Jul 26, 2021
@javierm javierm added this to Reviewing in Consul Democracy via automation Jul 26, 2021
@javierm javierm force-pushed the active_storage_dual branch 2 times, most recently from f00ee0f to a02dfb4 Compare July 27, 2021 01:21
@javierm javierm marked this pull request as draft July 27, 2021 11:41
@javierm javierm moved this from Reviewing to Doing in Consul Democracy Jul 27, 2021
@javierm javierm marked this pull request as ready for review July 27, 2021 14:27
@javierm javierm moved this from Doing to Reviewing in Consul Democracy Jul 27, 2021
@@ -22,7 +22,7 @@ def deploysecret(key)
set :use_sudo, false

set :linked_files, %w[config/database.yml config/secrets.yml]
set :linked_dirs, %w[.bundle log tmp public/system public/assets public/ckeditor_assets public/machine_learning/data]
set :linked_dirs, %w[.bundle log tmp public/system public/assets public/ckeditor_assets public/machine_learning/data storage]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Layout/LineLength: Line is too long. [125/110] (https://rubystyle.guide#max-line-length)

@javierm javierm force-pushed the active_storage_dual branch 2 times, most recently from 168b32d to c1574f2 Compare September 23, 2021 21:49
Base automatically changed from hanging_tests to master September 24, 2021 11:38
javierm and others added 18 commits September 24, 2021 13:39
This reverts commit 7496c1b.
This migration was generated using the `active_storage:install` task.
ActiveStorage support was added to CKEditor in version 5.1.0. However,
we can't upgrade to version 5.0.0 or later because it only supports
loading CKEditor via CDN.

So we're copying the code related to ActiveStorage instead. I tried to
move it to the `vendor/` folder, but couldn't find a way to load it from
there and if I found one I wouldn't be sure it'd work on production
environments.
In order to migrate existing files from Paperclip to ActiveStorage, we
need Paperclip to find out the files associated to existing database
records. So we can't simply replace Paperclip with ActiveStorage.

That's why it's usually recommended [1] to first run the migration and
then replace Paperclip with ActiveStorage using two consecutive
deployments.

However, in our case we can't rely on two consecutive deployments
because we have to make an easy process so existing CONSUL installations
don't run into any issues. We can't just release version 1.4.0 and 1.5.0
and day and ask everyone to upgrade twice on the same day.

Instead, we're following a different plan:

* We're going to provide a Rake task (which will require Paperclip) to
  migrate existing files
* We still use Paperclip to generate link and image tags
* New files are handled using both Paperclip and ActiveStorage; that
  way, when we make the switch, we won't have to migrate them, and in
  the meantime they'll be accessible thanks to Paperclip
* After we make the switch, we'll update the `name` column in the active
  storage attachments tables in order to remove the `storage_` prefix

Regarding our handling of new files, the exception are cached
attachments. Since those attachments are temporary files used while
submitting a form and we have to delete them afterwards, we're only
handling them with Paperclip. We'll handle these ones in version 1.5.0.

Note the task creating the dev seeds was failing after these changes
with an `ActiveStorage::IntegrityError` exception because we were
opening some files without closing them. If the same file was attached
twice, it failed the second time.

We're solving it by closing the files with `File.open` and a block. Even
though we didn't get any errors, we're doing the same thing in the
`Attachable` concern because it's a good practice to close files after
we're done with them.

Also note we have to change the CKEditor Active Storage code so it's
compatible with Paperclip. In this case, I haven't been able to write a
test to confirm the attachment exists; I was getting the same
`ActiveStorage::IntegrityError` mentioned above.

Finally, we're updating the site customization image controller to use
`update` so the image and the attachment are updated within the same
transaction. This is also what we do in most controllers.

[1] https://www.youtube.com/watch?v=tZ_WNUytO9o
This task was copied from Thoughtbot's migration guide [1]. They use a
migration file, but we usually use rake tasks to migrate data.

[1] https://github.com/thoughtbot/paperclip/blob/master/MIGRATING.md
We can use the `ActiveStorage::Blob` class to find where the file is
supposed to be stored instead of manually setting the path. This is also
more robust because it works with Active Storage configurations which
don't store files in the default folder.
This model is added by the devise-security gem, but we don't use it. We
were getting an error while migrating:

ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "old_passwords" does not exist
LINE 8:                WHERE a.attrelid = '"old_passwords"'::regclas...
                                          ^
SELECT a.attname, format_type(a.atttypid, a.atttypmod),
     pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
     c.collname, col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_type t ON a.atttypid = t.oid
LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
WHERE a.attrelid = '"old_passwords"'::regclass
  AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum
lib/tasks/active_storage.rake:27:in `block (4 levels) in <top (required)>'
lib/tasks/active_storage.rake:26:in `each'
lib/tasks/active_storage.rake:26:in `block (3 levels) in <top (required)>'
lib/tasks/active_storage.rake:25:in `block (2 levels) in <top (required)>'

Caused by:
PG::UndefinedTable: ERROR: relation "old_passwords" does not exist
Using `LASTVAL()` wasn't working 100% of the time for some reason after
inserting and deleting some rows. So we're using `RETURNING` instead.
Just like we add the `storage_` prefix for new records so we can use
both Active Storage and Paperclip at the same time.

Now the migration actually works, at least for basic cases.
We were getting a duplicate prepared statement error when trying to
execute more than one test using this task. So we're checking whether
the prepared statement exists before defining it.
Files might be missing for whatever reason or records might not point to
any files; in these edge cases, we were getting an exception.
This way we reduce the hypothetical problems we could find if executing
the task several times.
There could be inconsistencies in the database and an attachment might
have a `record_id` pointing to a record which no longer exist. We were
getting an exception in this case.
We were having issues with cached attachments and external services.

A `Tempfile` is returned by `URI.open` when using S3, so we're dealing
with this case as well.
So forks that connected Paperclip with S3 buckets can migrate files too.
These configurations are based on the ones ActiveStorage generates
automatically for Rails 6 applications.
We're already using a custom controller to handle direct uploads.

Besides, as mentioned by one of Active Storage co-authors [1], the
Active Storage DirectUploadsController doesn't provide any
authentication or validation at all, meaning anyone could create blobs
in our database by posting to `/rails/active_storage/direct_uploads`.
The response there could be then used to upload any file (again, without
validation) to `/rails/active_storage/disk/`.

For now, we're monkey-patching the controllers in order to send
unauthorized responses, since we aren't using these routes. If we ever
enable direct uploads with Active Storage, we'll have to add some sort
of authentication.

Similar upload solutions like CKEditor don't have this issue since their
controllers inherit from `ApplicationController` (which includes
authorization rules), while Active Storage controllers inherit from
`ActionController::Base`.

[1] https://discuss.rubyonrails.org/t/activestorage-direct-uploads-safe-by-default-how-to-make-it-safe/74863/2
Consul Democracy automation moved this from Reviewing to Testing Sep 24, 2021
@javierm javierm merged commit 6cf3c74 into master Sep 24, 2021
Consul Democracy automation moved this from Testing to Release 1.4.0 Sep 24, 2021
@javierm javierm deleted the active_storage_dual branch September 24, 2021 14:31
@taitus taitus mentioned this pull request Nov 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants