Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try URL before creating / uploading Matterfile #179

Closed
liamphmurphy opened this issue Apr 6, 2022 · 7 comments · Fixed by #187
Closed

Try URL before creating / uploading Matterfile #179

liamphmurphy opened this issue Apr 6, 2022 · 7 comments · Fixed by #187
Labels
enhancement New feature or request

Comments

@liamphmurphy
Copy link
Contributor

liamphmurphy commented Apr 6, 2022

Feature Description

This feature (enhancement) would add a bit more resiliency to the creation of Matterfile's seen here:

matter_file_db_model = db_functions.create_matter_file(
matter_ref=matter_db_model,
supporting_file=supporting_file,
credentials_file=credentials_file,
)
matter_file_db_model = db_functions.upload_db_model(
db_model=matter_file_db_model,
credentials_file=credentials_file,
)

Specifically, we want to check that the supporting_file.uri used in the creation of the Matterfile actually exists before creating / uploading the Matterfile.

Use Case

Simply the use case here is to create better Matterfile data by ensuring that the URL's used exist.

Solution

In the above linked code, before the call to create_matter_file, do the following:

  1. Make a call to try_url (using the default resolve_func) to see if the URL exists.
  2. IF exists, proceed with the creation / uploading of the Matterfile.
  3. IF not exists, add LookupError as an exception to watch for in that loop, as try_url will raise that exception on errors.

Alternatives

Let me know if any other options are better.

@liamphmurphy liamphmurphy added the enhancement New feature or request label Apr 6, 2022
@liamphmurphy
Copy link
Contributor Author

Question

If the URL does exist but it turns out the call to try_url returns an http URL, should we have any special logic for that? Not familiar enough with the data yet to be sure, but I know that in some cases we have to host the specified resource if it is not HTTPS to workaround Firefox errors.

@evamaxfield
Copy link
Member

evamaxfield commented Apr 6, 2022

In the above linked code, before the call to create_matter_file, do the following:

Make a call to try_url (using the default resolve_func) to see if the URL exists.
IF exists, proceed with the creation / uploading of the Matterfile.
IF not exists, add LookupError as an exception to watch for in that loop, as try_url will raise that exception on errors.

This is perfect.

Question

If the URL does exist but it turns out the call to try_url returns an http URL, should we have any special logic for that? Not familiar enough with the data yet to be sure, but I know that in some cases we have to host the specified resource if it is not HTTPS to workaround Firefox errors.

Yes, if the returned URL is http we should store it, regardless of it that URL is for MatterFiles, MinutesItemFiles, etc, we should store it. We can store them because we aren't embedding the linked content or anything, we are simply adding a link to the page which will open a new tab. Definitely some issues with Firefox but if we do it properly we should be able to solve that case too.

@liamphmurphy
Copy link
Contributor Author

Thanks for the info @JacksonMaxfield! I just want to double check on the hosting bit... the class declaration for the SupportingFile mentions it is not stored in firestore:

This file is not stored in the CDP file storage system.

Is that still accurate / does it pertain to this at all?

@evamaxfield
Copy link
Member

Great question! And I guess I didn't realize how deep in the weeds this issue was in the CDP infrastructure / pipeline. What you are looking at is our "ingestion models", the format of how our pipeline accepts data. What we are storing is our database models

Regardless, those two things are connected, where you see a SupportingFile in the ingestion metadata, it gets converted into usually a MatterFile or MinutesItemFile for example.

But yes, these files aren't stored in our own file store, we just store a link (or the https or http URL) to the file in our database.

@liamphmurphy
Copy link
Contributor Author

liamphmurphy commented Apr 24, 2022

Apologies for the delay on getting this out, been working on it a bit this weekend and attempting to mock the resource_exists call in this test func

def test_store_event_processing_results(
(so try_url doesn't actually perform HTTP requests) has been causing some problems.

Will update if I have any further progress (hopefully in the form of a PR!).

@evamaxfield
Copy link
Member

I think you should be able to use:

with mock.patch("cdp_backend.database.validators.resource_exists") as mocked_resource_exists:
    mocked_resource_exists.return_value = True

But I may be wrong...

@liamphmurphy
Copy link
Contributor Author

I think you should be able to use:

with mock.patch("cdp_backend.database.validators.resource_exists") as mocked_resource_exists:
    mocked_resource_exists.return_value = True

But I may be wrong...

I tried it with the parameterized patch, I'll try it with that way instead 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants