Implement scraping movies by URL #709

Merged: 8 commits into stashapp:develop on Aug 10, 2020

@woodgen (Contributor) commented on Aug 7, 2020

This is the first time I have really touched Go or TypeScript, so a careful review would be appreciated.

This series of patches implements scraping movies by URL.
It supports fields such as duration, studio, and front and back images.
An example XPath scraper for Gamma Entertainment can be found on the movies branch of my CommunityScrapers repository:
https://github.com/woodgen/CommunityScrapers/tree/movies

I tested the scraper with Evil Angel, and it works well both for creating new movies and for updating existing ones by URL.

For more details, here are the commit messages, since GitHub will hide them:

    ui/movies: Add movie scrape dialog.

    Adds the possibility to update existing movie entries with the URL
    scraper.

    For this, MovieScrapeDialog.tsx was implemented with Performers and
    Scenes as a reference. In addition, DurationUtils needs to be called
    once to convert seconds from the model to the string that is displayed
    in the component. This seemed the least intrusive option to me, as it
    keeps a ScrapeResult<string> type compatible with ScrapedInputGroupRow.

    graphql+pkg+ui: Scrape movie studio.

    Extends and corrects the movie model so that it can store and
    dereference studio IDs from the studio string received from the scraper.
    This was done with Scenes as a reference. For simplicity, the duplication
    of having `ScrapedMovieStudio` and `ScrapedSceneStudio` was kept; these
    should probably be refactored into a single type in the model in the
    future.

    graphql+pkg+ui: Implement scraping movies by URL.

    This patch implements the missing required boilerplate for scraping
    movies by URL, using performers and scenes as a reference.

    Although this patch contains a big chunk of the groundwork for enabling
    scraping movies by fragment, that feature would require additional
    changes to be completely implemented and was not tested.
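
The duration handling described in the dialog commit can be illustrated with a minimal sketch. It is not the code from this PR: `ScrapeResult` here is a simplified, hypothetical stand-in for the UI type of the same name, and `secondsToString` and `durationResult` are assumed helpers written for illustration rather than the actual `DurationUtils` API. The idea shown is only that the stored value (seconds) is converted to a display string once, so that both sides of the comparison are strings.

```typescript
// Hypothetical, simplified stand-in for the UI's ScrapeResult<T>.
interface ScrapeResult<T> {
  originalValue?: T; // value currently stored for the movie
  newValue?: T; // value returned by the scraper
  scraped: boolean; // true when the scraper offers a different value
}

// Assumed helper (not the real DurationUtils): formats whole seconds
// as H:MM:SS, or M:SS when the duration is under one hour.
function secondsToString(seconds: number): string {
  const total = Math.max(0, Math.floor(seconds));
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  const pad = (n: number): string => n.toString().padStart(2, "0");
  return h > 0 ? `${h}:${pad(m)}:${pad(s)}` : `${m}:${pad(s)}`;
}

// Converts the stored seconds exactly once, so the dialog row only ever
// deals with a ScrapeResult<string>.
function durationResult(
  storedSeconds: number | undefined,
  scrapedDuration: string | undefined
): ScrapeResult<string> {
  const originalValue =
    storedSeconds === undefined ? undefined : secondsToString(storedSeconds);
  return {
    originalValue,
    newValue: scrapedDuration,
    scraped: scrapedDuration !== undefined && scrapedDuration !== originalValue,
  };
}
```

With these assumed helpers, `secondsToString(5400)` yields `"1:30:00"`, and `durationResult(5400, "1:35:00")` reports `scraped: true` because the scraped string differs from the converted stored value.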

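The studio commit describes resolving a scraped studio string to a stored studio ID, and suggests merging `ScrapedMovieStudio` and `ScrapedSceneStudio` in the future. The sketch below shows one way that could look. It is an assumption throughout: the PR does this work in the Go backend, whereas this example is TypeScript to match the UI sketch above, and the `ScrapedStudio` and `Studio` types, the `matchStudio` helper, and the case-insensitive name comparison are all hypothetical.

```typescript
// Hypothetical shared shape that could stand in for both
// ScrapedMovieStudio and ScrapedSceneStudio.
interface ScrapedStudio {
  name: string; // studio string returned by the scraper
  storedId?: string; // ID of the matching stored studio, if any
}

// Hypothetical minimal view of a studio that already exists in the database.
interface Studio {
  id: string;
  name: string;
}

// Resolves the scraped name against the known studios. The comparison
// ignores case and surrounding whitespace (an assumption made here).
function matchStudio(scraped: ScrapedStudio, known: Studio[]): ScrapedStudio {
  const wanted = scraped.name.trim().toLowerCase();
  const hit = known.find((s) => s.name.trim().toLowerCase() === wanted);
  // Return a copy so the scraper's original result is left untouched.
  return hit ? { ...scraped, storedId: hit.id } : { ...scraped };
}
```

For example, `matchStudio({ name: "Evil Angel" }, [{ id: "7", name: "evil angel" }])` returns a copy with `storedId` set to `"7"`, while an unknown name comes back without a `storedId`, which a dialog could treat as a studio that still needs to be created.
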
woodgen added 4 commits August 8, 2020 00:16
@WithoutPants added the "feature" label (Pull requests that add a new feature) on Aug 10, 2020
@WithoutPants added this to the Version 0.3.0 milestone on Aug 10, 2020

@WithoutPants (Collaborator) left a comment

Looks good. Thanks for the submission.

@WithoutPants merged commit 4045ddf into stashapp:develop on Aug 10, 2020
Tweeticoats pushed a commit to Tweeticoats/stash that referenced this pull request on Feb 1, 2021:
* api/urlbuilders/movie: Auto format.

* graphql+pkg+ui: Implement scraping movies by URL.

This patch implements the missing required boilerplate for scraping
movies by URL, using performers and scenes as a reference.

Although this patch contains a big chunk of groundwork for enabling
scraping movies by fragment, the feature would require additional
changes to be completely implemented and was not tested.

* graphql+pkg+ui: Scrape movie studio.

Extends and corrects the movie model for the ability to store and
dereference studio IDs with received studio string from the scraper.
This was done with Scenes as a reference. For simplicity the duplication
of having `ScrapedMovieStudio` and `ScrapedSceneStudio` was kept, which
should probably be refactored to be the same type in the model in the
future.

* ui/movies: Add movie scrape dialog.

Adds the possibility to update existing movie entries with the URL scraper.

For this the MovieScrapeDialog.tsx was implemented with Performers and
Scenes as a reference. In addition DurationUtils needs to be called one
time for converting seconds from the model to the string that is
displayed in the component. This seemed the least intrusive to me as it
kept a ScrapeResult<string> type compatible with ScrapedInputGroupRow.