fscarponi/htmlResourceDownloader

HtmlResource downloader

HtmlResource downloader efficiently downloads large numbers of resources (files) from websites.

Warning:

The app's strategy is very aggressive: it recursively descends deep into the site structure, so set the parameters with care.

All files are downloaded by separate coroutines that share the same Ktor HTTP client, which can saturate your bandwidth.
Usage:
  • Clone Repo -> Distributable
  • You can build a distributable through a Gradle task.
    The application needs a parameters.json file in the same directory, containing the parameters used to start the task;
    see DataStructure.kt/Parameters for the JSON structure.

  • Clone Repo -> Run From IDE
  • You can set the parameters in main; if there is no parameters.json in the main folder, it will be created.
    Remember to delete parameters.json if you want to reset them.
Output:
  • All files are downloaded into "outputFolder", preserving the original path structure.
  • All skip preferences (see parameters) are honored.
  • If a file raises an error, it is skipped and recorded in skippedFile.csv, along with a description of the error.
  • Detailed (and verbose) information is printed to the console.
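
To illustrate the first and third points, here is a minimal sketch of how a resource URL could be mapped to a file under "outputFolder", and how a skipped file could be formatted as a CSV row. It is not the project's actual code: the function names `localPathFor` and `skippedCsvLine`, the `index.html` fallback, and the two-column CSV layout (URL, error description) are all assumptions made for the example.

```kotlin
import java.net.URI
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical helper (not from the repository): maps a resource URL to a
// path under outputFolder, mirroring the directory structure of the URL path.
fun localPathFor(outputFolder: Path, url: String): Path {
    val uri = URI(url)
    // Drop empty segments and "." / ".." so the result stays inside outputFolder.
    val segments = (uri.path ?: "")
        .split('/')
        .filter { it.isNotEmpty() && it != "." && it != ".." }
    // Assumption: a URL with no path is stored as index.html.
    val relative = if (segments.isEmpty()) listOf("index.html") else segments
    return relative.fold(outputFolder) { acc, part -> acc.resolve(part) }
}

// Hypothetical helper: formats one row for skippedFile.csv.
// Assumed columns: URL, error description. Both fields are quoted,
// and embedded double quotes are doubled.
fun skippedCsvLine(url: String, error: String): String =
    listOf(url, error).joinToString(",") { "\"" + it.replace("\"", "\"\"") + "\"" }

fun main() {
    val out = Paths.get("outputFolder")
    println(localPathFor(out, "https://example.com/docs/a/file.pdf"))
    println(skippedCsvLine("https://example.com/missing.pdf", "HTTP 404"))
}
```

Resolving segment by segment, rather than concatenating strings, keeps the separator platform-independent and prevents a crafted URL from writing outside the output folder.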

About

A mass resource downloader built with Ktor and skrape.
