
Implement new Scraper for GitHub Events #119

Open
d-Rickyy-b opened this issue Oct 7, 2019 · 4 comments
Labels

  • Difficulty: Medium (This issue is not easy and not hard to resolve)
  • enhancement (New feature or request)
  • hacktoberfest (Label for issues suited for the Hacktoberfest event)

Comments

@d-Rickyy-b
Owner

Similar to shhgit (repo link), there could be a new scraper which clones a repo and checks its files with the given analyzers.

For now this is just a rough idea with close to no detailed thoughts on how to implement it. There is the GitHub Events API, which is also used by shhgit. Maybe the source code of shhgit can also be used as a reference for implementing some of this for pastepwn.

Definition of done

  1. A new directory called 'github' was created in the scraping directory
  2. A new scraper (which extends basicscraper) is implemented in the github directory
  3. The new scraper works similarly to the pastebin scraper and fetches events from the GitHub Events API. Currently it seems that it needs to clone the repo before acting on it. You are free to make suggestions on how this should work (a rough sketch follows below).
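
Here is a very rough sketch of how such a scraper could look, assuming the BasicScraper base class provides a start method that receives the shared paste queue, as the pastebin scraper does. The import path, method names and attributes below are illustrative assumptions, not a final design:

```python
import logging
import time

import requests

# Assumed import path, based on items 1 and 2 above
from pastepwn.scraping.basicscraper import BasicScraper

API_URL = "https://api.github.com/events"


class GithubScraper(BasicScraper):
    """Polls the public GitHub Events API and enqueues push events."""

    name = "GithubScraper"

    def __init__(self, api_token=None, interval=60):
        super().__init__()
        self.logger = logging.getLogger(__name__)
        self.api_token = api_token  # raises the API rate limit when set
        self.interval = interval  # seconds between polls
        self.running = False

    def _fetch_events(self):
        """Fetch the latest public events from the GitHub Events API."""
        headers = {"Accept": "application/vnd.github.v3+json"}
        if self.api_token:
            headers["Authorization"] = "token {}".format(self.api_token)
        response = requests.get(API_URL, headers=headers, timeout=10)
        response.raise_for_status()
        return response.json()

    def start(self, paste_queue):
        """Poll the events API and enqueue PushEvents for later download."""
        self.running = True
        while self.running:
            try:
                for event in self._fetch_events():
                    # Only PushEvents reference commits with files worth checking
                    if event.get("type") == "PushEvent":
                        paste_queue.put(event)
            except requests.RequestException as e:
                self.logger.warning("Could not fetch GitHub events: %s", e)
            time.sleep(self.interval)

    def stop(self):
        self.running = False
```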
@d-Rickyy-b added the enhancement, hacktoberfest and Difficulty: Medium labels on Oct 7, 2019
@Samyak2
Contributor

Samyak2 commented Nov 1, 2019

Would the scraper need to run all analyzers on all files in the cloned repo? Or would it only download the files?

@d-Rickyy-b
Owner Author

d-Rickyy-b commented Nov 1, 2019

@Samyak2 The scraper would only download the files and put them into a queue similar to the pastebin scraper. Running the analyzers is not the task of the scrapers.

Maybe this architectural illustration can show the inner workings better:

[Image: pastepwn-detail-architecture diagram]

  • EDIT: The last box should say 'ActionHandler' and not 'AnalyzerHandler'

The interesting part is indeed what kind of files (and how many) should be inserted into the queue. Starting with files from common programming languages would be a good first step, but I think you can come up with a great solution. Downloading is priority 1; the scanning part can be done later.
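
To illustrate that download step, here is a hedged sketch that resolves a PushEvent into its changed files, keeps only files with common source-code extensions and enqueues the raw contents. The extension whitelist and the helper name are assumptions; the commits and raw_url fields come from the standard GitHub REST API:

```python
import requests

# Illustrative starting whitelist of common programming languages
COMMON_EXTENSIONS = (".py", ".js", ".java", ".go", ".rb", ".php", ".c", ".cpp")


def download_files_from_event(event, paste_queue):
    """Fetch the files changed by a PushEvent and enqueue their contents."""
    repo = event["repo"]["name"]  # "owner/repository"
    for commit in event["payload"].get("commits", []):
        commit_url = "https://api.github.com/repos/{}/commits/{}".format(repo, commit["sha"])
        response = requests.get(commit_url, timeout=10)
        if response.status_code != 200:
            continue
        for file_info in response.json().get("files", []):
            filename = file_info.get("filename", "")
            # Priority 1 is downloading; start with common languages only
            if not filename.endswith(COMMON_EXTENSIONS):
                continue
            raw = requests.get(file_info["raw_url"], timeout=10)
            if raw.status_code == 200:
                # Analyzers run later in the AnalyzerHandler, not here
                paste_queue.put({"filename": filename, "body": raw.text})
```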

@Samyak2
Contributor

Samyak2 commented Nov 4, 2019

I understand the flow now. I will try to understand the Pastebin scraper and then start work on this.

@d-Rickyy-b
Owner Author

I will update the image later. There are a few issues with it...

But the flow is the same. If you need help anywhere, feel free to contact me. I'll be happy to help.
