A website list for only malicious, spyware and botnet websites #20

Closed
HydraDragonAntivirus opened this issue Dec 8, 2023 · 24 comments

Comments

@HydraDragonAntivirus

Can you make this? I am making an antivirus, and I now have a list of 5.51 million unique websites.

@maravento
Owner

Could you give more details about the issue? I do not understand your question.

@HydraDragonAntivirus
Author

OK, problem solved. Actually, most of them are infected ones. I have a list of 14.3 million websites; if you want, I can share it.

@maravento
Owner

maravento commented Dec 21, 2023

Publish the blacklist to a public repository (for example GitHub, GitLab, etc.), and I will review it to verify that it can be included in the blackweb sources.

@HydraDragonAntivirus
Author

@maravento
Owner

Error 404. The repository is empty.

@HydraDragonAntivirus
Author

Oops, sorry, I can't make it public. Can I share it on a different platform because of the file size?

@maravento
Owner

maravento commented Dec 21, 2023

To compress it in parts and upload it to GitHub:

tar cvzf - malwebsite.txt | split -b 40MB - malwebsite.tar.gz

To download and unzip:

sudo apt install git subversion
svn export "https://github.com/xylenthydradragonav/trunk/malwebsite"
cat malwebsite.tar.gz* | tar xzf -

GitLab doesn't support trunk.

or:

git clone https://gitlab.com/xylenthydradragonav/malwebsite
cd malwebsite
cat malwebsite.tar.gz* | tar xzf -
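
Before uploading, it may be worth checking that the parts really reassemble into the original file. The following is only a sketch under assumptions: `split_and_verify` is a hypothetical helper name, the parts are written to a directory of your choice (which should start out empty), the part prefix ends in a dot (unlike the command above), and `cksum` is used as a simple integrity check.

```shell
#!/bin/sh
# Hypothetical helper (not part of blackweb): split_and_verify FILE PART_SIZE OUTDIR
# Compresses FILE, splits the archive into PART_SIZE pieces inside OUTDIR,
# then reassembles the pieces and compares checksums with the original.
split_and_verify() {
    file=$1
    size=$2
    outdir=$3
    prefix="$outdir/$(basename "$file").tar.gz."

    mkdir -p "$outdir" || return 1
    # Same idea as the command above: tar to stdout, split into fixed-size parts.
    tar czf - "$file" | split -b "$size" - "$prefix" || return 1

    # Checksum of the original versus the file extracted from the joined parts.
    # "tar xzOf -" writes the extracted file to stdout instead of to disk.
    want=$(cksum < "$file")
    got=$(cat "$prefix"* | tar xzOf - | cksum)
    [ "$want" = "$got" ]
}
```

For example, `split_and_verify malwebsite.txt 40m parts && echo "parts are OK"` would print the message only if the round trip succeeds.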

@maravento
Owner

Upload it to mega.nz so that I can review the list and check for matches with other lists.

@maravento
Owner

I tried to clone the repository, but it fails with: "This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access"
I suggest you upload only the file to mega.nz and provide the link.
It would also be very helpful if you cited the public sources of your list. The idea is to verify that it really is a blacklist.
What I will do with the file is the following:

  • A diff with blackweb (to determine how many of those domains are already included in blackweb and the blackweb source projects)
  • A validity check (how many domains on your list actually exist)
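
The diff step can be sketched as a set difference between two domain lists. This is a minimal illustration, not blackweb's actual procedure: `list_diff` is a hypothetical helper, both files are assumed to hold one domain per line, and only exact matches are removed (a subdomain is not treated as covered by its parent domain).

```shell
#!/bin/sh
# Hypothetical helper: list_diff CANDIDATE REFERENCE
# Prints the lines of CANDIDATE that do not appear in REFERENCE.
list_diff() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }

    # comm requires both inputs sorted the same way; LC_ALL=C gives a
    # byte-wise order that does not depend on the user's locale.
    LC_ALL=C sort -u "$1" > "$tmp_a"
    LC_ALL=C sort -u "$2" > "$tmp_b"

    # -2 hides lines unique to REFERENCE, -3 hides lines common to both,
    # leaving only lines unique to CANDIDATE.
    LC_ALL=C comm -23 "$tmp_a" "$tmp_b"

    rm -f "$tmp_a" "$tmp_b"
}
```

Running `list_diff your_blacklist_name.txt blackweb.txt | wc -l` would then count the candidate domains not yet in the reference list (both file names are placeholders).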

@HydraDragonAntivirus
Author

@maravento
Owner

maravento commented Jan 29, 2024

Screenshot 2024-01-29 07-34-45

@HydraDragonAntivirus
Author

Oops, sorry. Let me upload it to mega.nz.

@HydraDragonAntivirus
Author

@maravento
Owner

I will start reviewing the file. It will take a few days. Thanks

@maravento
Owner

maravento commented Feb 2, 2024

https://gitlab.com/maravento/dev/-/tree/main/blacklist

What this repository contains:
Domains_original.zip:
The list of domains that you gave me, which contains 9585308 lines.
Domains_valid.zip:
Of the lines in Domains_original.zip, only 6612000 are valid lines of existing domains.
Domains_diff_blackweb.zip:
Comparing Domains_valid.zip with blackweb gives a list of 1388134 lines of domains and subdomains that are not included in blackweb.

Now the big problem is to verify whether those 1388134 domains and subdomains really belong on a blacklist or whether there are false positives. It is a job that could take at least a month. It would help a lot if you mentioned the source(s) of that list.
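
To put the three line counts above in proportion, here is a small arithmetic sketch. `summarize_counts` is a hypothetical helper, and the "already in blackweb" figure is obtained purely by subtraction from the numbers quoted in this comment.

```shell
#!/bin/sh
# Hypothetical helper: summarize_counts ORIGINAL VALID NOT_IN_BLACKWEB
summarize_counts() {
    original=$1
    valid=$2
    new=$3

    # Lines dropped by the validity check, and valid lines blackweb already has.
    echo "invalid lines: $((original - valid))"
    echo "already in blackweb: $((valid - new))"

    # Shell arithmetic is integer-only, so percentages are computed with awk.
    awk -v o="$original" -v v="$valid" -v n="$new" 'BEGIN {
        printf "valid share of original: %.1f%%\n", 100 * v / o
        printf "new share of valid: %.1f%%\n", 100 * n / v
    }'
}

summarize_counts 9585308 6612000 1388134
```

With these inputs, about 69% of the original lines are valid, and about 21% of the valid lines are not yet in blackweb.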

@HydraDragonAntivirus

This comment was marked as off-topic.

@HydraDragonAntivirus
Author

HydraDragonAntivirus commented Feb 3, 2024

I have now checked it and it is down to 2.26 million entries, and it's whitelisted. Don't forget to add this: https://github.com/scamaNet/blocklist/tree/main

@HydraDragonAntivirus
Author

I also looked at https://zeltser.com/malicious-ip-blocklists/, but it may not be up to date.

@maravento
Owner

maravento commented Feb 3, 2024

Hi

  • Regarding the IP address file that you sent, I can't do anything, since it is not a file that I can verify.
  • Regarding Domains_diff_blackweb.zip, I will try to verify whether its content really belongs on a blacklist. However, it is not a task I can accomplish in a short time, and I really don't know when I'll be able to do it. My available time is limited.

Thank you very much for the contribution, and thank you for being interested in blackweb.

PS: Please avoid advertising your projects in this space.

@maravento
Owner

maravento commented Feb 4, 2024

  • The project on GitLab, which contains the lists you provided, will continue to be updated until the debug list reaches 0 lines.
  • The Links file, since it contains domains, is added to the domains_valid list after debugging.

https://gitlab.com/maravento/dev

@maravento maravento reopened this Feb 4, 2024
@HydraDragonAntivirus
Author

https://github.com/HydraDragonAntivirus/XylentBlockList I updated the database but didn't remove the invalid ones. If you want, I can remove them.

@maravento
Owner

maravento commented Mar 16, 2024

You can check the method that blackweb uses to verify domains at DNS Lookup. Although it is more complex and runs in two steps, it can basically be reduced to this:

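#!/usr/bin/env bash
# Each domain is passed to sh as "$1" rather than pasted into the command text, so a malformed list entry cannot run as shell code.
xargs -I {} -P 300 sh -c 'if host "$1" >/dev/null; then echo "VALID $1"; echo "VALID $1" >> valid.txt; else echo "INVALID $1"; fi' _ {} < your_blacklist_name.txt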

This small bash script launches a parallel verification process on your list "your_blacklist_name.txt" (change it to the actual name of your blacklist), using the "host" command. I set the number of parallel processes to "300", but you can lower this number if it consumes too many resources.
The script checks each domain in your file; those that are valid are appended to a new list, "valid.txt", and the whole process is shown in the terminal. Example:

./test.sh 
INVALID pslcl.com
VALID google.com
VALID facebook.com
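
Because each line written to "valid.txt" carries the "VALID " prefix, a follow-up step is needed to get a plain domain list again. A minimal sketch, assuming the file format shown above (`valid_domains` is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: valid_domains FILE
# Prints the bare domains recorded in a valid.txt-style file.
valid_domains() {
    # Keep only lines that start with "VALID ", strip the prefix,
    # then sort and de-duplicate (parallel runs write in arbitrary order).
    sed -n 's/^VALID //p' "$1" | LC_ALL=C sort -u
}
```

For example, `valid_domains valid.txt > domains_valid.txt` would produce a clean, sorted list (the output file name is a placeholder).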

There are other programs on GitHub with more sophisticated verification methods, such as PyFunceble (https://pyfunceble.github.io/).

@HydraDragonAntivirus
Author

I'm going to focus only on malicious websites. Thank you for the help!
