Updates Delay (List culling and removal of dead domains) #49

Closed
mitchellkrogza opened this issue Aug 29, 2017 · 16 comments

@mitchellkrogza
Member

A big build is in progress to get a reliable output of all dead domains on this hosts list. Updates to this repo will only resume in about 7 days' time.

@mitchellkrogza
Member Author

Something interesting: can you believe that burger-imperia.com and pizza-tycoon.com are still the most active bad sites out there? They show up every day in my logs across all 28 sites.

@xxcriticxx

Should I visit them?

@mitchellkrogza
Member Author

Burger Imperia (but of course all the nasty stuff, the click redirecting and hijacking, is hidden when you visit them)

[Screenshot, 2017-08-29 at 5:38:36 pm]

@mitchellkrogza
Member Author

and pizza tycoon

[Screenshot, 2017-08-29 at 5:39:49 pm]

@xxcriticxx

Looks like the same author.

@mitchellkrogza
Member Author

Absolutely, and they have been a pain in people's backsides for years, yet nobody seems to be able to take them down.

@mitchellkrogza
Member Author

mitchellkrogza commented Aug 30, 2017

So we uncovered an undocumented Travis feature. If a repo builds continually for more than 48 hours, even though all builds are passing, Travis CI stops building.

We did, however, accumulate a lot of data that can now be used to start cleaning this list of dead domains, beginning with all those that returned 404. I will probably run the remainder of the testing from one of my Ubuntu servers rather than tie up Travis like this.

So possibly later today we will have our first list of dead domains to start stripping out of the final RAW files produced.
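
The stripping step described above could look roughly like the sketch below. It assumes both the raw list and the dead-domain list are plain text with one domain per line; the function name `strip_dead_domains` and the sample entries are hypothetical and are not taken from this repo's scripts.

```python
def strip_dead_domains(raw_domains, dead_domains):
    """Return the entries of raw_domains that are not listed in dead_domains.

    Matching ignores case and surrounding whitespace. The order of the
    surviving entries is preserved and blank lines are dropped.
    """
    dead = {d.strip().lower() for d in dead_domains if d.strip()}
    kept = []
    for line in raw_domains:
        domain = line.strip()
        if domain and domain.lower() not in dead:
            kept.append(domain)
    return kept


# Hypothetical sample data using reserved example names, not real list entries.
raw = ["alive.example", "dead.example", "Also-Dead.example", "other.example"]
dead = ["dead.example", "also-dead.example"]
cleaned = strip_dead_domains(raw, dead)
print(cleaned)
```

Building a set from the dead list first keeps each lookup cheap, which matters when the raw list has over a million entries.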

@mitchellkrogza
Member Author

Okay, so after further investigation, we did not reach any sort of timeout. One of the input sources (https://raw.githubusercontent.com/mitchellkrogza/Ultimate.Hosts.Blacklist/master/.input_sources/_Airelles_Anti_Sex_Hosts/domains.txt) had the bare entry cz listed as a domain name, which caused the testing script to exit and go into final commit mode.

The whole idea of this repo is to clean up junk information like that from all these input sources and blacklists out there.
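
One way to keep a single malformed entry from ending a long run is to record it and carry on. This is only a sketch under assumptions: `check_domain` is a hypothetical stand-in for whatever the real testing script does per domain, and it is assumed to raise `ValueError` on input it cannot handle.

```python
def check_domain(domain):
    """Hypothetical per-domain check that rejects entries without a dot."""
    if "." not in domain:
        raise ValueError(f"not a valid domain name: {domain!r}")
    return "tested"


def run_checks(domains):
    """Check every entry, collecting rejected ones instead of aborting."""
    results = {}
    rejected = []
    for domain in domains:
        try:
            results[domain] = check_domain(domain)
        except ValueError:
            # Remember the bad entry and continue with the rest of the list.
            rejected.append(domain)
    return results, rejected


# Hypothetical input mixing valid-looking names with the bare "cz" entry.
results, rejected = run_checks(["a.example", "cz", "b.example"])
print(rejected)
```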

@mitchellkrogza
Member Author

@xxcriticxx new raw links pushed out today. So far the lists have been stripped of 19473 dead domains, so now we resume the funceble testing.

@xxcriticxx

@mitchellkrogza last count was at 1.5 mil now -19k more?

@mitchellkrogza
Member Author

Remember, each time I do a commit it pulls in fresh data, so list sizes change all the time.

@mitchellkrogza
Member Author

Ultimately we want this smaller, lighter and more accurate, no?

@xxcriticxx

@mitchellkrogza correct

@mitchellkrogza
Member Author

👍

@xxcriticxx

and no .cz domains

@mitchellkrogza
Member Author

LOL, .cz yes, just not cz all by itself. That issue has been resolved for all future lists: I now check the combined list for any line not containing a . and delete it before creating the hosts files.
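
A minimal sketch of that check, assuming the combined list is read as one entry per line. The function name `drop_dotless_lines` and the sample entries are hypothetical; the repo's own scripts may do this differently.

```python
def drop_dotless_lines(lines):
    """Keep only the lines that contain at least one '.' character."""
    kept = []
    for line in lines:
        entry = line.strip()
        # Blank lines contain no dot either, so they are dropped as well.
        if "." in entry:
            kept.append(entry)
    return kept


# Hypothetical combined list: a bare "cz" is removed, a real .cz name stays.
combined = ["example.cz", "cz", "sub.example.com", "", "localhost"]
filtered = drop_dotless_lines(combined)
print(filtered)
```

This only catches entries with no dot at all. It does not validate that what remains is a well-formed domain name.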


2 participants