idea #139

Closed
xxcriticxx opened this issue Feb 22, 2018 · 26 comments
@xxcriticxx

@mitchellkrogza @funilrys

can you code something that finds domains on this list that aren't on any other list?

usually this way I find lots of false positives

@funilrys funilrys self-assigned this Feb 22, 2018
@funilrys
Member

@mitchellkrogza Give me 24 hours and I'll write a script, unless you have a one-line command (I know how much you love them 😹 ) for @xxcriticxx 👍

@mitchellkrogza
Member

Great idea, I'm sure I can do a one liner to achieve that, will look in the morning

@mitchellkrogza
Member

I'll test this in the morning: grep -Fxv -f first-file.txt second-file.txt, or comm -23 second-file-sorted.txt first-file-sorted.txt
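
For readers following along, here is a minimal sketch of what those two commands do. The sample domains are made up for illustration, and the script writes its files into the current directory, so it is best run in an empty scratch directory. Both commands print the lines that appear in the second file but not in the first; comm needs sorted input, grep does not.

```shell
#!/bin/sh
# Hypothetical sample data, made up for illustration.
printf '%s\n' a.example b.example c.example > first-file.txt
printf '%s\n' b.example c.example d.example e.example > second-file.txt

# grep: -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file.
# Prints the lines of second-file.txt that match no line of first-file.txt.
grep -Fxv -f first-file.txt second-file.txt > grep-result.txt

# comm needs sorted input, so sort into separate copies first.
# LC_ALL=C keeps the sort order and comm's idea of "sorted" consistent.
LC_ALL=C sort -u first-file.txt > first-file-sorted.txt
LC_ALL=C sort -u second-file.txt > second-file-sorted.txt

# -2 drops lines unique to the second argument, -3 drops lines common to both,
# leaving only the lines unique to second-file-sorted.txt.
LC_ALL=C comm -23 second-file-sorted.txt first-file-sorted.txt > comm-result.txt

cat comm-result.txt
```

With this sample data both result files contain d.example and e.example.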

@xxcriticxx
Author

something like Diff Checker, but I don't know if it would handle 2 million lines

@mitchellkrogza
Member

The comm command line should do what you need, I ran it against Ultimate Hosts and Badd Boyz and it took all of 5 seconds

@xxcriticxx
Author

@mitchellkrogza can you run it against a few larger lists and post the output here?

@funilrys
Member

I think I can't do better than 5 seconds 😸

@mitchellkrogza
Member

@xxcriticxx send me one list for now to test against Ultimate to find any uncommon entries

@mitchellkrogza
Member

mitchellkrogza commented Feb 23, 2018

@xxcriticxx @funilrys I just ran Ultimate Hosts against itself to show how quick it is. That's 1,602,081 domains, and it took 2 seconds to complete ... no jokes. You can see result.txt is 0 KB, which means there are no odd ones out 😄 As @funilrys says, I sure do LOVE my one-liners and Linux RULES !!!

[screenshot: compare]

@xxcriticxx
Author

i will play with this on the weekend

@mitchellkrogza
Member

@xxcriticxx great, please share your findings with us

@xxcriticxx
Author

comm won't work

comm -13 list1.txt list2.txt > result.txt
comm: file 2 is not in sorted order
comm: file 1 is not in sorted order

@mitchellkrogza
Member

comm will work; you just need to make sure both lists are sorted first.

sort -u list1.txt -o list1.txt
sort -u list2.txt -o list2.txt
comm -13 list1.txt list2.txt > result.txt

@mitchellkrogza
Member

Unfortunately the lists need to be sorted and also clean: no stray comments like # Whatever, and no empty lines either
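
A minimal sketch of that clean-up step, under some assumptions: the input is hosts-style, with one entry per line that is either a bare domain or an IP address followed by a domain, and comments start with #. The file names and contents below are hypothetical. The cleaned copy goes to a new file, so the original list is left untouched; run it in a scratch directory.

```shell
#!/bin/sh
# Hypothetical raw list mixing comments, blank lines, and hosts-style entries.
cat > raw-list.txt <<'EOF'
# Title: some list
0.0.0.0 b.example

127.0.0.1 a.example   # inline comment
c.example
0.0.0.0 b.example
EOF

# 1. tr removes carriage returns in case the list has Windows line endings.
# 2. sed strips everything from '#' to the end of the line.
# 3. awk skips empty lines and prints the last field (the domain),
#    which drops a leading IP address and surrounding whitespace.
# 4. sort -u sorts and removes duplicates so comm accepts the result.
tr -d '\r' < raw-list.txt \
  | sed 's/#.*$//' \
  | awk 'NF { print $NF }' \
  | LC_ALL=C sort -u > clean-list.txt

cat clean-list.txt
```

With this sample input, clean-list.txt ends up with a.example, b.example, and c.example, one per line. Lines that carry several hostnames after the IP address would need different handling, since only the last field is kept here.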

@xxcriticxx
Author

if I sort them they won't be the same, and stripping # will take hours

@mitchellkrogza
Member

There are very easy scripts to remove comments and empty lines and clean up a file. Zip your list1.txt and list2.txt and send them to my email ([email protected]), and tomorrow I will show you how quick and easy it is using command line tools

@xxcriticxx
Author

list1 is your list
list2 is stevenblack list

@mitchellkrogza
Member

Will run them against each other tomorrow

@xxcriticxx
Author

@mitchellkrogza send me a nice picture of nature, I am in need of new wallpapers

@mitchellkrogza
Member

What size, and what kind? I have far too much stuff :) Best to check my Facebook page and drop me an email. https://www.facebook.com/MitchellKrogPhotography

@xxcriticxx
Author

i have 2x Acer G277HL 1920 x 1080

@smed79
Contributor

smed79 commented Mar 1, 2018

Extract lines in file1 not found in file2

diff --new-line-format="" --unchanged-line-format="" file1 file2 > file3

Extract lines from file2 already found in file1

awk 'NR==FNR{lines[$0]++; next} $0 in lines' file1 file2 > file3
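
The same awk idiom can be inverted to answer the original question without sorting either file. This is a sketch with made-up sample data, not something posted in the thread; it prints the lines of file2 that do not appear in file1, keeping file2's original order. Run it in a scratch directory, since it writes file1, file2, and file3.

```shell
#!/bin/sh
# Hypothetical sample data, deliberately unsorted.
printf '%s\n' c.example a.example b.example > file1
printf '%s\n' d.example b.example e.example a.example > file2

# First pass (NR==FNR is true only while reading file1): remember every line.
# Second pass (file2): print the lines that were never seen in file1.
awk 'NR==FNR { seen[$0]; next } !($0 in seen)' file1 file2 > file3

cat file3
```

With this sample data file3 contains d.example and e.example. Two caveats: awk holds all of file1 in memory, and if file1 is empty the NR==FNR test also holds for file2, so nothing is printed.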

@mitchellkrogza
Member

Thanks @smed79 👍

@funilrys
Member

funilrys commented Mar 13, 2018

With the new repository structure/system, once we have merged all sources, we remove all duplicates before generating the files 😸

@xxcriticxx
Author

The idea was to find something here that doesn’t exist on any other list or lists
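
One way to sketch that idea with the commands already discussed in this thread: merge all the other lists into a single sorted reference, then keep only the lines unique to the list under review. The file names and domains below are hypothetical, and this is not the script linked later in the thread. It assumes every list is already cleaned to one domain per line; run it in a scratch directory.

```shell
#!/bin/sh
# Hypothetical lists: mine.txt is the list under review, other-*.txt are the rest.
printf '%s\n' a.example b.example c.example x.example > mine.txt
printf '%s\n' a.example d.example > other-1.txt
printf '%s\n' c.example b.example e.example > other-2.txt

# Merge every other list into one sorted, de-duplicated reference file.
cat other-1.txt other-2.txt | LC_ALL=C sort -u > others-sorted.txt

# Sort the list under review into a copy, leaving mine.txt untouched.
LC_ALL=C sort -u mine.txt > mine-sorted.txt

# -2 and -3 leave only column 1: lines found in mine-sorted.txt and nowhere else.
LC_ALL=C comm -23 mine-sorted.txt others-sorted.txt > only-in-mine.txt

cat only-in-mine.txt
```

With this sample data only x.example is left, which would be the candidate to check as a possible false positive.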

@funilrys
Member

Because we are going to work with big lists soon, we decided it would be great to have a script or system that helps us find the number of domains in a list that are not already part of this repository.

So I invite you to play with https://gist.github.com/funilrys/900abd388b1f3b399a9da69e0e592fef (-h gives you the help) !!!

Have a nice day/night.

Closing.
