Feature Request: Curl Page Title #71

mitchellkrogza · 2017-07-23T07:48:43Z

Perhaps doing a simple curl to get the page title of each page would help people sort and clean their lists. This could be included in results.txt, or even in separate output files? Pages that return a title other than 404 or not found would go into one result file, and ones that return a title like 404, missing or not found would go into a separate file. This way, especially with lists that have subdomains of a root domain like blogspot.com, it would be easy to pick up blogs that no longer exist.

The reason for this is that lists full of sub-domains of a main domain, like this https://github.com/funilrys/dead-hosts/blob/master/add.2o7Net/tested-list/add.2o7Net.list, always show ACTIVE in results because the root domain is active. Getting a page title for each tested domain / sub-domain would quickly reveal which of those sub-domains are no longer actually there. The same applies to some lists that have hundreds of .blogspot.com domains in them.

Something like this?

Domain                                                                                               Status      Expiration Date   Source     Analyse Date                      Page Title        
---------------------------------------------------------------------------------------------------- ----------- ----------------- ---------- -------------------------------   ------------------------------------------------------------------
7wind.ru                                                                                             ACTIVE      12-may-2018       WHOIS      Sat Jul 22 14:52:26 UTC 2017      404 Not Found
accentstudio.co.uk                                                                                   ACTIVE      28-dec-2017       WHOIS      Sat Jul 22 14:52:27 UTC 2017      Hello this web site has a Page Title
acenar.com                                                                                           ACTIVE      19-jan-2018       WHOIS      Sat Jul 22 14:52:27 UTC 2017      Hello this web page also has a page title
subdomain.acenar.com                                                                                 ACTIVE      19-jan-2018       WHOIS      Sat Jul 22 14:52:27 UTC 2017      404 Not Found
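
A minimal sketch of how the Page Title column could be filled, assuming the HTML has already been fetched (for example with curl). The names `TitleParser`, `page_title` and `looks_dead`, and the list of "dead" markers, are made up for illustration and are not part of funceble.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace so the title fits on one table row.
        return " ".join("".join(self._parts).split())


def page_title(html):
    """Return the page title, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def looks_dead(title):
    """Hypothetical heuristic: no title, or a title mentioning a dead page."""
    lowered = title.lower()
    markers = ("404", "not found", "missing")  # assumed markers, adjust freely
    return not lowered or any(marker in lowered for marker in markers)
```

With this, `page_title(html)` gives the value for the new column and `looks_dead(title)` decides which of the two result files a domain goes into. Titles are free text, so the heuristic will misfire on some pages.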


funilrys · 2017-07-23T10:09:40Z

Okay, this is going to need some time, but it's possible 👍

But wouldn't it be better to catch the http code instead of the page title @mitchellkrogza ?

mitchellkrogza · 2017-07-23T10:33:54Z

Yes, it's possibly easier to distinguish between 200 OK, 404 Not Found and 403 Forbidden messages than between page titles.

funilrys · 2017-07-23T10:35:25Z

Clear !!
I'm going to release 1.4.0 and it'll be implemented for the release after 1.4.0 👍

mitchellkrogza · 2017-07-23T11:03:21Z

👍 Awesome looking forward to giving that a test run.

mitchellkrogza · 2017-07-24T13:49:10Z

@funilrys Check this cheap and nasty .csv file output showing domain, status code, content type and redirect url (if any).

With this .csv file, one can take anything with a 000 or 404 or 408 or 403 or 500 or other strange error code and immediately knock them off the list.

Then one can run a separate test on the redirect URL column to see what those produce 😁 and then merge the results together and run a test using a list of the domains that did not redirect + the ones that did.

See:

https://github.com/mitchellkrogza/Stop.Google.Analytics.Ghost.Spam.HOWTO/blob/master/.dev-tools/_output_source/results.csv

The script .... as I said .... cheap and nasty but .... fast and effective. Took Travis CI 26 minutes to produce the .csv on a list of 5451 domains 👍
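
A rough sketch of that triage, not the actual script. It assumes results.csv has no header row and four columns in the order described above (domain, status code, content type, redirect URL); `DROP_CODES` and `split_results` are hypothetical names.

```python
import csv
import io

# Codes named above as candidates for removal; extend as needed.
DROP_CODES = {"000", "403", "404", "408", "500"}


def split_results(csv_text):
    """Split rows into (drop, redirected, keep).

    drop       -> domains whose status code is in DROP_CODES
    redirected -> (domain, redirect_url) pairs to test separately
    keep       -> domains that answered without redirecting
    """
    drop, redirected, keep = [], [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        domain, code, _content_type, redirect = (field.strip() for field in row[:4])
        if code in DROP_CODES:
            drop.append(domain)
        elif redirect:
            redirected.append((domain, redirect))
        else:
            keep.append(domain)
    return drop, redirected, keep
```

The `redirected` pairs can be fed into a second run, and its survivors merged with `keep` for the final test.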

mitchellkrogza · 2017-07-24T14:13:00Z

@funilrys sent you something on WeTransfer 👍 😀

funilrys · 2017-07-24T18:29:09Z

Let me check I was out with friends ....

funilrys · 2017-07-24T18:37:14Z

Okay, for this issue the hardest thing is to develop the procedures/features while also thinking about how the data will be shown in an HTML file ( #62 ), which I have started to design 😆

About WeTransfer: answered you 👍 😮

mitchellkrogza · 2017-07-24T18:46:29Z

Looking forward to seeing what you're busy developing. Just came up with that basic bash script this morning. No doubt you will improve upon it greatly :)

funilrys · 2017-07-24T19:05:07Z

The following may be merged into the wiki after this issue is implemented, if we share the same thoughts about those codes.
Consider the following as not fixed and open to modifications.

For your code list, you should consider the following codes @mitchellkrogza :

As active

100 - Continue
101 - Switching Protocols
200 - OK
201 - Created
202 - Accepted
203 - Non-Authoritative Information
204 - No Content
205 - Reset Content
206 - Partial Content

As potentially active

000
300 - Multiple Choices
301 - Moved Permanently
302 - Found
303 - See Other
304 - Not Modified
305 - Use Proxy
307 - Temporary Redirect
403 - Forbidden
405 - Method Not Allowed
406 - Not Acceptable
407 - Proxy Authentication Required
408 - Request Timeout
411 - Length Required
413 - Request Entity Too Large
417 - Expectation Failed
500 - Internal Server Error
501 - Not Implemented
502 - Bad Gateway
503 - Service Unavailable
504 - Gateway Timeout
505 - HTTP Version Not Supported

As inactive or potentially inactive

400 - Bad Request
401 - Unauthorized
402 - Payment Required (Not in use but may be seen in the future)
404 - Not Found
409 - Conflict
410 - Gone
412 - Precondition Failed
414 - Request-URI Too Long
415 - Unsupported Media Type
416 - Requested Range Not Satisfiable
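
The three groups above can be sketched as a lookup, assuming the grouping stays as listed (it is explicitly open to modification). The label strings and the `UNKNOWN` fallback are illustrative assumptions, not funceble's implementation.

```python
# Groups copied from the lists above; 000 is stored as integer 0.
ACTIVE = {100, 101, 200, 201, 202, 203, 204, 205, 206}
POTENTIALLY_ACTIVE = {
    0, 300, 301, 302, 303, 304, 305, 307,
    403, 405, 406, 407, 408, 411, 413, 417,
    500, 501, 502, 503, 504, 505,
}
POTENTIALLY_INACTIVE = {400, 401, 402, 404, 409, 410, 412, 414, 415, 416}


def classify(code):
    """Map an HTTP status code (int or string such as "000") to a group label."""
    try:
        number = int(code)
    except (TypeError, ValueError):
        return "UNKNOWN"
    if number in ACTIVE:
        return "ACTIVE"
    if number in POTENTIALLY_ACTIVE:
        return "POTENTIALLY_ACTIVE"
    if number in POTENTIALLY_INACTIVE:
        return "POTENTIALLY_INACTIVE"
    return "UNKNOWN"  # hypothetical fallback for codes not listed above
```

Keeping the groups as plain sets means a change to the lists is a one-line edit.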

mitchellkrogza · 2017-07-24T19:10:25Z

Cool, thanks Nissar, still lots of playing to do with this one. I'm trying to add the page title into the equation, which will also help with diagnosing lists.

Kind Regards
Mitchell Krog

funilrys · 2017-07-24T19:19:03Z

@mitchellkrogza Updated my last comments about the codes

For the "checking redirection" part, imagine that the redirection has a redirection which also has a redirection 😜 🤣 🤣

mitchellkrogza · 2017-07-25T08:43:39Z

@funilrys yes indeed, those with multiple redirects are the ones I am most interested in. I test all stuff added to my lists manually in a browser before it gets added. I always run a screen recorder to capture what's happening in the URL bar of the browser, as I often test a site and it does 1-7 redirects in a split second. So then I play back the screen recording, capture all those redirect links and add them to my lists. ....... very time consuming as you can imagine 😬
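
A minimal sketch of capturing such a chain without a screen recorder. It only follows HTTP-level redirects (a `Location` header), so JavaScript or meta-refresh redirects seen in a browser will not show up. `head`, `redirect_chain` and the hop limit of 10 are assumptions for illustration.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head(url):
    """Send one HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirect_chain(url, fetch=head, max_hops=10):
    """Return every URL visited, starting with `url`.

    Stops at the first non-redirect answer, at a loop, or after max_hops.
    `fetch` can be swapped for a stub so the logic is testable offline.
    """
    chain = [url]
    seen = {url}
    for _ in range(max_hops):
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            break
        next_url = urljoin(chain[-1], location)  # resolve relative Location values
        if next_url in seen:
            break  # redirect loop, stop here
        chain.append(next_url)
        seen.add(next_url)
    return chain
```

The hop limit and loop check are what keep a redirect of a redirect of a redirect from running forever.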

funilrys · 2017-07-25T08:45:33Z

Are you suggesting that I should add a follow redirection for funceble once we have the curl column ? 🤣 🤣 🤣 🤣 🤣

mitchellkrogza · 2017-07-25T08:53:04Z

It would probably kill funceble and Travis too 🤣 ..... this should be a separate project, a redirect-redirect checker.

funilrys · 2017-07-25T08:54:31Z

Indeed yeah 🤣 imagine a dead-hosts with follow redirection 🤣 🤣 🤣 🤣

mitchellkrogza · 2017-07-25T09:07:47Z

That could go awfully wrong very fast 🤣 🤣 🤣

funilrys · 2017-07-25T16:40:22Z

A bit of teasing 😜 what do you think of the following ? 😸

mitchellkrogza · 2017-07-25T16:55:10Z

YEAH .... now we are heading in the right direction

funilrys · 2017-07-28T20:36:34Z

Another teasing 😉
Can you find the new directories? 😸

.
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── funceble
├── iana-domains-db
├── index.html
├── LICENSE
├── output
│   ├── hosts
│   │   ├── ACTIVE
│   │   ├── INACTIVE
│   │   └── INVALID
│   ├── HTTP_Analytic
│   │   ├── ACTIVE
│   │   ├── POTENTIALLY_ACTIVE
│   │   └── POTENTIALLY_INACTIVE
│   ├── logs
│   │   ├── dateFormat
│   │   ├── noReferer
│   │   ├── percentage
│   │   └── whois
│   └── splited
├── README.md
└── tool

mitchellkrogza · 2017-07-29T09:55:36Z

Nice 👍 Question: will running funceble on my repo, like https://github.com/mitchellkrogza/Stop.Google.Analytics.Ghost.Spam.HOWTO, automatically create the new folders if needed, and populate them with a .keep file so they get added and committed?

Looking forward to what's coming 😁
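
For illustration only, this is roughly what "create the folders and drop a .keep file in each" could look like. The directory names are taken from the tree above; `OUTPUT_DIRS` and `ensure_output_dirs` are hypothetical and say nothing about what funceble actually does.

```python
from pathlib import Path

# Leaf directories from the tree above.
OUTPUT_DIRS = [
    "output/hosts/ACTIVE",
    "output/hosts/INACTIVE",
    "output/hosts/INVALID",
    "output/HTTP_Analytic/ACTIVE",
    "output/HTTP_Analytic/POTENTIALLY_ACTIVE",
    "output/HTTP_Analytic/POTENTIALLY_INACTIVE",
    "output/logs/dateFormat",
    "output/logs/noReferer",
    "output/logs/percentage",
    "output/logs/whois",
    "output/splited",
]


def ensure_output_dirs(root):
    """Create missing directories under `root` and add a .keep file to each.

    Returns the .keep files that were newly created, so a caller can
    decide whether there is anything to commit.
    """
    created = []
    for relative in OUTPUT_DIRS:
        directory = Path(root) / relative
        directory.mkdir(parents=True, exist_ok=True)
        keep_file = directory / ".keep"
        if not keep_file.exists():
            keep_file.touch()
            created.append(keep_file)
    return created
```

Git does not track empty directories, which is why the placeholder file is needed for the folders to be added and committed.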

funilrys · 2017-07-29T12:47:25Z

Thank you for the question @mitchellkrogza !!

Please refer to #89 😉 👍

funilrys self-assigned this Jul 23, 2017

funilrys added Features Suggestions labels Jul 23, 2017

funilrys added this to the Suggestions/Features milestone Jul 23, 2017

funilrys added this to Waiting in Suggestions/Features Jul 23, 2017

funilrys moved this from Waiting to Possible // TODO in Suggestions/Features Jul 23, 2017

funilrys added a commit that referenced this issue Jul 25, 2017

Partial implementation of #71 (DO NOT USE THIS NOW)

1b74077

funilrys moved this from Possible // TODO to In Progress in Suggestions/Features Jul 26, 2017

funilrys added a commit that referenced this issue Jul 26, 2017

Improvement of #71 + Fix #83

bcc7d0c

funilrys added a commit that referenced this issue Jul 28, 2017

Improvement of #71 + Improve comments

9f67160

funilrys mentioned this issue Jul 29, 2017

Folder are not automatically created #89

Closed

funilrys added a commit that referenced this issue Aug 16, 2017

(Improvement of #71) Review output data

9183597

funilrys added a commit that referenced this issue Aug 17, 2017

(Improvement of #71) Review status depending of httpCode() output

2f6051a

funilrys closed this as completed Mar 13, 2018
