Newly registered Domain Monitoring to detect phishing and brand impersonation with subdomain enumeration and source code scraping

cray-uc/domainthreat

 
 

domainthreat

Daily Domain Monitoring for Brands and Mailing Domain Names

Current Version 3.12

domainthreat is a domain monitoring tool. You can monitor your company brands (e.g. "amazon"), your mailing domains (e.g. "companygroup"), or other words.

Motivation

Typical domain monitoring relies on brand names as input. This is sometimes not sufficient to detect phishing attacks when the brand names and the mailing domain names differ.

Thought experiment: Suppose the example company "IBM" monitors its brand "IBM" and sends mail via @ibmgroup.com, and an attacker registers the domain ibrngroup.com (m = rn) for spear phishing (e.g. CEO fraud). Typical brand protection domain monitoring solutions may struggle here, because the distance between the monitored brand name "IBM" and the registered domain name "ibrngroup.com" is too large to classify the domain as a true positive. This makes it harder for the targeted company to take appropriate measures proactively. The scenario is avoidable by also monitoring your mailing domain names, and thus focusing on text strings rather than brands.

This was the motivation for this project.

Detection Scope

  • full-word matching (e.g. amazon-shop.com),
  • regular typo squatting cases (e.g. ammazon.com),
  • typical look-alikes / phishing / so-called CEO fraud domains (e.g. arnazon.com (rn = m)),
  • IDN detection / look-alike domains based on full-word matching (e.g. 𝗉ay𝞀al.com - Greek letter rho '𝞀' instead of Latin letter 'p'),
  • IDN detection / look-alike domains based on partial-word matching (e.g. 𝗉ya𝞀a1.com - Greek letter rho '𝞀' instead of Latin letter 'p' AND "ya" instead of "ay" AND number "1" instead of letter "l")
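
The look-alike cases listed above can be approximated by folding a domain label to a plain-ASCII form before comparing it with a keyword. The following is a minimal sketch, not domainthreat's actual implementation; the substitution tables and the function names normalize_label and looks_like are assumptions made for illustration.

```python
import unicodedata

# Illustrative tables only (assumption); a real tool would need far larger ones.
HOMOGLYPHS = {"\u03c1": "p", "\u0430": "a", "\u0435": "e", "\u043e": "o"}  # Greek rho, Cyrillic a/e/o
LOOKALIKES = [("rn", "m"), ("vv", "w"), ("1", "l"), ("0", "o")]


def normalize_label(label: str) -> str:
    """Fold a domain label to a plain-ASCII form for comparison."""
    # NFKC maps compatibility characters (e.g. mathematical letters) to base letters.
    text = unicodedata.normalize("NFKC", label).lower()
    # Replace known homoglyphs from other scripts with their Latin look-alike.
    text = "".join(HOMOGLYPHS.get(ch, ch) for ch in text)
    # Replace multi-character and digit look-alikes.
    for fake, real in LOOKALIKES:
        text = text.replace(fake, real)
    return text


def looks_like(domain: str, keyword: str) -> bool:
    """True if the normalized first label contains the normalized keyword."""
    label = domain.split(".")[0]
    return normalize_label(keyword) in normalize_label(label)


if __name__ == "__main__":
    for candidate in ["arnazon.com", "pay\u03c1al.com", "example.com"]:
        print(candidate, looks_like(candidate, "amazon"), looks_like(candidate, "paypal"))
```

This kind of folding only covers the full-word look-alike cases. A label such as "pyapal" (transposed letters) still differs from "paypal" after normalization and needs a fuzzy comparison on top. Folding also creates collisions (e.g. "modern" becomes "modem"), so a blacklist remains useful.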

Example Screenshot: Illustration of detected topic keyword 'tech' in source code of newly registered domain 'microsoftintegration[.]com' and detected subdomains

Features

Key Features & CSV Output Columns

  • Unicode domain names (IDN) / Homoglyph / Homograph Detection

  • Variety of domain fuzzing / similarity algorithms

  • Automated Website Translations

  • Support of a variety of different languages

  • Detected By: Full Keyword Match or Similar/Fuzzy Keyword Match

  • Source Code Match: Keyword detection in websites, even if they are in other languages (e.g. Chinese), by using different translators (normalized to English by default)
    ==> This covers the needs of international companies and foreign-language markets

  • Website Status: Check the website status via HTTP status codes; a 4XX client error or 5XX server error response is reported as HTTPError

  • Parked: Check whether domains that respond with a 2XX or 3XX status code are parked (experimental)

  • Subdomains: Subdomain Scan

  • E-Mail Availability: Check if domain is ready for receiving mails and/or ready for sending mails

  • Daily CSV export into a calendar-week-based CSV file (can be filtered by date)
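
To illustrate the calendar-week-based export, the sketch below appends each day's results to one CSV file per ISO calendar week. It is an assumption-laden example, not the tool's code: the column names in FIELDS, the file name pattern, and the functions weekly_csv_path and append_results are all hypothetical.

```python
import csv
from datetime import date
from pathlib import Path

FIELDS = ["date", "domain", "keyword", "detected_by"]  # hypothetical columns


def weekly_csv_path(folder: Path, day: date) -> Path:
    """Build a per-week file name from the ISO calendar week of `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    # Hypothetical naming scheme; the real tool uses its own file names.
    return folder / f"monitoring_results_{iso_year}_week_{iso_week:02d}.csv"


def append_results(folder: Path, day: date, rows) -> Path:
    """Append one day's rows to that week's CSV, writing the header once."""
    path = weekly_csv_path(folder, day)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        for row in rows:
            # Stamp every row with the run date so the file can be filtered by date.
            writer.writerow({"date": day.isoformat(), **row})
    return path
```

Because every row carries its date, a week's file can later be filtered down to a single day in any spreadsheet tool.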

Other Features

  • Multithreading (CPU core based) & Multiprocessing
  • False Positive Reduction Instruments (e.g. self-defined blacklists, thresholds depending on string length)
  • Keyword detection in websites whose domain names neither contain a brand name nor resemble one
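
The idea of thresholds that depend on string length can be sketched with a plain Levenshtein distance: the shorter the keyword, the fewer edits are tolerated, since one changed letter in a three-letter keyword already matches many unrelated words. The cut-off values and the names max_distance and is_similar below are assumptions for illustration; domainthreat's own algorithms and thresholds may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def max_distance(keyword: str) -> int:
    """Hypothetical thresholds: short keywords tolerate fewer edits."""
    if len(keyword) <= 4:
        return 0
    if len(keyword) <= 8:
        return 1
    return 2


def is_similar(label: str, keyword: str) -> bool:
    """True if the label is within the length-dependent edit budget."""
    return edit_distance(label, keyword) <= max_distance(keyword)
```

With these example thresholds, "ammazon" is accepted as similar to "amazon", while "tvi" is rejected for the three-letter keyword "tui".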

Principles

1. Basic Domain Monitoring

1.1. Keywords from file keywords.txt (e.g. tuigroup) are used for full-word detection (e.g. newtuigroup.shop) and similar-word detection (e.g. tuiqroup.com (g=q)) on newly registered domain names.

1.2. Keywords from file topic_keywords.txt (e.g. travel) are searched for in the source code of (translated) webpages (e.g. dulichtui.com) of the domain monitoring results from point 1.1.

==> Results are exported to Newly_Registered_Domains_Calender_Week_ .csv File
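
Step 1.2 can be illustrated with a small parser that collects the text of the HTML title, description, and keywords tags and searches it for topic keywords. This is a sketch under assumptions: the class MetaTextParser and the function find_topic_keywords are hypothetical names, the page is assumed to be downloaded already, and the translation step is left out entirely.

```python
from html.parser import HTMLParser


class MetaTextParser(HTMLParser):
    """Collect text from <title> and the description/keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            name = (attributes.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.chunks.append(attributes.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; body text is ignored in this sketch.
        if self.in_title:
            self.chunks.append(data)


def find_topic_keywords(html: str, topic_keywords):
    """Return the topic keywords found in the title/description/keywords text."""
    parser = MetaTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [kw for kw in topic_keywords if kw.lower() in text]
```

For a foreign-language page, the collected text would first have to be translated to English before matching; that step depends on an external translator and is not shown here.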

2. Advanced Domain Monitoring

2.1. Keywords from file topic_keywords.txt (e.g. holiday) are used for full-word detection (e.g. usa-holiday.net) on newly registered domain names.

2.2. Keywords from file topic_keywords.txt (e.g. holiday) are automatically translated into the languages that the user provides in the "User Input/languages_advanced_monitoring.txt" file (empty by default). See supported_languages.txt for the currently supported languages, and copy the languages you want from there into "User Input/languages_advanced_monitoring.txt". Punycode domains are not supported by these translations at the moment.

==> Results from 2.1. are extended with the translated keywords from the topic_keywords.txt file. For example, "urlaub" is the German word for "holiday", so the program will additionally find German-language registered domains such as "sea-urlaub.com".

2.3. Keywords from file unique_brand_names.txt (e.g. tui) are searched for in the webpages of monitoring results from point 2.1. (e.g. usa-holiday.net) and, if any supported languages are provided, from point 2.2. (e.g. sea-urlaub.com).

==> Results are exported to Advanced_Monitoring_Results_Calender_Week_ .csv File

Instructions

How to install:

How to run:

--similarity : Selects the similarity mode of the homograph and typosquatting detection algorithms. Options: "close" or "wide".

  • close: Fewer false positives and (potentially) more false negatives
  • wide: More false positives and (potentially) fewer false negatives
  • Default: Trade-off between both modes.

--threads : Number of Threads

  • Default: Number of Threads is based on CPU cores

Running program in standard mode (CPU cores + default similarity mode):

  • "python3 domainthreat.py"

Running program in wide similarity mode with 50 threads:

  • "python3 domainthreat.py --similarity wide --threads 50"
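
For readers who want to see how the two options described above fit together, here is a minimal argparse sketch. It only mirrors the documented behavior ("close" or "wide", thread count defaulting to the CPU core count); it is not copied from domainthreat.py, and representing the default similarity mode as None is an assumption.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented --similarity and --threads options."""
    parser = argparse.ArgumentParser(prog="domainthreat.py")
    parser.add_argument(
        "--similarity",
        choices=["close", "wide"],
        default=None,  # assumption: None stands for the default trade-off mode
        help="similarity mode of the homograph/typosquatting detection",
    )
    parser.add_argument(
        "--threads",
        type=int,
        default=os.cpu_count() or 1,  # fall back to 1 if the core count is unknown
        help="number of threads (defaults to the number of CPU cores)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"similarity={args.similarity!r}, threads={args.threads}")
```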

How to update:

  • cd domainthreat
  • git pull
  • In case of a Merge Error: Try "git reset --hard" before "git pull"

Before the first run - How it Works:

  1. Put your brand names or mailing domain names (without the TLD) into the TXT file "User Input/keywords.txt", one per line. These are the keywords to monitor. Some "TUI" names are listed by default.

  2. Put common word collisions that you want to exclude from the results into the TXT file "User Input/blacklist_keywords.txt", one per line, to reduce false positives.

    • e.g. blacklist "lotto" if you monitor the keyword "otto", "amazonas" if you monitor "amazon", or "intuitive" if you monitor "tui".

  3. Put commonly used words that describe your brands, industry, brand names, or products on websites into the TXT file "User Input/topic_keywords.txt", one per line. These keywords are searched for in the source code of websites. The default, normalized language is English: the HTML title, description, and keywords tags are automatically translated via different translators.

    • e.g. keyword "fashion" for a fashion company, "sneaker" for a shoe company, "Zero Sugar" for Coca Cola Inc., or "travel" for a travel company.

  4. Put your brand names into the TXT file "User Input/unique_brand_names.txt", one per line (e.g. "tui"). These keywords are searched for in the source code of websites whose domain names neither contain your brand names nor resemble them (e.g. usa-holiday.net). Some "TUI" names are listed by default.
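
How the keyword and blacklist files interact can be sketched as follows: both files are read one term per line, and a domain is skipped when its name contains a blacklisted word. The functions load_terms and full_word_hits are hypothetical helpers written for this example, not part of domainthreat.

```python
from pathlib import Path


def load_terms(path: Path) -> list:
    """Read one term per line, ignoring blank lines and surrounding whitespace."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip().lower() for line in lines if line.strip()]


def full_word_hits(domains, keywords, blacklist):
    """Return (domain, keyword) pairs, skipping domains with blacklisted words."""
    hits = []
    for domain in domains:
        # Keywords are stored without the TLD, so compare against the name only.
        name = domain.lower().rsplit(".", 1)[0]
        if any(word in name for word in blacklist):
            continue
        hits.extend((domain, kw) for kw in keywords if kw in name)
    return hits


if __name__ == "__main__":
    candidates = ["otto-shop.com", "lotto24.net", "intuitive.io", "newtuigroup.shop"]
    print(full_word_hits(candidates, ["otto", "tui"], ["lotto", "intuitive"]))
```

In this simple form a blacklisted word suppresses the whole domain, so "otto-lotto.com" would be skipped as well. That is the price of the reduced false positive rate.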

Troubleshooting

  • In case of errors with modules "httpcore" or "httpx" - possible fixes:
    • pip uninstall googletrans (if you installed an older version of domainthreat, i.e. version <= 2.11)
    • pip install --upgrade pip
    • pip install --upgrade httpx
    • pip install --upgrade httpcore

Changelog

Notes

Author

TO DO

  • Add additional fuzzy matching algorithms to increase the true positive rate / accuracy (the sequence-based algorithm "Longest Common Substring" is already included but not activated by default)
  • Enhance source code keyword detection on subdomain level
  • AI based Logo Detection by Object Detection

Additional
