This is the data that drives the whynohttps.com website

jawn/Why-No-HTTPS-Data

 
 

Why No HTTPS? The Raw Data!

This is the raw data that drives the whynohttps.com website. It's ordered by domain name so it's easy to see what changes from revision to revision.

## Input Files

There are 4 files which make up the bulk of the input:

  1. http-sites.json : Sites that Scott Helme's crawler finds returning content over HTTP without redirecting to HTTPS with a 301 or 302
  2. https-sites.json : Sites that Scott Helme's crawler finds explicitly redirecting from HTTP to HTTPS with a 301 or 302
  3. top-1m.csv : The Tranco Top 1M websites used to rank the previous 2 lists
  4. transport_security_state_static.json : Sites on the HSTS preload list

Scott publishes all his data publicly at crawler.ninja.
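As a rough illustration of how the Tranco list can be used to rank the crawler lists, here is a minimal sketch. It assumes `top-1m.csv` holds headerless `rank,domain` rows and, purely as an assumption, that `http-sites.json` is a flat JSON array of domain names; the real schema may differ, and the function names are hypothetical.

```python
import csv
import io
import json


def load_tranco_ranks(csv_text):
    """Map each domain to its rank, assuming headerless `rank,domain` rows."""
    ranks = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        rank, domain = row
        ranks[domain.strip().lower()] = int(rank)
    return ranks


def rank_sites(sites_json_text, ranks):
    """Return the listed domains ordered by Tranco rank, dropping unranked ones.

    Assumes the JSON is a flat array of domain strings (hypothetical schema).
    """
    domains = json.loads(sites_json_text)
    ranked = [d.lower() for d in domains if d.lower() in ranks]
    return sorted(ranked, key=ranks.__getitem__)


# Tiny inline stand-ins for the real files.
tranco = "1,example.com\n2,example.org\n3,example.net\n"
http_sites = '["example.net", "example.com", "unranked.example"]'
print(rank_sites(http_sites, load_tranco_ranks(tranco)))
```

With the real files, the same functions would be fed the contents of `top-1m.csv` and `http-sites.json` read from disk.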

## Output Files

There are 3 main classes of file:

  1. countries.json : A list of countries for which data has been prepared
  2. top100.json : The 100 largest websites not redirecting HTTP to HTTPS
  3. top50-[country code].json : The 50 largest websites for the given country not redirecting HTTP to HTTPS
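The naming scheme above can be sketched as follows. This is only an illustration: the README does not say how sites are assigned to countries or what each output file contains, so the inputs here (domain lists already ordered by rank) and the plain-list file contents are assumptions, and `build_outputs` is a hypothetical helper.

```python
def build_outputs(global_ranked, ranked_by_country):
    """Return {filename: contents} following the naming scheme listed above.

    Both inputs are hypothetical: domains already ordered by rank, globally
    and per country code. File contents are assumed to be plain lists.
    """
    outputs = {
        "countries.json": sorted(ranked_by_country),  # country codes covered
        "top100.json": global_ranked[:100],           # 100 largest overall
    }
    for code, domains in ranked_by_country.items():
        outputs[f"top50-{code}.json"] = domains[:50]  # 50 largest per country
    return outputs


# Made-up data: 150 global sites and two illustrative country codes.
global_ranked = [f"site{i}.example" for i in range(150)]
ranked_by_country = {
    "au": [f"site{i}.example" for i in range(60)],
    "nz": [f"site{i}.example" for i in range(10)],
}
outputs = build_outputs(global_ranked, ranked_by_country)
print(sorted(outputs))
```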

## Notes

There are many reasons why a site might appear on this list that you think shouldn't be there. Read the launch blog post "Why No HTTPS? Here's the World's Largest Websites Not Redirecting Insecure Requests to HTTPS" and the follow-up post "Why No HTTPS? Questions Answered, New Data, Path Forward" for more on that.

At present, the data is updated on an ad hoc basis.
