Commit
Showing 20 changed files with 565 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,7 @@ | ||
^.*\.Rproj$ | ||
^\.Rproj\.user$ | ||
^README\.Rmd$ | ||
^README-.*\.png$ | ||
^\.travis\.yml$ | ||
^CONDUCT\.md$ | ||
^README\.md$ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Sample .travis.yml for R projects | ||
|
||
language: r | ||
warnings_are_errors: true | ||
sudo: required |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# Contributor Code of Conduct | ||
|
||
As contributors and maintainers of this project, we pledge to respect all people who | ||
contribute through reporting issues, posting feature requests, updating documentation, | ||
submitting pull requests or patches, and other activities. | ||
|
||
We are committed to making participation in this project a harassment-free experience for | ||
everyone, regardless of level of experience, gender, gender identity and expression, | ||
sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. | ||
|
||
Examples of unacceptable behavior by participants include the use of sexual language or | ||
imagery, derogatory comments or personal attacks, trolling, public or private harassment, | ||
insults, or other unprofessional conduct. | ||
|
||
Project maintainers have the right and responsibility to remove, edit, or reject comments, | ||
commits, code, wiki edits, issues, and other contributions that are not aligned to this | ||
Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed | ||
from the project team. | ||
|
||
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by | ||
opening an issue or contacting one or more of the project maintainers. | ||
|
||
This Code of Conduct is adapted from the Contributor Covenant | ||
(http://contributor-covenant.org), version 1.0.0, available at | ||
http://contributor-covenant.org/version/1/0/0/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,17 @@ | ||
Package: safebrowsing | ||
Title: What the Package Does (one line, title case) | ||
Version: 0.0.0.9000 | ||
Version: 0.1.0.9000 | ||
Authors@R: c(person("Bob", "Rudis", email = "[email protected]", role = c("aut", "cre"))) | ||
Description: What the package does (one paragraph). | ||
Depends: R (>= 3.2.3) | ||
Depends: | ||
R (>= 3.2.0) | ||
License: AGPL + file LICENSE | ||
LazyData: true | ||
Suggests: | ||
testthat | ||
Imports: | ||
purrr, | ||
dplyr, | ||
httr, | ||
V8 | ||
RoxygenNote: 5.0.1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,11 @@ | ||
# Generated by roxygen2: fake comment so roxygen2 overwrites silently. | ||
exportPattern("^[^\\.]") | ||
# Generated by roxygen2: do not edit by hand | ||
|
||
export(gsb_as_ts) | ||
export(gsb_asinfo) | ||
export(gsb_site_status) | ||
import(V8) | ||
import(httr) | ||
importFrom(dplyr,bind_rows) | ||
importFrom(dplyr,filter) | ||
importFrom(dplyr,select) | ||
importFrom(purrr,map_df) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
S_GET <- purrr::safely(httr::GET) | ||
|
||
UA <- paste0("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) ", | ||
"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 ", | ||
"Safari/537.36") | ||
|
||
REF1 <- "https://www.google.com/transparencyreport/safebrowsing/malware/?hl=en" | ||
REF2 <- "https://www.google.com/transparencyreport/safebrowsing/diagnostic/" |
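The `S_GET` wrapper above relies on `purrr::safely()`, which returns a list with `result` and `error` elements instead of raising an error. As a rough base-R approximation of that behavior (the helper name `safely_sketch` is hypothetical and not part of this commit):

```r
# Hypothetical helper: a simplified stand-in for purrr::safely().
# It wraps a function so that errors are captured rather than thrown.
safely_sketch <- function(f) {
  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),
      error = function(e) list(result = NULL, error = e)
    )
  }
}

s_log <- safely_sketch(log)

ok  <- s_log(10)    # succeeds: result is set, error is NULL
bad <- s_log("a")   # fails: result is NULL, error holds the condition
```

This is why the callers in this commit test `!is.null(res$result)` before parsing the response body.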
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
#' Retrieve attack/compromised host time series info for an AS | ||
#' | ||
#' @param as AS number to query (without the "\code{AS}" prefix) | ||
#' @param count return counts (\code{TRUE}) or percentages, i.e. rates (\code{FALSE}) | ||
#' @return \code{data.frame} (\code{tbl_df}) with (probably) massive time series info | ||
#' @export | ||
#' @examples \dontrun{ | ||
#' gsb_as_ts("10439") | ||
#' } | ||
gsb_as_ts <- function(as, count=TRUE) { # count==FALSE gets % | ||
|
||
count <- if (count) "COUNT" else "RATE" | ||
as <- sub("^as", "", as, ignore.case=TRUE) | ||
|
||
SPATH <- | ||
sprintf("transparencyreport/jsonp/sb/malware/ts/%s/?a=%s&t=ATTACK&t=COMPROMISED&c=", | ||
as, count) | ||
|
||
res <- S_GET("https://www.google.com", | ||
path=SPATH, | ||
add_headers(Accept="*/*", | ||
Referer=REF1, | ||
`User-Agent`=UA)) | ||
|
||
if (!is.null(res$result)) { | ||
|
||
js <- content(res$result, as="text") | ||
|
||
.pkgenv$ctx$eval(sprintf("var dat=%s", js)) | ||
|
||
dat <- data.frame(.pkgenv$ctx$get("dat"), stringsAsFactors=FALSE) | ||
dat$date <- as.Date(as.POSIXct(dat$time.series.1/1000, | ||
origin='1970-01-01', tz="UTC")) | ||
dat <- select(dat, | ||
date, asn, name, description, attack=time.series.2, | ||
compromised=time.series.3, -sites.types, -time.series.1) | ||
|
||
return(dplyr::tbl_df(dat)) | ||
|
||
} else { | ||
stop("Error retrieving detailed Safe Browsing ASN info", call.=FALSE) | ||
} | ||
|
||
} |
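The date handling in `gsb_as_ts` assumes the first time series column holds Unix timestamps in milliseconds. A minimal base-R sketch of that conversion, using made-up timestamps rather than real API output:

```r
# Made-up epoch timestamps in milliseconds (not taken from the API).
ms <- c(0, 86400000, 1456790400000)

# Divide by 1000 to get seconds, build a UTC POSIXct, then drop the time part.
dates <- as.Date(as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC"))

format(dates)
```

Pinning `tz = "UTC"` keeps the resulting dates independent of the local time zone of the machine running the query.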
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
# get the data behind https://www.google.com/transparencyreport/safebrowsing/malware/ | ||
|
||
#' Retrieve AS info from Google SafeBrowsing | ||
#' | ||
#' @param as_size return results for \code{all} ASNs or just the \code{largest}? | ||
#' @param time_range number of days of history for the time series | ||
#' @param type_detected one of \code{both}, \code{attack} or \code{compromised}. | ||
#' @param region ISO2 region code or \code{all} | ||
#' @param .progress display a progress bar? | ||
#' @note Depending on the parameters used (especially \code{time_range}), | ||
#' this operation could take a while. You should probably turn on | ||
#' progress bars for any query > 7 days. | ||
#' @return \code{data.frame} (\code{tbl_df}) with ASN info | ||
#' @references See \href{https://www.google.com/transparencyreport/safebrowsing/malware/?hl=en}{Google's page} | ||
#' for more information on how they scan & report ASN info. | ||
#' @export | ||
#' @examples \dontrun{ | ||
#' gsb_asinfo("largest", 7) | ||
#' } | ||
gsb_asinfo <- function(as_size=c("largest", "all"), | ||
time_range=90, | ||
type_detected=c("both", "attack", "compromised"), | ||
region="all", | ||
.progress=FALSE) { | ||
|
||
as_size <- match.arg(as_size, c("largest", "all")) | ||
type_detected <- match.arg(type_detected, c("both", "attack", "compromised")) | ||
|
||
first <- get_as_info_page(0, as_size, time_range, type_detected, region) | ||
|
||
if (.progress && interactive()) pb <- txtProgressBar(0, first$`page-count`, style=3) | ||
|
||
map_df(seq_len(first$`page-count` - 1), function(pg) { | ||
if (.progress && interactive()) setTxtProgressBar(pb, pg) | ||
res <- get_as_info_page(pg, as_size, time_range, type_detected, region) | ||
res$table | ||
}) -> asn_tbl_pages | ||
|
||
if (.progress && interactive()) close(pb) | ||
|
||
dplyr::bind_rows(first$table, asn_tbl_pages) | ||
|
||
} | ||
|
||
get_as_info_page <- function(page=0, | ||
as_size=c("largest", "all"), | ||
time_range=90, | ||
type_detected=c("both", "attack", "compromised"), | ||
region="all") { | ||
|
||
as_size <- toupper(match.arg(as_size, c("largest", "all"))) | ||
type_detected <- toupper(match.arg(type_detected, c("both", "attack", "compromised"))) | ||
if (region == "all") region <- "" | ||
|
||
res <- S_GET("https://www.google.com", | ||
path="transparencyreport/jsonp/sb/malware/table/", | ||
query=list(t=type_detected, | ||
d=time_range, | ||
z=as_size, | ||
p=page, | ||
r=region, | ||
c=""), | ||
httr::add_headers(Accept="*/*", | ||
Referer=REF1, | ||
`User-Agent`=UA)) | ||
|
||
if (!is.null(res$result)) { | ||
|
||
js <- httr::content(res$result, as="text") | ||
|
||
.pkgenv$ctx$eval(sprintf("var dat=%s", js)) | ||
|
||
dat <- .pkgenv$ctx$get("dat") | ||
|
||
return(dat) | ||
|
||
} else { | ||
stop("Error retrieving Safe Browsing ASN info", call.=FALSE) | ||
} | ||
|
||
} |
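The paging pattern in `gsb_asinfo` (fetch page 0, read the page count, then fetch the remaining pages) can be sketched without any network access. Here `fetch_page` is a hypothetical stand-in for `get_as_info_page`, and the table contents are invented. One detail worth noting is that `1:(n - 1)` counts down to 0 when `n` is 1, whereas `seq_len(n - 1)` is empty:

```r
# Hypothetical stand-in for a paged endpoint: every page reports the total
# page count and carries a small one-row table.
fetch_page <- function(pg, n_pages) {
  list(`page-count` = n_pages,
       table = data.frame(page = pg, value = pg * 10))
}

fetch_all <- function(n_pages) {
  first <- fetch_page(0, n_pages)
  # seq_len() yields an empty vector when only one page exists,
  # so no extra requests are made in that case.
  rest_idx <- seq_len(first$`page-count` - 1)
  rest <- lapply(rest_idx, function(pg) fetch_page(pg, n_pages)$table)
  do.call(rbind, c(list(first$table), rest))
}

three_pages <- fetch_all(3)
one_page    <- fetch_all(1)
```

With a single-page result, `fetch_all(1)` returns only the first table instead of requesting pages 1 and 0 again.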
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
#' A package to ... | ||
#' | ||
#' @name safebrowsing | ||
#' @docType package | ||
#' @author Bob Rudis (@@hrbrmstr) | ||
#' @import httr V8 | ||
#' @importFrom dplyr select bind_rows filter | ||
#' @importFrom purrr map_df | ||
NULL |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
#' Retrieve URL "site status" | ||
#' | ||
#' Pass in a vector of URLs and the Google Safe Browsing Site Status | ||
#' for the URLs will be returned in a \code{data.frame} (\code{tbl_df}) | ||
#' | ||
#' @param site_urls character vector of non-relative URLs to check | ||
#' @return \code{data.frame} (\code{tbl_df}) with site status | ||
#' @export | ||
#' @examples \dontrun{ | ||
#' gsb_site_status(c("http://fgcdiesel.cl/encrypted.exe", | ||
#' "http://rud.is/", | ||
#' "http://dds.ec/")) | ||
#' } | ||
gsb_site_status <- function(site_urls) { | ||
purrr::map_df(site_urls, site_status) | ||
} | ||
|
||
site_status <- function(site_url) { | ||
|
||
res <- S_GET("https://www.google.com", | ||
path="safebrowsing/diagnostic", | ||
query=list(site=site_url, | ||
output="jsonp"), | ||
httr::add_headers(Accept="*/*", | ||
Referer=REF2, | ||
`User-Agent`=UA)) | ||
|
||
if (!is.null(res$result)) { | ||
|
||
js <- httr::content(res$result, as="text", encoding="UTF-8") | ||
|
||
js <- sub("^.*processResponse\\(", "", js) | ||
js <- sub("\\);$", "", js) | ||
|
||
.pkgenv$ctx$eval(sprintf("var dat=%s", js)) | ||
|
||
tmp <- .pkgenv$ctx$get("dat") | ||
tmp <- as.list(unlist(tmp)) | ||
|
||
tmp_names <- names(tmp) | ||
tmp_names <- sub("^website\\.", "", tmp_names) | ||
tmp_names <- gsub("\\.", "_", tmp_names) | ||
tmp_names <- gsub("([a-z])([A-Z])", "\\1_\\L\\2", tmp_names, perl=TRUE) | ||
tmp_names <- tolower(sub("^(.[a-z])", "\\L\\1", tmp_names, perl=TRUE)) | ||
|
||
names(tmp) <- tmp_names | ||
|
||
dat <- dplyr::tbl_df(as.data.frame(tmp, stringsAsFactors=FALSE)) | ||
|
||
return(dat) | ||
|
||
} else { | ||
stop("Error retrieving detailed Safe Browsing URL info", call.=FALSE) | ||
} | ||
|
||
} |
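The string handling in `site_status` does two things: it strips the JSONP callback wrapper, and it turns flattened camelCase field names into snake_case. A minimal base-R sketch of both steps follows. The payload and field names here are hypothetical, since the real response is not shown in this commit, and the helper names are invented for illustration:

```r
# Hypothetical JSONP payload; the real response fields are an assumption.
jsonp <- 'processResponse({"website": {"name": "example.com"}});'

# Strip the callback wrapper, leaving a plain JSON object literal.
unwrap_jsonp <- function(x) {
  x <- sub("^.*processResponse\\(", "", x)
  sub("\\);$", "", x)
}

# Convert flattened camelCase names (as produced by unlist()) to snake_case.
normalize_names <- function(nms) {
  nms <- sub("^website\\.", "", nms)   # drop the common prefix
  nms <- gsub("\\.", "_", nms)         # dots from unlist() become underscores
  # \\L (perl = TRUE only) lowercases the rest of the replacement
  nms <- gsub("([a-z])([A-Z])", "\\1_\\L\\2", nms, perl = TRUE)
  tolower(nms)
}

json_text <- unwrap_jsonp(jsonp)
clean <- normalize_names(c("website.malwareListStatus",
                           "website.partialMalwareHosts.0"))
```

The package itself evaluates the unwrapped text in a V8 context; this sketch stops at the string level so it runs with base R alone.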
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
.pkgenv <- new.env(parent=emptyenv()) | ||
|
||
.onAttach <- function(...) { | ||
|
||
ctx <- V8::v8() | ||
assign("ctx", ctx, envir=.pkgenv) | ||
|
||
if (!interactive()) return() | ||
|
||
packageStartupMessage(paste0("safebrowsing is under *active* development. ", | ||
"See https://github.com/hrbrmstr/safebrowsing for changes")) | ||
|
||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
--- | ||
output: | ||
md_document: | ||
variant: markdown_github | ||
--- | ||
|
||
<!-- README.md is generated from README.Rmd. Please edit that file --> | ||
|
||
```{r, echo = FALSE} | ||
knitr::opts_chunk$set( | ||
collapse = TRUE, | ||
comment = "#>", | ||
fig.path = "README-" | ||
) | ||
``` | ||
|
||
`safebrowsing` : retrieve ASN & URL info from Google Safe Browsing | ||
|
||
If you're not familiar with Google's "Safe Browsing" "service", you should probably [go here](https://www.google.com/transparencyreport/safebrowsing/?hl=en) and read up on it before using this package. | ||
|
||
Note also that this package relies on undocumented APIs and could break if Google changes how they call the underlying XHR requests. | ||
|
||
The following functions are implemented: | ||
|
||
- `gsb_as_ts`: Retrieve attack/compromised host time series info for an AS | ||
- `gsb_asinfo`: Retrieve AS info from Google SafeBrowsing | ||
- `gsb_site_status`: Retrieve URL "site status" | ||
|
||
### News | ||
|
||
- Version released | ||
|
||
### Installation | ||
|
||
```{r eval=FALSE} | ||
devtools::install_github("hrbrmstr/safebrowsing") | ||
``` | ||
|
||
```{r echo=FALSE, message=FALSE, warning=FALSE, error=FALSE} | ||
options(width=120) | ||
``` | ||
|
||
### Usage | ||
|
||
```{r} | ||
library(safebrowsing) | ||
# current version | ||
packageVersion("safebrowsing") | ||
gsb_site_status(c("http://fgcdiesel.cl/encrypted.exe", | ||
"http://rud.is/", "http://dds.ec/")) | ||
gsb_asinfo("largest", 7) | ||
gsb_as_ts("10439") | ||
``` | ||
|
||
### Test Results | ||
|
||
```{r} | ||
library(safebrowsing) | ||
library(testthat) | ||
date() | ||
test_dir("tests/") | ||
``` | ||
|
||
### Code of Conduct | ||
|
||
Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). | ||
By participating in this project you agree to abide by its terms. |