initial commit
hrbrmstr committed Mar 5, 2016
1 parent 9ca4e15 commit e3c7e48
Showing 20 changed files with 565 additions and 4 deletions.
5 changes: 5 additions & 0 deletions .Rbuildignore
@@ -1,2 +1,7 @@
^.*\.Rproj$
^\.Rproj\.user$
^README\.Rmd$
^README-.*\.png$
^\.travis\.yml$
^CONDUCT\.md$
^README\.md$
5 changes: 5 additions & 0 deletions .travis.yml
@@ -0,0 +1,5 @@
# Sample .travis.yml for R projects

language: r
warnings_are_errors: true
sudo: required
25 changes: 25 additions & 0 deletions CONDUCT.md
@@ -0,0 +1,25 @@
# Contributor Code of Conduct

As contributors and maintainers of this project, we pledge to respect all people who
contribute through reporting issues, posting feature requests, updating documentation,
submitting pull requests or patches, and other activities.

We are committed to making participation in this project a harassment-free experience for
everyone, regardless of level of experience, gender, gender identity and expression,
sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.

Examples of unacceptable behavior by participants include the use of sexual language or
imagery, derogatory comments or personal attacks, trolling, public or private harassment,
insults, or other unprofessional conduct.

Project maintainers have the right and responsibility to remove, edit, or reject comments,
commits, code, wiki edits, issues, and other contributions that are not aligned to this
Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed
from the project team.

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by
opening an issue or contacting one or more of the project maintainers.

This Code of Conduct is adapted from the Contributor Covenant
(http://contributor-covenant.org), version 1.0.0, available at
http://contributor-covenant.org/version/1/0/0/
13 changes: 11 additions & 2 deletions DESCRIPTION
@@ -1,8 +1,17 @@
 Package: safebrowsing
 Title: What the Package Does (one line, title case)
-Version: 0.0.0.9000
+Version: 0.1.0.9000
 Authors@R: c(person("Bob", "Rudis", email = "[email protected]", role = c("aut", "cre")))
 Description: What the package does (one paragraph).
-Depends: R (>= 3.2.3)
+Depends:
+    R (>= 3.2.0)
 License: AGPL + file LICENSE
 LazyData: true
+Suggests:
+    testthat
+Imports:
+    purrr,
+    dplyr,
+    httr,
+    V8
+RoxygenNote: 5.0.1
13 changes: 11 additions & 2 deletions NAMESPACE
@@ -1,2 +1,11 @@
-# Generated by roxygen2: fake comment so roxygen2 overwrites silently.
-exportPattern("^[^\\.]")
+# Generated by roxygen2: do not edit by hand
+
+export(gsb_as_ts)
+export(gsb_asinfo)
+export(gsb_site_status)
+import(V8)
+import(httr)
+importFrom(dplyr,bind_rows)
+importFrom(dplyr,filter)
+importFrom(dplyr,select)
+importFrom(purrr,map_df)
8 changes: 8 additions & 0 deletions R/aaa.r
@@ -0,0 +1,8 @@
# GET wrapper that does not throw: returns list(result=, error=) instead
S_GET <- purrr::safely(httr::GET)

# Browser-like User-Agent sent with every request
UA <- paste0("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) ",
             "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 ",
             "Safari/537.36")

# Referer values sent with the ASN (REF1) and site status (REF2) requests
REF1 <- "https://www.google.com/transparencyreport/safebrowsing/malware/?hl=en"
REF2 <- "https://www.google.com/transparencyreport/safebrowsing/diagnostic/"
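For readers unfamiliar with `purrr::safely()`, the base R sketch below approximates the behavior `S_GET` relies on: the wrapped function returns a list with `result` and `error` elements instead of raising an error. `safely_sketch()` is a hypothetical stand-in written for illustration only; it is not part of this package or of purrr, and it omits purrr's extra options.

```r
# Hypothetical base R approximation of purrr::safely(); illustration only
safely_sketch <- function(f) {
  function(...) {
    tryCatch(
      # success: keep the value, no error
      list(result = f(...), error = NULL),
      # failure: no value, keep the condition object
      error = function(e) list(result = NULL, error = e)
    )
  }
}

safe_log <- safely_sketch(log)
ok  <- safe_log(1)    # ok$result holds the value, ok$error is NULL
bad <- safe_log("a")  # bad$result is NULL, bad$error holds the condition
```

This is why the request functions in this commit test `!is.null(res$result)` before reading the response.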
44 changes: 44 additions & 0 deletions R/as-series.r
@@ -0,0 +1,44 @@
#' Retrieve attack/compromised host time series info for an AS
#'
#' @param as AS number to query (without the "\code{AS}" prefix)
#' @param count return counts (\code{TRUE}) or percentages, i.e. rates (\code{FALSE})
#' @return \code{data.frame} (\code{tbl_df}) with (probably) massive time series info
#' @export
#' @examples \dontrun{
#' gsb_as_ts("10439")
#' }
gsb_as_ts <- function(as, count=TRUE) { # count==FALSE gets %

  count <- if (count) "COUNT" else "RATE"
  # tolerate a leading "AS" even though the docs ask for the bare number
  as <- sub("^as", "", as, ignore.case=TRUE)

  SPATH <-
    sprintf("transparencyreport/jsonp/sb/malware/ts/%s/?a=%s&t=ATTACK&t=COMPROMISED&c=",
            as, count)

  res <- S_GET("https://www.google.com",
               path=SPATH,
               add_headers(Accept="*/*",
                           Referer=REF1,
                           `User-Agent`=UA))

  if (!is.null(res$result)) {

    js <- content(res$result, as="text")

    # evaluate the returned JavaScript literal in the shared V8 context
    .pkgenv$ctx$eval(sprintf("var dat=%s", js))

    dat <- data.frame(.pkgenv$ctx$get("dat"), stringsAsFactors=FALSE)
    # time.series.1 is treated as epoch milliseconds
    dat$date <- as.Date(as.POSIXct(dat$time.series.1/1000,
                                   origin='1970-01-01', tz="UTC"))
    dat <- select(dat,
                  date, asn, name, description, attack=time.series.2,
                  compromised=time.series.3, -sites.types, -time.series.1)

    return(dplyr::tbl_df(dat))

  } else {
    stop("Error retrieving data safesearch detailed asn info", call.=FALSE)
  }

}
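The date handling in `gsb_as_ts()` divides the first time series column by 1000 before converting it, which suggests the endpoint returns epoch milliseconds. A minimal base R sketch of that conversion follows; `ms_to_date()` is a hypothetical helper name, and the sample timestamp is made up for illustration rather than taken from a real response.

```r
# Hypothetical helper: convert epoch milliseconds to a Date in UTC
ms_to_date <- function(ms) {
  # POSIXct counts seconds since the origin, so scale milliseconds down first
  as.Date(as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC"))
}

# Made-up sample value: midnight UTC on the date of this commit
ms_to_date(1457136000000)
```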
81 changes: 81 additions & 0 deletions R/gsb-asinfo.r
@@ -0,0 +1,81 @@
# get the data behind https://www.google.com/transparencyreport/safebrowsing/malware/

#' Retrieve AS info from Google SafeBrowsing
#'
#' @param as_size return results for \code{all} ASNs or just the \code{largest}?
#' @param time_range number of days of history for the time series
#' @param type_detected one of \code{both}, \code{attack} or \code{compromised}.
#' @param region ISO2 region code or \code{all}
#' @param .progress display a progress bar?
#' @note Depending on the parameters used (especially \code{time_range}),
#'       this operation could take a while. You should probably turn on
#'       progress bars for any query > 7 days.
#' @return \code{data.frame} (\code{tbl_df}) with ASN info
#' @references See \href{https://www.google.com/transparencyreport/safebrowsing/malware/?hl=en}{Google's page}
#'             for more information on how they scan & report ASN info.
#' @export
#' @examples \dontrun{
#' gsb_asinfo("largest", 7)
#' }
gsb_asinfo <- function(as_size=c("largest", "all"),
                       time_range=90,
                       type_detected=c("both", "attack", "compromised"),
                       region="all",
                       .progress=FALSE) {

  as_size <- match.arg(as_size, c("largest", "all"))
  type_detected <- match.arg(type_detected, c("both", "attack", "compromised"))

  # page 0 also reports how many pages there are in total
  first <- get_as_info_page(0, as_size, time_range, type_detected, region)

  if (.progress && interactive()) pb <- txtProgressBar(0, first$`page-count`, style=3)

  # seq_len() yields no iterations when there is only one page;
  # 1:(n-1) would count down to 0 and re-request page 0
  map_df(seq_len(first$`page-count` - 1), function(pg) {
    if (.progress && interactive()) setTxtProgressBar(pb, pg)
    res <- get_as_info_page(pg, as_size, time_range, type_detected, region)
    res$table
  }) -> asn_tbl_pages

  if (.progress && interactive()) close(pb)

  dplyr::bind_rows(first$table, asn_tbl_pages)

}

# fetch one (0-indexed) page of the ASN table
get_as_info_page <- function(page=0,
                             as_size=c("largest", "all"),
                             time_range=90,
                             type_detected=c("both", "attack", "compromised"),
                             region="all") {

  as_size <- toupper(match.arg(as_size, c("largest", "all")))
  type_detected <- toupper(match.arg(type_detected, c("both", "attack", "compromised")))
  if (region == "all") region <- ""

  res <- S_GET("https://www.google.com",
               path="transparencyreport/jsonp/sb/malware/table/",
               query=list(t=type_detected,
                          d=time_range,
                          z=as_size,
                          p=page,
                          r=region,
                          c=""),
               httr::add_headers(Accept="*/*",
                                 Referer=REF1,
                                 `User-Agent`=UA))

  if (!is.null(res$result)) {

    js <- httr::content(res$result, as="text")

    # evaluate the returned JavaScript literal in the shared V8 context
    .pkgenv$ctx$eval(sprintf("var dat=%s", js))

    dat <- .pkgenv$ctx$get("dat")

    return(dat)

  } else {
    stop("Error retrieving data safesearch asn info", call.=FALSE)
  }

}
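A note on the paging loop in `gsb_asinfo()`: in R, `1:(n - 1)` yields `c(1, 0)` when `n` is 1, whereas `seq_len(n - 1)` yields an empty vector. That difference matters for a loop that fetches page 0 first and then the remaining pages, since a single-page result should trigger no further requests. The sketch below illustrates this with a hypothetical helper, `remaining_pages()`, which is not part of the package.

```r
# Hypothetical helper: 0-indexed pages still to fetch after page 0
remaining_pages <- function(page_count) {
  # max() guards against a reported page count of 0
  seq_len(max(page_count - 1, 0))
}

remaining_pages(4)  # 1 2 3
remaining_pages(1)  # integer(0), so no extra requests
1:(1 - 1)           # 1 0, which would request page 1 and page 0 again
```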
9 changes: 9 additions & 0 deletions R/safebrowsing-package.r
@@ -0,0 +1,9 @@
#' A package to ...
#'
#' @name safebrowsing
#' @docType package
#' @author Bob Rudis (@@hrbrmstr)
#' @import httr V8
#' @importFrom dplyr select bind_rows filter
#' @importFrom purrr map_df
NULL
56 changes: 56 additions & 0 deletions R/site-status.r
@@ -0,0 +1,56 @@
#' Retrieve URL "site status"
#'
#' Pass in a vector of URLs and the Google Safe Browsing Site Status
#' for the URLs will be returned in a \code{data.frame} (\code{tbl_df})
#'
#' @param site_urls character vector of non-relative URLs to check
#' @return \code{data.frame} (\code{tbl_df}) with site status
#' @export
#' @examples \dontrun{
#' gsb_site_status(c("http://fgcdiesel.cl/encrypted.exe",
#'                   "http://rud.is/",
#'                   "http://dds.ec/"))
#' }
gsb_site_status <- function(site_urls) {
  purrr::map_df(site_urls, site_status)
}

# look up a single URL; returns a one-row tbl_df
site_status <- function(site_url) {

  res <- S_GET("https://www.google.com",
               path="safebrowsing/diagnostic",
               query=list(site=site_url,
                          output="jsonp"),
               httr::add_headers(Accept="*/*",
                                 Referer=REF2,
                                 `User-Agent`=UA))

  if (!is.null(res$result)) {

    js <- httr::content(res$result, as="text", encoding="UTF-8")

    # strip the JSONP wrapper: processResponse( ... );
    js <- sub("^.*processResponse\\(", "", js)
    js <- sub("\\);$", "", js)

    .pkgenv$ctx$eval(sprintf("var dat=%s", js))

    tmp <- .pkgenv$ctx$get("dat")
    tmp <- as.list(unlist(tmp))

    # flatten "website.camelCase" names into snake_case column names
    tmp_names <- names(tmp)
    tmp_names <- sub("^website\\.", "", tmp_names)
    tmp_names <- gsub("\\.", "_", tmp_names)
    tmp_names <- gsub("([a-z])([A-Z])", "\\1_\\L\\2", tmp_names, perl=TRUE)
    tmp_names <- tolower(sub("^(.[a-z])", "\\L\\1", tmp_names, perl=TRUE))

    names(tmp) <- tmp_names

    dat <- dplyr::tbl_df(as.data.frame(tmp, stringsAsFactors=FALSE))

    return(dat)

  } else {
    stop("Error retrieving data safesearch detailed url info", call.=FALSE)
  }

}
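The string handling in `site_status()` can be tried in isolation. The sketch below, in base R, mirrors the two steps: removing the `processResponse(...)` JSONP wrapper and turning dotted camelCase names into snake_case. `strip_jsonp()` and `snake_names()` are hypothetical helper names, and the sample payload and field names are invented for illustration; the real response fields may differ.

```r
# Hypothetical helper: drop a JSONP callback wrapper such as processResponse(...);
strip_jsonp <- function(x, callback = "processResponse") {
  x <- sub(sprintf("^.*%s\\(", callback), "", x)
  sub("\\);$", "", x)
}

# Hypothetical helper: "website.camelCase" -> "camel_case"
snake_names <- function(x) {
  x <- sub("^website\\.", "", x)   # drop the common prefix
  x <- gsub("\\.", "_", x)         # remaining dots become underscores
  # insert "_" at each lower-to-upper boundary; \\L lowercases the capture
  x <- gsub("([a-z])([A-Z])", "\\1_\\L\\2", x, perl = TRUE)
  tolower(x)
}

# Invented sample inputs
strip_jsonp('processResponse({"website": {"name": "example.org"}});')
snake_names(c("website.malwareListStatus", "website.name"))
```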
13 changes: 13 additions & 0 deletions R/zzz.r
@@ -0,0 +1,13 @@
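.pkgenv <- new.env(parent=emptyenv())

# create the V8 context on load so that safebrowsing::fn() calls work
# even when the package has not been attached with library()
.onLoad <- function(...) {

  ctx <- V8::v8()
  assign("ctx", ctx, envir=.pkgenv)

}

.onAttach <- function(...) {

  if (!interactive()) return()

  packageStartupMessage(paste0("safebrowsing is under *active* development. ",
                               "See https://github.com/hrbrmstr/safebrowsing for changes"))

}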
73 changes: 73 additions & 0 deletions README.Rmd
@@ -0,0 +1,73 @@
---
output:
  md_document:
    variant: markdown_github
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```

`safebrowsing` : retrieve ASN & URL info from Google Safe Browsing

If you're not familiar with Google's "Safe Browsing" "service", you should probably [go here](https://www.google.com/transparencyreport/safebrowsing/?hl=en) and read up on it before using this package.

Note also that this package relies on undocumented APIs and could break if Google changes the underlying XHR requests.

The following functions are implemented:

- `gsb_as_ts`: Retrieve attack/compromised host time series info for an AS
- `gsb_asinfo`: Retrieve AS info from Google SafeBrowsing
- `gsb_site_status`: Retrieve URL "site status"

### News

- Version released

### Installation

```{r eval=FALSE}
devtools::install_github("hrbrmstr/safebrowsing")
```

```{r echo=FALSE, message=FALSE, warning=FALSE, error=FALSE}
options(width=120)
```

### Usage

```{r}
library(safebrowsing)
# current version
packageVersion("safebrowsing")
gsb_site_status(c("http://fgcdiesel.cl/encrypted.exe",
                  "http://rud.is/", "http://dds.ec/"))
gsb_asinfo("largest", 7)
gsb_as_ts("10439")
```

### Test Results

```{r}
library(safebrowsing)
library(testthat)
date()
test_dir("tests/")
```

### Code of Conduct

Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md).
By participating in this project you agree to abide by its terms.