🔍 Scour your data frames. Leave no row untouched. 🔍

License: Unknown (LICENSE), MIT (LICENSE.md)

daranzolin/dfdetective

dfdetective

Column names are sometimes opaque. And sometimes there are hundreds of them. The goal of dfdetective is to scour data frames when you have only a vague idea of what you're looking for. The functions are mostly thin wrappers around combinations of lapply(), unique(), and grep().
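
To make the lapply()/unique()/grep() combination concrete, here is a minimal base-R sketch of a pattern search across every column, in the spirit of find_unique_vals(). It is not dfdetective's actual implementation: the function name scour_values(), the case-insensitive default, and the plain data.frame return value (rather than a tibble) are all assumptions made for illustration.

```r
# Hypothetical sketch, NOT dfdetective's code: search every column's distinct
# values for a regular expression and summarise the columns that match.
scour_values <- function(df, pattern, ignore.case = TRUE) {
  # lapply() + unique(): distinct values of each column, as text
  per_col <- lapply(df, function(col) unique(as.character(col)))
  # grepl(): keep columns where at least one distinct value matches
  # (case-insensitive by default; this default is an assumption)
  hits <- vapply(per_col,
                 function(u) any(grepl(pattern, u, ignore.case = ignore.case)),
                 logical(1))
  per_col <- per_col[hits]
  data.frame(
    col_name        = names(per_col),
    unique_values   = vapply(per_col, paste, character(1), collapse = ", "),
    n_unique_values = vapply(per_col, length, integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

scour_values(iris, "^setosa$")
```

Because the pattern is tested against each distinct value separately, an anchored pattern such as "^setosa$" only matches a column that contains exactly that value, which is the same behaviour the "^B$" search below relies on.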

Installation

You can install dfdetective from GitHub with:

# install.packages("remotes")  # if remotes is not already installed
remotes::install_github("daranzolin/dfdetective")

Example A: You want to search the values in every column for a pattern.

library(dfdetective)
library(fivethirtyeight)
library(dplyr)

find_unique_vals(antiquities_act, "chaco")
#> # A tibble: 3 x 3
#>   col_name    unique_values                                 n_unique_values
#>   <chr>       <chr>                                                   <int>
#> 1 current_na… Devils Tower National Monument, El Morro Nat…             151
#> 2 original_n… "NA, Petrified Forest National Monument, Cha…              67
#> 3 action      Established, Enlarged, Deleted, Diminished, …              62

agencies <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-15/agencies.csv")
find_unique_vals(agencies, "^B$")
#> # A tibble: 1 x 3
#>   col_name unique_values n_unique_values
#>   <chr>    <chr>                   <int>
#> 1 class    D, C, B                     3

Example B: You want to find columns with n unique values.

This is especially useful when searching for binary variables.

find_n_unique_vals(comic_characters, 2:4) # You can pass a numeric vector to the n_unique argument
#> # A tibble: 3 x 3
#>   col_name  unique_values                                   n_unique_values
#>   <chr>     <chr>                                                     <int>
#> 1 publisher Marvel, DC                                                    2
#> 2 align     Good Characters, NA, Bad Characters, Reformed …               4
#> 3 alive     Living Characters, Deceased Characters, NA                    3
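
A comparable sketch for Example B counts the distinct values in each column and keeps the columns whose count falls in n. The name columns_with_n_unique() is hypothetical and this is not the package's code; like the output above, it treats NA as one more distinct value, and %in% is what lets n be either a single number or a vector such as 2:4.

```r
# Hypothetical sketch, NOT dfdetective's code: keep columns whose number of
# distinct values (NA included) is one of the values in `n`.
columns_with_n_unique <- function(df, n) {
  per_col <- lapply(df, function(col) unique(as.character(col)))
  counts  <- vapply(per_col, length, integer(1))
  keep    <- counts %in% n  # works for n = 2 as well as n = 2:4
  data.frame(
    col_name        = names(per_col)[keep],
    unique_values   = vapply(per_col[keep], paste, character(1), collapse = ", "),
    n_unique_values = counts[keep],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

columns_with_n_unique(mtcars, 2)
```

On mtcars, n = 2 picks out the two binary columns, vs and am.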

Example C: You know the column names and want to check their unique values.

get_unique_vals(san_andreas, contains("worry")) # You can pass dplyr select helpers such as starts_with() or contains()
#> # A tibble: 2 x 3
#>   col_name    unique_values                                 n_unique_values
#>   <chr>       <chr>                                                   <int>
#> 1 worry_gene… Not at all worried, Somewhat worried, Not so…               5
#> 2 worry_bigo… Not so worried, Very worried, Somewhat worri…               5
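
get_unique_vals() relies on dplyr's select helpers. A dependency-free sketch of the same idea can swap those helpers for a regular expression on names(). The function unique_vals_by_name() is hypothetical, not part of dfdetective, and a regex is only a rough stand-in for helpers like contains() or starts_with().

```r
# Hypothetical sketch, NOT dfdetective's code: summarise distinct values for
# columns chosen by a regex on their names (a stand-in for select helpers).
unique_vals_by_name <- function(df, name_pattern) {
  cols    <- grep(name_pattern, names(df), value = TRUE)
  per_col <- lapply(df[cols], function(col) unique(as.character(col)))
  data.frame(
    col_name        = cols,
    unique_values   = vapply(per_col, paste, character(1), collapse = ", "),
    n_unique_values = vapply(per_col, length, integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

unique_vals_by_name(mtcars, "^(cyl|gear)$")
```

Here both cyl and gear come back with three distinct values each.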
