taxize and taxizedb return different tibble structures with children when db = "itis" #78

Open
KaiAragaki opened this issue Apr 12, 2024 · 2 comments


@KaiAragaki

Happy to make a PR to attempt to fix this if you'd like, but since this could be a breaking change I thought I'd get your eyes on it first.

I suppose that since taxizedb is a drop-in replacement for taxize, it would probably be best to conform to whatever taxize returns - but that is obviously your call.

taxize

taxize::children(145395, db = "itis")
$`145395`
# A tibble: 2 × 5
  parentname  parenttsn rankname taxonname              tsn   
  <chr>       <chr>     <chr>    <chr>                  <chr> 
1 Toropamecia 145395    Species  Toropamecia punctata   145396
2 Toropamecia 145395    Species  Toropamecia reticulata 145398

attr(,"class")
[1] "children"
attr(,"db")
[1] "itis"

taxizedb

taxizedb::children(145395, db = "itis")
$`145395`
# A tibble: 2 × 4
      id rank_id name                   rank   
   <int>   <int> <chr>                  <chr>  
1 145396     220 Toropamecia punctata   species
2 145398     220 Toropamecia reticulata species

attr(,"class")
[1] "children"
attr(,"db")
[1] "itis"
Session Info
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 21.1

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] testthat_3.2.1 devtools_2.4.5 usethis_2.2.3 

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 remotes_2.4.2.1   lattice_0.22-5    vctrs_0.6.5       tools_4.3.2       generics_0.1.3   
 [7] curl_5.2.1        parallel_4.3.2    RSQLite_2.3.5     tibble_3.2.1      fansi_1.0.6       blob_1.2.4       
[13] pkgconfig_2.0.3   data.table_1.15.0 dbplyr_2.4.0      uuid_1.2-0        lifecycle_1.0.4   conditionz_0.1.0 
[19] compiler_4.3.2    stringr_1.5.1     brio_1.1.4        taxizedb_0.3.1    ritis_1.0.0       codetools_0.2-19 
[25] httpuv_1.6.14     htmltools_0.5.7   later_1.3.2       pillar_1.9.0      crayon_1.5.2      urlchecker_1.0.1 
[31] ellipsis_0.3.2    solrium_1.2.0     cachem_1.0.8      sessioninfo_1.2.2 iterators_1.0.14  foreach_1.5.2    
[37] nlme_3.1-163      mime_0.12         tidyselect_1.2.0  digest_0.6.34     stringi_1.8.3     dplyr_1.1.4      
[43] purrr_1.0.2       fastmap_1.1.1     grid_4.3.2        cli_3.6.2         magrittr_2.0.3    triebeard_0.4.1  
[49] bold_1.3.0        crul_1.4.0        pkgbuild_1.4.3    utf8_1.2.4        ape_5.7-1         withr_3.0.0      
[55] rappdirs_0.3.3    promises_1.2.1    bit64_4.0.5       bit_4.0.5         zoo_1.8-12        memoise_2.0.1    
[61] shiny_1.8.0       taxize_0.9.100    miniUI_0.1.1.1    hoardr_0.5.4      urltools_1.7.3    profvis_0.3.8    
[67] rlang_1.1.3       Rcpp_1.0.12       DBI_1.2.2         xtable_1.8-4      glue_1.7.0        httpcode_0.3.0   
[73] xml2_1.3.6        pkgload_1.3.4     jsonlite_1.8.8    R6_2.5.1          plyr_1.8.9        fs_1.6.3    
@KaiAragaki
Author

The only databases supported by both taxize and taxizedb are ncbi and itis (keeping in mind #80, which excludes worms and bold).

A deeper comparison:

NCBI

  • taxizedb: id name rank
  • taxize: childtaxa_id childtaxa_name childtaxa_rank

ITIS

  • taxizedb: id rank_id name rank
  • taxize: parentname parenttsn rankname taxonname tsn

where id = tsn, name = taxonname, and rank is similar to rankname (capitalization differs). rank_id has no equivalent.
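The ITIS mapping above can be sketched in a few lines of base R. This is only an illustration: `harmonise_itis_children()` is a hypothetical helper, not part of either package, the input is rebuilt by hand from the taxize output shown earlier, and it assumes that lower-casing `rankname` is enough to match `rank`.

```r
# Hypothetical helper (not part of taxize or taxizedb): map taxize-style
# ITIS columns onto the taxizedb-style names id, name, rank.
harmonise_itis_children <- function(x) {
  data.frame(
    id   = as.integer(x$tsn),     # tsn is character in taxize, integer in taxizedb
    name = x$taxonname,
    rank = tolower(x$rankname),   # assumption: only capitalization differs
    stringsAsFactors = FALSE
  )
}

# Input rebuilt by hand from the taxize output in the first comment.
taxize_like <- data.frame(
  parentname = c("Toropamecia", "Toropamecia"),
  parenttsn  = c("145395", "145395"),
  rankname   = c("Species", "Species"),
  taxonname  = c("Toropamecia punctata", "Toropamecia reticulata"),
  tsn        = c("145396", "145398"),
  stringsAsFactors = FALSE
)

harmonised <- harmonise_itis_children(taxize_like)
print(harmonised)
```

Note that `parentname` and `parenttsn` are dropped here, and `rank_id` cannot be recovered from the taxize output at all.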

It's your call whether the lack of harmony is a bug or a feature. Frankly, I quite like the standard set of column names that taxizedb provides, but if most people who use taxizedb are moving from taxize, a more harmonized solution might be preferable.

@stitam
Collaborator

stitam commented May 29, 2024

Thanks @KaiAragaki for opening this issue.

It would be nice if taxize and taxizedb user interfaces were more harmonised but it's unclear to me whether the extra effort from the two teams to maintain this harmony would be justified. Currently these are independent projects.

What I think "is" an issue here is that the output of taxizedb::children() should have the same structure regardless of db. However, db = "itis" returns an extra column (rank_id) which I think is not needed. A named list of tibbles, each with three columns, "id", "name", "rank", in this order, regardless of db would probably be a more streamlined behaviour. What do you think?

I'm okay with breaking changes, CRAN does not list any reverse dependencies and the current version number clearly indicates that taxizedb is under development.
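A rough sketch of what that three-column shape could look like, applied after the fact to a result like the one in the first comment. `standardise_children()` is a hypothetical name, not an existing taxizedb function, and the input is rebuilt by hand from the output shown above rather than queried from a database.

```r
# Hypothetical post-processing step (not an existing taxizedb function):
# keep only id, name, rank, in that order, for every element of the result.
standardise_children <- function(res) {
  wanted <- c("id", "name", "rank")
  out <- lapply(res, function(df) df[, wanted, drop = FALSE])
  # lapply() keeps the element names but drops other attributes, so restore them.
  attr(out, "db") <- attr(res, "db")
  class(out) <- class(res)
  out
}

# Input rebuilt by hand from the taxizedb output for db = "itis".
itis_res <- structure(
  list(`145395` = data.frame(
    id      = c(145396L, 145398L),
    rank_id = c(220L, 220L),
    name    = c("Toropamecia punctata", "Toropamecia reticulata"),
    rank    = c("species", "species"),
    stringsAsFactors = FALSE
  )),
  class = "children",
  db = "itis"
)

std_res <- standardise_children(itis_res)
print(unclass(std_res))
```

In the package itself the column selection would more likely happen where the ITIS query result is built, so that every db returns the same columns without a separate pass.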
