CRAN task view proposal: Paleontology #57

Open
willgearty opened this issue Sep 19, 2023 · 5 comments

@willgearty
Contributor

willgearty commented Sep 19, 2023

Scope

Computational paleontology (or paleobiology) is a thriving field. Gone are the days of just digging up fossils; paleontologists now have the luxury of being able to perform a wide array of complex computational analyses on local and global compendia of fossil occurrence, phylogenetic, and morphological data to study the functional and phylogenetic evolution of organisms, ecosystem function and ecological interactions, paleobiogeographic patterns, and more. Until recently, computational paleontologists have mostly relied on resources designed for evolutionary biologists, ecologists, GISers, and data scientists to accomplish such analyses. However, slowly but surely, resources (including explicit R packages) are being developed to cater to these paleontological tasks.

This CTV brings together a) a collection of traditional packages that are often seen in use in standard computational paleontological workflows, b) more recent paleontological or paleo-adjacent packages that are commonly in use in paleontology, and c) cutting-edge paleo-explicit packages that we believe should be adopted by the paleontological community. Therefore, the purpose of this CTV is to provide young and old paleontologists something of a guide to developing a wide variety of computational paleontological workflows. We have included packages (~50 at the moment) that span both the data acquisition/cleaning and analytical components of such workflows, with analyses covering paleoecology, paleobiogeography, phylogenetics, and more (see sections below).

We have excluded many of the most common packages (e.g., tidyverse, sf) because they are often imported by packages in this CTV and they are often covered exhaustively in other CTVs and guides. Further, we have excluded older packages that have been superseded by more robust and/or featureful newer packages (e.g., there are a ~million packages related to ENM, but we have only included a handful). We also recognize that there are many other packages out there that are relevant to or explicitly for paleontology (we originally built a list of ~140 packages that we whittled down to the list below). We excluded most of these packages because we, as a group, had little experience with them or because the packages seemed unfinished or too niche to be useful. However, we'd love to hear from anyone that might have suggestions about other packages to include/exclude. Finally, where applicable, we plan to direct users to other CTVs that overlap in scope (see below).

Packages

Data acquisition

mapast, neotoma2, paleobioDB, rgbif, rgplates, ridigbio, chronosphere

Data cleaning

CoordinateCleaner, fossilbrush, palaeoverse

Data visualization

deeptime, ggtern, ggtree, SDAR, StratigrapheR, tidypaleo, geoChronR, rphylopic

Paleoecology

ade4, dismo, ecospace, ENMeval, ENMTools, fossil, fundiversity, vegan

Paleobiogeography and biodiversity

BAT, Compadre, divDyn, divvy, iNEXT, sepkoski

Phylogenetics

caper, diversitree, fbdR, FossilSim, geiger, mvMORPH, paleobuddy, paleotree, phytools, strap

Morphology

geomorph, Claddis, dispRity, morphospace

Time series

paleoTS, evoTS, layeranalyzer

Overlap

There is considerable overlap of the scope of this proposed CTV with the scope of other CTVs, including Environmetrics, Phylogenetics, TimeSeries, and Spatial. This stems from the fact that this proposed CTV is subject-oriented, rather than methodology-oriented. This doesn't appear to be an exception, though, given there are already CTVs on other subjects (e.g., ChemPhys). Further, this CTV is focused on which packages in these other CTVs may be used specifically within computational paleontological workflows.

Maintainers

Principal maintainer: @willgearty (also the principal maintainer of the Phylogenetics CTV)
Co-maintainers: @AlfioAlessandroChiarenza, @bethany-j-allen, @ChristopherDavidDean, @KEichenseer, @LewisAJones, and @pedrolgodoy
(this is a @palaeoverse project)

@zeileis
Contributor

zeileis commented Oct 2, 2023

Thanks for the proposal, Will @willgearty, and apologies for the slow response! I've finally had a closer look.

I like the proposal but I'm not fully convinced, yet, that the task view will be sufficiently separated from the existing task views. Relatedly, your process of package selection appears to be somewhat subjective - which we try to avoid in task views by adopting clear inclusion/exclusion criteria. Especially, excluding packages that you feel are too old or that you have no experience with, is too subjective.

Hence, I would ask you to establish sufficiently clear rules for inclusion/exclusion of a package, e.g., that it must be explicitly geared towards paleontology or something like that. And rules that would necessitate some individual review process (e.g., to determine whether a package is "useful" or "finished") should be avoided.

Regarding the maintainers: It's great to see an active community proposing a task view. Seven maintainers might still be feasible but maybe a smaller team would be easier to coordinate? Others could still contribute through issues and PRs. Also, I'm not sure whether the palaeoverse community is already diverse and heterogeneous enough that different paleontological views are reflected in it. Or would it help to bring in maybe one person from the outside as well?

I'm also pinging the principal maintainers of the Spatial, SpatioTemporal, and Environmetrics task views here: @rsbivand, @edzer, @gavinsimpson. Maybe you have some thoughts/ideas as well?

@willgearty
Contributor Author

willgearty commented Oct 3, 2023

Thanks @zeileis for the helpful comments.

We are certainly open to defining clearer rules for package inclusion/exclusion. I think if we are as exclusive as "explicitly geared towards paleontology", we'll be leaving lots of commonly used packages out (but you are right that it would then be a very clear rule). However, most, if not all, of these excluded packages are already in other task views, so they would at least already be covered there.

We'll give a little time for other folks to provide their thoughts/ideas as well, then we'll look into revising accordingly.

@tuxette
Contributor

tuxette commented Oct 4, 2023

Hi all! I am also unsure but, as I see it, the overlap with Phylogenetics is also non-negligible (but you know the TV better than I do). In short, what is not clear to me is: "do you have in mind at least some core packages in your list that are very specific to Paleontology, rather than packages from related topics that happen to be useful for Paleontology?" My question is probably quite naive (maybe these are clearly listed in your proposal but I am not able to identify them). These are the packages that, somehow, should be put forward in your TV, with packages that have a broader scope but can be useful for the field mentioned afterward. But again, my comment might be completely wrong.

@willgearty
Contributor Author

willgearty commented Jun 24, 2024

My deepest apologies (to my co-maintainers and the CTV editors) for the horrible delay in responding to the feedback here. Despite some reservations, we've decided to go for a more conservative approach, as suggested by @zeileis, that includes only packages that are either explicitly designed for paleontology or are explicitly advertised to paleontologists (it appears this is similar to the approach of the Agriculture CTV, for example).

There are many other packages that paleontologists use as part of their workflows, and so, as part of the development of this CTV, we plan to suggest many of these packages to other CTVs where we believe they will be appropriate. We then plan to link out to these CTVs to ensure that users of the Paleontology CTV can find all of the resources that they may need for their highly interdisciplinary work (see below).

@tuxette there aren't many interpackage dependencies in paleontology, so I wouldn't say any packages really stand out as "core" packages. However, if I had to pick a handful of packages based solely on their breadth of use, I would probably say palaeoverse, paleotree, and paleobioDB, but I'm probably biased. I'd be happy to look into download numbers in the future to identify which packages are most widely used before finalizing the list of "core" packages.

Here is an updated proposal for the Paleontology CTV:

Scope

Computational paleontology (or paleobiology) is a thriving field. Gone are the days of just digging up fossils; paleontologists now have the luxury of being able to perform a wide array of complex computational analyses on local and global compendia of fossil occurrence, phylogenetic, and morphological data to study the functional and phylogenetic evolution of organisms, ecosystem function and ecological interactions, paleobiogeographic patterns, and more. Until recently, computational paleontologists have mostly relied on resources designed for evolutionary biologists, ecologists, GISers, and data scientists to accomplish such analyses. However, slowly but surely, resources (including explicit R packages) are being developed to cater to these paleontological tasks.

This CTV brings together the vast majority of paleontological or paleo-adjacent packages that are in use in paleontology. The purpose of this CTV is to provide young and old paleontologists something of a guide to developing a wide variety of computational paleontological workflows. We have included packages (~50 at the moment) that span both the data acquisition/cleaning and analytical components of such workflows, with analyses covering paleoecology, paleobiogeography, phylogenetics, and more (see sections below).

We have excluded many of the most common packages (e.g., tidyverse, sf) because they are often imported by packages in this CTV and they are often covered exhaustively in other CTVs and guides. Further, to keep the list manageable, we also do not include packages that are often used in paleontological workflows but are not explicitly designed for or advertised to paleontologists. Where applicable, we plan to direct users to other CTVs that include many of these packages (and also plan to submit recommendations to these CTVs as necessary).

Packages

Data acquisition

chronosphere, folio, neotoma2, paleobioDB, rgbif, rgplates, ridigbio, rmacrostrat, rpaleoclim

Data cleaning

CoordinateCleaner, fossilbrush, palaeoverse

Data visualization

deeptime, GEOmap, rphylopic, SDAR, StratigrapheR, tidypaleo

Paleoecology

analogue, ecospace, fossil, rioja (and Environmetrics CTV)

Paleobiogeography and biodiversity

Compadre, divDyn, divvy, hespdiv, ppgm, sepkoski (and Spatial CTV)

Phylogenetics

CladeDate, fbdR, FossilSim/FossilSimShiny, paleobuddy, paleotree, RRphylo, strap (and Phylogenetics CTV)

Morphology

morphospace (and Phylogenetics CTV)

Time series

adePEM, astrochron, evoTS, paleoTS, RRatepol (and TimeSeries CTV)

Paleoclimate and Earth System variables

Bchron, cRacle, DAIME, geoChronR, isogeochem, pastclim, sedproxy

Overlap

Only 10 of the proposed packages are included in other CTVs (rgbif, analogue, rioja, FossilSim, paleobuddy, paleotree, strap, paleoTS, deeptime, and GEOmap).

@willgearty
Contributor Author

@zeileis @tuxette Bumping this since the summer is wrapping up. Please let me know what you think of the new proposal!
