CRAN Task View: NetworkAnalysis #61

Open
FATelarico opened this issue Apr 14, 2024 · 25 comments

@FATelarico

FATelarico commented Apr 14, 2024

Scope

The proposed CRAN Task View contains a list of packages that can be used for dealing with networks (also known as relational data and graphs).

Packages

Core packages include:

intergraph
igraph
statnet
sna
network

The other packages:

graph
BoolNet
egor
ionet
networkDynamic
tidygraph
centiserve
birankr
goldfish
amen
ergm
ergm.count
ergm.ego
ergm.multi
ergm.rank
ergmgp
ergmito
biergm
dnr
bootnet
localboot
dyads
fastnet
multinets
nda
baycn
BayesianNetwork
implements
bgms
bnma
econetwork
AnimalHabitatNetwork
aniSNA
assocInd
ATNr
BIEN
bipartite
cassandRa
bibliometrix
bibliometrixData
biblionetwork
Diderot
c3net
Ac3net
ahnr
BASiNET
bionetdata
Cascade
evolqg
NetworkToolbox
qgraph
HospitalNetwork
geonetwork
chessboard
epanet2toolkit
intensitynet
epinet
hybridModels
netdiffuseR
FinNet
ITNr
modnets
multinet
visNetwork
networkD3
bipartiteD3
diagram
ndtv
neatmaps
ggnetwork
ggraph
ggsom
graphlayouts
cencrne
linkcomm
concoR
blockmodeling
BlockmodelingGUI
kmBlock
dBlockmodeling
signnet
blockmodels
sbm
dynsbm
MLVSBM
StochBlock
GREMLINS

Overlap

The only overlap would be with thematic TaskViews (e.g., epanet2toolkit is also in the Hydrology TaskView), but this is inherent to a method-oriented CTV as opposed to a substantive one.

In general, there does not appear to be substantial overlap with existing CRAN task views.

Maintainers

  • Main maintainer: Fabio Ashtar Telarico (@FATelarico)
  • Co-maintainers: Carl Nordlund, Saint-Clair Chabert-Liddell
@zeileis
Contributor

zeileis commented Apr 14, 2024

Thanks for the proposal Fabio @FATelarico & Co! I appreciate that you have compiled this list of packages along with the corresponding description. This is a useful overview of a collection of packages on block modeling. However, I think that this is too specialized and not substantial enough for a standalone task view.

I'm also cc'ing Bettina @bettinagruen as the principal maintainer of the "Cluster" task view to see what she thinks and whether this could become a section in the "Cluster" task view.

However, overall my feeling is that this would better fit a "Network Analysis" task view which we currently do not have. Also I'm cc'ing Søren @hojsgaard in case he has any thoughts or recommendations.

@bettinagruen

I agree that this would be best suited for a Network Analysis task view.

The Cluster task view already contains a number of sections. One could thus rather easily add a section on block modeling. However, the description of the packages would need to be more detailed, covering each package separately, to be in line with how other packages are described in the task view. Presumably none of the packages would then also qualify as core for this more general task view.

@FATelarico FATelarico changed the title from "CRAN Task View: Blockmodeling" to "CRAN Task View: NetworkAnalysis" Apr 15, 2024
@FATelarico
Author

Following your suggestions, the proposed task view was updated to address the entire field of network analysis.

@tuxette
Contributor

tuxette commented Apr 15, 2024

Thank you @FATelarico ! I checked your new proposal and to me, the core packages are well identified and relevant. However,

  • The two sections that describe the different functions of some of the packages are not (because task views are mainly meant to describe the packages and not their features / functions in such a precise way: the user guide is made for this).
  • Overall, the current proposal is organized by purpose, with a lot of redundancy among packages across the different topics (because you describe the functions more than the packages): it would also be good to limit this redundancy.
  • Sometimes you forgot to use the macro r pkg() to cite a package, and sometimes the references do not seem to be cited in the Reference section.
  • I think that the package blockmodels could be added to the Block model section.
  • Similarly to Bayesian network inference, other packages perform network inference with other types of models, like huge or glasso (among many) for Gaussian Graphical Models and GENIE3 (Bioconductor) for inference with RF.
  • Some regression methods also use networks as input and could be added to the TV as well (e.g., genlasso).

@zeileis
Contributor

zeileis commented Apr 15, 2024

Fabio, thanks for all the work and the quick revision! This is a useful start for a network analysis task view. In addition to Nathalie's comments a few additional thoughts:

  • The co-maintainers are still the ones from your blockmodeling proposal but it would be good to bring in a couple of persons with more expertise in statnet/sna/ergm.
  • The inclusion/exclusion criteria should be worked out better.
  • The overlap with "Graphical Models" (maintained by Søren @hojsgaard) needs to be addressed in the inclusion/exclusion criteria, especially with regard to graph/network infrastructure (basic computations, manipulations, visualizations) and with regard to Bayesian networks.

@FATelarico
Author

Thanks to the editors for their comments.

@tuxette

  • I am aware that usually functions are not described in task views and those sections have now been removed. I had originally thought they could be useful because many people who begin doing network analysis are often perplexed about whether to use igraph or statnet/sna and only realise they picked up the wrong package for their needs after they are already shoulders deep in it.
  • All mentioned packages should be tagged with the correct macro now (the problems were mostly in the last section Clustering-Others);
  • After double-checking, blockmodels seems to be already included;
  • As the inclusion/exclusion criteria were refined (see below), this set of packages became less relevant, as they would better fit in the GraphicalModels CTV;
  • Similarly, packages offering methods to run regression over graphs representing variables may rather belong to the GraphicalModels CTV than here.

@zeileis

  • Emails were sent out to other possible co-maintainers from the ergm development team
  • The inclusion/exclusion criterion was refined
  • The distinction between network analysis and graphical modeling was introduced; references to that CTV are provided in the section on Network Modeling as well as in the introduction.

@tuxette
Contributor

tuxette commented Apr 18, 2024

Thanks! I liked the new version. However, I am not convinced by your section on "Bio-Chemical Networks": In your answer above, you explain that network inference is out of the scope of your TV (and I could agree with that) but you are citing three packages for network inference that are far from being the most known and used (citing c3net and not WGCNA, among others, seems to be a highly biased choice). Also, I don't get the relation between the TV topic and evolqg.

A few additional minor remarks:

  • In Section "Bio-Chemical Networks", you have an empty bullet point line.
  • The titles of your sections are not all capitalized similarly.
  • In Section "Bio-Chemical Networks", the reference "Simoes and Emmert-Streib" is not formatted properly.
  • You have a typo in the title with the word "Psychology" in it.
  • In Section "Social and Economic networks", sna is not cited with the proper macro.
  • I think that "Extension for ggplot2" should be "Extensions for ggplot2".
  • The package https://cran.r-project.org/web/packages/greed/index.html could also be worth citing.

@zeileis
Contributor

zeileis commented Apr 18, 2024

I agree that there is good progress here. However, I feel that the inclusion/exclusion criteria are not as clear yet as they should be. Getting contributions/feedback from someone with more ergm expertise would probably be good. And the separation with graphical models has also some room for improvement.

Hence, I'm pinging Søren @hojsgaard again: Could you please have a look at the proposal?

And I suggest that we wait until you have feedback/suggestions from two more potential co-maintainers who can increase the diversity among the maintainer team.

@krivit

krivit commented Apr 23, 2024

Pavel Krivitsky from Statnet here. Thank you, @FATelarico, for inviting me. I'll read through the discussion and the draft in detail later, but I want to flag a few items as a matter of first impression.

  1. statnet is a metapackage: all it does is pull in the most popular packages from the project. The actual functionality is in its dependencies. It's been a while since I've looked at what's in igraph, but loosely, igraph ≈ network + sna.
  2. There are a number of dynamic network packages that aren't listed (tergm, relevent, btergm, tsna, just off the top of my head). We may want a dynamic network section.
  3. There is the EpiModel suite of packages that builds on Statnet's for epidemic modelling.
  4. It may make sense to split packages based on the kinds of questions they answer. E.g., clustering tells you which nodes belong to each group, whereas ERGMs tell you about the "big picture" social forces.

@zeileis
Contributor

zeileis commented Apr 24, 2024

Pavel @krivit, thank you for your inputs, this is very much appreciated! I think it would be great if you could join the team of co-maintainers of the task view, in order to bring a new perspective to the team based on your expertise.

For a short introduction to the idea of CRAN task views and the corresponding file format, see Documentation.md. If you are interested in more details and some background information, see doi:10.48550/arXiv.2305.17573.

Some quick feedback regarding the points you raised:

  1. For the task view it would be good to list both statnet and the constituting packages and briefly explain what they do. Similarly, both igraph and network + sna should be listed and explained. The main purpose of task views is to provide an overview - and not to endorse/recommend the best packages for a given task.
  2. Sounds like a good idea to me.
  3. EpiModel is listed in the Epidemiology task view. So for the topic of "disease networks" etc. I would simply link to that task view.
  4. Sounds like a good idea to me.

Thanks & best wishes!

@krivit

krivit commented Apr 24, 2024

> Pavel @krivit, thank you for your inputs, this is very much appreciated! I think it would be great if you could join the team of co-maintainers of the task view, in order to bring a new perspective to the team based on your expertise.

@FATelarico, if I want to make edits, should I use PRs or push to the repository directly?

> For a short introduction to the idea of CRAN task views and the corresponding file format, see Documentation.md. If you are interested in more details and some background information, see doi:10.48550/arXiv.2305.17573.

Thanks!

> Some quick feedback regarding the points you raised:
>
> 1. For the task view it would be good to list both `statnet` and the constituting packages and briefly explain what they do. Similarly, both `igraph` and `network` + `sna` should be listed and explained. The main purpose of task views is to provide an overview - and not to endorse/recommend the best packages for a given task.

This is more about functionality rather than endorsement. The short of it is that network contains tools for managing the data structure, and sna contains EDA tools for networks, which can use both network objects and edgelists, as well as some inferential tools (e.g., QAP and MRQAP). igraph, from what I understand, contains both the data structure management tools and the EDA tools.

> 3. `EpiModel` is listed in the [Epidemiology](https://CRAN.R-project.org/view=Epidemiology) task view. So for the topic of "disease networks" etc. I would simply link to that task view.

I don't think there is any harm in doing both.

> 4. Sounds like a good idea to me.

A good phrasing might be challenging to come up with, but I suppose we can play around with it and see what happens.

@zeileis
Contributor

zeileis commented Apr 24, 2024

Re: EpiModel. We try to avoid overlap, if possible, in order to keep the task views more focused and more manageable (both for readers and for maintainers).

In this case, my feeling is that the scope of the package belongs rather clearly to "Epidemiology" and thus I would avoid the duplication. Feel free to iterate, if I'm missing something here (e.g., if EpiModel contains algorithms that will often be used in other network analyses, beyond infectious disease modeling).

If you feel that the "Epidemiology" task view should have a dedicated section on disease networks, I would encourage you to raise this with the Epidemiology task view maintainers.

@FATelarico
Author

FATelarico commented Apr 26, 2024

> @FATelarico, if I want to make edits, should I use PRs or push to the repository directly?

@krivit pushing to the main branch is okay; I have a local copy, indexed by version, that is downloaded after every commit.

> This is more about functionality rather than endorsement. The short of it is that network contains tools for managing the data structure, and sna contains EDA tools for networks, which can use both network objects and edgelists, as well as some inferential tools (e.g., QAP and MRQAP). igraph, from what I understand, contains both the data structure management tools and the EDA tools.

I agree. As mentioned in the current draft, igraph is more of a one-stop shop for data-managing tasks, basic modeling, and clustering. It provides more or less the same data-centered features as network plus some of sna's inferential tools. But many people do not actually need most of what sna has to offer, and igraph has so many specialised add-ons/reverse-dependencies that many people prefer (or have) to use it. Any suggestions on how to elucidate this point further in the text are welcome!

@FATelarico
Author

FATelarico commented Apr 26, 2024

A new draft is online, I apologise for the delay.


@tuxette

> citing c3net and not WGCNA, among others, seems to be a highly biased choice

The choice was quite arbitrary because none of us is directly involved in this field, but several colleagues highlighted these packages as the 'most relevant'. Incidentally, evolqg should not have been included, as pointed out. After some reading in specialised journals, I edited the list of packages in this section. Namely, besides removing a few packages, BioNAR and WGCNA were added.

> A few additional minor remarks:
>
>   • In Section "Bio-Chemical Networks", you have an empty bullet point line.
>   • The titles of your sections are not all capitalized similarly.
>   • In Section "Bio-Chemical Networks", the reference "Simoes and Emmert-Streib" is not formatted properly.
>   • You have a typo in the title with the word "Psychology" in it.
>   • In Section "Social and Economic networks", sna is not cited with the proper macro.
>   • I think that "Extension for ggplot2" should be "Extensions for ggplot2".
>   • The package https://cran.r-project.org/web/packages/greed/index.html could also be worth citing.

@krivit

> There are a number of dynamic network packages that aren't listed (tergm, relevent, btergm, tsna, just off the top of my head). We may want a dynamic network section.

I started by adding them either under ergm or in the most relevant sections. Feel free to move these and other dynamic-network packages to a separate section if you think there is enough material for it.

> There is the EpiModel suite of packages that builds on Statnet's for epidemic modelling.

Also taking into account @zeileis's arguments, I added only a brief mention of EpiModel (because it is officially part of statnet) and linked the relevant CTV.

> It may make sense to split packages based on the kinds of questions they answer. E.g., clustering tells you which nodes belong to each group, whereas ERGMs tell you about the "big picture" social forces.

The fact that ERGM is about modeling, simulation, and everything in between makes it difficult to slap a label on it or even put it on par with other approaches. But if you feel there is a satisfactory way to do so, the result would be incredibly useful for new users!


Thanks everyone for the feedback and active involvement!

FATelarico added a commit to FATelarico/ctv-network that referenced this issue Apr 26, 2024
FATelarico added a commit to FATelarico/ctv-network that referenced this issue Apr 26, 2024
For previous release's changelog see: cran-task-views/ctv#61 (comment)
@hojsgaard

Dear all,

I am not entirely sure I understand the proposal.

Regarding the GraphicalModels task view my approach has been very pragmatic: Package authors contact me to have their package on the task view and I usually add it. If a few packages appear in more than one task view, then I do not see that as a problem.

Another topic: Perhaps it could be an idea to agree on how packages are described on the task views? I generally copy the package description unless it is too lengthy. Maybe there are other practices?

Best
Søren

@zeileis
Contributor

zeileis commented Apr 29, 2024

Søren, thanks for your feedback. Regarding your comments:

  • Connection between NetworkAnalysis and GraphicalModels: Both the GraphicalModels task view, maintained by you, and the newly proposed NetworkAnalysis task views describe models that can be represented by graphs/networks. Hence, the question is whether both task views have a clear profile, have enough value added, and can be cross-referenced where appropriate. Do you think this is the case here? Do you have any recommendations for how to deal with it?
  • Overlap in general: With increasing number of task views and increasing number of packages per task view, it becomes more important that task views have a sharp profile so that it is clear what should go in and what should stay out. While it is not necessary or desirable to avoid overlap completely, we should still try to not have too much overlap. First, less overlap means less duplication of efforts for the maintainers. Second, less overlap but with cross-references between task views means that users will ideally be pointed to one place with useful documentation for them.
  • Package descriptions: I agree that the package title/description is a useful starting point. However, you can probably often improve the description within the task view if you embed it into the context of the appropriate section. In any case, there probably is no "one size fits all" approach here which is why we put this at the maintainers' discretion.

@krivit

krivit commented Apr 30, 2024

Apologies for the silence; down with COVID at the moment.

> @krivit pushing to the main branch is okay; I have a local copy, indexed by version, that is downloaded after every commit.

@FATelarico, I don't think I have push access. I just tried it on a test branch.

> Connection between NetworkAnalysis and GraphicalModels: Both the GraphicalModels task view, maintained by you, and the newly proposed NetworkAnalysis task views describe models that can be represented by graphs/networks. Hence, the question is whether both task views have a clear profile, have enough value added, and can be cross-referenced where appropriate. Do you think this is the case here? Do you have any recommendations for how to deal with it?

In my experience, the line between graphical models and network analysis is that in graphical models (and neural network models, for that matter), the graph is a prespecified component of the model specification that does not depend on the data, whereas in network analysis the graph is the object being observed and summarised or modelled.

@hojsgaard

In response to @krivit:

If you have a database / dataset and do a model search for a graphical model (as e.g. the gRim package can do) then I do believe the graph is not specified beforehand? So this is perhaps not the best way ahead for discriminating between graphical models and network analysis.

A more general comment: In graphical models (at least traditionally), the focus is on some kind of (conditional) independence restriction, which is a probabilistic statement. A missing edge represents a conditional independence restriction. That is the classical connection between a graph and a probabilistic model. In larger models with many variables, the graphs become less interesting as visual objects. It is hard to make sense of a graph with 1000 variables :)
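As one concrete instance of the "missing edge means conditional independence" reading, consider the Gaussian case. This is an assumption chosen for illustration and is not named in this comment; the symbols $X$, $\Sigma$, $K$, $V$ and $E$ are likewise introduced here. For $X \sim N(\mu, \Sigma)$ with precision matrix $K = \Sigma^{-1}$ and graph $G = (V, E)$, the model places no edge between $i$ and $j$ exactly when the corresponding entry of $K$ is zero:

$$
(i, j) \notin E
\quad\Longleftrightarrow\quad
K_{ij} = 0
\quad\Longleftrightarrow\quad
X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i, j\}} .
$$

Here the graph encodes restrictions on the joint distribution of the variables; it is not itself the observed data.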

So in the distinction between graphical models and network analysis, one view is that it comes down to what is being analyzed? What is the key component in network analysis? Is that conditional independence? Is it another well defined mathematical / statistical concept? Perhaps, I can say it more directly: I am uncertain what network analysis really is...

In response to @zeileis:

You are right that small overlaps between task views are desirable but also that overlaps are unavoidable. Would it be feasible to have a package "belonging" primarily to one specific task view and then one can refer to that from any other task view?

In addition to standardizing the description of packages, one thing that perhaps could be nice is to be able to automatically generate an "update history" for each package, just to give people an idea about how actively a package is maintained.

@krivit

krivit commented Apr 30, 2024

@hojsgaard

> If you have a database / dataset and do a model search for a graphical model (as e.g. the gRim package can do) then I do believe the graph is not specified beforehand? So this is perhaps not the best way ahead for discriminating between graphical models and network analysis.

I am aware of this type of problem, but I didn't want to get too far into the weeds; the main distinction is that the graph is not the object of observation or analysis. I would classify this problem as a model selection problem for graphical models, rather than a network analysis problem.

However, if one then, as you say, tries to understand the properties of this graph, say by visualising it or by detecting groups of variables with similar structural roles in the graph, then it becomes a network analysis problem. The tools one would use would often be agnostic to whether the graph represents friendships between people or conditional dependence between variables.
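To make the "agnostic tools" point concrete, here is a minimal sketch in plain Python rather than any of the R packages discussed. The function names `degree` and `most_connected` and both edge lists are hypothetical and invented for illustration. The same degree computation is applied to a friendship network and to a dependence graph between variables without knowing which is which.

```python
from collections import defaultdict


def degree(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def most_connected(edges):
    """Return the node with the highest degree (first one found on ties)."""
    deg = degree(edges)
    return max(deg, key=deg.get)


# Hypothetical observed social network: ties are friendships between people.
friendships = [("ann", "bob"), ("bob", "cat"), ("bob", "dan")]

# Hypothetical graph from a fitted graphical model: ties are conditional
# dependencies between variables.
dependencies = [("x1", "x2"), ("x2", "x3"), ("x2", "x4")]

# The descriptive tool treats both inputs identically.
hub_person = most_connected(friendships)
hub_variable = most_connected(dependencies)
```

Only the interpretation of the hub differs: a well-connected person in one case, a variable with many conditional dependencies in the other.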

I know there are also other intermediate cases. For example, Frank and Strauss (1986) "Markov Graphs" specified a probability model for network structure by constructing a conditional dependence graph (i.e., a graphical model) for edge variables and then using the Hammersley-Clifford theorem to derive the form for the probability of a given graph under the model. This approach and its extensions have been used to infer social forces affecting the structure of the network ever since.
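For readers unfamiliar with the construction, the resulting family is usually written in exponential-family form. The sketch below uses generic notation: the symbols $y$, $g$, $\theta$, $\kappa$ and $\mathcal{Y}$ are assumptions introduced here, not taken from this thread.

$$
P_\theta(Y = y) = \frac{\exp\{\theta^\top g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^\top g(y')\},
$$

where $y$ is an observed network, $\mathcal{Y}$ is the set of possible networks on the same nodes, and $g(y)$ is a vector of network statistics. For homogeneous Markov graphs, $g(y)$ collects the counts of edges, $k$-stars, and triangles; later extensions (ERGMs) allow other statistics.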

@FATelarico
Author

> @zeileis: Connection between NetworkAnalysis and GraphicalModels: Both the GraphicalModels task view, maintained by you, and the newly proposed NetworkAnalysis task views describe models that can be represented by graphs/networks. Hence, the question is whether both task views have a clear profile, have enough value added, and can be cross-referenced where appropriate. Do you think this is the case here? Do you have any recommendations for how to deal with it?

I think the difficulty in understanding the connection (and difference) between NetworkAnalysis and GraphicalModels is that the former offers tools that are not limited to 'describing statistical models as graphs/networks'. Rather, as @krivit pointed out (rightly, in my humble opinion), network analysis deals with networks representing one or more kinds of connection among one or more defined sets of entities. If the entities happen to be variables and the connection between them is conditional dependence (with independence being implied by lack of ties), then you get a graphical model. Obviously, Markovian graphs lie in somewhat of a gray area, but since they are covered in the GraphicalModels CTV, we are not dealing with them.


> @krivit: Apologies for the silence; down with COVID at the moment.

Wish you a speedy recovery!

> @krivit: I don't think I have push access

It should be fixed now. Let me know

@zeileis
Contributor

zeileis commented May 1, 2024

Thanks for the clarifications @FATelarico @krivit @hojsgaard, I think this is very useful and something to build upon.

I suggest that you review the description of the scope of the NetworkAnalysis view to make it sharper with respect to this distinction. Also add a cross-reference to the GraphicalModels task view.

When the NetworkAnalysis task view is published, the scope description of the GraphicalModels task view should be adapted correspondingly. Similarly, its first section ("Representation, manipulation and display of graphs") should be streamlined (with yet another cross-reference) once NetworkAnalysis is available.

@tuxette
Contributor

tuxette commented May 4, 2024

Thank you for the interesting discussion. As @zeileis I think that, for readers, it is important that the distinction is clearly made at the beginning of both task views with cross references: that will help them identify which TV they have to read to answer their specific question.

Also, I share the view that GraphicalModels includes packages dealing with graphs as a way to represent some kind of conditional dependency structure between variables (nodes). For me, Markovian graphs and Bayesian networks belong more in that task view than in NetworkAnalysis, for instance, but I agree that the distinction is not easy to make clearly (at least, that is where I would search for information on this topic). However, I am under the impression that coordination between the two TVs is necessary (maybe if you could find someone to be a maintainer of both TVs, that would help).

Finally, just a minor comment: WGCNA deals with gene networks (co-expression networks actually), which is not really "biochemistry" (pure biology instead, even though, I agree that, in the end, everything is mostly chemistry, that is not how most people would think of it).

FATelarico added a commit to FATelarico/ctv-network that referenced this issue May 8, 2024
Renamed a section to Biology and (Bio)-Chemistry Networks in agreement to cran-task-views/ctv#61 (comment)
FATelarico added a commit to FATelarico/ctv-network that referenced this issue May 8, 2024
@FATelarico
Author

@zeileis @tuxette
> I suggest that you review the description of the scope of the NetworkAnalysis view to make it sharper with respect to this distinction. Also add a cross-reference to the GraphicalModels task view.

> I think that, for readers, it is important that the distinction is clearly made at the beginning of both task views with cross references: that will help them identify which TV they have to read to answer their specific question.

Thank you for your continued feedback. With the last two edits I improved on the following aspects:

Regarding coordination, I agree that it may be useful. Perhaps @hojsgaard could join our team if he feels okay with it.

@tuxette
Contributor

tuxette commented May 13, 2024

@FATelarico: Thanks! You might also want to correct "Notably, the underlying data mining approach has been used beyond biochemistry." (I was referring to this sentence, actually.)

Regarding the rest, I think that it heads in the right direction. I also think that we should wait until the team of maintainers is completely set.

krivit added a commit to FATelarico/ctv-network that referenced this issue May 14, 2024
krivit added a commit to FATelarico/ctv-network that referenced this issue May 14, 2024
@krivit

krivit commented May 14, 2024

I've drafted a dynamic network modelling subsection, though it does lead to more questions about how to organise things. What do you think?
