The future of deep review #810

Open · agitter opened this issue Jan 19, 2018 · 64 comments

@agitter
Collaborator

agitter commented Jan 19, 2018

We resubmitted version 0.10 to the Journal of the Royal Society Interface and bioRxiv. Thanks everyone for all the great work on the revisions!

I'd like input on where we want to go from here. Should we continue to accept edits to the manuscript even after a version is accepted at a journal? Should we accept only errata corrections but lock the main content?

I don't want to dissolve this amazing group of authors. However, there isn't much precedent for living manuscripts that continue to change after publication, and realistically we are all very busy with other projects. The activity dropped off considerably between the first submission and the reviewer feedback.

@cgreene
Member

cgreene commented Jan 20, 2018

I think that if we have a committed group of maintainers there is the opportunity to do something new here in the way of a living scientific manuscript that stays up to date with the field. However, we probably need more than me and @agitter to make it sustainable. Does anybody else have an interest in helping to contribute at the maintainer level?

@cgreene
Member

cgreene commented Jan 20, 2018

One quick thought. We should probably be talking more about what we do after v1.0, which I imagine would be the accepted version at the journal. At this point I feel like we should push to that finish line. 😄

@agitter
Collaborator Author

agitter commented Jan 20, 2018

> We should probably be talking more about what we do after v1.0

Agreed. I think we should only take pull requests on obvious typos until v1.0.

@agapow
Contributor

agapow commented Jan 22, 2018

Agreed - hit the (immediate) finish line and then worry about the future.

And when we get to that future, a few discussion points or ideas:

  • Is github the right place?
  • Is building on the existing paper the right approach?
  • Should it be a monolithic paper (or whatever) or broken into subjects?
  • I confess that for this last stretch, I found it very hard to know what was going on. That's not a dig at anyone; the blizzard of issues and alerts was just hard to navigate or prioritize. Perhaps more experienced github users have some insight or a solution?

@evancofer
Collaborator

I could be interested in this. It would help to define expectations for maintainer roles, though those obviously depend on a lot of other variables.

Similarly to @agapow, I felt that keeping up with the notifications was sometimes like drinking from a fire hose. I think this was due to the intermixing of notifications about content (e.g. tracking new references and writing) with notifications about infrastructure (e.g. administration, repository code, formatting).

I also wonder whether GitHub is the best place for this sort of thing, or whether a better-suited platform exists.

Lastly, given the size of the paper, does shifting to a different format, one designed to organize information at a larger scale (e.g. a book), make more sense long-term?

@stephenra
Collaborator

@cgreene @agitter I realize I'm a bit late to the conversation, but FWIW I would definitely be interested in helping in a maintainer role.

@evancofer To your point about inundation: was the Projects feature (or something similar with GitHub integration, like Trello) used to track progress? I wonder if that might be one way of making the different workstreams a little more manageable and of organizing issues by topic or sub-topic.

@evancofer
Collaborator

@stephenra AFAIK no project management tools (e.g. Trello, Asana, GH Projects) were used. I would guess that @cgreene's lab had some internal tracking of general project status, but that probably isn't too useful for our purposes. Enabling contributors to easily subscribe to notifications for one or a few sections/topics could be useful.

@agitter
Collaborator Author

agitter commented Feb 5, 2018

@stephenra we used GitHub milestones and labels (usually, though not always) for some form of organization. We also ended up having a few master issues to track progress and link to related issues at various project stages (e.g. #188 and #678). @cgreene and I didn't really have any formal internal tracking beyond that, and I'd be open to better organization if other maintainers join in to keep this going.

@stephenra
Collaborator

stephenra commented Feb 7, 2018

@evancofer @agitter Thanks for clarifying! I'm tool-agnostic, but under the working assumption that this continues to grow in scope, it may be helpful to adopt one (my past experience with Trello and GH Projects has been positive overall, but admittedly Kanban board-style project management tools aren't for everyone). Is there an estimate of roughly how many maintainers would be needed to keep the project going?

@agitter
Collaborator Author

agitter commented Feb 7, 2018

@stephenra I'd say the number of maintainers depends on what exactly we want to sustain. Is it an up-to-date manuscript or book? A curated list of papers, tools, and data? Something else?

@stephenra
Collaborator

@agitter Makes sense. And apologies, I realized I'm getting ahead of the conversation given the immediate focus on v1.0.

@agitter
Collaborator Author

agitter commented Feb 8, 2018

@stephenra This is actually a good time to have the conversation while we still have contributors' attention after the recent round of revisions. Let us know if you have more thoughts about what form the future product or project should have.

@cgreene
Member

cgreene commented Feb 8, 2018

I agree that now is a good time to figure this out. In terms of tooling, our lab has used waffle.io for other projects and found it useful. I think the same things that it has helped us organize could aid the maintainers in planning what to include.

I also think we'd be breaking new ground on authorship, but I like the idea of a "release" occurring either every 6 or 12 months (from our own experience, I think 12 months is more reasonable). If there were project participants who would like to lead each of those releases, I think the authorship categories could accommodate a reshuffling of the author list on each release (we could put "maintainers of previous versions" in a category that doesn't shuffle into the last positions; the last positions would belong to "maintainers of the current version"). Maybe JRSI would like to publish an annual update for a few years, or maybe we could talk with other journals about future releases (imagine a Jan 2019 release date for the next one...).

If any journals are interested, feel free to chime in :)

Anyway, these are just some thoughts.

@benstaf

benstaf commented Feb 11, 2018

If you want to move on to another collaborative paper in deep learning for medicine, try:

“DiversityNet: a collaborative benchmark for generative AI models in chemistry”

@cgreene
Member

cgreene commented Feb 11, 2018

@mostafachatillon: thanks for raising that. It might be more appropriate to raise this as a new issue, since your point doesn't relate directly to the future of this paper.

Also, note that your blog post has an inaccuracy. You say:

> But for writing the DiversityNet paper, GitHub is an inappropriate tool, because GitHub does not natively support math formulas.

and you link github/markup#897 (comment).

That is related to GitHub's native system for displaying Markdown, which deep-review doesn't use. It may also be the case that Manubot, the build system for deep-review, doesn't yet support formulas. However, if that's the case, you should correct the inaccurate link in your blog post.

@stephenra
Collaborator

stephenra commented Feb 21, 2018

@agitter @cgreene Apologies for the lapse in response.

> I agree that now is a good time to figure this out. In terms of tooling, our lab has used waffle.io for other projects and found it useful. I think the same things that it has helped us organize could aid the maintainers in planning what to include.

Agreed on the tooling. I've heard good things about waffle.io and have had some success with Asana and Trello, both of which integrate with GitHub as well. I'm not particularly opinionated on this, so whichever platform most contributors feel comfortable with, or that offers the lowest barrier to entry, is probably the best way to go. I'd be happy to set up a survey if that helps.

Apart from GitHub issues, I've found that batching issues by category (rather than just tags) makes todos and PRs easier to track. I'm not sure if the lab(s) adopted this approach, but, for example, the different application domains/sub-domains in the paper could be a natural way to structure these categories (e.g. gene expression vs. protein secondary and tertiary structure, etc.).

> I also think we'd be breaking new ground on authorship, but I like the idea of a "release" occurring either every 6 or 12 months (from our own experience, I think 12 months is more reasonable).

I favor the idea of a 12-month release as well. It gives time to account for difficulties in scheduling and coordination among contributors, and, given the speed of the field, it also provides time to survey a broader range of contributions and distinguish meaningful work from flag-planting.

@evancofer
Collaborator

evancofer commented Mar 9, 2018

@cgreene @agitter @stephenra
A yearly release sounds feasible and reasonable.

I have used Asana and Trello in the past and am comfortable with both. Tentatively, I would lean towards Asana because it seemed (at least to me) more flexible and feature-rich than Trello. However, I am not particularly familiar with integrating either of them with GitHub. Is there a way to use any of these project management tools in an "open" manner that lets people view the current project status without signing up for an Asana/Trello/whatever account? At least with respect to content reviews and discussion, it is probably important to maintain this project's transparency.

Obviously, the immediate goal is to finish the initial publication. The next step is to identify and enumerate the specific maintenance tasks, especially those that the current team needs the most help with. With regard to planning for long-term progress, it would also be useful to list any goals/problems that came up but were too ambitious or not pressing enough for the initial publication.

Thoughts?

@stephenra
Collaborator

stephenra commented Mar 11, 2018

@evancofer I believe you can make Asana projects 'public', but that only makes a project viewable to others in your organization who aren't team members, not to anyone in general.

Trello boards, on the other hand, can be made publicly viewable to anyone, and the project page will be indexed by Google. I do agree on the point about transparency. To this end, I've worked on or seen some projects that use a combination of GitHub, Trello, and Gitter: the code/repo is on GitHub, the (public) project management is handled by Trello, and the community chat is on Gitter. If that's too much added complexity, perhaps GitHub and Trello might be best.

@evancofer
Collaborator

evancofer commented Mar 11, 2018

@stephenra Trello and GitHub seem like a good solution without too much added complexity. I'm thinking we could use Trello to track maintenance and so on, and keep discussion on GitHub (and continue to use issue labels and other features to track and organize).

@stephenra
Collaborator

@evancofer That sounds reasonable to me. 👍

@cgreene
Member

cgreene commented Mar 12, 2018

If you have not played around with https://waffle.io, I would encourage you to give it a shot. I made a deep-review waffle. It is an overlay on github issues, so it's convenient to work with in this framework:
https://waffle.io/greenelab/deep-review

At this stage, I think we really need 2-3 committed maintainers to develop a new plan, update the README with the plan, and then start to take over the project with the goal of publishing a new release at some point in 2019.

@cgreene
Member

cgreene commented Mar 12, 2018

I went through all issues up to #100 and closed them if we had referenced the paper or if the discussion had concluded.

@evancofer
Collaborator

evancofer commented Mar 12, 2018

@stephenra @cgreene The waffle.io view on the project should work fine.

Like Casey said, we should probably find some more committed maintainers interested in long-term work if this is going to be successful. Contributors were obviously a good place to start, but I am unsure where to look next.

I'll get working on an update to the README and submit a PR sometime this evening. This will probably include a status update and a new section about the future of the project.

@cgreene
Member

cgreene commented Mar 12, 2018

It might be nice to think about an authorship model where people "decay" towards the middle after a release. The current set of authors would be the "middlest set" of the next release (unless they contribute) and new authors would bookend them. I'd imagine maintainers at the end with the other author classes on the front.

If people understand how these items will be handled, it might help to draw in new contributors. I'm also happy to promote the work towards a 2019 release, and I'll even commit to a bit of writing (though at this time I'd prefer not to be a maintainer 😄 ). It sounds like @evancofer and @stephenra might be interested. Maybe you could snag a third so that votes are resolved via tiebreak, although @agitter and I did survive the pairing.

@evancofer
Collaborator

It does seem prudent to get a third person. Most of the people who come to mind are in my current lab or department, so, out of fairness, I am somewhat hesitant to recommend any of them.

It may be best (in terms of ethics and effort) to, as you say, append new authors in a semi-randomized order. Perhaps we could do this at the end of every month (or some other period)? I imagine this could incentivize repeat contributions. Perhaps it would be useful to use a semi-random hierarchical grouping again? Was manually determining author hierarchies time-consuming, or was it maintainable?

@agitter
Collaborator Author

agitter commented Mar 12, 2018

I agree that it is important to think about authorship, how new contributors will be recognized and incentivized, and what will happen to the existing contributors in a v2.0 release. We can break precedent with the v1.0 author ordering algorithm if that makes it easier to continue deep review in the long term. I wouldn't expect to be kept in my current position if new maintainers take over, and I do see myself more as a standard contributor than a core maintainer for the next release.

However, if you don't find a third maintainer, I'd be willing to help with tie-breaking in special circumstances.

> Was manually determining author hierarchies time-consuming, or was it maintainable?

We only did this twice, so it wasn't too onerous. We also kept the categories broad to help. It did require considerable manual effort because we reviewed commits as well as contributors' discussion in issues and pull requests. I was initially working toward fully automating the author randomization but stopped once Manubot became a separate project. The deep review author ordering was too specific to this collaborative process.

A fully automated ordering for Manubot should probably take some unordered author yaml with whatever extra metadata is needed for ordering, sort the authors, and pass the sorted list to Manubot as metadata.yaml.
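
As a rough sketch of that flow (the authors.yaml file name, the per-author tier field, and the fixed seed are my hypothetical choices, not existing Manubot conventions), one could shuffle for a reproducible random tiebreak and then stable-sort by contribution tier:

```python
# Hypothetical sketch, assuming PyYAML and an authors.yaml whose
# "authors" list gives each author a "tier" (smaller sorts earlier).
import random
import yaml

def order_authors(unordered_path="authors.yaml",
                  ordered_path="metadata.yaml", seed=2018):
    with open(unordered_path) as f:
        metadata = yaml.safe_load(f)
    authors = metadata["authors"]
    random.Random(seed).shuffle(authors)   # reproducible random tiebreak
    authors.sort(key=lambda a: a["tier"])  # stable sort: shuffled order
                                           # survives within each tier
    metadata["authors"] = authors
    with open(ordered_path, "w") as f:
        yaml.safe_dump(metadata, f, sort_keys=False)

if __name__ == "__main__":
    order_authors()
```

The stable sort is what makes the grouping "semi-random hierarchical": the order across tiers is deterministic, while the order within a tier is random but reproducible from the seed.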

@rgieseke
Contributor

Interesting discussion! The guidelines for the "Living Data" paper on the Global Carbon Budget might be useful: https://www.globalcarbonproject.org/carbonbudget/governance.htm#gov2.5

The dataset and paper on carbon emissions are updated yearly, but the paper (and data) stay partly the same. The practices around authorship might be very different there due to the nature of the project and the different fields, but the comments on citation recommendations and "self-plagiarism" seem worth considering.

@agitter
Collaborator Author

agitter commented Mar 22, 2018

Interesting parallel, @rgieseke.

Going forward, we could also bring more organization to the issues. Would adopting help wanted or good first issue labels help would-be contributors find a place to start?

@evancofer
Collaborator

evancofer commented Mar 22, 2018

@agitter That seems like a good idea. Summarizing or discussing any of the articles that have an issue might make good first issues.

Also, revisions seem like the central focus of the second release, so creating issues for said revisions (e.g. #847) is probably a good way to elicit meaningful contributions. Perhaps we could do this for each section we felt needed work? Some of the existing issues (e.g. #598) could also be broken down by subsection into more manageable tasks. I'm not sure about the best way to do this, however, and it could just result in too many issues.

@stephenra
Collaborator

@agitter @evancofer Yes, I think those would be useful labels to have as well. I've gone ahead and created labels for help wanted and good first issue.

I like the idea of creating issues for each section. In terms of management, that structure lends itself to being more easily identifiable/accessible for would-be contributors. My gut feeling is that breaking things down into subsections might slowly lead to issue creep, as you pointed out.

@nafizh

nafizh commented Mar 27, 2018

@evancofer @stephenra If you still need a third maintainer, I would be happy to help. This has been great work, and I would be happy to contribute to future versions.

@evancofer
Collaborator

@nafizh Yes, your help would be greatly appreciated!

@nafizh

nafizh commented Mar 29, 2018

@evancofer Is there an explanation of the labels? Most of them are self-explanatory, but I am confused by some of them, for example paper, treat, study, or next.

@agitter
Collaborator Author

agitter commented Mar 29, 2018

@nafizh some of the labels come from https://waffle.io/greenelab/deep-review. We may want to feature that more prominently in the README so that it isn't buried in this thread.

manuscript was used for issues about the paper structure, like standardizing whitespace or defining acronyms.

paper and review denote that the issue is for a single manuscript, either a research paper or a review. They haven't been applied consistently.

categorize, discussion, study, and treat pertain to the major sections of our manuscript and indicate which sections are relevant to the topic or discussion in the issue.

supervised, semi-supervised, and unsupervised weren't used much. They were more relevant for a different vision for organizing the paper that we moved away from. I suggest that we delete these.

The new maintainers should feel welcome to change the label organization.

@cgreene
Member

cgreene commented Mar 29, 2018

I agree with @agitter that a reexamination of the labels is in order. I'll note only that review used to refer to a review paper, but waffle.io uses it to denote something that is under review. My inclination at this stage would be to delete the paper label and let waffle use review as it likes. Then the default issues would generally be papers, and labels could be used more consistently to help new people find issues that are primarily discussions.

@evancofer
Collaborator

I agree with @agitter that we should delete the supervised, semi-supervised, and unsupervised labels.

I also think we should assume that, unless otherwise marked, an issue corresponds to a paper. Some divisions that come to mind are: community discussion/feedback and project updates, build/orchestration issues, and content revisions.

To revisit our earlier discussion of issue prioritization, I think some good labels might be: help wanted, high priority, low priority, good first issue, and in progress. I also think it could be useful to denote the scale of an issue with something like wide scope or narrow scope, but it may be more productive to make all the issues smaller and more manageable.

@stephenra
Collaborator

stephenra commented Mar 29, 2018

Thanks @evancofer. I agree with @agitter and @cgreene as well; I'm OK deleting those labels (supervised, etc.). I created the help wanted and good first issue labels a few days ago, and priority labels are always a good idea. Would an in progress label necessarily overlap with the waffle.io In Progress board?

@cgreene
Member

cgreene commented Mar 29, 2018

I'd recommend using the waffle labels for state where provided; that way things will look nice on the waffle regardless of how those labels get assigned. So it sounds like that one doesn't need to be added.

@stephenra
Collaborator

Thanks @cgreene.

For priority labels, I'd prefer having at least three (e.g. Priority: High, Priority: Medium, Priority: Low). Thoughts @evancofer @nafizh?

@evancofer
Collaborator

@stephenra Yes, that is a better syntax.

@stephenra
Collaborator

stephenra commented Mar 29, 2018

@evancofer @nafizh Great. The priority labels are now all added.
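
For reference, recreating them programmatically is a one-off call per label against the GitHub REST API. A sketch along these lines (the colors are arbitrary picks of mine, and it assumes a GITHUB_TOKEN environment variable with access to the repo):

```python
# Hypothetical one-off script, not part of the project tooling.
import os
import requests

PRIORITY_LABELS = [
    {"name": "Priority: High", "color": "d93f0b"},
    {"name": "Priority: Medium", "color": "fbca04"},
    {"name": "Priority: Low", "color": "0e8a16"},
]

def create_labels(repo="greenelab/deep-review"):
    headers = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}
    for label in PRIORITY_LABELS:
        # POST /repos/{owner}/{repo}/labels creates one label per request
        response = requests.post(
            f"https://api.github.com/repos/{repo}/labels",
            json=label, headers=headers)
        response.raise_for_status()

if __name__ == "__main__":
    create_labels()
```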

@evancofer
Collaborator

What can we do to drum up more discussion in the issues? Aggregating new articles on deep learning is essential, but I get the feeling that we will lose momentum without continued and consistent contributions in the form of discussion and writing as well. Perhaps we should set some very easily attainable goals for activity/discussion/editing contributions? I am currently reviewing and drafting some bits on the various genomics sections (e.g. splicing, variant calling, sequencing), but I realize that there are many other sections of the review that may need attention.

@cgreene
Member

cgreene commented Aug 15, 2018

In my experience, the path is to start writing and get some small wins in (adjustments to specific sections, etc.). Then we can start tweeting about those to build more momentum. If the community is active and the topic remains of interest (as I suspect this one does), I think that's what it'll take. Right now it's unclear whether the project is really alive, which may make it hard to draw in contributors.

@agitter
Collaborator Author

agitter commented Aug 15, 2018

I completely agree with @cgreene. If momentum is restored, it could also help to use issues to recruit contributors to work on specific small sections that need updates. (Though I tried this with #847 and it didn't go anywhere.)

A minor idea would be to rebrand the review. We could adopt the style used in database papers (e.g. The UCSC Genome Browser database: 2018 update) and add : 2019 update or something similar to the title. That may help contributors feel like they are contributing to an active project instead of maintaining something that has already been published.

@cgreene
Member

cgreene commented Aug 15, 2018

Major +1 for adding : 2019 update to the title!

@jmschrei

I have been following this for a while and I'd like to contribute, but I'm not sure what the best way to do so is. I'm more than happy to link my papers (or those I stumble upon) and discuss them. However, I'm not sure what the goals of discussion in the issues are. Is it to talk about our thoughts/critiques of the methods, or to discuss how best to incorporate them into the paper?

@agitter
Collaborator Author

agitter commented Aug 15, 2018

@jmschrei both are goals of the issues. @evancofer has some recent examples (e.g. #886) of discussing and critiquing methods. The intent is that this helps us decide what we want to say about a paper if/when we add it to the review.

I'm also proposing that issues could help restart the writing effort: open a discussion topic, discuss what should be written, and then make a pull request. For example, there have been several new methods using autoencoders for single-cell RNA-seq data. I could open an issue noting that we only reference two of these in the current review. Then we could re-assess the state of the subarea with an updated view of what has been done well and what challenges remain. Ideally, other contributors would help provide relevant papers and form a consensus opinion. We haven't had many issues of this type yet, but I'm hoping they could help re-engage our contributors (past and future).

@evancofer
Collaborator

@agitter I agree that this is probably the optimal next step.

@baoilleach

I think you've already made your plans, but just checking that you are aware of https://www.livecomsjournal.org/, the Living Journal of Computational Molecular Science (cf. @davidlmobley).

@agitter
Collaborator Author

agitter commented Sep 28, 2018

Thanks for the pointer @baoilleach. We did see the Living Journal of Computational Molecular Science and reference it in our manuscript on the Manubot system for collaborative writing: https://greenelab.github.io/meta-review/

We'd be happy to discuss that platform versus Manubot more with you and @davidlmobley, but I suggest taking that conversation to a new issue in https://github.com/greenelab/meta-review

@bachev

bachev commented Jan 25, 2019

Just in case: the 2019 edition shouldn't be published without mentioning AlphaFold for structure prediction by Google/Alphabet. They simply crushed everyone else as a first-time entrant in the CASP protein structure prediction competition.

@agitter
Collaborator Author

agitter commented Jan 25, 2019

@bachev it's definitely relevant. Please open a new issue in this repo if you'd like to discuss AlphaFold. It may be hard for us to write too much about it until they have a complete description of the method. For now, this blog post and comments from Jinbo are the most informative.

@vd4mmind

vd4mmind commented Dec 2, 2019

Hi all,

A phenomenal read, and thanks for all the amazing work. In case this work can still be enriched/enhanced, could we include knowledge from some of the 2019 publications on deep learning in the single-cell section? Some papers I did not see there (in case they can be added) that have already contributed new information on DL and single cell:

  1. Eraslan, G., Avsec, Ž., Gagneur, J. et al. Deep learning: new computational modelling techniques for genomics. Nat Rev Genet 20, 389–403 (2019). doi:10.1038/s41576-019-0122-6

  2. Köhler, N.D., Büttner, M., Theis, F.J. Deep learning does not outperform classical machine learning for cell-type annotation. bioRxiv 653907; doi:10.1101/653907 (I feel that anchoring in Seurat via transfer learning did bring a lot of value, but this annotation space still has room for development.)

  3. Eraslan, G., Simon, L.M., Mircea, M. et al. Single-cell RNA-seq denoising using a deep count autoencoder. Nat Commun 10, 390 (2019). doi:10.1038/s41467-018-07931-2

  4. Svensson, V., Pachter, L. Interpretable factor models of single-cell RNA-seq via variational autoencoders. bioRxiv 737601; doi:10.1101/737601

Some of my personal notes are here, though I have not developed them over the past few months; they cover the single-cell space and some general ML/DL info:

  1. Blog link 1

  2. Blog link 2

  3. Blog link 3

For now, these are the papers that came to mind to enrich the single-cell coverage of the current review. I did see the mention of autoencoders by @agitter. I will be happy to contribute if possible and if all agree with what I have proposed. (PS: let me know if the above information fits the scope here.)

Kind regards,

Vivek

@agitter
Collaborator Author

agitter commented Dec 2, 2019

Thanks for the suggestions @vd4mmind. Those topics are certainly in scope, and there has been a lot of recent work in the area that is not covered in the current version of the review. However, it's unclear how much actual updating there will be to this review. We've found that we need good editors/reviewers lined up if we're going to make major additions or revisions, so that pull requests don't languish.

My latest thoughts are that we should drop the "2019 update" part of the title and say something instead about this being the living version or post-publication update. Ideally we would also have a better way to show a rich diff of what has changed since publication (manubot/rootstock#54), a more dynamic way of adding authors frequently (#959), and a preferred way for readers to refer to the specific version they read or cited.

Finally, to help keep track of papers, we've been opening one issue per paper. The format in #940 shows an example with the title in the issue title, abstract quoted, and DOI link.
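
If it would help new contributors follow that format, here is a sketch of drafting such an issue from a DOI via the public Crossref API (the function name is made up for illustration, and Crossref only exposes abstracts for some publishers):

```python
# Sketch only: build an issue title and body in the one-issue-per-paper
# style (paper title as issue title, abstract quoted, DOI link).
import requests

def draft_paper_issue(doi):
    work = requests.get(f"https://api.crossref.org/works/{doi}",
                        timeout=30).json()["message"]
    title = work["title"][0]
    # When Crossref has an abstract, it is a JATS XML string; quoted as-is.
    abstract = work.get("abstract", "(abstract not available via Crossref)")
    body = f"> {abstract}\n\nhttps://doi.org/{doi}"
    return title, body

if __name__ == "__main__":
    # e.g. the deep count autoencoder paper suggested above
    print(*draft_paper_issue("10.1038/s41467-018-07931-2"), sep="\n\n")
```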
