Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(#741): Pretty print #742

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

andreasprlic
Copy link
Member

@andreasprlic andreasprlic commented Jun 10, 2024

This (draft) PR is an attempt to add a context-visualization framework on top of the tooling that we already have in hgvs. Unless somebody has a better suggestion for a name, for now I called it "pretty_print".

Features:

  1. It provides a datacompiler class that merges together all the data needed for the visualization. (no need for uta_align here). This could be used e.g. by a future effort to developer alternative visualisations such as SVG graphics, or hooked up on a web site.
  2. For now there is only a text-based rendering framework that has been added here. It offers a set of elemental "renderer" objects that are responsible for providing the text for one line in the final text.
  3. There are several unit tests that contain variants with various features. They are also the reason why this PR is still only in draft. The fixture that gets compiled for this unit test (tests/cache-py3.hdp) becomes really big and some suggestions for what to do about this would be helpful. Perhaps disable most of the tests? Or offer an ipython runbook with examples? Any feedback/suggestion is welcome here.
  4. There's a configuration class where some parameters around the display can be modified
  5. On a different thread we were discussing improvements around repeats. I wrote a basic repeat-detection script, and added the result of that as a FYI to the visualization here too, so we can see how well this works.

Examples:

In [1]: hgvs_g = "NC_000005.10:g.123346517_123346518insATTA"

In [2]: var_g = parse(hgvs_g)

In [3]: print(pretty.display(var_g))
hgvs_g    : NC_000005.10:g.123346517_123346518insATTA
hgvs_c    : NM_001166226.1:c.*1_*2insTAAT
hgvs_p    : NP_001159698.1:p.?
         :   123,346,500         123,346,520         123,346,540
chrom pos :   |    .    |    .    |    .    |    .    |    .
seq    -> : ATAAAGCTTTTCCAAATGTTATTAATTACTGGCATTGCTTTTTGCCAA
region    :                     |------|
tx seq <- : TATTTCGAAAAGGTTTACAATAATTAATGACCGTAACGAAAAACGGTT
tx pos    :  |    .    |    .   |   |    .    |    .    |
         :  *20       *10      *1  2880      2870      2860
aa seq <- :                      TerAsnSerAlaAsnSerLysAlaLeu
aa pos    :                         |||            ...
         :                         960
ref>alt   : ATTA[2]>ATTA[3]

Here also a screenshot how on the terminal the text can be color coded:

Screenshot 2024-06-30 at 22 53 19

…ranscripts, new option to show reverse-chrom strand sequence.
# Conflicts:
#	tests/data/cache-py3.hdp
@andreasprlic andreasprlic marked this pull request as ready for review July 1, 2024 05:54
@andreasprlic andreasprlic requested a review from a team as a code owner July 1, 2024 05:54
@andreasprlic
Copy link
Member Author

Note: this PR adds two new unit tests. They are pulling a lot of data from UTA and seqrepo, and as such they are slow. Should I tag them and exclude from the CI runs? Also, I did not want to upload an updated cache file for these tests, because the size of the test cache would have jumped from 1MB to ~600MB.

Copy link

github-actions bot commented Aug 1, 2024

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Aug 1, 2024
@davmlaw
Copy link
Contributor

davmlaw commented Aug 1, 2024

Wow, that's pretty neat and it's impressive you were able to do all of that without bringing in any dependencies (other than test framework)

From a very quick skim, it looks like you define your own data classes and do a lot of your own maths - I guess the worry would be if HGVS code is wrong, or yours is, or they're just different, then it would render differently than what's internally going on under the hood. Maybe there's no other way to do it, not sure - I'll try and block out some time next week to look over it

@github-actions github-actions bot removed the stale Issue is stale and subject to automatic closing label Aug 2, 2024
@andreasprlic
Copy link
Member Author

andreasprlic commented Aug 12, 2024

Yes, the datacompilerclass is doing some calculations, but it is pretty much based around the internals of the hgvs library.

One idea is that there could be other renderers that are not text based and that could e.g. create SVG graphics, or other types of visualizations. To enable this, this introduces a model-view-control pattern:

The view (renderer) should be decoupled from the data. Therefore the newly added models here are basic dataclasses to store the knowledge for one specific position in the final display. The renderer classes are using these dataclasses to print the output. For the controller (datacompiler), the goal is to stitch together all the data around the context of a variant and store the results in the models.

One addition to this is the repeats class. This is derived from a conversation ( #113 ) about improving the support in the hgvs library for variants in repetitive regions. The current core of the library won't be easily able to detect repeats in my opinion, but the fully-justified data calculations that are happening as part of the datacompiler make it relatively straightforward to identify basic repeats (that are not interrupted). I found it insightful for my own purposes to add this to the provided output in pretty_print and as such this ended up in here.

@reece
Copy link
Member

reece commented Aug 26, 2024

From 8/26 maintainers' meeting: @andreasprlic: yes, please tag to exclude CI testing.

Also, we decided to keep the similar function https://github.com/biocommons/hgvs/blob/main/src/hgvs/utils/context.py for now. If someone cares, we can combine functionality later.

Copy link
Member

@reece reece left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing. It's way better than src/hgvs/utils/context.py. Thank you.

I have a minor suggestion/request for discussion: Having hgvs.pretty_print and hgvs.pretty as siblings seems a bit confusing, and leads to (IMO) oddities like the renderer importing from a pretty_print.py.

Instead, I think a better structure might be to:

  • put everything under pretty
  • rename renderer to console
  • put the color constants under console

I think that will create a better separate of the backend logic (immediately under pretty/) and the console renderer (under pretty/console/).

What say you?

@reece reece linked an issue Sep 2, 2024 that may be closed by this pull request
Copy link

gitguardian bot commented Sep 2, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secret in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
7486813 Triggered PostgreSQL Credentials d4f2ac4 docs/contributing.rst View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@andreasprlic
Copy link
Member Author

... with the most recent changes, I believe we are ready for another round of reviews.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RefSeq only... to handle Ensembl maybe you should look at something else?

Not sure if can test c_pos - or will have to look in transcript data to see if it has CDS start

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair point. That means this check should be in the data compiler and not here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Context view improvements
3 participants