
add a function to create splice sites similar to create_introns #220

Merged
merged 8 commits into from
Jul 4, 2023


Juke34 (Contributor) commented Jun 29, 2023

I needed a function to create splice site features easily, so I propose a create_splice_sites function that works like the create_introns function.
It may not be optimal, because it loops twice over the interfeatures function (once for the left splice site and once for the right splice site), but I didn't manage to make it work any other way.
The extra loop is placed just before this part of the code:

for child in child_gen():
    exons = self.children(
        child, level=1, featuretype=exon_featuretype, order_by="start"
    )
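To illustrate the "loops twice" concern outside gffutils: once the introns are known, both splice sites of an intron can be emitted from the same loop. This is a minimal sketch, assuming introns are plain (start, end) tuples in 1-based, inclusive GFF coordinates; `splice_sites_from_introns` is a hypothetical helper, not the gffutils implementation. Each site covers the two intronic bases at one end of the intron, and the 5'/3' labels are swapped on the minus strand, which is how the expected output later in this thread labels them.

```python
from typing import Iterable, Iterator, Tuple

# (featuretype, start, end) with 1-based, inclusive GFF coordinates
Site = Tuple[str, int, int]


def splice_sites_from_introns(
    introns: Iterable[Tuple[int, int]], strand: str
) -> Iterator[Site]:
    """Yield the 5' and 3' splice sites of each intron in a single pass.

    Hypothetical helper for illustration, not the gffutils implementation.
    """
    for start, end in introns:
        left = (start, start + 1)  # first two intronic bases
        right = (end - 1, end)     # last two intronic bases
        if strand == "-":
            # Read right-to-left: the 5' (donor) end is the higher coordinate.
            five_prime, three_prime = right, left
        else:
            five_prime, three_prime = left, right
        yield ("five_prime_cis_splice_site",) + five_prime
        yield ("three_prime_cis_splice_site",) + three_prime


# Introns of ENSMUST00000045689 (minus strand), i.e. the gaps between its exons.
for site in splice_sites_from_introns(
    [(4764598, 4767605), (4767730, 4772648), (4772815, 4775653)], "-"
):
    print(site)
```

This yields the same six coordinates as the expected output, but interleaved per intron rather than grouped by feature type, so a test comparing against a fixed listing would need to ignore order.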

daler (Owner) commented Jul 1, 2023

Thanks. Can you add tests for this at the bottom of gffutils/test/test.py? You can run the tests locally with pytest. I'm still working out how to approve GitHub Actions runs on PRs from forks, so the tests are not running automatically here.

Juke34 mentioned this pull request on Jul 4, 2023
gffutils/interface.py: three review threads (outdated, resolved)
Juke34 (Contributor, Author) commented Jul 4, 2023

Hi, I didn't manage to write a proper test; I'm not proficient enough with pytest.
I wanted to use test/data/gff_example1.gff3 as input.
The expected output is:

chr1	ensGene	gene	4763287	4775820	.	-	.	Name=ENSMUSG00000033845;ID=ENSMUSG00000033845;Alias=ENSMUSG00000033845;gid=ENSMUSG00000033845
chr1	ensGene	mRNA	4764517	4775779	.	-	.	Name=ENSMUST00000045689;Parent=ENSMUSG00000033845;ID=ENSMUST00000045689;Alias=ENSMUSG00000033845;gid=ENSMUSG00000033845
chr1	ensGene	CDS	4775654	4775758	.	-	0	Name=ENSMUST00000045689.cds0;Parent=ENSMUST00000045689;ID=ENSMUST00000045689.cds0;gid=ENSMUSG00000033845
chr1	ensGene	CDS	4772761	4772814	.	-	0	Name=ENSMUST00000045689.cds1;Parent=ENSMUST00000045689;ID=ENSMUST00000045689.cds1;gid=ENSMUSG00000033845
chr1	ensGene	exon	4775654	4775779	.	-	.	Name=ENSMUST00000045689.exon0;Parent=ENSMUST00000045689;ID=ENSMUST00000045689.exon0;gid=ENSMUSG00000033845
chr1	ensGene	exon	4772649	4772814	.	-	.	Name=ENSMUST00000045689.exon1;Parent=ENSMUST00000045689;ID=ENSMUST00000045689.exon1;gid=ENSMUSG00000033845
chr1	ensGene	exon	4767606	4767729	.	-	.	Name=ENSMUST00000045689.exon2;Parent=ENSMUST00000045689;ID=ENSMUST00000045689.exon2;gid=ENSMUSG00000033845
chr1	ensGene	exon	4764517	4764597	.	-	.	Name=ENSMUST00000045689.exon3;Parent=ENSMUST00000045689;ID=ENSMUST00000045689.exon3;gid=ENSMUSG00000033845
chr1	ensGene	five_prime_UTR	4775759	4775779	.	-	.	Name=ENSMUST00000045689.utr0;Parent=ENSMUST00000045689;ID=ENSMUST00000045689.utr0;gid=ENSMUSG00000033845
chr1	ensGene	three_prime_UTR	4772649	4772760	.	-	.	Name=ENSMUST00000045689.utr1;Parent=ENSMUST00000045689;ID=ENSMUST00000045689.utr1;gid=ENSMUSG00000033845
chr1	ensGene	three_prime_UTR	4767606	4767729	.	-	.	Name=ENSMUST00000045689.utr2;Parent=ENSMUST00000045689;ID=ENSMUST00000045689.utr2;gid=ENSMUSG00000033845
chr1	ensGene	three_prime_UTR	4764517	4764597	.	-	.	Name=ENSMUST00000045689.utr3;Parent=ENSMUST00000045689;ID=ENSMUST00000045689.utr3;gid=ENSMUSG00000033845
chr1	gffutils_derived	three_prime_cis_splice_site	4764598	4764599	.	-	.	Name=ENSMUST00000045689.exon2,ENSMUST00000045689.exon3;Parent=ENSMUST00000045689;ID=three_prime_cis_splice_site_ENSMUST00000045689.exon2-ENSMUST00000045689.exon3;gid=ENSMUSG00000033845
chr1	gffutils_derived	three_prime_cis_splice_site	4767730	4767731	.	-	.	Name=ENSMUST00000045689.exon1,ENSMUST00000045689.exon2;Parent=ENSMUST00000045689;ID=three_prime_cis_splice_site_ENSMUST00000045689.exon1-ENSMUST00000045689.exon2;gid=ENSMUSG00000033845
chr1	gffutils_derived	three_prime_cis_splice_site	4772815	4772816	.	-	.	Name=ENSMUST00000045689.exon0,ENSMUST00000045689.exon1;Parent=ENSMUST00000045689;ID=three_prime_cis_splice_site_ENSMUST00000045689.exon0-ENSMUST00000045689.exon1;gid=ENSMUSG00000033845
chr1	gffutils_derived	five_prime_cis_splice_site	4767604	4767605	.	-	.	Name=ENSMUST00000045689.exon2,ENSMUST00000045689.exon3;Parent=ENSMUST00000045689;ID=five_prime_cis_splice_site_ENSMUST00000045689.exon2-ENSMUST00000045689.exon3;gid=ENSMUSG00000033845
chr1	gffutils_derived	five_prime_cis_splice_site	4772647	4772648	.	-	.	Name=ENSMUST00000045689.exon1,ENSMUST00000045689.exon2;Parent=ENSMUST00000045689;ID=five_prime_cis_splice_site_ENSMUST00000045689.exon1-ENSMUST00000045689.exon2;gid=ENSMUSG00000033845
chr1	gffutils_derived	five_prime_cis_splice_site	4775652	4775653	.	-	.	Name=ENSMUST00000045689.exon0,ENSMUST00000045689.exon1;Parent=ENSMUST00000045689;ID=five_prime_cis_splice_site_ENSMUST00000045689.exon0-ENSMUST00000045689.exon1;gid=ENSMUSG00000033845

The relevant features are the last six lines (the three_prime_cis_splice_site and five_prime_cis_splice_site features).

I couldn't find any existing test for create_introns to use as a model.
Could you help?

daler (Owner) commented Jul 4, 2023

I bet you weren't doing anything wrong: I just found out that gffutils/test/test.py has not been running since the tests were ported from nosetests to pytest! Renaming the file did the trick, since pytest's default discovery only collects files named test_*.py or *_test.py. I added the new test in there, and I'll merge into the v0.12rc branch.
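By default pytest only collects test modules whose file names match `test_*.py` or `*_test.py` (the default value of its `python_files` option), which is why a module named plain `test.py` was silently skipped. The sketch below mimics that check with the standard library's `fnmatch`; it approximates pytest's default matching for bare file names rather than calling pytest itself, and `test_example.py` and `interface_test.py` are hypothetical names.

```python
from fnmatch import fnmatch

# Default value of pytest's `python_files` option
DEFAULT_PYTHON_FILES = ("test_*.py", "*_test.py")


def would_be_collected(filename: str, patterns=DEFAULT_PYTHON_FILES) -> bool:
    """Approximate pytest's default test-module discovery for a bare filename."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


for name in ("test.py", "test_example.py", "interface_test.py"):
    print(name, would_be_collected(name))
# test.py False
# test_example.py True
# interface_test.py True
```

The alternative to renaming would be to extend `python_files` in the project's pytest configuration.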

daler changed the base branch from master to v0.12rc on July 4, 2023 21:43
daler merged commit e480e11 into daler:v0.12rc on Jul 4, 2023
daler (Owner) commented Jul 4, 2023

No need for this now, but I wonder whether iterating over the exons in consecutive pairs (e.g. with itertools.pairwise, available from Python 3.10, or similar) would make this and create_introns more efficient.
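A sketch of the pairwise idea, assuming exons are (start, end) tuples already sorted by start (as `order_by="start"` provides in the snippet at the top of this thread); `introns_from_exons` is a hypothetical helper, not gffutils API. Because `itertools.pairwise` exists only on Python 3.10+, the sketch falls back to the `tee`-based recipe from the itertools documentation on older versions.

```python
import itertools
from typing import Iterable, Iterator, Tuple

try:
    from itertools import pairwise  # Python 3.10+
except ImportError:  # fallback for older Pythons
    def pairwise(iterable):
        # itertools docs recipe: s -> (s0, s1), (s1, s2), (s2, s3), ...
        a, b = itertools.tee(iterable)
        next(b, None)
        return zip(a, b)


def introns_from_exons(
    exons: Iterable[Tuple[int, int]]
) -> Iterator[Tuple[int, int]]:
    """Yield the gap between each pair of consecutive exons (1-based, inclusive).

    Exons must already be sorted by start; touching or overlapping exons
    produce no intron. Hypothetical helper for illustration only.
    """
    for (_, left_end), (right_start, _) in pairwise(exons):
        if right_start - left_end > 1:
            yield (left_end + 1, right_start - 1)


# Exons of ENSMUST00000045689 from the expected output, sorted by start.
exons = [(4764517, 4764597), (4767606, 4767729),
         (4772649, 4772814), (4775654, 4775779)]
print(list(introns_from_exons(exons)))
# [(4764598, 4767605), (4767730, 4772648), (4772815, 4775653)]
```

Each exon is visited once, and both splice sites of an intron could be derived inside the same loop body, which would avoid a second pass over the exons.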


2 participants