Commit

Update pyvcf submodule to accept "sites-only" VCF
sbslee committed Jul 28, 2022
1 parent fe3500d commit f1af964
Showing 4 changed files with 27 additions and 2 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.rst
@@ -4,6 +4,8 @@ Changelog
 0.36.0 (in development)
 -----------------------

+* Update ``pyvcf`` submodule to accept "sites-only" VCF.
+
 0.35.0 (2022-07-12)
 -------------------

7 changes: 7 additions & 0 deletions data/vcf/3.vcf
@@ -0,0 +1,7 @@
+##fileformat=VCFv4.2
+#CHROM POS ID REF ALT QUAL FILTER INFO
+chr1 100 . A "T,C" . . .
+chr1 101 . G T . . .
+chr2 1055 . T G . . .
+chr2 3345 . A C . . .
+chr2 5594 . T G . . .
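
For orientation, here is a minimal sketch of what "sites-only" means for a file like the one above, using plain pandas rather than the `pyvcf` API. The helper name `read_sites_only` is hypothetical, the records are assumed to be tab-delimited, and the first ALT value is written without surrounding quotes; none of this is taken from the commit itself.

```python
import io

import pandas as pd

# Same records as data/vcf/3.vcf, assumed to be tab-delimited.
ROWS = [
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"],
    ["chr1", "100", ".", "A", "T,C", ".", ".", "."],
    ["chr1", "101", ".", "G", "T", ".", ".", "."],
    ["chr2", "1055", ".", "T", "G", ".", ".", "."],
    ["chr2", "3345", ".", "A", "C", ".", ".", "."],
    ["chr2", "5594", ".", "T", "G", ".", ".", "."],
]
VCF_TEXT = "##fileformat=VCFv4.2\n" + "\n".join("\t".join(r) for r in ROWS) + "\n"


def read_sites_only(text):
    """Hypothetical helper: split a VCF string into meta lines and a table."""
    lines = text.splitlines()
    # Meta-information lines start with '##'; the column header starts with '#'.
    meta = [line for line in lines if line.startswith("##")]
    body = "\n".join(line for line in lines if not line.startswith("##"))
    df = pd.read_csv(io.StringIO(body), sep="\t", dtype=str)
    df = df.rename(columns={"#CHROM": "CHROM"})
    # Sample columns, if any, follow the nine fixed columns (CHROM..FORMAT).
    samples = list(df.columns[9:])
    return meta, df, samples


meta, df, samples = read_sites_only(VCF_TEXT)
print(df.shape)   # five records, eight fixed columns
print(samples)    # no sample columns in a sites-only file
```

A sites-only file stops at the INFO column, so there is no FORMAT column and the list of samples is empty.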
16 changes: 14 additions & 2 deletions fuc/api/pyvcf.py
@@ -86,7 +86,8 @@
 do not contain the FORMAT column or sample-specific information. These are
 called "sites-only" VCF files, and normally represent genetic variation that
 has been observed in a large population. Generally, information about the
-population of origin should be included in the header.
+population of origin should be included in the header. Note that the pyvcf
+submodule supports these sites-only VCF files as well.

 There are several reserved keywords in the INFO and FORMAT columns that are
 standards across the community. Popular keywords are listed below:
@@ -1577,6 +1578,8 @@ class VcfFrame:
     """
     Class for storing VCF data.

+    Sites-only VCF files are supported.
+
     Parameters
     ----------
     meta : list
@@ -1624,7 +1627,16 @@ class VcfFrame:

     def _check_df(self, df):
         df = df.reset_index(drop=True)
-        df = df.astype(HEADERS)
+        headers = HEADERS.copy()
+        # Handle "sites-only" VCF.
+        if 'FORMAT' not in df.columns:
+            del headers['FORMAT']
+            if set(df.columns) != set(headers):
+                raise ValueError("The input appears to be a sites-only VCF "
+                                 "because it's missing the FORMAT column; "
+                                 "however, it contains one or more incorrect "
+                                 f"columns: {df.columns.to_list()}.")
+        df = df.astype(headers)
         return df

     def __init__(self, meta, df):
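
The check in `_check_df` can be tried in isolation with the sketch below. It is a standalone approximation, not the module's code: the `HEADERS` mapping here is a hypothetical stand-in (the real dtypes are not shown in this diff), and `check_df` is a free function rather than a `VcfFrame` method.

```python
import pandas as pd

# Hypothetical stand-in for pyvcf's module-level HEADERS mapping;
# the actual dtypes used by the library are an assumption here.
HEADERS = {
    'CHROM': str, 'POS': int, 'ID': str, 'REF': str, 'ALT': str,
    'QUAL': str, 'FILTER': str, 'INFO': str, 'FORMAT': str,
}


def check_df(df):
    """Standalone version of the column check added in this commit."""
    df = df.reset_index(drop=True)
    headers = HEADERS.copy()
    if 'FORMAT' not in df.columns:
        # No FORMAT column: treat the input as sites-only and require
        # exactly the eight remaining fixed columns.
        del headers['FORMAT']
        if set(df.columns) != set(headers):
            raise ValueError("The input appears to be a sites-only VCF "
                             "because it's missing the FORMAT column; "
                             "however, it contains one or more incorrect "
                             f"columns: {df.columns.to_list()}.")
    # Passing a dict to astype casts only the listed columns.
    return df.astype(headers)


sites = pd.DataFrame({
    'CHROM': ['chr1'], 'POS': ['100'], 'ID': ['.'], 'REF': ['A'],
    'ALT': ['T,C'], 'QUAL': ['.'], 'FILTER': ['.'], 'INFO': ['.'],
})
checked = check_df(sites)  # accepted; POS is cast to an integer dtype

try:
    # A sample column without FORMAT is rejected.
    check_df(sites.assign(Sarah='0/1'))
except ValueError as error:
    print(error)
```

Copying `HEADERS` before deleting the `FORMAT` key keeps the module-level mapping intact for later calls with full VCF input.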
4 changes: 4 additions & 0 deletions test.py
@@ -51,6 +51,10 @@ def test_subset(self):
         vf = vf.subset(['Sarah', 'John'])
         self.assertEqual(len(vf.samples), 2)

+    def test_sites_only(self):
+        vf = pyvcf.VcfFrame.from_file(vcf_file3)
+        self.assertEqual(vf.shape, (5, 0))
+
 class TestPybed(unittest.TestCase):

     def test_intersect(self):