add new little requested features #875
Added the following features:
Tests need to be added.
Please provide test cases and make sure it works with current Python and NumPy. Thanks!
I might have some test data, but I have to find some time.
799ca0d to 74f990f
@joachimwolff @bgruening please review
```python
for line in file.readlines():
    if line.startswith('#'):
        continue
    _line = line.strip().split('\t')
```
I find the variable name `_line` confusing. After the split, these are elements (or something like that), not a line.
To be consistent with the rest of the project, I used `_line`. The same line of code can be found in several places.
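A minimal sketch of the same loop with a more descriptive name, assuming a tab-separated, BED-like input in which lines starting with `#` are comments. The generator name `iter_bed_fields` is hypothetical, and skipping blank lines is an addition not present in the original loop. Iterating over the file object directly also avoids `readlines()` loading the whole file into a list first.

```python
def iter_bed_fields(lines):
    """Yield the tab-separated fields of each non-comment line.

    `lines` can be an open file object or any iterable of strings;
    iterating a file object reads it lazily, one line at a time.
    """
    for line in lines:
        if line.startswith('#') or not line.strip():
            continue  # skip comment lines and (unlike the original) blank lines
        fields = line.strip().split('\t')  # the columns of one record, not a "line"
        yield fields
```

Callers can then unpack with names that say what each column is, e.g. `for chrom, start, end, *rest in iter_bed_fields(handle):`.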
hicexplorer/hicCorrectMatrix.py
```
@@ -548,6 +551,7 @@ def filter_by_zscore(hic_ma, lower_threshold, upper_threshold, perchr=False):
    to avoid introducing bias due to different chromosome numbers

    """
    print("filtering by z-score")
```
Remove this, or use logging?
Switched to logging.
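For reference, replacing `print()` with the standard-library `logging` module typically looks like the sketch below. The module-level logger and the helper `report_filter_step` are hypothetical; how HiCExplorer configures its own loggers is not shown in this thread. Passing the value as an argument (`%s`) instead of pre-formatting it defers building the final string until a handler actually processes the record.

```python
import logging

# Module-level logger named after the module (a common convention).
log = logging.getLogger(__name__)


def report_filter_step(step):
    """Hypothetical helper: announce a filtering step via logging instead of print()."""
    # Lazy %-style formatting: the message string is only built when the record is handled.
    log.info("filtering by %s", step)
```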
hicexplorer/hicCorrectMatrix.py
```
@@ -658,9 +662,22 @@ def main(args=None):
                                 restore_masked_bins=False)

    assert matrix_shape == ma.matrix.shape
    for idx in outlier_regions:
```
Remove this?
hicexplorer/hicCorrectMatrix.py
```python
with open(args.filteredBed, 'w') as f:
    for outlier_region in set(outlier_regions):
        interval = ma.cut_intervals[outlier_region]
        f.write('{}\t{}\t{}\t.\t{}\t.\n'.format(interval[0],
```
Using an f-string here would make this much easier to read, I suspect.
I used a plain join because the interval should contain 4 elements anyway.
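The two options discussed here produce different output, which may matter for downstream tools. A minimal comparison, assuming each entry of `ma.cut_intervals` is a 4-element `(chrom, start, end, value)` tuple and that the truncated `.format()` call passes `interval[0]` to `interval[3]` in order; both helper names are hypothetical. The original format string writes 6 columns with `.` placeholders in columns 4 and 6, whereas a plain tab-join of a 4-element interval writes 4 columns.

```python
def bed_line_fstring(interval):
    """6-column line: chrom, start, end, '.', value, '.' (same layout as the format string)."""
    chrom, start, end, value = interval  # raises ValueError unless exactly 4 elements
    return f"{chrom}\t{start}\t{end}\t.\t{value}\t.\n"


def bed_line_join(interval):
    """Tab-join the fields as they are: a 4-element interval gives a 4-column line."""
    return "\t".join(str(field) for field in interval) + "\n"
```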
hicexplorer/hicCorrectMatrix.py
```python
# mask filtered regions
ma.maskBins(outlier_regions)
total_filtered_out = set(outlier_regions)
print(outlier_regions, "Bins that are MAD outliers ({:.2f}%) "
```
Remove this?
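If the MAD-outlier summary is kept as information rather than removed, it could go through logging as well. A sketch, assuming the percentage is taken relative to the total number of bins (the denominator in the original `print()` is cut off); the helper `log_mad_outliers` and its `n_bins` parameter are hypothetical. Duplicate indices are collapsed first, mirroring `set(outlier_regions)` in the snippet above.

```python
import logging

log = logging.getLogger(__name__)


def log_mad_outliers(outlier_regions, n_bins):
    """Hypothetical helper: log a summary of the MAD-outlier bins and return (count, percent)."""
    n_outliers = len(set(outlier_regions))  # the same bin may be listed more than once
    pct = 100.0 * n_outliers / n_bins if n_bins else 0.0  # guard against an empty matrix
    log.info("%d bins are MAD outliers (%.2f%%)", n_outliers, pct)
    return n_outliers, pct
```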
```python
if x != y:
    count = sum(1 for a, b in zip(x, y) if a != b)
    if count > pDifference:
        equal = False
```
You break out of the entire loop here, correct? Then you can also just return and remove `equal` altogether.
I want to return True or False. Again, I copied this part from somewhere else in the project.
`return True`, `return False` ;)
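The suggested early return, as a minimal sketch. It assumes the original check runs in a loop over pairs `(x, y)` of sequences and should report whether every pair differs in at most `pDifference` positions; the function name `sequences_match` and the parameter name `p_difference` are hypothetical stand-ins.

```python
def sequences_match(xs, ys, p_difference=0):
    """Return False as soon as one pair differs in more than p_difference positions."""
    for x, y in zip(xs, ys):
        if x != y:
            # number of positions at which x and y disagree (zip stops at the shorter one)
            count = sum(1 for a, b in zip(x, y) if a != b)
            if count > p_difference:
                return False  # no `equal` flag and no `break` needed
    return True
```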
```python
for line in file.readlines():
    if line.startswith('#'):
        continue
    _line = line.strip().split('\t')
```
Same as my comment above.
```python
try:
    chrom, start, end = _line[:4]
except ValueError:
    _line = line.strip().split()
```
When a line has fewer than 3 columns, this exception is raised, and then you try the same thing again?
No, the first split uses tabs. This one splits on whitespace, in case the input BED file is not tab-separated.
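A sketch of the same tab-then-whitespace fallback that checks the column count explicitly instead of catching the unpacking error; `parse_bed_fields` and `n_fields` are hypothetical names. Slicing to exactly as many fields as there are names also sidesteps a subtle issue: `chrom, start, end = _line[:4]` raises `ValueError` ("too many values to unpack") on any line with four or more columns, not only on lines with too few. Note also that `str.split()` without an argument already splits on runs of any whitespace, tabs included, so it alone would suffice whenever no field contains spaces.

```python
def parse_bed_fields(line, n_fields=3):
    """Return the first n_fields columns of a BED-like line.

    Tab-splitting is tried first (the BED convention); if that yields too few
    columns, the line is split on arbitrary whitespace instead.
    """
    fields = line.strip().split('\t')
    if len(fields) < n_fields:
        fields = line.split()  # fallback for space-separated files
    if len(fields) < n_fields:
        raise ValueError(f"expected at least {n_fields} columns, got {len(fields)}: {line!r}")
    return fields[:n_fields]
```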
```python
    chrom, start, end = _line[:4]
except ValueError:
    _line = line.strip().split()
    chrom, start, end, gene = _line[:4]
```
Isn't the 4 wrong here?
This file should contain 4 columns.
Maybe add a column; it's really not clear that you are parsing two different file types here.
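One way to make the two input formats explicit is a separate, documented parser per file type. A minimal sketch: which file in this PR holds 3-column regions and which holds 4-column gene entries is not spelled out in the thread, so `Region`, `Gene`, `parse_region` and `parse_gene` are hypothetical names. Each parser takes fields that have already been split from a line.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


class Gene(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str


def parse_region(fields):
    """Regions file: chrom, start, end; any extra columns are ignored."""
    if len(fields) < 3:
        raise ValueError(f"region needs 3 columns, got {len(fields)}")
    chrom, start, end = fields[:3]
    return Region(chrom, int(start), int(end))


def parse_gene(fields):
    """Gene file: chrom, start, end, gene name (only the first 4 columns are used)."""
    if len(fields) < 4:
        raise ValueError(f"gene entry needs 4 columns, got {len(fields)}")
    chrom, start, end, name = fields[:4]
    return Gene(chrom, int(start), int(end), name)
```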
requirements.txt
```
@@ -23,6 +23,6 @@ future >= 0.18
tqdm >= 4.66
hyperopt >= 0.2.7
python-graphviz >= 0.20
scikit-learn >= 1.3.1
scikit-learn == 1.3.2
```
This pin looks too strict to me. Would `>=1.3.2,<1.4` be better?
Currently, there is no other version between 1.3.2 and 1.4. I switched it to 1.3,1.4
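For reference, a range specifier such as `>=1.3.2,<1.4` admits every release from 1.3.2 up to, but excluding, 1.4. The sketch below illustrates that logic with plain tuple comparison; it is a simplification that handles only dotted integer versions and does not normalize trailing zeros, whereas pip evaluates specifiers with the full PEP 440 rules (pre-releases, post-releases, and so on). The helper names are hypothetical.

```python
def release_tuple(version):
    """'1.3.2' -> (1, 3, 2). Only dotted integers; suffixes like 'rc1' are not supported."""
    return tuple(int(part) for part in version.split("."))


def satisfies_range(version, lower="1.3.2", upper="1.4"):
    """True if lower <= version < upper, i.e. the specifier '>=lower,<upper'."""
    v = release_tuple(version)
    # Tuples compare element by element, so (1, 3, 10) > (1, 3, 2) as intended,
    # unlike a plain string comparison of '1.3.10' and '1.3.2'.
    return release_tuple(lower) <= v < release_tuple(upper)
```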
setup.py
```
@@ -116,7 +116,7 @@ def checkProgramIsInstalled(self, program, args, where_to_download,
        "tqdm >= 4.66",
        "hyperopt >= 0.2.7",
        "graphviz >= 0.20",
        "scikit-learn >= 1.3.1",
        "scikit-learn == 1.3.2",
```
And here?
Hi there,
As a consequence, a conda installation of 3.7.4 reports its version as 3.7.3...
```shell
flake8 . --exclude=.venv,.build,planemo_test_env,build --ignore=E501,F403,E402,F999,F405,E712
py.test hicexplorer --doctest-modules
```
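The second command also runs examples embedded in docstrings (`--doctest-modules`), which is one lightweight way to supply the test cases requested at the start of this thread. A sketch with a hypothetical helper `count_mismatches` carrying doctests, plus a small runner `run_doctests` built on the standard-library `doctest` module to execute those examples outside pytest:

```python
import doctest


def count_mismatches(x, y):
    """Count the positions at which two sequences differ (zip stops at the shorter one).

    >>> count_mismatches("ACGT", "ACGA")
    1
    >>> count_mismatches([1, 2, 3], [1, 2, 3])
    0
    """
    return sum(1 for a, b in zip(x, y) if a != b)


def run_doctests(func):
    """Run the docstring examples of one function and return the number of failures."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    for test in finder.find(func, globs={func.__name__: func}):
        failed += runner.run(test).failed
    return failed
```

Under pytest, no runner is needed: `--doctest-modules` collects the `>>>` examples from every module it imports.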