Some updates #27

Merged
merged 5 commits into from
Sep 28, 2020

gabrielgesteira
Contributor

More information in commit messages.

gabrielgesteira and others added 5 commits September 25, 2020 17:45
Example rules:
unwrapped: examples that run in < 4s
\dontrun{}: only for examples that need missing software, APIs, etc.
\donttest{}: examples that take more than 4s
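
As a rough sketch of how these three rules look inside a roxygen2 block: the function `slow_sum`, its timings, and the `fetch_from_api` call are hypothetical and not part of mappoly.

```r
#' Sum a vector after an optional pause (hypothetical helper, illustration only)
#'
#' @param x numeric vector
#' @param pause seconds to sleep before summing
#' @examples
#' # Unwrapped: finishes well under 4s, so it runs on every check
#' slow_sum(1:10)
#'
#' \donttest{
#' # Takes more than 4s: wrapped so routine example runs can skip it,
#' # while R CMD check --run-donttest still executes it
#' slow_sum(1:10, pause = 5)
#' }
#'
#' \dontrun{
#' # Depends on something that may be missing (a made-up remote API),
#' # so it is never executed
#' slow_sum(fetch_from_api("https://example.org/values"))
#' }
slow_sum <- function(x, pause = 0) {
  Sys.sleep(pause)
  sum(x)
}
```

Because \donttest{} code is still executed under --run-donttest, it has to run without error, which is why several items below also adjust core counts.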

Functions:
- get_genomic_order: unwrapped (takes less than 1s)
- sim_homologous: (already unwrapped)
- extract_map: unwrapped (takes less than 1s)
- print_mrk: unwrapped (takes less than 1s)
- rev_map: unwrapped (takes less than 1s)
- export_map_list: unwrapped (takes less than 1s); also changed example to avoid writing local files
- drop_marker: unwrapped (takes less than 1s)
- plot_genome_vs_map: unwrapped (takes less than 1s)
- elim_redundant: (already unwrapped)
- summary_maps: removed dontrun and the use of formattable (it is not a dependency)
- segreg_poly: (already unwrapped - runs in less than 2s)
- plot_map_list: changed from dontrun to donttest (takes more than 5s)
- import_data_from_polymapR: changed from dontrun to donttest (polymapR is a suggested package)
- filter_segregation: removed dontrun because it takes less than 1s
- ls_linkage_phases: removed dontrun because it takes less than 3s
- calc_genoprob_error: changed from dontrun to donttest
- import_from_updog: changed from dontrun to donttest (updog is a suggested package)
- update_map: removed dontrun and the use of formattable (it is not a dependency)
- plot_mrk_info: removed dontrun (runs in less than 3s)
- poly_cross_simulate: (already unwrapped - runs in less than 1s)
- read_geno_csv: changed from dontrun to donttest
- check_data_sanity: changed from dontrun to donttest
- add_marker: changed from dontrun to donttest
- make_mat_mappoly: changed from dontrun to donttest
- loglike_hmm: changed from dontrun to donttest
- dist_prob_to_class: changed from dontrun to donttest
- dist_prob_to_class: (duplicated)
- import_phased_maplist_from_polymapR: changed from dontrun to donttest (polymapR is a suggested package)
- dist_prob_to_class: (duplicated)
- update_missing: changed from dontrun to donttest
- cache_counts_twopt: changed from dontrun to donttest (also changed n.cores example from 8 to 1 due to problems when checking with --run-donttest)
- read_geno: changed from dontrun to donttest
- calc_genoprob: changed from dontrun to donttest
- export_data_to_polymapR: changed from dontrun to donttest (polymapR is a suggested package)
- get_submap: changed from dontrun to donttest
- est_rf_hmm: changed from dontrun to donttest
- mds_mappoly: changed from dontrun to donttest
- split_and_rephase: changed from dontrun to donttest
- est_rf_hmm_single: changed from dontrun to donttest
- merge_datasets: changed from dontrun to donttest
- calc_genoprob_dist: changed from dontrun to donttest
- read_geno_prob: changed from dontrun to donttest
- calc_prefpair_profiles: changed from dontrun to donttest
- est_rf_hmm_sequential: changed from dontrun to donttest
- merge_maps: changed from dontrun to donttest
- est_full_hmm_with_global_error: changed from dontrun to donttest
- calc_homoprob: changed from dontrun to donttest
- make_pairs_mappoly: changed from dontrun to donttest
- rf_list_to_matrix: changed from dontrun to donttest
- group_mappoly: changed from dontrun to donttest
- est_full_hmm_with_prior_prob: changed from dontrun to donttest
- read_vcf: changed from dontrun to donttest
- est_pairwise_rf: changed from dontrun to donttest
- filter_missing: unwrapped (takes less than 1s)
- make_seq_mappoly: changed from dontrun to donttest
- rf_snp_filter: changed from dontrun to donttest
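
The export_map_list item above mentions changing the example to avoid writing local files, but does not say how. One common approach, sketched here under that assumption with base R only and a made-up `map_table`, is to write to a path from tempfile() and clean up afterwards:

```r
# Hypothetical data standing in for an exported map table
map_table <- data.frame(marker = c("m1", "m2", "m3"),
                        position_cM = c(0, 1.5, 3.2))

# Write under tempdir() rather than the user's working directory
out_file <- tempfile(pattern = "map_", fileext = ".csv")
write.csv(map_table, file = out_file, row.names = FALSE)

# Read the file back to confirm the round trip, then remove it
round_trip <- read.csv(out_file)
unlink(out_file)
```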

No function examples use \dontrun{} anymore.

Checked the package with all \donttest{} examples being run. Processes cannot spawn on multiple cores during this step, so the example below was removed from est_pairwise_rf: it could not achieve its purpose without multi-core processing. The tetraploid example was also changed to include only the first chromosome.
Removed example:

    ## Hexaploid example
    fl = "https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/BT/sweetpotato_chr1.vcf.gz"
    tempfl <- tempfile(pattern = 'chr1_', fileext = '.vcf.gz')
    download.file(fl, destfile = tempfl)
    dat.dose.vcf = read_vcf(file = tempfl, parent.1 = "PARENT1", parent.2 = "PARENT2")

    ## Filtering dataset by marker
    dat.filt.mrk <- filter_missing(input.data = dat.dose.vcf,
                                   type = "marker",
                                   filter.thres = 0.10,
                                   inter = FALSE)
    ## Filtering dataset by individual
    dat.filt.ind <- filter_missing(input.data = dat.filt.mrk,
                                   type = "individual",
                                   filter.thres = 0.10,
                                   inter = FALSE)
    ## Segregation test
    pval.bonf <- 0.05/dat.filt.ind$n.mrk
    mrks.chi.filt <- filter_segregation(dat.filt.ind,
                                        chisq.pval.thres =  pval.bonf,
                                        inter = FALSE)
    seq.ch1<-make_seq_mappoly(mrks.chi.filt)
    plot(seq.ch1)
    ## will take ~  19 min / peak of memory usage ~ 10GB
    all.pairs.1 <- est_pairwise_rf(input.seq = seq.ch1,
                                   ncpus = 7,
                                   verbose=TRUE)
    ## same thing, but it will take ~  21 min / peak of memory usage ~ 6GB
    all.pairs.2 <- est_pairwise_rf(input.seq = seq.ch1,
                                   ncpus = 7,
                                   n.batch = 10,
                                   verbose=TRUE)
    plot(all.pairs.1, 161, 162)
    mat <- rf_list_to_matrix(all.pairs.1)
    plot(mat)

Also removed this from function make_seq_mappoly (in make_seq.R):
    ## Making a sequence using the intersection between groups and genomic information
    s <- make_seq_mappoly(tetra.solcap, 'all')
    tpt <- est_pairwise_rf(input.seq = s,
                           ncpus = 7)
    mat <- rf_list_to_matrix(tpt)
    grs <- group_mappoly(input.mat = mat,
                         expected.groups = 12,
                         comp.mat = FALSE)
    seq1 = make_seq_mappoly(grs, arg = 1, genomic.info = 1)

Changed number of cores from 7 to 1 in function est_rf_hmm_sequential (in est_map_hmm.R)
Changed number of cores from 7 to 1 in function make_pairs_mappoly (in make_pairs.R)
Changed number of cores from 7 to 1 and arg="all" to arg="seq1" in function rf_list_to_matrix (in rf_list_to_matrix.R)

In file import_from_polymapR.R, changed ncpus from 7 to 1 and replaced the following code:
  #### Reestimating recombination fractions using HMM
  cl <- parallel::makeCluster(5)
  parallel::clusterEvalQ(cl, require(mappoly))
  parallel::clusterExport(cl,  "mappoly.data")
  reest.maps <- parallel::parLapply(cl, mappoly.maplist,
                                    est_full_hmm_with_global_error,
                                    error = 0.05)
  parallel::stopCluster(cl)

with this:

 reest.maps <- lapply(mappoly.maplist,
                      est_full_hmm_with_global_error,
                      error = 0.05)

Also replaced this:

  cl <- parallel::makeCluster(5)
  parallel::clusterEvalQ(cl, require(mappoly))
  parallel::clusterExport(cl,  "mappoly.data")
  recons.maps <- parallel::parLapply(cl, MAPs,
                                     est_full_hmm_with_global_error,
                                     error = 0.05)
  parallel::stopCluster(cl)

with this:

 recons.maps <- lapply(MAPs,
                       est_full_hmm_with_global_error,
                       error = 0.05)
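
The PR simply switches these examples to lapply. As an alternative sketch (not what this PR does), a small wrapper can keep both paths and default to a single core. The helper `run_over_maps` and its `n_cores` argument are hypothetical names, and the cluster branch omits the clusterEvalQ/clusterExport setup shown in the replaced code.

```r
# Hypothetical helper: apply `fun` to each element of `maps`,
# sequentially by default, on a cluster only when n_cores > 1
run_over_maps <- function(maps, fun, ..., n_cores = 1) {
  if (n_cores > 1) {
    cl <- parallel::makeCluster(n_cores)
    # Make sure the workers are shut down even if `fun` fails
    on.exit(parallel::stopCluster(cl), add = TRUE)
    parallel::parLapply(cl, maps, fun, ...)
  } else {
    lapply(maps, fun, ...)
  }
}

# Toy stand-in for est_full_hmm_with_global_error: adds `error` to each element
toy_result <- run_over_maps(list(1, 2, 3),
                            function(m, error) m + error,
                            error = 0.05)
```

With the default `n_cores = 1` no worker processes are spawned, which matches the single-core constraint described above.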

Another error: "T used instead of TRUE"
Searched for and fixed every occurrence of this in the examples.
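
A minimal illustration of why T is discouraged: T is an ordinary variable that happens to hold TRUE, so it can be masked, while TRUE is a reserved word. The variable names below are made up for the example.

```r
# T is a regular binding, so user code can overwrite it
T <- 0
flag_with_T <- isTRUE(T)        # T now holds 0, so this is not TRUE
flag_with_TRUE <- isTRUE(TRUE)  # TRUE is reserved and cannot be reassigned

# Remove the local mask; T resolves to the base definition again
rm(T)
```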

Also changed number of cores in import_from_updog function example (in import_from_updog.R) and removed this section:
 mydata = import_from_updog(mout, filter.non.conforming = TRUE)
 mydata
 plot(mydata)

Bug detected when running previously untested examples:
- function plot.mappoly.homoprob (homolog_probs.R) line 131: changed if(lg=="all") to if(all(lg=="all")) to fix a problem when a vector of chromosomes is passed as the lg argument
- function plot.mappoly.homoprob (homolog_probs.R): added a verbose parameter
- function plot.mappoly.homoprob (homolog_probs.R) lines 116 and 139: added checks for verbosity
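
To show why the all() wrapper matters, here is a small sketch with a hypothetical helper `select_groups` (not the actual plot.mappoly.homoprob code). The comparison lg == "all" is element-wise, so a vector lg yields a vector condition; if() expects a single value, and from R 4.2.0 a longer condition is an error rather than a warning.

```r
lg <- c(1, 3, 5)   # hypothetical vector of linkage groups

# Element-wise comparison: one logical value per element of lg
elementwise <- lg == "all"

# Hypothetical helper mirroring the fixed condition
select_groups <- function(lg, available) {
  # all() collapses the comparison to a single TRUE/FALSE for if()
  if (all(lg == "all")) available else intersect(lg, available)
}

every_group <- select_groups("all", 1:12)
some_groups <- select_groups(lg, 1:12)
```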

mmollina merged commit fe13bdf into mmollina:master on Sep 28, 2020