Some updates #27

gabrielgesteira · 2020-09-28T02:14:27Z

More information in commit messages.

Updating from mmollina

update DESCRIPTION

Updating from mmollina

Example rules:
- unwrapped: examples that run in less than 4 s
- \dontrun{}: only for examples that need missing software, external APIs, etc.
- \donttest{}: examples that take more than 4 s

Functions:
- get_genomic_order: unwrapped (takes less than 1 s)
- sim_homologous: already unwrapped
- extract_map: unwrapped (takes less than 1 s)
- print_mrk: unwrapped (takes less than 1 s)
- rev_map: unwrapped (takes less than 1 s)
- export_map_list: unwrapped (takes less than 1 s); also changed the example to avoid writing local files
- drop_marker: unwrapped (takes less than 1 s)
- plot_genome_vs_map: unwrapped (takes less than 1 s)
- elim_redundant: already unwrapped
- summary_maps: removed dontrun and the use of formattable (not a dependency)
- segreg_poly: already unwrapped (runs in less than 2 s)
- plot_map_list: changed from dontrun to donttest (takes more than 5 s)
- import_data_from_polymapR: changed from dontrun to donttest (polymapR is a suggested package)
- filter_segregation: removed dontrun (takes less than 1 s)
- ls_linkage_phases: removed dontrun (takes less than 3 s)
- calc_genoprob_error: changed from dontrun to donttest
- import_from_updog: changed from dontrun to donttest (updog is a suggested package)
- update_map: removed dontrun and the use of formattable (not a dependency)
- plot_mrk_info: removed dontrun (runs in less than 3 s)
- poly_cross_simulate: already unwrapped (runs in less than 1 s)
- read_geno_csv: changed from dontrun to donttest
- check_data_sanity: changed from dontrun to donttest
- add_marker: changed from dontrun to donttest
- make_mat_mappoly: changed from dontrun to donttest
- loglike_hmm: changed from dontrun to donttest
- dist_prob_to_class: changed from dontrun to donttest
- dist_prob_to_class: (duplicated)
- import_phased_maplist_from_polymapR: changed from dontrun to donttest (polymapR is a suggested package)
- dist_prob_to_class: (duplicated)
- update_missing: changed from dontrun to donttest
- cache_counts_twopt: changed from dontrun to donttest; also changed n.cores in the example from 8 to 1 due to problems when checking with --run-donttest
- read_geno: changed from dontrun to donttest
- calc_genoprob: changed from dontrun to donttest
- export_data_to_polymapR: changed from dontrun to donttest (polymapR is a suggested package)
- get_submap: changed from dontrun to donttest
- est_rf_hmm: changed from dontrun to donttest
- mds_mappoly: changed from dontrun to donttest
- split_and_rephase: changed from dontrun to donttest
- est_rf_hmm_single: changed from dontrun to donttest
- merge_datasets: changed from dontrun to donttest
- calc_genoprob_dist: changed from dontrun to donttest
- read_geno_prob: changed from dontrun to donttest
- calc_prefpair_profiles: changed from dontrun to donttest
- est_rf_hmm_sequential: changed from dontrun to donttest
- merge_maps: changed from dontrun to donttest
- est_full_hmm_with_global_error: changed from dontrun to donttest
- calc_homoprob: changed from dontrun to donttest
- make_pairs_mappoly: changed from dontrun to donttest
- rf_list_to_matrix: changed from dontrun to donttest
- group_mappoly: changed from dontrun to donttest
- est_full_hmm_with_prior_prob: changed from dontrun to donttest
- read_vcf: changed from dontrun to donttest
- est_pairwise_rf: changed from dontrun to donttest
- filter_missing: unwrapped (takes less than 1 s)
- make_seq_mappoly: changed from dontrun to donttest
- rf_snp_filter: changed from dontrun to donttest

No function still uses \dontrun{}.

Ran the package checks with all \donttest{} examples enabled. Processes cannot spawn on multiple cores during this step, so the hexaploid example was removed from est_pairwise_rf: its goals cannot be achieved without multicore processing. The tetraploid example was also changed to include only the first chromosome.
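Since the examples now have to run without spawning worker processes, one option is to keep a parallel path in the code but make the sequential one the default. A minimal sketch, assuming a hypothetical helper `apply_maps()` (not part of MAPpoly) that falls back to `lapply()` when `ncpus = 1` and only creates a cluster with `parallel::makeCluster()` when more cores are requested; real use would also need `parallel::clusterEvalQ(cl, require(mappoly))` and `parallel::clusterExport()` as in the original example code:

```r
# Hypothetical helper (not a MAPpoly function): apply FUN over a list,
# in parallel only when more than one core is requested.
apply_maps <- function(X, FUN, ..., ncpus = 1L) {
  if (ncpus <= 1L) {
    # Sequential path: no worker processes are spawned,
    # which is what the example checks require.
    return(lapply(X, FUN, ...))
  }
  cl <- parallel::makeCluster(ncpus)
  # Shut the workers down even if FUN fails.
  on.exit(parallel::stopCluster(cl), add = TRUE)
  parallel::parLapply(cl, X, FUN, ...)
}

# Toy function standing in for a map re-estimation routine
res <- apply_maps(list(1, 2, 3), function(x, error) x * (1 - error), error = 0.05)
```

With the default `ncpus = 1L` this behaves exactly like the plain `lapply()` calls adopted in this PR, while users on a multicore machine can still opt in.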
Removed example (from est_pairwise_rf):

    ## Hexaploid example
    fl = "https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/BT/sweetpotato_chr1.vcf.gz"
    tempfl <- tempfile(pattern = 'chr1_', fileext = '.vcf.gz')
    download.file(fl, destfile = tempfl)
    dat.dose.vcf = read_vcf(file = tempfl, parent.1 = "PARENT1", parent.2 = "PARENT2")
    ## Filtering dataset by marker
    dat.filt.mrk <- filter_missing(input.data = dat.dose.vcf, type = "marker", filter.thres = 0.10, inter = FALSE)
    ## Filtering dataset by individual
    dat.filt.ind <- filter_missing(input.data = dat.filt.mrk, type = "individual", filter.thres = 0.10, inter = FALSE)
    ## Segregation test
    pval.bonf <- 0.05/dat.filt.ind$n.mrk
    mrks.chi.filt <- filter_segregation(dat.filt.ind, chisq.pval.thres = pval.bonf, inter = FALSE)
    seq.ch1 <- make_seq_mappoly(mrks.chi.filt)
    plot(seq.ch1)
    ## will take ~ 19 min / peak of memory usage ~ 10GB
    all.pairs.1 <- est_pairwise_rf(input.seq = seq.ch1, ncpus = 7, verbose = TRUE)
    ## same thing, but it will take ~ 21 min / peak of memory usage ~ 6GB
    all.pairs.2 <- est_pairwise_rf(input.seq = seq.ch1, ncpus = 7, n.batch = 10, verbose = TRUE)
    plot(all.pairs.1, 161, 162)
    mat <- rf_list_to_matrix(all.pairs.1)
    plot(mat)

Also removed this from the make_seq_mappoly example (in make_seq.R):

    ## Making a sequence using the intersection between groups and genomic information
    s <- make_seq_mappoly(tetra.solcap, 'all')
    tpt <- est_pairwise_rf(input.seq = s, ncpus = 7)
    mat <- rf_list_to_matrix(tpt)
    grs <- group_mappoly(input.mat = mat, expected.groups = 12, comp.mat = FALSE)
    seq1 = make_seq_mappoly(grs, arg = 1, genomic.info = 1)

Other changes:
- est_rf_hmm_sequential (in est_map_hmm.R): changed the number of cores from 7 to 1
- make_pairs_mappoly (in make_pairs.R): changed the number of cores from 7 to 1
- rf_list_to_matrix (in rf_list_to_matrix.R): changed the number of cores from 7 to 1 and arg = "all" to arg = "seq1"

In import_from_polymapR.R, changed ncpus from 7 to 1 and replaced the following code:

    #### Reestimating recombination fractions using HMM
    cl <- parallel::makeCluster(5)
    parallel::clusterEvalQ(cl, require(mappoly))
    parallel::clusterExport(cl, "mappoly.data")
    reest.maps <- parallel::parLapply(cl, mappoly.maplist, est_full_hmm_with_global_error, error = 0.05)
    parallel::stopCluster(cl)

with:

    reest.maps <- lapply(mappoly.maplist, est_full_hmm_with_global_error, error = 0.05)

Also replaced this:

    cl <- parallel::makeCluster(5)
    parallel::clusterEvalQ(cl, require(mappoly))
    parallel::clusterExport(cl, "mappoly.data")
    recons.maps <- parallel::parLapply(cl, MAPs, est_full_hmm_with_global_error, error = 0.05)
    parallel::stopCluster(cl)

with:

    recons.maps <- lapply(MAPs, est_full_hmm_with_global_error, error = 0.05)

Another error: "T used instead of TRUE". Searched for and fixed every such occurrence in the examples.

Also changed the number of cores in the import_from_updog example (in import_from_updog.R) and removed this section:

    mydata = import_from_updog(mout, filter.non.conforming = TRUE)
    mydata
    plot(mydata)

Bugs detected when running the previously untested examples:
- plot.mappoly.homoprob (homolog_probs.R), line 131: changed if(lg=="all") to if(all(lg=="all")) to fix a failure when a vector of chromosomes is passed as the lg argument
- plot.mappoly.homoprob (homolog_probs.R): added a verbose parameter
- plot.mappoly.homoprob (homolog_probs.R), lines 116 and 139: output is now conditional on verbose
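The lg fix matters because if() needs a single TRUE/FALSE: with lg = c(1, 2), lg == "all" returns c(FALSE, FALSE), which older R versions accepted with a warning (using only the first element) and R 4.2.0 and later reject with an error. A minimal sketch of the corrected check, assuming a hypothetical helper `resolve_lg()` (not part of MAPpoly) that expands lg into the linkage groups to plot:

```r
# Hypothetical helper (not a MAPpoly function) mirroring the fixed check in
# plot.mappoly.homoprob: lg is either "all" or a vector of linkage groups.
resolve_lg <- function(lg, n.lg) {
  # all() collapses the element-wise comparison to a single TRUE/FALSE,
  # so if() never receives a condition of length > 1.
  if (all(lg == "all")) {
    seq_len(n.lg)
  } else {
    as.integer(lg)
  }
}

resolve_lg("all", 3)    # returns 1 2 3
resolve_lg(c(1, 3), 5)  # returns 1 3
```

identical(lg, "all") would be a stricter alternative, since it is FALSE for any vector other than the single string "all".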

gabrielgesteira and others added 5 commits September 25, 2020 17:45

Merge pull request #28 from mmollina/master

55dbb42

Updating from mmollina

Merge pull request #29 from mmollina/master

d55e080

update DESCRIPTION

Merge pull request #30 from mmollina/master

13249ab

Updating from mmollina

Merge branch 'master' of github.com:gabrielgesteira/MAPpoly

befe072

mmollina merged commit fe13bdf into mmollina:master Sep 28, 2020
