hdf5 bootstrap representation #44
Comments
That's cool, but if you do

hdf5 <- paste0(path.expand(path), "/", h5file)

and, if bootstraps are found, summarize them...

if (bootstraps > 0) {

it's pretty easy to handle them the way they are.

Also, if you plot a few of the bootstraps, their error typically (not

On Tue, May 19, 2015 at 3:33 PM, Martin Morgan wrote:
Hi guys,

Thanks for the input.

@mtmorgan I don't see a super compelling reason (other than it being more visually appealing) to use the left-padded numbers, and as @ttriche this is why we store

When we were developing kallisto, I did some non-comprehensive testing, and there was little to no difference between storing them in vectors and storing them in a huge matrix and reading that into R. The existing C code is much simpler as currently written.

Which direction do you want to subset the HDF5 in (e.g. by transcript or by bootstrap round)?

BTW, I'm actually on vacation right now, but when I return next week I'll post the code we have for reading bootstraps and aggregating kallisto results (this is all part of the
I guess I'm just scared by having my chromosomes sort chr1, chr10, instead of chr1, chr2, ... One can easily imagine thinking that the results of the third bootstrap were interesting, but then confusing bs2 with bs3 with bs10, all of which are 'third' in some sense (if the main consumer of these files is envisioned to be R, then it would also make sense to number from 1 rather than 0).

I think one would often access by bootstrap for some purposes (e.g., normalization-like activities) and by transcript for others (e.g., drilling down on 'interesting' results); this two-dimensional access and the homogeneity of data type make one think of a matrix rather than a list-of-vectors. It also seems less error-prone to rely on existing software to subset an hdf5 matrix in hdf5 than to come up with ad hoc solutions (like here) for subsetting a list-of-vectors in a client language. A matrix would also make padding bootstrap ids moot.

Obviously these are not earth-shaking features, so please ignore if you like...
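To make the sort-order concern concrete, here is a minimal base R sketch. The names bs0 to bs11 follow the naming discussed in this thread; everything else (variable names, the five-digit pad width) is an assumption for illustration, not kallisto's own code.

```r
# Hypothetical bootstrap dataset names, unpadded, as discussed in this thread.
bs_names <- paste0("bs", 0:11)

# Plain character sorting puts bs10 and bs11 before bs2.
# method = "radix" sorts in the C locale, so the result does not
# depend on the local collation settings.
lexical <- sort(bs_names, method = "radix")
print(lexical)

# Option 1: recover the intended order from the numeric suffix.
suffix <- as.integer(sub("^bs", "", lexical))
natural <- lexical[order(suffix)]
print(natural)

# Option 2: left-pad the suffix so character order equals numeric order.
# The width of 5 is an arbitrary choice for this example.
padded <- sprintf("bs%05d", suffix)
padded_sorted <- sort(padded, method = "radix")
print(padded_sorted)
```

Option 1 works on files as they are written today; option 2 is what the padding proposal would give without any client-side handling.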
Bootstrap names left-padded with zeros (bs00001, etc.) have a natural sort order; currently these sort as bs0, bs1, bs10, bs11, ..., bs2, bs20, ...
/bootstrap as a 2-dimensional array would be more readily subset, especially by id. This would seem to be a common use case.
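As a rough illustration of the two access patterns, the sketch below assumes the bootstrap vectors have already been read into R as a named list (for example with an HDF5 reader package). The transcript ids and count values are made up, and this is only a client-side approximation of what a two-dimensional /bootstrap dataset would offer directly.

```r
# Hypothetical data: 3 bootstrap rounds of estimated counts for 4 transcripts.
transcripts <- c("tx1", "tx2", "tx3", "tx4")
bootstraps <- list(
  bs0 = c(10, 0, 5, 2),
  bs1 = c(12, 1, 4, 2),
  bs2 = c(9, 0, 6, 3)
)

# Bind the list of vectors into a transcripts x bootstraps matrix.
# The list names become the column names.
boot_mat <- do.call(cbind, bootstraps)
rownames(boot_mat) <- transcripts

# Access by bootstrap round (one column) ...
by_bootstrap <- boot_mat[, "bs1"]

# ... or by transcript (one row) using the same object.
by_transcript <- boot_mat["tx3", ]

# Per-transcript summary across all bootstrap rounds.
tx_mean <- rowMeans(boot_mat)
print(boot_mat)
```

With a list-of-vectors, the by-transcript lookup needs a loop or sapply over every bootstrap; with a matrix both directions are a single index operation.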