Performance improvement: record-batching #779
Merged
This is the third performance PR, following #765 and #774.
This affects all Miller-supported file formats. Some performance numbers for CSV, on a Mac laptop:
- `cat`: 2.423 seconds
- `cat` before this PR: 4.175 seconds
- `cat` with this PR: 3.106 seconds
- `cat` with this PR and `mlr -S`: 1.873 seconds

Note that the last number is the runtime we'll have after the WIP JIT branch is PR'ed and merged. At that point, single-verb performance will be noticeably better than Miller 5. Furthermore, for multi-verb then-chains, the performance improvement will be even more noticeable, since Miller <= 5 (in C) is single-core whereas Miller 6 (in Go) is multicore.
See also the new files in the `scripts` directory on this PR, which help with measurements of this sort.
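For readers unfamiliar with the idea, here is a minimal sketch of record-batching between goroutines. It is not Miller's actual code: `Record`, `readBatches`, `catVerb`, `countRecords`, and the batch size of 500 are all hypothetical names and values chosen for illustration. The point it tries to show is that sending slices of records over a channel pays the channel synchronization cost once per batch instead of once per record, while each stage of a chain still runs in its own goroutine.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for one parsed input record.
type Record map[string]string

// readBatches groups records into slices of up to batchSize and sends each
// slice over the channel, so the channel send happens once per batch rather
// than once per record.
func readBatches(records []Record, batchSize int) <-chan []Record {
	if batchSize < 1 {
		batchSize = 1 // guard against a non-advancing loop
	}
	out := make(chan []Record, 4)
	go func() {
		defer close(out)
		for start := 0; start < len(records); start += batchSize {
			end := start + batchSize
			if end > len(records) {
				end = len(records)
			}
			out <- records[start:end]
		}
	}()
	return out
}

// catVerb forwards every batch unchanged, like a pass-through verb in a chain.
func catVerb(in <-chan []Record) <-chan []Record {
	out := make(chan []Record, 4)
	go func() {
		defer close(out)
		for batch := range in {
			out <- batch
		}
	}()
	return out
}

// countRecords drains the channel and reports how many records and batches arrived.
func countRecords(in <-chan []Record) (records, batches int) {
	for batch := range in {
		records += len(batch)
		batches++
	}
	return records, batches
}

func main() {
	input := make([]Record, 1200)
	for i := range input {
		input[i] = Record{"i": fmt.Sprint(i)}
	}
	n, b := countRecords(catVerb(readBatches(input, 500)))
	fmt.Println(n, b) // prints "1200 3"
}
```

With 1200 records and a batch size of 500, the reader performs three channel sends instead of 1200. A real implementation would have to tune the batch size against memory use and latency.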