Constructor interpartition_merges & fixes/small improvements #13

Merged
krynju merged 14 commits into main from kr/constructor_experiments_1
Nov 5, 2022

krynju
Member

@krynju krynju commented Oct 2, 2022

WIP: trying out some things on constructors to make them faster/more reliable

Findings/features

  1. materializing the input before ingesting it avoids the large allocation counts that scale with input length (4.92 M allocations down to 8.09 k in the example below)
# pre change
julia> @time DTable(CSV.File("test.csv"), 12000)
  0.141167 seconds (4.92 M allocations: 123.415 MiB)
DTable with 84 partitions
Tabletype: NamedTuple

# post change
julia> @time DTable(CSV.File("test.csv"), 12000)
  0.021110 seconds (8.09 k allocations: 48.421 MiB)
DTable with 84 partitions
Tabletype: NamedTuple
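  2. interpartition merges: by default the constructor merges rows across input chunk boundaries, so every partition except possibly the last has exactly chunksize rows; passing interpartition_merges=false keeps the partitions of each input chunk separate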
julia> DTable(CSV.Chunks("test.csv"), 70_000) |> DTables.chunk_lengths
15-element Vector{Int64}:
 70000
 70000
 70000
 70000
 70000
 70000
 70000
 70000
 70000
 70000
 70000
 70000
 70000
 70000
 20000

julia> DTable(CSV.Chunks("test.csv"), 70_000, interpartition_merges=false) |> DTables.chunk_lengths
16-element Vector{Int64}:
 62495
 62502
 62505
 62493
 62505
 62501
 62492
 62507
 62495
 62502
 62505
 62493
 62505
 62501
 62492
 62507

julia> DTable(CSV.Chunks("test.csv"), 50_000) |> DTables.chunk_lengths
20-element Vector{Int64}:
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000
 50000

julia> DTable(CSV.Chunks("test.csv"), 50_000, interpartition_merges=false) |> DTables.chunk_lengths
32-element Vector{Int64}:
 50000
 12495
 50000
 12502
 50000
 12505
 50000
 12493
 50000
 12505
 50000
 12501
 ⋮
 12505
 50000
 12493
 50000
 12505
 50000
 12501
 50000
 12492
 50000
 12507
  3. getting columns through properties works now and returns a lazy DTableColumn iterator
    It can be used to iterate through the column efficiently, and it has a specialized collect method that uses the Tables.getcolumn implementation to link all column pieces into a SentinelArrays.ChainedVector
julia> d.a |> collect
100000-element SentinelArrays.ChainedVector{Int64, Vector{Int64}}:
 1
 2
 0
 1
 2
 0
 1
 ⋮
 1
 2
 0
 1
 2
 0
 1

julia> d.a |> sum
100000
  4. docs updates
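
To make the interpartition merge behaviour above easier to reason about, here is a minimal sketch that models only the partition lengths, not the actual DTables.jl implementation. The helper name partition_lengths is hypothetical, and the input lengths are assumed to be the raw CSV.Chunks sizes, taken from the interpartition_merges=false run at chunksize 70_000 (where no input chunk is large enough to be split).

```julia
# Hypothetical helper (not part of DTables.jl): computes the partition lengths
# that result from ingesting inputs of the given lengths at a target chunksize.
function partition_lengths(input_lengths::Vector{Int}, chunksize::Int;
                           interpartition_merges::Bool=true)
    out = Int[]
    if interpartition_merges
        carry = 0  # rows carried over from earlier inputs, waiting to be topped up
        for n in input_lengths
            carry += n
            while carry >= chunksize
                push!(out, chunksize)
                carry -= chunksize
            end
        end
        carry > 0 && push!(out, carry)  # the last partition may be short
    else
        # Each input is split on its own; its remainder becomes a short partition.
        for n in input_lengths
            full, rest = divrem(n, chunksize)
            append!(out, fill(chunksize, full))
            rest > 0 && push!(out, rest)
        end
    end
    return out
end

# Assumed raw chunk lengths, copied from the unmerged 70_000 run above.
inputs = repeat([62495, 62502, 62505, 62493, 62505, 62501, 62492, 62507], 2)

merged_70k   = partition_lengths(inputs, 70_000)
unmerged_50k = partition_lengths(inputs, 50_000; interpartition_merges=false)
```

Under these assumptions the model reproduces the lengths shown in the transcripts: fourteen partitions of 70000 followed by one of 20000 when merging, and alternating 50000 / remainder partitions when merging is disabled at chunksize 50_000.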
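
The difference between iterating a column lazily and collecting it can be illustrated with plain Julia. This is only a stand-in under stated assumptions: Iterators.flatten plays the role of the lazy DTableColumn iterator, and the small vectors in parts are made-up partition pieces, not data from test.csv.

```julia
# Made-up column pieces, one vector per partition (illustrative data only).
parts = [[1, 2, 0], [1, 2, 0], [1]]

# Lazy view over all pieces; nothing is copied until it is consumed.
# Stand-in for the lazy column iterator, not the DTableColumn type itself.
lazy_col = Iterators.flatten(parts)

# Reductions can consume the lazy iterator directly, piece by piece.
total = sum(lazy_col)

# Collecting materializes the whole column as a single vector.
full_col = collect(lazy_col)
```

In the PR itself, collect on a DTableColumn goes through Tables.getcolumn and returns a ChainedVector that links the pieces rather than copying them into one flat Vector as this stand-in does.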

@codecov-commenter

codecov-commenter commented Oct 2, 2022

Codecov Report

Base: 90.25% // Head: 92.49% // Increases project coverage by +2.23% 🎉

Coverage data is based on head (9472c23) compared to base (dea0da6).
Patch coverage: 96.92% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #13      +/-   ##
==========================================
+ Coverage   90.25%   92.49%   +2.23%     
==========================================
  Files          10       10              
  Lines         852      879      +27     
==========================================
+ Hits          769      813      +44     
+ Misses         83       66      -17     
| Impacted Files | Coverage Δ |
|---|---|
| src/table/dtable_column.jl | 97.50% <66.66%> (-2.50%) ⬇️ |
| src/table/dtable.jl | 93.27% <98.07%> (+9.23%) ⬆️ |
| src/table/dataframes_interface_utils.jl | 92.72% <100.00%> (ø) |
| src/table/gdtable.jl | 76.27% <100.00%> (ø) |
| src/table/operations.jl | 98.48% <100.00%> (ø) |
| src/table/tables.jl | 100.00% <100.00%> (+11.95%) ⬆️ |


@krynju krynju force-pushed the kr/constructor_experiments_1 branch from 476c339 to 528d152 Compare October 8, 2022 10:06
@krynju krynju changed the title [WIP] Experimenting with constructor improvements Constructor interpartition_merges & fixes/small improvements Oct 9, 2022
@krynju krynju marked this pull request as ready for review October 9, 2022 18:01
@krynju krynju enabled auto-merge (squash) November 5, 2022 12:54
@krynju krynju disabled auto-merge November 5, 2022 12:54
@krynju krynju merged commit 91576e1 into main Nov 5, 2022
@krynju krynju deleted the kr/constructor_experiments_1 branch November 5, 2022 12:54