Insights: apache/datafusion
52 Pull requests merged by 21 people
-
Fix Bug in Display for ScalarValue::Struct
#12856 merged
Oct 10, 2024 -
Fix panic on wrong number of arguments to substr
#12837 merged
Oct 10, 2024 -
Fix clippy error on wasmtest
#12844 merged
Oct 10, 2024 -
Fix: approx_percentile_cont_with_weight Panic
#12823 merged
Oct 10, 2024 -
Improve description of function migration
#12743 merged
Oct 9, 2024 -
Fix convert_to_state bug in `GroupsAccumulatorAdapter`
#12834 merged
Oct 9, 2024 -
Make HashJoinExec::join_schema public
#12807 merged
Oct 9, 2024 -
Support creating tables via SQL with `FixedSizeList` column (e.g. `a int[3]`)
#12810 merged
Oct 9, 2024 -
Retry apt-get and rustup on CI
#12714 merged
Oct 9, 2024 -
[logical-types] use Scalar in Expr::Logical
#12793 merged
Oct 9, 2024 -
Add `PartitionEvaluatorArgs` to `WindowUDFImpl::partition_evaluator`
#12804 merged
Oct 9, 2024 -
Add Aggregation fuzzer framework
#12667 merged
Oct 9, 2024 -
Remove unused dependencies and features
#12808 merged
Oct 9, 2024 -
Bump cookie and express in /datafusion/wasmtest/datafusion-wasm-app
#12825 merged
Oct 9, 2024 -
Clarify documentation on ArrowBytesMap and ArrowBytesViewMap
#12789 merged
Oct 9, 2024 -
Chore: Move `aggregate statistics` optimizer test from core to optimizer crate
#12783 merged
Oct 9, 2024 -
Minor: add documentation note about `NullState`
#12791 merged
Oct 9, 2024 -
[logical-types] fix conflicts
#12820 merged
Oct 9, 2024 -
API from `ParquetExec` to `ParquetExecBuilder`
#12799 merged
Oct 8, 2024 -
Remove unnecessary `DFSchema::check_ambiguous_name`
#12805 merged
Oct 8, 2024 -
Refactor `DependencyMap` and `Dependencies` into structs
#12761 merged
Oct 8, 2024 -
Minor: clean up TODO comments in unnest.slt
#12795 merged
Oct 8, 2024 -
Fix bug in TopK aggregates
#12766 merged
Oct 8, 2024 -
[logical-types] update working branch
#12812 merged
Oct 8, 2024 -
Remove redundant aggregate/window/scalar function documentation
#12745 merged
Oct 8, 2024 -
Minor: add README to Catalog Folder
#12797 merged
Oct 8, 2024 -
Add union sorting equivalence end to end tests
#12721 merged
Oct 8, 2024 -
Minor: improve docs on MovingMin/MovingMax
#12790 merged
Oct 8, 2024 -
Introduce Signature::String and return error if input of `strpos` is integer
#12751 merged
Oct 8, 2024 -
Minor: clarify comment about empty dependencies
#12786 merged
Oct 8, 2024 -
Account for constant equivalence properties in union, tests
#12562 merged
Oct 7, 2024 -
Migrate documentation for all string functions from scalar_functions.md to code
#12775 merged
Oct 7, 2024 -
fix: Correct results for grouping sets when columns contain nulls
#12571 merged
Oct 7, 2024 -
Transformed::new_transformed: Fix documentation formatting
#12787 merged
Oct 7, 2024 -
feat: add support for Substrait ExtendedExpression
#12728 merged
Oct 7, 2024 -
Upgrade arrow/parquet to `53.1.0` / fix clippy
#12724 merged
Oct 7, 2024 -
Port / Add Documentation for `VarianceSample` and `VariancePopulation`
#12742 merged
Oct 7, 2024 -
Fix stack overflow calculating projected orderings
#12759 merged
Oct 6, 2024 -
Improve `round` scalar function unparsing for Postgres
#12744 merged
Oct 6, 2024 -
Fix unnest conjunction with selecting wildcard expression
#12760 merged
Oct 6, 2024 -
Allow boolean Expr simplification even when nullable
#12746 merged
Oct 6, 2024 -
Fix `equal_to` in `ByteGroupValueBuilder`
#12770 merged
Oct 6, 2024 -
Minor: doc how field name is to be set for `WindowUDF`
#12757 merged
Oct 5, 2024 -
fix `equal_to` in `PrimitiveGroupValueBuilder`
#12758 merged
Oct 5, 2024 -
Add `DocumentationBuilder::with_standard_argument` to reduce copy/paste
#12747 merged
Oct 5, 2024 -
Apply `type_union_resolution` to array and values
#12753 merged
Oct 5, 2024 -
Minor: Update string tests for strpos
#12739 merged
Oct 4, 2024 -
Provide field and schema metadata missing on cross joins, and union with null fields.
#12729 merged
Oct 4, 2024 -
Fix misformatted links on project index page
#12750 merged
Oct 4, 2024 -
Simplify streaming_merge function parameters
#12719 merged
Oct 4, 2024 -
Minor: avoid clone while calculating union equivalence properties
#12722 merged
Oct 4, 2024 -
Add IMDB(JOB) Benchmark [2/N] (imdb queries)
#12529 merged
Oct 3, 2024
30 Pull requests opened by 20 people
-
feat: support inner iejoin
#12754 opened
Oct 4, 2024 -
WIP: move SMJ join filtered part out of join_output stage. LeftOuter experiment
#12764 opened
Oct 4, 2024 -
Move equivalence fuzz testing to fuzz test binary
#12767 opened
Oct 4, 2024 -
Support DictionaryString for Regex matching operators
#12768 opened
Oct 4, 2024 -
Parquet binary
#12777 opened
Oct 6, 2024 -
Wordsmith project description
#12778 opened
Oct 6, 2024 -
Dynamic filter pushdown to probe side
#12781 opened
Oct 7, 2024 -
Update substrait requirement from 0.42 to 0.44
#12782 opened
Oct 7, 2024 -
Implement special min/max accumulator for Strings and Binary (10% faster for Clickbench Q28)
#12792 opened
Oct 7, 2024 -
fix(substrait): remove optimize calls from substrait consumer
#12800 opened
Oct 7, 2024 -
[WIP] Impl byte view column
#12809 opened
Oct 8, 2024 -
Fix: handle NULL input in lead/lag window function
#12811 opened
Oct 8, 2024 -
Introduce `binary_as_string` parquet option
#12816 opened
Oct 8, 2024 -
WIP: Generate docs from macros.
#12822 opened
Oct 8, 2024 -
feat(substrait): add intersect support to consumer
#12830 opened
Oct 9, 2024 -
Improve AggregationFuzzer error reporting
#12832 opened
Oct 9, 2024 -
Cleanup TODO in recursive unnest
#12836 opened
Oct 9, 2024 -
Minor: Small comment changes in sql folder
#12838 opened
Oct 9, 2024 -
Support struct coercion in `type_union_resolution`
#12839 opened
Oct 10, 2024 -
Crypto Function Migration
#12840 opened
Oct 10, 2024 -
Add DuckDB struct test and row as alias
#12841 opened
Oct 10, 2024 -
TEST for allocation strategy in min/max accumulator
#12845 opened
Oct 10, 2024 -
Macro for creating record batch from literal slice
#12846 opened
Oct 10, 2024 -
Improve `AggregateFuzz` testing
#12847 opened
Oct 10, 2024 -
Minor: more doc to `MemoryPool` module
#12849 opened
Oct 10, 2024 -
Make PruningPredicate's rewrite public
#12850 opened
Oct 10, 2024 -
Fix: handle NULL offset of NTH_VALUE window function
#12851 opened
Oct 10, 2024 -
[logical-types] add NativeType and LogicalType
#12853 opened
Oct 10, 2024 -
Migrate documentation for all core functions from scalar_functions.md to code
#12854 opened
Oct 10, 2024 -
wip: Convert `BuiltInWindowFunction::{Lead, Lag}` to a user defined window function
#12857 opened
Oct 10, 2024
31 Issues closed by 8 people
-
Bug in `Display` for `ScalarValue::Struct`
#12855 closed
Oct 10, 2024 -
Stack overflow with LEAD and LAG functions
#12731 closed
Oct 10, 2024 -
Panic when an invalid expression in GROUP BY clause (SQLancer)
#12699 closed
Oct 10, 2024 -
CI: Clippy error on `datafusion-wasmtest`
#12842 closed
Oct 10, 2024 -
Panic in scalar function `approx_percentile_cont_with_weight` (SQLancer)
#12716 closed
Oct 10, 2024 -
Physical optimizers cannot rewrite `HashJoinExec` due to `join_schema` being private
#12806 closed
Oct 9, 2024 -
Create fixed size list table with syntax <type name>[<length>]
#10303 closed
Oct 9, 2024 -
`Setup rust toolchain` build step is flaky
#12713 closed
Oct 9, 2024 -
Add `PartitionEvaluatorArgs` to `WindowUDFImpl::partition_evaluator`
#12803 closed
Oct 9, 2024 -
Change `MaybeNullBufferBuilder` into `Option<usize>` in `GroupColumn` code
#12826 closed
Oct 9, 2024 -
The Eq method in HashAggregate takes up a lot of time, how to optimize it
#1456 closed
Oct 8, 2024 -
Easier way to convert between `ParquetExec` and `ParquetExecBuilder`
#12737 closed
Oct 8, 2024 -
Probable bug in TopKAggregate
#12748 closed
Oct 8, 2024 -
Add README to catalog folder
#12796 closed
Oct 8, 2024 -
Add `groupby` to dataframe API
#12696 closed
Oct 8, 2024 -
SanityChecker rejects certain valid `UNION` plans
#12446 closed
Oct 8, 2024 -
Migrate documentation for all string functions from scalar_functions.md to code
#12774 closed
Oct 7, 2024 -
Incorrect results when using grouping sets with data containing nulls
#12570 closed
Oct 7, 2024 -
Arrow update to 53.1.0 caused clippy issues with missing and deprecated methods and test failures
#12780 closed
Oct 7, 2024 -
Bug with csv type inference
#3174 closed
Oct 7, 2024 -
Casting from Binary --> Utf8 to evaluate `LIKE` slows down some ClickBench queries
#12509 closed
Oct 7, 2024 -
Stack overflow calling `EqvalenceProperties::project` with certain order
#12700 closed
Oct 6, 2024 -
Add insert_or_update and get_payloads methods to binary_map/binary_view_map
#12594 closed
Oct 6, 2024 -
unnest errors in conjunction with `SELECT *`
#12684 closed
Oct 6, 2024 -
Improve boolean simplifications for nullable fields
#12769 closed
Oct 6, 2024 -
Doc how "name" for returned field is to be set
#12756 closed
Oct 5, 2024 -
Filter pushed down too far
#12762 closed
Oct 4, 2024 -
Field Metadata Lost on UNION
#12735 closed
Oct 4, 2024 -
Field Metadata Lost on CROSS JOIN
#12734 closed
Oct 4, 2024 -
Fix misformatted links on project index page
#12749 closed
Oct 4, 2024
25 Issues opened by 16 people
-
CSVReader behavior with dataset that has duplicate column headers is confusing
#12852 opened
Oct 10, 2024 -
Regression on coercing Array of Structs
#12843 opened
Oct 10, 2024 -
Migrate documentation for all crypto functions from scalar_functions.md to code
#12828 opened
Oct 9, 2024 -
Migrate documentation for aggregate functions from aggregate_functions.md to code
#12827 opened
Oct 9, 2024 -
Epic: Ordered Set Aggregate Functions
#12824 opened
Oct 9, 2024 -
[DISCUSSION] Make DataFusion the fastest engine for querying parquet data in ClickBench
#12821 opened
Oct 8, 2024 -
Access children `DataType` or return-type in `ScalarUDFImpl::invoke`
#12819 opened
Oct 8, 2024 -
Parse real number literals as the Decimal type
#12817 opened
Oct 8, 2024 -
Panic in `nth_value` window function (SQLancer)
#12815 opened
Oct 8, 2024 -
Panic in `simplify_expressions` optimizer rules when running an aggregate query (SQLancer)
#12814 opened
Oct 8, 2024 -
Release DataFusion 42.1.0
#12813 opened
Oct 8, 2024 -
Convert `BuiltInWindowFunction::{Lead, Lag}` to a user defined window function
#12802 opened
Oct 8, 2024 -
Migrate documentation for all core functions from scalar_functions.md to code
#12801 opened
Oct 8, 2024 -
Substrait: roundtrip_logical_plan shouldn't optimize plans
#12798 opened
Oct 7, 2024 -
Unnest struct expression can't be aliased
#12794 opened
Oct 7, 2024 -
Performance: Add "read strings as binary" option for parquet
#12788 opened
Oct 7, 2024 -
Looser coupling with tokio
#12784 opened
Oct 7, 2024 -
`datafusion-query-cache` - caching intermediate results for faster repeated queries
#12779 opened
Oct 6, 2024 -
Support setseed function
#12776 opened
Oct 6, 2024 -
`plan_to_sql` produces incorrect SQL for optimised `Aggregate` plans
#12773 opened
Oct 5, 2024 -
add support for reading csv with variable number of columns
#12772 opened
Oct 5, 2024 -
Implement special Groups for StringViews
#12771 opened
Oct 5, 2024 -
Optimize Parquet PageIndex by Reusing StringView Prefix
#12755 opened
Oct 4, 2024 -
String sqllogictest error when running the test with `complete`
#12752 opened
Oct 4, 2024
41 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Skip partial aggregation based on the cardinality of hash value instead of group values
#12697 commented on
Oct 7, 2024 • 17 new comments -
Convert `rank` / `dense_rank` and `percent_rank` builtin functions to UDWF
#12718 commented on
Oct 10, 2024 • 15 new comments -
feat(udf): POC faster min max accumulator
#12677 commented on
Oct 9, 2024 • 3 new comments -
Minor: add flags for temporary ddl
#12561 commented on
Oct 6, 2024 • 3 new comments -
Limit nested loop join record batch size
#12634 commented on
Oct 4, 2024 • 1 new comment -
Support DictionaryString for Regex matching operators
#12618 commented on
Oct 3, 2024 • 0 new comments -
Improve error message for invalid aggregate queries
#12006 commented on
Oct 10, 2024 • 0 new comments -
Implement fast min/max accumulator for binary / strings (now it uses the slower path)
#6906 commented on
Oct 10, 2024 • 0 new comments -
Enable `datafusion.execution.parquet.schema_force_string_view` by default
#11682 commented on
Oct 10, 2024 • 0 new comments -
Update `REGEXP_LIKE` scalar function to support Utf8View
#11910 commented on
Oct 10, 2024 • 0 new comments -
[EPIC] Decouple logical from physical types
#12622 commented on
Oct 10, 2024 • 0 new comments -
Simple Functions
#12635 commented on
Oct 10, 2024 • 0 new comments -
[EPIC] Schema metadata handling / bugs
#12733 commented on
Oct 10, 2024 • 0 new comments -
ASOF join support / Specialize Range Joins
#318 commented on
Oct 10, 2024 • 0 new comments -
Fix parse `'1'::interval` as month by default
#11454 commented on
Oct 8, 2024 • 0 new comments -
feat: Add MaxTime type for a Time that returns the max on aggregation
#11755 commented on
Oct 8, 2024 • 0 new comments -
Better multi-column aggregation support with StringView
#11794 commented on
Oct 6, 2024 • 0 new comments -
Add additional regexp function regexp_count()
#12080 commented on
Oct 10, 2024 • 0 new comments -
Enable reading `StringViewArray` by default from Parquet
#12092 commented on
Oct 9, 2024 • 0 new comments -
feat: Implement grouping function using grouping id
#12704 commented on
Oct 8, 2024 • 0 new comments -
Update hashbrown requirement from 0.14.5 to 0.15.0
#12710 commented on
Oct 9, 2024 • 0 new comments -
[Epic] Make DataFusion a reliable foundation for building query engines
#12723 commented on
Oct 4, 2024 • 0 new comments -
Range/inequality joins are slow
#8393 commented on
Oct 4, 2024 • 0 new comments -
Convert `BuiltInWindowFunction::CumeDist` to a user defined window function
#12695 commented on
Oct 7, 2024 • 0 new comments -
Any plan to support JSON or JSONB?
#7845 commented on
Oct 7, 2024 • 0 new comments -
[Epic] Complete Initial `StringView` in DataFusion
#11752 commented on
Oct 7, 2024 • 0 new comments -
Document DataFusion Threading (and how to separate IO and CPU bound work)
#12393 commented on
Oct 7, 2024 • 0 new comments -
Optimized version of `SortPreservingMerge` that doesn't actually compare sort keys of the key ranges are ordered
#10316 commented on
Oct 8, 2024 • 0 new comments -
Execution plan creation performance degradation oppose to 0.40.0
#12738 commented on
Oct 8, 2024 • 0 new comments -
Release DataFusion 43.0.0
#12470 commented on
Oct 8, 2024 • 0 new comments -
Running tests uses 50.1GB on Ubuntu
#11105 commented on
Oct 8, 2024 • 0 new comments -
[Epic] Unify `WindowFunction` Interface (remove built in list of `BuiltInWindowFunction`s)
#8709 commented on
Oct 8, 2024 • 0 new comments -
Incorrect NULL handling in `lead` window function (SQLancer)
#12717 commented on
Oct 8, 2024 • 0 new comments -
Implement physical optimizer rule for common subexpression elimination
#12599 commented on
Oct 8, 2024 • 0 new comments -
[Proposal] Decouple logical from physical types
#11513 commented on
Oct 9, 2024 • 0 new comments -
Row groups are read out of order or with completely different values
#10572 commented on
Oct 9, 2024 • 0 new comments -
[EPIC] Automatically generate all function content from code
#12740 commented on
Oct 9, 2024 • 0 new comments -
[EPIC] Improvements to GroupColumn multi-column aggregation performance
#12680 commented on
Oct 9, 2024 • 0 new comments -
Aggregation fuzz testing
#12114 commented on
Oct 9, 2024 • 0 new comments -
Optimize `lower()/upper()` string function with ASCII fast path
#12365 commented on
Oct 9, 2024 • 0 new comments -
Unify the error handling for the RecordBatchStream
#12641 commented on
Oct 10, 2024 • 0 new comments