Tuple.collectors() should be able to share state #344

lukaseder · 2019-02-12T08:35:31Z

The current Tuple.collectors() implementation is a canonical one, where collectors are completely independent of one another. It would be better if they could share state in case they're related. For example, when calculating a set of percentiles:

var percentiles =
Stream.of(1, 2, 3, 4, 10, 9, 3, 3).collect(
  Tuple.collectors(
    Agg.<Integer>percentile(0.0),
    Agg.<Integer>percentile(0.25),
    Agg.<Integer>percentile(0.5),
    Agg.<Integer>percentile(0.75),
    Agg.<Integer>percentile(1.0)
  )
);
 
System.out.println(percentiles);

It would be great if the 5 collectors could somehow share their internal state, which is a sorted list of all values in the stream. Executing 1 sort instead of 5 is definitely preferrable.

This can be implemented in 2 ways:

Optimising for specific collector usage
Optimising this in general

The text was updated successfully, but these errors were encountered:

rwperrott · 2021-01-13T11:49:37Z

BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R1> sf1

could be declared as a sub-class of BiFunction, like:

@FunctionalInterface
public interface Tuple2ListMapping<T,U,R> 
extends BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R> {}

allowing much shorter column declarations, like

Tuple2ListMapping<T,U,R> sf1

I don't think that column source sharing needs to be complex, because id binding is simpler and is probably safe than introspection guesses:

Specify each result column with a String id.
The first result column with a unique id, causes a base collector to be created, wrapped in a collector which stores the base collector finisher result, andThen returns the first column result, with this wrapper collector mapped by id inside a context object, possibly Seq. This could allow a separate map for Window, assuming it has a subset Seq, so prevent out-of-context id use.
All the later result columns with a used id would have secondary collectors created by the primary collector, using the primary saved base collector finisher result as input for it's finisher function.
If other than a list type collector needs to be supported (e.g. for non-percentile statistics), base collector result type validation would be needed to be before secondary collector creation, with all id columns specifying the required type.

Something like this may work (not tested yet):

public class IdCollector<T,A,R,RR> implements Collector<T,A,RR> {
    final Collector<T,A,R> baseCollector;
    final Function<R,RR> primaryFinisher;
    private R r; // baseCollector.finisher().apply(a) result.

    public IdCollector(final Collector<T, A, R> baseCollector, final Function<R, RR> primaryFinisher) {
        this.baseCollector= baseCollector;
        this.primaryFinisher= primaryFinisher;
    }
 
    // Assumes that created secondary collector finisher() will always be called after primary collector finisher()
    public Collector<T,A,RR> secondary(final Function<R,RR> secondaryFinisher) {
        return Collector.of(
                ()->null,
                (a,t) -> {},
                (a1,a2) -> null,
                a -> secondaryFinisher.apply(r));
    }

    @Override
    public Supplier<A> supplier() {
        return baseCollector.supplier();
    }

    @Override
    public BiConsumer<A, T> accumulator() {
        return baseCollector.accumulator();
    }

    @Override
    public BinaryOperator<A> combiner() {
        return baseCollector.combiner();
    }

    @Override
    public Function<A, RR> finisher() {
        return baseCollector.finisher()
                        .andThen(r -> primaryFinisher.apply(this.r = r));
    }

    @Override
    public Set<Characteristics> characteristics() {
        return baseCollector.characteristics();
    }
}

rwperrott · 2021-01-24T11:12:11Z

For *By columns this would be most efficient if the T to U keyExtractor function and keyComparator are both reference equal for all columns, to allow sharing a collected, the pre-sorted List, sorted by the same U values, otherwise the list would need to be copied and resorted for each column.

It looks more memory efficient to base the *By functionality off a list of the T value, then using keyComparator with the keyExtractor e.g via this method in Comparator (Java 1.8+)

    public static <T, U> Comparator<T> comparing(
            Function<? super T, ? extends U> keyExtractor,
            Comparator<? super U> keyComparator)

Likewise if the List collector based sourced columns Comparator must be reference equal for all columns, otherwise the list would need to be copied and resorted for each column.

rwperrott · 2021-01-28T15:46:21Z

See my latest Sharable Collector code, with support for reference and mapped id based results sharing at https://github.com/rwperrott/sharable-collector

lukaseder · 2021-01-28T16:36:11Z

@rwperrott I appreciate your efforts, thank you very much. To make sure expectations are set right, I currently cannot invest much time in jOOL. This includes reviewing your various messages and code (as this is by no means a trivial task).

At some point in the future, I'll be able to allocate some dedicate time to see what's currently open and missing, but I cannot make any promises.

rwperrott · 2021-01-28T17:56:20Z

@lukaseder Thanks. I expected you were busy. given prior feedback. I also developed this speculative solution because it's a fun exercise and I think it'll be useful for my projects too.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tuple.collectors() should be able to share state #344

Tuple.collectors() should be able to share state #344

lukaseder commented Feb 12, 2019

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

rwperrott commented Jan 13, 2021 •

edited

Loading

rwperrott commented Jan 24, 2021 •

edited

Loading

rwperrott commented Jan 28, 2021 •

edited

Loading

lukaseder commented Jan 28, 2021

rwperrott commented Jan 28, 2021

Tuple.collectors() should be able to share state #344

Tuple.collectors() should be able to share state #344

Comments

lukaseder commented Feb 12, 2019

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

rwperrott commented Jan 13, 2021 • edited Loading

rwperrott commented Jan 24, 2021 • edited Loading

rwperrott commented Jan 28, 2021 • edited Loading

lukaseder commented Jan 28, 2021

rwperrott commented Jan 28, 2021

rwperrott commented Jan 13, 2021 •

edited

Loading

rwperrott commented Jan 24, 2021 •

edited

Loading

rwperrott commented Jan 28, 2021 •

edited

Loading