Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fold / reduce missing #33

Open
tnovotny opened this issue Oct 22, 2019 · 17 comments
Open

fold / reduce missing #33

tnovotny opened this issue Oct 22, 2019 · 17 comments

Comments

@tnovotny
Copy link

Maybe I missed some way to combine things, but to me it looks as if something like fold/reduce is missing.

@joboccara
Copy link
Owner

That's an interesting question.
Can you see a syntax you'd like to see working?

@tnovotny
Copy link
Author

tnovotny commented Oct 22, 2019

Hmm, not sure. I initially thought as an end pipe, but that makes a messier API as you need to pass the operation and the destination. Maybe a better approach would be to pass the intermediate results on and store them in some variable assigning end pipe (named to in the following pseudocode).

auto const input = std::vector<std::string>{"1", "2", "3"};
int result;

input >>= pipes::transform(std::stoi)
      >>= pipes::reduce(std::plus)
      >>= pipes::to(result);

@joboccara
Copy link
Owner

joboccara commented Oct 23, 2019

I see. Don't you find it natural that the accumulated value and the operation are passed together?

auto const input = std::vector<std::string>{"1", "2", "3"};
int result = 0;

input >>= pipes::transform(std::stoi)
      >>= pipes::reduce(result, std::plus);

@tnovotny
Copy link
Author

tnovotny commented Oct 23, 2019

TL;DR that would be fine

In my initial sketch (not given) it looked messier becaue I had used an in situ lambda, but looking at it now in a cleaner version the end pipe does seem more appealing. I'm a little more partial to reduce than to fold, but I am ok with either choice as long as it's not accumulate which has too much association with summation IMHO.

I don't know whether there is any use for it, but the separated version would have the additional flexibility to pass the stream on

 ...
 >>= pipes::reduce(std::plus)
 >>= pipes::demux(pipes::to(result),...);

@godefv
Copy link

godefv commented Oct 5, 2020

I suggest to return by value :

auto const input = std::vector<std::string>{"1", "2", "3"};
auto const result = (
    input 
    >>= pipes::transform(std::stoi)
    >>= pipes::reduce(std::plus)
);
  1. It makes it possible to use const in the result.
  2. It makes it possible to use auto in the result.
  3. In case of a more complicated object, it calls the constructor only once (return value optimization).

Also, you could construct a vector of results directly :

auto const input = std::vector<std::string>{"1", "2", "3"};
auto const result = (
    input 
    >>= pipes::transform(std::stoi)
    >>= pipes::to_vector
);

Here, you "reduce" the pipe to a vector.

@godefv
Copy link

godefv commented Oct 5, 2020

Common reduction operators should probably have a name like in python : count all any sum

@joboccara
Copy link
Owner

Returning by value is the most appealing syntax, but each time I've thought about implementing it, I'm facing the issue that a pipe can have multiple outputs. For example, if the pipeline has demux and only one branch ends with reduce, what should the overall expression return?

@godefv
Copy link

godefv commented Oct 6, 2020

That's a good point. The first thing I can think of is to return a std::tie of results, so that structured bindings and std::ignore can be used.

The use of pipes::to_vector would almost solve the issue.

auto const [B, C, D, E]=(
  A 
  >>= pipes::transform(f)
  >>= pipes::filter(p)
  >>= pipes::unzip(
    pipes::to_vector,
    pipes::fork(
      pipes::to_vector,
      pipes::filter(q) >>= pipes::to_vector,
      pipes::filter(r) >>= pipes::to_vector
    )
  )
);

But there are still pipes with no return value such as for_each or dev_null.
One way would be that such pipes return some empty type and should be std::ignored in a muxed pipe.

auto const [B, C, std::ignore, E]=(
  A 
  >>= pipes::transform(f)
  >>= pipes::filter(p)
  >>= pipes::unzip(
    pipes::to_vector,
    pipes::fork(
      pipes::to_vector,
      pipes::filter(q) >>= pipes::for_each(print),
      pipes::filter(r) >>= pipes::to_vector
    )
  )
);

Another way would be to return a std::tie of returned values only.

auto const [B, C, E]=(
  A 
  >>= pipes::transform(f)
  >>= pipes::filter(p)
  >>= pipes::unzip(
    pipes::to_vector,
    pipes::fork(
      pipes::to_vector,
      pipes::filter(q) >>= pipes::for_each(print),
      pipes::filter(r) >>= pipes::to_vector
    )
  )
);

@joboccara
Copy link
Owner

joboccara commented Oct 6, 2020

This looks like an interesting solution. pipes::reduce would then return the reduced value, potentially in a tuple too? And what if there is only one way out, in a linear pipeline, should we return a value and not a tuple?
And we'd need a bit of user input to determine if this syntax is overall more convenient than pipes::push_back.

@godefv
Copy link

godefv commented Oct 6, 2020

Regarding pipes::push_back, you can keep it even if you have pipes::to_vector. It allows to push_back to an existing vector which might not be empty, or which might already have allocated memory. Like pipes::for_each, pipes::push_back would return no value. So, the user can choose.

From the user point of view, I think that single pipes are expected to return a value, not a tuple.
But from the implementer point of view, it seems that either everything have to return a tuple, so that pipes like fork can flatten the outputs to a single tuple, or each pipe like fork would nest its results in a new tuple, resulting in potentially nested tuples. Because values can be tuples, and because of the push model, a fork needs to know that every pipe returns a flattened tuple of results to be able to flatten its own results to a new flattened tuple.
C++ does not allow nested structured bindings, so I would prefer avoiding nested tuples.
Always returning a tuple would also provide an obvious return value for non-returning pipes : the empty tuple.
It is also more consistent across all kinds of pipes.

Is it too weird to write auto const [x]=... instead of auto const x=...` ?
What do you think ?

@fried-water
Copy link
Contributor

Ya I was just thinking of a mechanism for pipe expressions to return a value (in #58). Fold/Reduce and all its derivatives seems like an obvious use case for it. Until the semantics for non-SISO pipes is decided we could always implement returning a value for SISO pipes and leave fork and similar pipes to return void.

As for the semantics for fork() I think returning a tuple by value with some special void type to represent terminal pipes that don't have a return value (for_each/push_back). I don't think its worth flattening nested tuples, realistically nested forks are probably going to be relatively rare and therefore making the semantics more complicated to accommodate them doesn't make sense. Although if there are some common use cases it might be worth putting more thought. Returning a tie seems like it may lead to dangling reference issues.

@godefv
Copy link

godefv commented Oct 13, 2020

After thinking about it, I think that the single output case is by far the most common, and should return a simple value, not a tuple.

The behaviour for fork could probably be decided later.

@godefv
Copy link

godefv commented Oct 13, 2020

Actually, the type of the TailPipelines should have enough information to know in advance which tails return a value and which tails do not. So, flattening everything and/or removing voids should be possible, if wanted.

@godefv
Copy link

godefv commented Oct 16, 2020

As you can see, I have made a first implementation of pipes::to_<Container> which is compatibe with pipes::fork.
In this implementation, a forked pipeline returns a tuple, containing std::ignore for non returning paths.

A common forked pipeline with returning tails and non returning tails is a tee, with one returning path and one debugging or logging path.

auto const [dummy,y]=(
  x 
  >>= pipes::transform(f)
  >>= pipes::fork(
    pipes::to_out_stream(debug_stream),
    pipes::transform(g) >>= pipes::to_<std::vector>
  )
)

In this example, it might seem better to return a single value to avoid the dummy.

However, always mirroring the pipeline structure in the return type makes it easier to replace parts of the pipeline, even dynamically.

auto get_debug_pipeline=[&debug_enabled](){
  // dynamically choose this pipeline
  if(!debug_enabled) return pipes::dev_null;

  // or this one
  return pipes::transform(get_debug_info) >>= pipes::to_<std::vector<debug_info_t>>;

  // you can replace previous line with this one without breaking client code
  // return pipes::transform(get_debug_info) >>= pipes::to_out_stream(debug_stream);
};

auto cont [debug,y]=(
  x 
  >>= pipes::transform(f)
  >>= pipes::fork(get_debug_pipeline(),
    >>= pipes::transform(g)
    >>= pipes::to_<std::vector<int>>
  )
)

handle_debug_info(debug); //overload this function to handle different types of debug info

To conclude, I believe that it is the correct choice to return something even for non-returning pipelines (returning a custom type instead of std::ignore would probably be better though).

@godefv
Copy link

godefv commented Oct 16, 2020

About nested tuples, it is actually acceptable to return them even without nested structured bindings, because one can do the following.

auto const [A,BC]=(
  x
  >>= pipes::transform(f)
  >>=pipes::fork(
    pipes::to_<std::vector<int>>,
    pipes::fork(
      pipes::transform(g) >>= pipes::to_<std::vector<int>>,
      pipes::transform(h) >>= pipes::to_<std::vector<int>>,
    )
  )
);

auto const [B,C]=BC;

So, the API of my current implementation seems satisfactory to me.

@bradphelan
Copy link

bradphelan commented Feb 16, 2021

I'm curious. Have you thought about if it is possible to write

 >>= pipes::to_vector

rather than

 >>= pipes::to_<std::vector<int>>

and have the type system figure out what the element type should be?

@godefv
Copy link

godefv commented Feb 16, 2021

Of course, it would be more convenient to omit the type of the element, which is redundant. However, it would be more complicated to implement because the pipe object has to be constructed before any input goes through it. Besides, pipes::to_ accepts anything with .push_back(), not only vectors.

So, pipes::to_ should probably be kept, and I could add pipes::to_vector which would have a dummy pipe type with an overload operator>>=(Range&& range, pipes::to_vector_t&& pipeline) which would extract the element type of range and forward to operator>>=(Range&& range, pipes::to_<std::vector<element_type_t<decltype(range)>>>>&& pipeline).

Short answer: yes, it can be done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants