Reduce

From Functional and Heterogeneous computing to C++1z fold expressions

whoami

Nick Athanasiou

BETA CAE Systems S.A

https://ngathanasiou.wordpress.com/

Some motivation

Why is there no transform_if in the Standard Library ?

Example use case

Contitional copy (copy_if) from a container of values to a container of pointers to those values (transform).

struct ha { 
    int i;
    explicit ha(int a) : i(a) {}
};

int main() 
{
    vector<ha> v{ ha{1}, ha{7}, ha{1} }; // initial vector

    vector<ha*> ph; // target vector
    vector<ha*> pv; // temporary vector

   // 1. 
    transform(v.begin(), v.end(), back_inserter(pv), 
        [](ha &arg) { return &arg; }); 

   // 2. 
    copy_if(pv.begin(), pv.end(), back_inserter(ph),
        [](ha *parg) { return parg->i < 2;  });

    return 0; // notice the inconsistency ? 
}

The obvious "fix"

template <
    class InputIterator, class OutputIterator, 
    class UnaryOperator, class Pred
>
OutputIterator transform_if(InputIterator first1, InputIterator last1,
                            OutputIterator result, UnaryOperator op, Pred pred)
{
    while (first1 != last1) 
    {
        if (pred(*first1)) {
            *result = op(*first1);
            ++result;
        }
        ++first1;
    }
    return result;
}

// example call 
transform_if(v.begin(), v.end(), back_inserter(ph), 
[](ha &arg) { return &arg;      }, // 1. 
[](ha &arg) { return arg.i < 2; });// 2.

A scholar's fix

#include <boost/range/algorithm.hpp>
#include <boost/range/adaptors.hpp>

using namespace boost::adaptors;

// only for succinct predicates without lambdas
#include <boost/phoenix.hpp>
using namespace boost::phoenix::arg_names;

#include <iostream>

int main()
{
    std::vector<int> const v { 1,2,3,4,5 };

    boost::copy(
            v | filtered(arg1 % 2) | transformed(arg1 * arg1 / 7.0),
            std::ostream_iterator<double>(std::cout, "\n"));
}

The wat

std::accumulate( v.begin(), v.end(), std::back_inserter( v1 ),
    []( decltype( std::back_inserter( v1 ) ) &it, int &x )
    {
        return ( x % 2 == 0 ? ( it = &x, ++it ) : it  );
    });

That was a reduction, but let's take it from the top

Roadmap

"reduce" the ubiquitous
transducers
on composition
C++1z fold expressions
arbitrary fold expressions
lazy evaluation
using transducers and arbitrary fold expressions

1. "reduce" the ubiquitous

1.1 The functional genome of reduce

The concept of reducing is central to the "programming experience" in functional languages. A formal definition would go like this:

In functional programming, fold – also known variously as reduce, accumulate, aggregate, compress, or inject – refers to a family of higher-order functions that analyze a recursive data structure and through use of a given combining operation, recombine the results of recursively processing its constituent parts, building up a return value.

higher order function

recursive data structure

combining operation

accumulator

example

`reduce` the list `[1, 2, 3, 4, 5]` using `(+)` = `15`

Depending on where we "fold from" we get left or right folds.

-- type of the left fold operation
foldl :: (b -> a -> b) -> b -> [a] -> b

The above informs us that `foldl` is a (higher order) function that accepts:

a function (b -> a -> b)

an accumulator of type b

a list of a

and returns a reduced value of type `b`.

Let's make a visualization of the left fold operation for this expression

-- add numbers 1 to 5 with a zero accumulator
foldl (+) 0 [1..5]



In [1]:

    
%%ghc
main = do
    putStrLn $ foldl (\x y -> concat ["(",x,"+",y,")"]) "0" (map show [1..5])









    Out[1]:





['(((((0+1)+2)+3)+4)+5)']

type definition for the right fold operation

foldr :: (a -> b -> b) -> b -> [a] -> b

Notice however the type of the reducer. We say that the accumulator `b` "consumes" the operants from the right.

Visualizing the expression

foldr (+) 0 [1..5]



In [2]:

    
%%ghc
main = do
    putStrLn $ foldr (\x y -> concat ["(",x,"+",y,")"]) "0" (map show [1..5])









    Out[2]:





['(1+(2+(3+(4+(5+0)))))']

1.2 Reduce is more general than map and filter

map (or transform) is a higher order function that given a mutation function and an iterable produces new values for the iterable (eagerly or lazily depending on the language / implementation)



In [37]:

    
help(map)









    



Help on class map in module builtins:

class map(object)
 |  map(func, *iterables) --> map object
 |  
 |  Make an iterator that computes the function using arguments from
 |  each of the iterables.  Stops when the shortest iterable is exhausted.
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.



In [38]:

    
list(map(lambda uc : uc.upper(), "Capitalize every word".split()))









    Out[38]:





['CAPITALIZE', 'EVERY', 'WORD']

map can be expressed as a reduction



In [41]:

    
%%ghc
     
-- guess why we use a right fold
map' :: (a -> b) -> [a] -> [b]  
map' f xs = foldr (\x acc -> f x : acc) [] xs  

main = do
    print $ map' (*2) [1..4]
    print $ map' (\x -> 2*x) [1..4]
    print $ map' (\x -> show x) [1..4]
    print $ foldr ((++).show) "" [1..4]









    Out[41]:





['[2,4,6,8]', '[2,4,6,8]', '["1","2","3","4"]', '"1234"']



In [40]:

    
from functools import reduce

def fizz_buzz(out, item):
    """ solution to the fizz buzz puzzle
        demonstration of the separation of concerns
    """
    a = ""
    if 0 == (item % 3): a += 'Fizz'
    if 0 == (item % 5): a += 'Buzz'
    elif (item % 3):    a  = str(item)
    
    out.append(a)
    return out

reduce(fizz_buzz, range(1, 101), [])









    Out[40]:





['1',
 '2',
 'Fizz',
 '4',
 'Buzz',
 'Fizz',
 '7',
 '8',
 'Fizz',
 'Buzz',
 '11',
 'Fizz',
 '13',
 '14',
 'FizzBuzz',
 '16',
 '17',
 'Fizz',
 '19',
 'Buzz',
 'Fizz',
 '22',
 '23',
 'Fizz',
 'Buzz',
 '26',
 'Fizz',
 '28',
 '29',
 'FizzBuzz',
 '31',
 '32',
 'Fizz',
 '34',
 'Buzz',
 'Fizz',
 '37',
 '38',
 'Fizz',
 'Buzz',
 '41',
 'Fizz',
 '43',
 '44',
 'FizzBuzz',
 '46',
 '47',
 'Fizz',
 '49',
 'Buzz',
 'Fizz',
 '52',
 '53',
 'Fizz',
 'Buzz',
 '56',
 'Fizz',
 '58',
 '59',
 'FizzBuzz',
 '61',
 '62',
 'Fizz',
 '64',
 'Buzz',
 'Fizz',
 '67',
 '68',
 'Fizz',
 'Buzz',
 '71',
 'Fizz',
 '73',
 '74',
 'FizzBuzz',
 '76',
 '77',
 'Fizz',
 '79',
 'Buzz',
 'Fizz',
 '82',
 '83',
 'Fizz',
 'Buzz',
 '86',
 'Fizz',
 '88',
 '89',
 'FizzBuzz',
 '91',
 '92',
 'Fizz',
 '94',
 'Buzz',
 'Fizz',
 '97',
 '98',
 'Fizz',
 'Buzz']

filter can be expressed as a reduction



In [48]:

    
%%ghc

filter' :: (a -> Bool) -> [a] -> [a]  
filter' p = foldr (\x acc -> if p x then x : acc else acc) []

main = do
    print $ filter' (>10) [1..20]
    print $ filter' even [1..20]









    Out[48]:





['[11,12,13,14,15,16,17,18,19,20]', '[2,4,6,8,10,12,14,16,18,20]']



In [56]:

    
from functools import reduce

def my_filter(predicate, iterable):
    """ filter as a reducer
    """
    def filter_reducer(sequence, item):
        if predicate(item):
            sequence.append(item)
        return sequence

    return reduce(filter_reducer, iterable, [])

my_filter(lambda x: x % 2, range(1, 20))









    Out[56]:





[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

2. transducers

A concept introduced by Rich Hickey in his 8/2014 article "Transducers are coming"

a transducer is a function that takes one reducing function and returns another

(whatever, input -> whatever) -> (whatever, input -> whatever)

shown in closure

The primary power of transducers comes from their fundamental decoupling - they don't care (or know about):

the 'job' being done (what the reducing function does)
the context of use (what 'whatever' is)
the source of inputs (where input comes from)

extract the essence of map and filter

C++ implementation in atria (presented by Juan Pedro Bolivar Puente in CppCon 2015)

two of the most common STL functions

template <typename In, typename Out, typename F>
Out transform(In first, In last, Out out, F fn) {
    for (; first != last; ++first) {
        *out++ = fn(*first);
    }
    return out;
}

template <typename In, typename Out, typename P>
Out filter(In first, In last, Out out, P pred) {
    for (; first != last; ++first) {
        if (pred(*first))
            *out++ = *first;
    }
    return out;
}

have very much in common

template <typename In, typename Out, typename F>
Out transform(In first, In last, Out out, F fn) {
    for (; first != last; ++first) {
        *out++ = fn(*first);
    }
    return out;
}

template <typename In, typename Out, typename P>
Out filter(In first, In last, Out out, P pred) {
    for (; first != last; ++first) {
        if (pred(*first))
            *out++ = *first;
    }
    return out;
}

and an almost different way of producing an output

template <typename In, typename Out, typename F>
Out transform(In first, In last, Out out, F fn) {
    for (; first != last; ++first) {
        *out++ = fn(*first);
    }
    return out;
}

template <typename In, typename Out, typename P>
Out filter(In first, In last, Out out, P pred) {
    for (; first != last; ++first) {
        if (pred(*first))
            *out++ = *first;
    }
    return out;
}

accumulate is the "purest way" of stepping through a sequence

template <typename In, typename S, typename Rf>
S accumulate(In first, In last, S state, Rf step) {
    for (; first != last; ++first) {
        state = step(state, *first);
    }
    return state;
}

just applying a step function

template <typename In, typename S, typename Rf>
S accumulate(In first, In last, S state, Rf step) {
    for (; first != last; ++first) {
        state = step(state, *first);
    }
    return state;
}

that could be an abstraction of what we expect an "appender" to do

auto output_rf = [] (auto out, auto input) {
    *out++ = input;
    return out;
};

so map and filter can be epxressed in terms of reduce in c++ as well

template <typename In, typename Out, typename F>
Out transform(In first, In last, Out out, F fn) {
    return accumulate(first, last, out, 
        [&](auto state, auto input) {
           return output_rf(state, fn(in));
        });
}

template <typename In, typename Out, typename P>
Out filter(In first, In last, Out out, P pred) {
    return accumulate(first, last, out, 
        [&](auto state, auto input) {
          return pred(input) ? 
              output_rf(state, input) : state;
        });
}

template <typename In, typename Out, typename F>
Out transform(In first, In last, Out out, F fn) {
    return accumulate(first, last, out, 
        [&](auto state, auto input) {
           return output_rf(state, fn(in));
        });
}

template <typename In, typename Out, typename P>
Out filter(In first, In last, Out out, P pred) {
    return accumulate(first, last, out, 
        [&](auto state, auto input) {
          return pred(input) ? 
              output_rf(state, input) : state;
        });
}

template <typename In, typename Out, typename F>
Out transform(In first, In last, Out out, F fn) {
    return accumulate(first, last, out, 
        [&](auto state, auto input) {
           return output_rf(state, fn(in));
        });
}

template <typename In, typename Out, typename P>
Out filter(In first, In last, Out out, P pred) {
    return accumulate(first, last, out, 
        [&](auto state, auto input) {
          return pred(input) ? 
              output_rf(state, input) : state;
        });
}

auto map = [] (auto fn) {
  return [=] (auto step) {
    return [=] (auto s, auto ...ins) {
      return step(s, fn(ins...));
    };
  };
};

// transform(first, last, fn) ==

accumulate(first, last, out,
    map(fn)(output_rf));

auto filter = [] (auto pred) {
  return [=] (auto step) {
    return [=] (auto s, auto ...ins) {
      return pred(ins...)
             ? step(s, ins...)
             : s;
    };
  };
};

// filter(first, last, pred) ==
accumulate(first, last, out,
    filter(fn)(output_rf));

composing to produce new execution patterns

examples in Haskell

mapping :: (a -> b) -> (r -> b -> r) -> (r -> a -> r)
mapping f xf r a = xf r (f a)

filtering :: (a -> Bool) -> (r -> a -> r) -> (r -> a -> r)
filtering p xf r a = if p a then xf r a else r

flatmapping :: (a -> [b]) -> (r -> b -> r) -> (r -> a -> r)
flatmapping f xf r a = foldl xf r (f a)

3. on composition ...

In computer science

$composition \equiv combination$

whether we refer to object or function composition, the notion of creating "entities" of increasing complexity is common

objectives:

maintainability
code reuse
readability (?)
code beauty (?)

reuse brings familiarity with certain code components hence readability implicitly increaces and code beauty, well ... what could we say about that ?

In trying to discover metrics of "code beauty" it's interesting to study composition in the visual arts

in cinematography

composition : how the elements inside the frame are positioned and exhibited to the viewer; a skill of knowing what to show and what not to show as well as how to show it or how not to show it

most of the aplications of composition revolve around visual necessity, checkboxes that need to be checked

cinema	arbitrary programming example
enough lighting	indent your code even when not mandated by the language
don't block important visual information	break at column 80

visual artists have realized successful compositional templates that are used up to this day

the rule of thirds

the golden ratio

triangular composition

but that what composition offers is a way to tell stories with one single shot

how composition works

Objective 1 :

attract the audience's attention (what should the viewer be looking at and how we get them to look at it)

compositional influencers :

geometry
diagonals
framing
eyeline of subjects
focus
scale
subject close to light
guiding lines

/****************************/
static void f(void)
/****************************/
{
    return; 
}


struct window
{
    // ----------------------------------
    window() = default; 
    ~window() = default; 
    // ----------------------------------
    void scale(double factor)
    {
        perform_scaling(data, factor); 
    }
    void manage(Action *action)
    {
        perform_action(data, action)
    }
    // ----------------------------------
private:
    HANDLE data; 
};

we set aside the function of creating subtext e.g. frame within a frame

Objective 2

Show which subject has "control of the scene"

composition is a tool to :

display the power dynamics within an image
translate visually the degree of control that characters hold in a scene through the positioning of objects

the two types of control

artificial control

primal control

domination of the frame requires that you strip away all the detail and focus on a single element

switch(count % 8) {
            case 0: do { *to++ = *from++;
                        case 7: *to++ = *from++;
                        case 6: *to++ = *from++;
                        case 5: *to++ = *from++;
                        case 4: *to++ = *from++;
                        case 3: *to++ = *from++;
                        case 2: *to++ = *from++;
                        case 1: *to++ = *from++;
                    } while(--n > 0);
        }

std::transform(r1, r2, functor);

std::copy(v1, v2);

SO ...

std::accumulate(
    v.begin(), v.end(), std::back_inserter(vp), 
    filter(pred)(map(fn)(output_rf))
);

4. C++1z fold expressions

C++ expressed the reduction logic through the standard library function `std::accumulate`

highly underestimated

decayed to the mandane task of summing (hence its dwelling in <numeric>)

cannot express:
- folding in compile time contexts
- the pure expression logic underlying reductions
- handle variadicness of input

C++1z ammends this by adding support for fold expressions (Sutton and Smith). According to the related proposal:

A fold expression performs a fold of a template parameter pack ([temp.variadic]) over a binary operator.

4.1 Syntax

Let "e" $= e_1, e_2, \dotso, e_n$ be an expression that contains an unexpanded parameter pack and $\otimes$ the fold operator, then fold expressions have the form:

Unary left folds

$(\dotso\; \otimes\; e)$

which expands to $ (((e_1 \otimes e_2) \dotso ) \otimes e_n)$

Unary right folds

$(e\; \otimes\; \dotso)$

which expands to $(e_1 \otimes ( \dotso (e_{n-1} \otimes e_n)))$

If we add a non pack argument on the dots' side of each of the above we get their binary versions that have identical expansion behavior:

Binary left folds

$(a \otimes\; \dotso\; \otimes\; e)$

which expands to $ (((a \otimes e_1) \dotso ) \otimes e_n)$

Binary right folds

$(e\; \otimes\; \dotso\; \otimes\; a)$

which expands to $(e_1 \otimes ( \dotso (e_n \otimes a)))$

The "⊗" operator can be one of:

+  -  *  /  %  ^  &  |  ~  =  <  >  <<  >>
+=  -=  *=  /=  %=  ^=  &=  |=  <<=  >>=
==  !=  <=  >=  &&  ||  ,  .*  ->*

4.2 Identity elements

The fold of an empty parameter pack evaluates to a specific value. The choice of value depends on the operator. The set of operators and their empty expansions are in the table below.

Operator	Value when parameter pack is empty
$*$	1
$+$	0
$&$	-1
$\mid$	0
$&&$	true
$\parallel$	false
$,$	void()

If a fold of an empty parameter pack is produced for any other operator, the program is ill-formed

4.3 Examples

/// Summing the contents of an array at compile time

#include <array>
#include <utility>
#include <iostream>

using namespace std; 

namespace detail
{
    template <class... Ts>
    constexpr auto sum_(Ts&&... args)
    {
        return (args + ...);
    }

    template <typename T, size_t N, size_t... Is>
    constexpr T sum_impl(array<T, N> const &arr, index_sequence<Is...>)
    {
        return sum_(get<Is>(arr)...);
    }
}

template <typename T, size_t N>
constexpr T sum(array<T, N> const &arr)
{
    return detail::sum_impl(arr, make_index_sequence<N>{});
} 

int main()
{
    constexpr array<int, 4> arr1{ { 1, 1, 2, 3 } };
    constexpr array<int, 0> arr2{ };

    cout << integral_constant<int, sum(arr1)>{} << endl;
    cout << integral_constant<int, sum(arr2)>{} << endl;
}

Output : ['7', '0']

// iterating over different types

#include <iostream>

#define fw(...) ::std::forward<decltype(__VA_ARGS__)>(__VA_ARGS__)

struct Window {
    void show() { std::cout << "showing Window\n"; }
} win;

struct Widget {
    void show(){ std::cout << "showing Widget\n"; }
} wid;

struct Toolbar {
    void show(){ std::cout << "showing Toolbar\n"; }
} tlb;

int main()
{
    auto printer = [](auto&&... args) { (fw(args).show(), ...); };

    printer(win, wid, tlb);
    printer(); // remember void() ? 
}

Output: ['showing Window', 'showing Widget', 'showing Toolbar']

// a for_each lambda

#include <iostream>

struct Printer 
{
    template <class T> 
    void operator()(T&& arg) { std::cout << arg; }
};

int main()
{
    auto ForEach = [](auto&& fun, auto&&... args) 
    { 
        (..., std::invoke(fw(fun), fw(args)));
    };

    ForEach(Printer{}, 0.5, " a loaf is better than ", 0, " bread", '\n');
}

Output: ['0.5 a loaf is better than 0 bread']

// a modern summing style - showcasing the unfolding properties of std::apply

#include <array>
#include <tuple>
#include <utility>
#include <iostream>
#include <type_traits>

namespace cpp17
{
    template< class F, class... ArgTypes>
    std::result_of_t<F&&(ArgTypes&&...)> invoke(F&& f, ArgTypes&&... args);

    namespace detail 
    {
        template <class F, class Tuple, std::size_t... I>
        constexpr decltype(auto) apply_impl(
            F&& f, Tuple&& t, std::index_sequence<I...>) 
        {
#if 1
            return (std::forward<F>(f))(std::get<I>(std::forward<Tuple>(t))...);
#else
            return invoke(
                std::forward<F>(f), std::get<I>(std::forward<Tuple>(t))...);
#endif // TODO: Elaborate on the inconsistency of invoke
        }
    }  

    template <class F, class Tuple>
    constexpr decltype(auto) apply(F&& f, Tuple&& t)
    {
        return detail::apply_impl(
            std::forward<F>(f), 
            std::forward<Tuple>(t),
            std::make_index_sequence<std::tuple_size<std::decay_t<Tuple>>{}>{});
    }
}

struct Summer 
{
    template <class... Ts> 
    constexpr auto operator()(Ts&&... args) 
    { 
        return (args + ...); 
    }
};

int main()
{
    constexpr std::array<int, 4>        arr{ { 1, 2, 3 } };
    constexpr std::tuple<int, int, int> tup{   1, 2, 3   }; 

    std::cout << "Array sum : " << cpp17::apply(Summer{}, arr) << std::endl;
    std::cout << "Tuple sum : " << cpp17::apply(Summer{}, tup) << std::endl;
}

Output: ['Array sum : 6', 'Tuple sum : 6']

once upon a time

// mp_plus -----------------------------------------------------------
template <class... T>
struct mp_plus_impl;

template <class... T>
using mp_plus = typename mp_plus_impl<T...>::type;

template <>
struct mp_plus_impl<>
{
    using type = std::integral_constant<int, 0>;
};

template <class T1, class... T>
struct mp_plus_impl<T1, T...>
{
    static constexpr auto _v = T1::value + mp_plus<T...>::value;

    using type =
        std::integral_constant<typename std::remove_const<decltype(_v)>::type,
                               _v>;
};

template <class T1, class T2, class T3, class T4, class T5, class T6, class T7,
          class T8, class T9, class T10, class... T>
struct mp_plus_impl<T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T...>
{
    static constexpr auto _v = T1::value + T2::value + T3::value + T4::value +
                               T5::value + T6::value + T7::value + T8::value +
                               T9::value + T10::value + mp_plus<T...>::value;

    using type =
        std::integral_constant<typename std::remove_const<decltype(_v)>::type,
                               _v>;
};

// mp_count ----------------------------------------------------------
template <class L, class V>
struct mp_count_impl;

template <template <class...> class L, class... T, class V>
struct mp_count_impl<L<T...>, V>
{
    using type = mp_plus<std::is_same<T, V>...>;
};

template <class L, class V>
using mp_count = typename mp_count_impl<L, V>::type;

// bool_constant -----------------------------------------------------
template <bool B>
using bool_constant = std::integral_constant<bool, B>;

// mp_contains -------------------------------------------------------
template <class L, class V>
using mp_contains = bool_constant<mp_count<L, V>::value != 0>;

// test whether H exists in Ts 
template <class H, class... Ts>
constexpr bool h_in_ts = (... || std::is_same<H, Ts>::value);

// count results of type queries

#include <type_traits>
#include <iostream>

using namespace std; 

// count the times a predicate P is satisfied in a typelist L
template <template<class> class P, class... L>
constexpr size_t count_if = (P<L>::value + ...); 

// count the occurences of a type V in a typelist L
template <class V, class... L>
constexpr size_t count = (is_same<V, L>::value + ...); 

int main()
{
    cout << count_if <is_integral, float, unsigned, int, double, long> << endl;
    cout << count<float, unsigned, int, double, float> << endl;
}

Output: ['3', '1']

// assign to a given std::array index and beyond, from a list of arguments

#include <array>
#include <utility>
#include <iostream>

using namespace std; 

template <class T, size_t N>
struct A
{
    array<T, N> arr;

    template <class... Ts, size_t... Is>
    void set_from_list(size_t offset, Ts&&... vals)
    {
        return set_from_list(
            offset, index_sequence_for<Ts...>{}, forward<Ts>(vals)...); 
    }

private:
    template <class... Ts, size_t... Is>
    void set_from_list(size_t offset, index_sequence<Is...>&&, Ts&&... vals)
    {
        ((arr[offset + Is] = vals), ...); 
        //             ^^    ^^^^
        // two or more pack aruments can exist in the expression
        // provided they are of the same **cardinality**
    }
};

int main()
{
    A<int, 5> a{{{1, 2, 3, 4, 5}}};

    // the length of the list must equal the length of remaining array elements
    a.set_from_list(2, 6, 6, 6); 

    printf("{%d, %d, %d, %d, %d}", 
        get<0>(a.arr), get<1>(a.arr), 
        get<2>(a.arr), get<3>(a.arr), get<4>(a.arr)); 
}

Output: ['{1, 2, 6, 6, 6}']

5. arbitrary fold expressions

A somewhat limiting feature of c++ fold expressions, is that they’re only available for certain operators.

... so doing

$(\dotso\; \otimes\; e)$

is not allowed for arbitrary "callables".

`boost.reduce` (boost library incubator) solves this. Here's how

5.1 Creating arbitrary fold expressions

Haskell’s fold diagram bears a strong resemblance to Veldhuizen’s expression parse trees:

> create add-hoc expression templates for arbitrary operators using thin wrapper types that can be parametrized by the operant / operator pair

a wrapper type that apart from data, records the type of the operant we want to use:

template <class F, class T>
struct XX
{
    T data;

    template <class D>
    constexpr XX(D&& data)
        : data(std::forward<D>(data))
    { }

    constexpr T give() const
    {
        return data;
    }
};

create an expression class that records the operants and the type of the operation we want to perform

template <class L, class F, class R>
struct YY
{
    L l;
    R r;

    template <class LL, class RR>
    constexpr YY(LL&& l, RR&& r)
        : l(std::forward<LL>(l))
        , r(std::forward<RR>(r))
    { }

    constexpr auto give() const
    {
        return F{}(l.give(), r.give());
    }
};

We’ll be using the + operator in our ad-hoc expression templates so we overload it to perform expression parsing for our wrapper class (the two versions are created in way that satisfies the types involved in a left to right evaluation)

template <class F, class R, class T>
constexpr auto operator+(XX<F, T> const& lhs, R const& rhs)
{
    return YY< XX<F, T>, F, R >(lhs, rhs);
}

template <class F, class T1, class T2, class R>
constexpr auto operator+(YY<T1, F, T2> const& lhs, R const& rhs)
{
    return YY< YY<T1, F, T2>, F, R >(lhs, rhs);
}

Finally the interface functions use the above machinery

namespace detail
{
    template <class... Ts>
    constexpr auto _foldl(Ts&&... args)
    {
        return (... + args);
    }

    template <class... Ts>
    constexpr auto _foldr(Ts&&... args)
    {
        return (args + ...);
    }
}

template <class F, class... Ts>
constexpr decltype(auto) foldl(Ts&&... args)
{
    return detail::_foldl(XX<F, Ts>(args)...).give();
}

template <class F, class... Ts>
constexpr decltype(auto) foldr(Ts&&... args)
{
    return detail::_foldr(XX<F, Ts>(args)...).give();
}

foldl and a foldr function that, much like the Haskell counterparts, can be parameterized by the type of a binary operator

// concatenate the stringification of sequence elements
foldl<Join>(
    1,  
    std::string(" bird in the hand, is worth "), 
    10, 
    std::string(" in the bush"));

// create a vector out of a sequence
auto k = foldl<Vettore>(1, 2, 30, 12);

// find the maximum element of a sequence
foldl<Max>(1, 20, 3, 5);

syntactic sugar

template <class F, class T>
XX<F, T> Op(T const& val)
{
    return XX<F, T>(val); 
}

// now this is possible
(Op<Max>(args) + ...);

5.2 Becoming move aware

obviously we don’t have to make a copy of our (expression) data every time we build such an expression node

The type O_x is winking the right (to our perspective) eye which according to the Taranki Herald guide to eye flirtation means “I love you”

template <class F, class T>
struct O_x
{
    T& mem;

    constexpr O_x(T& data) : mem(data) {}

    constexpr decltype(auto) give()
    {
        return mem;
    } 
    constexpr decltype(auto) clone()
    {
        return mem;
    } 
};

when using emoticons remember

Reserved in any scope, including for use as implementation macros:
- identifiers beginning with an underscore followed immediately by an uppercase letter
- identifiers containing adjacent underscores (or "double underscore")
Reserved in the global namespace: *identifiers beginning with an underscore
Also, everything in the std namespace is reserved. (You are allowed to add template specializations, though.)

left and right fold epxression node types should be discrete for the operators to be able to correctly build the expression tree.

template <class F, class T>
struct O_x<F, T&&>
{
    T mem;

    constexpr O_x(T&& data) : mem(std::move(data)) {}

    constexpr decltype(auto) give()
    {
        return (mem);
    } 
    constexpr decltype(auto) clone()
    {
        return mem;
    }
};

It goes without saying that we have to sprinkle `std::move(...)` and `std::forward(...)` all over the place

Forwarding references need to be used (evaluated, passed to another function, called) with std::forward to preserve their value category.

std::move is like playing catch, it doesn’t work if there’s nobody on the other side. No matter how “move aware” a library is, it’ll end up copying stuff when handling non-moveable types.

Moving from an object places it in a valid but unspecified state so be really thorough and explicit; even when creating dangling references or invalid state things may appear to work and that’s the worst kind of bug, the kind that bites when you’ve moved on.

6. lazy evaluation

6.1 The Haskell Case

in a nutshell :

expressions are not evaluated when they are bound to variables, but their evaluation is deferred until their results are needed by other computations.

Haskell programs are executed by evaluating expressions. The main idea behind evaluation is function application.

square x = x*x

square (1+2)

square (1+2)
=> (1+2)*(1+2)

(1+2)*(1+2)
=> 3*(1+2)
=> 3*3
=> 9

Unnecessary duplication is avoided by means of graph reduction. Every expression can be represented as a graph (the following representation resembles the way a compiler represents expressions with pointers in memory)

Every function defined by the programmer corresponds to a reduction rule

The square labelled x is a placeholder for a subgraph. Sharing a subgraph is the key to avoiding duplication

Any subgraph that matches a rule is called a reducible expression, or redex for short

Normal form is the state of an expression (graph)

that is finite
w/o redexes
w/o cycles

Constructors have no reduction rule hence give rise to normal forms

                               ... the list 1 : 2 : 3 : [ ]

We say that a graph is in WHNF (weak head normal form) if its topmost node is a constructor

this form is used to express infinite lists:

ones = 1 : ones

any graph that is not in WHNF is called an unevaluated expression

Order of evaluation determines its type

eager evaluation : evaluate function arguments to normal form before reducing the function application itself
lazy evaluation : try to reduce the topmost function application first

imperative languages use short circuiting to emulate lazy evaluation, but this is a hard coded language feature



In [3]:

    
%%ghc 
import Debug.Trace

and' :: Bool -> Bool -> Bool
and' x False = False
and' x True  = x

or' :: Bool -> Bool -> Bool
or' x False = x
or' x True  = True

even' :: (Integral a) => a -> Bool
even' 0 = True
even' 1 = False
even' x
    | x > 0 = even' (x - 2)
    | otherwise = even' (x + 2)

loop = not loop

even_dbg x = trace("call even for " ++ show x) $ even' x

main  = do
    print $ and' (loop) (even_dbg 11)
    print $ or'  (loop) (even_dbg 14)









    Out[3]:





['call even for 11', 'call even for 14', 'False', 'True']

The AST we get for a valid Haskell program is “evaluated” according to lazy evaluation rules, which most commonly manifest themselves in the following three ways:

arguments to functions are evaluated only when this is necessary for evaluation to continue.
an argument is not necessarily evaluated fully: only the parts that are needed are examined.
an argument is evaluated at most only once. This is done in the implementation by replacing expressions by graphs and calculating over them.
While lazy evaluation has many advantages, its main drawback is that memory usage becomes hard to predict.

6.2 The Python Case

Python is an interpreted language and solely code paths that are executed need be checked in any way so you have lazy evaluation right there

(as opposed to C++ not compiling due to type errors in unevaluated contexts)

... would be an example of how to miss the point on lazy evaluation

Python’s lazy evaluation is not built into expression evaluation, take the following example:



In [54]:

    
def f1(a, b):
    print('call')
    return a

def main():
    f1(f1(1, 2), f1(2, 3))

main()









    



call
call
call

So how can Python handle infinite lists, have lazy evaluation and perform calculation on demand ?



In [58]:

    
# the iterator protocol
class CustomRange:
  def __init__(self, max):
    self.max = max
  
  def __iter__(self):
    self.curr = 0
    return self

  def __next__(self): # this would be next in Python2
    numb = self.curr
    if self.curr >= self.max:
        raise StopIteration
    self.curr += 1
    return numb

for i in CustomRange(4):
  print(i)



In [59]:

    
# the generator pattern
def custom_range(max):
  a = 0
  while a < max:
    yield a
    a += 1

for i in CustomRange(4):
  print(i)

This pretty cool piece of code can provide “computation on demand” and has endless applications, from mundane ones to mind boggling. We'll be following this paradigm to implement lazy folds

6.3 The Curious case of C++

C++ is an imperative language and does not implement lazy evaluation

...

except for short circuiting

and template metaprogramming, that allows for things like

#include <utility>
#include <iostream>

namespace detail
{
    template <
        typename F,
        typename... Args,
        typename = decltype(std::declval<F&&>()(std::declval<Args&&>()...))
    >
    std::true_type is_valid_impl(int);

    template <typename ...Args>
    std::false_type is_valid_impl(...);

    template <typename F>
    struct is_valid_fun
    {
        template <typename ...Args>
        constexpr decltype(is_valid_impl<F, Args&&...>(int{})) 
        operator()(Args&&...) const
        {
            return{};
        }
    };
}

template <typename F>
constexpr detail::is_valid_fun<F&&> is_valid(F&&)
{
    return {};
}

struct example1 { void foo() {} };
struct example2 {               };

int main()
{
    auto has_foo = is_valid([](auto&& x) -> decltype(x.foo()) { }); 

    std::cout << "check whether a foo method is present\n"; 
    std::cout << "For struct example1 : " << has_foo(example1{}) << std::endl;
    std::cout << "For struct example2 : " << has_foo(example2{}) << std::endl;
}

#include <iostream>
#include <type_traits>
#include <utility>

namespace lut
{
    template <class... Fs>
    struct overload_set
    {
    };

    template <class F0, class... Fs>
    struct overload_set<F0, Fs...> : F0, overload_set<Fs...>
    {
        overload_set(F0 f0, Fs... fs)
            : F0(std::move(f0))
            ,

            overload_set<Fs...>(std::move(fs)...)
        {
        }

        using F0::operator();
        using overload_set<Fs...>::operator();
    };

    template <class F>
    struct overload_set<F> : F
    {
        overload_set(F f) : F(std::move(f)){};
        using F::operator();
    };

    template <class... Fs>
    overload_set<typename std::decay<Fs>::type...> overload(Fs&&... fs)
    {
        return {std::forward<Fs>(fs)...};
    }
} // ~ namespace lut

struct A
{
    bool f1() { return true; }
    bool f2() { return false; }
    int serialize(int) { return 1; }
};

template <typename T>
struct typi : T
{
};

int main()
{
    auto hs = lut::overload(
        [](auto&& x) -> decltype(x.serialize(2), std::true_type{})
        {
            return {};
        },
        [](...) -> std::false_type
        {
            return {};
        });

    typi<decltype(hs(A{}))> a;

    std::cout << a;
}

don't try this at home

template <int V>
struct A
{
    static constexpr bool value = !!V;
};

template <int V>
struct B
{ 
};

// compile time short circuiting won't work !!
template <class... Bs>
constexpr bool any_true = (... || Bs::value); 

template <class... Ts>
constexpr bool are_non_zero_vals = any_true<Ts...>;

int main()
{
    std::cout << are_non_zero_vals<A<1>, B<2> > << std::endl;

    return 0;
}

6.4 Flat expression templates

Lazy evaluation problems with expression templates :

the content is embedded in the form of the data structure
we need to be able to step through intermediate states easily
we have to express clearly the resulting types and the advancement of our computation

It’s a common technique in GPU programming to “flat out” data structures in order to make them more “stream-able”, so this is not a “virgin birth”

To illustrate what this variation of expression templates does take for example a right fold expression, say:

(a + (b + (c + d)))

what we’ll do is convert this

into this

We need a type to hold the expression in a reverse polish notation fashion and we choose to store the “nodes” in a tuple to be as “compile time friendly” as possible

template <class F, class... Ts>
struct O_Om
{
    std::tuple<Ts...> nodes;

    gut::member_result_t<F, decltype(std::get<1>(nodes).clone()),
                         decltype(std::get<0>(nodes).clone())> state;
    template <class A, class... Us>
    constexpr O_Om(A&& state, Us&&... args)
        : nodes{fw(args)...}, state{fw(state)}
    {
    }

    /* nested iterator type here */

    iterator begin() { return iterator(state, 0, nodes); }
    iterator end() { return iterator(state, sizeof...(Ts), nodes); }

    constexpr decltype(auto) yield()
    {
        for (std::size_t i(1); i < sizeof...(Ts); ++i)
        {
            state = vtu::call_with_tuple_element_first(
                F{}, nodes, i, state);
        }

        return state;
    }
};

The type O_Om holds the expression (in its m hand) and has both its eyes open (we’ll be using it for left folds as well). To highlight some of the code we note that:

state is the accumulator and has the type of applying the callable on the expression. If no accumulator is explicitly provided the rightmost argument becomes it. As in Haskell the accumulator consumes the expression from the right to left for right folds etc (so write your callables accordingly).
A nested iterator type will keep track of the nodes that have been computed
yield returns the result of the expression right away. It can be evaluated at compile time and it’s a good example of a constexpr member function not being const since it alters the state and wouldn’t be accepted by C++11 compilers that implicitly made constexpr member functions const. Luckily this is not the case in C++14 and beyond so we write code like this.
vtu::call_with_tuple_element_... is a mechanism to visit a tuple with a “runtime” index (quoted because inside yield the i is evaluated at compile time)

6.5 Lazy folds

The last piece of the puzzle is the expression iterator, which will be a nested member of O_Om

class iterator
{ // ... Publicly define the standard typedefs 
    reference _state;
    std::size_t _pos;
    std::tuple<Ts...>& _nodes;

public:
    /*
    ... construction
    ... copy construction
    ... copy assignment operator
    */

    // I'll be using std::rel_ops so I get to be minimal on comparisons
    bool operator==(const iterator& other) const { return _pos == other._pos; }
    bool operator<(const iterator& other) const { return _pos < other._pos; }

    iterator& operator++() {
        ++_pos;
        _state = vtu::call_with_tuple_element_first(F{}, _nodes,
                                _pos, _state);
        return *this;
    }

    reference operator*() const { return _state; }
    pointer operator->() const { return &_state; }
};

Having this in place we can write code like this

int main()
{
    std::string acc;
    auto k = fld::foldr<JoinAc>(
        std::string(" < in the bush >"),
        10, 
        std::string(" < bird in the hand, is worth > "), 
        1, 
        acc);

    using namespace std::rel_ops; 
    for (auto&& state : k)
    {
        std::cout << "computed value so far : " << state << std::endl; 
    }

    constexpr auto mm = fld::foldr<Max>(11, 2, 5, 2, 4).yield(); 
    std::cout << "\nmax is " << mm << std::endl;
}

Outputs ['1', '1 < bird in the hand is worth > ', '1 < bird in the hand is worth > 10', '1 < bird in the hand is worth > 10 < in the bush >']

The Haskell counterpart are the scanl and scanr functions, which report all the intermediate accumulator states in the form of a list



In [3]:

    
%%ghc

main = do
    print $ scanl (+) 0 [3,5,2,1]  
    print $ scanr (+) 0 [3,5,2,1]









    Out[3]:





['[0,3,8,10,11]', '[11,8,3,1,0]']

7. using transducers and arbitrary fold expressions

7.1 The instantiation complexity problem

generating types to stress test the mechanism is easy

struct Max
{
    template <class T1, class T2>
    constexpr auto operator()(T1 lhs, T2 rhs)
    {
        ++use_count;
        return lhs > rhs ? lhs : rhs;
    }
};

template <size_t... Is>
void stress_test(std::index_sequence<Is...>&&)
{
    std::cout << "max of sequence = " << fld::foldr(Max{}, Is...).yield()
              << std::endl;
}

int main()
{
    stress_test(std::make_index_sequence<17>{});
}

and now 218'350 characters of error message ...

In file included from fold_examples.cpp:1:
In file included from /usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/array:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/stdexcept:39:
In file included from /usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/string:40:
In file included from /usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/bits/char_traits.h:39:
In file included from /usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/bits/stl_algobase.h:64:
In file included from /usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/bits/stl_pair.h:59:
In file included from /usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/bits/move.h:57:
/usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/type_traits:545:14: fatal error: recursive template instantiation exceeded maximum depth of 256
    : public __or_<is_lvalue_reference<_Tp>,
             ^
/usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/type_traits:115:26: note: in instantiation of template class 'std::is_reference<std::_Tuple_impl<16, fld::detail::O_x<unsigned long &&> > &>' requested here
    : public conditional<_B1::value, _B1, _B2>::type
                         ^
/usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/type_traits:120:14: note: in instantiation of template class 'std::__or_<std::is_reference<std::_Tuple_impl<16, fld::detail::O_x<unsigned long &&> > &>, std::is_void<std::_Tuple_impl<16, fld::detail::O_x<unsigned long &&> > &> >' requested here
    : public conditional<_B1::value, _B1, __or_<_B2, _B3, _Bn...>>::type
             ^
/usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/type_traits:148:39: note: in instantiation of template class 'std::__or_<std::is_function<std::_Tuple_impl<16, fld::detail::O_x<unsigned long &&> > &>, std::is_reference<std::_Tuple_impl<16, fld::detail::O_x<unsigned long &&> > &>, std::is_void<std::_Tuple_impl<16, fld::detail::O_x<unsigned long &&> > &> >' requested here
    : public integral_constant<bool, !_Pp::value>
                                      ^
/usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/type_traits:565:14: note: in instantiation of template class 'std::__not_<std::__or_<std::is_function<std::_Tuple_impl<16, fld::detail::O_x<unsigned long &&> > &>, std::is_reference<std::_Tuple_impl<16, fld::detail::O_x<unsigned long &&> > &>, std::is_void<std::_Tuple_impl<16, fld::detail::O_x<unsigned long &&> > &> > >' requested here
    : public __not_<__or_<is_function<_Tp>, is_reference<_Tp>,
             ^
/usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/type_traits:115:26: note: in instantiation of template class 'std::is_object<std::_Tuple_impl<16, fld::detail::O_x<unsigned long &&> > &>' requested here
    : public conditional<_B1::value, _B1, _B2>::type
                         ^
/usr/lib/gcc/x86_64-linux-gnu/5.3.0/../../../../include/c++/5.3.0/type_traits:602:14: note: (skipping 247 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
    : public __or_<is_object<_Tp>, is_reference<_Tp>>::type
             ^
./any_fold_lazy.h:350:11: note: in instantiation of function template specialization 'fld::detail::makeO_Om<fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&> , fld::detail::O_x<unsigned long &&>, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15>' requested here
                        return makeO_Om(std::move(lhs), std::move(acc),
                               ^
./any_fold_lazy.h:357:23: note: in instantiation of function template specialization 'fld::detail::operator+<unsigned long &&, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&> >' requested here
                        return (fw(args) + ...);
                                           ^
./any_fold_lazy.h:373:24: note: in instantiation of function template specialization 'fld::detail::_foldr<fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&>, fld::detail::O_x<unsigned long &&> , 0>' requested here
                    fw(fun), detail::_foldr(detail::makeO_x(fw(args))...));
                                     ^
fold_examples.cpp:138:44: note: in instantiation of function template specialization 'fld::foldr<Max, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, 0>' requested here
        std::cout << "max of sequence = " << fld::foldr(Max{}, Is...).yield()
                                                  ^
fold_examples.cpp:165:2: note: in instantiation of function template specialization 'foo<0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16>' requested here
        foo(std::make_index_sequence<17>{});
        ^
1 error generated.

The problem is not make_tuple, but the move constructor of tuple
The move constructor has a conditional noexcept that is implemented recursively. Therefore, for each template argument, a constant number of additional instantiations is required
The implementation e.g. of is_nothrow_move_constructible in terms of is_nothrow_constructible which is implemented in terms of __is_nt_constructible and so on, for 15 instantiation levels
This means that each template argument for tuple requires 15 additional instantiation levels for this check. On top of that, 9 levels are always required (constant depth)
Therefore, 17 arguments require an instantiation depth of 17*15+9 == 264.

so this should fail as well (note I'm using clang 3.7)

#include <iostream>
#include <type_traits>
#include <tuple>

template <typename... Arguments>
struct Testing
{
    std::tuple<Arguments...> t;
    Testing(Arguments... args) : t{args...} {}
};

template <typename... Arguments>
Testing<Arguments...> create(Arguments... args)
{
    return Testing<Arguments...>(args...);
}

template <std::size_t... Is>
void test(std::integer_sequence<size_t, Is...>&&)
{
    auto t = create(std::integral_constant<int, Is>{}...);
}

int main()
{
    test(std::make_integer_sequence<size_t, 17>{});
}

Solution 1 : Increase instantiation depth

this will take its toll sooner or later
the default instantiation depth for clang (256) is way low for C++11 (1024 is recommended)
Visual C++ compiler has a default (non tweakable) instantiation depth value of 2048

Solution 2 : flatten instantiations

// predicate logic
template <bool... Ei>
constexpr bool any_true = (... || Ei); 

template <bool... Ei>
constexpr bool all_true = (... && Ei); 

template <bool... Ei>
constexpr bool all_false = !any_true<Ei...>;

template <bool... Ei>
constexpr bool any_false = !all_true<Ei...>;

// variadic type traits
template <class... Ts>
constexpr bool are_move_no_throw = std::integral_constant<
    bool, all_true<std::is_nothrow_move_constructible<Ts>::value...>>::value;

// how a "tuple" could avoid diving too deep into instantiations
template <class... Ts>
struct Tuple
{
    std::tuple<Ts...> data;

    constexpr Tuple() = default;

    template <class... Args>
    constexpr Tuple(Args&&... args)
        : data{args...}
    {
    }

    constexpr Tuple(Tuple const&) = default;

    constexpr Tuple(Tuple&& other) noexcept(are_move_no_throw<Ts...>)
    //                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        : Tuple(std::move(other), std::index_sequence_for<Ts...>{})
    {
    }

private:
    template <std::size_t... I>
    constexpr Tuple(Tuple&& other, std::index_sequence<I...>&&)
        : data{std::get<I>(std::move(other).data)...}
    {
    }
};

// just to provide an interface usable for testing
template <std::size_t I, class T>
constexpr decltype(auto) Get(T&& tuple)
{
    return std::get<I>(std::forward<T>(tuple).data);
}

this implementation stops at

test(std::make_integer_sequence<size_t, 238>{});

due to type traits instantiations for a tuple of 238 `integral_constant`s

quite the improvement considering we only changed the logic in one type trait and maintained the instantiation depth

7.2 The reducer as an argument (no more sugar) - reduction as a type

a reducer should be aware of the type of fold it'll be used for

auto map = [](auto fn)
{
    return [=](auto step)
    {
        return [=](auto in, auto&& s)
        { //       ^^^^^^^^^^^^^^^^^
            return step(fn(in), s);
        };
    };
};

auto filter = [](auto pred)
{
    return [=](auto step)
    {
        return [=](auto in, auto s)
        { //       ^^^^^^^^^^^^^^^
            return pred(in) ? step(in, s) : s;
        };
    };
};

The output function is not affected
Define an example (unary) predicate and mutator

auto output_rf = [](auto input, auto out)
{
    *out++ = input;
    return out;
};

auto pred = [](int k)
{
    return k % 2;
};

auto fn = [](int k)
{
    return 10 * k;
};

Putting it all together, or transform_if revisited

int main()  
{
    std::vector<int> vp;

    // remember the pipeline goes left to right
    fld::foldr(filter(pred)(map(fn)(output_rf)), 1, 2, 3, 4, 5,
               std::back_inserter(vp))
        .yield();

    using namespace std::rel_ops;
    for (auto&& elem : vp)
        std::cout << elem << std::endl;
}

mapcating over different types

int main()  
{
    std::list<int>     l1{  10,  20,  30}; 
    std::vector<int>   v1{   1,   2,   3}; 
    std::array<int, 3> a1{{100, 200, 300}}; 

    auto output_vf = [](auto input, auto out) 
    {
        for (auto&& elem : input) *out++ = elem;
        return out;
    };

    fld::foldr(output_vf, a1, l1, v1, std::back_inserter(vp)).yield();

    using namespace std::rel_ops;
    for (auto&& elem : vp)
        std::cout << elem << std::endl;
}

Output ['1, 2, 3, 10, 20, 30, 100, 200, 300']

7.3 Corner cases

folds of >1 arguments : the rightmost (or leftmost) argument becomes the accumulator

folds of 1 argument : the argument is the accumulator and returned as is

folds without arguments ...

we cannot predict identity elements for every possible { type , operator } pair
we should be able to pick an algebra and use it

define your reducers like this

struct Max
{
    template <class T1, class T2>
    constexpr auto operator()(T1 lhs, T2 rhs)
    {
        return lhs > rhs ? lhs : rhs;
    }

    constexpr auto operator()()
    { // ^^ this defines an identity element
        return std::numeric_limits<int>::min(); 
    }
};

8. reduce in a parallel & heterogeneous context

Recap

Reduce is universal
Transducers are awesome
Composition is aesthetically pleasing
C++1z will have folds!
boost.reduce gives you folds for any callable
lazy evaluation is a complex matter