How to Unroll

There are two general ways to implement loop unrolling in Julia—recursively splatting iterator contents and manually constructing unrolled expressions. For example, a recursively unrolled version of the foreach function is

unrolled_foreach(f, itr) = _unrolled_foreach(f, itr...)
_unrolled_foreach(f) = nothing
_unrolled_foreach(f, item, items...) = (f(item); _unrolled_foreach(f, items...))

In contrast, a manually unrolled implementation of this function looks like

unrolled_foreach(f, itr) = _unrolled_foreach(Val(length(itr)), f, itr)
@generated _unrolled_foreach(::Val{N}, f, itr) where {N} =
    Expr(:block, (:(f(generic_getindex(itr, $n))) for n in 1:N)..., nothing)

Julia's compiler can only pass up to 32 values through function arguments without allocating heap memory, so recursive unrolling is not type-stable for iterators with lengths greater than 32. However, automatically generating functions often requires more time and memory resources during compilation than writing hard-coded functions. Recursive inlining adds overhead to compilation as well, but this is typically smaller than the overhead of generated functions for short iterators. To avoid sacrificing latency by using generated functions, several hard-coded methods can be added to the manually unrolled implementation:

_unrolled_foreach(::Val{0}, f, itr) = nothing
_unrolled_foreach(::Val{1}, f, itr) = (f(generic_getindex(itr, 1)); nothing)
_unrolled_foreach(::Val{2}, f, itr) =
    (f(generic_getindex(itr, 1)); f(generic_getindex(itr, 2)); nothing)
_unrolled_foreach(::Val{3}, f, itr) =
    (f(generic_getindex(itr, 1)); f(generic_getindex(itr, 2)); f(generic_getindex(itr, 3)); nothing)

With this modification, manual unrolling does not exceed the compilation requirements of recursive unrolling across a wide range of use cases. Since it also avoids type instabilities for arbitrarily large iterators, a combination of hard-coded and generated functions with manual unrolling serves as the basis of all unrolled functions defined in this package. Similarly, the ntuple(f, ::Val{N}) function in Base uses this strategy to implement loop unrolling.

For benchmarks that compare these two implementations, see Manual vs. Recursive Unrolling.

Interface API

The functions exported by this package can be used with any statically sized iterators, as long as those iterators make appropriate use of the following interface:

UnrolledUtilities.generic_getindexFunction
generic_getindex(itr, n)

Identical to getindex(itr, n), but with the added ability to handle lazy iterator types defined in the standard library, such as Base.Generator and Iterators.Enumerate.

source
UnrolledUtilities.output_type_for_promotionFunction
output_type_for_promotion(itr)

The type of output that unrolled functions should try to generate for the input iterator itr, or a ConditionalOutputType if the output type depends on the type of items that need to be stored in it, or NoOutputType() if itr is a lazy iterator without any associated output type. Defaults to Tuple.

source
UnrolledUtilities.ConditionalOutputTypeType
ConditionalOutputType(allowed_item_type, output_type, [fallback_type])

An AmbiguousOutputType that can have one of two possible values. If the first item in the output is a subtype of allowed_item_type, the output will have the type output_type; otherwise, it will have the type fallback_type, which is set to Tuple by default.

source
UnrolledUtilities.output_promote_ruleFunction
output_promote_rule(output_type1, output_type2)

The type of output that should be generated when two iterators do not have the same output_type_for_promotion, or Union{} if these iterators should not be used together. Only one method of output_promote_rule needs to be defined for any pair of output types.

By default, all types take precedence over NoOutputType(), and the conditional part of any ConditionalOutputType takes precedence over an unconditional type (so that only the fallback_type of any conditional type gets promoted). The default result for all other pairs of unequal output types is Union{}.

source
UnrolledUtilities.constructor_from_tupleFunction
constructor_from_tuple(output_type)

A function that can be used to efficiently construct an output of type output_type from a Tuple, or identity if such an output should not be constructed from a Tuple. Defaults to identity, which also handles the case where output_type is already Tuple. The output_type here is guaranteed to be a Type, rather than a ConditionalOutputType or NoOutputType.

Many statically sized iterators (e.g., SVectors) are essentially wrappers for Tuples, and their constructors for Tuples can be reduced to no-ops. The main exceptions are StaticOneTos and StaticBitVectors, which do not provide constructors for Tuples because there is no performance benefit to making a lazy or low-storage data structure once a corresponding high-storage data structure has already been constructed.

source
UnrolledUtilities.empty_outputFunction
empty_output(output_type)

An empty output of type output_type. Defaults to applying the constructor_from_tuple for the given type to an empty Tuple.

source
UnrolledUtilities.StaticSequenceType
StaticSequence{N}

Abstract type that can represent any iterable of constant length N. Subtypes include the low-storage data structures StaticOneTo and StaticBitVector, which have optimized methods for certain unrolled functions.

source

How to Use the Interface

To unroll over a statically sized iterator of some user-defined type T, follow these steps:

  • Add a method for getindex(::T, n), or for generic_getindex(::T, n) if getindex should not be defined for iterators of type T
  • If every unrolled function that needs to construct an iterator when given an iterator of type T can return a Tuple instead, stop here
  • Otherwise, to return a non-Tuple iterator whenever it is efficient to do so, follow these steps:
    • Add a method for output_type_for_promotion(::T) = O, where O can be T, a supertype of T, some other Type, or an AmbiguousOutputType
    • If an iterator whose output type is O can be used together with an iterator whose output type is O′, add a method for output_promote_rule(O, O′)
    • If O is a NoOutputType, stop here
    • Otherwise, to handle the unambiguous output type U that underlies O (where U is equivalent to O unless O is a ConditionalOutputType), follow these steps:
      • If an iterator of type U can be efficiently constructed from a Tuple, add a method for constructor_from_tuple(U)
      • Otherwise, for each of the following functions, add a method if it can be implemented to construct an iterator of type U without first storing the iterator's contents in a Tuple:
        • empty_output(U)
        • unrolled_push_into(U, itr, item)
        • unrolled_append_into(U, itr1, itr2)
        • unrolled_take_into(U, itr, val_N)
        • unrolled_drop_into(U, itr, val_N)
        • unrolled_map_into(U, f, itr)
        • unrolled_accumulate_into(U, op, itr, init, transform)
Note

When a relevant method for the interface is not defined, unrolled functions will typically fall back to using Tuples instead of other iterator types.