How to Unroll
There are two general ways to implement loop unrolling in Julia—recursively splatting iterator contents and manually constructing unrolled expressions. For example, a recursively unrolled version of the foreach
function is
unrolled_foreach(f, itr) = _unrolled_foreach(f, itr...)
_unrolled_foreach(f) = nothing
_unrolled_foreach(f, item, items...) = (f(item); _unrolled_foreach(f, items...))
In contrast, a manually unrolled implementation of this function looks like
unrolled_foreach(f, itr) = _unrolled_foreach(Val(length(itr)), f, itr)
@generated _unrolled_foreach(::Val{N}, f, itr) where {N} =
Expr(:block, (:(f(generic_getindex(itr, $n))) for n in 1:N)..., nothing)
Julia's compiler can only pass up to 32 values through function arguments without allocating heap memory, so recursive unrolling is not type-stable for iterators with lengths greater than 32. However, automatically generating functions often requires more time and memory resources during compilation than writing hard-coded functions. Recursive inlining adds overhead to compilation as well, but this is typically smaller than the overhead of generated functions for short iterators. To avoid sacrificing latency by using generated functions, several hard-coded methods can be added to the manually unrolled implementation:
_unrolled_foreach(::Val{0}, f, itr) = nothing
_unrolled_foreach(::Val{1}, f, itr) = (f(generic_getindex(itr, 1)); nothing)
_unrolled_foreach(::Val{2}, f, itr) =
(f(generic_getindex(itr, 1)); f(generic_getindex(itr, 2)); nothing)
_unrolled_foreach(::Val{3}, f, itr) =
(f(generic_getindex(itr, 1)); f(generic_getindex(itr, 2)); f(generic_getindex(itr, 3)); nothing)
With this modification, manual unrolling does not exceed the compilation requirements of recursive unrolling across a wide range of use cases. Since it also avoids type instabilities for arbitrarily large iterators, a combination of hard-coded and generated functions with manual unrolling serves as the basis of all unrolled functions defined in this package. Similarly, the ntuple(f, ::Val{N})
function in Base
uses this strategy to implement loop unrolling.
For benchmarks that compare these two implementations, see Manual vs. Recursive Unrolling.
Interface API
The functions exported by this package can be used with any statically sized iterators, as long as those iterators make appropriate use of the following interface:
UnrolledUtilities.generic_getindex
— Functiongeneric_getindex(itr, n)
Identical to getindex(itr, n)
, but with the added ability to handle lazy iterator types defined in the standard library, such as Base.Generator
and Iterators.Enumerate
.
UnrolledUtilities.output_type_for_promotion
— Functionoutput_type_for_promotion(itr)
The type of output that unrolled functions should try to generate for the input iterator itr
, or a ConditionalOutputType
if the output type depends on the type of items that need to be stored in it, or NoOutputType()
if itr
is a lazy iterator without any associated output type. Defaults to Tuple
.
UnrolledUtilities.AmbiguousOutputType
— TypeAmbiguousOutputType
The result of output_type_for_promotion
for iterators that do not have well-defined output types.
UnrolledUtilities.NoOutputType
— TypeNoOutputType()
The AmbiguousOutputType
of lazy iterators.
UnrolledUtilities.ConditionalOutputType
— TypeConditionalOutputType(allowed_item_type, output_type, [fallback_type])
An AmbiguousOutputType
that can have one of two possible values. If the first item in the output is a subtype of allowed_item_type
, the output will have the type output_type
; otherwise, it will have the type fallback_type
, which is set to Tuple
by default.
UnrolledUtilities.output_promote_rule
— Functionoutput_promote_rule(output_type1, output_type2)
The type of output that should be generated when two iterators do not have the same output_type_for_promotion
, or Union{}
if these iterators should not be used together. Only one method of output_promote_rule
needs to be defined for any pair of output types.
By default, all types take precedence over NoOutputType()
, and the conditional part of any ConditionalOutputType
takes precedence over an unconditional type (so that only the fallback_type
of any conditional type gets promoted). The default result for all other pairs of unequal output types is Union{}
.
UnrolledUtilities.constructor_from_tuple
— Functionconstructor_from_tuple(output_type)
A function that can be used to efficiently construct an output of type output_type
from a Tuple
, or identity
if such an output should not be constructed from a Tuple
. Defaults to identity
, which also handles the case where output_type
is already Tuple
. The output_type
here is guaranteed to be a Type
, rather than a ConditionalOutputType
or NoOutputType
.
Many statically sized iterators (e.g., SVector
s) are essentially wrappers for Tuple
s, and their constructors for Tuple
s can be reduced to no-ops. The main exceptions are StaticOneTo
s and StaticBitVector
s, which do not provide constructors for Tuple
s because there is no performance benefit to making a lazy or low-storage data structure once a corresponding high-storage data structure has already been constructed.
UnrolledUtilities.empty_output
— Functionempty_output(output_type)
An empty output of type output_type
. Defaults to applying the constructor_from_tuple
for the given type to an empty Tuple
.
UnrolledUtilities.StaticSequence
— TypeStaticSequence{N}
Abstract type that can represent any iterable of constant length N
. Subtypes include the low-storage data structures StaticOneTo
and StaticBitVector
, which have optimized methods for certain unrolled functions.
How to Use the Interface
To unroll over a statically sized iterator of some user-defined type T
, follow these steps:
- Add a method for
getindex(::T, n)
, or forgeneric_getindex(::T, n)
ifgetindex
should not be defined for iterators of typeT
- If every unrolled function that needs to construct an iterator when given an iterator of type
T
can return aTuple
instead, stop here - Otherwise, to return a non-
Tuple
iterator whenever it is efficient to do so, follow these steps:- Add a method for
output_type_for_promotion(::T) = O
, whereO
can beT
, a supertype ofT
, some otherType
, or anAmbiguousOutputType
- If an iterator whose output type is
O
can be used together with an iterator whose output type isO′
, add a method foroutput_promote_rule(O, O′)
- If
O
is aNoOutputType
, stop here - Otherwise, to handle the unambiguous output type
U
that underliesO
(whereU
is equivalent toO
unlessO
is aConditionalOutputType
), follow these steps:- If an iterator of type
U
can be efficiently constructed from aTuple
, add a method forconstructor_from_tuple(U)
- Otherwise, for each of the following functions, add a method if it can be implemented to construct an iterator of type
U
without first storing the iterator's contents in aTuple
:empty_output(U)
unrolled_push_into(U, itr, item)
unrolled_append_into(U, itr1, itr2)
unrolled_take_into(U, itr, val_N)
unrolled_drop_into(U, itr, val_N)
unrolled_map_into(U, f, itr)
unrolled_accumulate_into(U, op, itr, init, transform)
- If an iterator of type
- Add a method for
When a relevant method for the interface is not defined, unrolled functions will typically fall back to using Tuple
s instead of other iterator types.