Quick start

This example demonstrates how LazyBroadcast.jl can significantly improve the performance of Julia code.

Let's compute the sum of the elements in a vector z, derived from another vector x in two steps: y = x .+ x and z = 2 .* y.

The simplest code that accomplishes this task is:

y = x .+ x
z = 2 .* y
sum(z)

Let us use BenchmarkTools to benchmark this code:

using BenchmarkTools  # For accurate benchmarking

function foo(x)
   y = x .+ x
   z = 2 .* y
   sum(z)
end;

print(@btime foo(v) setup=(v=rand(10)))

  50.499 ns (4 allocations: 288 bytes)
18.92523138352986

BenchmarkTools reports 4 heap allocations (288 bytes). Heap allocations are known to severely impact performance.

LazyBroadcast.jl provides a simple way to remove these allocations. To do so, we just need to add lazy_broadcast to the broadcasted operations (operations with a dot .):

using LazyBroadcast: lazy_broadcast

function foo_lazy(x)
   # use lazy_broadcast to avoid intermediate allocations
   y = lazy_broadcast.(x .+ x)
   z = lazy_broadcast.(2 .* y)
   sum(z)
end;

print(@btime foo_lazy(v) setup=(v=rand(10)))

  10.188 ns (0 allocations: 0 bytes)
17.07102344012761

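As we can see, BenchmarkTools now reports 0 allocations (0 bytes) and a significant reduction in the overall runtime: about 5x in the output above (50.499 ns versus 10.188 ns). On my computer, the benchmarks take 43.433 ns and 5.917 ns, respectively, a roughly 7x speedup. The two printed sums differ only because each benchmark draws its own random input vector in setup.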

What happened here is that y and z are no longer Arrays, but Broadcasted objects, which are unevaluated representations of expressions. The implementation of sum then evaluates the Broadcasted expression efficiently, removing the need for any intermediate allocations.
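
To make the idea of an unevaluated Broadcasted object more concrete, here is a minimal sketch that uses only Base.Broadcast and does not call lazy_broadcast at all. It builds the same two-step expression by hand; the variable names lazy_total and eager are chosen just for this illustration, and this is not meant to describe how LazyBroadcast.jl is implemented.

```julia
# Sketch using only Base.Broadcast; lazy_broadcast itself is not called here.
using Base.Broadcast: Broadcasted, broadcasted, instantiate, materialize

x = [1.0, 2.0, 3.0]

# Build unevaluated representations of `x .+ x` and `2 .* (x .+ x)`.
y = broadcasted(+, x, x)
z = broadcasted(*, 2, y)

# Neither object is an Array; no element has been computed yet.
@assert y isa Broadcasted
@assert z isa Broadcasted
@assert !(z isa AbstractArray)

# `sum` reduces over the lazy expression without building `y` or `z` as arrays.
# `instantiate` fills in the axes that a hand-built Broadcasted lacks.
lazy_total = sum(instantiate(z))

# Materializing evaluates the same expression into an ordinary Array.
eager = materialize(z)
@assert eager == 2 .* (x .+ x)
@assert lazy_total == sum(eager)
```

With this input, lazy_total is 24.0, the same value the eager version produces, but only the final materialize call allocates an array.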

Now that you know what to expect from lazy_broadcast, jump to the Usage section to read more about how to work with LazyBroadcast and Broadcasted objects.

If you're interested in the implementation, check out the Internals of how lazy_broadcast works section.