Syntax reference for the new Allo Python frontend.
This document describes the new Allo frontend syntax. The implementation is
currently staged under allo.exp, but this document uses the intended top-level
API:
```python
import allo
from allo import f32, i32, kernel
```

The Python frontend is a restricted Python-embedded DSL (eDSL). It uses Python syntax for readability, but only the constructs described here are part of the kernel language.
Allo kernels are Python functions decorated with @kernel. Every parameter must
have a type annotation.
```python
from allo import f32, i32, kernel

@kernel
def saxpy(a: f32, x: "f32[16]", y: "f32[16]", out: "f32[16]"):
    for i in range(16):
        out[i] = a * x[i] + y[i]
```

Scalar annotations use type objects directly. Shaped annotations use strings:
```python
@kernel
def scalar_add(x: i32, y: i32) -> i32:
    return x + y

@kernel
def vector_add(x: "i32[16]", y: "i32[16]") -> "i32[16]":
    out: "i32[16]" = 0
    for i in range(16):
        out[i] = x[i] + y[i]
    return out
```

Functions with no return value can omit the return annotation or use -> None.
Returning a value requires an explicit return annotation.
```python
@kernel
def fill(out: "i32[4]"):
    for i in range(4):
        out[i] = i

@kernel
def no_result(out: "i32[4]") -> None:
    return
```

Multiple return values are written as tuple annotations.
```python
@kernel
def split_pair(x: i32, y: f32) -> (i32, f32):
    return x + 1, y + 1.0

@kernel
def caller(x: i32, y: f32, out: "f32[1]"):
    lhs, rhs = split_pair(x, y)
    out[0] = rhs + lhs
```

Return placement is intentionally restricted. A return may appear at the top
level of the kernel body or in a first-level if/else branch. Returns inside
loops and nested if statements are rejected.
```python
@kernel
def choose(cond: allo.bool, x: i32, y: i32) -> i32:
    if cond:
        return x
    return y
```

Kernels can define nested kernels as local helper kernels. A nested kernel must
be declared at the top level of the enclosing kernel body, must use exactly one
@kernel decorator, and can be called like any other kernel.
```python
@kernel
def outer(x: i32, out: "i32[1]"):
    @kernel
    def add_one(v: i32) -> i32:
        return v + 1

    out[0] = add_one(x)
```

Nested kernel definitions are not allowed inside if, for, grid, or
while bodies. Recursive kernel calls are also rejected, including indirect
recursion across multiple top-level or nested kernels.
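As an illustration of these placement rules, the following sketch (hypothetical kernel names) would be rejected, because the nested kernel is declared inside a loop body rather than at the top level of the enclosing kernel:

```python
@kernel
def bad_nesting(x: i32, out: "i32[4]"):
    for i in range(4):
        # rejected: nested kernel declared inside a for body
        @kernel
        def add_one(v: i32) -> i32:
            return v + 1

        out[i] = add_one(x)
```

Moving the add_one definition above the loop, to the top level of the kernel body, makes the kernel valid.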
The scalar types are:
| Category | Types |
|---|---|
| Signed integers | i2 through i16, plus i32, i64, i128, i256 |
| Unsigned integers | u1 through u16, plus u32, u64, u128, u256 |
| Floating point | f16, f32, f64, bf16 |
| Special | index, bool, constexpr |
bool is an alias for u1. index is the preferred type for loop indices and
values used as dynamic indices.
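For example, a sketch of a kernel that uses an index parameter as a dynamic subscript (the kernel name is hypothetical, and importing index from allo alongside the other scalar types is an assumption):

```python
from allo import f32, index, kernel

@kernel
def read_at(src: "f32[16]", pos: index, out: "f32[1]"):
    # pos is an index value, the preferred type for dynamic indexing
    out[0] = src[pos]
```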
Use apint(width, signed=False) to define custom integer widths beyond the
predefined aliases. Unsigned custom integers are the default; pass signed=True
for signed integers.
```python
from allo import apint, kernel

u17 = apint(17)
i23 = apint(23, signed=True)

@kernel
def custom_width(x: u17, y: i23, out: "u17[1]"):
    out[0] = x + y
```

Shaped values are written as "dtype[shape]". A rank-0 shaped value uses an
empty shape list.
```python
@kernel
def shapes(a: "f32[8]", b: "i32[4, 4]", acc: "f32[]"):
    acc[()] = a[0] + b[0, 0]
```

Shape expressions are compile-time integer expressions. They may use integer
literals, visible constants, template parameters, unary +/-, and the binary
operators +, -, *, and //.
```python
M = 4
N = 8

@kernel
def reshape_like(inp: "i32[M * N]", out: "i32[M, N]"):
    for i, j in allo.grid(M, N):
        out[i, j] = inp[i * N + j]
```

By default, shaped annotations describe mutable buffers. With
KernelOptions(enable_tensor=True), the same annotation syntax describes MLIR
tensors.
```python
from allo import KernelOptions

@kernel(options=KernelOptions(enable_tensor=True))
def tensor_add(x: "f32[4]", y: "f32[4]") -> "f32[4]":
    return x + y
```

Annotated assignments declare variables.
```python
@kernel
def declarations(x: i32, out: "i32[4]"):
    base: i32 = x
    tmp: "i32[4]" = 0
    for i in range(4):
        tmp[i] = base + i
        out[i] = tmp[i]
```

Shaped locals may be declared without an initializer. This allocates a local buffer in the default mode or an empty tensor in tensor mode.
```python
@kernel
def local_buffer(out: "i32[4]"):
    buf: "i32[4]"
    for i in range(4):
        buf[i] = i
        out[i] = buf[i]
```

Scalar variables must be initialized when declared. A runtime local can also be introduced by assigning an existing runtime value.
```python
@kernel
def inferred_local(cond: allo.bool, x: i32, y: i32, out: "i32[1]"):
    v = x
    if cond:
        v = y
    else:
        v = x + y
    out[0] = v
```

Compile-time variables must be declared with constexpr. They are evaluated
during compilation and cannot be reassigned.
```python
from allo import constexpr

@kernel
def constexpr_bound(out: "i32[4]"):
    N: constexpr = 4
    for i in range(N):
        out[i] = i
```

List initializers are supported for shaped values when every element is a
compile-time int or float. The list shape must exactly match the annotation.
```python
@kernel
def constants(out: "i32[2, 2]"):
    scale: constexpr = 3
    table: "i32[2, 2]" = [[1, scale], [scale + 1, scale + 2]]
    for i, j in allo.grid(2, 2):
        out[i, j] = table[i, j]
```

Allo uses block scope. Variables declared inside an if, for, grid, or
while body are local to that block. Declare a variable before the block if it
must be used afterward.
```python
@kernel
def scoped(cond: allo.bool, x: i32, out: "i32[1]"):
    value: i32 = 0
    if cond:
        value = x
    else:
        value = x + 1
    out[0] = value
```

A name cannot be redeclared in the same scope. Later assignments are cast back to the variable's original type.
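A sketch of the cast-back behavior (the kernel name is hypothetical, and it assumes the implicit cast applies to mixed-type assignments):

```python
@kernel
def cast_back(x: f32, out: "i32[1]"):
    v: i32 = 0
    # the f32 result of x * 2.0 is cast back to v's declared type, i32
    v = x * 2.0
    out[0] = v
```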
Nested kernels follow the same scoping model, but their captures are deliberately
limited. They may capture compile-time symbols from the enclosing scope:
constexpr values, concrete types, type aliases, other kernels, consteval
functions, Allo operators, and modules. They may not capture runtime values such
as enclosing kernel parameters, local scalar variables, loop indices, or buffers.
Pass runtime values explicitly as nested-kernel arguments.
```python
@kernel
def captures(x: i32, out: "i32[1]"):
    offset: constexpr = 2
    T: constexpr = i32

    @kernel
    def add_offset(v: T) -> T:
        return v + offset

    out[0] = add_offset(x)
```

Both Python range and allo.range are supported in kernels. They accept the
same one-, two-, or three-argument forms.
```python
from allo import range as allo_range

@kernel
def ranges(out: "i32[20]"):
    for i in range(10):
        out[i] = i
    for i in range(10, 20):
        out[i] = i
    for i in allo_range(0, 20, 2):
        out[i] = i * 2
```

Loop bounds may depend on runtime values. Loop steps must be positive if they are not constexpr.
```python
@kernel
def variable_bounds(a: "i32[10]", out: "i32[10]"):
    for i in range(10):
        for j in range(a[i], 10, a[i]):
            out[j] += i
```

allo.grid is a shorthand for a multidimensional parallel loop. It requires at
least two dimensions, and the loop target must be a tuple with the same number of
variables.
```python
@kernel
def matmul(a: "f32[32, 32]", b: "f32[32, 32]") -> "f32[32, 32]":
    c: "f32[32, 32]" = 0.0
    for i, j in allo.grid(32, 32):
        for k in range(32):
            c[i, j] += a[i, k] * b[k, j]
    return c
```

Grid dimensions may also be written as (start, stop) or
(start, stop, step) tuples.
```python
@kernel
def strided_grid(out: "i32[8, 8]"):
    for i, j in allo.grid((0, 8, 2), (1, 8, 2)):
        out[i, j] = i + j
```

At the moment, grid does not support non-trivial loop-carried scalar
dependencies. Use nested range loops when the loop body needs to update a
scalar accumulator across iterations.
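For example, a row-sum reduction keeps its scalar accumulator in nested range loops instead of a grid (the kernel name is hypothetical):

```python
@kernel
def row_sums(a: "i32[4, 4]", out: "i32[4]"):
    for i in range(4):
        acc: i32 = 0  # loop-carried scalar, carried across the inner range loop
        for j in range(4):
            acc += a[i, j]
        out[i] = acc
```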
while loops are supported for runtime conditions. A while loop may update
loop-carried scalar values.
```python
@kernel
def count(out: "i32[1]"):
    i: i32 = 0
    acc: i32 = 0
    while i < 4:
        acc += i
        i += 1
    out[0] = acc
```

break, continue, for ... else, and while ... else are not supported.
Runtime if/elif/else statements lower to structured control flow. Variables declared
outside the conditional can be assigned in either branch and used afterward.
```python
@kernel
def classify(x: i32, y: i32) -> i32:
    result: i32 = 0
    if x == 0:
        result = 1
    elif y > x:
        result = 2
    else:
        result = 3
    return result
```

Conditions may use comparison operators, and, or, and not.
```python
@kernel
def logic(a: "i32[3]", b: i32) -> i32:
    out: i32 = 0
    if a[0] > 0 and b < 0:
        out = 1
    elif a[1] <= 1 or not (a[2] == 3):
        out = 2
    return out
```

Ternary expressions lower to a select operation when the condition is runtime. At least one branch must be a runtime value so the result type can be inferred.
```python
@kernel
def select(cond: allo.bool, x: i32, y: i32) -> i32:
    return x if cond else y
```

If a condition is a constexpr, the frontend evaluates the condition during
compilation and only emits the selected branch.
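For illustration, a sketch of a kernel whose branch is chosen at compile time (the kernel name is hypothetical, and using an integer constexpr as the condition is an assumption):

```python
@kernel
def folded(x: i32) -> i32:
    USE_OFFSET: constexpr = 1
    if USE_OFFSET:
        # constexpr condition: only this branch is emitted
        return x + 1
    return x
```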
The frontend supports the following Python operators.
| Category | Operators |
|---|---|
| Arithmetic | +, -, *, /, //, %, ** |
| Unary | +x, -x, ~x, not x |
| Comparison | ==, !=, <, <=, >, >= |
| Boolean | and, or |
| Bitwise | &, \|, ^, <<, >> |
| Assignment | =, +=, -=, *=, /=, //=, %=, **=, &=, \|=, ^=, <<=, >>= |
Multi-way comparisons such as a < b < c are not supported; write them with
and.
```python
@kernel
def comparisons(a: i32, b: i32, c: i32) -> allo.bool:
    return a < b and b < c
```

The default typing_style is "hls", which uses HLS-oriented integer
promotion. For example, an addition may widen internally and then cast back to
the destination type. KernelOptions(typing_style="cpp") selects C++-style
promotion rules.
```python
@kernel(options=KernelOptions(typing_style="cpp"))
def cpp_style(x: allo.u32, y: i32, out: "u32[1]"):
    out[0] = x + y
```

min and max are supported as built-ins and lower to Allo arithmetic
operators.
```python
@kernel
def clamp(x: i32, lo: i32, hi: i32) -> i32:
    return min(max(x, lo), hi)
```

Only Allo kernels, Allo operators, and consteval functions may be called from
inside a kernel. The static built-ins print and len are evaluated during
compilation when their arguments are compile-time values.
Shaped values use tuple-style indexing. The number of indices must match the rank.
```python
@kernel
def copy_2d(src: "f32[4, 4]", dst: "f32[4, 4]"):
    for i, j in allo.grid(4, 4):
        dst[i, j] = src[i, j]
```

Rank-0 shaped values are indexed with ().
```python
@kernel(options=allo.KernelOptions(enable_tensor=True))
def dot_scalar(a: "f32[4]", b: "f32[4]") -> f32:
    return allo.linalg.dot(a, b)[()]
```

Integer scalar values support single-bit extraction and insertion with subscript syntax.
```python
@kernel
def bit_ops(x: allo.u32, out: "u1[1]"):
    out[0] = x[0]
```

Python slice indices such as A[0:4], partial subviews such as A[i] for a
rank-2 buffer, and bit ranges such as x[0:4] are not part of the current new
frontend.
Python operators cover scalar arithmetic and shaped elementwise expressions. Explicit operator calls are useful when an operation needs an output accumulator.
```python
@kernel
def memref_elementwise(x: "f32[4]", y: "f32[4]", out: "f32[4]"):
    allo.arith.add(x, y, acc=out)
```

Math operators include exp, exp2, log, log2, abs, pow, sqrt,
rsqrt, sin, cos, tan, floor, ceil, and erf. They work on scalar
values and on shaped values.
```python
@kernel
def sigmoid(x: "f32[8]", out: "f32[8]"):
    for i in range(8):
        out[i] = 1.0 / (1.0 + allo.math.exp(-x[i]))
```

Linalg operators currently include matmul and dot. They support both
default buffer mode and tensor mode. In buffer mode, pass an explicit acc=
output because the operation writes into an existing buffer. In tensor mode, the
same operation can return a tensor value directly.
```python
@kernel(options=KernelOptions(enable_tensor=True))
def dense(a: "f32[2, 3]", b: "f32[3, 4]") -> "f32[2, 4]":
    return allo.linalg.matmul(a, b)

@kernel
def buffer_matmul(a: "f32[2, 3]", b: "f32[3, 4]", out: "f32[2, 4]"):
    allo.linalg.matmul(a, b, acc=out)
```

Global Python int and float values are visible as compile-time constants.
```python
SCALE = 3

@kernel
def add_scale(x: i32) -> i32:
    return x + SCALE
```

consteval marks a Python helper function that runs during compilation.
```python
from allo import consteval

@consteval
def factor():
    return 3

@kernel
def use_factor(x: i32) -> i32:
    return x + factor()
```

Templates parameterize kernels over compile-time types and values. A templated
kernel is not concrete until it is specialized with kernel[...].
```python
from allo import Template, f32, i32

T = Template("T")
N = Template("N")

@kernel(T, N)
def fill_template(x: T, out: "T[N]"):
    for i in range(N):
        out[i] = x

fill_i32_4 = fill_template[i32, 4]
```

Template bindings must be provided before compilation or execution. Type templates can be used in scalar annotations and as the head of shaped string annotations. Integer templates can be used in shape expressions and loop bounds.
Templates are different from ordinary global aliases. A global alias such as
T = i32 is an immediately chosen concrete type; every use of T in that kernel
means i32, and callers cannot specialize it. A Template("T") is a delayed
binding point that must be supplied by the caller.
```python
FixedT = i32

@kernel
def fixed_alias(x: FixedT, out: "FixedT[4]"):
    for i in range(4):
        out[i] = x

T = Template("T")

@kernel(T)
def delayed_type(x: T, out: "T[4]"):
    for i in range(4):
        out[i] = x

delayed_i32 = delayed_type[i32]
delayed_f32 = delayed_type[f32]
```

The new frontend reports compilation errors in a clang-like style. Diagnostics include the source file, line, column, error message, the relevant source line, and a caret span pointing at the AST node that triggered the error.
For example, an undefined name in:
```python
def broken(x):
    return x + y
```

is rendered as:

```
broken.py:11:16: error: Name 'y' is not defined
   11 |     return x + y
      |                ^
```

The same format is used for kernel syntax errors such as missing annotations, unsupported control flow, return type mismatches, illegal captures, or invalid operator calls. When an error occurs while compiling a called kernel or nested kernel, the diagnostic message is wrapped with call context so the caller and callee relationship is visible.
Source locations are based on Python source inspection. They are reliable for
kernels defined in normal .py files. In a REPL, notebook, python -c, or other
dynamically generated context, Python may not expose stable source lines; Allo
will still report the error, but file names and line numbers can be missing or
inaccurate. For compiler debugging, set ALLO_SHOW_COMPILER_TRACEBACK=1 to keep
the full Python traceback instead of the shortened user diagnostic.
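For example, a debugging run might set the variable inline (my_kernel.py is a placeholder file name):

```
ALLO_SHOW_COMPILER_TRACEBACK=1 python my_kernel.py
```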
The new frontend intentionally rejects unsupported Python early and reports the source location. The most important restrictions are:

- return is not supported inside loops or nested if statements.
- break, continue, loop else blocks, arbitrary Python calls, attribute assignment, chained assignment such as a = b = c, and multi-way comparisons are not supported.
- constexpr variables must be explicitly annotated, initialized at declaration, and never reassigned.
- ... shapes, tensor methods such as .T and .copy(), and bit-range indexing are not supported in the current new frontend.

The upstream frontend in docs/source/dive/frontend_syntax.rst, tests/, and
examples/ describes the older API. It is useful as historical context, but the
new frontend should be documented from test/, allo/exp, and example/.
| Area | Older upstream frontend | New frontend |
|---|---|---|
| Kernel entry | Plain Python function passed to allo.customize | Function decorated with @kernel |
| Import style | from allo.ir.types import int32, float32, ConstExpr | from allo import i32, f32, constexpr, kernel |
| Shaped annotations | Often int32[32, 32] | String annotations such as "i32[32, 32]" |
| Compile-time constants | ConstExpr[...] | constexpr annotation and @consteval helpers |
| Templates | Old Python generic syntax and scheduler instantiation | Template("T"), @kernel(T), and kernel[i32] specialization |
| Kernel calls | Helper functions inside customized functions | Calls between @kernel functions, including nested kernels |
| Diagnostics | Many errors surfaced later or through Python exits | Frontend diagnostics point to source locations |
| Default shaped value | Old tensor/memref behavior from upstream schedule flow | Mutable buffer by default; tensor mode with KernelOptions(enable_tensor=True) |
Some old examples are not current syntax:
```python
# Old style
from allo.ir.types import int32

def gemm(A: int32[32, 32], B: int32[32, 32]) -> int32[32, 32]:
    ...

s = allo.customize(gemm)
```

The new form is:
```python
from allo import f32, kernel

@kernel
def gemm(A: "f32[32, 32]", B: "f32[32, 32]") -> "f32[32, 32]":
    C: "f32[32, 32]" = 0.0
    for i, j in allo.grid(32, 32):
        for k in range(32):
            C[i, j] += A[i, k] * B[k, j]
    return C
```

Features documented in the old frontend guide should not be copied into new
documentation unless they are implemented in the new frontend. In particular,
old meta_if/meta_for, dynamic float32[...] shapes, partial subviews,
general Python slicing, tensor attributes such as .T, .copy, and .reverse,
old fixed-point type attributes, and high-level neural-network library calls are
outside the currently documented new frontend surface.