

Original contents are licensed under CC BY-NC 4.0. All rights reserved © 2026 Kai.
Allo Frontend Syntax

Syntax reference for the new Allo Python frontend.

Sat May 09 2026
Tags: Allo, Frontend Syntax, Kernel, DSL
On this page
  • Frontend Syntax
    • Kernel Definition
    • Types and Annotations
    • Variables and Scope
    • Loops
    • Conditionals
    • Expressions and Operators
    • Indexing and Memory Access
    • Operator Namespaces
    • Compile-Time Features
    • Diagnostics
    • Common Restrictions
    • Differences from the Upstream Frontend

Frontend Syntax

This document describes the new Allo frontend syntax. The implementation is currently staged under allo.exp, but this document uses the intended top-level API:

import allo
from allo import f32, i32, kernel

The Python frontend is a restricted Python-embedded DSL (eDSL). It uses Python syntax for readability, but only the constructs described here are part of the kernel language.

Kernel Definition

Allo kernels are Python functions decorated with @kernel. Every parameter must have a type annotation.

from allo import f32, i32, kernel

@kernel
def saxpy(a: f32, x: "f32[16]", y: "f32[16]", out: "f32[16]"):
    for i in range(16):
        out[i] = a * x[i] + y[i]

Scalar annotations use type objects directly. Shaped annotations use strings:

@kernel
def scalar_add(x: i32, y: i32) -> i32:
    return x + y

@kernel
def vector_add(x: "i32[16]", y: "i32[16]") -> "i32[16]":
    out: "i32[16]" = 0
    for i in range(16):
        out[i] = x[i] + y[i]
    return out

Functions with no return value can omit the return annotation or use -> None. Returning a value requires an explicit return annotation.

@kernel
def fill(out: "i32[4]"):
    for i in range(4):
        out[i] = i

@kernel
def no_result(out: "i32[4]") -> None:
    return

Multiple return values are written as tuple annotations.

@kernel
def split_pair(x: i32, y: f32) -> (i32, f32):
    return x + 1, y + 1.0

@kernel
def caller(x: i32, y: f32, out: "f32[1]"):
    lhs, rhs = split_pair(x, y)
    out[0] = rhs + lhs

Return placement is intentionally restricted. A return may appear at the top level of the kernel body or in a first-level if/else branch. Returns inside loops and nested if statements are rejected.

@kernel
def choose(cond: allo.bool, x: i32, y: i32) -> i32:
    if cond:
        return x
    return y

Kernels can define nested kernels as local helper kernels. A nested kernel must be declared at the top level of the enclosing kernel body, must use exactly one @kernel decorator, and can be called like any other kernel.

@kernel
def outer(x: i32, out: "i32[1]"):
    @kernel
    def add_one(v: i32) -> i32:
        return v + 1

    out[0] = add_one(x)

Nested kernel definitions are not allowed inside if, for, grid, or while bodies. Recursive kernel calls are also rejected, including indirect recursion across multiple top-level or nested kernels.
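As a sketch of code the frontend refuses to compile, a direct self-call is reported as recursion:

```python
@kernel
def bad_recursion(n: i32) -> i32:
    # Rejected: a kernel may not call itself, directly or indirectly.
    return bad_recursion(n - 1)
```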

Types and Annotations

The scalar types are:

Category            Types
Signed integers     i2 through i16, plus i32, i64, i128, i256
Unsigned integers   u1 through u16, plus u32, u64, u128, u256
Floating point      f16, f32, f64, bf16
Special             index, bool, constexpr

bool is an alias for u1. index is the preferred type for loop indices and values used as dynamic indices.
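For instance, an index parameter can subscript a buffer directly (a minimal sketch, assuming index is usable as a scalar parameter type):

```python
from allo import i32, index, kernel

@kernel
def gather_one(data: "i32[16]", pos: index, out: "i32[1]"):
    # pos is an index value used as a dynamic subscript
    out[0] = data[pos]
```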

Use apint(width, signed=False) to define custom integer widths beyond the predefined aliases. Unsigned custom integers are the default; pass signed=True for signed integers.

from allo import apint, kernel

u17 = apint(17)
i23 = apint(23, signed=True)

@kernel
def custom_width(x: u17, y: i23, out: "u17[1]"):
    out[0] = x + y

Shaped values are written as "dtype[shape]". A rank-0 shaped value uses an empty shape list.

@kernel
def shapes(a: "f32[8]", b: "i32[4, 4]", acc: "f32[]"):
    acc[()] = a[0] + b[0, 0]

Shape expressions are compile-time integer expressions. They may use integer literals, visible constants, template parameters, unary +/-, and the binary operators +, -, *, and //.

M = 4
N = 8

@kernel
def reshape_like(inp: "i32[M * N]", out: "i32[M, N]"):
    for i, j in allo.grid(M, N):
        out[i, j] = inp[i * N + j]

By default, shaped annotations describe mutable buffers. With KernelOptions(enable_tensor=True), the same annotation syntax describes MLIR tensors.

from allo import KernelOptions

@kernel(options=KernelOptions(enable_tensor=True))
def tensor_add(x: "f32[4]", y: "f32[4]") -> "f32[4]":
    return x + y

Variables and Scope

Annotated assignments declare variables.

@kernel
def declarations(x: i32, out: "i32[4]"):
    base: i32 = x
    tmp: "i32[4]" = 0
    for i in range(4):
        tmp[i] = base + i
        out[i] = tmp[i]

Shaped locals may be declared without an initializer. This allocates a local buffer in the default mode or an empty tensor in tensor mode.

@kernel
def local_buffer(out: "i32[4]"):
    buf: "i32[4]"
    for i in range(4):
        buf[i] = i
        out[i] = buf[i]

Scalar variables must be initialized when declared. A runtime local can also be introduced by assigning an existing runtime value.

@kernel
def inferred_local(cond: allo.bool, x: i32, y: i32, out: "i32[1]"):
    v = x
    if cond:
        v = y
    else:
        v = x + y
    out[0] = v

Compile-time variables must be declared with constexpr. They are evaluated during compilation and cannot be reassigned.

from allo import constexpr

@kernel
def constexpr_bound(out: "i32[4]"):
    N: constexpr = 4
    for i in range(N):
        out[i] = i

List initializers are supported for shaped values when every element is a compile-time int or float. The list shape must exactly match the annotation.

@kernel
def constants(out: "i32[2, 2]"):
    scale: constexpr = 3
    table: "i32[2, 2]" = [[1, scale], [scale + 1, scale + 2]]
    for i, j in allo.grid(2, 2):
        out[i, j] = table[i, j]

Allo uses block scope. Variables declared inside an if, for, grid, or while body are local to that block. Declare a variable before the block if it must be used afterward.

@kernel
def scoped(cond: allo.bool, x: i32, out: "i32[1]"):
    value: i32 = 0
    if cond:
        value = x
    else:
        value = x + 1
    out[0] = value

A name cannot be redeclared in the same scope. Values assigned to an existing variable are cast back to the variable's declared type.
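A sketch of the cast-back behavior on reassignment (the i64 parameter here is an illustration):

```python
@kernel
def cast_back(x: i32, y: allo.i64, out: "i32[1]"):
    v: i32 = x
    v = y          # the i64 value is cast back to v's declared type, i32
    out[0] = v
```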

Nested kernels follow the same scoping model, but their captures are deliberately limited. They may capture compile-time symbols from the enclosing scope: constexpr values, concrete types, type aliases, other kernels, consteval functions, Allo operators, and modules. They may not capture runtime values such as enclosing kernel parameters, local scalar variables, loop indices, or buffers. Pass runtime values explicitly as nested-kernel arguments.

@kernel
def captures(x: i32, out: "i32[1]"):
    offset: constexpr = 2
    T: constexpr = i32

    @kernel
    def add_offset(v: T) -> T:
        return v + offset

    out[0] = add_offset(x)

Loops

Both Python range and allo.range are supported in kernels. They accept the same one-, two-, or three-argument forms.

from allo import range as allo_range


@kernel
def ranges(out: "i32[20]"):
    for i in range(10):
        out[i] = i
    for i in range(10, 20):
        out[i] = i
    for i in allo_range(0, 20, 2):
        out[i] = i * 2

Loop bounds may depend on runtime values. Loop steps must be positive if they are not constexpr.

@kernel
def variable_bounds(a: "i32[10]", out: "i32[10]"):
    for i in range(10):
        for j in range(a[i], 10, a[i]):
            out[j] += i

allo.grid is a shorthand for a multidimensional parallel loop. It requires at least two dimensions, and the loop target must be a tuple with the same number of variables.

@kernel
def matmul(a: "f32[32, 32]", b: "f32[32, 32]") -> "f32[32, 32]":
    c: "f32[32, 32]" = 0.0
    for i, j in allo.grid(32, 32):
        for k in range(32):
            c[i, j] += a[i, k] * b[k, j]
    return c

Grid dimensions may also be written as (start, stop) or (start, stop, step) tuples.

@kernel
def strided_grid(out: "i32[8, 8]"):
    for i, j in allo.grid((0, 8, 2), (1, 8, 2)):
        out[i, j] = i + j

At the moment, grid does not support non-trivial loop-carried scalar dependencies. Use nested range loops when the loop body needs to update a scalar accumulator across iterations.

while loops are supported for runtime conditions. A while loop may update loop-carried scalar values.

@kernel
def count(out: "i32[1]"):
    i: i32 = 0
    acc: i32 = 0
    while i < 4:
        acc += i
        i += 1
    out[0] = acc

break, continue, for ... else, and while ... else are not supported.

Conditionals

Runtime if/elif/else statements lower to structured control flow. Variables declared outside the conditional can be assigned in either branch and used afterward.

@kernel
def classify(x: i32, y: i32) -> i32:
    result: i32 = 0
    if x == 0:
        result = 1
    elif y > x:
        result = 2
    else:
        result = 3
    return result

Conditions may use comparison operators, and, or, and not.

@kernel
def logic(a: "i32[3]", b: i32) -> i32:
    out: i32 = 0
    if a[0] > 0 and b < 0:
        out = 1
    elif a[1] <= 1 or not (a[2] == 3):
        out = 2
    return out

Ternary expressions lower to a select operation when the condition is runtime. At least one branch must be a runtime value so the result type can be inferred.

@kernel
def select(cond: allo.bool, x: i32, y: i32) -> i32:
    return x if cond else y

If a condition is a constexpr, the frontend evaluates the condition during compilation and only emits the selected branch.
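A sketch of compile-time branch selection; since MODE is constexpr, the comparison is evaluated during compilation and only the taken branch is emitted:

```python
@kernel
def folded(out: "i32[4]"):
    MODE: constexpr = 2
    for i in range(4):
        if MODE == 2:        # evaluated during compilation
            out[i] = i * i   # only this branch is emitted
        else:
            out[i] = i
```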

Expressions and Operators

The frontend supports the following Python operators.

Category     Operators
Arithmetic   +, -, *, /, //, %, **
Unary        +x, -x, ~x, not x
Comparison   ==, !=, <, <=, >, >=
Boolean      and, or
Bitwise      &, |, ^, <<, >>
Assignment   =, +=, -=, *=, /=, //=, %=, **=, &=, |=, ^=, <<=, >>=

Multi-way comparisons such as a < b < c are not supported; write them with and.

@kernel
def comparisons(a: i32, b: i32, c: i32) -> allo.bool:
    return a < b and b < c

The default typing_style is "hls", which uses HLS-oriented integer promotion. For example, an addition may widen internally and then cast back to the destination type. KernelOptions(typing_style="cpp") selects C++-style promotion rules.

@kernel(options=KernelOptions(typing_style="cpp"))
def cpp_style(x: allo.u32, y: i32, out: "u32[1]"):
    out[0] = x + y

min and max are supported as built-ins and lower to Allo arithmetic operators.

@kernel
def clamp(x: i32, lo: i32, hi: i32) -> i32:
    return min(max(x, lo), hi)

Only Allo kernels, Allo operators, and consteval functions may be called from inside a kernel. The static built-ins print and len are evaluated during compilation when their arguments are compile-time values.
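As a sketch, both built-ins fold away during compilation when their arguments are compile-time values; the print output appears while compiling, not at runtime:

```python
@kernel
def report(out: "i32[3]"):
    table: "i32[3]" = [1, 2, 3]
    N: constexpr = len([1, 2, 3])  # evaluated during compilation
    print("elements:", N)          # printed while the kernel is compiled
    for i in range(N):
        out[i] = table[i]
```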

Indexing and Memory Access

Shaped values use tuple-style indexing. The number of indices must match the rank.

@kernel
def copy_2d(src: "f32[4, 4]", dst: "f32[4, 4]"):
    for i, j in allo.grid(4, 4):
        dst[i, j] = src[i, j]

Rank-0 shaped values are indexed with ().

@kernel(options=allo.KernelOptions(enable_tensor=True))
def dot_scalar(a: "f32[4]", b: "f32[4]") -> f32:
    return allo.linalg.dot(a, b)[()]

Integer scalar values support single-bit extraction and insertion with subscript syntax.

@kernel
def bit_ops(x: allo.u32, out: "u1[1]"):
    out[0] = x[0]
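Bit insertion uses the same subscript syntax on the left-hand side. A sketch, which copies the parameter into a local before mutating it (the local copy is an assumption about scalar-parameter mutability):

```python
@kernel
def set_low_bit(x: allo.u32, out: "u32[1]"):
    v: allo.u32 = x
    v[0] = 1        # insert: set bit 0 of v
    out[0] = v
```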

Python slice indices such as A[0:4], partial subviews such as A[i] for a rank-2 buffer, and bit ranges such as x[0:4] are not part of the current new frontend.

Operator Namespaces

Python operators cover scalar arithmetic and shaped elementwise expressions. Explicit operator calls are useful when an operation needs an output accumulator.

@kernel
def memref_elementwise(x: "f32[4]", y: "f32[4]", out: "f32[4]"):
    allo.arith.add(x, y, acc=out)

Math operators include exp, exp2, log, log2, abs, pow, sqrt, rsqrt, sin, cos, tan, floor, ceil, and erf. They work on scalar values and on shaped values.

@kernel
def sigmoid(x: "f32[8]", out: "f32[8]"):
    for i in range(8):
        out[i] = 1.0 / (1.0 + allo.math.exp(-x[i]))
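Shaped usage can be written with an explicit output accumulator; this sketch assumes the math namespace follows the same acc= convention shown for allo.arith:

```python
@kernel
def vector_exp(x: "f32[8]", out: "f32[8]"):
    allo.math.exp(x, acc=out)  # elementwise exp written into out
```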

Linalg operators currently include matmul and dot. They support both default buffer mode and tensor mode. In buffer mode, pass an explicit acc= output because the operation writes into an existing buffer. In tensor mode, the same operation can return a tensor value directly.

@kernel(options=KernelOptions(enable_tensor=True))
def dense(a: "f32[2, 3]", b: "f32[3, 4]") -> "f32[2, 4]":
    return allo.linalg.matmul(a, b)

@kernel
def buffer_matmul(a: "f32[2, 3]", b: "f32[3, 4]", out: "f32[2, 4]"):
    allo.linalg.matmul(a, b, acc=out)

Compile-Time Features

Global Python int and float values are visible as compile-time constants.

SCALE = 3


@kernel
def add_scale(x: i32) -> i32:
    return x + SCALE

consteval marks a Python helper function that runs during compilation.

from allo import consteval


@consteval
def factor():
    return 3


@kernel
def use_factor(x: i32) -> i32:
    return x + factor()

Templates parameterize kernels over compile-time types and values. A templated kernel is not concrete until it is specialized with kernel[...].

from allo import Template, f32, i32

T = Template("T")
N = Template("N")

@kernel(T, N)
def fill_template(x: T, out: "T[N]"):
    for i in range(N):
        out[i] = x

fill_i32_4 = fill_template[i32, 4]

Template bindings must be provided before compilation or execution. Type templates can be used in scalar annotations and as the head of shaped string annotations. Integer templates can be used in shape expressions and loop bounds.

Templates are different from ordinary global aliases. A global alias such as T = i32 is an immediately chosen concrete type; every use of T in that kernel means i32, and callers cannot specialize it. A Template("T") is a delayed binding point that must be supplied by the caller.

FixedT = i32

@kernel
def fixed_alias(x: FixedT, out: "FixedT[4]"):
    for i in range(4):
        out[i] = x

T = Template("T")

@kernel(T)
def delayed_type(x: T, out: "T[4]"):
    for i in range(4):
        out[i] = x


delayed_i32 = delayed_type[i32]
delayed_f32 = delayed_type[f32]

Diagnostics

The new frontend reports compilation errors in a clang-like style. Diagnostics include the source file, line, column, error message, the relevant source line, and a caret span pointing at the AST node that triggered the error.

For example, an undefined name in:

def broken(x):
    return x + y

is rendered as:

broken.py:11:16: error: Name 'y' is not defined
11 |     return x + y
   |                ^

The same format is used for kernel syntax errors such as missing annotations, unsupported control flow, return type mismatches, illegal captures, or invalid operator calls. When an error occurs while compiling a called kernel or nested kernel, the diagnostic message is wrapped with call context so the caller and callee relationship is visible.

Source locations are based on Python source inspection. They are reliable for kernels defined in normal .py files. In a REPL, notebook, python -c, or other dynamically generated context, Python may not expose stable source lines; Allo will still report the error, but file names and line numbers can be missing or inaccurate. For compiler debugging, set ALLO_SHOW_COMPILER_TRACEBACK=1 to keep the full Python traceback instead of the shortened user diagnostic.
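For example, to debug a failing compilation with the full Python traceback (my_kernels.py is a placeholder file name):

```shell
ALLO_SHOW_COMPILER_TRACEBACK=1 python my_kernels.py
```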

Common Restrictions

The new frontend intentionally rejects unsupported Python early and reports the source location. The most important restrictions are:

  • All kernel parameters require explicit annotations.
  • Returning a value requires an explicit return annotation.
  • return is not supported inside loops or nested if statements.
  • break, continue, loop else blocks, arbitrary Python calls, attribute assignment, chained assignment such as a = b = c, and multi-way comparisons are not supported.
  • constexpr variables must be explicitly annotated, initialized at declaration, and never reassigned.
  • Runtime values from an outer kernel scope cannot be captured by nested kernels.
  • Recursive kernel calls, including indirect recursion through nested kernels, are not supported.
  • Python slices, partial tensor subviews, dynamic ... shapes, tensor methods such as .T and .copy(), and bit-range indexing are not supported in the current new frontend.

Differences from the Upstream Frontend

The upstream frontend described in docs/source/dive/frontend_syntax.rst, tests/, and examples/ uses the older API. It is useful as historical context, but new-frontend documentation should be written from test/, allo/exp, and example/.

  • Kernel entry. Older: a plain Python function passed to allo.customize. New: a function decorated with @kernel.
  • Import style. Older: from allo.ir.types import int32, float32, ConstExpr. New: from allo import i32, f32, constexpr, kernel.
  • Shaped annotations. Older: often int32[32, 32]. New: string annotations such as "i32[32, 32]".
  • Compile-time constants. Older: ConstExpr[...]. New: the constexpr annotation and @consteval helpers.
  • Templates. Older: old Python generic syntax and scheduler instantiation. New: Template("T"), @kernel(T), and kernel[i32] specialization.
  • Kernel calls. Older: helper functions inside customized functions. New: calls between @kernel functions, including nested kernels.
  • Diagnostics. Older: many errors surfaced later or through Python exits. New: frontend diagnostics point to source locations.
  • Default shaped value. Older: tensor/memref behavior from the upstream schedule flow. New: a mutable buffer by default; tensor mode with KernelOptions(enable_tensor=True).

Older examples like the following are no longer valid syntax:

# Old style
from allo.ir.types import int32

def gemm(A: int32[32, 32], B: int32[32, 32]) -> int32[32, 32]:
    ...

s = allo.customize(gemm)

The new form is:

from allo import f32, kernel

@kernel
def gemm(A: "f32[32, 32]", B: "f32[32, 32]") -> "f32[32, 32]":
    C: "f32[32, 32]" = 0.0
    for i, j in allo.grid(32, 32):
        for k in range(32):
            C[i, j] += A[i, k] * B[k, j]
    return C

Features documented in the old frontend guide should not be copied into new documentation unless they are implemented in the new frontend. In particular, old meta_if/meta_for, dynamic float32[...] shapes, partial subviews, general Python slicing, tensor attributes such as .T, .copy, and .reverse, old fixed-point type attributes, and high-level neural-network library calls are outside the currently documented new frontend surface.