Basic Elements in MLIR

An introduction to the basic elements and concepts in MLIR, including operations, types, and dialects.

Thu Oct 30 2025

Fri Jan 02 2026

MLIR, Compiler, Introduction

Basic Elements in MLIR

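This chapter introduces the basic data objects in MLIR. If you have worked with LLVM before, you will find that many MLIR concepts closely resemble their LLVM counterparts.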

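Operation

Operations are the most basic building blocks of an MLIR program, and every other concept is built on top of them. Each operation represents one unit of computation or control flow, similar to an instruction in LLVM. Unlike LLVM, though, even a function definition such as func.func is itself an operation. For example: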

%0 = arith.addi %a, %b : i32
%alloc_0 = memref.alloc() : memref<10xi32>
func.func @my_function(%arg0: i32, %arg1: i32) -> i32 {
    %0 = arith.addi %arg0, %arg1 : i32
    return %0 : i32
}

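Each operation takes zero or more inputs and produces zero or more outputs. For example, arith.addi takes two inputs, %a and %b, and produces their sum, %0. In MLIR, the inputs of an operation are called operands and its outputs are called results.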

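Operands are usually values that are known only at run time. Values that are already known at compile time are attached to an operation as attributes. For example: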

%c0 = arith.constant 42 : i32

The arith.constant operation materializes a literal constant. Here 42 is an attribute, fixed at compile time, and the operation has no operands at all.

In most cases the attributes of an operation are printed in its attr-dict, a list of key-value pairs enclosed in braces. For example:

%alloc = memref.alloc() {alignment = 64 : i64} : memref<10xi32>

Here the alignment attribute specifies the alignment of the allocated memory block.
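
To make the split between operands, results, and attributes concrete, here is a minimal sketch in plain C++. ToyOperation and the three make* helpers are hypothetical names invented for this illustration; they are not part of the MLIR API, and attributes are simplified to integers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an operation. This is NOT mlir::Operation; it only
// mirrors the three kinds of data discussed above.
struct ToyOperation {
  std::string name;                           // e.g. "arith.addi"
  std::vector<std::string> operands;          // SSA names consumed (run-time values)
  std::vector<std::string> results;           // SSA names produced
  std::map<std::string, std::int64_t> attrs;  // compile-time constants
};

// %0 = arith.addi %a, %b : i32
inline ToyOperation makeAddi() {
  return {"arith.addi", {"%a", "%b"}, {"%0"}, {}};
}

// %c0 = arith.constant 42 : i32
inline ToyOperation makeConstant() {
  return {"arith.constant", {}, {"%c0"}, {{"value", 42}}};
}

// %alloc = memref.alloc() {alignment = 64 : i64} : memref<10xi32>
inline ToyOperation makeAlloc() {
  return {"memref.alloc", {}, {"%alloc"}, {{"alignment", 64}}};
}
```

In this model the constant has an empty operand list and carries 42 purely as an attribute, which is the distinction the text draws.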

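In the C++ API, the two central classes for operations are mlir::Operation and mlir::Op. mlir::Operation models every kind of operation generically and provides the generic interfaces. mlir::Op is a class template that serves as the base of the op-specific classes; it does not derive from mlir::Operation but wraps an Operation *, much like a non-owning smart pointer. Every specific kind of operation has a class derived from mlir::Op, and the derived class adds interfaces specific to that operation. For example: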

Operation *op = ...; // Assume this is an arith.constant operation
auto result = op->getResult(0); // Get the first result of the operation
auto constOp = cast<arith::ConstantOp>(op); // Cast to the specific operation type
auto value = constOp.getResult(); // Get the result of the operation
assert(result == value); // They should be the same

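Because arith.constant always has exactly one result, you can obtain it with Operation::getResult(0). Equally, you can cast the Operation * to arith::ConstantOp and call its getResult() method. The latter is just a wrapper around the former, but it is more descriptive.

Here is another example: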

// given a store op: 
// memref.store %value, %ptr[%index] : memref<10xi32>
Operation *op = ...;
auto value1 = op->getOperand(0); // Get the first operand of the operation
auto storeOp = cast<memref::StoreOp>(op); // Cast to the specific operation type
auto value2 = storeOp.getValueToStore(); // Get the value to store
assert(value1 == value2); // They should be the same

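In MLIR, the derived class for each kind of operation is a lightweight wrapper: it does not own the underlying operation object and merely refers to it. Such wrapper objects can therefore be created and destroyed freely without affecting the underlying operation. mlir::Operation is different: it stores the actual data of each operation in the IR, cannot be copied or assigned, and can only be used through a reference or a pointer.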

arith::ConstantOp constOp1 = nullptr;
arith::ConstantOp constOp2 = ...;
constOp1 = constOp2; // OK, now constOp1 points to the same operation as constOp2
Operation op1 = ...; 
// Error: use of deleted function 'mlir::Operation::Operation(const mlir::Operation&)'
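
The handle-versus-storage split can be imitated in a few lines of standard C++. The sketch below is only a model of the idea, under the assumption that an op class is essentially a pointer plus a kind check; ToyOperation, ToyConstantOp, and toyDynCast are hypothetical names and do not reproduce the real mlir::Op machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Toy stand-in for mlir::Operation: owns the data and cannot be copied.
class ToyOperation {
public:
  explicit ToyOperation(std::string name) : name_(std::move(name)) {}
  ToyOperation(const ToyOperation &) = delete;
  ToyOperation &operator=(const ToyOperation &) = delete;
  const std::string &getName() const { return name_; }

private:
  std::string name_;
};

// Toy stand-in for an op class such as arith::ConstantOp: a non-owning
// handle that stores only a pointer, so copying it is cheap and harmless.
class ToyConstantOp {
public:
  ToyConstantOp(std::nullptr_t = nullptr) {}
  explicit ToyConstantOp(ToyOperation *op) : op_(op) {}

  // Kind check used by the checked cast below.
  static bool classof(const ToyOperation *op) {
    return op->getName() == "arith.constant";
  }

  ToyOperation *getOperation() const { return op_; }
  explicit operator bool() const { return op_ != nullptr; }

private:
  ToyOperation *op_ = nullptr;
};

// Checked downcast in the spirit of dyn_cast: yields a null handle on mismatch.
inline ToyConstantOp toyDynCast(ToyOperation *op) {
  return ToyConstantOp::classof(op) ? ToyConstantOp(op) : ToyConstantOp();
}
```

Copying a ToyConstantOp copies just the pointer, so both handles refer to the same ToyOperation, while copying the ToyOperation itself is rejected at compile time.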

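Value

In MLIR, the operands and results of operations, as well as function arguments and return values, are all values. As in LLVM, every value is in SSA (Static Single Assignment) form: each value is defined exactly once but can be used any number of times. For example: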

%0 = arith.addi %a, %b : i32
%1 = arith.muli %0, %c : i32

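SSA makes it easy to trace where a value comes from and where it is used. MLIR maintains the use-def chains of all values internally.

A value in MLIR can originate in only two ways: it is either the result of an operation or a block argument. For example: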

%0 = arith.addi %a, %b : i32 // %0 is defined by the arith.addi operation
^bb0(%arg0: i32, %arg1: i32): // %arg0 and %arg1 are block arguments
    %1 = arith.muli %arg0, %arg1 : i32 // %1 is defined by the arith.muli op
    return %1 : i32

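In the C++ API, values are represented by mlir::Value, a lightweight wrapper that holds a detail::ValueImpl * pointing to the actual implementation object. mlir::Value objects can therefore be copied and assigned freely. The class provides some commonly used interfaces: Value::getDefiningOp() returns the operation that defines the value and, following the discussion above, returns nullptr when the value is a block argument; Value::getUsers() returns a range over all operations that use the value; Value::getType() returns the type of the value.

Note that the users returned by Value::getUsers() are not guaranteed to be ordered by dominance. To visit them in dominance order, you may need to sort them, for example with topologicalSort, or analyze them with DominanceInfo.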
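
The two origins of a value and the use list can be modeled with ordinary pointers. This is a simplified sketch: ToyValue, ToyOp, addOperand, and isBlockArgument are hypothetical names for illustration, and the real use-def chain in MLIR is implemented differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyOp;

// Toy value: defined either by an op (definingOp != nullptr) or as a block
// argument (definingOp == nullptr). It also records every op that uses it.
struct ToyValue {
  std::string name;
  ToyOp *definingOp = nullptr;
  std::vector<ToyOp *> users;
};

struct ToyOp {
  std::string name;
  std::vector<ToyValue *> operands;
};

// Add `value` as an operand of `op` and record `op` as a user of `value`,
// so both directions of the chain stay in sync.
inline void addOperand(ToyOp &op, ToyValue &value) {
  op.operands.push_back(&value);
  value.users.push_back(&op);
}

// Mirrors the rule from the text: no defining op means block argument.
inline bool isBlockArgument(const ToyValue &value) {
  return value.definingOp == nullptr;
}
```

Because every value has exactly one definition, walking from an operand to its definingOp and from a value to its users needs no search, which is what makes SSA convenient for analysis.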

Type

Every value has a type, which describes how the value is stored and what it means. For example:

%0 = arith.addi %a, %b : i32 // %0 is of type i32
%1 = memref.alloc() : memref<10xi32> // %1 is of type memref<10xi32>

Here i32 is a scalar type, a 32-bit integer, while memref<10xi32> is a memory reference type that describes a buffer of ten 32-bit integers.

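Common types include, but are not limited to:

IndexType, the index type. Subscripts used to address memory locations usually have this type.
IntegerType and FloatType, which are what their names suggest.
MemRefType, the memory reference type. It describes a region of memory, including its shape and the type of the elements stored in it.
FunctionType, the function type. It stores the types of all arguments and results of a function.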

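In the C++ API, types are represented by mlir::Type, a lightweight wrapper that holds a pointer to the TypeStorage object that actually implements the type. mlir::Type provides many is-style methods, such as isIndex() and isF32(), that test whether a type is a particular concrete type.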

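MLIR also provides the class template TypedValue<>, which represents a value of a specific type. Its getType() returns the concrete type directly, so type-specific interfaces can be called without an extra cast. For example: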

memref::StoreOp storeOp = ...; // Assume this is a memref.store operation
auto memref = storeOp.getMemref(); // inferred as TypedValue<MemRefType>
auto shape = memref.getType().getShape(); // getType() is already a MemRefType, no cast needed
auto elementType = memref.getType().getElementType(); // element type of the memref
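
The idea behind a typed value is small enough to sketch in standard C++: the typed wrapper stores the same data as the untyped one and only narrows the return type of getType(). All names here (ToyType, ToyMemRefType, ToyValue, ToyTypedValue) are hypothetical and the code assumes the caller already knows the value has that type; it is not how MLIR implements TypedValue.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy generic type and one concrete "memref-like" type.
struct ToyType {
  std::string name;
};

struct ToyMemRefType : ToyType {
  std::vector<std::int64_t> shape;
  std::string elementType;

  ToyMemRefType(std::vector<std::int64_t> dims, std::string elem)
      : ToyType{"memref"}, shape(std::move(dims)), elementType(std::move(elem)) {}
};

// Untyped value: getType() only yields the generic base type.
struct ToyValue {
  const ToyType *type = nullptr;
  const ToyType *getType() const { return type; }
};

// Typed value: same data, but getType() returns the concrete type, so
// callers reach shape and elementType without writing a cast themselves.
// Assumes `type` really points to a Ty; nothing checks that in this sketch.
template <typename Ty>
struct ToyTypedValue : ToyValue {
  const Ty *getType() const { return static_cast<const Ty *>(type); }
};
```

With a ToyTypedValue<ToyMemRefType> named v, the expression v.getType()->shape compiles directly, whereas a plain ToyValue would first need a downcast.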

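Attribute

Attributes are values that are already fixed at compile time; they usually describe static properties of an operation. There are many kinds of attributes, including integer attributes (IntegerAttr), floating-point attributes (FloatAttr), string attributes (StringAttr), array attributes (ArrayAttr), dictionary attributes (DictionaryAttr), affine map attributes (AffineMapAttr), and so on.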

%alloc = memref.alloc() {alignment = 64 : i64} : memref<10xi32>
%c0 = arith.constant 42 : i32 // 42 is an IntegerAttr
func.func @my_function(%arg0: i32) attributes { "kernel" } { // a UnitAttr
    // function body
}

UnitAttr is a somewhat special attribute: it carries no value and only signals whether some property is present. In the function definition above, the "kernel" attribute is a UnitAttr.

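In the C++ API, attributes are represented by mlir::Attribute, which provides some commonly used interfaces. For example, getType() returns the type of an attribute; in recent MLIR versions this method lives on typed attributes such as IntegerAttr (through the TypedAttr interface) rather than on mlir::Attribute itself.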
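
A unit attribute is easiest to understand next to attributes that do carry a payload. The following sketch models an attribute dictionary with a tagged struct; ToyAttr, ToyAttrDict, and hasUnitAttr are hypothetical names, and real MLIR attributes are uniqued, immutable objects rather than plain structs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Toy attribute: a tagged value. Kind::Unit carries no payload at all.
struct ToyAttr {
  enum class Kind { Unit, Integer, String };

  Kind kind = Kind::Unit;
  std::int64_t intValue = 0;
  std::string strValue;

  static ToyAttr unit() { return ToyAttr(); }
  static ToyAttr integer(std::int64_t v) {
    ToyAttr a;
    a.kind = Kind::Integer;
    a.intValue = v;
    return a;
  }
  static ToyAttr string(std::string s) {
    ToyAttr a;
    a.kind = Kind::String;
    a.strValue = std::move(s);
    return a;
  }
};

// Toy attr-dict: attribute name -> attribute.
using ToyAttrDict = std::map<std::string, ToyAttr>;

// A unit attribute answers a yes/no question purely by being present.
inline bool hasUnitAttr(const ToyAttrDict &dict, const std::string &key) {
  auto it = dict.find(key);
  return it != dict.end() && it->second.kind == ToyAttr::Kind::Unit;
}
```

In this model, marking a function as a kernel is just dict["kernel"] = ToyAttr::unit(), and asking whether it is a kernel is a lookup with no value to inspect.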

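Block

A block is an ordered list of operations, similar to a basic block in LLVM. Blocks are the basic unit of control flow: each block has a single entry point and a single exit point. Operations within a block execute sequentially, and blocks are connected by control-flow operations to form the control flow graph (CFG) of the program. For example: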

func.func @max(%arg0: i32, %arg1: i32) -> i32 {
    %cmp = arith.cmpi sgt, %arg0, %arg1 : i32
    cf.cond_br %cmp, ^bb1, ^bb2
    ^bb1:
    return %arg0 : i32
    ^bb2:
    return %arg1 : i32
}

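A func.func operation with a body always has an entry block, but its label is implicit when printed, so no ^bb0 appears in the text. Here the entry block contains the comparison arith.cmpi and the conditional branch cf.cond_br, which jumps to one of two blocks, ^bb1 or ^bb2, depending on the result of the comparison. Each of those blocks contains a return operation that returns a different result.

A block can also have any number of block arguments, which can be used inside the block that owns them. For example: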

^bb0(%arg0: i32, %arg1: i32):
    %cmp = arith.cmpi sgt, %arg0, %arg1 : i32
    cf.cond_br %cmp, ^bb1, ^bb2

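Note that block arguments have no defining operation: they are passed in by the predecessors of the block through control-flow operations (or, for an entry block, supplied by the enclosing operation). Calling getDefiningOp() on a block argument therefore returns nullptr.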
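
To see how block arguments receive their values from a branch, here is a tiny CFG interpreter in standard C++. It is a sketch under loud assumptions: ToyBranch, ToyBlock, runToyCfg, and sumBelow are hypothetical names, values are plain ints, a branch target of -1 stands for "return", and the pseudo-IR in the comment is illustrative rather than valid MLIR.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// A branch names its successor and the values forwarded to that block's
// arguments. A target of -1 stands for "return" in this sketch.
struct ToyBranch {
  int target;
  std::vector<int> args;
};

// A toy block receives its block arguments and ends by producing a branch.
using ToyBlock = std::function<ToyBranch(const std::vector<int> &)>;

// Start at block 0 and follow branches until some block returns.
inline int runToyCfg(const std::vector<ToyBlock> &blocks,
                     std::vector<int> entryArgs) {
  ToyBranch next{0, std::move(entryArgs)};
  while (next.target >= 0)
    next = blocks[next.target](next.args);
  assert(next.args.size() == 1); // a return carries exactly one value here
  return next.args[0];
}

// Pseudo-IR modeled below:
//   ^bb0(%n):        br ^bb1(0, 0)
//   ^bb1(%i, %acc):  if %i < %n: br ^bb1(%i + 1, %acc + %i)
//                    else:       return %acc
inline int sumBelow(int n) {
  std::vector<ToyBlock> blocks;
  blocks.push_back(
      [](const std::vector<int> &) { return ToyBranch{1, {0, 0}}; });
  blocks.push_back([n](const std::vector<int> &a) {
    if (a[0] < n)
      return ToyBranch{1, {a[0] + 1, a[1] + a[0]}};
    return ToyBranch{-1, {a[1]}};
  });
  return runToyCfg(blocks, {n});
}
```

The arguments %i and %acc of the second block are never "defined" by any operation inside it; each time the block is entered they take whatever values the incoming branch forwarded, which plays the role that phi nodes play in LLVM IR.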

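In the C++ API, blocks are represented by mlir::Block and are usually handled through a Block *. The class provides some commonly used interfaces: Block::getOperations() returns a container of all operations in the block, and Block::getArguments() returns a container of all its block arguments. Given a BlockArgument, BlockArgument::getOwner() returns the block it belongs to.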

Block *block = ...; // Assume this is a valid block pointer
for (auto &op : block->getOperations()) {
    op.dumpPretty(); // print each operation in the block
}
func::FuncOp funcOp = ...; // Assume this is a valid function operation
Block &entryBlock = funcOp.getBody().front();
for (auto arg : entryBlock.getArguments()) {
    // do something with each block argument
}

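Region

A region is a list of blocks, similar to the body of a function in LLVM. The blocks inside it are connected by control-flow operations. In the textual form of MLIR, whatever is enclosed in a pair of braces {} after an operation generally forms a region. Many operations have exactly one region, such as the function definition func.func and the loop affine.for. A few operations have more than one; the most common are the conditionals scf.if and affine.if, each of which has two regions, one for the then branch and one for the else branch.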

// only one region
affine.for %i0 = 0 to 10 {
    // loop body
}
func.func @add(%arg0: i32, %arg1: i32) -> i32 {
    %0 = arith.addi %arg0, %arg1 : i32
    return %0 : i32
}
// two regions
scf.if %cond {
    // then branch
} else {
    // else branch
}

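In the C++ API, regions are represented by mlir::Region and are usually handled through a Region * or a reference. The class provides some commonly used interfaces: Region::getBlocks() returns a container of all blocks in the region, Region::front() and Region::back() return its first and last block, and Region::getOps() returns a range for iterating over the operations in the blocks of the region.

In MLIR, regions are always attached to operations. The IR nests as follows: an operation holds zero or more regions, each region contains blocks, each block contains operations, and those operations can in turn hold regions, and so on, forming a tree. One special operation is therefore ModuleOp, which is the root operation of the whole IR and holds the top-level region.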
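
The operation, region, block, operation nesting is a plain recursive tree, which the following sketch models with standard containers. ToyOp, ToyRegion, ToyBlock, and the helper functions are hypothetical names for illustration only; the recursive count is a simplified analogue of walking the IR, not the MLIR walk API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct ToyOp;

// Block -> ops, region -> blocks, op -> regions. (std::vector of a type
// that is still incomplete at this point is permitted since C++17.)
struct ToyBlock {
  std::vector<ToyOp> ops;
};
struct ToyRegion {
  std::vector<ToyBlock> blocks;
};
struct ToyOp {
  std::string name;
  std::vector<ToyRegion> regions;
};

// An op without regions, e.g. arith.addi.
inline ToyOp leaf(std::string name) {
  ToyOp op;
  op.name = std::move(name);
  return op;
}

// An op with one region that holds one block containing `body`.
inline ToyOp withBody(std::string name, std::vector<ToyOp> body) {
  ToyBlock block;
  block.ops = std::move(body);
  ToyRegion region;
  region.blocks.push_back(std::move(block));
  ToyOp op = leaf(std::move(name));
  op.regions.push_back(std::move(region));
  return op;
}

// Count `root` plus every operation nested anywhere below it.
inline std::size_t countOps(const ToyOp &root) {
  std::size_t n = 1;
  for (const ToyRegion &region : root.regions)
    for (const ToyBlock &block : region.blocks)
      for (const ToyOp &op : block.ops)
        n += countOps(op);
  return n;
}

// module { func.func { arith.addi; func.return } }
inline ToyOp buildToyModule() {
  ToyOp func = withBody("func.func", {leaf("arith.addi"), leaf("func.return")});
  return withBody("builtin.module", {func});
}
```

Calling countOps(buildToyModule()) visits the module, the function, and the two operations in the function body, so the root operation is the single entry point to the whole tree.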

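Dialect

A dialect is not a basic data object, but I decided to cover it here as well. Dialects are the mechanism MLIR uses to organize and extend operations, types, and attributes. A dialect is a logical grouping rather than a concrete data object.

The MLIR core library defines many commonly used dialects; you can think of each as a "domain-specific language extension". For example: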

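The memref dialect defines operations and types for memory references: memref.alloc allocates heap memory, memref.alloca allocates stack memory, and memref.load and memref.store read from and write to memory.
The arith dialect defines operations for arithmetic: for example, arith.addi performs integer addition and arith.muli performs integer multiplication.
The func dialect defines operations for defining and calling functions: func.func defines a function and func.call calls one.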

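There are many more core dialects. Each has its own set of operations, types, and attributes, usually designed to serve one particular class of needs. This makes MLIR easy to extend and customize: by combining existing dialects, or by defining your own, you can build an intermediate representation tailored to a specific purpose.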
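
In the textual IR, the dialect an operation belongs to is visible as the prefix of its name, as in memref.alloc or arith.addi. The helper below is a small sketch of that naming convention; splitOpName is a hypothetical function written for this illustration, not an MLIR API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Split "dialect.op" at the first '.', e.g. "memref.alloc" becomes
// {"memref", "alloc"}. A name without any '.' yields an empty dialect.
inline std::pair<std::string, std::string>
splitOpName(const std::string &full) {
  std::string::size_type dot = full.find('.');
  if (dot == std::string::npos)
    return {"", full};
  return {full.substr(0, dot), full.substr(dot + 1)};
}
```

Grouping operations by the first component of this pair is one simple way to see which dialects a piece of IR currently mixes.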

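A key idea in MLIR is progressive lowering, and it is also the biggest difference from LLVM. In a traditional LLVM-based compiler, the source program is broken down directly into basic atomic instructions ("atomic" in the sense of indivisible, not in the sense of atomic operations in synchronization), and only then optimized and turned into machine code. This throws away the high-level semantics of the program all at once, and recovering them from atomic instructions is very hard. MLIR is different: dialects let you define operations at different levels of abstraction. Take a loop as an example: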

// in MLIR
affine.for %i = 0 to 10 {}

; in LLVM IR, this might look as follows (i64 is assumed for the index):
entry:
    br label %loop_header
loop_header:
    %i = phi i64 [ 0, %entry ], [ %next_i, %loop_body ]
    %cmp = icmp slt i64 %i, 10
    br i1 %cmp, label %loop_body, label %loop_exit
loop_body:
    ; loop body
    %next_i = add i64 %i, 1
    br label %loop_header
loop_exit:

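Compared side by side, the MLIR form lets the compiler recognize the high-level semantics of the program faster and more reliably, and pick the right optimization strategy. MLIR also lets you define rules for converting between dialects (conversion), so that high-level dialects are lowered step by step to low-level ones until target code is generated. For example: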

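In the MLIR core library, the linalg and tensor dialects define high-level linear algebra and tensor operations such as convolution and transpose. The compiler first optimizes at this level of abstraction: for example, a transpose of a transpose can be eliminated outright, and elementwise operations can be fused.
These high-level operations are then lowered step by step to lower-level ones, for example in the affine and memref dialects. At this level the compiler can optimize memory accesses, vectorize for the target machine, tile loops, and so on.
The IR is then converted to the LLVM dialect, which mirrors LLVM IR, and finally handed to the LLVM backend to generate machine code for the CPU.
For a different target platform you can lower to a different low-level dialect: for example, the NVVM dialect for NVIDIA GPUs and the ROCDL dialect for AMD GPUs.
Even for special platforms such as FPGAs, you can define your own dialects and lower step by step from the level of affine, arith, and scf down to RTL-level dialects, and finally generate HDL code (see llvm/circt).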
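
The benefit of optimizing before lowering can be shown with a deliberately tiny model. In the sketch below a program is just a list of op names applied to one 2-D tensor; ToyPipeline, foldDoubleTranspose, lowerToLoops, and the op name toy.relu are hypothetical, and the three-op expansion per high-level op is an assumption chosen only to make the size difference visible. It is not how the real linalg lowering works.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A toy program: op names applied in order to a single 2-D tensor.
using ToyPipeline = std::vector<std::string>;

// High-level rewrite: two adjacent transposes of a 2-D tensor cancel out.
inline ToyPipeline foldDoubleTranspose(const ToyPipeline &ops) {
  ToyPipeline out;
  for (const std::string &op : ops) {
    if (op == "linalg.transpose" && !out.empty() &&
        out.back() == "linalg.transpose")
      out.pop_back(); // drop the pair
    else
      out.push_back(op);
  }
  return out;
}

// Toy lowering: every high-level op expands into a loop with a load and a
// store. After this step the word "transpose" no longer appears anywhere,
// so the cancellation above would be much harder to spot.
inline ToyPipeline lowerToLoops(const ToyPipeline &ops) {
  ToyPipeline out;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    out.push_back("affine.for");
    out.push_back("memref.load");
    out.push_back("memref.store");
  }
  return out;
}
```

For the input {linalg.transpose, linalg.transpose, toy.relu}, folding first leaves a single op and the lowered program has 3 ops, whereas lowering directly produces 9 ops in which the redundant pair is hidden inside loops.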

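Progressive lowering is one of MLIR's core strengths. It is far more flexible than a traditional LLVM-based compiler, especially when it comes to supporting heterogeneous computing platforms.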

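MLIR also supports translation, which means converting MLIR into another intermediate representation (for example SPIR-V) or into source code in a high-level language (for example C/C++). This lets the MLIR ecosystem interoperate with other compiler ecosystems. For FPGA HLS, for instance, optimized MLIR can be translated into C/C++ and then handed to Xilinx Vitis HLS or Intel oneAPI HLS for synthesis.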

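There are in fact already many MLIR-based compilers for heterogeneous platforms, such as IREE for mixed CPU/GPU platforms, OpenXLA for CPU/GPU/TPU, and, in academia, ScaleHLS and Allo for FPGA HLS. They rely on exactly these progressive lowering and translation mechanisms: the computation is first described generically at a high level of abstraction, then the computation patterns and schedules are optimized for the characteristics of each platform, and finally machine code is generated for each architecture. In these projects the MLIR core dialects mostly serve as a mid-level or even fairly low-level abstraction, while the high-level abstractions rely on custom dialects for domain-specific optimizations and conversions. (Personally, I do not consider Triton an MLIR compiler. It is more like a Python binding for CUDA that merely uses part of the MLIR infrastructure. It bakes in many assumptions about GPU hardware architecture, some of them specific to NVIDIA's way of doing things, so it is hardly a progressively lowering compiler in the true sense.)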

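In short, dialects are a central concept in MLIR: they not only organize and classify operations, types, and attributes but also enable higher-level mechanisms such as progressive lowering and translation. When developing with MLIR, following the idea of progressive lowering is good practice; lowering straight to the bottom in one step defeats the purpose of MLIR.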
