Morgan Woods


AI Compilation Principles 1

Thanks to the uploader ZOMI酱, whose "AI Compiler Development" series can be followed even without any prior background.

01 Compiler Basics#

  1. What is a compiler?
  2. Why do AI frameworks need to introduce compilers?
  3. What is the relationship between AI frameworks and AI compilers?

Compiler and Interpreter#

The biggest difference between a compiler and an interpreter is that an interpreter converts code into machine code at runtime, while a compiler converts code into machine code before runtime.
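The difference can be sketched with Python's built-in `compile()` (a toy illustration, not a real machine-code compiler): the "interpreter" re-translates the source on every call, while the "compiler" translates it once, before any execution.

```python
def interpret(expr: str, env: dict) -> int:
    # Interpreter: re-parses and evaluates the expression at runtime,
    # every single time it is called (toy grammar: names joined by "+").
    tokens = expr.split("+")
    return sum(env[t.strip()] for t in tokens)

def compile_expr(expr: str):
    # "Compiler": translate the expression into Python bytecode once,
    # before execution; later calls only run the pre-compiled code object.
    code = compile(expr, "<expr>", "eval")
    def run(env: dict) -> int:
        return eval(code, {}, env)  # execute the already-compiled code
    return run

env = {"a": 2, "b": 3}
assert interpret("a + b", env) == 5   # translated at runtime
run = compile_expr("a + b")           # translated before runtime
assert run(env) == 5
```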

JIT and AOT Compilation#

Currently, programs mainly have two ways of running: static compilation and dynamic interpretation.

  • Static compilation compiles the entire code program into machine code before execution, usually referred to as AOT (Ahead of time) compilation;
  • Dynamic interpretation translates the code program while it is running, usually referred to as JIT (Just in time) compilation.

AOT programs are typically written in C/C++; they must be compiled into machine code before execution and are then handed to the operating system to run.
Typical JIT representatives include dynamically interpreted languages such as JavaScript and Python.

| Feature | JIT (Just in time) | AOT (Ahead of time) |
| --- | --- | --- |
| Advantages | 1. Can generate optimal machine instructions in real time based on the current hardware<br>2. Can generate the optimal instruction sequence based on the program's runtime behavior<br>3. When the program needs to support dynamic linking, only JIT compilation can be used<br>4. Can adjust code to the process's actual memory situation, making fuller use of memory | 1. Compiling before execution avoids the performance and memory cost of compiling at runtime<br>2. Can reach peak performance right from program start<br>3. Can significantly speed up program execution |
| Disadvantages | 1. Compilation consumes runtime resources, which can cause the process to stutter<br>2. Compilation takes runtime, so some compile-time optimizations are not fully applied, trading smoothness against time<br>3. Frequently used code needs time for compilation preparation and method identification, so initial runs cannot reach peak performance | 1. Compiling before execution increases the program's installation time<br>2. Storing the pre-compiled code consumes more memory<br>3. Sacrifices some of the consistency of high-level languages |

Differences in AI Frameworks#

Currently, mainstream AI frameworks are equipped with front-end expression layers and AI compilers for hardware enablement, so the relationship between AI frameworks and AI compilers is very close. Some AI frameworks, such as MindSpore and TensorFlow, include their own AI compilers by default. Since the PyTorch 2.x upgrade, PyTorch also ships with TorchInductor as the default backend of `torch.compile` and can connect to multiple AI compilers.

| Compilation Method | Description | Typical Examples |
| --- | --- | --- |
| AOT (Ahead of time) | Static compilation: the code is compiled into machine code before execution; suitable for mobile and embedded deep learning applications. | 1. Inference engines: pre-trained AI models are frozen ahead of time for inference deployment.<br>2. Static graph generation: neural network models are expressed as a unified IR description, and the compiled result is executed at runtime. |
| JIT (Just in time) | Dynamic compilation: the code is translated while the program runs; suitable for scenarios that require runtime optimization. | 1. PyTorch JIT: compiles Python code into native machine code at runtime, optimizing and accelerating deep learning models.<br>2. Jittor: a deep learning framework based on dynamic JIT compilation, using meta-operators and a unified computation graph to achieve efficient execution and automatic optimization. |

Pass and Intermediate Representation (IR)#

A Pass is mainly a complete scan or processing of the source program language. In a compiler, a Pass refers to a structured technique used to perform analysis, optimization, or transformation of the compilation object (IR). The execution of a Pass is the process of analyzing and optimizing the compilation unit in the compiler, and the Pass builds the analysis results required for these processes.

In the LLVM compiler, Passes are divided into three categories: Analysis pass, Transform pass, and Utility pass.

| Pass Type | Description | Function | Common Examples |
| --- | --- | --- | --- |
| Analysis Pass | Computes high-level information about the relevant IR units without modifying them. This information can be used by other Passes or for debugging and program visualization. | 1. Extract and store information from IR units<br>2. Provide query interfaces for other Passes to access<br>3. Provide invalidate interfaces to handle information invalidation | Basic Alias Analysis, Scalar Evolution Analysis |
| Transform Pass | Uses the analysis results of Analysis Passes to modify and optimize the IR. | 1. Modify instructions and control flow in the IR<br>2. May reduce function calls and expose more optimization opportunities | Inline Pass |
| Utility Pass | Functional utilities that belong to neither Analysis Passes nor Transform Passes. | Perform specific tasks, such as extracting basic blocks | extract-blocks Pass |
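A minimal sketch of the Analysis/Transform split, using a hypothetical list-of-tuples IR (not LLVM's actual API): the analysis pass only reads the IR and records which instructions are foldable; the transform pass consumes that result to rewrite the IR via constant folding.

```python
# Toy IR: each instruction is (op, dest, lhs, rhs); operands are ints or names.
IR = [("add", "t1", 2, 3),        # t1 = 2 + 3   -> foldable
      ("mul", "t2", "t1", "x"),   # t2 = t1 * x
      ("add", "t3", 4, 6)]        # t3 = 4 + 6   -> foldable

def analysis_pass(ir):
    # Analysis Pass: inspects the IR without modifying it and stores a
    # result (here: indices of instructions whose operands are all constants).
    return [i for i, (_, _, a, b) in enumerate(ir)
            if isinstance(a, int) and isinstance(b, int)]

def transform_pass(ir, foldable):
    # Transform Pass: uses the analysis result to rewrite the IR
    # (constant folding: replace the instruction with a constant move).
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    out = list(ir)
    for i in foldable:
        op, dest, a, b = out[i]
        out[i] = ("const", dest, ops[op](a, b), None)
    return out

folded = transform_pass(IR, analysis_pass(IR))
assert folded[0] == ("const", "t1", 5, None)
assert folded[2] == ("const", "t3", 10, None)
```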

IR (Intermediate Representation) is an important data structure in the compiler. After completing the front-end work in the compiler, the compiler first generates its custom IR, and then performs various optimization algorithms based on this, and finally generates target code.

As shown in the figure, in compiler theory, compilers are usually divided into front-end and back-end. The front-end analyzes the input program through lexical analysis, syntax analysis, and semantic analysis, and then generates an intermediate representation (IR) of the program. The back-end optimizes the IR and then generates the target code.

For example, LLVM separates the front-end and back-end, and defines an abstract language in the intermediate layer, which is called LLVM IR. After defining the IR, the task of the front-end is to generate the IR, the optimizer is responsible for optimizing the generated IR, and the task of the back-end is to convert the IR into the language of the target platform. The type system of LLVM IR refers to the type system in LLVM assembly language.

Therefore, in the compiler, the front-end, optimizer, and back-end only exchange data structure types, which is the IR, to achieve decoupling of different modules. Some IRs are given special names, such as WHIRL IR in Open64, MAPLE IR in Ark Compiler, and LLVM IR in LLVM.

In most cases, IR has two uses: 1) to perform analysis and transformation and 2) to be directly used for interpretation and execution.

In a compiler, based on the abstraction level from high to low, IR can be divided into three layers: High IR (HIR), Middle IR (MIR), and Low IR (LIR).

| IR Type | Description | Use | Characteristics |
| --- | --- | --- | --- |
| HIR (High IR) | Analyzes and transforms code based on the source programming language. | Used by IDEs, code translation tools, code generation tools, etc., for high-level code optimizations (such as constant folding and inlining). | 1. Accurately expresses the semantics of the source programming language<br>2. Can use the AST and symbol table |
| MIR (Middle IR) | Analyzes and optimizes code independently of the source programming language and the hardware architecture. | Used by compiler optimization algorithms for general optimizations (such as arithmetic optimization, constant and variable propagation, dead code elimination). | 1. Independent of both source code and target code<br>2. Usually based on three-address code (TAC) |
| LIR (Low IR) | Optimizes and generates code for a specific hardware architecture. | Used for hardware-specific optimizations and for generating machine instructions or assembly code. | 1. Instructions usually correspond one-to-one with machine instructions<br>2. Reflects low-level characteristics of the specific hardware architecture |

Three-address code (TAC) has the following characteristics: It can have up to three addresses (i.e., variables), where the left side of the assignment symbol is used for writing, and the right side can have up to two addresses and one operator for reading data and performing calculations.
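For example, lowering `a + b * c` into TAC produces one instruction per operator, each with a single destination on the left and at most two source addresses plus one operator on the right. A small sketch of such a lowering, using Python's `ast` module:

```python
import ast
import itertools

def to_tac(expr: str):
    # Lower a Python arithmetic expression into three-address code:
    # each line has one destination, at most two operands, one operator.
    counter = itertools.count(1)
    lines = []
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

    def lower(node):
        if isinstance(node, ast.Name):
            return node.id                       # a variable address
        if isinstance(node, ast.Constant):
            return str(node.value)               # a constant operand
        lhs, rhs = lower(node.left), lower(node.right)
        tmp = f"t{next(counter)}"                # fresh temporary address
        lines.append(f"{tmp} = {lhs} {ops[type(node.op)]} {rhs}")
        return tmp

    lower(ast.parse(expr, mode="eval").body)
    return lines

# b * c is computed first, then added to a: two TAC instructions.
assert to_tac("a + b * c") == ["t1 = b * c", "t2 = a + t1"]
```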

Compared with single-layer IR, multi-layer IR has obvious advantages:

  • It can provide more information about the source programming language
  • IR is more flexible in expression and more convenient for optimization
  • It makes optimization algorithms and Pass execution more efficient

The LLVM compiler adopts a three-stage structure of front-end, optimizer, and back-end, ordered from high to low abstraction level, which makes it convenient to add support for a new language or a new target platform and greatly reduces engineering cost. LLVM IR plays the key role in this three-stage structure that separates the front-end and back-end: it serves as a bridge between how developers understand program code and what the hardware machine executes.


  1. Interpreters are computer programs that translate and execute high-level program statements one at a time, at runtime.
  2. Compilers convert high-level language programs into machine code, i.e., they convert human-readable code into machine-readable code.
  3. A Pass is a complete scan or processing of the source program language in a compiler.
  4. IR (Intermediate Representation) is an important data structure in the compiler, responsible for connecting different levels and modules within the compiler.

02 Development of Traditional Compilers#

The development of compilers and programming languages has almost been synchronous and can be divided into several stages:

  • The first stage: In the 1950s, the first compiler program appeared, which translated arithmetic formulas into machine code, laying the foundation for the development of high-level languages.
  • The second stage: In the 1960s, various high-level languages and corresponding compilers appeared, such as Fortran, COBOL, LISP, ALGOL, etc., and the compilation technology gradually matured and standardized.
  • The third stage: In the 1970s, structured programming methods and modular programming concepts emerged, as well as object-oriented languages and compilers, such as Pascal, C, Simula, etc. Compilation technology began to focus on the readability and maintainability of engineering code.
  • The fourth stage: In the 1980s, parallel computers and distributed systems emerged, as well as languages and compilers that support parallel and distributed computing, such as Ada, Prolog, ML, etc. Compilation technology began to consider the parallel and distributed capabilities of programs.
  • The fifth stage: In the 1990s, the emergence of the Internet and mobile devices and other emerging platforms, as well as languages and compilers that support cross-platform and dynamic features, such as Java, C#, Python, etc. Compilation technology began to focus on program security and efficiency.
  • The sixth stage: In the first decade of the 21st century, the Lua-based Torch framework emerged in response to the explosion of AI applications and AI algorithm research; it was followed by AI frameworks such as TensorFlow, PyTorch, MindSpore, and Paddle. With the development of AI frameworks and the AI industry, AI compilers such as AKG and MLIR have emerged.

Traditional Compiler Battle#


Currently, classic open-source compilers such as LLVM and GCC are usually divided into three parts: front-end, optimizer, and back-end.

  1. Front-End: Mainly responsible for lexical and syntax analysis, converting source code into an abstract syntax tree (AST); it splits the program into basic components, checks the lexical, syntactic, and semantic correctness of the code, and then generates intermediate code.
  2. Optimizer: Optimizes the intermediate code produced by the front-end (e.g., removing redundant code, eliminating common subexpressions) to make the code more efficient.
  3. Back-End: Takes the optimized intermediate code and, through its own code optimizer and code generator, produces target code for the specific hardware.
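The optimizer step can be illustrated with a toy common-subexpression-elimination pass over a hypothetical three-address IR (a sketch under assumed data structures, not GCC's or LLVM's actual implementation):

```python
# Toy IR: (dest, op, lhs, rhs) tuples in three-address form.
IR = [("t1", "+", "a", "b"),
      ("t2", "+", "a", "b"),    # same computation as t1 -> redundant
      ("t3", "*", "t1", "t2")]

def eliminate_common_subexpressions(ir):
    seen = {}      # (op, lhs, rhs) -> dest that already computes it
    renames = {}   # redundant dest -> earlier equivalent dest
    out = []
    for dest, op, a, b in ir:
        a, b = renames.get(a, a), renames.get(b, b)  # apply earlier renames
        key = (op, a, b)
        if key in seen:
            renames[dest] = seen[key]   # drop the duplicate instruction
        else:
            seen[key] = dest
            out.append((dest, op, a, b))
    return out

assert eliminate_common_subexpressions(IR) == [
    ("t1", "+", "a", "b"),
    ("t3", "*", "t1", "t1"),    # t2 was replaced by t1
]
```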
| Feature | GCC | LLVM |
| --- | --- | --- |
| License | GNU GPL | Apache 2.0 |
| Code Modularity | Monolithic architecture | Modular |
| Platform Support | Unix, Windows, macOS | Unix, macOS |
| Code Generation | Efficient, many compiler options available | Efficient, back-end uses SSA form |
| Language-Independent Type System | No | Yes |
| Build System | Make-based | CMake |
| Parser | Originally used a Bison LR parser, now a recursive descent parser | Hand-written recursive descent parser |


  • Compiler technology is a jewel in the crown of computer science and a core technology in basic software.
  • Compiler technology can recognize the vocabulary, sentences, and various specific formats and data structures in high-level language program code.
  • The compilation process converts source code programs into machine-readable binary code.
  • Traditional compilers are usually divided into three parts: front-end, optimizer, and back-end.