Summary
This RFC proposes adding support for FlagOS (a unified open-source AI system software stack) to Taichi, enabling Taichi programs to run on various domestic AI chips, including MLU (Cambricon), Ascend (Huawei), DCU (Hygon), and GCU (Enflame), through FlagOS's unified compiler infrastructure.
Motivation
Current Situation
- Taichi currently supports NVIDIA GPUs (CUDA), AMD GPUs, and other backends such as CPU, Vulkan, and Metal
- Domestic AI chips are widely used in China but lack Taichi support
- Each chip requires individual backend development effort
Proposed Solution
Integrate with FlagOS, which provides:
- Unified Compiler (FlagTree): Single compiler targeting multiple AI chip architectures
- Multi-chip Support: MLU, Ascend, DCU, GCU, and more
- Mature Ecosystem: FlagGems (operators), FlagCX (communication), FlagScale (training)
Benefits
- Expand Taichi's Hardware Support: Access to domestic AI chip market
- Reduce Development Effort: One backend for multiple chips
- Ecosystem Integration: Connect Taichi with FlagOS ecosystem
Proposed Design
Architecture Overview
Taichi DSL → LLVM IR → FlagTree Compiler → AI Chip Binary
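To make the stages concrete, here is a minimal Python sketch of the pipeline as a chain of functions. Every name in it (`Artifact`, `lower_to_llvm_ir`, `compile_with_flagtree`, `compile_kernel`) is hypothetical and only illustrates the data flow; the real FlagTree interface is not specified in this RFC.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    stage: str    # which representation the payload is in
    payload: str  # placeholder for real IR or binary contents


def lower_to_llvm_ir(src: Artifact) -> Artifact:
    # Stand-in for Taichi's existing LLVM code generation.
    return Artifact("llvm_ir", f"llvm({src.payload})")


def compile_with_flagtree(ir: Artifact, chip: str) -> Artifact:
    # Stand-in for the FlagTree step (hypothetical interface).
    if ir.stage != "llvm_ir":
        raise ValueError(f"expected LLVM IR, got {ir.stage}")
    return Artifact(f"{chip}_binary", f"flagtree[{chip}]({ir.payload})")


def compile_kernel(source: str, chip: str) -> Artifact:
    # Taichi DSL -> LLVM IR -> FlagTree -> chip binary.
    dsl = Artifact("taichi_dsl", source)
    return compile_with_flagtree(lower_to_llvm_ir(dsl), chip)


result = compile_kernel("compute", "mlu370")
print(result.stage, result.payload)
```

The point of the sketch is that only the last stage is chip-specific, so the existing LLVM lowering could in principle be reused unchanged.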
Key Components
- RHI Device Layer (taichi/rhi/flagos/)
  - Memory management via FlagOS runtime
  - Kernel launch interface
  - Multi-chip support (MLU370, Ascend910, DCU, GCU)
- Code Generation Layer (taichi/codegen/flagos/)
  - LLVM IR generation for AI chips
  - SPMD execution model
  - Chip-specific optimizations
- Program Implementation (taichi/runtime/program_impls/flagos/)
  - Kernel compilation via FlagTree
  - Kernel launch management
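As a rough illustration of the device layer's responsibilities (memory management, kernel launch, chip selection), the sketch below models them as a Python interface with an in-memory stub. The actual layer would be C++ under taichi/rhi/flagos/; the class names, method names, and lowercase chip identifiers other than "mlu370" are assumptions made for this example.

```python
from abc import ABC, abstractmethod

# Chip names follow the RFC; the exact identifier spellings are assumed.
SUPPORTED_CHIPS = ("mlu370", "ascend910", "dcu", "gcu")


class FlagosDevice(ABC):
    """Hypothetical device interface: memory management plus kernel launch."""

    def __init__(self, chip: str):
        if chip not in SUPPORTED_CHIPS:
            raise ValueError(f"unsupported chip: {chip}")
        self.chip = chip

    @abstractmethod
    def allocate(self, size_bytes: int) -> int: ...

    @abstractmethod
    def free(self, handle: int) -> None: ...

    @abstractmethod
    def launch(self, kernel_name: str, grid_size: int) -> None: ...


class StubDevice(FlagosDevice):
    """In-memory stub that records calls instead of touching hardware."""

    def __init__(self, chip: str):
        super().__init__(chip)
        self._allocations = {}
        self._next_handle = 1
        self.launched = []

    def allocate(self, size_bytes: int) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._allocations[handle] = bytearray(size_bytes)
        return handle

    def free(self, handle: int) -> None:
        del self._allocations[handle]

    def launch(self, kernel_name: str, grid_size: int) -> None:
        self.launched.append((kernel_name, grid_size))


device = StubDevice("mlu370")
buf = device.allocate(1024)
device.launch("compute", grid_size=1_000_000)
device.free(buf)
```

A stub of this shape is also what the API-level tests could run against when no hardware or SDK is available.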
Python API
import taichi as ti

# Initialize FlagOS backend
ti.init(arch=ti.flagos, flagos_chip="mlu370")

@ti.kernel
def compute():
    for i in range(1000000):
        pass

compute()
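Because `ti.flagos` would only exist in builds that include the proposed backend, user code may want to fall back to an existing arch. The helper below is one possible way to do that; `select_arch` is a hypothetical name, not part of Taichi, and the `SimpleNamespace` objects merely stand in for the `taichi` module so the snippet runs without Taichi installed.

```python
from types import SimpleNamespace


def select_arch(ti_module, preferred="flagos", fallback="cpu"):
    """Return the preferred arch if the build exposes it, else the fallback."""
    arch = getattr(ti_module, preferred, None)
    return arch if arch is not None else getattr(ti_module, fallback)


# Stand-ins for a Taichi build with and without the proposed backend.
with_flagos = SimpleNamespace(cpu="cpu", flagos="flagos")
without_flagos = SimpleNamespace(cpu="cpu")

print(select_arch(with_flagos))     # flagos
print(select_arch(without_flagos))  # cpu
```

With a real installation this would be used as `ti.init(arch=select_arch(ti))`, which degrades to `ti.cpu` on builds that lack the backend.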
Implementation Plan
Phase     Timeline     Deliverables
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Phase 1   1-2 weeks    Infrastructure (current PR ready)
Phase 2   2-4 weeks    FlagTree SDK integration
Phase 3   4-8 weeks    Advanced features & optimization
Current Status
✅ Completed:
• Core architecture (27 files, +2900 lines)
• RHI Device layer
• Code generation layer
• Program implementation
• Build system integration
• Example programs
• Documentation
🔄 Pending (requires FlagOS SDK):
• FlagTree compiler integration
• Kernel binary generation
• Hardware-specific optimizations
Testing Strategy
- Stub Testing: Use generic stub for API testing
- Mock Testing: Mock FlagTree compiler for unit tests
- CI Integration: GitHub Actions with TI_WITH_FLAGOS=ON
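The mock-testing item could look roughly like the following, using Python's standard `unittest.mock`. The `compile_kernel` wrapper and the `compile(llvm_ir, target=...)` method are assumptions made for illustration; the real FlagTree API may differ.

```python
from unittest.mock import Mock


def compile_kernel(compiler, llvm_ir: str, chip: str) -> bytes:
    """Hypothetical wrapper: hand LLVM IR to a FlagTree-like compiler object."""
    if not llvm_ir:
        raise ValueError("empty LLVM IR")
    return compiler.compile(llvm_ir, target=chip)


# The mock replaces the FlagTree compiler, so no SDK or hardware is needed.
mock_compiler = Mock()
mock_compiler.compile.return_value = b"\x7fFAKE"

ir = "define void @compute() { ret void }"
binary = compile_kernel(mock_compiler, ir, "mlu370")

# Verify the wrapper forwarded its arguments and returned the compiler output.
mock_compiler.compile.assert_called_once_with(ir, target="mlu370")
assert binary == b"\x7fFAKE"
```

Tests like this only check the glue code around the compiler; they say nothing about whether generated binaries are correct, which still needs real hardware.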
Questions for Discussion
- Should FlagosProgramImpl inherit from LlvmProgramImpl or be separate?
- Is stub/mock testing acceptable for initial merge without hardware?
- Who will maintain this backend long-term?
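For the first question, the two options can be contrasted with a simplified Python model. The real classes are C++; the `LlvmProgramImpl` below is only a stand-in, and `FlagosProgramInherited` and `FlagosProgramSeparate` are hypothetical names for the two designs.

```python
class LlvmProgramImpl:
    """Stand-in for the existing LLVM program implementation."""

    def compile(self, kernel: str) -> str:
        return f"llvm_ir({kernel})"

    def launch(self, kernel: str) -> str:
        return f"launched {self.compile(kernel)} via LLVM runtime"


class FlagosProgramInherited(LlvmProgramImpl):
    """Option A: inherit and override only the launch path."""

    def launch(self, kernel: str) -> str:
        return f"launched {self.compile(kernel)} via FlagOS runtime"


class FlagosProgramSeparate:
    """Option B: separate class that delegates compilation explicitly."""

    def __init__(self, llvm_codegen: LlvmProgramImpl):
        self._codegen = llvm_codegen

    def compile(self, kernel: str) -> str:
        return self._codegen.compile(kernel)

    def launch(self, kernel: str) -> str:
        return f"launched {self.compile(kernel)} via FlagOS runtime"


a = FlagosProgramInherited()
b = FlagosProgramSeparate(LlvmProgramImpl())
print(a.launch("compute"))
print(b.launch("compute"))
```

Both produce the same result here. The difference is coupling: Option A picks up every future change to the base class automatically, while Option B exposes only what it chooses to delegate.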
Related Links
• FlagOS: https://github.com/flagos-ai
• FlagTree: https://github.com/flagos-ai/flagtree
• Taichi FlagOS Fork: https://github.com/GWinfinity/taichi
Checklist
• [x] RFC created
• [ ] Community feedback incorporated
• [ ] Core team approval
• [ ] PR submitted
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
/cc @ailzhang @k-ye @bobcao3
Would love to hear your thoughts on this proposal!