[RFC] Add FlagOS Backend Support for Multi-Chip AI Accelerators #8786

@GWinfinity

Summary

This RFC proposes adding support for FlagOS (a unified open-source AI system software stack) to Taichi, enabling Taichi programs to run on various domestic AI chips including MLU (Cambricon), Ascend (Huawei), DCU (Hygon), and GCU (Enflame) through FlagOS's unified compiler infrastructure.

Motivation

Current Situation

  • Taichi currently supports NVIDIA GPUs (CUDA), AMD GPUs, and other backends
  • Domestic AI chips are widely used in China but lack Taichi support
  • Each chip requires individual backend development effort

Proposed Solution

Integrate with FlagOS, which provides:

  • Unified Compiler (FlagTree): Single compiler targeting multiple AI chip architectures
  • Multi-chip Support: MLU, Ascend, DCU, GCU, and more
  • Mature Ecosystem: FlagGems (operators), FlagCX (communication), FlagScale (training)

Benefits

  1. Expand Taichi's Hardware Support: Access to domestic AI chip market
  2. Reduce Development Effort: One backend for multiple chips
  3. Ecosystem Integration: Connect Taichi with FlagOS ecosystem

Proposed Design

Architecture Overview

Taichi DSL → LLVM IR → FlagTree Compiler → AI Chip Binary

Key Components

  1. RHI Device Layer (taichi/rhi/flagos/)

    • Memory management via FlagOS runtime
    • Kernel launch interface
    • Multi-chip support (MLU370, Ascend910, DCU, GCU)
  2. Code Generation Layer (taichi/codegen/flagos/)

    • LLVM IR generation for AI chips
    • SPMD execution model
    • Chip-specific optimizations
  3. Program Implementation (taichi/runtime/program_impls/flagos/)

    • Kernel compilation via FlagTree
    • Kernel launch management
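
To make the split of responsibilities concrete, here is a minimal Python sketch of the three layers as plain classes. Everything in it is hypothetical: the actual layers would be C++ code under the directories listed above, and the names `StubDevice`, `lower_to_llvm_ir`, `StubProgram`, and `compile_fn` are illustrative only, not taken from the PR or from FlagOS.

```python
from dataclasses import dataclass, field


@dataclass
class StubDevice:
    """Stands in for the RHI device layer: memory management and kernel launch."""
    chip: str
    next_handle: int = 0
    allocations: dict = field(default_factory=dict)
    launches: list = field(default_factory=list)

    def allocate(self, size_bytes: int) -> int:
        # Hand out an opaque handle and remember the requested size.
        handle = self.next_handle
        self.next_handle += 1
        self.allocations[handle] = size_bytes
        return handle

    def free(self, handle: int) -> None:
        del self.allocations[handle]

    def launch(self, binary: bytes, grid_size: int) -> None:
        # A real device would submit the kernel; the stub only records the call.
        self.launches.append((binary, grid_size))


def lower_to_llvm_ir(kernel_name: str) -> str:
    """Stands in for the code generation layer (placeholder text, not real IR)."""
    return f"; LLVM IR placeholder for {kernel_name}"


class StubProgram:
    """Stands in for the program implementation: compile once, then launch."""

    def __init__(self, device, compile_fn):
        self.device = device
        self.compile_fn = compile_fn  # hypothetical hook where FlagTree would plug in
        self.cache = {}

    def run(self, kernel_name: str, grid_size: int) -> None:
        if kernel_name not in self.cache:
            ir = lower_to_llvm_ir(kernel_name)
            self.cache[kernel_name] = self.compile_fn(ir, self.device.chip)
        self.device.launch(self.cache[kernel_name], grid_size)
```

Passing the compiler in as a callable keeps the program layer testable before the FlagOS SDK is available, which is the situation the pending items in this RFC describe.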

Python API

  import taichi as ti

  # Initialize FlagOS backend
  ti.init(arch=ti.flagos, flagos_chip="mlu370")

  @ti.kernel
  def compute():
      for i in range(1000000):
          pass

  compute()
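
Since `flagos_chip` is passed as a string, the backend will presumably need to validate it at initialization. The following is a rough sketch of such a check, not the proposed implementation: the helper `normalize_flagos_chip` is hypothetical, and apart from `"mlu370"` (used in the example above) the chip identifiers are assumed spellings of the chips named in this RFC.

```python
# Chip identifiers mapped to the vendor names used in this RFC.
# Only "mlu370" appears in the RFC's example; the other keys are assumptions.
SUPPORTED_CHIPS = {
    "mlu370": "MLU (Cambricon)",
    "ascend910": "Ascend (Huawei)",
    "dcu": "DCU (Hygon)",
    "gcu": "GCU (Enflame)",
}


def normalize_flagos_chip(name: str) -> str:
    """Return the canonical chip key, or raise ValueError for unknown chips."""
    key = name.strip().lower()
    if key not in SUPPORTED_CHIPS:
        known = ", ".join(sorted(SUPPORTED_CHIPS))
        raise ValueError(f"unsupported flagos_chip {name!r}; expected one of: {known}")
    return key
```

Failing early with the list of accepted values would give users a clearer error than a late failure inside the compiler.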

Implementation Plan

Phase     Timeline     Deliverables
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Phase 1   1-2 weeks    Infrastructure (current PR ready)
Phase 2   2-4 weeks    FlagTree SDK integration
Phase 3   4-8 weeks    Advanced features & optimization

Current Status

✅ Completed:

• Core architecture (27 files, +2900 lines)
• RHI Device layer
• Code generation layer
• Program implementation
• Build system integration
• Example programs
• Documentation

🔄 Pending (requires FlagOS SDK):

• FlagTree compiler integration
• Kernel binary generation
• Hardware-specific optimizations

Testing Strategy

  1. Stub Testing: Use generic stub for API testing
  2. Mock Testing: Mock FlagTree compiler for unit tests
  3. CI Integration: GitHub Actions with TI_WITH_FLAGOS=ON
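
As an illustration of the mock-testing idea, the sketch below uses Python's standard `unittest.mock` to stand in for the FlagTree compiler. It assumes a compiler object with a `compile(llvm_ir, target=...)` method and a device with a `launch(binary)` method; both interfaces and the `compile_and_launch` helper are hypothetical and do not reflect the real FlagTree API.

```python
from unittest import mock


def compile_and_launch(compiler, device, llvm_ir, chip):
    """Hypothetical glue code: compile IR for a chip, then launch the result."""
    binary = compiler.compile(llvm_ir, target=chip)
    device.launch(binary)
    return binary


def test_compile_and_launch_uses_compiler_output():
    # The mocks replace the real compiler and hardware, so no SDK is required.
    compiler = mock.Mock()
    compiler.compile.return_value = b"fake-binary"
    device = mock.Mock()

    result = compile_and_launch(compiler, device, "; ir", "mlu370")

    # Verify the glue code passes the right arguments through each layer.
    compiler.compile.assert_called_once_with("; ir", target="mlu370")
    device.launch.assert_called_once_with(b"fake-binary")
    assert result == b"fake-binary"


test_compile_and_launch_uses_compiler_output()
```

A test like this checks only the call flow between layers. It says nothing about whether generated binaries are correct, which still needs hardware or an emulator.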

Questions for Discussion

  1. Should FlagosProgramImpl inherit from LlvmProgramImpl or be separate?
  2. Is stub/mock testing acceptable for initial merge without hardware?
  3. Who will maintain this backend long-term?

Related Links

• FlagOS: https://github.com/flagos-ai
• FlagTree: https://github.com/flagos-ai/flagtree
• Taichi FlagOS Fork: https://github.com/GWinfinity/taichi

Checklist

• [x] RFC created
• [ ] Community feedback incorporated
• [ ] Core team approval
• [ ] PR submitted

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
/cc @ailzhang @k-ye @bobcao3

Would love to hear your thoughts on this proposal!
