Commit 3055b9a

erwei-xilinx and claude authored and committed
[AIEPlacer] Tighten capacity check: shared helper, stack, MemTile, diagnostics
Address review on PR #3044:

* Lift the CoreTile/MemTile capacity dispatch into a single `AIETargetModel::getDataMemorySize(AIETileType)` helper. Both AIEAssignBuffers (basic + bank-aware) and the placer now consume it, removing three copies of the same dispatch.
* Reserve `core.getStackSize()` per CoreTile LogicalTileOp at requirement-build time. AIEAssignBuffers reserves the stack before laying out buffers; the placer now matches that, so a CoreTile candidate that passes here also fits at allocation time.
* Validate the per-LogicalTileOp upper bound for MemTiles too. Full per-physical-tile accumulation across MemTile-sharing LogicalTileOps is still left to AIEAssignBuffers (the channel-style accumulator is a separate follow-up), but the necessary single-tile bound is now caught here with the better diagnostic.
* Track in the partial-constraint loop whether buffer capacity was the *only* reason candidates matching `col`/`row` were skipped, so the unconstrained-failure suffix names buffer capacity only when that was the actual cause. Document that on current homogeneous devices the per-candidate buffer filter is forward-looking (all CoreTiles share one capacity), so its purpose today is to feed the diagnostic.
* Switch `tileMemoryCapacity` to `int64_t`, matching `getAllocationSize` and removing the `static_cast<uint64_t>` workaround.
* The diagnostic now distinguishes "buffers + stack" (CoreTile) from "buffers" (MemTile), with the bytes-vs-capacity numbers reflecting the actual demand including stack.
* Tests: add multi-buffer summation, physical-tile peer skip, MemTile pinned fits/overflow, and an AIE2P/npu2 path. Update the existing error wording and overflow byte counts for the stack-aware demand.
* Document the remaining limitations in code comments: ObjectFifo-derived buffers are not yet visible at this pass and remain AIEAssignBuffers' responsibility; mid-migration mixed physical/logical IR is validated only for the LogicalTileOp share.
All 137 lit tests pass under test/place-tiles, test/dialect/AIE, and test/assign-buffer-addresses (1 pre-existing XFAIL); mlir-air rebuilds clean against the updated shared library.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parent: 17a5e10

7 files changed

Lines changed: 228 additions & 45 deletions


include/aie/Dialect/AIE/IR/AIETargetModel.h
Lines changed: 6 additions & 0 deletions

@@ -257,6 +257,12 @@ class AIETargetModel {
   /// Return the size (in bytes) of the local data memory of a core.
   virtual uint32_t getLocalMemorySize() const = 0;
 
+  /// Return the size (in bytes) of the data memory addressable from the given
+  /// tile type: `getLocalMemorySize()` for CoreTile, `getMemTileSize()` for
+  /// MemTile, 0 otherwise. Centralizes the dispatch used by AIEAssignBuffers
+  /// and AIEPlacer so per-tile-type capacity stays consistent across passes.
+  uint32_t getDataMemorySize(AIETileType tileType) const;
+
   /// Return the size (in bits) of the accumulator/cascade.
   virtual uint32_t getAccumulatorCascadeSize() const = 0;

include/aie/Dialect/AIE/Transforms/AIEPlacer.h
Lines changed: 15 additions & 6 deletions

@@ -114,19 +114,28 @@ class SequentialPlacer : public Placer {
                  llvm::SmallVector<ObjectFifoCreateOp> &objectFifos,
                  llvm::SmallVector<ObjectFifoLinkOp> &objectFifoLinks);
 
-  // Sum `BufferOp::getAllocationSize()` per `LogicalTileOp` (TileOp peers are
-  // physical and validated downstream by AIEAssignBuffers).
+  // Sum `BufferOp::getAllocationSize()` per `LogicalTileOp`, plus the stack
+  // reservation of any attached CoreOp for CoreTile LogicalTileOps (matching
+  // AIEAssignBuffers' downstream layout). Buffers attached to physical TileOp
+  // peers are skipped: those are not part of any placement decision and are
+  // still validated downstream by AIEAssignBuffers. ObjectFifo-derived buffers
+  // are not yet visible at this pass and are not counted; AIEAssignBuffers
+  // remains the authoritative final check.
   llvm::DenseMap<mlir::Operation *, int64_t>
-  buildBufferRequirements(llvm::ArrayRef<BufferOp> buffers);
+  buildBufferRequirements(llvm::ArrayRef<BufferOp> buffers,
+                          llvm::ArrayRef<LogicalTileOp> logicalTiles);
 
   // Returns true iff placing `logicalTile` at `candidate` would keep its total
-  // buffer allocation within the candidate tile type's local memory capacity.
+  // buffer allocation (plus stack reservation, for CoreTiles) within the
+  // candidate tile type's data memory capacity. Returns true unconditionally
+  // for tile types without addressable data memory (Shim*).
   bool fitsBufferCapacity(LogicalTileOp logicalTile, TileID candidate,
                           const llvm::DenseMap<mlir::Operation *, int64_t>
                               &bufferRequirements) const;
 
-  // Capacity in bytes for the given physical tile (0 if no local memory).
-  uint32_t tileMemoryCapacity(TileID tile) const;
+  // Data memory capacity in bytes for the given physical tile (0 if the tile
+  // has no addressable data memory).
+  int64_t tileMemoryCapacity(TileID tile) const;
 };
 
 } // namespace xilinx::AIE

lib/Dialect/AIE/IR/AIETargetModel.cpp
Lines changed: 13 additions & 0 deletions

@@ -84,6 +84,19 @@ uint32_t AIETargetModel::encodeFieldValue(const BitFieldInfo &field,
   return db->encodeFieldValue(field, value);
 }
 
+uint32_t AIETargetModel::getDataMemorySize(AIETileType tileType) const {
+  switch (tileType) {
+  case AIETileType::CoreTile:
+    return getLocalMemorySize();
+  case AIETileType::MemTile:
+    return getMemTileSize();
+  case AIETileType::ShimNOCTile:
+  case AIETileType::ShimPLTile:
+    return 0;
+  }
+  llvm_unreachable("unhandled AIETileType");
+}
+
 std::optional<uint32_t>
 AIETargetModel::getFieldMask(const BitFieldInfo &field) const {
   uint32_t width = field.getWidth();

lib/Dialect/AIE/Transforms/AIEAssignBuffers.cpp
Lines changed: 2 additions & 10 deletions

@@ -61,11 +61,7 @@ bool basicAllocation(TileOp tile) {
     return false;
 
   const auto &targetModel = getTargetModel(tile);
-  int maxDataMemorySize = 0;
-  if (tile.isMemTile())
-    maxDataMemorySize = targetModel.getMemTileSize();
-  else
-    maxDataMemorySize = targetModel.getLocalMemorySize();
+  int maxDataMemorySize = targetModel.getDataMemorySize(tile.getTileType());
 
   SmallVector<BufferOp> buffers;
   SmallVector<BufferOp> allocated_buffers;
@@ -386,11 +382,7 @@ bool simpleBankAwareAllocation(TileOp tile) {
   // end addresses for each bank
 
   const auto &targetModel = getTargetModel(tile);
-  int maxDataMemorySize = 0;
-  if (tile.isMemTile())
-    maxDataMemorySize = targetModel.getMemTileSize();
-  else
-    maxDataMemorySize = targetModel.getLocalMemorySize();
+  int maxDataMemorySize = targetModel.getDataMemorySize(tile.getTileType());
 
   int numBanks = targetModel.getNumBanks(tile.getCol(), tile.getRow());
   int bankSize = maxDataMemorySize / numBanks;

lib/Dialect/AIE/Transforms/AIEPlacer.cpp
Lines changed: 74 additions & 26 deletions

@@ -137,7 +137,7 @@ LogicalResult SequentialPlacer::place(DeviceOp device) {
   // tile-local memory requirements from BufferOp allocations.
   auto channelRequirements =
       buildChannelRequirements(objectFifos, objectFifoLinks);
-  auto bufferRequirements = buildBufferRequirements(buffers);
+  auto bufferRequirements = buildBufferRequirements(buffers, logicalTiles);
 
   // Phase 3: Place constrained tiles then compute tiles
   size_t nextCompIdx = 0;
@@ -149,9 +149,14 @@ LogicalResult SequentialPlacer::place(DeviceOp device) {
       TileID tile{*col, *row};
       if (!fitsBufferCapacity(logicalTile, tile, bufferRequirements)) {
        int64_t need = bufferRequirements.lookup(logicalTile.getOperation());
+        // Stack only contributes when this LogicalTileOp is a CoreTile with
+        // an attached CoreOp; mention it only in that case for accuracy.
+        bool includesStack =
+            logicalTile.getTileType() == AIETileType::CoreTile;
         return logicalTile.emitError()
                << "tile (" << tile.col << ", " << tile.row << ") cannot host "
-               << need << " bytes of buffers (capacity "
+               << need << " bytes of buffers"
+               << (includesStack ? " + stack" : "") << " (capacity "
                << tileMemoryCapacity(tile) << ")";
       }
       if (failed(validateAndUpdateChannelUsage(logicalTile, tile,
@@ -170,6 +175,15 @@ LogicalResult SequentialPlacer::place(DeviceOp device) {
     // Place compute tiles with partial constraint support
     if (logicalTile.getTileType() == AIETileType::CoreTile) {
       std::optional<TileID> placement = std::nullopt;
+      // Track whether buffer capacity was the *only* reason candidates
+      // matching the partial constraints (if any) were skipped, so the
+      // diagnostic on failure can name the actual cause. (On current devices
+      // every CoreTile has the same `getLocalMemorySize()`, so skipping one
+      // by buffer-capacity skips them all - this loop's per-candidate filter
+      // is forward-looking for heterogeneous future devices and feeds the
+      // diagnostic below.)
+      bool sawConstraintMatch = false;
+      bool allConstraintMatchesFailedBuffer = true;
 
       for (size_t i = nextCompIdx; i < availability.compTiles.size(); ++i) {
         TileID candidate = availability.compTiles[i];
@@ -179,8 +193,10 @@ LogicalResult SequentialPlacer::place(DeviceOp device) {
           continue;
         if (row && candidate.row != *row)
           continue;
+        sawConstraintMatch = true;
         if (!fitsBufferCapacity(logicalTile, candidate, bufferRequirements))
           continue;
+        allConstraintMatchesFailedBuffer = false;
 
         // Found valid tile - swap to nextCompIdx position and use
         std::swap(availability.compTiles[i],
@@ -190,17 +206,24 @@ LogicalResult SequentialPlacer::place(DeviceOp device) {
       }
 
       if (!placement) {
-        bool hasBuffers = bufferRequirements.count(logicalTile.getOperation());
+        // Buffer capacity is implicated only when at least one candidate
+        // matched the partial constraints AND every such candidate failed the
+        // buffer filter.
+        bool bufferWasCause =
+            sawConstraintMatch && allConstraintMatchesFailedBuffer &&
+            bufferRequirements.count(logicalTile.getOperation());
         if (col || row) {
           return logicalTile.emitError()
                  << "no compute tile available matching constraint ("
                  << (col ? std::to_string(*col) : "?") << ", "
                  << (row ? std::to_string(*row) : "?") << ")"
-                 << (hasBuffers ? " and buffer capacity" : "");
+                 << (bufferWasCause ? " with sufficient buffer capacity" : "");
         }
         return logicalTile.emitError()
                << "no available compute tiles for placement"
-               << (hasBuffers ? " (buffer capacity exceeded)" : "");
+               << (bufferWasCause ? " (buffer capacity exceeded on every "
+                                    "candidate)"
+                                  : "");
       }
 
       if (failed(validateAndUpdateChannelUsage(logicalTile, *placement,
@@ -488,46 +511,71 @@ SequentialPlacer::buildChannelRequirements(
   return channelRequirements;
 }
 
-llvm::DenseMap<Operation *, int64_t>
-SequentialPlacer::buildBufferRequirements(ArrayRef<BufferOp> buffers) {
+// Sum BufferOp::getAllocationSize() per LogicalTileOp, then add the stack
+// reservation of any CoreOp attached to that LogicalTileOp. The stack lives in
+// the same local data memory as buffers (cf. AIEAssignBuffers reserving
+// `core.getStackSize()` bytes before laying out buffers), so a CoreTile
+// candidate that passes the placer must also leave room for the stack.
+//
+// Buffers attached to physical TileOps are skipped here: those are not part of
+// any placement decision and are still validated downstream by
+// AIEAssignBuffers. Mid-migration IR mixing physical TileOp and LogicalTileOp
+// peers on the same eventual physical tile is therefore validated only for the
+// LogicalTileOp share at this stage; AIEAssignBuffers catches the rest.
+//
+// ObjectFifo-derived buffers are not yet visible at this pass (they are
+// materialized later by objectfifo lowering) and are not counted here. A tile
+// that passes this filter can still overflow once those buffers materialize;
+// AIEAssignBuffers remains the authoritative final check.
+llvm::DenseMap<Operation *, int64_t> SequentialPlacer::buildBufferRequirements(
+    ArrayRef<BufferOp> buffers, ArrayRef<LogicalTileOp> logicalTiles) {
   llvm::DenseMap<Operation *, int64_t> bufferRequirements;
+
+  // Seed with stack reservations for CoreTile LogicalTileOps that already
+  // have a CoreOp user. Without a CoreOp the stack attribute does not yet
+  // exist, so don't pre-reserve.
+  for (auto lt : logicalTiles) {
+    if (lt.getTileType() != AIETileType::CoreTile)
+      continue;
+    for (Operation *user : lt.getResult().getUsers()) {
+      if (auto core = dyn_cast<CoreOp>(user)) {
+        bufferRequirements[lt.getOperation()] += core.getStackSize();
+        break;
+      }
+    }
+  }
+
   for (auto buf : buffers) {
     auto *defOp = buf.getTile().getDefiningOp();
-    // Only LogicalTileOp endpoints participate in placement decisions.
-    // TileOp peers are physical and validated downstream by AIEAssignBuffers.
     if (!isa_and_nonnull<LogicalTileOp>(defOp))
       continue;
     bufferRequirements[defOp] += buf.getAllocationSize();
   }
   return bufferRequirements;
 }
 
-uint32_t SequentialPlacer::tileMemoryCapacity(TileID tile) const {
-  switch (targetModel->getTileType(tile.col, tile.row)) {
-  case AIETileType::CoreTile:
-    return targetModel->getLocalMemorySize();
-  case AIETileType::MemTile:
-    return targetModel->getMemTileSize();
-  default:
-    return 0;
-  }
+int64_t SequentialPlacer::tileMemoryCapacity(TileID tile) const {
+  return targetModel->getDataMemorySize(
+      targetModel->getTileType(tile.col, tile.row));
 }
 
 bool SequentialPlacer::fitsBufferCapacity(
     LogicalTileOp logicalTile, TileID candidate,
     const llvm::DenseMap<Operation *, int64_t> &bufferRequirements) const {
-  // MemTiles are shared across LogicalTileOps in the existing placer (see
-  // `removeTile` skipping non-CoreTile). Validating accumulated buffer bytes
-  // there requires per-physical-tile tracking (cf. inputChannelsUsed); leave
-  // that to AIEAssignBuffers and a follow-up PR. CoreTiles are 1:1 so the
-  // per-LogicalTileOp sum is the actual physical-tile demand.
-  if (targetModel->getTileType(candidate.col, candidate.row) !=
-      AIETileType::CoreTile)
+  // Both CoreTile and MemTile have finite data memory. CoreTile is 1:1 with a
+  // LogicalTileOp, so the per-LogicalTileOp sum *is* the physical-tile demand
+  // and we validate exactly. MemTile may host buffers from multiple
+  // LogicalTileOps (cf. `removeTile` skipping non-CoreTile and the channel
+  // accumulation in `inputChannelsUsed`); we still validate the per-tile sum
+  // as a *necessary* upper bound here, but full per-physical-tile accumulation
+  // is left to AIEAssignBuffers and a follow-up PR.
+  AIETileType type = targetModel->getTileType(candidate.col, candidate.row);
+  if (type != AIETileType::CoreTile && type != AIETileType::MemTile)
     return true;
   auto it = bufferRequirements.find(logicalTile.getOperation());
   if (it == bufferRequirements.end())
     return true;
-  return static_cast<uint64_t>(it->second) <= tileMemoryCapacity(candidate);
+  return it->second <= tileMemoryCapacity(candidate);
 }
 
 void SequentialPlacer::buildObjectFifoGroups(

test/place-tiles/sequential_placer/test_place_buffers.mlir
Lines changed: 73 additions & 0 deletions

@@ -36,3 +36,76 @@ module @buffer_fits_unconstrained {
     // CHECK-NOT: aie.logical_tile
   }
 }
+
+// -----
+
+// Multiple buffers on the same logical tile sum together against capacity.
+// 4 * memref<8192xi32> = 4 * 32KB = 128KB of buffers + 1KB stack would exceed
+// npu1's 64KB CoreTile L1, but each buffer is 32KB on its own. This case uses
+// 3 * 8KB = 24KB + 1KB stack = 25KB which fits comfortably; it pins the
+// summation behaviour without overflowing.
+// CHECK-LABEL: @multi_buffer_sum_fits
+module @multi_buffer_sum_fits {
+  aie.device(npu1) {
+    // CHECK-DAG: %[[T:.*]] = aie.tile(2, 3)
+    %t = aie.logical_tile<CoreTile>(2, 3)
+    %b0 = aie.buffer(%t) : memref<2048xi32>
+    %b1 = aie.buffer(%t) : memref<2048xi32>
+    %b2 = aie.buffer(%t) : memref<2048xi32>
+    aie.core(%t) { aie.end }
+    // CHECK-NOT: aie.logical_tile
+  }
+}
+
+// -----
+
+// Buffer attached to a physical aie.tile (not aie.logical_tile) is silently
+// ignored by the placer's buffer-capacity check: such buffers are not part of
+// any placement decision and remain AIEAssignBuffers' responsibility. The
+// 256KB physical-tile buffer here would overflow if it were counted, but the
+// placer must accept this IR unchanged (a separate logical_tile with a small
+// buffer placed normally proves the rest of the pipeline still runs).
+// CHECK-LABEL: @physical_tile_buffer_skipped
+module @physical_tile_buffer_skipped {
+  aie.device(npu1) {
+    // CHECK-DAG: %[[PT:.*]] = aie.tile(2, 3)
+    %pt = aie.tile(2, 3)
+    %big = aie.buffer(%pt) : memref<65536xi32>
+    // CHECK-DAG: %[[LT:.*]] = aie.tile(2, 4)
+    %lt = aie.logical_tile<CoreTile>(2, 4)
+    %small = aie.buffer(%lt) : memref<128xi32>
+    aie.core(%lt) { aie.end }
+    // CHECK-NOT: aie.logical_tile
+  }
+}
+
+// -----
+
+// MemTile-pinned LogicalTileOp with a buffer that fits in the MemTile capacity
+// (npu1 MemTile = 512KB). Validates that MemTile placements pass the new
+// per-LogicalTileOp upper-bound check.
+// CHECK-LABEL: @memtile_buffer_fits
+module @memtile_buffer_fits {
+  aie.device(npu1) {
+    // CHECK-DAG: %[[MT:.*]] = aie.tile(0, 1)
+    %mt = aie.logical_tile<MemTile>(0, 1)
+    %b = aie.buffer(%mt) : memref<1024xi32>
+    // CHECK-NOT: aie.logical_tile
+  }
+}
+
+// -----
+
+// Same shape on AIE2P (npu2), exercising the target-model dispatch on a
+// different architecture. npu2 CoreTile L1 is also 64KB; a 4KB buffer plus
+// 1KB default stack fits with room to spare.
+// CHECK-LABEL: @buffer_fits_npu2
+module @buffer_fits_npu2 {
+  aie.device(npu2) {
+    // CHECK-DAG: %[[T:.*]] = aie.tile(0, 2)
+    %t = aie.logical_tile<CoreTile>(?, ?)
+    %b = aie.buffer(%t) : memref<1024xi32>
+    aie.core(%t) { aie.end }
+    // CHECK-NOT: aie.logical_tile
+  }
+}
