feat(go/plugins/middleware): rework filesystem middleware#5202
Open
feat(go/plugins/middleware): rework filesystem middleware#5202
Conversation
Contributor
There was a problem hiding this comment.
Code Review
This pull request enhances the filesystem middleware by implementing a file state cache and path-based locking to ensure consistency and prevent race conditions. The read_file tool is updated to support line-based slicing with offset and limit parameters, and the search_and_replace tool is replaced by a structured edit_file tool. New safety measures include a "read-before-write" requirement and staleness checks to detect external file modifications. Review feedback highlights the need for a more efficient and accurate line-slicing implementation, potential memory risks when reading large files, and the importance of precise output formatting to maintain byte-for-byte matching for the model.
- sliceLines: byte-position slicing instead of Split/Join — preserves line terminators inside the window and avoids the string round-trip. - read_file: acquire per-path lock around stat → cache-check → read → cache-set. genkit runs tool calls concurrently, so a parallel read_file/edit_file on the same path could otherwise let the read clobber the edit's cache update with stale state. - read_file: add a 10 MiB hard cap on file size before root.ReadFile. Even sliced reads materialize the full file, so without this a 1 GB log would OOM the process before sliceLines runs. - read_file: drop the fabricated "\n" before </read_file>. When file content already ends in a newline, the file's own terminator separates it from the close tag; when it doesn't, manufacturing one invites the model to copy it into edit_file.oldString and miss the byte-for-byte match.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rebuilds the filesystem middleware around the patterns established by tool-calling code agents: structured edits, a per-call file-state cache, and explicit safety guards. The previous SEARCH/REPLACE text-block parser is replaced with an
edit_filetool that takes JSON-shaped edits — no markers, no whitespace edge cases. Read/write/edit are now coordinated through a shared cache that enforces read-before-edit, detects external modifications, and dedups unchanged re-reads.Tool surface
list_filesEntries now report byte size for files (omitted for directories):
{Path: "todo.txt", IsDirectory: false, SizeBytes: 412} {Path: "docs", IsDirectory: true}read_fileReturns the file via the deferred user-message channel with a
<read_file>envelope that carries metadata. New optionaloffset(1-indexed line) andlimitparameters select a window. The header includestotalLinesand (for sliced reads)lines="X-Y"so the model can reason about scope. A 256 KiB byte cap on full reads bails large files with a hint to useoffset/limit.Re-reading the same file at the same range with no on-disk modification returns a stub instead of re-injecting the bytes:
write_fileCreating a new file works as before. Overwriting an existing file now requires a prior
read_filein this session — the cache must contain an entry whose mtime matches disk. This catches the lost-update class of bug where the model writes over an external change it never saw. The cache is updated after a successful write.edit_file(replacessearch_and_replace)Takes structured edits — no markers, no parsing:
{ "filePath": "todo.txt", "edits": [ {"oldString": "- [ ] Write a smoke test for /health.\n", "newString": ""}, {"oldString": "TODO\n====", "newString": "TODO\n====\nLast updated: 2026-04-28"} ] }Per-edit
replaceAllhandles renames. Edits apply sequentially — each sees the result of the previous. Same require-read-first and mtime-staleness guards aswrite_file.applyEditrules:oldString→ erroroldString == newString→ errorreplaceAll→ error citing match countstrings.Replace(one) orstrings.ReplaceAll(all)Safety guarantees
edit_fileand overwrite-write_filerefuse without a cache entryoldStringmatches multiple locations silentlyreplaceAllor more contextoldString == newStringoffset/limitAll guards return actionable error messages so the model can self-correct on the next turn.
Internal additions
fileStateCache: per-call, FIFO-bounded (200 entries)path → {content, mtime, offset, limit}map. Allocated inNew(), scoped to oneHooksinstance.pathLocks:path → *sync.Mutexfor serializing read-modify-write.countLines,sliceLines: helpers forread_filemetadata and windowing.What was deliberately deferred
tool_resultinstead of a deferred user message.read_filestill injects content as a separate user message on the next turn (viaenqueueParts). Putting it directly inMultipartToolResponseis structurally cleaner — single block per read, no protocol-awkward consecutive user messages — but works today and is non-breaking to defer.read_fileoutput. Useful for reasoning about line positions, but pairs with a footgun (the model can copy line-number prefixes into edit blocks where they don't match disk bytes). Structured edits don't need them.offset/limit, butedit_filedoesn't yet refuse edits whoseoldStringfalls outside a partial-read window. Mtime check is sufficient for the common case.Sample
samples/basic-middleware/filesystemexercises the full surface (read_file → edit_file witholdString→""to delete) against a tiny workspace.MaxTurnsbumped from 12 to 20 to give the model room to recover from any first-attempt format errors during interactive testing.Test coverage
24 tests, all passing. New coverage:
replaceAll, equal old/new, empty old, deletion via empty newedit_filehappy path,edit_filefailure surfacing, require-read-first on edit, require-read-first on overwrite, external-modification detection, dedup stub,read_fileoffset/limit,list_filessize metadata,read_fileline metadata