feat: cfg based dataflow by hongjr03 · Pull Request #2321 · Myriad-Dreamin/tinymist

hongjr03 · 2025-12-21T06:51:36Z

This PR introduces a reusable statement-level control-flow graph (CFG) and a generic forward/backward dataflow solver to tinymist-analysis, and migrates the “implicitly discarded by function return” lint to this new infrastructure. The new implementation is more precise, path-sensitive, and cacheable, replacing the previous ad-hoc backward traversal.

This PR is based on #2302 and should be merged after it.

I gated the CFG-based lint behind the lint-v2 feature (v1 stays the default) in 6673d33.

Motivation

The existing “discarded by function return” lint relied on a hand-written reverse traversal over blocks. While lightweight, it had several limitations:

- Poor modeling of control flow (if/else, loops, break/continue)
- No short-circuit semantics for boolean conditions
- Heuristic warning suppression that was hard to reason about
- Repeated recomputation of analysis for the same function bodies

This PR addresses these issues by introducing a proper CFG + dataflow foundation and reimplementing the lint on top of it.

What’s in this PR

1. New analysis infrastructure (`tinymist-analysis::flow`)

cfg: a minimal directed CFG with explicit entry / exit, stable NodeId, labeled edges, and reachability utilities.
dataflow: a generic worklist-based solver for forward and backward dataflow problems over join-semilattices.
typst: lowering from Typst AST to a statement-level CFG, with:
- Explicit nodes for if/while/for, joins, loop headers, return, break, continue
- Correct modeling of short-circuit boolean operators (and, or, not)
- Well-formedness checks (all reachable nodes can reach exit)

These components are reusable by future analyses and lints.
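
For readers unfamiliar with the technique, here is a minimal sketch of a worklist-based backward solver over a join-semilattice. It is not the code from this PR: the `Lattice` trait, the `Cfg` struct, `solve_backward`, and the liveness example are all hypothetical names chosen for illustration, and only the backward direction is shown.

```rust
use std::collections::VecDeque;

/// A join-semilattice (hypothetical trait, not the PR's API).
/// `join` is expected to be commutative, associative, and idempotent.
pub trait Lattice: Clone + PartialEq {
    fn bottom() -> Self;
    fn join(&self, other: &Self) -> Self;
}

/// Hypothetical minimal CFG: nodes are indices, `succs[n]` lists successors of `n`.
pub struct Cfg {
    pub succs: Vec<Vec<usize>>,
}

impl Cfg {
    /// Builds predecessor lists from the successor lists.
    fn preds(&self) -> Vec<Vec<usize>> {
        let mut preds = vec![Vec::new(); self.succs.len()];
        for (n, ss) in self.succs.iter().enumerate() {
            for &s in ss {
                preds[s].push(n);
            }
        }
        preds
    }
}

/// Backward solve: the fact after a node is the join of its successors' facts,
/// and the fact before a node is `transfer(node, fact_after)`.
/// Returns the fact before each node once a fixpoint is reached.
pub fn solve_backward<L: Lattice>(
    cfg: &Cfg,
    exit: usize,
    exit_fact: L,
    transfer: impl Fn(usize, &L) -> L,
) -> Vec<L> {
    let n = cfg.succs.len();
    let preds = cfg.preds();
    let mut facts = vec![L::bottom(); n];
    let mut queued = vec![true; n];
    // Seed the worklist with every node, exit-most first.
    let mut worklist: VecDeque<usize> = (0..n).rev().collect();
    while let Some(node) = worklist.pop_front() {
        queued[node] = false;
        let after = if node == exit {
            exit_fact.clone()
        } else {
            cfg.succs[node]
                .iter()
                .fold(L::bottom(), |acc, &s| acc.join(&facts[s]))
        };
        let before = transfer(node, &after);
        if before != facts[node] {
            facts[node] = before;
            // A changed fact can only affect predecessors; requeue them.
            for &p in &preds[node] {
                if !queued[p] {
                    queued[p] = true;
                    worklist.push_back(p);
                }
            }
        }
    }
    facts
}

/// Example lattice: a set of live variables as a bitmask, joined by union.
#[derive(Clone, PartialEq, Debug)]
pub struct VarSet(pub u32);

impl Lattice for VarSet {
    fn bottom() -> Self {
        VarSet(0)
    }
    fn join(&self, other: &Self) -> Self {
        VarSet(self.0 | other.0)
    }
}

/// Liveness for: 0: `x = 1`; 1: `while cond(y)`; 2: `y = x`; 3: exit.
/// Bit 0 is `x`, bit 1 is `y`.
pub fn demo_liveness() -> Vec<u32> {
    let cfg = Cfg { succs: vec![vec![1], vec![2, 3], vec![1], vec![]] };
    let uses = [0b00u32, 0b10, 0b01, 0b00];
    let defs = [0b01u32, 0b00, 0b10, 0b00];
    let facts = solve_backward(&cfg, 3, VarSet(0), |n, after| {
        VarSet((after.0 & !defs[n]) | uses[n])
    });
    facts.into_iter().map(|f| f.0).collect()
}
```

Calling `demo_liveness()` yields one bitmask per node; the loop edge from node 2 back to node 1 is what forces the solver to revisit node 2 before the facts stabilize.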

2. Rewritten “discarded by function return” lint

Implemented in tinymist-lint/src/cfg.rs using the new CFG + dataflow framework.
Uses two backward dataflow analyses:
1. Semantic analysis (MustReturnKind)
  Determines whether all paths from a node to exit must hit:
  - no return
  - return <value>
  - return (none)
2. Diagnostic coverage analysis
  Ensures that warnings are emitted at most once per path, at the closest relevant statement to the return.
Warnings are issued only when:
- the statement is reachable,
- it is guaranteed to be discarded by a return,
- and it matches the same “content-like” expression set as before.
Hints (let _ = ...) are preserved for hashable, non-show/set expressions.
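
As an illustration of the semantic analysis above, the three outcomes can be modeled as a flat lattice with an added bottom and top. This is a sketch under assumptions, not the PR's `MustReturnKind`: the `ReturnFact` enum, its variant names, and `join_all` are hypothetical, and the real type may treat mixed return kinds differently.

```rust
/// Hypothetical fact about how every path from a node reaches the exit.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ReturnFact {
    /// Bottom: no path information yet.
    Unvisited,
    /// All paths reach the exit without a `return`.
    NoReturn,
    /// All paths hit `return <value>`.
    ReturnValue,
    /// All paths hit a bare `return`.
    ReturnNone,
    /// Top: paths disagree, so nothing is guaranteed.
    Mixed,
}

impl ReturnFact {
    /// Join of two facts at a control-flow merge point.
    pub fn join(self, other: Self) -> Self {
        use ReturnFact::*;
        match (self, other) {
            // Bottom is the identity of the join.
            (Unvisited, x) | (x, Unvisited) => x,
            // Agreement is preserved.
            (a, b) if a == b => a,
            // Any disagreement goes to top.
            _ => Mixed,
        }
    }
}

/// Joins the facts of all successors of a node (assumed helper).
pub fn join_all(facts: &[ReturnFact]) -> ReturnFact {
    facts
        .iter()
        .copied()
        .fold(ReturnFact::Unvisited, ReturnFact::join)
}
```

With this encoding, an `if`/`else` whose branches both end in `return <value>` joins to `ReturnValue`, while a branch that falls through joined with a branch that returns gives `Mixed`, in which case a "guaranteed to be discarded" warning would not be justified.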

3. Removal of legacy implementation

Deletes LateFuncLinter, ReturnBlockInfo, and the old reverse block traversal.
Simplifies loop bookkeeping (break / continue checks no longer require tracking flags).
The lint is now invoked once per function body (closure/contextual), after normal traversal.

4. Cross-request caching

Introduces LintCaches in tinymist-lint for expensive analyses.
Adds query-level caching keyed by (TypstFileId, source_hash) in tinymist-query.
CFG + dataflow results for unchanged sources are reused across lint runs, reducing latency.
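
The keying scheme described above can be sketched as follows. This is only an illustration of caching by (file id, source hash): `FileId` is a stand-in for `TypstFileId`, and `LintCache`, `get_or_compute`, `hash_source`, and the `misses` counter are hypothetical names, not the PR's `LintCaches` API. The sketch never evicts stale entries, which a real cache would need to handle.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Stand-in for `TypstFileId` (assumption for this sketch).
pub type FileId = u32;

/// Hypothetical cache of analysis results keyed by (file id, source hash).
pub struct LintCache<T> {
    entries: HashMap<(FileId, u64), Arc<T>>,
    /// Number of times the analysis had to be recomputed.
    pub misses: usize,
}

impl<T> LintCache<T> {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), misses: 0 }
    }

    /// Returns the cached result for unchanged source, or computes and stores it.
    pub fn get_or_compute(
        &mut self,
        file: FileId,
        source: &str,
        compute: impl FnOnce(&str) -> T,
    ) -> Arc<T> {
        let key = (file, hash_source(source));
        if let Some(hit) = self.entries.get(&key) {
            return Arc::clone(hit);
        }
        self.misses += 1;
        let value = Arc::new(compute(source));
        self.entries.insert(key, Arc::clone(&value));
        value
    }
}

/// Hashes the source text; any edit produces a different key with high probability.
fn hash_source(source: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    hasher.finish()
}

/// Two lint runs on unchanged source share one result; an edit forces a recompute.
pub fn demo_cache() -> (usize, bool) {
    let mut cache = LintCache::new();
    let first = cache.get_or_compute(1, "#let f() = 1", |s| s.len());
    let second = cache.get_or_compute(1, "#let f() = 1", |s| s.len());
    let _edited = cache.get_or_compute(1, "#let f() = 12", |s| s.len());
    (cache.misses, Arc::ptr_eq(&first, &second))
}
```

Storing results behind `Arc` lets repeated lint requests share one analysis result without cloning it.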

5. Tests and behavior changes

Adds new fixtures covering:
- return with none + set
- return with none + show
- return with none + plain text
Updates snapshots:
- Warnings now correctly appear in cases involving break paths.
- return none only warns for show / set, not plain text.

gemini-code-assist · 2025-12-21T06:51:58Z

Summary of Changes

Hello @hongjr03, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the linting system by introducing a Control Flow Graph (CFG) based dataflow analysis. This change aims to provide a more accurate and comprehensive mechanism for identifying implicitly discarded expressions within functions, particularly those affected by return statements. The new system also incorporates caching to optimize performance for repeated linting operations.

Highlights

New CFG-based Dataflow Analysis: Introduced a new cfg.rs module that implements a Control Flow Graph (CFG) for more robust dataflow analysis, specifically for detecting implicitly discarded expressions by function returns.
Linter Refactoring: The Linter struct in lib.rs has been refactored to integrate the new CFG-based analysis, replacing the previous LateFuncLinter and ReturnBlockInfo mechanisms. This simplifies the linter's internal logic and improves accuracy.
Linting Cache Implementation: A LintCaches struct and associated caching logic have been added to store results of expensive lint analyses, such as CFG construction, to improve performance by avoiding redundant computations.
Enhanced Discarded Expression Detection: The new CFG approach allows for more precise detection of expressions that are implicitly discarded by function returns, including set and show rules, leading to more comprehensive warnings.
Updated Test Fixtures and Snapshots: New test cases and updated snapshots have been added to validate the correctness of the new CFG-based linting behavior across various scenarios.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request is an excellent refactoring of the linting logic, replacing the previous visitor-based dataflow analysis with a more robust and maintainable Control Flow Graph (CFG) based approach. The new implementation for detecting discarded values before a function return is well-structured, with clear separation of concerns between CFG construction, dataflow analysis, and diagnostics emission. The introduction of caching for the CFG analysis is also a great addition for performance. The code is clean, well-commented, and the changes significantly improve the clarity of the linting infrastructure. I have one minor suggestion to improve the accuracy of span information in the CFG. Overall, great work!

crates/tinymist-lint/src/cfg.rs

hongjr03 added 8 commits December 16, 2025 19:55

feat: enhance diagnostics handling by introducing DiagKind and separating compiler/lint diagnostics

b1871fb

feat: add clear_diagnostics method to CompileHandlerImpl for better diagnostic management

fa44328

refactor: add hook scaffolding

49fb261

feat: split lint hook from compiler diagnostics

0b70b0a

feat: Separate diagnostic handling into a dedicated DiagHook for compiler diagnostics and a new `LintHook` for lint-specific diagnostics.

f35f25f

refactor: Unify compile event handling with a new CompileHook trait, replacing individual handlers for diagnostics, linting, preview, and export.

92aef00

docs: Add documentation comments to project hooks and refactor hook vector initialization.

c62159e

style: suppress unused_mut warning for hooks vector initialization

d805281

gemini-code-assist bot reviewed Dec 21, 2025

crates/tinymist-lint/src/cfg.rs (outdated; resolved)

hongjr03 changed the title ~~feat,refactor: cfg based dataflow~~ feat: cfg based dataflow Dec 21, 2025

hongjr03 mentioned this pull request Dec 21, 2025

feat: cfg-based unreachable diagnostics #2316

Closed

hongjr03 added 5 commits December 21, 2025 16:55

refactor(lint): replace DataFlowVisitor with CFG-based dataflow

e4a9096

cache

06da3f8

test

815aea1

clippy

3198e05

resolve review

06600ba

hongjr03 force-pushed the feat/cfg-based-dataflow branch from 71da4ed to 06600ba Compare December 21, 2025 09:02

hongjr03 added 4 commits December 21, 2025 20:58

feat: introduce dataflow analysis infrastructure

f7a65f7

feat: enhance CFG with edge payloads and improve dataflow analysis

9dffb55

fmt

9fda561

feat: add LintRequest for lint diagnostics and update check handling

c877874

hongjr03 marked this pull request as ready for review December 21, 2025 15:21

hongjr03 requested review from Enter-tainer and Myriad-Dreamin as code owners December 21, 2025 15:21

feat(lint): gate CFG-based lint behind lint-v2 feature (keep v1 as default)

6673d33

hongjr03 wants to merge 18 commits into Myriad-Dreamin:main from hongjr03:feat/cfg-based-dataflow
