feat: cfg based dataflow #2321

Open
hongjr03 wants to merge 18 commits into Myriad-Dreamin:main from hongjr03:feat/cfg-based-dataflow

@hongjr03 hongjr03 commented Dec 21, 2025

This PR introduces a reusable statement-level control-flow graph (CFG) and a generic forward/backward dataflow solver to tinymist-analysis, and migrates the “implicitly discarded by function return” lint to this new infrastructure. The new implementation is more precise, path-sensitive, and cacheable, replacing the previous ad-hoc backward traversal.

This PR is based on #2302 and should be merged after it.

I gated the CFG-based lint behind the lint-v2 feature (v1 remains the default) in 6673d33.
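
For readers unfamiliar with Cargo feature gating, the general pattern looks roughly like the sketch below. Only the feature name `lint-v2` comes from this PR; the function `lint_backend` and its return strings are hypothetical, and the actual gating in 6673d33 may be structured differently (for example, by gating whole modules).

```rust
// Compiled only when the crate is built with `--features lint-v2`.
#[cfg(feature = "lint-v2")]
pub fn lint_backend() -> &'static str {
    "v2: CFG + dataflow"
}

// Default build: the feature is off, so the legacy path is selected.
#[cfg(not(feature = "lint-v2"))]
pub fn lint_backend() -> &'static str {
    "v1: legacy traversal"
}
```

Exactly one of the two definitions exists in any given build, so callers do not need a runtime check.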

Motivation

The existing “discarded by function return” lint relied on a hand-written reverse traversal over blocks. While lightweight, it had several limitations:

  • Poor modeling of control flow (if/else, loops, break/continue)
  • No short-circuit semantics for boolean conditions
  • Heuristic warning suppression that was hard to reason about
  • Repeated recomputation of analysis for the same function bodies

This PR addresses these issues by introducing a proper CFG + dataflow foundation and reimplementing the lint on top of it.

What’s in this PR

1. New analysis infrastructure (tinymist-analysis::flow)

  • cfg: a minimal directed CFG with explicit entry / exit, stable NodeId, labeled edges, and reachability utilities.

  • dataflow: a generic worklist-based solver for forward and backward dataflow problems over join-semilattices.

  • typst: lowering from Typst AST to a statement-level CFG, with:

    • Explicit nodes for if/while/for, joins, loop headers, return, break, continue
    • Correct modeling of short-circuit boolean operators (and, or, not)
    • Well-formedness checks (all reachable nodes can reach exit)

These components are reusable by future analyses and lints.
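
To make the cfg and dataflow pieces concrete, here is a minimal, self-contained sketch of a CFG with explicit entry/exit nodes and a worklist-based backward solver over a join-semilattice. It is not the code from this PR: `Cfg`, `Lattice`, `solve_backward`, and `can_reach_exit` are illustrative names, and the definition of `NodeId` is assumed. The example instance computes "can this node reach exit", which is the property the well-formedness check above relies on.

```rust
use std::collections::VecDeque;

/// Assumed shape of a stable node handle (the PR's `NodeId` may differ).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

/// A minimal directed graph with explicit entry and exit nodes.
pub struct Cfg {
    succs: Vec<Vec<NodeId>>,
    preds: Vec<Vec<NodeId>>,
    pub entry: NodeId,
    pub exit: NodeId,
}

impl Cfg {
    /// Creates a graph where node 0 is the entry and node 1 is the exit.
    pub fn new() -> Self {
        Cfg {
            succs: vec![Vec::new(), Vec::new()],
            preds: vec![Vec::new(), Vec::new()],
            entry: NodeId(0),
            exit: NodeId(1),
        }
    }

    pub fn add_node(&mut self) -> NodeId {
        self.succs.push(Vec::new());
        self.preds.push(Vec::new());
        NodeId(self.succs.len() - 1)
    }

    pub fn add_edge(&mut self, from: NodeId, to: NodeId) {
        self.succs[from.0].push(to);
        self.preds[to.0].push(from);
    }
}

/// A join-semilattice: `join` must be commutative, associative, and idempotent.
pub trait Lattice: Clone + PartialEq {
    fn bottom() -> Self;
    fn join(&self, other: &Self) -> Self;
}

/// Backward worklist solver. For each node, the facts of its successors are
/// joined and passed through `transfer`; predecessors are revisited whenever
/// a node's fact changes, until a fixed point is reached.
pub fn solve_backward<L: Lattice>(cfg: &Cfg, transfer: impl Fn(NodeId, &L) -> L) -> Vec<L> {
    let n = cfg.succs.len();
    let mut state = vec![L::bottom(); n];
    let mut work: VecDeque<NodeId> = (0..n).rev().map(NodeId).collect();
    while let Some(node) = work.pop_front() {
        let out = cfg.succs[node.0]
            .iter()
            .fold(L::bottom(), |acc, s| acc.join(&state[s.0]));
        let new = transfer(node, &out);
        if new != state[node.0] {
            state[node.0] = new;
            for &p in &cfg.preds[node.0] {
                if !work.contains(&p) {
                    work.push_back(p);
                }
            }
        }
    }
    state
}

/// Example lattice: booleans ordered false < true, joined with logical OR.
impl Lattice for bool {
    fn bottom() -> Self {
        false
    }
    fn join(&self, other: &Self) -> Self {
        *self || *other
    }
}

/// Example analysis: for every node, whether some path leads to the exit.
pub fn can_reach_exit(cfg: &Cfg) -> Vec<bool> {
    let exit = cfg.exit;
    solve_backward(cfg, |node, out: &bool| node == exit || *out)
}
```

A forward solver would be symmetric, joining over predecessors and propagating to successors. The fixed point terminates here because the lattice is finite and facts only move upward.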

2. Rewritten “discarded by function return” lint

  • Implemented in tinymist-lint/src/cfg.rs using the new CFG + dataflow framework.

  • Uses two backward dataflow analyses:

    1. Semantic analysis (MustReturnKind)
      Determines whether all paths from a node to exit must hit:

      • no return
      • return <value>
      • return (none)
    2. Diagnostic coverage analysis
      Ensures that warnings are emitted at most once per path, at the closest relevant statement to the return.

  • Warnings are issued only when:

    • the statement is reachable,
    • it is guaranteed to be discarded by a return,
    • and it matches the same “content-like” expression set as before.
  • Hints (let _ = ...) are preserved for hashable, non-show/set expressions.
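
One way to picture the semantic analysis is as a small lattice whose join merges the facts coming from different successor paths. The sketch below is an assumption about the shape of such a lattice, not the PR's actual definition: only the name `MustReturnKind` and its three outcomes come from the description above, while the `Unreached` and `Mixed` variants and the `join_all` helper are hypothetical additions needed to make the join total.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MustReturnKind {
    /// No path information yet; acts as the identity of `join` (assumed).
    Unreached,
    /// Every path reaches exit without an explicit `return`.
    NoReturn,
    /// Every path ends in `return <value>`.
    ReturnValue,
    /// Every path ends in a bare `return`.
    ReturnNone,
    /// Paths disagree, so nothing is guaranteed (assumed).
    Mixed,
}

impl MustReturnKind {
    /// Merges the facts of two successor paths.
    pub fn join(self, other: Self) -> Self {
        use MustReturnKind::*;
        match (self, other) {
            (Unreached, x) | (x, Unreached) => x,
            (a, b) if a == b => a,
            _ => Mixed,
        }
    }
}

/// Joins the facts of all successors of a node.
pub fn join_all(kinds: impl IntoIterator<Item = MustReturnKind>) -> MustReturnKind {
    kinds
        .into_iter()
        .fold(MustReturnKind::Unreached, MustReturnKind::join)
}
```

Under this model, a branch where both arms end in `return <value>` joins to `ReturnValue`, while a branch where only one arm returns joins to `Mixed`, so no "guaranteed to be discarded" warning would follow from it.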

3. Removal of legacy implementation

  • Deletes LateFuncLinter, ReturnBlockInfo, and the old reverse block traversal.
  • Simplifies loop bookkeeping (break / continue checks no longer require tracking flags).
  • The lint is now invoked once per function body (closure/contextual), after normal traversal.

4. Cross-request caching

  • Introduces LintCaches in tinymist-lint for expensive analyses.
  • Adds query-level caching keyed by (TypstFileId, source_hash) in tinymist-query.
  • CFG + dataflow results for unchanged sources are reused across lint runs, reducing latency.
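
The caching scheme can be sketched as a map keyed by file identity plus a hash of the source text, so that an unchanged source hits the cache and any edit produces a new key. This is an illustration under stated assumptions: `AnalysisCache`, `FileId` (a stand-in for `TypstFileId`), `source_hash`, and the `misses` counter are hypothetical, and the sketch omits eviction, which a real `LintCaches` would likely need.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for `TypstFileId` (assumption for this sketch).
pub type FileId = u32;

/// Caches one analysis result per (file, source hash) pair.
pub struct AnalysisCache<T> {
    entries: HashMap<(FileId, u64), T>,
    /// Number of times the analysis actually had to run.
    pub misses: usize,
}

/// Hashes the source text; any stable hasher would do here.
fn source_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

impl<T: Clone> AnalysisCache<T> {
    pub fn new() -> Self {
        AnalysisCache {
            entries: HashMap::new(),
            misses: 0,
        }
    }

    /// Returns the cached result for this exact source, or computes and stores it.
    pub fn get_or_compute(
        &mut self,
        file: FileId,
        text: &str,
        compute: impl FnOnce(&str) -> T,
    ) -> T {
        let key = (file, source_hash(text));
        if let Some(value) = self.entries.get(&key) {
            return value.clone();
        }
        self.misses += 1;
        let value = compute(text);
        self.entries.insert(key, value.clone());
        value
    }
}
```

Repeated lint requests on the same unchanged file then run the expensive closure once; editing the file changes the hash and forces a recomputation.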

5. Tests and behavior changes

  • Adds new fixtures covering:

    • return with none + set
    • return with none + show
    • return with none + plain text
  • Updates snapshots:

    • Warnings now correctly appear in cases involving break paths.
    • return none only warns for show / set, not plain text.

@gemini-code-assist

Summary of Changes

Hello @hongjr03, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the linting system by introducing a Control Flow Graph (CFG) based dataflow analysis. This change aims to provide a more accurate and comprehensive mechanism for identifying implicitly discarded expressions within functions, particularly those affected by return statements. The new system also incorporates caching to optimize performance for repeated linting operations.

Highlights

  • New CFG-based Dataflow Analysis: Introduced a new cfg.rs module that implements a Control Flow Graph (CFG) for more robust dataflow analysis, specifically for detecting implicitly discarded expressions by function returns.
  • Linter Refactoring: The Linter struct in lib.rs has been refactored to integrate the new CFG-based analysis, replacing the previous LateFuncLinter and ReturnBlockInfo mechanisms. This simplifies the linter's internal logic and improves accuracy.
  • Linting Cache Implementation: A LintCaches struct and associated caching logic have been added to store results of expensive lint analyses, such as CFG construction, to improve performance by avoiding redundant computations.
  • Enhanced Discarded Expression Detection: The new CFG approach allows for more precise detection of expressions that are implicitly discarded by function returns, including set and show rules, leading to more comprehensive warnings.
  • Updated Test Fixtures and Snapshots: New test cases and updated snapshots have been added to validate the correctness of the new CFG-based linting behavior across various scenarios.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is an excellent refactoring of the linting logic, replacing the previous visitor-based dataflow analysis with a more robust and maintainable Control Flow Graph (CFG) based approach. The new implementation for detecting discarded values before a function return is well-structured, with clear separation of concerns between CFG construction, dataflow analysis, and diagnostics emission. The introduction of caching for the CFG analysis is also a great addition for performance. The code is clean, well-commented, and the changes significantly improve the clarity of the linting infrastructure. I have one minor suggestion to improve the accuracy of span information in the CFG. Overall, great work!

@hongjr03 changed the title from "feat,refactor: cfg based dataflow" to "feat: cfg based dataflow" on Dec 21, 2025
@hongjr03 force-pushed the feat/cfg-based-dataflow branch from 71da4ed to 06600ba on December 21, 2025 09:02
@hongjr03 marked this pull request as ready for review on December 21, 2025 15:21