Skip to content

feat(sqlparser): add expr_core module with Pratt parser combinator#25214

Open
TennyZhuang wants to merge 13 commits intorisingwavelabs:mainfrom
TennyZhuang:feat/expr-core-foundation
Open

feat(sqlparser): add expr_core module with Pratt parser combinator#25214
TennyZhuang wants to merge 13 commits intorisingwavelabs:mainfrom
TennyZhuang:feat/expr-core-foundation

Conversation

@TennyZhuang
Copy link
Copy Markdown
Contributor

This PR adds a new expr_core module to the parser_v2 crate, implementing expression parsing using winnow'''s Pratt parser combinator.

Features

  • Core expression parser using winnow::combinator::expression
  • Support for LIKE/ILIKE operators
  • Compound and quoted identifier support
  • Complete IS family: IS [NOT] NULL/TRUE/FALSE/UNKNOWN
  • 35 unit tests covering literals, identifiers, operators, and precedence

Implementation Details

  • Pure parser_v2 implementation with zero fallback to v1 parser
  • Uses Prefix, Postfix, and Infix combinators for operator handling
  • Precedence levels carefully designed to match SQL semantics

Not Yet Implemented (Future Work)

  • NOT LIKE / NOT ILIKE (requires multi-token infix handling)
  • BETWEEN / IN
  • AT TIME ZONE / :: (cast)

Testing

  • All 35 expr_core tests pass
  • All existing tests (177 total) still pass
  • Zero v1 parser fallback maintained

This is a foundation milestone for gradually migrating expression parsing to combinator style.

…ombinator

Add expr_core.rs with pure parser_v2 implementation:
- expr_core(): Pratt parser using winnow::combinator::expression
- atom_core(): literals, identifiers, parenthesized expressions
- prefix_op_core(): unary +, -
- infix_op_core(): +, -, *, /, %, ^, comparisons, AND, OR

Key properties:
- Zero fallback to v1 parser (no parse_v1 calls)
- 12 unit tests covering literals, identifiers, operators, precedence
- All existing tests still pass

This is the foundation for gradually migrating expression parsing
to combinator style without circular dependencies.
Add LIKE and ILIKE operators to the core expression parser:
- LIKE_POWER precedence level (12), between comparisons and AND
- Constructs Expr::Like and Expr::ILike AST nodes
- No NOT LIKE, no ESCAPE clause (as per scope guard)

Add 5 new tests:
- test_like: basic LIKE parsing
- test_ilike: basic ILIKE parsing
- test_like_precedence_with_and: verifies LIKE < AND
- test_like_with_arithmetic: verifies + > LIKE
- test_like_parenthesized_with_or: verifies parens work

Total: 17 expr_core tests passing, all existing tests still pass.
Expand identifier coverage in expr_core:
- Compound identifiers: foo.bar, schema.table.col
- Quoted identifiers: "CaseSensitive"
- Accept keywords as identifier parts (e.g., 'table' in schema.table)

Add 5 new tests:
- test_compound_identifier: foo.bar
- test_compound_identifier_three_parts: schema.table.col
- test_quoted_identifier: "CaseSensitive"
- test_compound_identifier_with_expression: foo.bar + 1
- test_compound_identifier_with_like: schema.tbl.col ilike 'x'

Total: 22 expr_core tests passing, all existing tests still pass.
Add IS NULL and IS NOT NULL postfix operators:
- IS_NULL_POWER = 10 (between LIKE and AND)
- postfix_op_core() to handle IS NULL / IS NOT NULL
- Uses Postfix combinator from winnow

Add 5 new tests:
- test_is_null: basic IS NULL
- test_is_not_null: basic IS NOT NULL
- test_is_null_precedence_with_and: verifies IS NULL < AND
- test_is_null_with_arithmetic: verifies + > IS NULL
- test_is_null_parenthesized_with_or: verifies parens work

Total: 27 expr_core tests passing, all existing tests still pass.
Extend IS family in expr_core:
- IS TRUE / IS NOT TRUE
- IS FALSE / IS NOT FALSE
- IS UNKNOWN / IS NOT UNKNOWN

Refactor postfix_op_core to use cleaner keyword matching pattern.

Add 8 new tests:
- test_is_true / test_is_not_true
- test_is_false / test_is_not_false
- test_is_unknown / test_is_not_unknown
- test_is_true_precedence_with_and
- test_is_false_precedence_with_or

Total: 35 expr_core tests passing, all existing tests still pass.
@TennyZhuang TennyZhuang force-pushed the feat/expr-core-foundation branch from c34f60f to 32b02af Compare March 31, 2026 16:11
TennyZhuang and others added 7 commits April 1, 2026 00:31
The expr_core module is a foundation for future expression parsing
migration but has no call sites yet. Remove the pub(crate) re-export
to avoid unused import warnings, and add #![allow(dead_code)] to the
module since the functions are only exercised by tests for now.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace redundant pattern matching with .is_ok() to satisfy
  clippy::redundant_pattern_matching lint
- Escape [NOT] in doc comments to prevent rustdoc from interpreting
  them as unresolved links

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r.rs

Rewrites expr_core to handle ALL SQL expression operators using winnow's
expression combinator. Atoms are parsed via the v1 bridge (parse_prefix),
while all infix and postfix operators are handled natively by the Pratt
parser. This replaces the v1 fallback in expr.rs's expr_parse function.

New operators added: NOT prefix, XOR, BETWEEN, IN, NOT LIKE/ILIKE,
SIMILAR TO, NOT SIMILAR TO, NOT BETWEEN, NOT IN, IS DISTINCT FROM,
IS NOT DISTINCT FROM, IS [NOT] JSON, AT TIME ZONE, :: cast, [] subscript,
ISNULL, NOTNULL, OPERATOR(...), custom Op, Pipe, ALL/ANY/SOME, ESCAPE
clause for LIKE/ILIKE/SIMILAR TO. Also fixes negative literal normalization
and reserved keyword rejection (both handled by v1 parse_prefix).

All 295 tests pass. Clippy and rustdoc clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All expression types are now implemented and used, so the allow
attribute is no longer needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix import ordering and closure formatting caught by `cargo fmt --check` in CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@TennyZhuang
Copy link
Copy Markdown
Contributor Author

@xiangjinwu PTAL, a step to the full combinator based parser.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR switches parser_v2’s top-level expression parsing entrypoint to a new Pratt-based implementation (expr_core) built on winnow::combinator::expression, aiming to provide v2-native operator precedence parsing and broader operator coverage.

Changes:

  • Added parser_v2::expr_core module implementing Pratt parsing with postfix/infix operator handling.
  • Updated parser_v2::expr to use expr_core instead of delegating full expression parsing to v1.
  • Added a dedicated unit test suite for expr_core covering literals, identifiers, operators, and precedence.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/sqlparser/src/parser_v2/mod.rs Registers the new expr_core module in parser_v2.
src/sqlparser/src/parser_v2/expr.rs Routes expr_parse to the new Pratt-based expr_core parser.
src/sqlparser/src/parser_v2/expr_core.rs Implements Pratt parsing (postfix/infix) and adds unit tests for core expression cases.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +491 to +533
Token::Eq => Some(Infix::Neither(CMP_POWER, |_, a: Expr, b: Expr| {
Ok(Expr::BinaryOp {
left: Box::new(a),
op: BinaryOperator::Eq,
right: Box::new(b),
})
})),
Token::Neq => Some(Infix::Neither(CMP_POWER, |_, a: Expr, b: Expr| {
Ok(Expr::BinaryOp {
left: Box::new(a),
op: BinaryOperator::NotEq,
right: Box::new(b),
})
})),
Token::Lt => Some(Infix::Neither(CMP_POWER, |_, a: Expr, b: Expr| {
Ok(Expr::BinaryOp {
left: Box::new(a),
op: BinaryOperator::Lt,
right: Box::new(b),
})
})),
Token::Gt => Some(Infix::Neither(CMP_POWER, |_, a: Expr, b: Expr| {
Ok(Expr::BinaryOp {
left: Box::new(a),
op: BinaryOperator::Gt,
right: Box::new(b),
})
})),
Token::LtEq => Some(Infix::Neither(CMP_POWER, |_, a: Expr, b: Expr| {
Ok(Expr::BinaryOp {
left: Box::new(a),
op: BinaryOperator::LtEq,
right: Box::new(b),
})
})),
Token::GtEq => Some(Infix::Neither(CMP_POWER, |_, a: Expr, b: Expr| {
Ok(Expr::BinaryOp {
left: Box::new(a),
op: BinaryOperator::GtEq,
right: Box::new(b),
})
})),

Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison operators (=, <>, <, >, <=, >=) are defined as Infix::Neither here, but the existing v1 parser treats them as regular left-associative binary operators (it will parse chains like a = b = c as (a = b) = c). Using Neither changes parsing behavior and may break queries that rely on the v1 behavior (even if they later fail type checking). Consider switching these to Infix::Left to match v1/PostgreSQL associativity, or add explicit tests + justification if the behavior change is intentional.

Copilot uses AI. Check for mistakes.
Comment on lines +15 to +19
//! This module provides a v2 expression parser that uses winnow's `expression`
//! combinator for operator precedence parsing. Atoms (primary expressions
//! including prefix operators like NOT and unary +/-) are parsed via the v1
//! bridge, while all infix and postfix operators are handled natively by the
//! Pratt parser.
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module-level docs and PR description claim “zero fallback to v1 / pure parser_v2”, but expr_core still delegates large parts of parsing to v1 via parse_v1 (atoms via parse_prefix, plus BETWEEN/IN/::/[]/custom operators, etc.). Either update the documentation/PR description to reflect the current hybrid approach, or complete the v2 implementations so the claim is accurate.

Copilot uses AI. Check for mistakes.
Comment on lines +383 to +409
// Try negated forms (IS NOT ...)
if Keyword::NOT.parse_next(input).is_ok() {
if Keyword::TRUE.parse_next(input).is_ok() {
return Ok(Expr::IsNotTrue(Box::new(expr)));
}
if Keyword::FALSE.parse_next(input).is_ok() {
return Ok(Expr::IsNotFalse(Box::new(expr)));
}
if Keyword::UNKNOWN.parse_next(input).is_ok() {
return Ok(Expr::IsNotUnknown(Box::new(expr)));
}
if Keyword::NULL.parse_next(input).is_ok() {
return Ok(Expr::IsNotNull(Box::new(expr)));
}
if Keyword::DISTINCT.parse_next(input).is_ok() {
cut_err(Keyword::FROM).parse_next(input)?;
let expr2 = input.parse_v1(|p| p.parse_expr())?;
return Ok(Expr::IsNotDistinctFrom(Box::new(expr), Box::new(expr2)));
}
if Keyword::JSON.parse_next(input).is_ok() {
return input.parse_v1(|p| p.parse_is_json(expr, true));
}
}

// Nothing matched after IS [NOT]
fail.parse_next(input)
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_is_suffix falls back to fail when the IS suffix doesn’t match any supported form. In winnow, fail produces a backtracking error, which risks the Pratt parser treating IS as “not an operator here” instead of emitting a committed/clear syntax error after IS has already been consumed. Consider returning a cut error (or using cut_err(...)) with an explicit expectation message for the allowed IS [NOT] ... forms.

Copilot uses AI. Check for mistakes.
…h v1

Change comparison operators (=, <>, <, >, <=, >=) from Infix::Neither
to Infix::Left so that expressions like `a = b = c` parse as `(a = b) = c`,
matching v1 parser behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@TennyZhuang
Copy link
Copy Markdown
Contributor Author

TennyZhuang commented Apr 8, 2026

@xiangjinwu

Hi! This PR is ready for review. Quick summary of what it does:

  • Introduces a Pratt parser (expr_core) using winnow::combinator::expression for operator precedence parsing in parser_v2
  • Covers all infix/postfix operators (comparisons, arithmetic, LIKE/ILIKE, SIMILAR TO, BETWEEN, IN, IS variants, AT TIME ZONE, ::, [], etc.)
  • Wires expr_parse to use the new Pratt parser, replacing the v1 fallback
  • All 295 tests pass, clippy and rustdoc clean

Would appreciate a review when you get a chance. Thanks! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants