feat(sqlparser): add expr_core module with Pratt parser combinator#25214
feat(sqlparser): add expr_core module with Pratt parser combinator#25214TennyZhuang wants to merge 13 commits intorisingwavelabs:mainfrom
Conversation
…ombinator Add expr_core.rs with pure parser_v2 implementation: - expr_core(): Pratt parser using winnow::combinator::expression - atom_core(): literals, identifiers, parenthesized expressions - prefix_op_core(): unary +, - - infix_op_core(): +, -, *, /, %, ^, comparisons, AND, OR Key properties: - Zero fallback to v1 parser (no parse_v1 calls) - 12 unit tests covering literals, identifiers, operators, precedence - All existing tests still pass This is the foundation for gradually migrating expression parsing to combinator style without circular dependencies.
Add LIKE and ILIKE operators to the core expression parser: - LIKE_POWER precedence level (12), between comparisons and AND - Constructs Expr::Like and Expr::ILike AST nodes - No NOT LIKE, no ESCAPE clause (as per scope guard) Add 5 new tests: - test_like: basic LIKE parsing - test_ilike: basic ILIKE parsing - test_like_precedence_with_and: verifies LIKE < AND - test_like_with_arithmetic: verifies + > LIKE - test_like_parenthesized_with_or: verifies parens work Total: 17 expr_core tests passing, all existing tests still pass.
Expand identifier coverage in expr_core: - Compound identifiers: foo.bar, schema.table.col - Quoted identifiers: "CaseSensitive" - Accept keywords as identifier parts (e.g., 'table' in schema.table) Add 5 new tests: - test_compound_identifier: foo.bar - test_compound_identifier_three_parts: schema.table.col - test_quoted_identifier: "CaseSensitive" - test_compound_identifier_with_expression: foo.bar + 1 - test_compound_identifier_with_like: schema.tbl.col ilike 'x' Total: 22 expr_core tests passing, all existing tests still pass.
Add IS NULL and IS NOT NULL postfix operators: - IS_NULL_POWER = 10 (between LIKE and AND) - postfix_op_core() to handle IS NULL / IS NOT NULL - Uses Postfix combinator from winnow Add 5 new tests: - test_is_null: basic IS NULL - test_is_not_null: basic IS NOT NULL - test_is_null_precedence_with_and: verifies IS NULL < AND - test_is_null_with_arithmetic: verifies + > IS NULL - test_is_null_parenthesized_with_or: verifies parens work Total: 27 expr_core tests passing, all existing tests still pass.
Extend IS family in expr_core: - IS TRUE / IS NOT TRUE - IS FALSE / IS NOT FALSE - IS UNKNOWN / IS NOT UNKNOWN Refactor postfix_op_core to use cleaner keyword matching pattern. Add 8 new tests: - test_is_true / test_is_not_true - test_is_false / test_is_not_false - test_is_unknown / test_is_not_unknown - test_is_true_precedence_with_and - test_is_false_precedence_with_or Total: 35 expr_core tests passing, all existing tests still pass.
c34f60f to
32b02af
Compare
The expr_core module is a foundation for future expression parsing migration but has no call sites yet. Remove the pub(crate) re-export to avoid unused import warnings, and add #![allow(dead_code)] to the module since the functions are only exercised by tests for now. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace redundant pattern matching with .is_ok() to satisfy clippy::redundant_pattern_matching lint - Escape [NOT] in doc comments to prevent rustdoc from interpreting them as unresolved links Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r.rs Rewrites expr_core to handle ALL SQL expression operators using winnow's expression combinator. Atoms are parsed via the v1 bridge (parse_prefix), while all infix and postfix operators are handled natively by the Pratt parser. This replaces the v1 fallback in expr.rs's expr_parse function. New operators added: NOT prefix, XOR, BETWEEN, IN, NOT LIKE/ILIKE, SIMILAR TO, NOT SIMILAR TO, NOT BETWEEN, NOT IN, IS DISTINCT FROM, IS NOT DISTINCT FROM, IS [NOT] JSON, AT TIME ZONE, :: cast, [] subscript, ISNULL, NOTNULL, OPERATOR(...), custom Op, Pipe, ALL/ANY/SOME, ESCAPE clause for LIKE/ILIKE/SIMILAR TO. Also fixes negative literal normalization and reserved keyword rejection (both handled by v1 parse_prefix). All 295 tests pass. Clippy and rustdoc clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All expression types are now implemented and used, so the allow attribute is no longer needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix import ordering and closure formatting caught by `cargo fmt --check` in CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@xiangjinwu PTAL, a step to the full combinator based parser. |
There was a problem hiding this comment.
Pull request overview
This PR switches parser_v2’s top-level expression parsing entrypoint to a new Pratt-based implementation (expr_core) built on winnow::combinator::expression, aiming to provide v2-native operator precedence parsing and broader operator coverage.
Changes:
- Added
parser_v2::expr_coremodule implementing Pratt parsing with postfix/infix operator handling. - Updated
parser_v2::exprto useexpr_coreinstead of delegating full expression parsing to v1. - Added a dedicated unit test suite for
expr_corecovering literals, identifiers, operators, and precedence.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/sqlparser/src/parser_v2/mod.rs | Registers the new expr_core module in parser_v2. |
| src/sqlparser/src/parser_v2/expr.rs | Routes expr_parse to the new Pratt-based expr_core parser. |
| src/sqlparser/src/parser_v2/expr_core.rs | Implements Pratt parsing (postfix/infix) and adds unit tests for core expression cases. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Token::Eq => Some(Infix::Neither(CMP_POWER, |_, a: Expr, b: Expr| { | ||
| Ok(Expr::BinaryOp { | ||
| left: Box::new(a), | ||
| op: BinaryOperator::Eq, | ||
| right: Box::new(b), | ||
| }) | ||
| })), | ||
| Token::Neq => Some(Infix::Neither(CMP_POWER, |_, a: Expr, b: Expr| { | ||
| Ok(Expr::BinaryOp { | ||
| left: Box::new(a), | ||
| op: BinaryOperator::NotEq, | ||
| right: Box::new(b), | ||
| }) | ||
| })), | ||
| Token::Lt => Some(Infix::Neither(CMP_POWER, |_, a: Expr, b: Expr| { | ||
| Ok(Expr::BinaryOp { | ||
| left: Box::new(a), | ||
| op: BinaryOperator::Lt, | ||
| right: Box::new(b), | ||
| }) | ||
| })), | ||
| Token::Gt => Some(Infix::Neither(CMP_POWER, |_, a: Expr, b: Expr| { | ||
| Ok(Expr::BinaryOp { | ||
| left: Box::new(a), | ||
| op: BinaryOperator::Gt, | ||
| right: Box::new(b), | ||
| }) | ||
| })), | ||
| Token::LtEq => Some(Infix::Neither(CMP_POWER, |_, a: Expr, b: Expr| { | ||
| Ok(Expr::BinaryOp { | ||
| left: Box::new(a), | ||
| op: BinaryOperator::LtEq, | ||
| right: Box::new(b), | ||
| }) | ||
| })), | ||
| Token::GtEq => Some(Infix::Neither(CMP_POWER, |_, a: Expr, b: Expr| { | ||
| Ok(Expr::BinaryOp { | ||
| left: Box::new(a), | ||
| op: BinaryOperator::GtEq, | ||
| right: Box::new(b), | ||
| }) | ||
| })), | ||
|
|
There was a problem hiding this comment.
Comparison operators (=, <>, <, >, <=, >=) are defined as Infix::Neither here, but the existing v1 parser treats them as regular left-associative binary operators (it will parse chains like a = b = c as (a = b) = c). Using Neither changes parsing behavior and may break queries that rely on the v1 behavior (even if they later fail type checking). Consider switching these to Infix::Left to match v1/PostgreSQL associativity, or add explicit tests + justification if the behavior change is intentional.
| //! This module provides a v2 expression parser that uses winnow's `expression` | ||
| //! combinator for operator precedence parsing. Atoms (primary expressions | ||
| //! including prefix operators like NOT and unary +/-) are parsed via the v1 | ||
| //! bridge, while all infix and postfix operators are handled natively by the | ||
| //! Pratt parser. |
There was a problem hiding this comment.
The module-level docs and PR description claim “zero fallback to v1 / pure parser_v2”, but expr_core still delegates large parts of parsing to v1 via parse_v1 (atoms via parse_prefix, plus BETWEEN/IN/::/[]/custom operators, etc.). Either update the documentation/PR description to reflect the current hybrid approach, or complete the v2 implementations so the claim is accurate.
| // Try negated forms (IS NOT ...) | ||
| if Keyword::NOT.parse_next(input).is_ok() { | ||
| if Keyword::TRUE.parse_next(input).is_ok() { | ||
| return Ok(Expr::IsNotTrue(Box::new(expr))); | ||
| } | ||
| if Keyword::FALSE.parse_next(input).is_ok() { | ||
| return Ok(Expr::IsNotFalse(Box::new(expr))); | ||
| } | ||
| if Keyword::UNKNOWN.parse_next(input).is_ok() { | ||
| return Ok(Expr::IsNotUnknown(Box::new(expr))); | ||
| } | ||
| if Keyword::NULL.parse_next(input).is_ok() { | ||
| return Ok(Expr::IsNotNull(Box::new(expr))); | ||
| } | ||
| if Keyword::DISTINCT.parse_next(input).is_ok() { | ||
| cut_err(Keyword::FROM).parse_next(input)?; | ||
| let expr2 = input.parse_v1(|p| p.parse_expr())?; | ||
| return Ok(Expr::IsNotDistinctFrom(Box::new(expr), Box::new(expr2))); | ||
| } | ||
| if Keyword::JSON.parse_next(input).is_ok() { | ||
| return input.parse_v1(|p| p.parse_is_json(expr, true)); | ||
| } | ||
| } | ||
|
|
||
| // Nothing matched after IS [NOT] | ||
| fail.parse_next(input) | ||
| } |
There was a problem hiding this comment.
parse_is_suffix falls back to fail when the IS suffix doesn’t match any supported form. In winnow, fail produces a backtracking error, which risks the Pratt parser treating IS as “not an operator here” instead of emitting a committed/clear syntax error after IS has already been consumed. Consider returning a cut error (or using cut_err(...)) with an explicit expectation message for the allowed IS [NOT] ... forms.
…h v1 Change comparison operators (=, <>, <, >, <=, >=) from Infix::Neither to Infix::Left so that expressions like `a = b = c` parse as `(a = b) = c`, matching v1 parser behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Hi! This PR is ready for review. Quick summary of what it does:
Would appreciate a review when you get a chance. Thanks! 🙏 |
This PR adds a new
expr_coremodule to the parser_v2 crate, implementing expression parsing using winnow'''s Pratt parser combinator.Features
winnow::combinator::expressionImplementation Details
Prefix,Postfix, andInfixcombinators for operator handlingNot Yet Implemented (Future Work)
Testing
This is a foundation milestone for gradually migrating expression parsing to combinator style.