feat: reverse Expr for matching "before" context#2934
feat: reverse Expr for matching "before" context#2934hippietrail wants to merge 12 commits intoAutomattic:masterfrom
Expr for matching "before" context#2934Conversation
Expr for matching "before" contextExpr for matching "before" context
|
@86xsk I don't seem to be able to request your review of this but it would be great to get some opinions from you whenever you might have time. |
86xsk
left a comment
There was a problem hiding this comment.
It would make things easier for more complex/advanced linters that need to check the context inside
match_to_lint_with_contextif there was a reverse equivalent ofExprso we can check the previous toke, then the one before that, etc.This PR provides that in what I think is a minimal way by introducing
_revversions of a few trait methods such asrun_rev()forExprandstep_rev()forStep.
I'm not too experienced with writing linters so it's a little difficult to wrap my head around this. Would you please give an example of where this might be useful in practice?
I also noticed that the default implementation of
runforExprhad a conditional check thatstepreturned a positive value but also had code for what to do with a negative value, but as far as I can tell a negative value there should be impossible. I verified this by running all the tests with asserts there and they never got triggered.I removed the code for the false paths in here but I'm not sure that's the right thing to do, or that the old code was the right thing to do either. It should be a sanity check but I believe we're not supposed to output to stderr or crash. What's the right thing to do here?
I'm not super familiar with this part of the code myself, but I think it is technically possible for a Step to return a negative value. Albeit it doesn't look like that's currently used. I believe the functionality was intentionally added in #1393.
Somewhat similarly, in all the
Patterns andExprs that do not yet support a_revversion I currently do output a message with🛑to stderr and returnNone. Ideally I was thinking a way for each one to share whether or not they support reverse usage might be needed, but maybe just returningNoneis correct. It's only going to be hit with a contributor is working on a new reverseExpr.
I think it might be better to use something like todo!() instead. That way it would panic, making it harder to accidentally use such an implementation without realizing it.
Another alternative might be to implement a separate trait for it instead, something like ExprRev perhaps, then implement it for Expr that are reversible. I think this is likely a more idiomatic way to go about it, but I guess it would cause some downstream complexity by requiring types like SequenceExpr to be generic over whether all the contained expressions support ExprRev or not. Because of this, I'm wondering if there's a simpler alternative that fulfills the same need, but it's hard to say since I'm not too experienced with actually writing linters myself.
Parts-of-speech of words are often ambiguous and you have to look at surrounding words to figure out which one applies in context. Normally we try to make the I wrote a Rust linter just in the past week that does some manual walking backward through the "before" context in
I think that's the PR where
I got in trouble for using
That's exactly the way I tried to implement it first. As I built it out and its tendrils spread, they hit something they tangled with in ugly ways that I no longer remember, but doing so helped me understand the code better to trim it way back and find which parts were essential. One problem was the interplay between backwards and forwards elements, which there shouldn't be a need for. The one I'm still not sure about but I think this implementation might be able to handle is when a I appreciate the feedback! |
About half a year ago, I tried to implement (but never finished) some form of lookahead/lookbehind by implementing functions like SequenceExpr::default()
.then(...) // Isn't captured in the match, but must precede for the match to succeed.
.start_capture()
.then(...) // What we actually want to have contained in the match.
.end_capture()
.then(...) // Isn't captured in the match, but must follow for the match to succeed.(If you're familiar with vim regex, the idea is basically identical to Do you think something like that would work well for the use cases you have in mind? If so, I could try taking another crack at it. If it ends up working, I feel it could help avoid some of the complexity with reverse matching.
The only other one that I can think of off the top of my head is |
Could do. I also tried to add some kind of capture groups last year and tried to add anchors before we got the ones we have now - which I still can't figure out how to use. I gave up both because there was too much about the overall architecture I didn't understand well enough, and probably too much Rust I hadn't yet learned either. I do keep thinking though about taking another crack at making an alternative to Another thing I think is missing is the ability to skip over things that are not matched. When you identify a pattern as a thing you don't want but know how many tokens it is only for it to get matched again a token or two later as a shorter sequence. This may already be doable. Unfortunately I also don't remember the concrete case by I know it's come up at least twice.
Interesting! That one shouldn't care about direction but maybe it does... |
|
@86xsk I just ran into a case where this would be useful. So hopefully I can make a clear case for it. I just read "except of" in a GitHub repo and wondered if it's common. It is. So I tried to think of edge cases that would prevent a naive "except of" -> "except for" Weir rule from being sufficient. I thought people might also use "except" instead of "exception" and indeed they do. Various such can include "except of": (Not to mention "except of course", but the after-context is much easier to check.) To check manually and perfectly:
Or a reverse Or if it had to be expressed backwards in full something like Still not a perfect example as it's two different mistakes clashing, rather than a mistake with edge cases that are not mistakes. But hopefully illustrative nonetheless. |
…into reverse-expr
Issues
N/A
Description
It would make things easier for more complex/advanced linters that need to check the context inside
match_to_lint_with_contextif there was a reverse equivalent ofExprso we can check the previous toke, then the one before that, etc.This PR provides that in what I think is a minimal way by introducing
_revversions of a few trait methods such asrun_rev()forExprandstep_rev()forStep.I also noticed that the default implementation of
runforExprhad a conditional check thatstepreturned a positive value but also had code for what to do with a negative value, but as far as I can tell a negative value there should be impossible. I verified this by running all the tests with asserts there and they never got triggered.I removed the code for the false paths in here but I'm not sure that's the right thing to do, or that the old code was the right thing to do either. It should be a sanity check but I believe we're not supposed to output to stderr or crash. What's the right thing to do here?
Somewhat similarly, in all the
Patterns andExprs that do not yet support a_revversion I currently do output a message with🛑to stderr and returnNone. Ideally I was thinking a way for each one to share whether or not they support reverse usage might be needed, but maybe just returningNoneis correct. It's only going to be hit with a contributor is working on a new reverseExpr.The only
Exprwhich support reverse mode so far isSequenceExpr, which I have a test for. There was a choice about whether the code would express reverse sequences "forward" or "backward" since they'll be evaluated backward but might be easier to reason about forward like otherSequenceExprthat's what I went with. But maybe I chose wrong or maybe both are needed.Please critique and give feedback!
How Has This Been Tested?
I introduced tests for both the forward and backward versions.
Checklist