Feature request: Corpus mode for harper-cli #2629
Description
I'm thinking of something like this:
You create one or more text files in which each line is a sentence to lint, prefixed with a marker that indicates whether it should pass or fail:
✅ This sentence is perfectly grammatical.
❌ This sentance has one or more problems.
The corpus mode would then be able to detect both false positives and false negatives, as well as linter regressions.
If it finds lints for a sentence marked ✅, or finds no lints for a sentence marked ❌, it could output [FAIL] in red; otherwise it would output [PASS] in green.
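To make the idea concrete, here is a rough sketch of how a corpus line could be parsed and judged. Nothing here is existing harper-cli code: `Expectation`, `parse_corpus_line`, and `verdict` are hypothetical names, and the lint count is assumed to come from whatever the linter returns for the sentence.

```rust
/// Expected outcome declared by the marker at the start of a corpus line.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Expectation {
    /// Line starts with ✅: the linter should report nothing.
    Clean,
    /// Line starts with ❌: the linter should report at least one lint.
    Flagged,
}

/// Splits a corpus line into its expectation and the sentence to lint.
/// Returns `None` for blank lines or lines without a recognized marker.
pub fn parse_corpus_line(line: &str) -> Option<(Expectation, &str)> {
    let line = line.trim();
    let (expectation, rest) = if let Some(rest) = line.strip_prefix('✅') {
        (Expectation::Clean, rest)
    } else if let Some(rest) = line.strip_prefix('❌') {
        (Expectation::Flagged, rest)
    } else {
        return None;
    };
    // Some editors append an emoji variation selector; skip it if present.
    let sentence = rest.trim_start_matches('\u{FE0F}').trim_start();
    Some((expectation, sentence))
}

/// A line passes when the presence or absence of lints matches the marker.
/// The result is wrapped in ANSI color codes: green for pass, red for fail.
pub fn verdict(expectation: Expectation, lint_count: usize) -> &'static str {
    let passed = match expectation {
        Expectation::Clean => lint_count == 0,
        Expectation::Flagged => lint_count > 0,
    };
    if passed {
        "\x1b[32m[PASS]\x1b[0m"
    } else {
        "\x1b[31m[FAIL]\x1b[0m"
    }
}
```

A ✅ line with any lints is a false positive, and a ❌ line with zero lints is a false negative; both come out as [FAIL].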
Another simple variant would include a specific linter by name:
✅ SpellCheck This sentence is perfectly grammatical.
❌ SpellCheck This sentance has one or more problems.
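One open question with this variant is how to tell a linter name apart from the first word of the sentence. A possible approach, sketched below, is to treat the first token as a linter name only if it appears in a list of known names. `split_linter_name` and `known_linters` are hypothetical; where that list would come from is left open.

```rust
/// Splits the text that follows the ✅/❌ marker into an optional linter name
/// and the sentence itself. The first token counts as a linter name only if
/// it appears in `known_linters` (an assumed, caller-supplied list).
pub fn split_linter_name<'a>(
    rest: &'a str,
    known_linters: &[&str],
) -> (Option<&'a str>, &'a str) {
    let rest = rest.trim_start();
    match rest.split_once(char::is_whitespace) {
        Some((first, sentence)) if known_linters.iter().any(|name| *name == first) => {
            (Some(first), sentence.trim_start())
        }
        // No whitespace, or the first token is not a known linter:
        // the whole text is the sentence.
        _ => (None, rest),
    }
}
```

With this rule, a line without a linter name falls back to running every linter, so both variants could live in the same file.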
This would be especially useful for linters whose edge cases are very difficult to catch, or even to foresee before you start working. You would write a batch of sentences you think should pass and a batch you think should fail, then see whether your new linter agrees. Sometimes this will reveal a bug in your linter; sometimes it will reveal an unexpected edge case, such as a verb that also has a lesser-known noun sense.
We could then start adding more features on top of this as we think of them.
We could also replace most of the test sections of linters with this, or have those tests and harper-cli lint share a common back end.