Feature request: Corpus mode for `harper-cli`

I'm thinking of something like this:

You create one or more text files in which each line is a sentence to lint, prefixed with a marker that indicates whether it should pass or fail:

✅ This sentence is perfectly grammatical.
❌ This sentance has one or more problems.
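
A minimal Rust sketch of reading such a file, assuming each entry starts with ✅ or ❌ followed by the sentence, and that blank or unmarked lines are skipped. The names `Expectation`, `parse_line`, and `parse_corpus` are hypothetical and not part of `harper-cli`.

```rust
/// Whether a corpus sentence should pass (no lints) or fail (at least one lint).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Expectation {
    Pass,
    Fail,
}

/// Splits a line such as "✅ Some sentence." into its marker and sentence.
/// Returns `None` for blank lines or lines without a recognized marker.
pub fn parse_line(line: &str) -> Option<(Expectation, &str)> {
    let line = line.trim();
    let (expectation, rest) = if let Some(rest) = line.strip_prefix('✅') {
        (Expectation::Pass, rest)
    } else if let Some(rest) = line.strip_prefix('❌') {
        (Expectation::Fail, rest)
    } else {
        return None;
    };
    // Some editors append an emoji variation selector (U+FE0F) after the marker.
    let sentence = rest.trim_start_matches('\u{FE0F}').trim_start();
    if sentence.is_empty() {
        None
    } else {
        Some((expectation, sentence))
    }
}

/// Parses a whole corpus file, keeping each entry's 1-based line number for reports.
pub fn parse_corpus(text: &str) -> Vec<(usize, Expectation, &str)> {
    text.lines()
        .enumerate()
        .filter_map(|(i, line)| parse_line(line).map(|(e, s)| (i + 1, e, s)))
        .collect()
}

fn main() {
    let corpus = "✅ This sentence is perfectly grammatical.\n❌ This sentance has one or more problems.\n";
    for (line_no, expectation, sentence) in parse_corpus(corpus) {
        println!("{line_no}: {expectation:?} {sentence}");
    }
}
```

Keeping the line number lets a failure report point straight back to the offending entry in the corpus file.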

The corpus mode would then be able to detect both false positives and false negatives, as well as linter regressions.

If it finds lints for a sentence marked ✅, or finds no lints for a sentence marked ❌, it could output [FAIL] in red; otherwise it could output [PASS] in green.
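
The pass/fail rule reduces to comparing the expectation with whether any lints were found. The sketch below also separates the two kinds of failure (false positive vs. false negative). It takes the lint count as a plain number instead of calling Harper, since the exact `harper-core` API is not assumed here; `Outcome`, `check`, and `render` are hypothetical names, and the colors are standard ANSI escape codes.

```rust
/// Result of checking one corpus sentence.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    /// The linter agreed with the marker.
    Pass,
    /// Marked ✅, but the linter reported lints.
    FalsePositive,
    /// Marked ❌, but the linter reported nothing.
    FalseNegative,
}

/// `expect_lints` is true for ❌ entries and false for ✅ entries.
pub fn check(expect_lints: bool, lint_count: usize) -> Outcome {
    match (expect_lints, lint_count > 0) {
        (false, true) => Outcome::FalsePositive,
        (true, false) => Outcome::FalseNegative,
        _ => Outcome::Pass,
    }
}

/// Formats a result line with ANSI colors: green for [PASS], red for [FAIL].
pub fn render(outcome: Outcome, sentence: &str) -> String {
    const GREEN: &str = "\x1b[32m";
    const RED: &str = "\x1b[31m";
    const RESET: &str = "\x1b[0m";
    match outcome {
        Outcome::Pass => format!("{GREEN}[PASS]{RESET} {sentence}"),
        Outcome::FalsePositive => format!("{RED}[FAIL]{RESET} unexpected lint(s): {sentence}"),
        Outcome::FalseNegative => format!("{RED}[FAIL]{RESET} no lint found: {sentence}"),
    }
}

fn main() {
    // (expects lints?, lint count, sentence). The counts are hard-coded stand-ins;
    // a real corpus mode would obtain them by running Harper on each sentence.
    let results = [
        (false, 0, "This sentence is perfectly grammatical."),
        (true, 1, "This sentance has one or more problems."),
        (true, 0, "A sentence the linter should have flagged."),
    ];
    let mut failures = 0;
    for (expect_lints, lint_count, sentence) in results {
        let outcome = check(expect_lints, lint_count);
        if outcome != Outcome::Pass {
            failures += 1;
        }
        println!("{}", render(outcome, sentence));
    }
    println!("{failures} failure(s)");
}
```

Reporting the failure kind, not just [FAIL], makes it easier to tell whether a linter has become too eager or too lax after a change.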

Another simple variant would include a specific linter by name:

✅ `SpellCheck` This sentence is perfectly grammatical.
❌ `SpellCheck` This sentance has one or more problems.
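
Parsing the optional linter name could look like the sketch below, assuming the name sits in backticks directly after the marker, as in the examples above. How each lint reports the linter that produced it is not assumed; `lint_sources` is a hypothetical stand-in holding one linter name per lint found, and `CorpusEntry`, `parse_entry`, and `relevant_lint_count` are likewise hypothetical.

```rust
/// One corpus line, with an optional linter restriction.
#[derive(Debug, PartialEq, Eq)]
pub struct CorpusEntry<'a> {
    /// True for ❌ entries, false for ✅ entries.
    pub expect_lints: bool,
    /// Linter named in backticks, if any (e.g. `SpellCheck`).
    pub linter: Option<&'a str>,
    pub sentence: &'a str,
}

pub fn parse_entry(line: &str) -> Option<CorpusEntry<'_>> {
    let line = line.trim();
    let (expect_lints, rest) = if let Some(r) = line.strip_prefix('✅') {
        (false, r)
    } else if let Some(r) = line.strip_prefix('❌') {
        (true, r)
    } else {
        return None;
    };
    let rest = rest.trim_start_matches('\u{FE0F}').trim_start();
    // An optional `LinterName` may follow the marker; an unclosed backtick is rejected.
    let (linter, sentence) = match rest.strip_prefix('`') {
        Some(after_tick) => {
            let end = after_tick.find('`')?;
            (Some(&after_tick[..end]), after_tick[end + 1..].trim_start())
        }
        None => (None, rest),
    };
    if sentence.is_empty() {
        return None;
    }
    Some(CorpusEntry { expect_lints, linter, sentence })
}

/// Counts only the lints that matter for this entry: all of them when no linter
/// is named, otherwise just those whose source matches the named linter.
pub fn relevant_lint_count(entry: &CorpusEntry<'_>, lint_sources: &[&str]) -> usize {
    match entry.linter {
        Some(name) => lint_sources.iter().filter(|&&s| s == name).count(),
        None => lint_sources.len(),
    }
}

fn main() {
    let entry = parse_entry("❌ `SpellCheck` This sentance has one or more problems.").unwrap();
    // Hypothetical lint sources reported for this sentence.
    let lint_sources = ["SpellCheck", "OtherRule"];
    println!("{entry:?} -> {} relevant lint(s)", relevant_lint_count(&entry, &lint_sources));
}
```

With a linter named, lints from unrelated rules no longer cause false failures, so a corpus written for one linter stays stable as others are added.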

This would be especially useful for linters whose edge cases are very difficult to catch, or even to foresee before you start working. You would write a batch of sentences you think should pass and a batch you think should fail, then see whether your new linter agrees. Sometimes this will reveal a bug in your linter; sometimes it will reveal an unexpected edge case, such as a verb that also has a lesser-known noun sense.

We could then start adding more features on top of this as we think of them.

We could also convert most of the linters' test sections to use this, or have those tests and `harper-cli lint` share a common back end.

Feature request: Corpus mode for `harper-cli` #2629

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Feature request: Corpus mode for harper-cli #2629

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Feature request: Corpus mode for `harper-cli` #2629