Feature request: Corpus mode for harper-cli #2629

@hippietrail

I'm thinking of something like this:

You create one or more text files in which each line holds a sentence to lint, prefixed with a marker that indicates whether it should pass or fail:

✅ This sentence is perfectly grammatical.
❌ This sentance has one or more problems.
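
To make the format concrete, here is a rough sketch of how such a line could be parsed. It's in Python only for brevity. The names `CorpusEntry` and `parse_corpus_line` are made up for illustration and are not part of harper-cli, and skipping unmarked lines is just one possible choice.

```python
from typing import NamedTuple, Optional

PASS_MARK = "✅"  # sentence is expected to produce no lints
FAIL_MARK = "❌"  # sentence is expected to produce at least one lint


class CorpusEntry(NamedTuple):
    """Hypothetical record for one corpus line."""
    should_pass: bool
    sentence: str


def parse_corpus_line(line: str) -> Optional[CorpusEntry]:
    """Split a corpus line into its expectation and its sentence.

    Returns None for blank lines and for lines without a marker.
    """
    text = line.strip()
    for mark, should_pass in ((PASS_MARK, True), (FAIL_MARK, False)):
        if text.startswith(mark):
            return CorpusEntry(should_pass, text[len(mark):].strip())
    return None
```

For example, `parse_corpus_line("❌ This sentance has one or more problems.")` gives an entry with `should_pass` set to `False` and the sentence with the marker stripped.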

The corpus mode would then be able to detect both false positives and false negatives, as well as linter regressions.

If it finds lints for a sentence marked ✅, or finds no lints for a sentence marked ❌, it could output [FAIL] in red; otherwise it could output [PASS] in green.
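
The pass/fail rule could look something like the sketch below, again in Python for brevity. It assumes the linter's result is already reduced to a lint count; `verdict` and `format_result` are hypothetical names, the "false positive"/"false negative" suffix is an extra not requested above, and the colours are plain ANSI escape codes.

```python
GREEN = "\033[32m"
RED = "\033[31m"
RESET = "\033[0m"


def verdict(should_pass: bool, lint_count: int) -> str:
    """Classify one corpus sentence against what the linter reported."""
    has_lints = lint_count > 0
    if should_pass and has_lints:
        return "false positive"   # marked ✅ but lints were found
    if not should_pass and not has_lints:
        return "false negative"   # marked ❌ but nothing was found
    return "ok"


def format_result(should_pass: bool, lint_count: int, sentence: str,
                  color: bool = True) -> str:
    """Build the [PASS]/[FAIL] output line for one sentence."""
    outcome = verdict(should_pass, lint_count)
    ok = outcome == "ok"
    label = "[PASS]" if ok else "[FAIL]"
    if color:
        label = (GREEN if ok else RED) + label + RESET
    suffix = "" if ok else f" ({outcome})"
    return f"{label} {sentence}{suffix}"
```

Comparing these verdicts between two runs of the same corpus would be one way to surface regressions.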

Another simple variant would include a specific linter by name:
SpellCheck This sentence is perfectly grammatical.
SpellCheck This sentance has one or more problems.
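
A possible parser for this variant, sketched in Python. The two example lines above show no ✅/❌ marker, so this sketch treats the marker as optional and takes the first whitespace-separated token as the linter name. Both of those are assumptions, and `LinterEntry` and `parse_linter_line` are hypothetical names.

```python
from typing import NamedTuple, Optional

MARKS = {"✅": True, "❌": False}


class LinterEntry(NamedTuple):
    """Hypothetical record for a corpus line that names a linter."""
    should_pass: Optional[bool]  # None when the line carries no marker
    linter: str
    sentence: str


def parse_linter_line(line: str) -> Optional[LinterEntry]:
    """Parse '[marker] LinterName sentence...' into its parts."""
    text = line.strip()
    should_pass = None
    for mark, expected in MARKS.items():
        if text.startswith(mark):
            should_pass = expected
            text = text[len(mark):].lstrip()
            break
    parts = text.split(None, 1)  # linter name, then the rest of the line
    if len(parts) != 2:
        return None
    linter, sentence = parts
    return LinterEntry(should_pass, linter, sentence.strip())
```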

This would be especially useful for linters whose edge cases are very difficult to catch, or even to foresee before you start working. You would write a bunch of sentences you think should pass and a bunch you think should fail, then see whether your new linter agrees. Sometimes it will reveal a bug in your linter; sometimes it will reveal an unexpected edge case, such as a verb that also has a lesser-known noun sense.

We could then start adding more features on top of this as we think of them.

We could also rewrite most of the linters' test sections to use this, or have those tests and harper-cli lint share a common back end.
