Add option to produce report of skipped pages #965

@tw4l

Description

A somewhat common reporting/QA requirement we've seen is the ability to list URLs that were not crawled, either because they were out of scope or because they were not queued for other reasons, such as a limit being hit.

To meet this requirement, we should add an option to generate a page list, similar to extraPages.jsonl, for URLs that were not queued.

We may also want to log URLs that were queued but never crawled when the crawler stops gracefully (e.g. due to hitting a limit).
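
To make the request concrete, here is a rough sketch of what one line in such a report could look like. Everything in it is an assumption for discussion: the `SkippedPage` shape, the `reason` values, and the helper names are hypothetical and not part of the crawler today. It only shows the idea of writing one JSON object per line, with a reason attached to each URL.

```typescript
// Hypothetical reasons a URL might not have been crawled (names are assumptions).
type SkipReason = "outOfScope" | "limitReached" | "queuedNotCrawled";

// Hypothetical record shape for one skipped URL.
interface SkippedPage {
  url: string;
  reason: SkipReason;
  seedUrl?: string; // seed the URL was discovered from, if known
  ts: Date; // when the URL was skipped
}

// Serialize one record as a single JSONL line (one JSON object, no newline).
function formatSkippedPage(page: SkippedPage): string {
  return JSON.stringify({
    url: page.url,
    reason: page.reason,
    seedUrl: page.seedUrl, // omitted from the output when undefined
    ts: page.ts.toISOString(),
  });
}

// Join records into JSONL text: one line per record, trailing newline.
function formatSkippedPages(pages: SkippedPage[]): string {
  if (pages.length === 0) {
    return "";
  }
  return pages.map(formatSkippedPage).join("\n") + "\n";
}
```

Keeping the reason as a separate field would let QA tooling filter the report, for example to separate out-of-scope URLs from URLs dropped because a limit was hit.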

Projects

Status

Done!
