A somewhat common reporting/QA requirement we've seen is the ability to list URLs that were not crawled, either because they were out of scope or because they were not queued for another reason, such as a limit being hit.
In order to meet this requirement, we should have an option to generate a page list like extraPages.jsonl for URLs that were not queued.
We may also want to log URLs that were queued but never crawled when the crawler stops gracefully (e.g. due to hitting a limit).
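As a starting point for discussion, here is a minimal sketch of what collecting these URLs and rendering them as JSONL could look like. Everything in it is an assumption for illustration: the `SkippedPageLog` class, the `SkipReason` categories, and the record fields are hypothetical and do not reflect the crawler's existing code or the actual format of extraPages.jsonl.

```typescript
// Hypothetical sketch: none of these names come from the crawler's codebase.

// Assumed categories for why a URL was not crawled.
type SkipReason = "outOfScope" | "limitHit" | "queuedNotCrawled";

// Assumed shape of one record; the real page list format may differ.
interface SkippedPage {
  url: string;
  reason: SkipReason;
  ts: string; // ISO 8601 timestamp of when the URL was skipped
}

// Collects skipped URLs in memory, de-duplicated by URL.
class SkippedPageLog {
  private entries = new Map<string, SkippedPage>();

  add(
    url: string,
    reason: SkipReason,
    ts: string = new Date().toISOString(),
  ): void {
    // Keep only the first reason recorded for a given URL.
    if (!this.entries.has(url)) {
      this.entries.set(url, { url, reason, ts });
    }
  }

  // Render all records as JSONL: one JSON object per line.
  toJsonl(): string {
    return Array.from(this.entries.values())
      .map((entry) => JSON.stringify(entry) + "\n")
      .join("");
  }
}
```

Recording a reason per URL would let a QA report separate out-of-scope links from URLs dropped because of a limit, and the `queuedNotCrawled` value would cover the graceful-stop case mentioned above.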