Problem Description
The crawler uses sitemaps to seed URLs for a crawl, but the content of the sitemap files themselves is never ingested. Some users may want to ingest data from them (for example, metadata).
Proposed Solution
- Add a config option that enables ingesting sitemap content
- If the option is enabled, ingest the content of each sitemap after using it to seed more URLs
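
The two steps above could look roughly like the following. This is only an illustrative sketch in Python, not the crawler's actual code: the option name `ingest_sitemap_content`, the `CrawlConfig` and `SitemapResult` types, and the `process_sitemap` function are all hypothetical names chosen for this example. The sketch assumes the option defaults to off so that existing behavior is unchanged.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


@dataclass
class CrawlConfig:
    # Hypothetical option name; off by default to preserve current behavior.
    ingest_sitemap_content: bool = False


@dataclass
class SitemapResult:
    seed_urls: list
    documents: list = field(default_factory=list)


def process_sitemap(url, body, config):
    """Seed URLs from a sitemap and, if enabled, queue the sitemap for ingestion."""
    root = ET.fromstring(body)

    # Step 1 (existing behavior): collect every <loc> entry as a seed URL.
    seeds = [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
    result = SitemapResult(seed_urls=seeds)

    # Step 2 (proposed): only after seeding, hand the sitemap itself to ingestion.
    if config.ingest_sitemap_content:
        result.documents.append({"url": url, "body": body})

    return result
```

With the option left at its default, `process_sitemap` returns only seed URLs; with `CrawlConfig(ingest_sitemap_content=True)`, the sitemap's URL and raw body are also returned as a document for the ingestion step to handle.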