
Examples

First, `git clone https://github.com/spider-rs/spider.git` and `cd spider`. Use the `--release` flag for the best performance when running the examples below. For the chrome examples, it is recommended to use the headless-browser project for web crawling and scraping via a headless Docker container, or to launch your own local chrome-headless-shell.

Basic

Simple concurrent crawl Simple.

  • cargo run --example example

Subscribe to realtime changes Subscribe.

  • cargo run --example subscribe

Subscribe to realtime changes using multiple subscriptions Subscribe Multiple.

  • cargo run --example subscribe_multiple

Live handle index mutation example Callback.

  • cargo run --example callback

Enable log output Debug.

  • cargo run --example debug

Scrape the webpage and gather the HTML Scrape.

  • cargo run --example scrape

Scrape and download the HTML file to the filesystem Download HTML. *Note: Enable the feature flag [full_resources] to gather all files such as css, js, etc.

  • cargo run --example download

Scrape and download HTML as React components and store them to the filesystem Download to React Component.

  • cargo run --example download_to_react

Crawl the page and output the links via Serde.

  • cargo run --example serde --features serde

Crawl links with a budget limiting the number of pages allowed Budget.

  • cargo run --example budget

Crawl links at a given cron time Cron.

  • cargo run --example cron

Crawl links with chrome headless rendering Chrome.

  • cargo run --example chrome --features chrome

Crawl links with chrome headed rendering Chrome.

  • cargo run --example chrome --features chrome_headed

Crawl links with chrome headless rendering remote connections Chrome.

  • cargo run --example chrome_remote --features chrome

Crawl links with view port configuration Chrome Viewport.

  • cargo run --example chrome_viewport --features chrome

Take a screenshot of a page during crawl Chrome Screenshot.

  • cargo run --example chrome_screenshot --features="spider/sync spider/chrome spider/chrome_store_page"

Crawl links with smart mode detection. Runs HTTP by default until Chrome Rendering is needed. Smart.

  • cargo run --example smart --features smart

Use different encodings for the page. Encoding.

  • cargo run --example encoding --features encoding

Use advanced configuration re-use. Advanced Configuration.

  • cargo run --example advanced_configuration

Use chrome hybrid caching. Chrome Cache Hybrid.

  • cargo run --example cache_chrome_hybrid --features="spider/sync spider/chrome spider/cache_chrome_hybrid"

End-to-end remote cache warm + skip-browser return path using HYBRID_CACHE_ENDPOINT (index_cache_server / hybrid_cache_server). Remote Cache Skip Browser.

  • HYBRID_CACHE_ENDPOINT=http://127.0.0.1:8080 cargo run --example cache_remote_skip_browser --features="spider/sync spider/chrome spider/chrome_remote_cache" -- https://example.com/

Use URL globbing for a domain. URL Globbing.

  • cargo run --example glob --features glob

Use URL globbing for a domain and subdomains. URL Globbing Subdomains.

  • cargo run --example url_glob_subdomains --features glob

Downloading files in a subscription. Subscribe Download.

  • cargo run --example subscribe_download

Add links to gather mid crawl. Queue.

  • cargo run --example queue

Use OpenAI to get custom JavaScript to run in a browser. OpenAI. Make sure to set OPENAI_API_KEY=$MY_KEY as an environment variable or pass it in before the script.

  • cargo run --example openai

or

  • OPENAI_API_KEY=replace_me_with_key cargo run --example openai

or setting multiple actions to drive the browser

  • OPENAI_API_KEY=replace_me_with_key cargo run --example openai_multi

or to get custom data from the GPT with JS scripts if needed.

  • OPENAI_API_KEY=replace_me_with_key cargo run --example openai_extra

Remote Multimodal (OpenRouter / Vision+Text)

Single page extraction from a book details page Remote Multimodal Scrape.

  • OPEN_ROUTER=replace_me_with_key cargo run --example remote_multimodal_scrape --features "spider/sync spider/chrome spider/agent_chrome"

Multi-page extraction by crawling from a category page Remote Multimodal Multi.

  • OPEN_ROUTER=replace_me_with_key cargo run --example remote_multimodal_multi --features "spider/sync spider/chrome spider/agent_chrome"

Dual-model routing (vision + text model split) Remote Multimodal Dual.

  • OPEN_ROUTER=replace_me_with_key cargo run --example remote_multimodal_dual --features "spider/sync spider/chrome spider/agent_chrome"

Dual-model multi-round automation Remote Multimodal Dual Automation.

  • OPEN_ROUTER=replace_me_with_key cargo run --example remote_multimodal_dual_automation --features "spider/sync spider/chrome spider/agent_chrome"

Quote extraction with structured JSON output Remote Multimodal Quotes.

  • OPEN_ROUTER=replace_me_with_key cargo run --example remote_multimodal_quotes --features "spider/sync spider/chrome spider/agent_chrome"

Listing page product extraction Remote Multimodal Listing.

  • OPEN_ROUTER=replace_me_with_key cargo run --example remote_multimodal_listing --features "spider/sync spider/chrome spider/agent_chrome"