Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.
Updated Apr 27, 2026 - Rust
Use LLMs to robustly extract web data
Fully automated and hands-free, accurately extracting and understanding web content — powered by machine learning agents.
Replayable Browser Agent
Low-Cost Cross-Domain Web Structured Information Extraction using specialized LoRA adapters.
A distributed topical web crawler built on Scala and Akka
Automatic extraction of local event information from a webpage using machine learning
Predicting product recommendation scores from the data available on the client's website
A powerful and lightweight web scraping library with LLM extraction capabilities. This library combines web scraping with AI-powered content extraction using either OpenAI or OpenRouter APIs.
Programming assignments for Web Information Extraction and Retrieval, FRI UL, 2021. PA1: a standalone web crawler for .gov.si websites; PA2: approaches to structured web data extraction; PA3: data processing, indexing, and retrieval.
Structured web-extraction tool plugin with schema, provenance, and drift awareness.
Pinterest data extraction toolkit
This project is a command-line tool that extracts text from web pages and PDF files, including scanned documents. It supports various extraction methods. This tool is ideal for data scraping, NLP preprocessing, and content analysis.
Local-first search tool layer for AI agents, built with FastAPI, SearXNG, and Trafilatura.
MarkGrab plugin for Claude Code — web content extraction to LLM-ready markdown
Glasses Web Reader for Even Realities G2 — three-layer browser (sources → articles → reader) using r.jina.ai for clean URL extraction.
Real-time Google search results