Wayback Machine Archive: Tools for checking, submitting, listing, screenshotting, and cache management for archived URLs
Version: v0.10.0
Install: /plugin install wayback@mearman
Manage the OS tmpdir-based cache for Wayback Machine API responses.
npx tsx scripts/cache.ts <command> [options]

| Command | Description |
|---|---|
| `clear` | Clear all cached Wayback data |
| `status` | Show cache directory location and file count |

| Option | Description |
|---|---|
| `--no-cache` | Bypass cache for single operation |
Cached responses are stored in the OS temporary directory:
os.tmpdir()/wayback-cache/
Cache keys are generated from URLs and parameters using SHA-256 hashing.
| Operation | TTL | Rationale |
|---|---|---|
| Availability API | 24 hours | Snapshots don't change often |
| CDX API | 1 hour | Snapshot list can change |
| Save status | 30 seconds | Only during polling |
Cached entries expire automatically and are deleted on access.
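The expiry rule can be sketched as follows. This is a minimal illustration rather than the plugin's actual code: the `CacheEntry` shape, the `TTL_MS` map, and the `isExpired` and `readFresh` helpers are hypothetical names, and only the TTL values come from the table above.

```typescript
// Hypothetical sketch of TTL-based expiry; names are illustrative, not the plugin's API.
type Operation = "availability" | "cdx" | "saveStatus";

// TTL values taken from the table above, in milliseconds.
const TTL_MS: Record<Operation, number> = {
  availability: 24 * 60 * 60 * 1000,
  cdx: 60 * 60 * 1000,
  saveStatus: 30 * 1000,
};

interface CacheEntry<T> {
  storedAt: number; // epoch milliseconds when the response was cached
  operation: Operation;
  data: T;
}

// An entry is stale once its age reaches the TTL for its operation.
function isExpired(entry: CacheEntry<unknown>, now: number = Date.now()): boolean {
  return now - entry.storedAt >= TTL_MS[entry.operation];
}

// Return cached data while it is fresh; report a miss otherwise so the
// caller can delete the file and refetch.
function readFresh<T>(entry: CacheEntry<T>, now: number = Date.now()): T | undefined {
  return isExpired(entry, now) ? undefined : entry.data;
}
```

Passing `now` explicitly keeps the check deterministic, which makes the expiry logic easy to exercise without waiting for real time to pass.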
npx tsx scripts/cache.ts <command> [options]

Commands:
- `clear` - Clear all cached Wayback data
- `status` - Show cache directory location and file count
Remove all cached API responses:
npx tsx scripts/cache.ts clear

This deletes all .json cache files from the cache directory.
Display cache information:
npx tsx scripts/cache.ts status

Shows:
- Cache directory path
- Number of cached files
- Total cache size (if available)
# Clear all cache before checking a URL
npx tsx scripts/cache.ts clear
npx tsx scripts/check.ts https://example.com
# Clear cache, then list snapshots
npx tsx scripts/cache.ts clear
npx tsx scripts/list.ts https://example.com 20
# Check cache status
npx tsx scripts/cache.ts status

Individual scripts support --no-cache to skip the cache for one operation without clearing all cached data:
npx tsx scripts/check.ts https://example.com --no-cache
npx tsx scripts/list.ts https://example.com --no-cache
npx tsx scripts/screenshot.ts https://example.com --no-cache

The --no-cache flag bypasses reading from the cache but still caches the fresh response for future requests.
Cache keys are 16-character hexadecimal strings:
a1b2c3d4e5f6a7b8.json
Each key represents a unique URL + parameter combination.
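One way such a key could be derived is sketched below. It assumes the key is the first 16 hex characters of a SHA-256 digest and that parameters are sorted by name before hashing; both are assumptions for illustration, and `cacheKey` is a hypothetical helper, not the plugin's function.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a 16-character hex cache key from a URL and its parameters.
// Sorting the parameter names is an assumption so that equivalent requests hash identically.
function cacheKey(url: string, params: Record<string, string> = {}): string {
  const sorted = Object.keys(params)
    .sort()
    .map((name) => `${name}=${params[name]}`)
    .join("&");
  const digest = createHash("sha256").update(`${url}?${sorted}`).digest("hex");
  return digest.slice(0, 16); // truncate the 64-character digest to 16 characters
}

// File name under which the response would be stored.
const fileName = `${cacheKey("https://example.com", { limit: "20" })}.json`;
```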
View cache directory contents:
# On macOS
ls -la "$(getconf DARWIN_USER_TEMP_DIR)/wayback-cache/"
# On Linux (when the temp directory is /tmp)
ls -la /tmp/wayback-cache/
# View individual cache file
cat /tmp/wayback-cache/a1b2c3d4e5f6a7b8.json | jq .

- Use `wayback-check` to verify if a URL is archived
- Use `wayback-list` to see all captures with filtering options
- Use `wayback-screenshot` to retrieve visual screenshots
- Use `wayback-submit` to create a new archive
Check if a URL has been archived by the Internet Archive's Wayback Machine.
npx tsx scripts/check.ts <url> [options]

| Argument | Required | Description |
|---|---|---|
| `url` | Yes | The URL to check |

| Option | Description |
|---|---|
| `--no-raw` | Include Wayback toolbar in archived URL |
| `--timestamp=DATE` | Find snapshot closest to date (YYYYMMDD or YYYYMMDDhhmmss) |
| `--no-cache` | Bypass cache and fetch fresh data from API |
When archived:
✓ Archived
Timestamp: January 1, 2024 (3 days ago)
URL: https://web.archive.org/web/20240101120000id_/https://example.com
When not archived:
✗ Not archived
Consider using wayback-submit to archive this URL.
npx tsx scripts/check.ts <url> [options]

Options:
- `--no-raw` - Include Wayback toolbar in archived URL
- `--timestamp=DATE` - Find snapshot closest to date (YYYYMMDD or YYYYMMDDhhmmss)
- `--no-cache` - Bypass cache and fetch fresh data from API
Run from the wayback plugin directory: ~/.claude/plugins/cache/wayback/
https://archive.org/wayback/available?url={URL}
| Parameter | Required | Description |
|---|---|---|
| `url` | Yes | The URL to check |
| `timestamp` | No | Find snapshot closest to this date (YYYYMMDDhhmmss, partial dates OK) |
| `callback` | No | JSONP callback for cross-domain requests |
Latest snapshot:
https://archive.org/wayback/available?url=https://example.com
Snapshot closest to a specific date:
https://archive.org/wayback/available?url=https://example.com&timestamp=20200101
Archived - Response contains archived_snapshots.closest with:
- `available: true`
- `url`: The archived URL (format: `https://web.archive.org/web/{timestamp}/{original_url}`)
- `timestamp`: Archive timestamp (YYYYMMDDhhmmss format)
- `status`: HTTP status code
Not Archived - Response has empty archived_snapshots object.
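A small sketch of how the two response shapes could be told apart. The interfaces cover only the fields described above. `closestSnapshot` is a hypothetical helper, and typing `status` as a string is an assumption.

```typescript
// Fields described above; anything else in the response is ignored.
interface ClosestSnapshot {
  available: boolean;
  url: string;
  timestamp: string; // YYYYMMDDhhmmss
  status: string; // HTTP status code, assumed to arrive as a string such as "200"
}

interface AvailabilityResponse {
  archived_snapshots: { closest?: ClosestSnapshot };
}

// Hypothetical helper: return the closest snapshot, or null when the URL is not archived.
function closestSnapshot(response: AvailabilityResponse): ClosestSnapshot | null {
  const closest = response.archived_snapshots.closest;
  return closest && closest.available ? closest : null;
}
```

A caller would typically pass the parsed JSON body of the availability request and treat `null` as the "Not archived" case.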
By default, append id_ after the timestamp in URLs to get raw content without the Wayback toolbar:
- With toolbar: https://web.archive.org/web/20240101120000/https://example.com
- Raw (no toolbar): https://web.archive.org/web/20240101120000id_/https://example.com
| Modifier | URL Pattern | Description |
|---|---|---|
| (none) | `/web/{ts}/` | Page with Wayback toolbar |
| `id_` | `/web/{ts}id_/` | Raw page content (no toolbar) |
| `im_` | `/web/{ts}im_/` | Screenshot image |
| `js_` | `/web/{ts}js_/` | JavaScript content |
| `cs_` | `/web/{ts}cs_/` | CSS content |
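The URL patterns in the table can be composed with a one-line helper. This is an illustrative sketch; `archiveUrl` and the `Modifier` type are hypothetical names, and defaulting to `id_` mirrors the raw-by-default behaviour described above.

```typescript
// Modifiers from the table above; the empty string means "no modifier" (page with toolbar).
type Modifier = "" | "id_" | "im_" | "js_" | "cs_";

// Build an archive URL from a 14-digit timestamp, the original URL, and a modifier.
function archiveUrl(timestamp: string, originalUrl: string, modifier: Modifier = "id_"): string {
  return `https://web.archive.org/web/${timestamp}${modifier}/${originalUrl}`;
}

const rawPage = archiveUrl("20240101120000", "https://example.com");
const withToolbar = archiveUrl("20240101120000", "https://example.com", "");
```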
- Use `wayback-screenshot` to retrieve visual screenshots of archived pages
- Use `wayback-list` to see all captures with filtering options
- Use `wayback-submit` to create a new archive (with optional screenshot)
Availability API responses are cached for 24 hours using the OS temporary directory (os.tmpdir()). Cache keys are generated from the URL and timestamp parameters using SHA-256 hashing. Cached responses expire automatically and are deleted on access.
Use wayback-cache to manage cached data:
npx tsx scripts/cache.ts clear # Clear all cache
npx tsx scripts/cache.ts status # Show cache status

See the wayback-cache skill for complete cache management documentation.
If the Wayback Machine API returns an error or is unavailable, retry after a brief delay. The API may be rate-limited during high traffic periods.
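A retry wrapper along these lines is one way to follow that advice. The text above only asks for a brief delay; exponential backoff, the default delays, and the names `backoffDelayMs` and `withRetry` are assumptions made for this sketch.

```typescript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Run `operation`, retrying on any thrown error up to `retries` additional times.
async function withRetry<T>(operation: () => Promise<T>, retries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= retries) throw error; // out of retries: surface the last error
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

A call such as `withRetry(() => fetch(apiUrl))` would then wait roughly 1, 2, and 4 seconds between attempts before giving up, where `apiUrl` stands for whichever endpoint is being queried.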
Analyze the capture frequency and rate for a URL over a specified time range.
npx tsx scripts/frequency.ts <url> [from] [to] [options]

| Argument | Required | Description |
|---|---|---|
| `url` | Yes | The URL to analyze |
| `from` | No | Start date (YYYYMMDD or YYYY-MM). Default: oldest capture |
| `to` | No | End date (YYYYMMDD or YYYY-MM). Default: newest capture |

| Option | Description |
|---|---|
| `--full` | Include detailed breakdown by year |
| `--json` | Output as JSON |
| `--no-cache` | Bypass cache and fetch fresh data from API |
Default (compact):
1423 captures over 3652 days
Average: 0.39/day, 11.87/month, 142.3/year
With --full:
📊 CAPTURE FREQUENCY ANALYSIS
URL: https://example.com
Range: 2015-01-01 12:00 to 2025-01-01 08:00 (3652 days)
Total captures: 1423
Average rate:
0.39 captures per day
11.87 captures per month
142.3 captures per year
By year:
2015: 156 captures
2016: 203 captures
2017: 189 captures
...
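The averages in the sample output follow from dividing the capture count by the span in days and scaling. The sketch below assumes 365.25 days per year and one twelfth of that per month. The script may use different constants, so the monthly figure can differ in the last digit. `captureRates` is a hypothetical name.

```typescript
interface CaptureRates {
  perDay: number;
  perMonth: number;
  perYear: number;
}

// Assumed calendar constants: 365.25 days per year, one twelfth of that per month.
const DAYS_PER_YEAR = 365.25;
const DAYS_PER_MONTH = DAYS_PER_YEAR / 12;

// Average capture rates for a total count spread over a span of days.
function captureRates(totalCaptures: number, spanDays: number): CaptureRates {
  if (spanDays <= 0) throw new RangeError("spanDays must be positive");
  const perDay = totalCaptures / spanDays;
  return { perDay, perMonth: perDay * DAYS_PER_MONTH, perYear: perDay * DAYS_PER_YEAR };
}

const rates = captureRates(1423, 3652);
console.log(`${rates.perDay.toFixed(2)}/day, ${rates.perYear.toFixed(1)}/year`);
// prints "0.39/day, 142.3/year"
```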
npx tsx scripts/frequency.ts <url> [from] [to] [options]

Run from the wayback plugin directory: ~/.claude/plugins/cache/wayback/
# Analyze entire archive history
npx tsx scripts/frequency.ts https://example.com
# Analyze specific date range
npx tsx scripts/frequency.ts https://example.com 2020 2023
# Full breakdown with year-by-year stats
npx tsx scripts/frequency.ts https://example.com 2020 2023 --full
# Specific dates
npx tsx scripts/frequency.ts https://example.com 20200101 20231231

CDX API responses are cached for 1 hour. Use --no-cache to bypass.
- wayback-range - Show oldest and newest captures with archive span
- wayback-list - List all snapshots with pagination
- wayback-oldest - Find the earliest capture
- wayback-newest - Find the most recent capture
Retrieve a list of archived snapshots for a URL from the Wayback Machine CDX API.
npx tsx scripts/list.ts <url> [limit] [options]

| Argument | Required | Description |
|---|---|---|
| `url` | Yes | The URL to search for |
| `limit` | No | Max number of results (default: unlimited) |

| Option | Description |
|---|---|
| `--no-raw` | Include Wayback toolbar in URLs |
| `--with-screenshots` | Cross-reference to show which captures have screenshots (📷) |
| `--no-cache` | Bypass cache and fetch fresh data from API |
January 1, 2024 (3 days ago)
https://web.archive.org/web/20240101120000id_/https://example.com
December 15, 2023 (20 days ago)
https://web.archive.org/web/20231215100000id_/https://example.com
Total: 2 snapshot(s)
npx tsx scripts/list.ts <url> [limit] [options]

Options:
- `--no-raw` - Include Wayback toolbar in URLs
- `--with-screenshots` - Cross-reference to show which captures have screenshots (📷)
- `--no-cache` - Bypass cache and fetch fresh data from API
Run from the wayback plugin directory: ~/.claude/plugins/cache/wayback/
https://web.archive.org/cdx/search/cdx?url={URL}&output=json&limit={N}
Most CDX queries don't require authentication. For restricted data access:
# Cookie-based auth for restricted content
curl "https://web.archive.org/cdx/search/cdx?url=..." \
  --cookie "cdx-auth-token=YOUR_TOKEN"

Get API keys at https://archive.org/account/s3.php
| Parameter | Description |
|---|---|
| `url` | The URL to search for (required) |
| `output` | Response format: `json` (recommended) |
| `matchType` | `exact` (default), `prefix`, `host`, or `domain` |
| `limit` | Max results. Use `-N` for last N results |
| `offset` | Skip first N records |
| `from` | Start date (YYYYMMDD or partial like "2020") |
| `to` | End date (YYYYMMDD or partial) |
| `filter` | Field filter: `[!]field:regex` (e.g., `statuscode:200`, `!mimetype:image.*`) |
| `collapse` | Dedupe: `field` or `field:N` (e.g., `timestamp:8` = daily) |
| `fl` | Fields to return: comma-separated (urlkey, timestamp, original, mimetype, statuscode, digest, length) |
| `fastLatest` | `true` for efficient recent results |
| `showResumeKey` | `true` to get pagination token |
| `resumeKey` | Continue from previous query |
Use WebFetch to query the CDX API:
https://web.archive.org/cdx/search/cdx?url=https://example.com&output=json&limit=10
JSON array where first row is headers:
[
["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
["com,example)/", "20240101120000", "https://example.com/", "text/html", "200", "ABC123", "1234"]
]

From the timestamp, build the archived URL:
https://web.archive.org/web/{timestamp}/{original_url}
For raw content (no Wayback toolbar):
https://web.archive.org/web/{timestamp}id_/{original_url}
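The header-row format lends itself to a small conversion step before building URLs. The following sketch uses hypothetical helper names (`parseCdx`, `rawUrl`) and assumes the response was requested with `output=json`.

```typescript
// Convert the CDX JSON array (header row followed by data rows) into keyed records.
function parseCdx(rows: string[][]): Record<string, string>[] {
  if (rows.length === 0) return []; // no captures: the API returns an empty array
  const [header, ...data] = rows;
  return data.map((row) => {
    const record: Record<string, string> = {};
    header.forEach((name, index) => {
      record[name] = row[index] ?? "";
    });
    return record;
  });
}

// Build the raw (no toolbar) archive URL for one parsed record.
function rawUrl(record: Record<string, string>): string {
  return `https://web.archive.org/web/${record.timestamp}id_/${record.original}`;
}

const captures = parseCdx([
  ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
  ["com,example)/", "20240101120000", "https://example.com/", "text/html", "200", "ABC123", "1234"],
]);
```

Keying each row by the header names keeps the code independent of column order, which matters when the `fl` parameter changes which fields are returned.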
# Only successful pages
&filter=statuscode:200
# Exclude images
&filter=!mimetype:image.*
# One snapshot per day (collapse on first 8 digits of timestamp)
&collapse=timestamp:8
# One snapshot per hour
&collapse=timestamp:10
# Date range (partial dates work)
&from=2023&to=2024
# All pages under a path (prefix match)
&url=example.com/blog/&matchType=prefix
# Entire domain including subdomains
&url=example.com&matchType=domain
# Get last 5 snapshots efficiently
&limit=-5&fastLatest=true
# Paginate large results
&showResumeKey=true&limit=1000
# Then continue with: &resumeKey={token_from_previous}
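The query fragments above can be assembled programmatically. This is a sketch with a hypothetical `cdxQueryUrl` helper: it uses the standard `URLSearchParams` class, which percent-encodes characters such as `:` and `/`, and passes filters as a separate array on the assumption that the `filter` parameter may be repeated.

```typescript
const CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx";

// Hypothetical helper: assemble a CDX query URL. Parameter names come from the table above;
// `filters` is an array because the `filter` parameter is assumed to be repeatable.
function cdxQueryUrl(
  url: string,
  options: Record<string, string | number | boolean> = {},
  filters: string[] = [],
): string {
  const params = new URLSearchParams({ url, output: "json" });
  for (const name of Object.keys(options)) {
    params.set(name, String(options[name]));
  }
  for (const filter of filters) {
    params.append("filter", filter);
  }
  return `${CDX_ENDPOINT}?${params.toString()}`;
}

// Last 5 successful, non-image captures under a path prefix.
const blogQuery = cdxQueryUrl(
  "example.com/blog/",
  { matchType: "prefix", limit: -5, fastLatest: true },
  ["statuscode:200", "!mimetype:image.*"],
);
```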
The CDX API doesn't include a screenshot field. To find captures with screenshots, cross-reference with:
https://web.archive.org/cdx/search/cdx?url=web.archive.org/screenshot/{URL}/*&output=json
The --with-screenshots flag in the script does this automatically, showing 📷 next to captures that have screenshots.
CDX API responses are cached for 1 hour using the OS temporary directory (os.tmpdir()). Cache keys are generated from the URL and query parameters using SHA-256 hashing. Cached responses expire automatically and are deleted on access.
Use wayback-cache to manage cached data:
npx tsx scripts/cache.ts clear # Clear all cache
npx tsx scripts/cache.ts status # Show cache status

See the wayback-cache skill for complete cache management documentation.
2024-01-15 12:34 (3 days ago) 📷
https://web.archive.org/web/20240115123456id_/https://example.com/
📷 https://web.archive.org/web/20240115123456im_/https://example.com/
2024-01-10 08:00 (8 days ago)
https://web.archive.org/web/20240110080000id_/https://example.com/
Total: 2 snapshot(s)
Screenshots: 1 capture(s) have screenshots
Find the most recent archived snapshot of a URL from the Wayback Machine.
npx tsx scripts/oldest-newest.ts <url> --newest-only [options]

| Argument | Required | Description |
|---|---|---|
| `url` | Yes | The URL to search for |

| Option | Description |
|---|---|
| `--full` | Include archive URL in output |
| `--json` | Output as JSON |
| `--no-cache` | Bypass cache and fetch fresh data from API |
Default (compact):
2024-01-15 14:30 (2 days ago)
With --full:
🆕 NEWEST:
2024-01-15 14:30 (2 days ago)
https://web.archive.org/web/20240115143000id_/https://example.com
npx tsx scripts/oldest-newest.ts <url> --newest-only

Run from the wayback plugin directory: ~/.claude/plugins/cache/wayback/
https://web.archive.org/cdx/search/cdx?url={URL}&output=json&limit=1&filter=statuscode:200&fastLatest=true
The fastLatest=true parameter efficiently returns the most recent capture without scanning the entire index.
CDX API responses are cached for 1 hour. Use --no-cache to bypass.
- wayback-oldest - Find the earliest capture
- wayback-range - Show both oldest and newest with archive span
- wayback-list - List all snapshots with pagination
Find the earliest archived snapshot of a URL from the Wayback Machine.
npx tsx scripts/oldest-newest.ts <url> --oldest-only [options]

| Argument | Required | Description |
|---|---|---|
| `url` | Yes | The URL to search for |

| Option | Description |
|---|---|
| `--full` | Include archive URL in output |
| `--json` | Output as JSON |
| `--no-cache` | Bypass cache and fetch fresh data from API |
Default (compact):
1998-12-01 08:00 (9200 days ago)
With --full:
📜 OLDEST:
1998-12-01 08:00 (9200 days ago)
https://web.archive.org/web/19981201080000id_/https://example.com
npx tsx scripts/oldest-newest.ts <url> --oldest-only

Run from the wayback plugin directory: ~/.claude/plugins/cache/wayback/
https://web.archive.org/cdx/search/cdx?url={URL}&output=json&limit=1&filter=statuscode:200
The limit=1 parameter with default ascending sort returns the oldest capture first.
CDX API responses are cached for 1 hour. Use --no-cache to bypass.
- wayback-newest - Find the most recent capture
- wayback-range - Show both oldest and newest with archive span
- wayback-list - List all snapshots with pagination
Show both the oldest and newest archived snapshots for a URL, displaying the full archive time span.
npx tsx scripts/oldest-newest.ts <url> [options]

| Argument | Required | Description |
|---|---|---|
| `url` | Yes | The URL to search for |

| Option | Description |
|---|---|
| `--full` | Include archive URLs in output |
| `--json` | Output as JSON |
| `--no-cache` | Bypass cache and fetch fresh data from API |
Default (compact):
1998-12-01 08:00 (9200 days ago)
2024-01-15 14:30 (2 days ago)
With --full:
📜 OLDEST:
1998-12-01 08:00 (9200 days ago)
https://web.archive.org/web/19981201080000id_/https://example.com
🆕 NEWEST:
2024-01-15 14:30 (2 days ago)
https://web.archive.org/web/20240115143000id_/https://example.com
Archive span: 9198 days
npx tsx scripts/oldest-newest.ts <url>

Run from the wayback plugin directory: ~/.claude/plugins/cache/wayback/
Queries the CDX API twice:
- Oldest capture: `limit=1` with default ascending sort
- Newest capture: `limit=1` with `fastLatest=true`
Calculates the day span between first and last capture to show how long the URL has been tracked.
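The span calculation can be sketched as below. It assumes timestamps are 14-digit UTC values and that the span is rounded down to whole days; `parseTimestamp` and `archiveSpanDays` are hypothetical names, not the script's own functions.

```typescript
// Parse a 14-digit Wayback timestamp (YYYYMMDDhhmmss), treated here as UTC.
function parseTimestamp(timestamp: string): Date {
  const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(timestamp);
  if (!match) throw new Error(`Invalid timestamp: ${timestamp}`);
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days between the oldest and newest capture (rounding down is an assumption).
function archiveSpanDays(oldest: string, newest: string): number {
  const elapsed = parseTimestamp(newest).getTime() - parseTimestamp(oldest).getTime();
  return Math.floor(elapsed / MS_PER_DAY);
}
```

Working in UTC milliseconds avoids daylight-saving shifts that would otherwise make some local days 23 or 25 hours long.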
CDX API responses are cached for 1 hour. Use --no-cache to bypass.
- wayback-oldest - Find only the earliest capture
- wayback-newest - Find only the most recent capture
- wayback-list - List all snapshots with pagination
Access existing screenshots stored by the Wayback Machine.
npx tsx scripts/screenshot.ts <url> [options]

| Argument | Required | Description |
|---|---|---|
| `url` | Yes | The URL to find screenshots for |

| Option | Description |
|---|---|
| `--timestamp=DATE` | Get screenshot from specific capture (YYYYMMDDhhmmss) |
| `--list` | List available screenshots for URL |
| `--download=PATH` | Download screenshot to file |
| `--no-cache` | Bypass cache and fetch fresh data from API |
Screenshots for: https://example.com/
January 15, 2024 12:34 (3 days ago)
https://web.archive.org/web/20240115123456im_/https://example.com/
December 1, 2023 08:00 (46 days ago)
https://web.archive.org/web/20231201080000im_/https://example.com/
Total: 2 screenshot(s)
npx tsx scripts/screenshot.ts <url> [options]

Options:
- `--timestamp=DATE` - Get screenshot from specific capture (YYYYMMDDhhmmss)
- `--list` - List available screenshots for URL
- `--download=PATH` - Download screenshot to file
- `--no-cache` - Bypass cache and fetch fresh data from API
Run from the wayback plugin directory: ~/.claude/plugins/cache/wayback/
https://web.archive.org/screenshot/{URL}
Get the most recent screenshot:
https://web.archive.org/screenshot/https://example.com/
Get screenshot from a specific capture:
https://web.archive.org/web/{timestamp}im_/https://example.com/
The im_ modifier returns the screenshot image for that timestamp.
Use CDX API with wildcard to find all screenshots:
https://web.archive.org/cdx/search/cdx?url=web.archive.org/screenshot/https://example.com/*&output=json
Or browse visually:
https://web.archive.org/web/*/https://web.archive.org/screenshot/https://example.com/*
When viewing any archived page:
- Look for the camera icon (📷) in the top-right of the Wayback toolbar
- Click to view available screenshots for that capture
When submitting with capture_screenshot=1, the response includes:
{
"status": "success",
"screenshot": "https://web.archive.org/web/20240115123456im_/https://example.com/"
}
## Caveats
- **Not all captures have screenshots** - depends on whether `capture_screenshot=1` was used during archiving
- **Undocumented feature** - may be unreliable or change without notice
- **Indexing delays** - newly captured screenshots may not appear immediately
- **Coverage varies** - older archives typically don't have screenshots
## Caching
Availability API responses are cached for 24 hours using the OS temporary directory (`os.tmpdir()`). Cache keys are generated from the URL using SHA-256 hashing. Cached responses expire automatically and are deleted on access.
Use `wayback-cache` to manage cached data:
```bash
npx tsx scripts/cache.ts clear # Clear all cache
npx tsx scripts/cache.ts status # Show cache status
```

See the wayback-cache skill for complete cache management documentation.
- Use `wayback-submit --capture-screenshot` to create a new screenshot
- Use `wayback-check` to verify if a URL is archived
- Use `wayback-list` to see all captures (not just those with screenshots)
Submit a URL to the Internet Archive's Wayback Machine using the Save Page Now 2 (SPN2) API.
npx tsx scripts/submit.ts <url> [options]

| Argument | Required | Description |
|---|---|---|
| `url` | Yes | URL to archive |

| Option | Description |
|---|---|
| `--no-raw` | Include Wayback toolbar in archived URL |
| `--key=ACCESS:SECRET` | Use API authentication (get keys at https://archive.org/account/s3.php) |
When submission succeeds:
✓ Archive submitted successfully
Job ID: spn2-abc123...
Check status: https://web.archive.org/save/status/spn2-abc123...
Waiting for capture...
✓ Capture complete
URL: https://web.archive.org/web/20240115123456id_/https://example.com
npx tsx scripts/submit.ts <url> [options]

Options:
- `--no-raw` - Include Wayback toolbar in archived URL
- `--key=ACCESS:SECRET` - Use API authentication
Run from the wayback plugin directory: ~/.claude/plugins/cache/wayback/
Get API keys at https://archive.org/account/s3.php (requires free account).
Header format:
Authorization: LOW {access_key}:{secret_key}
| Limit | Authenticated | Anonymous |
|---|---|---|
| Concurrent captures | 12 | 6 |
| Daily captures | 100,000 | 4,000 |
| Per-URL daily | 10 | 10 |
| Capture timeout | 50s page load, 2min total | 50s page load, 2min total |
Endpoint: POST https://web.archive.org/save
curl -X POST https://web.archive.org/save \
-H "Accept: application/json" \
-H "Authorization: LOW myaccesskey:mysecret" \
  -d "url=https://example.com"

| Parameter | Description |
|---|---|
| `url` | URL to archive (required) |
| `capture_all=1` | Capture even 4xx/5xx error pages |
| `capture_outlinks=1` | Also archive linked pages (first 100) |
| `capture_screenshot=1` | Generate PNG screenshot |
| `delay_wb_availability=1` | Delay indexing ~12 hours (reduces load) |
| `skip_first_archive=1` | Skip check if URL was already archived |
| `if_not_archived_within=30d` | Skip if archived within timeframe (e.g., 30d, 1h) |
| `js_behavior_timeout=10` | Run JavaScript for N seconds (max 30) |
| `force_get=1` | Use simple HTTP GET instead of browser |
| `capture_cookie=name=value` | Include custom cookie in request |
| `target_username` / `target_password` | Login credentials for protected pages |
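The curl call and parameter table can be mirrored in code by assembling the request first and sending it separately. This is a sketch: `buildSaveRequest` and `SaveRequest` are hypothetical names, and the explicit `Content-Type` header is an assumption that matches what `curl -d` sends by default.

```typescript
interface SaveRequest {
  endpoint: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: assemble (but do not send) an SPN2 save request.
// `options` holds extra parameters from the table above, e.g. { capture_screenshot: "1" }.
function buildSaveRequest(
  url: string,
  options: Record<string, string> = {},
  credentials?: { accessKey: string; secretKey: string },
): SaveRequest {
  const headers: Record<string, string> = {
    Accept: "application/json",
    "Content-Type": "application/x-www-form-urlencoded",
  };
  if (credentials) {
    // Header format described above: LOW {access_key}:{secret_key}
    headers.Authorization = `LOW ${credentials.accessKey}:${credentials.secretKey}`;
  }
  const body = new URLSearchParams({ url, ...options }).toString();
  return { endpoint: "https://web.archive.org/save", method: "POST", headers, body };
}

const saveRequest = buildSaveRequest(
  "https://example.com",
  { capture_screenshot: "1" },
  { accessKey: "myaccesskey", secretKey: "mysecret" },
);
```

The result can be handed to `fetch(saveRequest.endpoint, saveRequest)`. Keeping construction separate from sending makes the request easy to inspect or log before any network traffic occurs.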
Success:
{
"url": "https://example.com",
"job_id": "spn2-abc123..."
}

curl "https://web.archive.org/save/status/{job_id}" \
  -H "Authorization: LOW myaccesskey:mysecret"

Pending:
{"status": "pending", "resources": []}

Success:
{
"status": "success",
"timestamp": "20240115123456",
"original_url": "https://example.com",
"resources": ["https://example.com/style.css"],
"outlinks": {},
"screenshot": "https://web.archive.org/web/.../screenshot.png"
}

Error codes: error:blocked-url, error:too-many-daily-captures, error:soft-time-limit-exceeded, error:invalid-host-resolution
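A polling loop needs to classify each status response. The sketch below covers the pending and success shapes shown above. The layout of error responses is not shown here, so anything else is treated as a failure. `interpretStatus` and the `SaveOutcome` type are hypothetical names.

```typescript
interface SaveStatus {
  status: string; // "pending", "success", or an error state
  timestamp?: string;
  original_url?: string;
}

type SaveOutcome =
  | { state: "pending" }
  | { state: "done"; archivedUrl: string }
  | { state: "failed"; reason: string };

// Classify one status response so a polling loop knows whether to keep waiting.
function interpretStatus(response: SaveStatus): SaveOutcome {
  if (response.status === "pending") return { state: "pending" };
  if (response.status === "success" && response.timestamp && response.original_url) {
    // Raw (no toolbar) URL built from the capture timestamp.
    const archivedUrl = `https://web.archive.org/web/${response.timestamp}id_/${response.original_url}`;
    return { state: "done", archivedUrl };
  }
  return { state: "failed", reason: response.status };
}
```

A caller would poll the status endpoint, stop on `done` or `failed`, and sleep between requests while the outcome is `pending`.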
curl "https://web.archive.org/save/status/user" \
  -H "Authorization: LOW myaccesskey:mysecret"

Returns: {"available": 99950, "processing": 2}
For quick one-off saves without authentication:
https://web.archive.org/save/{URL}
Lower rate limits apply (6 concurrent, 4k daily).
- Use `delay_wb_availability=1` for batch jobs (reduces server load)
- Check job status for captures that take time (JS-heavy pages)
- Use `capture_screenshot=1` for visual verification