Overview
The Page Ripper endpoint captures any public web page using headless Chrome and returns a downloadable ZIP archive containing:
- Self-contained HTML (via SingleFile)
- Categorized assets (CSS, JavaScript, images, fonts, media)
Endpoint
Accepts POST requests only; any other method returns a 405 error.
Authentication
Requires a Supabase access token, sent in the Authorization header as a bearer token.
Request Body
url (required): Full HTTP/HTTPS URL of the page to capture.
Restrictions:
- Must use the http:// or https:// protocol
- Cannot resolve to private/internal IP addresses
- Cannot be localhost or a reserved IP range (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, etc.)
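A client can catch the simplest 400 cases before spending a request. The sketch below is a hypothetical helper (validateCaptureUrl is not part of the API) that returns the same error strings the endpoint documents, or null when the URL passes the local checks. It cannot detect hostnames that only resolve to private addresses, so the server-side DNS check still applies.

```typescript
// Hypothetical client-side pre-check; the strings mirror the documented 400 responses.
function validateCaptureUrl(raw: string | undefined): string | null {
  if (raw === undefined || raw.trim() === "") {
    return "Missing required field: url";
  }
  let parsed: URL;
  try {
    parsed = new URL(raw); // throws a TypeError on malformed input
  } catch {
    return "Invalid URL.";
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return "Invalid URL. Must use http or https protocol.";
  }
  return null; // passes local checks; the server still performs SSRF validation
}
```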
Example Request
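A minimal sketch using the Fetch API. The route path /api/download-page is an assumption (it is the default Vercel route for api/download-page.js), as is sending the body as JSON; adjust both to your deployment. CaptureRequest, buildCaptureRequest, and requestCapture are hypothetical names.

```typescript
// Shape of the HTTP request sent to the Page Ripper endpoint.
interface CaptureRequest {
  endpoint: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// ASSUMPTION: route path and JSON body encoding are inferred, not confirmed by this page.
function buildCaptureRequest(baseUrl: string, accessToken: string, pageUrl: string): CaptureRequest {
  return {
    endpoint: new URL("/api/download-page", baseUrl).toString(),
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`, // Supabase access token
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url: pageUrl }),
  };
}

// Resolves to the ZIP bytes on 200; otherwise throws with the API's "error" message.
async function requestCapture(baseUrl: string, accessToken: string, pageUrl: string): Promise<ArrayBuffer> {
  const req = buildCaptureRequest(baseUrl, accessToken, pageUrl);
  const res = await fetch(req.endpoint, { method: req.method, headers: req.headers, body: req.body });
  if (!res.ok) {
    const payload = await res.json().catch(() => ({ error: res.statusText }));
    throw new Error(`${res.status}: ${payload.error ?? "Unknown error"}`);
  }
  return res.arrayBuffer();
}
```

In a browser app using supabase-js v2, the token would typically come from session.access_token after supabase.auth.getSession().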
Response
Success (200 OK)
Returns a ZIP file download.
ZIP Archive Structure
HTML documents are not duplicated in the assets folder: page.html at the root is the only HTML file. The SingleFile library inlines critical resources directly into this HTML.
Error Codes
| Status | Condition | Response Body |
|---|---|---|
| 400 | Missing url field | {"error": "Missing required field: url"} |
| 400 | Invalid URL format | {"error": "Invalid URL."} |
| 400 | Non-HTTP/HTTPS protocol | {"error": "Invalid URL. Must use http or https protocol."} |
| 400 | Private/internal address | {"error": "Requests to private/internal addresses are not allowed."} |
| 400 | DNS rebinding detected | {"error": "Requests to private/internal addresses are not allowed."} |
| 401 | Missing bearer token | {"error": "Missing Authorization bearer token."} |
| 401 | Invalid/expired token | {"error": "Invalid or expired session."} |
| 405 | Non-POST method | {"error": "Method not allowed."} |
| 429 | Rate limit exceeded | {"error": "Rate limit exceeded. Maximum 10 page captures per 15 minutes."} |
| 500 | Capture failed | {"error": "Page capture failed: <details>"} |
| 504 | Timeout | {"error": "Page capture timed out."} |
Rate Limiting
Each authenticated user is limited to:
- 10 captures per 15-minute window
- Tracked via the page_rip_log table, keyed by user_id
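To warn users before a 429 arrives, a client can mirror the limit locally. This hypothetical sketch treats the window as a sliding 15 minutes over the client's own successful captures. The server's actual counting over page_rip_log may differ (it also sees captures from other tabs or devices), so treat the result as an estimate and still handle 429.

```typescript
const MAX_CAPTURES = 10;
const WINDOW_MS = 15 * 60 * 1000;

// Client-side estimate of remaining captures in a sliding 15-minute window (hypothetical helper).
class CaptureRateTracker {
  private timestamps: number[] = [];

  // Call after each successful capture.
  record(at: number = Date.now()): void {
    this.timestamps.push(at);
  }

  // Drops timestamps that have left the window and returns the captures still available.
  remaining(now: number = Date.now()): number {
    this.timestamps = this.timestamps.filter((t) => now - t < WINDOW_MS);
    return Math.max(0, MAX_CAPTURES - this.timestamps.length);
  }

  // Milliseconds until the oldest capture leaves the window (0 if a slot is free now).
  msUntilNextSlot(now: number = Date.now()): number {
    if (this.remaining(now) > 0) return 0;
    const oldest = Math.min(...this.timestamps);
    return oldest + WINDOW_MS - now;
  }
}
```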
Rate Limit Response
SSRF Protection
The endpoint implements defense-in-depth SSRF protection in three layers.
1. Hostname Validation
Rejects obviously private hostnames (source: api/download-page.js:104-107).
Private Hostname Patterns
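The exact patterns live in api/download-page.js and are not reproduced on this page. As an illustration only, a hostname pre-filter might look like the sketch below; the pattern list is an assumption, not the endpoint's actual rule set. String matching alone cannot catch a public-looking name that resolves to a private address, which is why the DNS check follows.

```typescript
// ILLUSTRATIVE patterns only; the endpoint's real list is in api/download-page.js:104-107.
const PRIVATE_HOSTNAME_PATTERNS: RegExp[] = [
  /^localhost$/,
  /\.localhost$/,
  /\.local$/,     // mDNS names
  /\.internal$/,  // common internal-only suffix (assumption)
  /^0\.0\.0\.0$/,
  /^\[?::1\]?$/,  // IPv6 loopback; URL.hostname keeps the brackets
];

function isObviouslyPrivateHostname(hostname: string): boolean {
  // Normalize: lowercase and drop a trailing dot ("localhost." resolves like "localhost").
  const host = hostname.toLowerCase().replace(/\.$/, "");
  return PRIVATE_HOSTNAME_PATTERNS.some((re) => re.test(host));
}
```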
2. DNS Resolution Check
Resolves the hostname via DNS and validates every returned IP address (source: api/download-page.js:114-134).
DNS Validation
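The idea can be sketched as: resolve every address for the hostname and reject the request if any of them falls in a blocked range, so a name with one public and one private record cannot slip through. This sketch is IPv4-only and covers the ranges listed under Request Body plus 0.0.0.0/8 (added here as an assumption). Anything that does not parse as dotted-quad IPv4, including every IPv6 address, is treated as blocked; that fails closed but would also reject IPv6-only public hosts. The resolver is injected so the logic is testable; in Node it could wrap dns.promises.lookup(hostname, { all: true }). Function names are hypothetical.

```typescript
// Blocked IPv4 ranges as [base address, prefix length].
const BLOCKED_IPV4_CIDRS: Array<[string, number]> = [
  ["0.0.0.0", 8], // assumption: not in the documented list
  ["10.0.0.0", 8],
  ["127.0.0.0", 8],
  ["169.254.0.0", 16],
  ["172.16.0.0", 12],
  ["192.168.0.0", 16],
];

// Parses dotted-quad IPv4 into an unsigned 32-bit integer; returns null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet; // multiplication avoids signed 32-bit bitwise overflow
  }
  return value;
}

function isBlockedIPv4(ip: string): boolean {
  const addr = ipv4ToInt(ip);
  if (addr === null) return true; // fail closed on anything unparseable (including IPv6)
  return BLOCKED_IPV4_CIDRS.some(([base, bits]) => {
    const start = ipv4ToInt(base)!;
    return addr >= start && addr < start + 2 ** (32 - bits);
  });
}

// True only if the name resolves and EVERY resolved address is outside the blocked ranges.
async function resolvesOnlyToPublic(
  hostname: string,
  resolve: (host: string) => Promise<string[]>,
): Promise<boolean> {
  const addresses = await resolve(hostname);
  return addresses.length > 0 && addresses.every((a) => !isBlockedIPv4(a));
}
```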
3. Post-Redirect Validation
After Puppeteer navigation, the final URL (post-redirects) is re-validated (source: api/download-page.js:451-460).
Post-Navigation Check
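Puppeteer's page.goto() follows redirects, and page.url() reports where the main frame actually ended up, so the final URL must be checked again. The sketch below uses a minimal structural interface covering just those two Puppeteer Page methods so it runs without a browser. NavigablePage, navigateAndRevalidate, and the injected isAllowedUrl validator (standing in for layers 1 and 2) are hypothetical.

```typescript
// Structural subset of Puppeteer's Page used here (goto + url).
interface NavigablePage {
  goto(url: string, options: { waitUntil: "networkidle0"; timeout: number }): Promise<unknown>;
  url(): string;
}

const NAVIGATION_TIMEOUT_MS = 60_000; // documented navigation timeout

// Navigates, then re-runs the SSRF validator on the post-redirect URL.
async function navigateAndRevalidate(
  page: NavigablePage,
  targetUrl: string,
  isAllowedUrl: (url: string) => Promise<boolean>,
): Promise<string> {
  await page.goto(targetUrl, { waitUntil: "networkidle0", timeout: NAVIGATION_TIMEOUT_MS });
  const finalUrl = page.url(); // reflects redirects followed during navigation
  if (!(await isAllowedUrl(finalUrl))) {
    throw new Error("Requests to private/internal addresses are not allowed.");
  }
  return finalUrl;
}
```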
Resource Limits
To prevent memory exhaustion:
| Limit | Value | Behavior When Exceeded |
|---|---|---|
| Max total size | 100 MB | Stop capturing additional resources (source: api/download-page.js:46) |
| Max resource count | 500 items | Ignore additional resources (source: api/download-page.js:49) |
These limits apply to captured network resources only. The SingleFile HTML is generated separately and is not subject to them, so it can be larger.
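Both caps can be enforced with a small budget object consulted before each captured response is stored. This is a hypothetical sketch of that bookkeeping: the limits are the documented values, while the class name, the boolean accept/reject shape, and reading 100 MB as 100 × 1024 × 1024 bytes are assumptions.

```typescript
const MAX_TOTAL_BYTES = 100 * 1024 * 1024; // 100 MB (assuming binary megabytes)
const MAX_RESOURCES = 500;

// Tracks captured network resources and refuses any that would exceed either cap.
class ResourceBudget {
  private totalBytes = 0;
  private count = 0;
  private sizeCapHit = false;

  // Returns true and records the resource if it fits; false means "do not store it".
  tryAdd(sizeBytes: number): boolean {
    if (this.sizeCapHit || this.count >= MAX_RESOURCES) return false;
    if (this.totalBytes + sizeBytes > MAX_TOTAL_BYTES) {
      this.sizeCapHit = true; // size cap: stop capturing anything further
      return false;
    }
    this.totalBytes += sizeBytes;
    this.count += 1;
    return true;
  }

  get usage(): { bytes: number; resources: number } {
    return { bytes: this.totalBytes, resources: this.count };
  }
}
```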
Timeouts
| Timeout | Value | Purpose |
|---|---|---|
| Navigation timeout | 60 seconds | Puppeteer page load (source: api/download-page.js:13) |
| Hard timeout | 110 seconds | Total request duration (source: api/download-page.js:12) |
| Auto-scroll timeout | 15 seconds | Lazy-load trigger (source: api/download-page.js:332) |
| Network idle after scroll | 2 seconds | Wait for lazy resources (source: api/download-page.js:16) |
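A hard request deadline is commonly enforced by racing the work against a timer. The helper below is a hypothetical sketch of that pattern using the documented 110-second budget; its error message and 504 status match the error table, but how the real handler wires up its deadline is not shown on this page.

```typescript
const HARD_TIMEOUT_MS = 110_000;

class CaptureTimeoutError extends Error {
  readonly status = 504;
  constructor() {
    super("Page capture timed out.");
    this.name = "CaptureTimeoutError";
  }
}

// Resolves with the task's result, or rejects with a 504-style error once the deadline passes.
function withHardTimeout<T>(task: Promise<T>, ms: number = HARD_TIMEOUT_MS): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new CaptureTimeoutError()), ms);
  });
  // Clear the timer on either outcome so it does not keep the process alive.
  return Promise.race([task, deadline]).then(
    (value) => { clearTimeout(timer); return value; },
    (err) => { clearTimeout(timer); throw err; },
  );
}
```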
Examples
Basic Capture
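A browser-side sketch that requests a capture and saves the ZIP. It assumes the route is /api/download-page and the body is JSON (both inferred, not documented here) and that you already hold the user's Supabase access token. The hostname-based filename is a hypothetical client-side choice, not something the API provides. isZipArchive checks the standard ZIP local-file-header signature (PK\x03\x04) as a cheap sanity check that the body is an archive rather than an error payload.

```typescript
// Non-empty ZIP files start with the local file header signature "PK\x03\x04".
function isZipArchive(bytes: Uint8Array): boolean {
  return bytes.length >= 4 &&
    bytes[0] === 0x50 && bytes[1] === 0x4b && bytes[2] === 0x03 && bytes[3] === 0x04;
}

// Hypothetical filename scheme: hostname plus ".zip", e.g. "example.com.zip".
function zipFilenameFor(pageUrl: string): string {
  return `${new URL(pageUrl).hostname}.zip`;
}

// ASSUMPTIONS: route path and JSON body; browser-only (uses document and object URLs).
async function captureAndDownload(pageUrl: string, accessToken: string): Promise<void> {
  const res = await fetch("/api/download-page", {
    method: "POST",
    headers: { Authorization: `Bearer ${accessToken}`, "Content-Type": "application/json" },
    body: JSON.stringify({ url: pageUrl }),
  });
  if (!res.ok) {
    const { error } = await res.json().catch(() => ({ error: res.statusText }));
    throw new Error(error);
  }
  const buffer = await res.arrayBuffer();
  if (!isZipArchive(new Uint8Array(buffer))) throw new Error("Response is not a ZIP archive.");

  const blobUrl = URL.createObjectURL(new Blob([buffer], { type: "application/zip" }));
  const link = document.createElement("a");
  link.href = blobUrl;
  link.download = zipFilenameFor(pageUrl);
  link.click();
  setTimeout(() => URL.revokeObjectURL(blobUrl), 1000); // release after the click is handled
}
```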
Error Handling
Comprehensive Error Handling
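Every failure returns a JSON body with an error string, so a client can branch on the status codes in the Error Codes table. A hypothetical sketch that turns a failed response into a typed result with a retry hint; treating only 500 and 504 as retryable is a judgment call, not something the API specifies.

```typescript
type CaptureFailureKind =
  | "invalid_request" | "auth" | "method" | "rate_limited" | "capture_failed" | "timeout" | "unknown";

interface CaptureFailure {
  kind: CaptureFailureKind;
  status: number;
  message: string; // the API's "error" string when present
  retryable: boolean;
}

// Maps the documented status codes to a failure kind; the retry policy is an assumption.
function classifyCaptureError(status: number, body: unknown): CaptureFailure {
  const apiError = typeof body === "object" && body !== null ? (body as { error?: unknown }).error : undefined;
  const message = typeof apiError === "string" ? apiError : `HTTP ${status}`;
  switch (status) {
    case 400: return { kind: "invalid_request", status, message, retryable: false };
    case 401: return { kind: "auth", status, message, retryable: false };         // refresh the session first
    case 405: return { kind: "method", status, message, retryable: false };
    case 429: return { kind: "rate_limited", status, message, retryable: false }; // wait out the 15-minute window
    case 500: return { kind: "capture_failed", status, message, retryable: true };
    case 504: return { kind: "timeout", status, message, retryable: true };
    default:  return { kind: "unknown", status, message, retryable: false };
  }
}

// Reads the JSON error body (if any) from a non-OK fetch Response.
async function toCaptureFailure(res: Response): Promise<CaptureFailure> {
  const body: unknown = await res.json().catch(() => null);
  return classifyCaptureError(res.status, body);
}
```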
React Hook Example
usePageCapture Hook
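The hook's internals are not shown on this page. One way to structure such a hook is a small state machine driven by React's useReducer; the reducer below is a hypothetical, framework-free sketch (state and action names are illustrative) so it can be tested without React. Inside a component, const [state, dispatch] = useReducer(captureReducer, initialCaptureState) would wire it up, dispatching "start" before the request and "success" or "failure" after it.

```typescript
type CaptureState =
  | { status: "idle" }
  | { status: "capturing"; url: string; startedAt: number }
  | { status: "done"; url: string; sizeBytes: number }
  | { status: "error"; url: string; message: string };

type CaptureAction =
  | { type: "start"; url: string; at: number }
  | { type: "success"; sizeBytes: number }
  | { type: "failure"; message: string }
  | { type: "reset" };

const initialCaptureState: CaptureState = { status: "idle" };

// Pure reducer. A second "start" during a capture is ignored (captures can run a minute
// or more), and "success"/"failure" only apply while a capture is in flight.
function captureReducer(state: CaptureState, action: CaptureAction): CaptureState {
  switch (action.type) {
    case "start":
      return state.status === "capturing"
        ? state
        : { status: "capturing", url: action.url, startedAt: action.at };
    case "success":
      return state.status === "capturing"
        ? { status: "done", url: state.url, sizeBytes: action.sizeBytes }
        : state;
    case "failure":
      return state.status === "capturing"
        ? { status: "error", url: state.url, message: action.message }
        : state;
    case "reset":
      return initialCaptureState;
  }
}
```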
Implementation Details
Browser Engine
- Puppeteer Core with @sparticuz/chromium (optimized for Vercel/serverless)
- Headless Chrome with disabled sandboxing for serverless environments
- Viewport: 1280x800 (source: api/download-page.js:408)
Page Capture Process
- Launch browser (different executable paths for dev/production)
- Navigate to the target URL with the networkidle0 wait condition
- SSRF re-check on the final URL after redirects
- Auto-scroll to trigger lazy-loaded content (300px steps with 100ms pauses)
- Network idle wait (2 seconds after scroll completes)
- SingleFile capture: inlines critical CSS, fonts, and images into the HTML
- Close browser immediately after HTML capture
- Build ZIP with captured resources organized by type
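The auto-scroll step can be sketched as a loop that scrolls 300 px, pauses 100 ms, and stops at the bottom of the page or when the 15-second budget runs out. In the real endpoint this logic would run inside the page (for example through Puppeteer's page.evaluate); here the scroll target, clock, and sleep are injected so the loop can be tested outside a browser. ScrollTarget and autoScroll are hypothetical names.

```typescript
interface ScrollTarget {
  scrollBy(dy: number): void;
  scrollTop(): number;      // current vertical offset
  viewportHeight(): number;
  scrollHeight(): number;   // may grow as lazy content loads
}

interface AutoScrollOptions {
  stepPx: number;
  pauseMs: number;
  timeoutMs: number;
}

const DEFAULT_AUTO_SCROLL: AutoScrollOptions = { stepPx: 300, pauseMs: 100, timeoutMs: 15_000 };

const realSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Scrolls until the bottom is reached or the time budget runs out; returns the number of steps taken.
async function autoScroll(
  target: ScrollTarget,
  opts: AutoScrollOptions = DEFAULT_AUTO_SCROLL,
  sleep: (ms: number) => Promise<void> = realSleep,
  now: () => number = Date.now,
): Promise<number> {
  const deadline = now() + opts.timeoutMs;
  let steps = 0;
  while (now() < deadline) {
    // Re-read scrollHeight each step so newly loaded content extends the scroll.
    if (target.scrollTop() + target.viewportHeight() >= target.scrollHeight()) break;
    target.scrollBy(opts.stepPx);
    steps += 1;
    await sleep(opts.pauseMs);
  }
  return steps;
}
```

After the loop returns, the documented 2-second network-idle wait would give lazily requested resources time to finish.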
Resource Classification
Assets are categorized by MIME type and file extension (source: api/download-page.js:149-179).
Asset Folder Mapping
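The real mapping lives in api/download-page.js:149-179 and is not reproduced here. The sketch below shows the general approach (MIME type first, file extension as a fallback); the folder names and extension lists are assumptions chosen to match the categories in the Overview, not the endpoint's actual table. HTML returns null because page.html is the only HTML file in the archive.

```typescript
type AssetFolder = "css" | "js" | "images" | "fonts" | "media" | "other";

// ASSUMED extension lists, used only when the MIME type is missing or generic.
const EXTENSION_FOLDERS: Record<string, AssetFolder> = {
  css: "css",
  js: "js", mjs: "js",
  png: "images", jpg: "images", jpeg: "images", gif: "images", webp: "images", svg: "images", avif: "images", ico: "images",
  woff: "fonts", woff2: "fonts", ttf: "fonts", otf: "fonts", eot: "fonts",
  mp4: "media", webm: "media", mp3: "media", ogg: "media", wav: "media",
};

// Returns the hypothetical asset folder, or null for HTML (not duplicated into assets).
function classifyAsset(resourceUrl: string, mimeType: string | undefined): AssetFolder | null {
  const mime = (mimeType ?? "").split(";")[0].trim().toLowerCase();
  if (mime === "text/html") return null;
  if (mime === "text/css") return "css";
  if (mime.includes("javascript")) return "js";
  if (mime.startsWith("image/")) return "images";
  if (mime.includes("font")) return "fonts";
  if (mime.startsWith("video/") || mime.startsWith("audio/")) return "media";

  // Fall back to the extension of the last path segment (query string ignored).
  const path = new URL(resourceUrl).pathname.toLowerCase();
  const file = path.slice(path.lastIndexOf("/") + 1);
  const dot = file.lastIndexOf(".");
  const ext = dot >= 0 ? file.slice(dot + 1) : "";
  return EXTENSION_FOLDERS[ext] ?? "other";
}
```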
User-Agent
Best Practices
- Validate URLs client-side: reject private IPs and invalid protocols before sending requests to save quota
- Implement retry logic: handle 504 timeouts with exponential backoff for large pages
- Monitor rate limits: track remaining captures and show warnings before hitting limits
- Provide feedback: captures can take 30-60+ seconds, so show progress indicators to users
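For the retry practice, a hypothetical sketch of exponential backoff that retries only on 504. The base delay, cap, full jitter, and attempt count are assumptions. Keep retries few: each attempt can run up to the 110-second hard timeout and may count toward the 10-per-15-minutes limit.

```typescript
interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

const DEFAULT_RETRY: RetryPolicy = { maxAttempts: 3, baseDelayMs: 5_000, maxDelayMs: 30_000 };

// Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped, with full jitter.
function backoffDelayMs(attempt: number, policy: RetryPolicy = DEFAULT_RETRY, random: () => number = Math.random): number {
  const ceiling = Math.min(policy.maxDelayMs, policy.baseDelayMs * 2 ** (attempt - 1));
  return Math.floor(random() * ceiling);
}

// Calls `send`, retrying while it returns HTTP 504; any other status goes back to the caller.
async function sendWithRetry<R extends { status: number }>(
  send: () => Promise<R>,
  policy: RetryPolicy = DEFAULT_RETRY,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<R> {
  let res = await send();
  for (let attempt = 1; attempt < policy.maxAttempts && res.status === 504; attempt++) {
    await sleep(backoffDelayMs(attempt, policy));
    res = await send();
  }
  return res;
}
```

With fetch, send would be something like () => fetch(endpoint, init), so the final Response (success or last failure) is returned unchanged.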
Related Endpoints
- Overview: API authentication and error handling
- Media Proxy: Proxy individual media assets