
Model Presets
Updated 03/01/2026
| Model | Tile Size | Tokens / Tile | Max Dimension |
|---|---|---|---|
| Claude | 1,092px | ~1,590 | 1,568px |
| OpenAI | 768px | ~765 | 2,048px |
| Gemini 3 | 1,536px | ~1,120 | 3,072px |
| Gemini | 768px | ~258 | 768px |
Features
Full-Resolution Vision
Process every pixel at native resolution instead of letting LLMs auto-downscale your screenshots into mush.
Web Page Capture
Capture full-page screenshots from URLs via headless Chrome with scroll-stitching up to 200,000px.
Token-Efficient
Per-tile content analysis (blank, low-detail, mixed, high-detail) so you skip empty tiles and save tokens.
Mobile Emulation
Test responsive designs with mobile viewport, 2x retina, and mobile user agent.
Multi-Format
Tile local files, remote URLs, base64, and data URIs. Output as WebP or lossless PNG.
Versioned Output
Iterative workflows with versioned directories (_v1, _v2) so you never overwrite previous results.
FAQ
Why does my LLM miss details in screenshots?
Vision models have a maximum image dimension. Anything larger gets silently downscaled before the model sees it. A 3,600 × 20,220px screenshot becomes 279 × 1,568px, losing 99.4% of its pixels. Buttons, labels, and fine UI details turn into unreadable noise.
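The arithmetic behind that example can be sketched directly. This is an illustrative snippet, not the server's code; it assumes the common "fit the longest edge under the cap, never upscale" rule, with Claude's 1,568px cap from the table above:

```typescript
// Sketch: how a screenshot shrinks when a model caps the longest edge.
// `downscaled` is an illustrative helper, not part of any real API.
function downscaled(w: number, h: number, maxDim: number): [number, number] {
  const scale = Math.min(1, maxDim / Math.max(w, h)); // never upscale
  return [Math.round(w * scale), Math.round(h * scale)];
}

const [w, h] = downscaled(3600, 20220, 1568); // → [279, 1568]
const retained = (w * h) / (3600 * 20220);    // ≈ 0.006, i.e. ~99.4% of pixels lost
```

Tiling sidesteps this entirely: each tile stays under the cap, so no pixel is ever thrown away.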
What presets are included?
It ships presets for Claude (1,092px tiles), OpenAI (768px), Gemini 3 (1,536px), and Gemini (768px), matching the table above. You can also pass a custom tile size for any other model.
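A preset boils down to a tile size and a per-tile token figure, so you can estimate cost before sending anything. This is a rough sketch using the numbers from the presets table; the function is illustrative, not the server's API:

```typescript
// Sketch: estimate tile count and token cost for a capture under a preset.
// Values (1,092px, ~1,590 tokens/tile for Claude) come from the table above.
function tileCost(w: number, h: number, tile: number, tokensPerTile: number) {
  const tiles = Math.ceil(w / tile) * Math.ceil(h / tile);
  return { tiles, tokens: tiles * tokensPerTile };
}

const claude = tileCost(3600, 20220, 1092, 1590);
// 4 columns × 19 rows = 76 tiles before blank-tile skipping
```

In practice the skipped blank and low-detail tiles bring the real token spend well under this ceiling.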
Does it work with URLs, not just local files?
Yes. It captures full-page screenshots from any URL via headless Chrome with scroll-stitching support up to 200,000px tall. Mobile viewport, 2x retina, and mobile user agent are all configurable.
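Scroll-stitching means capturing the page one viewport at a time and joining the segments. A minimal sketch of the offset math, assuming viewport-height segments with the last one clamped to the page bottom (the 200,000px cap mirrors the limit above; this is not the server's actual implementation):

```typescript
// Sketch: y-offsets at which to capture viewport-height segments,
// clamped so the final segment ends exactly at the page bottom.
function scrollOffsets(pageHeight: number, viewportHeight: number, cap = 200_000): number[] {
  const height = Math.min(pageHeight, cap);
  const offsets: number[] = [];
  for (let y = 0; y < height; y += viewportHeight) {
    offsets.push(Math.max(0, Math.min(y, height - viewportHeight)));
  }
  return offsets;
}
// scrollOffsets(2500, 1000) → [0, 1000, 1500]
```

The clamp on the last offset is what avoids a half-empty final segment when the page height isn't a multiple of the viewport.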
Will it waste tokens on blank tiles?
No. Each tile is analyzed locally via Sharp for entropy-based content density: blank, low-detail, mixed, or high-detail. Blank and low-detail tiles are flagged so your LLM can skip them and spend tokens only where it matters.
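The idea behind entropy-based classification can be shown in a few lines. The real server analyzes pixels via Sharp; this sketch computes Shannon entropy over a grayscale histogram and buckets it with purely illustrative thresholds:

```typescript
// Sketch: Shannon entropy of grayscale pixel values (0 bits = uniform tile).
function shannonEntropy(pixels: number[]): number {
  const hist = new Map<number, number>();
  for (const p of pixels) hist.set(p, (hist.get(p) ?? 0) + 1);
  let entropy = 0;
  for (const count of hist.values()) {
    const prob = count / pixels.length;
    entropy -= prob * Math.log2(prob);
  }
  return entropy;
}

// Thresholds here are illustrative, not the server's actual cutoffs.
function classify(pixels: number[]): "blank" | "low-detail" | "mixed" | "high-detail" {
  const e = shannonEntropy(pixels);
  if (e < 0.1) return "blank";
  if (e < 2) return "low-detail";
  if (e < 4) return "mixed";
  return "high-detail";
}
```

A solid-color tile has zero entropy and is trivially skippable; dense UI regions approach the 8-bit maximum.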
What input formats does it accept?
Local file paths, remote URLs, base64 strings, and data URIs. Output is WebP by default or lossless PNG. Iterative runs write to versioned directories (_v1, _v2) so previous results are never overwritten.
Can I use it for visual QA: capture, analyze, fix, re-verify?
That's the primary workflow. Point it at a URL, tile the result, and let your LLM flag layout issues, misaligned elements, or broken styling at full resolution. Fix the code, re-capture, and the server writes to a new versioned directory (_v2, _v3, ...) so you can compare before and after without losing prior runs.
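The versioning rule itself is simple: scan for existing `_vN` suffixes and pick the next one. A sketch of that logic, assuming directory names of the form `base_vN` (the helper name is illustrative):

```typescript
// Sketch: pick the next _vN directory so previous runs are never overwritten.
function nextVersionDir(base: string, existing: string[]): string {
  let max = 0;
  for (const name of existing) {
    const m = name.match(new RegExp(`^${base}_v(\\d+)$`));
    if (m) max = Math.max(max, parseInt(m[1], 10));
  }
  return `${base}_v${max + 1}`;
}
// nextVersionDir("capture", ["capture_v1", "capture_v2"]) → "capture_v3"
```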
Can it test my site's mobile layout?
Yes. Ask to capture in mobile view and the server sets a 390px viewport, 2x retina scale, and a mobile Safari user agent. Sites that detect the UA or touch capability serve their actual mobile HTML, so the LLM reviews exactly what a phone user sees, not just a narrowed desktop layout.
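The emulation settings described above map onto the shape Chrome automation tools like Puppeteer use for viewport configuration. A sketch of what such a config could look like; the 844px height and exact user-agent string are assumptions, only the 390px width, 2x scale, and mobile Safari UA are stated above:

```typescript
// Sketch of mobile-emulation settings in the shape of Puppeteer's
// viewport options. Height and UA string are illustrative assumptions.
const mobileEmulation = {
  viewport: {
    width: 390,            // stated mobile viewport width
    height: 844,           // assumed height, not specified by the server docs
    deviceScaleFactor: 2,  // 2x retina
    isMobile: true,
    hasTouch: true,        // lets touch-detecting sites serve mobile HTML
  },
  userAgent:
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) " +
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1",
};
```

The `hasTouch` and user-agent pieces matter most: UA-sniffing sites decide what HTML to serve based on them, which is why this reviews real mobile markup rather than a narrowed desktop page.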
Can I get just a screenshot without tiling?
Yes. Ask for a screenshot and stop after the comparison step. The server returns the full-page capture and the interactive HTML preview without slicing the image into tiles.