image-tiler-mcp-server

MCP Server - MIT - Node 20+

Stop LLMs from crushing your screenshots. Split large images into optimally-sized tiles so Claude, ChatGPT, and Gemini process full-resolution content, not a blurry downscale.

The server generates an interactive HTML preview for every image, showing per-model tile grids and token estimates.

Model Presets

Updated 03/01/2026

| Model    | Tile Size | Tokens / Tile | Max Dimension |
|----------|-----------|---------------|---------------|
| Claude   | 1,092px   | ~1,590        | 1,568px       |
| OpenAI   | 768px     | ~765          | 2,048px       |
| Gemini 3 | 1,536px   | ~1,120        | 3,072px       |
| Gemini   | 768px     | ~258          | 768px         |

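The preset numbers above imply a simple grid-and-multiply cost model. As an illustration only (the server's actual tiling may add overlap or padding), here is a sketch of how tile counts and token estimates follow from a preset's tile size:

```typescript
// Hypothetical sketch of the per-preset math behind the comparison table.
// Preset values are taken from the table above; the server's real tiling
// logic may differ (overlap, edge handling, etc.).
interface Preset { tile: number; tokensPerTile: number }

const presets: Record<string, Preset> = {
  claude:  { tile: 1092, tokensPerTile: 1590 },
  openai:  { tile: 768,  tokensPerTile: 765 },
  gemini3: { tile: 1536, tokensPerTile: 1120 },
  gemini:  { tile: 768,  tokensPerTile: 258 },
};

function estimate(width: number, height: number, p: Preset) {
  // Round up: a partial row or column still needs a full tile.
  const cols = Math.ceil(width / p.tile);
  const rows = Math.ceil(height / p.tile);
  return { tiles: cols * rows, tokens: cols * rows * p.tokensPerTile };
}

// A 3,600 x 20,220 px capture under the Claude preset:
console.log(estimate(3600, 20220, presets.claude)); // { tiles: 76, tokens: 120840 }
```

This is the trade-off the Phase 1 comparison table surfaces: larger tiles mean fewer tiles but more tokens each.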

How It Works

Phase 1

```
// Send just the image, get a comparison table
tiler(filePath: "screenshot.png")
```

Returns a model comparison table with token estimates and a suggested output directory (outputDir: "tiles/screenshot_v1/").

Phase 2

```
// Pick a preset, get tiles with metadata
tiler(
  filePath: "screenshot.png",
  preset: "claude",
  outputDir: "tiles/screenshot_v1/"
)
```

Features

Full-Resolution Vision

Process every pixel at native resolution instead of letting LLMs auto-downscale your screenshots into mush.

Web Page Capture

Capture full-page screenshots from URLs via headless Chrome with scroll-stitching up to 200,000px.

Token-Efficient

Per-tile content analysis (blank, low-detail, mixed, high-detail) so you skip empty tiles and spend tokens only where it matters.

Mobile Emulation

Test responsive designs with mobile viewport, 2x retina, and mobile user agent.

Multi-Format

Tile local files, remote URLs, base64, and data URIs. Output as WebP or lossless PNG.

Versioned Output

Iterative workflows with versioned directories (_v1, _v2) so you never overwrite previous results.
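The versioned-output behavior can be sketched as a simple "next free suffix" rule. This is a hypothetical illustration, not the server's actual implementation; `nextVersionedDir` and its signature are made up here:

```typescript
// Hypothetical sketch of _v1/_v2 versioned-directory naming: scan existing
// directory names for the base's highest version suffix and pick the next one.
function nextVersionedDir(existing: string[], base: string): string {
  let max = 0;
  for (const name of existing) {
    // Note: assumes `base` contains no regex metacharacters.
    const m = name.match(new RegExp(`^${base}_v(\\d+)$`));
    if (m) max = Math.max(max, parseInt(m[1], 10));
  }
  return `${base}_v${max + 1}`;
}

console.log(nextVersionedDir(["shot_v1", "shot_v2"], "shot")); // "shot_v3"
console.log(nextVersionedDir([], "shot"));                     // "shot_v1"
```

The point of the scheme: re-running a capture never clobbers an earlier run, so before/after comparisons stay possible.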

FAQ

Why does my LLM miss details in screenshots?

Vision models have a maximum image dimension. Anything larger gets silently downscaled before the model sees it. A 3,600 x 20,220px screenshot becomes 279 x 1,568px, losing 99.4% of pixels. Buttons, labels, and fine UI details turn into unreadable noise.
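The numbers above can be checked with a fit-to-max-dimension resize, which is a common (assumed) downscaling rule; providers' exact algorithms may differ:

```typescript
// Sketch verifying the downscale figures quoted above, assuming a simple
// proportional fit to the model's maximum dimension.
function downscale(w: number, h: number, maxDim: number) {
  const scale = Math.min(1, maxDim / Math.max(w, h));
  return { w: Math.round(w * scale), h: Math.round(h * scale) };
}

const orig = { w: 3600, h: 20220 };
const small = downscale(orig.w, orig.h, 1568); // Claude's 1,568px cap
const kept = (small.w * small.h) / (orig.w * orig.h);
console.log(small);                                  // { w: 279, h: 1568 }
console.log(`${((1 - kept) * 100).toFixed(1)}% of pixels lost`); // 99.4%
```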

What presets are included?

It ships presets for Claude (1,092px tiles), OpenAI (768px), Gemini 3 (1,536px), and Gemini (768px). You can also pass a custom tile size for any other model.

Does it work with URLs, not just local files?

Yes. It captures full-page screenshots from any URL via headless Chrome with scroll-stitching support up to 200,000px tall. Mobile viewport, 2x retina, and mobile user agent are all configurable.

Will it waste tokens on blank tiles?

No. Each tile is analyzed locally via Sharp for entropy-based content density: blank, low-detail, mixed, or high-detail. Blank and low-detail tiles are flagged so your LLM can skip them and spend tokens only where it matters.
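Shannon entropy over a grayscale histogram is one standard way to score content density. The sketch below is illustrative only: the server uses Sharp internally, and the thresholds here are made up for the example:

```typescript
// Hypothetical entropy-based tile classifier. Entropy ranges from 0 bits
// (perfectly flat tile) to 8 bits (maximally varied 8-bit grayscale).
function entropy(pixels: Uint8Array): number {
  const hist = new Array(256).fill(0);
  for (const p of pixels) hist[p]++;
  let e = 0;
  for (const count of hist) {
    if (count === 0) continue;
    const prob = count / pixels.length;
    e -= prob * Math.log2(prob);
  }
  return e;
}

// Thresholds are illustrative, not the server's actual cutoffs.
function classify(pixels: Uint8Array): string {
  const e = entropy(pixels);
  if (e < 0.1) return "blank";
  if (e < 2) return "low-detail";
  if (e < 5) return "mixed";
  return "high-detail";
}

const solidWhite = new Uint8Array(1024).fill(255);
console.log(classify(solidWhite)); // "blank"
```

Because a solid-color tile has a single-bin histogram, its entropy is exactly 0, which is what lets blank tiles be flagged and skipped cheaply.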

What input formats does it accept?

Local file paths, remote URLs, base64 strings, and data URIs. Output is WebP by default or lossless PNG. Iterative runs write to versioned directories (_v1, _v2) so previous results are never overwritten.

Can I use it for visual QA: capture, analyze, fix, re-verify?

That's the primary workflow. Point it at a URL, tile the result, and let your LLM flag layout issues, misaligned elements, or broken styling at full resolution. Fix the code, re-capture, and the server writes to a new versioned directory (_v2, _v3, ...) so you can compare before and after without losing prior runs.

Can it test my site's mobile layout?

Yes. Ask to capture in mobile view and the server sets a 390px viewport, 2x retina scale, and a mobile Safari user agent. Sites that detect the UA or touch capability serve their actual mobile HTML, so the LLM reviews exactly what a phone user sees, not just a narrowed desktop layout.

Can I get just a screenshot without tiling?

Yes. Ask for a screenshot and stop after the comparison step. The server returns the full-page capture and the interactive HTML preview without slicing the image into tiles.