Self-Host Firecrawl: Complete Deployment Guide
Firecrawl turns any website into structured JSON, combining a crawl pipeline, text cleanup, and LLM-powered summarization. Running it yourself lets you avoid rate limits, keep crawl data private, and tune concurrency for your backlog of sites.
Architecture Overview
An instance bundles several services:
- Firecrawl API – exposes REST endpoints for crawl jobs and returns normalized documents.
- Worker – queues and executes crawl jobs, calling browser sessions and enrichment models.
- Browser sandbox – a headless Chromium runtime (Playwright or Browserless) used for rendering.
- Redis – task queue backbone (BullMQ) and dedupe cache.
- Postgres – persistence layer for jobs, crawl results, and webhooks.
Keep every component on the same private network; only the API needs public exposure (often behind a reverse proxy).
Prerequisites
- x86_64 machine with 4 vCPUs, 8 GB RAM, Docker 24+, and the Docker Compose plugin (the commands in this guide use docker compose, not the legacy docker-compose binary).
- Domain pointing at your server (for TLS) and ports 80/443 open.
- Node.js 20+ and pnpm 8+ if you prefer building from source instead of containers.
- API keys for the LLM providers you plan to use (OPENAI_API_KEY, ANTHROPIC_API_KEY, or GEMINI_API_KEY).
- Optional: FIRECRAWL_DEFAULT_USER_TOKEN if you want a pre-provisioned API key.
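If you go with a pre-provisioned key, the value should be long and unpredictable. The following is a minimal sketch of one way to generate it, using Python's standard secrets module; the helper name generate_api_token and the 48-byte length are illustrative choices, not something Firecrawl prescribes.

```python
import secrets


def generate_api_token(num_bytes: int = 48) -> str:
    """Return a URL-safe random token.

    48 random bytes encode to a 64-character string with no padding.
    The length is an assumption; any sufficiently long value should work.
    """
    return secrets.token_urlsafe(num_bytes)


# Print a line that can be pasted into .env.
print(f"FIRECRAWL_DEFAULT_USER_TOKEN={generate_api_token()}")
```

Run it once and store the output in a secrets manager or directly in .env; a new value is produced on every run.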
Configure Environment
- Clone the project:
git clone https://github.com/mendableai/firecrawl.git
cd firecrawl
cp apps/api/.env.example .env
- Edit .env and set the essentials:
DATABASE_URL=postgresql://firecrawl:supersecret@postgres:5432/firecrawl
REDIS_URL=redis://redis:6379
FIRECRAWL_URL=https://firecrawl.yourdomain.com
FIRECRAWL_DEFAULT_USER_TOKEN=super-long-random-string
OPENAI_API_KEY=sk-...
SCRAPER_BROWSER=playwright
- For production, also set FIRECRAWL_MAX_CONCURRENCY, ALLOWED_ORIGINS, and webhook secrets as needed. Avoid checking the file into Git.
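Before starting the stack, it can help to confirm that .env is complete and that no sample values were left in place. The sketch below is one possible pre-flight check, not part of Firecrawl itself: the REQUIRED_KEYS list and the PLACEHOLDERS list are assumptions taken from the example values in this guide, and the parser handles only simple KEY=value lines.

```python
# Keys this guide treats as essential (an assumption; adjust to your setup).
REQUIRED_KEYS = ("DATABASE_URL", "REDIS_URL", "FIRECRAWL_URL", "SCRAPER_BROWSER")

# Sample values from the guide that should never reach production.
PLACEHOLDERS = ("supersecret", "super-long-random-string", "sk-...")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_problems(env: dict[str, str]) -> list[str]:
    """Return human-readable problems: missing keys and leftover placeholders."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not env.get(key)]
    for key, value in env.items():
        if any(placeholder in value for placeholder in PLACEHOLDERS):
            problems.append(f"placeholder value in {key}")
    return problems
```

A typical use is find_problems(parse_env(open(".env").read())) in a deploy script, aborting if the returned list is not empty.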
Compose Deployment
Place the following as docker-compose.yml at the repo root:
version: "3.9"
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: firecrawl
      POSTGRES_PASSWORD: supersecret
      POSTGRES_DB: firecrawl
    volumes:
      - postgres-data:/var/lib/postgresql/data
  redis:
    image: redis:7
    command: redis-server --save 60 1 --loglevel warning
    volumes:
      - redis-data:/data
  browserless:
    image: ghcr.io/browserless/chrome:latest
    environment:
      - "MAX_CONCURRENT_SESSIONS=5"
    shm_size: 2gb
  api:
    build:
      context: .
      dockerfile: apps/api/Dockerfile
    env_file: .env
    depends_on:
      - postgres
      - redis
      - browserless
    ports:
      - "3000:3000"
  worker:
    build:
      context: .
      dockerfile: apps/worker/Dockerfile
    env_file: .env
    depends_on:
      - api
      - redis
      - browserless
volumes:
  postgres-data:
  redis-data:
Bring everything up:
docker compose up -d --build
Run migrations once:
docker compose exec api pnpm prisma migrate deploy
Verify and Use the API
- Confirm health:
curl https://firecrawl.yourdomain.com/health
- Create an API token (if you did not set FIRECRAWL_DEFAULT_USER_TOKEN):
docker compose exec api pnpm ts-node scripts/create-user-token.ts --email you@example.com
- Execute a crawl:
curl -X POST https://firecrawl.yourdomain.com/v1/crawl \
  -H "Authorization: Bearer $FIRECRAWL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.firecrawl.dev", "depth": 1, "includeScreenshots": false}'
Monitor progress in Redis (docker compose exec redis redis-cli monitor) or via the /v1/crawl/{jobId} endpoint.
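The same request and status polling can be scripted. The sketch below mirrors the curl call using only the Python standard library. The response shape is an assumption: it supposes the status endpoint returns JSON with a "status" field that eventually reads "completed" or "failed", so check the actual payload of your instance before relying on it. The function names are illustrative.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal status values; verify against your instance's responses.
TERMINAL_STATES = {"completed", "failed"}


def build_crawl_request(base_url: str, token: str, target_url: str,
                        depth: int = 1) -> urllib.request.Request:
    """Build the POST /v1/crawl request with the same body as the curl example."""
    payload = {"url": target_url, "depth": depth, "includeScreenshots": False}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/crawl",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_json(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_job(get_status: Callable[[], dict], interval: float = 5.0,
                 max_polls: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call get_status until the job reaches a terminal state or polls run out."""
    for _ in range(max_polls):
        job = get_status()
        if str(job.get("status", "")).lower() in TERMINAL_STATES:
            return job
        sleep(interval)
    raise TimeoutError(f"job did not finish after {max_polls} polls")
```

In practice, get_status would wrap fetch_json around a GET request to /v1/crawl/{jobId} carrying the same Authorization header. Passing the status function in as a callable keeps the polling loop easy to test without a live server.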
Reverse Proxy & TLS
Point an edge proxy like Caddy or Nginx to the API container. Example Caddyfile snippet:
firecrawl.yourdomain.com {
    encode gzip
    reverse_proxy localhost:3000
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains"
    }
}
This keeps TLS management at the proxy while the internal network stays private.
Operations Checklist
- Scaling: Increase worker replicas or MAX_CONCURRENT_SESSIONS if throughput is low.
- Storage: Schedule Postgres dumps (pg_dump) and prune Redis keys for completed jobs.
- Observability: Forward container logs to Loki or Vector and enable metrics with PROMETHEUS_METRICS=true.
- Upgrades: Pull upstream changes and rebuild with git pull && docker compose up -d --build. The api and worker images are built from source, so docker compose pull alone does not update them.
- Security: Rotate API tokens quarterly and lock ingress with Cloudflare tunnels or IP allowlists.
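The Storage item can be automated with a small script run from cron or a systemd timer. This is a minimal sketch under the assumptions of the Compose file in this guide: the service name postgres, and the user and database both named firecrawl. The function names and the backup file naming scheme are illustrative.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def backup_filename(now: Optional[datetime] = None) -> str:
    """Return a timestamped dump name such as firecrawl-20240102T030405Z.dump."""
    now = now or datetime.now(timezone.utc)
    return f"firecrawl-{now:%Y%m%dT%H%M%SZ}.dump"


def build_pg_dump_command(user: str = "firecrawl", database: str = "firecrawl",
                          service: str = "postgres") -> list[str]:
    """Build a pg_dump call that runs inside the Compose service.

    -T disables the pseudo-TTY so the binary dump is not corrupted on stdout.
    """
    return [
        "docker", "compose", "exec", "-T", service,
        "pg_dump", "--username", user, "--format=custom", database,
    ]


def run_backup(directory: str = ".") -> Path:
    """Write a custom-format dump into directory and return its path."""
    path = Path(directory) / backup_filename()
    with path.open("wb") as out:
        subprocess.run(build_pg_dump_command(), stdout=out, check=True)
    return path
```

Call run_backup("/var/backups/firecrawl") from the repo root so docker compose finds the project. Custom-format dumps are restored with pg_restore rather than psql.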
Troubleshooting
| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| ECONNREFUSED from crawler | Browserless container missing | Check docker compose ps and increase shm_size to 2g |
| Jobs stuck in PENDING | Redis unreachable or queue blocked | Ensure redis-cli ping returns PONG and redeploy worker |
| HTML returning empty JSON | Page needs JS rendering | Confirm SCRAPER_BROWSER=playwright and that browserless has enough memory |
| 401 from API | Invalid token header | Prefix with Bearer and verify token still active |
| High CPU usage | Concurrency too high | Lower FIRECRAWL_MAX_CONCURRENCY or scale horizontally |
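Several rows above come down to one service being unable to reach another. A quick TCP probe can narrow that down before digging into logs. The sketch below assumes the Compose service names from this guide and the default ports of each image (6379 for Redis, 5432 for Postgres, 3000 for Browserless); it only confirms that a port accepts connections, not that the service behind it is healthy.

```python
import socket

# Service names and default ports, assumed from the Compose file in this guide.
SERVICES = {
    "redis": ("redis", 6379),
    "postgres": ("postgres", 5432),
    "browserless": ("browserless", 3000),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def probe_all(timeout: float = 2.0) -> dict[str, bool]:
    """Probe every known service and map its name to reachability."""
    return {
        name: is_reachable(host, port, timeout)
        for name, (host, port) in SERVICES.items()
    }
```

The service hostnames only resolve on the Compose network, so run probe_all() from inside a container attached to it; from the host, every entry would report False.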
With this setup, Firecrawl runs entirely in your environment, letting you orchestrate crawls, feed downstream RAG pipelines, and comply with internal governance policies.