Self-Host Firecrawl: Complete Deployment Guide

Firecrawl turns any website into structured JSON, combining a crawl pipeline, text cleanup, and LLM-powered summarization. Running it yourself lets you avoid rate limits, keep crawl data private, and tune concurrency for your backlog of sites.

Architecture Overview

An instance bundles several services:

  • Firecrawl API – exposes REST endpoints for crawl jobs and returns normalized documents.
  • Worker – queues and executes crawl jobs, calling browser sessions and enrichment models.
  • Browser sandbox – a headless Chromium runtime (Playwright or Browserless) used for rendering.
  • Redis – task queue backbone (BullMQ) and dedupe cache.
  • Postgres – persistence layer for jobs, crawl results, and webhooks.

Keep every component on the same private network; only the API needs public exposure (often behind a reverse proxy).

Prerequisites

  • x86_64 machine with 4 vCPUs, 8 GB RAM, Docker 24+, and the Docker Compose plugin.
  • Domain pointing at your server (for TLS) and ports 80/443 open.
  • Node.js 20+ and pnpm 8+ if you prefer building from source instead of containers.
  • API keys for the LLM providers you plan to use (OPENAI_API_KEY, ANTHROPIC_API_KEY, or GEMINI_API_KEY).
  • Optional: FIRECRAWL_DEFAULT_USER_TOKEN if you want a pre-provisioned API key.
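
Before going further, it is worth confirming the toolchain matches these versions; a quick check from a shell:

docker --version          # expect 24.x or newer
docker compose version    # Compose v2 plugin, not the legacy docker-compose binary
node --version && pnpm --version   # only required for source builds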

Configure Environment

  1. Clone the project:
    git clone https://github.com/mendableai/firecrawl.git
    cd firecrawl
    cp apps/api/.env.example .env
    
  2. Edit .env and set the essentials:
    DATABASE_URL=postgresql://firecrawl:supersecret@postgres:5432/firecrawl
    REDIS_URL=redis://redis:6379
    FIRECRAWL_URL=https://firecrawl.yourdomain.com
    FIRECRAWL_DEFAULT_USER_TOKEN=super-long-random-string
    OPENAI_API_KEY=sk-...
    SCRAPER_BROWSER=playwright
    
  3. For production, also set FIRECRAWL_MAX_CONCURRENCY, ALLOWED_ORIGINS, and webhook secrets as needed. Avoid checking the file into Git.
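
For FIRECRAWL_DEFAULT_USER_TOKEN and webhook secrets, use values with real entropy rather than hand-typed strings; one simple way to generate them:

# Generate a 64-character hex token for FIRECRAWL_DEFAULT_USER_TOKEN or a webhook secret.
openssl rand -hex 32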

Compose Deployment

Place the following as docker-compose.yml at the repo root:

version: "3.9"
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: firecrawl
      POSTGRES_PASSWORD: supersecret
      POSTGRES_DB: firecrawl
    volumes:
      - postgres-data:/var/lib/postgresql/data
  redis:
    image: redis:7
    command: redis-server --save 60 1 --loglevel warning
    volumes:
      - redis-data:/data
  browserless:
    image: ghcr.io/browserless/chrome:latest
    environment:
      - "MAX_CONCURRENT_SESSIONS=5"
    shm_size: 2gb
  api:
    build:
      context: .
      dockerfile: apps/api/Dockerfile
    env_file: .env
    depends_on:
      - postgres
      - redis
      - browserless
    ports:
      - "3000:3000"
  worker:
    build:
      context: .
      dockerfile: apps/worker/Dockerfile
    env_file: .env
    depends_on:
      - api
      - redis
      - browserless
volumes:
  postgres-data:
  redis-data:
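
Note that plain depends_on only orders container startup; it does not wait for Postgres or Redis to actually accept connections. If the api container races the database on first boot, gating on a healthcheck is one fix; a sketch to merge into the services above:

  postgres:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U firecrawl"]
      interval: 5s
      retries: 10
  api:
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
      browserless:
        condition: service_started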

Bring everything up:

docker compose up -d --build

Run migrations once:

docker compose exec api pnpm prisma migrate deploy
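
If the migration fails, confirm all containers are actually up and inspect the logs before retrying:

docker compose ps
docker compose logs -f api worker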

Verify and Use the API

  1. Confirm health:
    curl https://firecrawl.yourdomain.com/health
    
  2. Create an API token (if you did not set FIRECRAWL_DEFAULT_USER_TOKEN):
    docker compose exec api pnpm ts-node scripts/create-user-token.ts --email you@example.com
    
  3. Execute a crawl:
    curl -X POST https://firecrawl.yourdomain.com/v1/crawl \
      -H "Authorization: Bearer $FIRECRAWL_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"url": "https://docs.firecrawl.dev", "depth": 1, "includeScreenshots": false}'
    

Monitor progress in Redis (docker compose exec redis redis-cli monitor) or via the /v1/crawl/{jobId} endpoint.
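
For scripted polling, a small loop is enough. This sketch assumes jq is installed, that JOB_ID holds the id returned by the POST above, and that the job document exposes a status field whose terminal value is completed:

while true; do
  STATUS=$(curl -s -H "Authorization: Bearer $FIRECRAWL_TOKEN" \
    "https://firecrawl.yourdomain.com/v1/crawl/$JOB_ID" | jq -r '.status')
  echo "status: $STATUS"
  [ "$STATUS" = "completed" ] && break
  sleep 5
done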

Reverse Proxy & TLS

Point an edge proxy like Caddy or Nginx to the API container. Example Caddyfile snippet:

firecrawl.yourdomain.com {
  encode gzip
  reverse_proxy localhost:3000
  header { 
    Strict-Transport-Security "max-age=31536000; includeSubDomains"
  }
}
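
If you prefer Nginx, an equivalent server block looks roughly like this (a sketch; the certificate paths are placeholders for wherever your issuer, e.g. certbot, writes them):

server {
    listen 443 ssl;
    server_name firecrawl.yourdomain.com;
    ssl_certificate     /etc/letsencrypt/live/firecrawl.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/firecrawl.yourdomain.com/privkey.pem;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}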

This keeps TLS management at the proxy while the internal network stays private.

Operations Checklist

  • Scaling: Increase worker replicas or MAX_CONCURRENT_SESSIONS if throughput is low.
  • Storage: Schedule Postgres dumps (pg_dump) and prune Redis keys for completed jobs; a backup sketch follows this list.
  • Observability: Forward container logs to Loki or Vector and enable metrics with PROMETHEUS_METRICS=true.
  • Upgrades: Pull upstream changes with git pull, refresh prebuilt images with docker compose pull, then rebuild and restart with docker compose up -d --build.
  • Security: Rotate API tokens quarterly and lock ingress with Cloudflare tunnels or IP allowlists.
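
The storage bullet can be automated with a cron-driven dump script; a minimal sketch, assuming the compose file lives in /opt/firecrawl and /var/backups exists (crontab entry: 0 3 * * * /opt/firecrawl/backup.sh):

#!/usr/bin/env bash
# Nightly logical dump of the Firecrawl database, compressed and pruned.
set -euo pipefail
cd /opt/firecrawl
STAMP=$(date +%F)
docker compose exec -T postgres pg_dump -U firecrawl firecrawl \
  | gzip > "/var/backups/firecrawl-$STAMP.sql.gz"
# Keep two weeks of dumps.
find /var/backups -name 'firecrawl-*.sql.gz' -mtime +14 -delete

For the scaling bullet, docker compose up -d --scale worker=3 is the compose-native way to add worker replicas.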

Troubleshooting

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| ECONNREFUSED from crawler | Browserless container missing or crashed | Check docker compose ps and ensure shm_size is at least 2g |
| Jobs stuck in PENDING | Redis unreachable or queue blocked | Ensure redis-cli ping returns PONG and redeploy the worker |
| HTML returning empty JSON | Page needs JS rendering | Confirm SCRAPER_BROWSER=playwright and that Browserless has enough memory |
| 401 from API | Invalid token header | Prefix the token with Bearer and verify it is still active |
| High CPU usage | Concurrency too high | Lower FIRECRAWL_MAX_CONCURRENCY or scale horizontally |

With this setup, Firecrawl runs entirely in your environment, letting you orchestrate crawls, feed downstream RAG pipelines, and comply with internal governance policies.