API Access

Manage keys and run conversions over HTTP

Generate a bearer token for your account, submit a documentation URL, poll crawl status, unlock paid crawls, and download the consolidated Markdown artifact once it is ready.

Intro

Use the API when you want repeatable, tool-friendly crawl workflows

Automate documentation intake

Create conversions from scripts, CI jobs, or internal tools instead of driving each crawl from the browser.

Keep credit usage predictable

Inspect discovery results before you start a paid crawl, then decide whether to unlock it with your available balance.

Pull finished Markdown directly

Once a crawl is complete, stream the consolidated Markdown artifact straight into downstream LLM or knowledge-base workflows.

The API mirrors the same discovery, unlock, crawl, and download lifecycle used by the web app, so API clients and browser users stay aligned on pricing, access control, and final artifacts.

🔐 API keys must stay server-side

Never call the API from browser JavaScript or mobile apps where the key can be extracted. Always use a backend service to proxy requests and keep keys secret.

Core Concepts

Use the same vocabulary as the API responses

These terms appear throughout the response payloads and lifecycle descriptions below.

| Term | Meaning |
| --- | --- |
| Seed URL | The documentation root URL you submit to create a conversion. |
| Conversion | The tracked workflow that discovers pages, optionally unlocks a paid crawl, and produces a Markdown artifact. |
| Tracking ID | The identifier you use to poll status, start a crawl, and download the result for your account. |
| Seed ID | The internal identifier for the underlying canonical seed. It is returned for correlation, but clients should key workflow logic off tracking_id. |
| Discovery | The initial phase that identifies pages, counts billable pages, and determines whether a paid unlock is needed. |
| Unlock | The moment credits are deducted to release on-hold pages for a paid crawl. |
| Artifact | The final consolidated Markdown file returned by the download endpoint. |

Quick Start

See the flow in three lines

This section is intentionally high-level. It shows the three requests you make in sequence so you can immediately understand the lifecycle. In production, poll the status endpoint between these steps and skip the paid start when the conversion becomes downloadable on its own.

  1. Call POST /api/v1/conversions to begin discovery and get a tracking_id.
  2. Poll GET /api/v1/conversions/{tracking_id} until the status becomes ready_to_crawl, download_available, completed, or error.
  3. Call /start only for paid ready_to_crawl conversions, then download when the status is download_available or completed.

Submit a seed URL

Create a conversion from the documentation root URL you want to turn into Markdown. Creating the conversion starts discovery and does not deduct credits.

curl -X POST https://sitetomarkdown.com/api/v1/conversions \
  -H "Authorization: Bearer your_api_key_here" \
  -H "Content-Type: application/json" \
  --data '{"url":"https://docs.example.com/"}'

Start the crawl

Once discovery reaches ready_to_crawl, start the paid crawl if the conversion is not already downloadable.

# Poll GET /conversions/tracking_id_here until status is ready_to_crawl before this step.
curl -X POST \
  -H "Authorization: Bearer your_api_key_here" \
  https://sitetomarkdown.com/api/v1/conversions/tracking_id_here/start

Download the Markdown

When the crawl finishes, download the final consolidated Markdown artifact.

curl -H "Authorization: Bearer your_api_key_here" \
  https://sitetomarkdown.com/api/v1/conversions/tracking_id_here/download \
  -o docs.md

The detailed handling of tracking IDs, status checks, and production-ready flow control is covered later in the comprehensive example and endpoint reference.

Authentication

Every endpoint uses bearer authentication

Sign in to your SiteToMarkdown account, open this API Access page, and create or rotate a key. Send that key in the Authorization header on every API request.

Authorization: Bearer stmd_live_...

Requests without that header return 401 with missing_bearer_token. Requests with the wrong key return 401 with invalid_api_key.

🔐 Do not expose API keys in browser or mobile code

Always store API keys server-side. If your integration uses a JavaScript frontend, proxy requests through a backend service to hide the key from the client.

Base URL

https://sitetomarkdown.com/api/v1
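
As a rough illustration of the authentication rules above, the following Python sketch builds (but does not send) a request carrying the bearer token. The helper name build_request is hypothetical and not part of any SiteToMarkdown SDK; only the base URL, the Authorization header, and the JSON content type come from this page.

```python
import json
import urllib.request

BASE_URL = "https://sitetomarkdown.com/api/v1"


def build_request(method, path, api_key, body=None):
    """Return an unsent urllib Request that carries the bearer token.

    build_request is an illustrative helper, not an official client.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # JSON bodies need an explicit content type, as in the curl examples.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )
```

Passing the result to urllib.request.urlopen from a backend process would send it; for example, build_request("GET", "/balance", api_key) corresponds to the balance lookup shown in the reference.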

Lifecycle

Choose the next request based on the returned status

A conversion starts with discovery. Discovery is the free phase that identifies pages, calculates billable work, and decides whether the crawl can be downloaded immediately or needs a paid unlock. The seed page can finish early, so pages.converted can be greater than zero before a paid crawl starts.

Treat download_available and completed as terminal downloadable states. Treat error as a terminal failure state and stop polling instead of looping indefinitely.

| Status | Meaning | Client action |
| --- | --- | --- |
| in_progress | Discovery is still running. SiteToMarkdown is finding pages and calculating cost. | Poll GET /api/v1/conversions/{tracking_id} again after a short delay. |
| ready_to_crawl | Discovery is complete and the crawl cost is known, but paid pages are still on hold. | Inspect pages and credits, then call POST /api/v1/conversions/{tracking_id}/start if you want to unlock the crawl. |
| crawling | Credits were deducted and released pages are actively being fetched and converted. | Continue polling until the artifact is ready. |
| download_available | The Markdown artifact is ready and the authenticated account can download it now. | Call GET /api/v1/conversions/{tracking_id}/download immediately. |
| completed | The crawl has already been unlocked and completed. Treat it as a downloadable finished conversion. | Call GET /api/v1/conversions/{tracking_id}/download. You do not need to call /start again. |
| error | The seed or single-page conversion hit a processing error before an artifact became available. | Stop polling and retry later with a new conversion if the source issue is resolved. |
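
The status table can be read as a small decision function. The sketch below is one possible encoding in Python; the function name next_action and the action labels it returns are illustrative choices, not values the API returns.

```python
def next_action(status, started=False):
    """Map a conversion status to the next client step.

    `started` records whether this client already called /start.
    The returned labels are hypothetical names for this sketch.
    """
    if status in ("download_available", "completed"):
        return "download"  # terminal downloadable states
    if status == "error":
        return "stop"  # terminal failure: do not keep polling
    if status == "ready_to_crawl" and not started:
        # The caller should inspect pages and credits before unlocking.
        return "start"
    if status in ("in_progress", "crawling", "ready_to_crawl"):
        return "poll"
    raise ValueError(f"unknown status: {status!r}")
```

Raising on an unknown status is a deliberate choice here: it surfaces any status this page does not document instead of silently polling forever.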

Billing and Credits

Credits are spent only when a paid crawl is unlocked

  • Creating a conversion starts discovery and does not deduct credits.
  • Discovery determines the discovered page count, billable page count, and crawl cost before a paid unlock happens.
  • Credits are deducted only when POST /api/v1/conversions/{tracking_id}/start succeeds for a paid crawl.
  • Free seeds, previously unlocked conversions, and single-page conversions can skip the paid start step and move directly toward download availability.
  • If crawl access was already unlocked for your account, repeating POST /api/v1/conversions/{tracking_id}/start returns success with deducted_cents set to 0.
  • All *_cents fields are integer USD cents. Formatted fields such as crawl_cost and credit_balance are display-only strings.

| Field | Meaning |
| --- | --- |
| credit_balance_cents | Integer cent balance for programmatic decisions. Prefer this over formatted display strings in automation. |
| pages.discovered | Total pages currently associated with the seed during or after discovery. |
| pages.billable | Pages that would count toward paid crawl cost for the current account. |
| pages.converted | Pages that have already been converted. The seed page may finish early, so this can be greater than zero before a paid crawl starts. |
| pages.on_hold | Discovered child pages that are waiting for a paid unlock. |
| cost.crawl_cost_cents | Total cost in cents to unlock the current paid crawl. |
| credits.covers_cost | Whether the authenticated account balance is enough to unlock the crawl immediately. |
| page_equivalent_credits | Whole billable pages your current balance can cover, rounded down. |
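
The server already returns these values, but a client may want to reproduce the arithmetic locally, for instance to budget several crawls against one balance. The sketch below assumes the relationships implied by the example payloads on this page (shortfall is cost minus balance, floored at zero; page equivalents use integer division by the per-page rate). The function name coverage is hypothetical.

```python
def coverage(balance_cents, crawl_cost_cents, cost_per_page_cents):
    """Recompute coverage fields from integer cent values.

    Formulas are inferred from the documented examples, not from a
    published billing specification.
    """
    if cost_per_page_cents <= 0:
        raise ValueError("cost_per_page_cents must be positive")
    shortfall = max(0, crawl_cost_cents - balance_cents)
    return {
        "covers_cost": shortfall == 0,
        "shortfall_cents": shortfall,
        # Whole pages only, rounded down.
        "page_equivalent_credits": balance_cents // cost_per_page_cents,
    }
```

With a balance of 35 cents, a 72-cent crawl, and a 4-cent page rate, this yields a 37-cent shortfall, which matches the ready_to_crawl example in the reference.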

Conversion Reuse

POST /conversions can reuse an existing canonical seed

SiteToMarkdown first looks for an exact seed match for the submitted URL. If none exists, it can reuse an already known parent or root seed for the same documentation tree instead of creating a brand-new crawl root.

The tracking ID remains scoped to the authenticated account. Repeating POST /api/v1/conversions for the same account and the same canonical seed returns the same tracking ID. A different account can still receive its own tracking ID for that same underlying seed.

A reused conversion may come back as in_progress, ready_to_crawl, download_available, completed, or error depending on the current crawl state and whether your account already has access.

Polling and Rate Limits

Poll conservatively and expect a short WAF block if you burst too hard

The API is currently polling-only. Webhooks are not available yet, so clients should poll GET /api/v1/conversions/{tracking_id} every 5 to 10 seconds while the status is in_progress or crawling.

API traffic is rate limited by the WAF to 10 requests every 10 seconds from the same IP address. Exceeding that limit returns HTTP 429 Too Many Requests and blocks the IP address for 10 seconds before requests are accepted again.

That limit is shared by anything leaving through the same egress IP, so CI workers, serverless jobs, or multiple tenants behind one NAT can affect each other. If a Retry-After header is present on the response, treat it as the authoritative backoff hint.

Design clients to back off immediately when 429 responses start appearing during a burst. The comprehensive example below uses a 5-second polling interval, which stays comfortably below the documented limit for a single active conversion.
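
One way to express this pacing rule is a function that picks the wait before the next request. The Python sketch below is an assumption-laden illustration: the name next_delay and its parameters are hypothetical, the defaults come from the 5-second polling interval and 10-second block described above, and only the integer-seconds form of Retry-After is parsed.

```python
def next_delay(status_code, retry_after=None, poll_interval=5.0, block_seconds=10.0):
    """Return how many seconds to wait before the next request.

    `retry_after` is the raw Retry-After header value, if any.
    """
    if status_code != 429:
        return poll_interval
    if retry_after is not None:
        try:
            # Retry-After may be a delay in whole seconds.
            return float(max(int(retry_after), 0))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch
    # No usable hint: wait out the documented 10-second block.
    return block_seconds
```

A client that shares an egress IP with other jobs might raise block_seconds or add jitter, since the limit counts every request from that address.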

Operational Limits

Plan around retention as well as request pacing

Completed Markdown artifacts are retained for 90 days. After expiration, the download endpoint returns 410 artifact_expired. Do not keep polling; submit the seed URL again to create a fresh conversion.

The download endpoint streams the artifact as text/markdown, UTF-8 encoded, with a Content-Disposition attachment header while retained and accessible.

Submitted URLs must be absolute http or https documentation roots that are publicly reachable by SiteToMarkdown.

Crawls stay within the same documentation tree: pages under the submitted path, other paths on the same host, or the host root. Different hosts and subdomains are not crawled.

Private or authenticated documentation is not supported unless it is also publicly reachable to the crawler.

Webhooks are not available. Use polling against GET /api/v1/conversions/{tracking_id} instead.

Crawl scope examples

| Submitted seed | Linked URL | Crawled? | Reason |
| --- | --- | --- | --- |
| https://docs.example.com/ | https://docs.example.com/guides/ | Yes | Same host, child path |
| https://docs.example.com/product-a/ | https://docs.example.com/product-b/ | Maybe | Included only if linked and within the same tree |
| https://docs.example.com/ | https://api.example.com/ | No | Different subdomain |
| https://docs.example.com/ | https://github.com/example/repo | No | External domain |
| https://docs.example.com/page | https://docs.example.com/page#section1 and #section2 | One page | URL fragments do not create separate pages |
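
To estimate locally which links could fall in scope, a client might apply a host and fragment check like the Python sketch below. This is only an approximation of the rules in the table: the crawler's actual tree logic is internal, and both helper names (page_key, same_host) are hypothetical.

```python
from urllib.parse import urldefrag, urlsplit


def page_key(url):
    """Drop the fragment so #section links collapse to one page."""
    return urldefrag(url).url


def same_host(seed_url, linked_url):
    """True when the link is http(s) and on exactly the seed's host.

    A True result means "eligible", not "guaranteed to be crawled".
    """
    seed, link = urlsplit(seed_url), urlsplit(linked_url)
    return link.scheme in ("http", "https") and link.hostname == seed.hostname
```

For the "Maybe" row, same_host returns True, which only says the link is not excluded by host; whether it is included still depends on the linked-tree rule described above.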

API reference

Reference

Each endpoint below includes its purpose, required parameters, and working request examples in curl, Python, and TypeScript.

GET /api/v1/balance

Get credit balance

Returns the authenticated account, current wallet balance, and the approximate number of whole billable pages that balance can cover at the current per-page rate.

curl -H "Authorization: Bearer your_api_key_here" \
  https://sitetomarkdown.com/api/v1/balance

Example responses

200 OK

Successful balance lookup.

{
    "user": {
        "id": 42,
        "email": "[email protected]"
    },
    "credit_balance_cents": 1350,
    "credit_balance": "$13.50",
    "page_equivalent_credits": 337,
    "cost_per_page_cents": 4
}

401 missing_bearer_token

Returned when the Authorization: Bearer <token> header is missing.

{
    "error": {
        "code": "missing_bearer_token",
        "message": "A bearer token is required."
    }
}

POST /api/v1/conversions

Create or reuse a conversion

Accepts a seed documentation URL, starts discovery without charging credits, reuses an existing canonical seed when possible, and returns a tracking ID scoped to the authenticated user.

| Parameter | Location | Description |
| --- | --- | --- |
| url | JSON body | Absolute http or https documentation root URL that is publicly reachable by SiteToMarkdown. |

curl -X POST https://sitetomarkdown.com/api/v1/conversions \
  -H "Authorization: Bearer your_api_key_here" \
  -H "Content-Type: application/json" \
  --data '{"url":"https://docs.example.com/"}'

Example responses

202 Accepted

A seed URL was accepted and a tracking record was created or reused. Use tracking_id for all follow-up requests. seed_id is informational.

{
    "tracking_id": "trk_a1b2c3d4e5f6",
    "seed_id": 1182,
    "seed_url": "https://docs.example.com/",
    "status": "in_progress",
    "message": "Seed accepted. Poll the status endpoint for page count and cost."
}

422 invalid_url

The URL is missing or not a valid absolute URL.

{
    "error": {
        "code": "invalid_url",
        "message": "Provide a valid absolute URL."
    }
}

GET /api/v1/conversions/{tracking_id}

Inspect conversion status

Returns discovery progress, billable page counts, crawl cost, credit coverage, and whether the Markdown artifact is already downloadable.

| Parameter | Location | Description |
| --- | --- | --- |
| tracking_id | Path | Tracking identifier returned when the conversion was created. |

curl -H "Authorization: Bearer your_api_key_here" \
  https://sitetomarkdown.com/api/v1/conversions/tracking_id_here

Example responses

200 ready_to_crawl

Discovery is complete and the crawl cost is known, but credits have not been spent yet.

{
    "tracking_id": "trk_a1b2c3d4e5f6",
    "seed_url": "https://docs.example.com/",
    "status": "ready_to_crawl",
    "pages": {
        "discovered": 18,
        "billable": 18,
        "converted": 1,
        "on_hold": 17
    },
    "cost": {
        "crawl_cost_cents": 72,
        "crawl_cost": "$0.72",
        "cost_per_page_cents": 4
    },
    "credits": {
        "balance_cents": 35,
        "covers_cost": false,
        "shortfall_cents": 37
    },
    "download": {
        "available": false,
        "url": null
    }
}

200 download_available

The crawl is finished and the authenticated account can download the Markdown artifact.

{
    "tracking_id": "trk_done12345678",
    "seed_url": "https://docs.example.com/",
    "status": "download_available",
    "pages": {
        "discovered": 18,
        "billable": 18,
        "converted": 18,
        "on_hold": 0
    },
    "cost": {
        "crawl_cost_cents": 72,
        "crawl_cost": "$0.72",
        "cost_per_page_cents": 4
    },
    "credits": {
        "balance_cents": 700,
        "covers_cost": true,
        "shortfall_cents": 0
    },
    "download": {
        "available": true,
        "url": "https://sitetomarkdown.com/api/v1/conversions/trk_done12345678/download"
    }
}

200 completed

The conversion is complete and still uses the same response shape as download_available.

{
    "tracking_id": "trk_done12345678",
    "seed_url": "https://docs.example.com/",
    "status": "completed",
    "pages": {
        "discovered": 18,
        "billable": 18,
        "converted": 18,
        "on_hold": 0
    },
    "cost": {
        "crawl_cost_cents": 72,
        "crawl_cost": "$0.72",
        "cost_per_page_cents": 4
    },
    "credits": {
        "balance_cents": 700,
        "covers_cost": true,
        "shortfall_cents": 0
    },
    "download": {
        "available": true,
        "url": "https://sitetomarkdown.com/api/v1/conversions/trk_done12345678/download"
    }
}

200 error

The conversion reached a terminal failure state before a downloadable artifact was produced.

{
    "tracking_id": "trk_failed123456",
    "seed_url": "https://docs.example.com/",
    "status": "error",
    "pages": {
        "discovered": 0,
        "billable": 0,
        "converted": 0,
        "on_hold": 0
    },
    "cost": {
        "crawl_cost_cents": 0,
        "crawl_cost": "$0.00",
        "cost_per_page_cents": 4
    },
    "credits": {
        "balance_cents": 1350,
        "covers_cost": true,
        "shortfall_cents": 0
    },
    "download": {
        "available": false,
        "url": null
    }
}

404 tracking_not_found

The tracking ID does not belong to the authenticated account or no longer exists.

{
    "error": {
        "code": "tracking_not_found",
        "message": "No conversion was found for this tracking ID."
    }
}

POST /api/v1/conversions/{tracking_id}/start

Unlock and start a paid crawl

Deducts credits when required, releases discovered child pages from on_hold, and transitions the crawl into active processing. Repeating the request after access is already unlocked returns success with no extra deduction.

| Parameter | Location | Description |
| --- | --- | --- |
| tracking_id | Path | Tracking identifier for a conversion that is usually ready_to_crawl. If the conversion is already unlocked or does not need a paid crawl, the endpoint can instead return start_not_required. |

curl -X POST \
  -H "Authorization: Bearer your_api_key_here" \
  https://sitetomarkdown.com/api/v1/conversions/tracking_id_here/start

Example responses

200 crawling

Credits were deducted and on-hold pages were released for crawling.

{
    "tracking_id": "trk_a1b2c3d4e5f6",
    "status": "crawling",
    "message": "Crawl started successfully.",
    "credits": {
        "deducted_cents": 72,
        "remaining_balance_cents": 628
    }
}

200 already_unlocked

Repeated start requests are safe after access is already unlocked for the authenticated account.

{
    "tracking_id": "trk_done12345678",
    "status": "download_available",
    "message": "Crawl access was already unlocked and the download is ready.",
    "credits": {
        "deducted_cents": 0,
        "remaining_balance_cents": 700
    }
}

409 start_not_required

Free, single-page, or already-downloadable conversions do not require a paid start. Treat this as a recoverable state, then poll or download based on the returned status.

{
    "error": {
        "code": "start_not_required",
        "message": "This conversion does not require a paid start. Poll status or use the download endpoint when available.",
        "details": {
            "tracking_id": "trk_free12345678",
            "status": "download_available"
        }
    }
}

409 conversion_not_ready

The conversion is still in discovery, so there is nothing to unlock yet.

{
    "error": {
        "code": "conversion_not_ready",
        "message": "This conversion is not ready to start yet.",
        "details": {
            "tracking_id": "trk_a1b2c3d4e5f6",
            "status": "in_progress"
        }
    }
}

422 insufficient_credits

The authenticated account does not have enough balance to unlock the crawl.

{
    "error": {
        "code": "insufficient_credits",
        "message": "Your wallet balance does not cover this crawl cost.",
        "details": {
            "tracking_id": "trk_a1b2c3d4e5f6",
            "status": "ready_to_crawl",
            "credits": {
                "balance_cents": 35,
                "required_cents": 72,
                "shortfall_cents": 37
            }
        }
    }
}

GET /api/v1/conversions/{tracking_id}/download

Download the consolidated Markdown

Streams the finished Markdown file as text/markdown, UTF-8 encoded, with a Content-Disposition: attachment header when the artifact exists and the authenticated account has access. Error responses use application/json.

| Parameter | Location | Description |
| --- | --- | --- |
| tracking_id | Path | Tracking identifier for a conversion whose status is download_available or completed. |

curl -H "Authorization: Bearer your_api_key_here" \
  https://sitetomarkdown.com/api/v1/conversions/tracking_id_here/download \
  -o docs.md
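
Because the success body is Markdown while errors are JSON, a client should check the content type before saving the response. The Python sketch below shows one way to do that once the status code, Content-Type header, and raw body are in hand. DownloadError and read_download are hypothetical names introduced for this illustration.

```python
import json


class DownloadError(Exception):
    """Raised when the download endpoint returns an error response."""

    def __init__(self, http_status, code, message):
        super().__init__(f"{http_status} {code}: {message}")
        self.http_status = http_status
        self.code = code


def read_download(http_status, content_type, body):
    """Return Markdown text, or raise DownloadError for anything else."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if http_status == 200 and media_type == "text/markdown":
        return body.decode("utf-8")  # the artifact is UTF-8 encoded
    if media_type == "application/json":
        error = json.loads(body).get("error", {})
        raise DownloadError(
            http_status, error.get("code", "unknown"), error.get("message", "")
        )
    # Anything else (for example an upstream WAF page) is unexpected.
    raise DownloadError(http_status, "unexpected_response", media_type)
```

Callers can then branch on the exception's code attribute, for example resubmitting the seed URL on artifact_expired and resuming polling on crawl_not_complete.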

Example responses

200 text/markdown

Successful download responses stream the Markdown file body rather than JSON.

# Example response body
# Product Documentation

## Introduction

This Markdown file is the consolidated crawl artifact.

409 crawl_not_complete

The crawl has not finished yet, so there is no downloadable artifact.

{
    "error": {
        "code": "crawl_not_complete",
        "message": "The markdown download is not ready yet.",
        "details": {
            "tracking_id": "trk_a1b2c3d4e5f6",
            "status": "crawling"
        }
    }
}

410 artifact_expired

The crawl finished previously, but the retained artifact has expired and must be regenerated.

{
    "error": {
        "code": "artifact_expired",
        "message": "This Markdown artifact is no longer retained. Submit the seed URL again to create a fresh conversion.",
        "details": {
            "tracking_id": "trk_done12345678",
            "retention_days": 90
        }
    }
}

422 access_not_purchased

The crawl exists, but the authenticated account has not unlocked download access.

{
    "error": {
        "code": "access_not_purchased",
        "message": "This crawl exists, but your account has not unlocked download access yet.",
        "details": {
            "tracking_id": "trk_a1b2c3d4e5f6",
            "status": "ready_to_crawl"
        }
    }
}

Error Handling

All API errors use the same JSON envelope

Errors returned by the API include a machine-readable code, a human-readable message, and optional details you can log or branch on in client code.

{
    "error": {
        "code": "machine_readable_code",
        "message": "Human-readable explanation.",
        "details": {}
    }
}

The details object is present only when the endpoint has extra structured context to return. Successful download responses are the exception: they stream text/markdown instead of JSON. WAF rate-limit responses still use HTTP 429, but they are enforced upstream and should not be assumed to match this JSON envelope.

Client code should branch on error.code rather than only the HTTP status. For example, 409 start_not_required is usually recoverable, while 409 conversion_not_ready means you should keep polling.

| HTTP | Code | Meaning | Suggested handling |
| --- | --- | --- | --- |
| 401 | missing_bearer_token | The request did not include an Authorization: Bearer <token> header. | Send your API key in the Authorization header and retry. |
| 401 | invalid_api_key | The bearer token was present but did not match an active API key. | Rotate or correct the API key before retrying. |
| 404 | tracking_not_found | The tracking ID does not belong to the authenticated account or no longer exists. | Check that you are using the right tracking ID under the right account. |
| 409 | conversion_not_ready | The conversion is still in discovery and cannot be started yet. | Keep polling the status endpoint until it reaches ready_to_crawl, download_available, or completed. |
| 409 | start_not_required | The conversion is free, single-page, already downloadable, or otherwise does not need a paid start. | Poll status or call the download endpoint when the artifact is ready. |
| 409 | crawl_not_complete | The artifact is not ready yet. | Keep polling the status endpoint instead of downloading. |
| 410 | artifact_expired | The crawl finished, but the retained download artifact has expired. | Do not keep polling this tracking ID for download readiness. Submit the seed URL again to create a fresh conversion. |
| 422 | missing_url | The JSON body did not include a url value. | Send a JSON object with an absolute url field. |
| 422 | invalid_url | The supplied URL was not a valid absolute URL. | Validate the URL before creating the conversion. |
| 422 | insufficient_credits | The account balance is too low to unlock the crawl. | Add credits or choose a smaller/free conversion before retrying /start. |
| 422 | access_not_purchased | The crawl exists, but the authenticated account has not unlocked download access yet. | Call POST /api/v1/conversions/{tracking_id}/start first if the conversion is paid and ready_to_crawl. |
| 429 | too_many_requests | The WAF rate limit was exceeded for your source IP. | Back off immediately, honor Retry-After if it is present, and remember that multiple clients behind the same egress IP share the same limit. |
| 500 | start_failed | The service could not unlock the crawl right now. | Poll status before retrying. If the conversion already moved to crawling, download_available, or completed, do not issue another /start call. Repeated /start calls after unlock are safe and return deducted_cents: 0; credits are only deducted on the first successful unlock for the same conversion. |
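
Branching on error.code can be kept in one lookup table. The Python sketch below condenses the suggested handling column into coarse action labels; the labels, the ERROR_ACTIONS name, and the action_for function are illustrative inventions, and the 429 case is checked by status alone because WAF responses may not carry the JSON envelope.

```python
ERROR_ACTIONS = {
    "missing_bearer_token": "fix_request",
    "invalid_api_key": "fix_request",
    "tracking_not_found": "fix_request",
    "missing_url": "fix_request",
    "invalid_url": "fix_request",
    "conversion_not_ready": "poll",
    "crawl_not_complete": "poll",
    "start_not_required": "poll",  # poll, or download if already available
    "start_failed": "poll",  # check status before any retry of /start
    "artifact_expired": "resubmit",
    "insufficient_credits": "add_credits",
    "access_not_purchased": "start",
    "too_many_requests": "back_off",
}


def action_for(http_status, payload):
    """Pick a coarse action label for an error response.

    `payload` is the parsed JSON body, or None when the body was not JSON.
    """
    if http_status == 429:
        return "back_off"  # enforced upstream; body format not guaranteed
    code = (payload or {}).get("error", {}).get("code")
    return ERROR_ACTIONS.get(code, "fail")
```

Unknown codes fall through to "fail" so that a new or undocumented error stops the workflow instead of being retried blindly.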

Comprehensive Example

Use the full lifecycle script when you need a production-ready flow

These examples include balance checking, status polling, and the logic needed to wait for discovery before starting a paid crawl.

The TypeScript example assumes a server-side runtime with global fetch, such as Node.js 18+, Deno, Bun, or an edge function. Never use API keys in client-side browser code.

#!/usr/bin/env bash
set -euo pipefail

API_KEY="your_api_key_here"
BASE_URL="https://sitetomarkdown.com/api/v1"
SEED_URL="https://docs.example.com/"

# Step 1: check your available balance before creating a crawl.
curl -s \
    -H "Authorization: Bearer $API_KEY" \
    "$BASE_URL/balance"

# Step 2: create a conversion and capture the returned tracking ID.
create_response=$(curl -s \
    -X POST "$BASE_URL/conversions" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    --data "{\"url\":\"$SEED_URL\"}")

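tracking_id=$(printf '%s' "$create_response" | jq -r '.tracking_id // empty')
if [[ -z "$tracking_id" ]]; then
    # The create call returned an error envelope instead of a tracking ID.
    echo "Create failed: $create_response" >&2
    exit 1
fi
started=false
echo "Tracking ID: $tracking_id"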

# Step 3: poll until the conversion is ready to download or ready to start.
while true; do
    status_response=$(curl -s \
        -H "Authorization: Bearer $API_KEY" \
        "$BASE_URL/conversions/$tracking_id")

    status=$(printf '%s' "$status_response" | jq -r '.status')
    echo "Current status: $status"

    if [[ "$status" == "ready_to_crawl" && "$started" == false ]]; then
        # Step 4: start the paid crawl once discovery is complete.
        start_response=$(curl -s \
            -X POST \
            -H "Authorization: Bearer $API_KEY" \
            "$BASE_URL/conversions/$tracking_id/start")

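        start_code=$(printf '%s' "$start_response" | jq -r '.error.code // empty')

        if [[ "$start_code" == "start_not_required" ]]; then
            echo "Start not required; continuing to poll for download availability."
        elif [[ -n "$start_code" ]]; then
            # Other errors, such as insufficient_credits, will not clear by polling.
            echo "Start failed: $start_code" >&2
            exit 1
        else
            echo "$start_response"
            started=true
        fi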
    elif [[ "$status" == "download_available" || "$status" == "completed" ]]; then
        break
    elif [[ "$status" == "error" ]]; then
        echo "Conversion failed before a downloadable artifact was produced." >&2
        exit 1
    fi

    sleep 5
done

# Step 5: download the consolidated Markdown artifact.
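# --fail makes curl exit non-zero on HTTP errors instead of saving the JSON error body as docs.md.
curl --fail -sS -H "Authorization: Bearer $API_KEY" \
    "$BASE_URL/conversions/$tracking_id/download" \
    -o docs.md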
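
The same control flow can be sketched in Python with the HTTP layer injected, which makes the loop testable without network access. Everything about the api object here is an assumption: it is a hypothetical client exposing create(url), status(tracking_id), and start(tracking_id), each returning the parsed JSON body as a dict. The max_polls budget is likewise an arbitrary illustrative default.

```python
import time


def run_conversion(api, seed_url, poll_interval=5.0, sleep=time.sleep, max_polls=720):
    """Drive one conversion to a downloadable state and return its tracking ID.

    `api` is a hypothetical client object; see the assumptions above.
    """
    tracking_id = api.create(seed_url)["tracking_id"]
    started = False
    for _ in range(max_polls):
        state = api.status(tracking_id)
        status = state["status"]
        if status in ("download_available", "completed"):
            return tracking_id
        if status == "error":
            raise RuntimeError("conversion failed before an artifact was produced")
        if status == "ready_to_crawl" and not started:
            credits = state["credits"]
            if not credits["covers_cost"]:
                raise RuntimeError(f"short by {credits['shortfall_cents']} cents")
            result = api.start(tracking_id)
            code = result.get("error", {}).get("code")
            if code not in (None, "start_not_required"):
                raise RuntimeError(f"start failed: {code}")
            # Only mark as started when the unlock actually succeeded.
            started = code is None
        sleep(poll_interval)
    raise TimeoutError("conversion did not finish within the polling budget")
```

Once run_conversion returns, the caller can fetch the /download endpoint for the returned tracking ID. Injecting sleep lets tests pass a no-op, and the credit check before /start avoids a call that would fail with insufficient_credits.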