API Access
Manage keys and run conversions over HTTP
Generate a bearer token for your account, submit a documentation URL, poll crawl status, unlock paid crawls, and download the consolidated Markdown artifact once it is ready.
Intro
Use the API when you want repeatable, tool-friendly crawl workflows
Automate documentation intake
Create conversions from scripts, CI jobs, or internal tools instead of driving each crawl from the browser.
Keep credit usage predictable
Inspect discovery results before you start a paid crawl, then decide whether to unlock it with your available balance.
Pull finished Markdown directly
Once a crawl is complete, stream the consolidated Markdown artifact straight into downstream LLM or knowledge-base workflows.
The API mirrors the same discovery, unlock, crawl, and download lifecycle used by the web app, so API clients and browser users stay aligned on pricing, access control, and final artifacts.
🔐 API keys must stay server-side
Never call the API from browser JavaScript or mobile apps where the key can be extracted. Always use a backend service to proxy requests and keep keys secret.
Core Concepts
Use the same vocabulary as the API responses
These terms appear throughout the response payloads and lifecycle descriptions below.
| Term | Meaning |
|---|---|
| Seed URL | The documentation root URL you submit to create a conversion. |
| Conversion | The tracked workflow that discovers pages, optionally unlocks a paid crawl, and produces a Markdown artifact. |
| Tracking ID | The identifier you use to poll status, start a crawl, and download the result for your account. |
| Seed ID | The internal identifier for the underlying canonical seed. It is returned for correlation, but clients should key workflow logic off tracking_id. |
| Discovery | The initial phase that identifies pages, counts billable pages, and determines whether a paid unlock is needed. |
| Unlock | The moment credits are deducted to release on-hold pages for a paid crawl. |
| Artifact | The final consolidated Markdown file returned by the download endpoint. |
Quick Start
See the flow in three lines
This section is intentionally high-level. It shows the three requests you make in sequence so you can immediately understand the lifecycle. In production, poll the status endpoint between these steps and skip the paid start when the conversion becomes downloadable on its own.
1. Call POST /api/v1/conversions to begin discovery and get a tracking_id.
2. Poll GET /api/v1/conversions/{tracking_id} until the status becomes ready_to_crawl, download_available, completed, or error.
3. Call /start only for paid ready_to_crawl conversions, then download when the status is download_available or completed.
Submit a seed URL
Create a conversion from the documentation root URL you want to turn into Markdown. Creating the conversion starts discovery and does not deduct credits.
curl -X POST https://sitetomarkdown.com/api/v1/conversions \
-H "Authorization: Bearer your_api_key_here" \
-H "Content-Type: application/json" \
--data '{"url":"https://docs.example.com/"}'
Start the crawl
Once discovery reaches ready_to_crawl, start the paid crawl if the conversion is not already downloadable.
# Poll GET /conversions/tracking_id_here until status is ready_to_crawl before this step.
curl -X POST \
-H "Authorization: Bearer your_api_key_here" \
https://sitetomarkdown.com/api/v1/conversions/tracking_id_here/start
Download the Markdown
When the crawl finishes, download the final consolidated Markdown artifact.
curl -H "Authorization: Bearer your_api_key_here" \
https://sitetomarkdown.com/api/v1/conversions/tracking_id_here/download \
-o docs.md
The detailed handling of tracking IDs, status checks, and production-ready flow control is covered later in the comprehensive example and endpoint reference.
Authentication
Every endpoint uses bearer authentication
Sign in to your SiteToMarkdown account, open this API Access page, and create or rotate a key. Send that key in the Authorization header on every API request.
Authorization: Bearer stmd_live_...
Requests without that header return 401 with missing_bearer_token. Requests with the wrong key return 401 with invalid_api_key.
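As a minimal sketch of the header rule above, the helper below attaches the bearer token to a standard-library request object. The function name build_request and the local empty-key check are illustrative assumptions, not part of the API; only the base URL and the Authorization header format come from this page.

```python
import urllib.request

BASE_URL = "https://sitetomarkdown.com/api/v1"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Return a Request for BASE_URL + path that carries the bearer token."""
    if not api_key:
        # Fail locally instead of sending a request that would come back as 401.
        raise ValueError("api_key must be a non-empty string")
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Passing the returned object to urllib.request.urlopen sends the request. Keep the key in server-side configuration rather than in source code.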
🔐 Do not expose API keys in browser or mobile code
Always store API keys server-side. If your integration uses a JavaScript frontend, proxy requests through a backend service to hide the key from the client.
Base URL
https://sitetomarkdown.com/api/v1
Lifecycle
Choose the next request based on the returned status
A conversion starts with discovery. Discovery is the free phase that identifies pages, calculates billable work, and decides whether the crawl can be downloaded immediately or needs a paid unlock. The seed page can finish early, so pages.converted can be greater than zero before a paid crawl starts.
Treat download_available and completed as terminal downloadable states. Treat error as a terminal failure state and stop polling instead of looping indefinitely.
| Status | Meaning | Client action |
|---|---|---|
| in_progress | Discovery is still running. SiteToMarkdown is finding pages and calculating cost. | Poll GET /api/v1/conversions/{tracking_id} again after a short delay. |
| ready_to_crawl | Discovery is complete and the crawl cost is known, but paid pages are still on hold. | Inspect pages and credits, then call POST /api/v1/conversions/{tracking_id}/start if you want to unlock the crawl. |
| crawling | Credits were deducted and released pages are actively being fetched and converted. | Continue polling until the artifact is ready. |
| download_available | The Markdown artifact is ready and the authenticated account can download it now. | Call GET /api/v1/conversions/{tracking_id}/download immediately. |
| completed | The crawl has already been unlocked and completed. Treat it as a downloadable finished conversion. | Call GET /api/v1/conversions/{tracking_id}/download. You do not need to call /start again. |
| error | The seed or single-page conversion hit a processing error before an artifact became available. | Stop polling and retry later with a new conversion if the source issue is resolved. |
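The status table above can be sketched as a small lookup that a polling loop consults after each status response. The action labels ("poll", "review_then_start", "download", "stop") are hypothetical names chosen for this example; only the status values come from the API.

```python
# Maps each documented status to the client action from the table above.
STATUS_ACTIONS = {
    "in_progress": "poll",
    "ready_to_crawl": "review_then_start",
    "crawling": "poll",
    "download_available": "download",
    "completed": "download",
    "error": "stop",
}


def next_action(status: str) -> str:
    """Return the client action for a conversion status."""
    try:
        return STATUS_ACTIONS[status]
    except KeyError:
        # An unknown status should stop the loop rather than poll forever.
        raise ValueError(f"unexpected conversion status: {status!r}") from None
```

Raising on an unrecognized status is a design choice for this sketch: it keeps a client from looping indefinitely if a new status is introduced.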
Billing and Credits
Credits are spent only when a paid crawl is unlocked
- Creating a conversion starts discovery and does not deduct credits.
- Discovery determines the discovered page count, billable page count, and crawl cost before a paid unlock happens.
- Credits are deducted only when POST /api/v1/conversions/{tracking_id}/start succeeds for a paid crawl.
- Free seeds, previously unlocked conversions, and single-page conversions can skip the paid start step and move directly toward download availability.
- If crawl access was already unlocked for your account, repeating POST /api/v1/conversions/{tracking_id}/start returns success with deducted_cents set to 0.
- All *_cents fields are integer USD cents. Formatted fields such as crawl_cost and credit_balance are display-only strings.
| Field | Meaning |
|---|---|
| credit_balance_cents | Integer cent balance for programmatic decisions. Prefer this over formatted display strings in automation. |
| pages.discovered | Total pages currently associated with the seed during or after discovery. |
| pages.billable | Pages that would count toward paid crawl cost for the current account. |
| pages.converted | Pages that have already been converted. The seed page may finish early, so this can be greater than zero before a paid crawl starts. |
| pages.on_hold | Discovered child pages that are waiting for a paid unlock. |
| cost.crawl_cost_cents | Total cost in cents to unlock the current paid crawl. |
| credits.covers_cost | Whether the authenticated account balance is enough to unlock the crawl immediately. |
| page_equivalent_credits | Whole billable pages your current balance can cover, rounded down. |
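The arithmetic behind these fields can be reproduced with integer cents. This is a sketch consistent with the example responses on this page (a 1350-cent balance at 4 cents per page covers 337 pages, and a 35-cent balance against a 72-cent crawl leaves a 37-cent shortfall). The function names are illustrative, and the server's own calculation is assumed rather than confirmed.

```python
def page_equivalent_credits(balance_cents: int, cost_per_page_cents: int) -> int:
    """Whole billable pages the balance can cover, rounded down."""
    if cost_per_page_cents <= 0:
        raise ValueError("cost_per_page_cents must be positive")
    return balance_cents // cost_per_page_cents


def shortfall_cents(balance_cents: int, crawl_cost_cents: int) -> int:
    """Cents still missing to unlock the crawl; 0 when the balance covers it."""
    return max(0, crawl_cost_cents - balance_cents)
```

Working in integer cents avoids the rounding drift that comes from parsing display strings such as "$13.50".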
Conversion Reuse
POST /conversions can reuse an existing canonical seed
SiteToMarkdown first looks for an exact seed match for the submitted URL. If none exists, it can reuse an already known parent or root seed for the same documentation tree instead of creating a brand-new crawl root.
The tracking ID remains scoped to the authenticated account. Repeating POST /api/v1/conversions for the same account and the same canonical seed returns the same tracking ID. A different account can still receive its own tracking ID for that same underlying seed.
A reused conversion may come back as in_progress, ready_to_crawl, download_available, completed, or error depending on the current crawl state and whether your account already has access.
Polling and Rate Limits
Poll conservatively and expect a short WAF block if you burst too hard
The API is currently polling-only. Webhooks are not available yet, so clients should poll GET /api/v1/conversions/{tracking_id} every 5 to 10 seconds while the status is in_progress or crawling.
API traffic is rate limited by the WAF to 10 requests every 10 seconds from the same IP address. Exceeding that limit returns HTTP 429 Too Many Requests and blocks the IP address for 10 seconds before requests are accepted again.
That limit is shared by anything leaving through the same egress IP, so CI workers, serverless jobs, or multiple tenants behind one NAT can affect each other. If a Retry-After header is present on the response, treat it as the authoritative backoff hint.
Design clients to back off immediately when 429 responses start appearing during a burst. The comprehensive example below uses a 5-second polling interval, which stays comfortably below the documented limit for a single active conversion.
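One way to express the backoff guidance above is a helper that picks the next wait time from the HTTP status and headers. This is a sketch under stated assumptions: the function name is hypothetical, Retry-After is only parsed in its delta-seconds form, and the 10-second fallback mirrors the documented block duration.

```python
from typing import Mapping

WAF_BLOCK_SECONDS = 10.0  # documented block duration after a 429


def retry_delay_seconds(
    status_code: int,
    headers: Mapping[str, str],
    poll_interval: float = 5.0,
) -> float:
    """Return how long to wait before the next request."""
    if status_code != 429:
        return poll_interval
    # Plain dict lookups are case-sensitive; pass a case-insensitive mapping
    # if your HTTP client preserves the server's header casing.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(float(retry_after), 0.0)
        except ValueError:
            pass  # HTTP-date values are not parsed in this sketch
    return WAF_BLOCK_SECONDS
```

Call it after every response and sleep for the returned number of seconds before polling again.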
Operational Limits
Plan around retention as well as request pacing
Completed Markdown artifacts are retained for 90 days. After expiration, the download endpoint returns 410 artifact_expired. Do not keep polling; submit the seed URL again to create a fresh conversion.
The download endpoint streams the artifact as text/markdown, UTF-8 encoded, with a Content-Disposition attachment header while retained and accessible.
Submitted URLs must be absolute http or https documentation roots that are publicly reachable by SiteToMarkdown.
Crawls stay within the same documentation tree: pages under the submitted path, other paths on the same host, or the host root. Different hosts and subdomains are not crawled.
Private or authenticated documentation is not supported unless it is also publicly reachable to the crawler.
Webhooks are not available. Use polling against GET /api/v1/conversions/{tracking_id} instead.
Crawl scope examples
| Submitted seed | Linked URL | Crawled? | Reason |
|---|---|---|---|
| https://docs.example.com/ | https://docs.example.com/guides/ | Yes | Same host, child path |
| https://docs.example.com/product-a/ | https://docs.example.com/product-b/ | Maybe | Included only if linked and within the same tree |
| https://docs.example.com/ | https://api.example.com/ | No | Different subdomain |
| https://docs.example.com/ | https://github.com/example/repo | No | External domain |
| https://docs.example.com/page | https://docs.example.com/page#section1 and #section2 | One page | URL fragments do not create separate pages |
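Two of the rules in the table above, host matching and fragment handling, can be approximated client-side when estimating what a crawl will include. This is a rough sketch only: the function names are illustrative, the "Maybe" row depends on link structure that is not modeled here, and the crawler's actual scoping logic is not documented on this page.

```python
from urllib.parse import urldefrag, urlsplit


def same_host(seed_url: str, linked_url: str) -> bool:
    """True when both URLs share the exact same host (subdomains differ)."""
    return urlsplit(seed_url).hostname == urlsplit(linked_url).hostname


def page_key(url: str) -> str:
    """Drop the #fragment so anchors on one page collapse to a single key."""
    return urldefrag(url).url
```

For example, same_host rejects both https://api.example.com/ and https://github.com/example/repo for a docs.example.com seed, and page_key maps #section1 and #section2 of the same page to one URL.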
API reference
Reference
Each endpoint below includes its purpose, required parameters, and working request examples in curl, Python, and TypeScript.
Get credit balance
Returns the authenticated account, current wallet balance, and the approximate number of whole billable pages that balance can cover at the current per-page rate.
curl -H "Authorization: Bearer your_api_key_here" \
https://sitetomarkdown.com/api/v1/balance
Example responses
200 OK
Successful balance lookup.
{
"user": {
"id": 42,
"email": "[email protected]"
},
"credit_balance_cents": 1350,
"credit_balance": "$13.50",
"page_equivalent_credits": 337,
"cost_per_page_cents": 4
}
401 missing_bearer_token
Returned when the Authorization: Bearer <token> header is missing.
{
"error": {
"code": "missing_bearer_token",
"message": "A bearer token is required."
}
}
Create or reuse a conversion
Accepts a seed documentation URL, starts discovery without charging credits, reuses an existing canonical seed when possible, and returns a tracking ID scoped to the authenticated user.
| Parameter | Location | Description |
|---|---|---|
| url | JSON body | Absolute http or https documentation root URL that is publicly reachable by SiteToMarkdown. |
curl -X POST https://sitetomarkdown.com/api/v1/conversions \
-H "Authorization: Bearer your_api_key_here" \
-H "Content-Type: application/json" \
--data '{"url":"https://docs.example.com/"}'
Example responses
202 Accepted
A seed URL was accepted and a tracking record was created or reused. Use tracking_id for all follow-up requests. seed_id is informational.
{
"tracking_id": "trk_a1b2c3d4e5f6",
"seed_id": 1182,
"seed_url": "https://docs.example.com/",
"status": "in_progress",
"message": "Seed accepted. Poll the status endpoint for page count and cost."
}
422 invalid_url
The URL is missing or not a valid absolute URL.
{
"error": {
"code": "invalid_url",
"message": "Provide a valid absolute URL."
}
}
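To avoid a round trip that ends in 422, a client can pre-check the URL before posting it. The sketch below uses only the standard library; the function names are hypothetical, and the check covers the "absolute http or https" requirement only. It cannot tell whether the site is publicly reachable by SiteToMarkdown.

```python
import json
from urllib.parse import urlsplit


def is_valid_seed_url(url: str) -> bool:
    """True for absolute http/https URLs that include a host."""
    try:
        parts = urlsplit(url)
        return parts.scheme in ("http", "https") and bool(parts.hostname)
    except ValueError:
        # urlsplit raises on malformed input such as an unclosed IPv6 bracket.
        return False


def conversion_body(url: str) -> bytes:
    """Encode the JSON body expected by POST /api/v1/conversions."""
    if not is_valid_seed_url(url):
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return json.dumps({"url": url}).encode("utf-8")
```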
Inspect conversion status
Returns discovery progress, billable page counts, crawl cost, credit coverage, and whether the Markdown artifact is already downloadable.
| Parameter | Location | Description |
|---|---|---|
| tracking_id | Path | Tracking identifier returned when the conversion was created. |
curl -H "Authorization: Bearer your_api_key_here" \
https://sitetomarkdown.com/api/v1/conversions/tracking_id_here
Example responses
200 ready_to_crawl
Discovery is complete and the crawl cost is known, but credits have not been spent yet.
{
"tracking_id": "trk_a1b2c3d4e5f6",
"seed_url": "https://docs.example.com/",
"status": "ready_to_crawl",
"pages": {
"discovered": 18,
"billable": 18,
"converted": 1,
"on_hold": 17
},
"cost": {
"crawl_cost_cents": 72,
"crawl_cost": "$0.72",
"cost_per_page_cents": 4
},
"credits": {
"balance_cents": 35,
"covers_cost": false,
"shortfall_cents": 37
},
"download": {
"available": false,
"url": null
}
}
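A client can use the ready_to_crawl payload to decide whether to unlock the crawl automatically. The sketch below assumes the response shape shown in the example above; should_start and the max_spend_cents budget cap are hypothetical names introduced for illustration.

```python
from typing import Any, Dict


def should_start(status_payload: Dict[str, Any], max_spend_cents: int) -> bool:
    """Decide whether to call /start for a parsed status response."""
    if status_payload["status"] != "ready_to_crawl":
        return False  # nothing to unlock yet, or already past this stage
    cost = status_payload["cost"]["crawl_cost_cents"]
    covered = bool(status_payload["credits"]["covers_cost"])
    # Start only when the balance covers the crawl and it fits the caller's budget.
    return covered and cost <= max_spend_cents
```

With the example payload above (balance 35, cost 72, covers_cost false) the function returns False, so the client would add credits or skip the crawl instead of calling /start.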
200 download_available
The crawl is finished and the authenticated account can download the Markdown artifact.
{
"tracking_id": "trk_done12345678",
"seed_url": "https://docs.example.com/",
"status": "download_available",
"pages": {
"discovered": 18,
"billable": 18,
"converted": 18,
"on_hold": 0
},
"cost": {
"crawl_cost_cents": 72,
"crawl_cost": "$0.72",
"cost_per_page_cents": 4
},
"credits": {
"balance_cents": 700,
"covers_cost": true,
"shortfall_cents": 0
},
"download": {
"available": true,
"url": "https://sitetomarkdown.com/api/v1/conversions/trk_done12345678/download"
}
}
200 completed
The conversion is complete and still uses the same response shape as download_available.
{
"tracking_id": "trk_done12345678",
"seed_url": "https://docs.example.com/",
"status": "completed",
"pages": {
"discovered": 18,
"billable": 18,
"converted": 18,
"on_hold": 0
},
"cost": {
"crawl_cost_cents": 72,
"crawl_cost": "$0.72",
"cost_per_page_cents": 4
},
"credits": {
"balance_cents": 700,
"covers_cost": true,
"shortfall_cents": 0
},
"download": {
"available": true,
"url": "https://sitetomarkdown.com/api/v1/conversions/trk_done12345678/download"
}
}
200 error
The conversion reached a terminal failure state before a downloadable artifact was produced.
{
"tracking_id": "trk_failed123456",
"seed_url": "https://docs.example.com/",
"status": "error",
"pages": {
"discovered": 0,
"billable": 0,
"converted": 0,
"on_hold": 0
},
"cost": {
"crawl_cost_cents": 0,
"crawl_cost": "$0.00",
"cost_per_page_cents": 4
},
"credits": {
"balance_cents": 1350,
"covers_cost": true,
"shortfall_cents": 0
},
"download": {
"available": false,
"url": null
}
}
404 tracking_not_found
The tracking ID does not belong to the authenticated account or no longer exists.
{
"error": {
"code": "tracking_not_found",
"message": "No conversion was found for this tracking ID."
}
}
Unlock and start a paid crawl
Deducts credits when required, releases discovered child pages from on_hold, and transitions the crawl into active processing. Repeating the request after access is already unlocked returns success with no extra deduction.
| Parameter | Location | Description |
|---|---|---|
| tracking_id | Path | Tracking identifier for a conversion that is usually ready_to_crawl. If the conversion is already unlocked or does not need a paid crawl, the endpoint can instead return start_not_required. |
curl -X POST \
-H "Authorization: Bearer your_api_key_here" \
https://sitetomarkdown.com/api/v1/conversions/tracking_id_here/start
Example responses
200 crawling
Credits were deducted and on-hold pages were released for crawling.
{
"tracking_id": "trk_a1b2c3d4e5f6",
"status": "crawling",
"message": "Crawl started successfully.",
"credits": {
"deducted_cents": 72,
"remaining_balance_cents": 628
}
}
200 already_unlocked
Repeated start requests are safe after access is already unlocked for the authenticated account.
{
"tracking_id": "trk_done12345678",
"status": "download_available",
"message": "Crawl access was already unlocked and the download is ready.",
"credits": {
"deducted_cents": 0,
"remaining_balance_cents": 700
}
}
409 start_not_required
Free, single-page, or already-downloadable conversions do not require a paid start. Treat this as a recoverable state, then poll or download based on the returned status.
{
"error": {
"code": "start_not_required",
"message": "This conversion does not require a paid start. Poll status or use the download endpoint when available.",
"details": {
"tracking_id": "trk_free12345678",
"status": "download_available"
}
}
}
409 conversion_not_ready
The conversion is still in discovery, so there is nothing to unlock yet.
{
"error": {
"code": "conversion_not_ready",
"message": "This conversion is not ready to start yet.",
"details": {
"tracking_id": "trk_a1b2c3d4e5f6",
"status": "in_progress"
}
}
}
422 insufficient_credits
The authenticated account does not have enough balance to unlock the crawl.
{
"error": {
"code": "insufficient_credits",
"message": "Your wallet balance does not cover this crawl cost.",
"details": {
"tracking_id": "trk_a1b2c3d4e5f6",
"status": "ready_to_crawl",
"credits": {
"balance_cents": 35,
"required_cents": 72,
"shortfall_cents": 37
}
}
}
}
Download the consolidated Markdown
Streams the finished Markdown file as text/markdown, UTF-8 encoded, with a Content-Disposition: attachment header when the artifact exists and the authenticated account has access. Error responses use application/json.
| Parameter | Location | Description |
|---|---|---|
| tracking_id | Path | Tracking identifier for a conversion whose status is download_available or completed. |
curl -H "Authorization: Bearer your_api_key_here" \
https://sitetomarkdown.com/api/v1/conversions/tracking_id_here/download \
-o docs.md
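Because the endpoint streams the artifact, a client can copy the response body to disk in chunks instead of loading it into memory. This is a minimal sketch: save_markdown is an illustrative name, and it accepts any binary file-like object, such as the response returned by urllib.request.urlopen.

```python
import shutil
from typing import BinaryIO


def save_markdown(response: BinaryIO, dest_path: str, chunk_size: int = 64 * 1024) -> int:
    """Stream a binary response body to dest_path and return the bytes written."""
    with open(dest_path, "wb") as out:
        # copyfileobj reads and writes chunk_size bytes at a time.
        shutil.copyfileobj(response, out, chunk_size)
        return out.tell()
```

Check the HTTP status before saving: error responses from this endpoint are JSON, not Markdown.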
Example responses
200 text/markdown
Successful download responses stream the Markdown file body rather than JSON.
# Example response body
# Product Documentation
## Introduction
This Markdown file is the consolidated crawl artifact.
409 crawl_not_complete
The crawl has not finished yet, so there is no downloadable artifact.
{
"error": {
"code": "crawl_not_complete",
"message": "The markdown download is not ready yet.",
"details": {
"tracking_id": "trk_a1b2c3d4e5f6",
"status": "crawling"
}
}
}
410 artifact_expired
The crawl finished previously, but the retained artifact has expired and must be regenerated.
{
"error": {
"code": "artifact_expired",
"message": "This Markdown artifact is no longer retained. Submit the seed URL again to create a fresh conversion.",
"details": {
"tracking_id": "trk_done12345678",
"retention_days": 90
}
}
}
422 access_not_purchased
The crawl exists, but the authenticated account has not unlocked download access.
{
"error": {
"code": "access_not_purchased",
"message": "This crawl exists, but your account has not unlocked download access yet.",
"details": {
"tracking_id": "trk_a1b2c3d4e5f6",
"status": "ready_to_crawl"
}
}
}
Error Handling
All API errors use the same JSON envelope
API errors return a machine-readable code, a human-readable message, and, when extra context exists, a details object you can log or branch on in client code.
{
"error": {
"code": "machine_readable_code",
"message": "Human-readable explanation.",
"details": {}
}
}
The details object is present only when the endpoint has extra structured context to return. Successful download responses are the exception: they stream text/markdown instead of JSON. WAF rate-limit responses still use HTTP 429, but they are enforced upstream and should not be assumed to match this JSON envelope.
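Client code should branch on error.code rather than only the HTTP status. For example, start_not_required and conversion_not_ready both return 409, but the first means you can skip /start and poll or download, while the second means discovery is still running and you should keep polling before calling /start.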
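Reading the envelope defensively might look like the sketch below. It returns the code, message, and details from a JSON error body, and falls back to a synthetic code when the body is not in the envelope format, as may happen with upstream WAF responses. The parse_error name and the "http_<status>" fallback code are assumptions made for this example.

```python
import json
from typing import Any, Dict, Tuple


def parse_error(http_status: int, body: bytes) -> Tuple[str, str, Dict[str, Any]]:
    """Return (code, message, details) for an error response body."""
    try:
        error = json.loads(body.decode("utf-8"))["error"]
        # details is optional, so default to an empty dict.
        return error["code"], error.get("message", ""), error.get("details") or {}
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not JSON, or not the documented envelope: fall back to the status code.
        return f"http_{http_status}", "", {}
```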
| HTTP | Code | Meaning | Suggested handling |
|---|---|---|---|
| 401 | missing_bearer_token | The request did not include an Authorization: Bearer <token> header. | Send your API key in the Authorization header and retry. |
| 401 | invalid_api_key | The bearer token was present but did not match an active API key. | Rotate or correct the API key before retrying. |
| 404 | tracking_not_found | The tracking ID does not belong to the authenticated account or no longer exists. | Check that you are using the right tracking ID under the right account. |
| 409 | conversion_not_ready | The conversion is still in discovery and cannot be started yet. | Keep polling the status endpoint until it reaches ready_to_crawl, download_available, or completed. |
| 409 | start_not_required | The conversion is free, single-page, already downloadable, or otherwise does not need a paid start. | Poll status or call the download endpoint when the artifact is ready. |
| 409 | crawl_not_complete | The artifact is not ready yet. | Keep polling the status endpoint instead of downloading. |
| 410 | artifact_expired | The crawl finished, but the retained download artifact has expired. | Do not keep polling this tracking ID for download readiness. Submit the seed URL again to create a fresh conversion. |
| 422 | missing_url | The JSON body did not include a url value. | Send a JSON object with an absolute url field. |
| 422 | invalid_url | The supplied URL was not a valid absolute URL. | Validate the URL before creating the conversion. |
| 422 | insufficient_credits | The account balance is too low to unlock the crawl. | Add credits or choose a smaller/free conversion before retrying /start. |
| 422 | access_not_purchased | The crawl exists, but the authenticated account has not unlocked download access yet. | Call POST /api/v1/conversions/{tracking_id}/start first if the conversion is paid and ready_to_crawl. |
| 429 | too_many_requests | The WAF rate limit was exceeded for your source IP. | Back off immediately, honor Retry-After if it is present, and remember that multiple clients behind the same egress IP share the same limit. |
| 500 | start_failed | The service could not unlock the crawl right now. | Poll status before retrying. If the conversion already moved to crawling, download_available, or completed, do not issue another /start call. Repeated /start calls after unlock are safe and return deducted_cents: 0; credits are only deducted on the first successful unlock for the same conversion. |
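The suggested handling in the table above can be condensed into a lookup keyed on error.code. The action labels are hypothetical names for this sketch, and unknown codes are treated as failures rather than retried.

```python
# Maps documented error codes to a coarse client action.
ERROR_ACTIONS = {
    "missing_bearer_token": "fix_credentials",
    "invalid_api_key": "fix_credentials",
    "tracking_not_found": "check_tracking_id",
    "conversion_not_ready": "poll_status",
    "start_not_required": "poll_status",
    "crawl_not_complete": "poll_status",
    "start_failed": "poll_status",  # poll first; retry /start only if still ready_to_crawl
    "artifact_expired": "resubmit_seed",
    "missing_url": "fix_request",
    "invalid_url": "fix_request",
    "insufficient_credits": "add_credits",
    "access_not_purchased": "call_start",
    "too_many_requests": "back_off",
}


def handling_for(code: str) -> str:
    """Return the coarse action for an error code, or "fail" if unknown."""
    return ERROR_ACTIONS.get(code, "fail")
```

Keeping the mapping in one table makes it easy to audit against the documentation when new codes appear.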
Comprehensive Example
Use the full lifecycle script when you need a production-ready flow
These examples include balance checking, status polling, and the logic needed to wait for discovery before starting a paid crawl.
The TypeScript example assumes a server-side runtime with global fetch, such as Node.js 18+, Deno, Bun, or an edge function. Never use API keys in client-side browser code.
#!/usr/bin/env bash
set -euo pipefail
API_KEY="your_api_key_here"
BASE_URL="https://sitetomarkdown.com/api/v1"
SEED_URL="https://docs.example.com/"
# Step 1: check your available balance before creating a crawl.
curl -s \
-H "Authorization: Bearer $API_KEY" \
"$BASE_URL/balance"
# Step 2: create a conversion and capture the returned tracking ID.
create_response=$(curl -s \
-X POST "$BASE_URL/conversions" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
--data "{\"url\":\"$SEED_URL\"}")
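tracking_id=$(printf '%s' "$create_response" | jq -r '.tracking_id // empty')
# Stop early if the create call returned an error envelope instead of a tracking ID.
if [[ -z "$tracking_id" ]]; then
  echo "Create failed: $create_response" >&2
  exit 1
fi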
started=false
echo "Tracking ID: $tracking_id"
# Step 3: poll until the conversion is ready to download or ready to start.
while true; do
  status_response=$(curl -s \
    -H "Authorization: Bearer $API_KEY" \
    "$BASE_URL/conversions/$tracking_id")
  status=$(printf '%s' "$status_response" | jq -r '.status')
  echo "Current status: $status"
  if [[ "$status" == "ready_to_crawl" && "$started" == false ]]; then
    # Step 4: start the paid crawl once discovery is complete.
    start_response=$(curl -s \
      -X POST \
      -H "Authorization: Bearer $API_KEY" \
      "$BASE_URL/conversions/$tracking_id/start")
    start_code=$(printf '%s' "$start_response" | jq -r '.error.code // empty')
    if [[ -z "$start_code" ]]; then
      # No error envelope: the crawl started or was already unlocked.
      echo "$start_response"
      started=true
    elif [[ "$start_code" == "start_not_required" || "$start_code" == "conversion_not_ready" || "$start_code" == "start_failed" ]]; then
      # Recoverable: the next loop iteration polls status before any retry.
      echo "Start returned $start_code; continuing to poll."
    else
      # insufficient_credits and other errors will not resolve by polling.
      echo "Start failed: $start_response" >&2
      exit 1
    fi
  elif [[ "$status" == "download_available" || "$status" == "completed" ]]; then
    break
  elif [[ "$status" == "error" ]]; then
    echo "Conversion failed before a downloadable artifact was produced." >&2
    exit 1
  fi
  sleep 5
done
# Step 5: download the consolidated Markdown artifact.
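# --fail makes curl exit non-zero on HTTP errors instead of saving the JSON error body as docs.md.
curl --fail -H "Authorization: Bearer $API_KEY" \
  "$BASE_URL/conversions/$tracking_id/download" \
  -o docs.md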