Turn Any Website into a Single LLM-Ready Markdown File

Paste a website URL, sitemap URL, or feed URL. SiteToMarkdown follows docs paths, XML sitemaps, and RSS, Atom, or JSON feeds, strips out the noise, and gives you one clean Markdown file for NotebookLM, ChatGPT, GitHub Copilot, and RAG pipelines.

  • Flexible discovery – Start from docs paths, sitemap indexes, RSS, Atom, or JSON feeds
  • Clean Markdown – No HTML, nav, or ads
  • Works on tricky sites – Renders JavaScript and gets past bot blocking

No subscriptions. Pay only when you convert a site.

Convert a Site to Markdown

Log in with Google in seconds. No password required.

Stop feeding your LLM a single page. Start from a docs path, sitemap, or feed and give it one structured source.

Why Pasting URLs into LLMs Doesn't Work Well

LLMs only see a slice

Pasting one URL into ChatGPT or NotebookLM gives shallow coverage. Most docs live across dozens or hundreds of pages.

Bots get blocked

Many docs sites block generic bots or rely on heavy JavaScript. When that happens, your AI tool never sees the real content.

Manual copy-paste fails

Adding pages to NotebookLM or RAG context one by one is slow, error-prone, and hard to keep updated.

SiteToMarkdown can start from a docs path, XML sitemap, RSS feed, Atom feed, or JSON feed, then collect the linked content into one Markdown file.

From Docs URL, Sitemap, or Feed to LLM Source in Under 5 Minutes

1. Paste a docs URL, sitemap, or feed

Enter a docs URL such as https://yourdomain.com/docs/, a sitemap URL such as https://yourdomain.com/sitemap.xml, or an RSS, Atom, or JSON feed URL.

2. We crawl & clean

We discover linked pages, render JS where needed, bypass blocking, and extract only the main content.

3. Download Markdown

Upload it to NotebookLM, ChatGPT, Copilot, or your RAG app as a single source.

Diagram showing how SiteToMarkdown crawls a docs site and converts it into a single Markdown file

Simple Pay-As-You-Go Pricing

No subscriptions. Just pay for what you convert.

Already converted sites are delivered instantly with a 25% discount.

  • $5 – Up to 50 pages
  • $10 – Up to 200 pages
  • $25 – Up to 1,000 pages

  • Docs paths, sitemap, RSS, Atom, and JSON feed support
  • Instant download if cached, with a 25% discount
  • JS rendering
  • Advanced bot bypass
  • Clean Markdown: No ads, navigation, or irrelevant content

Try It Free with Popular Docs Sites

  • Tailwind CSS Docs – 186 pages converted to a single Markdown file
  • OpenAI API Overview – 55 pages converted to a single Markdown file
  • X (Twitter) API – 12 pages converted to a single Markdown file

Frequently Asked Questions

What exactly does SiteToMarkdown do?

SiteToMarkdown converts an entire docs-style site into one clean, LLM-ready Markdown file. Start from a docs path, sitemap, or feed URL, and we discover linked pages, extract the main content, remove noise, and merge everything into a single structured export.
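
To make the "merge everything into a single structured export" step concrete, here is a minimal Python sketch. It is an illustration only, not SiteToMarkdown's actual implementation: the `Page` structure, the `merge_pages` function, and the choice of a title line, a source line, and a horizontal rule between pages are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One converted page (hypothetical structure, for illustration only)."""
    url: str
    title: str
    markdown: str


def merge_pages(pages: list[Page]) -> str:
    """Join per-page Markdown into one document, one section per page."""
    sections = []
    for page in pages:
        body = page.markdown.strip()
        # Label each section with its title and source URL so an LLM
        # (or a reader) can tell where the content came from.
        sections.append(f"# {page.title}\n\nSource: {page.url}\n\n{body}")
    # A horizontal rule between sections keeps page boundaries visible.
    return "\n\n---\n\n".join(sections) + "\n"
```

Keeping the source URL next to each section is one possible design choice: it lets answers from the merged file be traced back to the original page.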

Does it only work with NotebookLM?

No. The output is standard Markdown, so it works with Google NotebookLM, ChatGPT and other file-based LLM workflows, GitHub Copilot (including Agents), and RAG pipelines.

What types of URLs work best?

Documentation sites, help centers, knowledge bases, XML sitemaps, and feed URLs work best. You can start from a docs root like /docs/ or /help/, a sitemap such as /sitemap.xml, or an RSS, Atom, or JSON feed URL.

Do sitemaps and feeds work?

Yes. SiteToMarkdown supports XML sitemaps and common feed formats including RSS, Atom, and JSON Feed. If feed entries point to pages on the same site, we can use them as the crawl source automatically.
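
As a rough illustration of what using a sitemap as the crawl source can look like, the sketch below reads the `<loc>` entries from an XML sitemap and keeps only URLs on the same host. The function name `urls_from_sitemap` and the same-host rule are assumptions made for this example; it is not a description of how SiteToMarkdown itself works. For a sitemap index, the returned URLs would be child sitemaps that need to be fetched and parsed in turn.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str, site_host: str) -> list[str]:
    """Return the <loc> URLs in a sitemap that belong to site_host."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # Keep only entries on the same host (an assumed scoping rule).
        if url and urlparse(url).netloc == site_host:
            urls.append(url)
    return urls
```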

How do you discover pages from a docs URL?

You can start from a docs path, sitemap, or feed. From there, SiteToMarkdown follows the docs structure to collect the relevant linked pages and turn them into one cohesive Markdown file.
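
One plausible way to "follow the docs structure" is to keep only links that stay on the same host and under the starting path. The sketch below shows that idea in Python. The function `links_in_scope` and its scoping rule are hypothetical; the rules SiteToMarkdown actually applies are not documented here.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def links_in_scope(start_url: str, hrefs: list[str]) -> list[str]:
    """Resolve hrefs against start_url and keep those under its path."""
    start = urlparse(start_url)
    prefix = start.path if start.path.endswith("/") else start.path + "/"
    seen = set()
    in_scope = []
    for href in hrefs:
        # Resolve relative links and drop #fragments so that
        # /docs/a and /docs/a#intro count as the same page.
        absolute, _fragment = urldefrag(urljoin(start_url, href))
        parsed = urlparse(absolute)
        same_host = parsed.netloc == start.netloc
        under_prefix = (parsed.path + "/").startswith(prefix)
        if same_host and under_prefix and absolute not in seen:
            seen.add(absolute)
            in_scope.append(absolute)
    return in_scope
```

Starting from https://example.com/docs/, this keeps /docs/install and /docs/api but drops /blog/post and links to other hosts.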

How do you handle JS-rendered docs and tricky sites?

We use headless browsers to render JavaScript before extraction, which helps on Single Page Applications (SPAs) built with frameworks like React or Vue. We also handle many bot-blocking and “tricky” docs sites so you get the real content, not an empty shell.

Will the Markdown include nav, headers, footers, or ads?

The goal is clean, readable Markdown focused on the main page content—not navigation, site chrome, or ads. Exact results can vary by site, but the output is designed for LLM context (dense and low-noise), not as an HTML dump.
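
To show the general idea of dropping site chrome, here is a deliberately simple Python sketch that collects text while skipping anything inside navigation, header, footer, and script elements. The tag list and the `MainTextExtractor` class are assumptions for illustration. It returns plain text rather than Markdown, and real extraction (including whatever SiteToMarkdown does) is more involved.

```python
from html.parser import HTMLParser

# Elements treated as site chrome in this sketch (an assumed list).
SKIPPED_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside any skipped element."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(text)


def main_text(html: str) -> list[str]:
    """Return the text fragments found outside skipped elements."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts
```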

Why is Markdown better than pasting URLs or ingesting raw HTML for RAG?

Markdown preserves structure (headings), stays token-efficient (no <div> soup), and keeps code blocks intact. That makes it easier to split into chunks, embed, and retrieve accurately in RAG pipelines—and more reliable than pasting one URL at a time into an LLM.
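
The sketch below illustrates why headings and intact code blocks make chunking easier. It splits a Markdown string at headings and ignores `#` lines inside fenced code blocks, so a shell comment does not start a new chunk. The function name and the splitting rule are assumptions for this example; a real RAG pipeline would usually also enforce a maximum chunk size.

```python
import re

FENCE = "`" * 3  # the three-backtick marker that opens or closes a code block
HEADING = re.compile(r"#{1,6}\s")


def split_by_headings(markdown: str) -> list[str]:
    """Split Markdown into chunks that each start at a heading."""
    chunks: list[list[str]] = [[]]
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a fenced code block are never treated as headings.
        is_heading = not in_fence and HEADING.match(line) is not None
        if is_heading and chunks[-1]:
            chunks.append([])
        chunks[-1].append(line)
    # Drop chunks that contain only blank lines.
    return ["\n".join(c).strip() for c in chunks if any(s.strip() for s in c)]
```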

How do I use it with GitHub Copilot Agents?

Convert a docs site (e.g., Tailwind, Stripe, Next.js, OpenAI, or your own API), then commit the Markdown file into your repo (often under docs/). In Copilot Chat, reference it with #-mentions (and combine with @workspace / #codebase) so Copilot Agents use the latest docs as high-quality context.

What if NotebookLM can’t import my site due to domain restrictions?

If NotebookLM rejects a webpage or even a feed URL (e.g., “Unable to import this webpage due to domain restrictions”), SiteToMarkdown can often still fetch the content and deliver it as a single Markdown file you can upload as one source.

How does pricing work?

No subscriptions: pay only when you convert a site. Pricing tiers are based on page count (up to 50, 200, or 1,000 pages). If a site has already been converted and is cached, it is delivered instantly with a 25% discount.
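
The tier arithmetic can be sketched in a few lines of Python. The tier limits, prices, and the 25% figure come from this page; the `quote` function, the assumption that the discount applies to the tier price, and the error for sites above 1,000 pages are illustrative guesses, not confirmed billing logic.

```python
# Tiers as listed on the page: (maximum pages, price in US dollars).
TIERS = [(50, 5.00), (200, 10.00), (1000, 25.00)]
CACHED_DISCOUNT = 0.25  # 25% off for already converted (cached) sites


def quote(page_count: int, cached: bool = False) -> float:
    """Return the price for a conversion of page_count pages."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    for max_pages, price in TIERS:
        if page_count <= max_pages:
            # Assumption: the cached discount is taken off the tier price.
            return round(price * (1 - CACHED_DISCOUNT), 2) if cached else price
    raise ValueError("more than 1,000 pages: no tier is listed for this size")
```

Under these assumptions, a 186-page site falls in the $10 tier, and a cached 12-page site would cost $3.75.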

Can I try it before converting my own docs?

Yes. You can download free pre-converted Markdown exports from the Free Downloads page (for example: Tailwind CSS docs, OpenAI API overview, and X API docs) to test your NotebookLM, ChatGPT, Copilot, or RAG workflow.

Do I need to install anything?

No. SiteToMarkdown is entirely web-based—you paste a URL, convert, and download the final Markdown file.

How do I keep the Markdown up to date when docs change?

Re-run the same docs URL, sitemap, or feed in SiteToMarkdown, then replace the older Markdown file in your tool or repo. This is a simple way to keep NotebookLM sources, Copilot context files, and RAG datasets aligned with the latest docs.
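
If you keep the export in a repo, a small script can replace the tracked file only when a fresh export actually differs. The Python sketch below compares SHA-256 digests. The function name and file layout are hypothetical, and it assumes you have already downloaded the new export yourself.

```python
import hashlib
from pathlib import Path


def replace_if_changed(new_export: Path, tracked_file: Path) -> bool:
    """Copy new_export over tracked_file only when the content differs.

    Returns True if tracked_file was created or updated.
    """
    new_bytes = new_export.read_bytes()
    if tracked_file.exists():
        old_digest = hashlib.sha256(tracked_file.read_bytes()).hexdigest()
        new_digest = hashlib.sha256(new_bytes).hexdigest()
        if old_digest == new_digest:
            return False  # docs unchanged, nothing to commit or re-upload
    tracked_file.parent.mkdir(parents=True, exist_ok=True)
    tracked_file.write_bytes(new_bytes)
    return True
```

A `False` return means the docs have not changed since the last export, so there is nothing to commit or upload again.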