ghost-scraper

The ghost-scraper MCP server extracts structured content from Ghost CMS sites: posts, pages, authors, and tags. It handles pagination and metadata retrieval via API calls. Data analysts and developers use it for content archiving, migration, and dataset building from Ghost blogs.

ghost-cms
web-scraping
data-extraction

Overview

ghost-scraper is an MCP server for scraping data from sites built on the Ghost publishing platform. Ghost is an open-source headless CMS used for blogs and newsletters. The server gives programmatic access to site content and converts web pages into structured JSON without browser automation.

Key Capabilities

  • Post and page extraction: Retrieves titles, bodies, excerpts, feature images, and publish dates from individual or listed content.
  • Author and tag data: Pulls user profiles, bios, and categorization metadata.
  • Site navigation: Follows sitemaps, RSS feeds, or paginated endpoints to crawl full collections.

The server's tools are not enumerated here; the only name that appears on this page is extract_posts, under Use Cases. The server exposes scraping endpoints tailored to Ghost's API-like structure and to its rendered frontend pages.
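To make the pagination handling concrete, here is a minimal Python sketch of walking a paginated post listing. It assumes each page of results follows the envelope used by Ghost's Content API, where `meta.pagination.next` holds the next page number or null on the last page. Whether ghost-scraper returns this exact shape is an assumption, and `iter_posts`, `fake_fetch`, and the sample pages are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_posts(fetch_page: Callable[[int], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every post by following meta.pagination.next until it is None."""
    page = 1
    while page is not None:
        payload = fetch_page(page)
        yield from payload.get("posts", [])
        # Assumed envelope: the next page number, or None on the last page.
        page = payload.get("meta", {}).get("pagination", {}).get("next")


# Stand-in for a real request: a hypothetical two-page collection.
_FAKE_PAGES: Dict[int, Page] = {
    1: {
        "posts": [{"slug": "a"}, {"slug": "b"}],
        "meta": {"pagination": {"page": 1, "pages": 2, "next": 2}},
    },
    2: {
        "posts": [{"slug": "c"}],
        "meta": {"pagination": {"page": 2, "pages": 2, "next": None}},
    },
}


def fake_fetch(page: int) -> Page:
    """Return a canned page instead of calling a live site."""
    return _FAKE_PAGES[page]


slugs = [post["slug"] for post in iter_posts(fake_fetch)]
```

Passing the fetch function in as a parameter keeps the loop independent of how a page is actually retrieved, so the same generator could wrap an MCP tool call or a direct HTTP request.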

Use Cases

  1. Content migration: Scrape all posts from a Ghost site with extract_posts, then import them into another CMS such as WordPress.

  2. Dataset creation: Collect articles, tags, and authors from multiple Ghost blogs for training NLP models on publishing content.

  3. Archiving discontinued sites: Pull a site's full post history, including metadata, to preserve the blog before it shuts down.

  4. Competitive analysis: Extract publish frequency and topics from competitor Ghost newsletters.
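As an illustration of the analysis use case, the following Python sketch computes posts per month and the most common tags from already-extracted post records. It assumes each record carries a `published_at` ISO 8601 timestamp and a `tags` list of objects with a `name` field, as in Ghost's Content API; the actual fields returned by ghost-scraper may differ. The function names and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Dict, Iterable, List, Tuple

Post = Dict[str, Any]


def posts_per_month(posts: Iterable[Post]) -> Dict[str, int]:
    """Count posts by 'YYYY-MM' of their published_at timestamp."""
    counts: Counter = Counter()
    for post in posts:
        published = post.get("published_at")
        if not published:
            continue  # skip records with no publish date, e.g. drafts
        # Older Python versions reject a trailing "Z" in fromisoformat.
        stamp = datetime.fromisoformat(published.replace("Z", "+00:00"))
        counts[stamp.strftime("%Y-%m")] += 1
    return dict(counts)


def top_tags(posts: Iterable[Post], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent tag names with their counts."""
    counts = Counter(
        tag["name"] for post in posts for tag in post.get("tags", [])
    )
    return counts.most_common(n)


# Made-up sample records standing in for extracted posts.
sample_posts: List[Post] = [
    {"published_at": "2024-01-05T09:00:00.000+00:00", "tags": [{"name": "News"}]},
    {
        "published_at": "2024-01-20T09:00:00.000+00:00",
        "tags": [{"name": "News"}, {"name": "Product"}],
    },
    {"published_at": "2024-02-02T09:00:00.000Z", "tags": [{"name": "News"}]},
    {"published_at": None, "tags": []},
]

monthly = posts_per_month(sample_posts)
leading_tag = top_tags(sample_posts, n=1)
```

Months are taken from the timestamp as written, so a site that publishes near midnight in a non-UTC offset may need its timestamps normalized to a single time zone before counting.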

Who This Is For

Data analysts building corpora from online publications, developers automating content pipelines, researchers studying blog trends, and site admins migrating Ghost instances. Basic knowledge of API integration is required.

Updated Apr 8, 2026