Ghost Upload Content Manipulation Playbook
This document describes how content is transformed when syncing from the Quarto site to Ghost.
Processing Pipeline Overview
Quarto HTML → Extract → Pre-process → Parse → Transform → Clean → Upload
1. Content Extraction (extract_article_content)
- Reads local HTML file from public/ghost-content/posts/{slug}/index.html
- Extracts OpenGraph metadata (og:image, og:description, etc.)
- Detects special content types before parsing:
  - Observable JS (has_observable_js)
  - Plotly charts (has_plotly)
  - Code annotations (has_code_annotations)
  - Code folds (has_code_folds)
  - LaTeX math (has_math)
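
The detection step can be pictured as a set of substring checks on the raw HTML. This is a minimal sketch, not the tool's actual code: the `ContentFlags` struct and `detect_content_flags` function are hypothetical names, and the marker strings are assumptions, apart from `data-code-annotations` and `class="code-fold"`, which appear elsewhere in this playbook.

```rust
/// Flags recording which special content types a page contains.
/// (Hypothetical struct; the field names mirror the has_* checks listed above.)
#[derive(Debug, Default, PartialEq)]
pub struct ContentFlags {
    pub has_observable_js: bool,
    pub has_plotly: bool,
    pub has_code_annotations: bool,
    pub has_code_folds: bool,
    pub has_math: bool,
}

/// Scan the raw HTML for marker substrings before any DOM parsing happens.
/// NOTE: the "ojs-cell", "plotly" and `class="math` markers are assumptions
/// made for illustration only.
pub fn detect_content_flags(html: &str) -> ContentFlags {
    ContentFlags {
        has_observable_js: html.contains("ojs-cell"),
        has_plotly: html.contains("plotly"),
        has_code_annotations: html.contains("data-code-annotations"),
        has_code_folds: html.contains("class=\"code-fold\""),
        has_math: html.contains("class=\"math"),
    }
}
```

Running detection on the raw string keeps it independent of whatever the HTML parser later drops.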
2. Pre-Processing (Before HTML Parsing)
Order matters - these run before the HTML is parsed into a DOM:
| Function | Purpose | Why Pre-parse? |
|---|---|---|
| embed_plotly_charts | Extract Plotly data and embed as HTML cards | Scripts would be lost during parsing |
| embed_ojs_cells | Wrap OJS placeholder divs | Preserve empty divs Ghost would strip |
| embed_videos | Wrap <video> tags, add autoplay/loop | Ghost strips video tags |
| embed_code_folds | Wrap <details class="code-fold"> | Ghost strips details/summary |
| embed_cell_outputs | Wrap <div class="cell-output"> | Preserve classes for CSS styling |
3. Post-Processing (After HTML Parsing)
- Extract content using selectors: #quarto-content main → body
- style_callout_blocks - Convert Quarto callouts to Ghost’s kg-callout-card format
- process_code_annotations - Transform annotated code into custom HTML with markers
4. Content Cleaning (clean_content)
| Transformation | Regex/Method | Purpose |
|---|---|---|
| Inline SVGs → Data URI | RE_SVG + convert_inline_svgs | Ghost strips inline SVG |
| Large SVGs → Placeholder | Size > 500KB | Upload separately to Ghost |
| Relative img URLs → Absolute | RE_IMG_SRC | Fix paths for Ghost |
| Relative asset hrefs → Absolute | RE_HREF | Fix asset links |
| Remove Quarto attributes | RE_QUARTO_ATTR ({#id}, {.class}) | Clean up Quarto syntax |
| Remove fig-alt markers | RE_FIG_ALT ({fig-alt="..."}) | Clean up |
| Remove empty paragraphs | RE_EMPTY_P (<p>\s*</p>) | Clean up |
| Ensure img alt text | ensure_image_alt_text | Accessibility |
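
Two of the cleaning steps above can be sketched with the standard library alone. The tool itself uses regexes (RE_EMPTY_P, RE_IMG_SRC); the functions below are hand-written stand-ins with hypothetical names, and the `base_url` parameter is an assumption about how the site origin is supplied.

```rust
/// Remove `<p>` elements that contain only whitespace
/// (std-only stand-in for the `<p>\s*</p>` pattern).
pub fn remove_empty_paragraphs(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(start) = rest.find("<p>") {
        let after_open = &rest[start + 3..];
        if let Some(after_close) = after_open.trim_start().strip_prefix("</p>") {
            // Empty paragraph: keep what precedes it, drop the element itself.
            out.push_str(&rest[..start]);
            rest = after_close;
        } else {
            // Non-empty paragraph: keep the opening tag and keep scanning.
            out.push_str(&rest[..start + 3]);
            rest = after_open;
        }
    }
    out.push_str(rest);
    out
}

/// Rewrite root-relative `src="/..."` attributes against `base_url`.
/// Protocol-relative (`//host/...`) and absolute URLs are left untouched.
/// Simplification: this matches any `src` attribute, not only those on <img>.
pub fn absolutize_src(html: &str, base_url: &str) -> String {
    let base = base_url.trim_end_matches('/');
    let needle = "src=\"/";
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(pos) = rest.find(needle) {
        let after = &rest[pos + needle.len()..];
        out.push_str(&rest[..pos]);
        if after.starts_with('/') {
            // `src="//cdn..."`: emit the attribute start unchanged.
            out.push_str(needle);
        } else {
            out.push_str("src=\"");
            out.push_str(base);
            out.push('/');
        }
        rest = after;
    }
    out.push_str(rest);
    out
}
```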
Code Injection System
Per-post code is injected into codeinjection_head based on content detection:
| Condition | Injection Constant | Purpose |
|---|---|---|
| has_code_blocks | PRISM_CODE_INJECTION | Syntax highlighting + language detection |
| has_plotly | PLOTLY_CODE_INJECTION | Plotly.js CDN |
| has_annotations | ANNOTATION_CODE_INJECTION | CSS + click-to-highlight JS |
| has_code_folds | CODE_FOLD_CSS_INJECTION | Toggle styling + dark mode |
| has_math | MATHJAX_CODE_INJECTION | MathJax 3 for LaTeX rendering |
| has_ojs | Custom OJS runtime | Links to quarto-ojs runtime from source site |
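
One way to picture the table above is as a list of (condition, constant) pairs that are filtered and joined. This is only a sketch: the constant bodies are placeholders, the `Detected` struct and `build_codeinjection_head` function are hypothetical, and only three of the six conditions are shown.

```rust
// Placeholder payloads: the real constants hold full <script>/<style> markup.
const PRISM_CODE_INJECTION: &str = "<!-- prism -->";
const PLOTLY_CODE_INJECTION: &str = "<!-- plotly -->";
const MATHJAX_CODE_INJECTION: &str = "<!-- mathjax -->";

/// Subset of the detection results (hypothetical struct for illustration).
pub struct Detected {
    pub has_code_blocks: bool,
    pub has_plotly: bool,
    pub has_math: bool,
}

/// Concatenate the snippets whose condition is true.
/// Returns None when nothing applies, so the post gets no injection at all.
pub fn build_codeinjection_head(d: &Detected) -> Option<String> {
    let candidates = [
        (d.has_code_blocks, PRISM_CODE_INJECTION),
        (d.has_plotly, PLOTLY_CODE_INJECTION),
        (d.has_math, MATHJAX_CODE_INJECTION),
    ];
    let parts: Vec<&str> = candidates
        .iter()
        .filter(|(enabled, _)| *enabled)
        .map(|(_, snippet)| *snippet)
        .collect();
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("\n"))
    }
}
```

Keeping the pairs in one array means a new content type only adds one row here.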
Ghost HTML Card Pattern
Ghost strips many HTML elements. The workaround is wrapping in “HTML cards”:
<!--kg-card-begin: html-->
<preserved-content>
<!--kg-card-end: html-->

Used for: videos, code-folds, cell-outputs, OJS cells, Plotly charts.
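
The wrapping itself is a small string operation. A minimal sketch, assuming a helper named `wrap_in_html_card` (a hypothetical name; the embed_* functions may build the markers differently):

```rust
/// Surround a fragment with Ghost's HTML-card comment markers so that
/// Ghost keeps the markup verbatim instead of sanitising it.
pub fn wrap_in_html_card(fragment: &str) -> String {
    format!("<!--kg-card-begin: html-->\n{fragment}\n<!--kg-card-end: html-->")
}
```

For example, `wrap_in_html_card("<video src=\"clip.mp4\" autoplay loop></video>")` yields the video tag enclosed between the two marker comments.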
Image Handling
Small SVGs (< 500KB)
- Optimize (optimize_svg): remove comments, metadata, reduce decimal precision
- Base64 encode
- Embed as <img src="data:image/svg+xml;base64,...">
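
The encode-and-embed steps can be sketched without external crates. The hand-rolled encoder below exists only to keep the example self-contained (a real build would more likely depend on a base64 crate), and `svg_to_img_tag` is a hypothetical name. The optimization step is omitted.

```rust
const B64_ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 (RFC 4648) with `=` padding.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64_ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(B64_ALPHABET[(n >> 12) as usize & 63] as char);
        // Positions that carry no input bits are padded with '='.
        out.push(if chunk.len() > 1 { B64_ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64_ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Turn an (already optimized) inline SVG into an <img> with a data URI.
pub fn svg_to_img_tag(svg: &str) -> String {
    format!(
        "<img src=\"data:image/svg+xml;base64,{}\">",
        base64_encode(svg.as_bytes())
    )
}
```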
Large SVGs (> 500KB)
- Insert placeholder: __LARGE_SVG_PLACEHOLDER_N__
- Upload to Ghost via API (upload_svg_to_ghost)
- Replace placeholder with Ghost image URL
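
The placeholder round trip might look like the sketch below. Several details are assumptions: that 500KB means 500 * 1024 bytes, that N counts from 0, and that the uploaded URLs arrive in placeholder order. The upload call (upload_svg_to_ghost) is not shown, and the function names here are hypothetical.

```rust
/// Assumption: "500KB" is read as 500 * 1024 bytes.
const LARGE_SVG_THRESHOLD_BYTES: usize = 500 * 1024;

/// True when the SVG is too large to inline as a data URI.
pub fn is_large_svg(svg: &str) -> bool {
    svg.len() > LARGE_SVG_THRESHOLD_BYTES
}

/// Placeholder text for the N-th large SVG (0-based here, by assumption).
pub fn placeholder_for(index: usize) -> String {
    format!("__LARGE_SVG_PLACEHOLDER_{index}__")
}

/// After the uploads finish, swap each placeholder for the URL Ghost returned.
/// `uploaded_urls[i]` is taken to belong to placeholder `i`.
pub fn resolve_placeholders(html: &str, uploaded_urls: &[String]) -> String {
    let mut out = html.to_string();
    for (index, url) in uploaded_urls.iter().enumerate() {
        out = out.replace(&placeholder_for(index), url);
    }
    out
}
```

The trailing double underscore keeps `..._1__` from matching inside `..._10__`.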
Feature Images
Priority order:
1. og:image from HTML
2. media:content from RSS
3. Local preview image (preview.png, banner.jpg, *-preview.*)
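
The fallback chain maps naturally onto `Option` combinators. A minimal sketch with a hypothetical `pick_feature_image` function; treating blank values as missing is an assumption, not something the source states.

```rust
/// Treat blank strings as absent (assumption: empty metadata means "not set").
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Return the first available candidate in priority order:
/// og:image, then RSS media:content, then a local preview file.
pub fn pick_feature_image<'a>(
    og_image: Option<&'a str>,
    rss_media_content: Option<&'a str>,
    local_preview: Option<&'a str>,
) -> Option<&'a str> {
    non_empty(og_image)
        .or_else(|| non_empty(rss_media_content))
        .or_else(|| non_empty(local_preview))
}
```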
Mobiledoc Format
Ghost uses mobiledoc internally. The tool generates:
{
"version": "0.3.1",
"markups": [],
"atoms": [],
"cards": [["html", {"html": "..."}]],
"sections": [[10, 0]]
}

For posts with uploaded images, native image cards are interspersed with HTML cards.
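
For the single-card case, the document can be assembled as a string. This sketch hand-escapes the HTML to stay dependency-free; a real implementation would more plausibly serialize through a JSON library. `mobiledoc_for_html` and `json_escape` are hypothetical names.

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a mobiledoc 0.3.1 document holding one HTML card.
/// `[10, 0]` in "sections" means: card section (type 10) using card index 0.
pub fn mobiledoc_for_html(html: &str) -> String {
    format!(
        r#"{{"version":"0.3.1","markups":[],"atoms":[],"cards":[["html",{{"html":"{}"}}]],"sections":[[10,0]]}}"#,
        json_escape(html)
    )
}
```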
Known Limitations / Missing Features
Not Currently Handled
- Tables - Ghost may strip complex tables; may need HTML card wrapping
- Footnotes - Quarto footnote markup may not survive Ghost processing
Adding New Content Type Support
Pattern to follow:
- Detection: Add has_<feature>(content: &str) -> bool function
- Regex: Add static RE_<FEATURE>: LazyLock<Regex> if needed
- Embedding: Add embed_<feature>(content: &str) -> String if Ghost strips it
- Code Injection: Add const <FEATURE>_CODE_INJECTION: &str for CSS/JS
- Wire up in extract_article_content and create_post_from_entry
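
As an illustration of the pattern, here is what support for tables (listed above as not yet handled) could look like. Everything in this sketch is hypothetical: the names, the `<table` marker, and the CSS. It uses plain string search instead of a regex, does not handle nested tables, and leaves out the wiring step.

```rust
/// Detection: does the content contain any table markup?
pub fn has_tables(content: &str) -> bool {
    content.contains("<table")
}

/// Embedding: wrap each <table>...</table> span in an HTML card.
/// Limitation: nested tables are not handled by this simple scan.
pub fn embed_tables(content: &str) -> String {
    const OPEN: &str = "<table";
    const CLOSE: &str = "</table>";
    let mut out = String::with_capacity(content.len());
    let mut rest = content;
    while let Some(start) = rest.find(OPEN) {
        // An unclosed table is left as-is rather than guessed at.
        let Some(close_offset) = rest[start..].find(CLOSE) else { break };
        let end = start + close_offset + CLOSE.len();
        out.push_str(&rest[..start]);
        out.push_str("<!--kg-card-begin: html-->\n");
        out.push_str(&rest[start..end]);
        out.push_str("\n<!--kg-card-end: html-->");
        rest = &rest[end..];
    }
    out.push_str(rest);
    out
}

/// Code injection: placeholder CSS for the wrapped tables.
pub const TABLE_CODE_INJECTION: &str =
    "<style>table { border-collapse: collapse; }</style>";
```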
Important Regex Safety Notes
Quarto Attribute Regex
The RE_QUARTO_ATTR regex must require # or . prefix to avoid stripping LaTeX math:
// CORRECT: Only matches {#id} or {.class}
r"\{[#.][a-zA-Z][^}]*\}"
// WRONG: Would strip LaTeX like {aligned}, {Moon}, {Earth}
r"\{[#.]?[a-zA-Z][^}]*\}"

Code Annotation Regex
The outer_cell_regex must require data-code-annotations attribute to avoid matching across sections:
// CORRECT: Only matches cells with data-code-annotations
r#"<div[^>]*class="[^"]*cell[^"]*"[^>]*data-code-annotations="[^"]*"[^>]*>..."#
// WRONG: Matches any cell div, may capture content between cells
r#"<div[^>]*class="[^"]*cell[^"]*"[^>]*>..."#

Debugging Tips
- Run with --verbose for detailed logging
- Run with --dry-run to see what would happen
- Check Ghost Admin → Post → Code injection to see injected CSS/JS
- View page source on Ghost to verify HTML cards preserved content
- If math content disappears, check regex patterns aren’t matching LaTeX brace syntax
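
To check the last tip in isolation, the prefix rule behind RE_QUARTO_ATTR can be reproduced without the regex crate. This is a hand-written stand-in with a hypothetical name, intended to behave like the pattern `\{[#.][a-zA-Z][^}]*\}`. It is not the tool's implementation.

```rust
/// Strip Quarto attribute blocks such as {#id} or {.class}.
/// A brace group is removed only when it starts with '#' or '.',
/// followed by an ASCII letter, so LaTeX groups like {aligned} survive.
pub fn strip_quarto_attrs(content: &str) -> String {
    let mut out = String::with_capacity(content.len());
    let mut rest = content;
    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let candidate = &rest[open..];
        let mut chars = candidate[1..].chars();
        let prefix_ok = matches!(chars.next(), Some('#') | Some('.'));
        let letter_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic());
        match candidate.find('}') {
            // Full match: skip everything up to and including the closing brace.
            Some(close) if prefix_ok && letter_ok => rest = &candidate[close + 1..],
            // Not a Quarto attribute: keep the brace and continue after it.
            _ => {
                out.push('{');
                rest = &candidate[1..];
            }
        }
    }
    out.push_str(rest);
    out
}
```

Feeding a post's math snippets through a check like this is a quick way to confirm that a pattern change has not started eating LaTeX braces.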