
Migrate from Paligo + Webpack to Docusaurus 3 #3

Open
matenadasdi wants to merge 3 commits into main from claude/festive-shamir-31a238


@matenadasdi (Collaborator)

Summary

Replaces the custom Paligo CMS + Webpack pipeline with a Docusaurus 3 site backed by an XML → Markdown/MDX migration. Preserves every existing live URL (444/444), ports all integrations (OneTrust, GTM, Intercom, Google Search Widget), and matches the live site's visual identity and navigation.

What's in the box

Migration pipeline (migration/)

  • convert.py — Paligo XML export → Markdown/MDX. Walks the <e:structure> tree, resolves linked components by origin UUID, converts DocBook bodies (sections, admonitions, tables, procedures, code blocks, lists, cross-references). Detects topichead nodes (non-clickable categories) and linktype="import" subtrees (skipped — content is reused via xi:include).
  • build_url_map.py — extracts live page titles from out/en/*.html to drive slug assignment.
  • apply_slugs_v2.py — section-aware slug matcher; only applies a slug when the matched URL lives in the same top-level section, preventing cross-section misroutes for common titles like "Getting started".
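A minimal sketch of the section-aware check, written in TypeScript only for illustration; apply_slugs_v2.py itself is Python and is not reproduced here. It assumes the live URL map is a list of { title, path } entries and that the top-level section is the path segment right after the /en/ language prefix. LiveUrl, topSection, normalizeTitle, and matchSlug are hypothetical names.

```typescript
// Hypothetical shape of one live-URL entry (not the real build_url_map.py output).
interface LiveUrl {
  title: string; // page title scraped from the live HTML
  path: string; // e.g. "/en/bitrise-ci/workflows-and-pipelines/workflows/creating-a-workflow.html"
}

// Top-level section = first path segment after the language code ("en").
function topSection(path: string): string {
  const parts = path.split("/").filter((p) => p.length > 0);
  return parts.length > 1 ? parts[1] : "";
}

// Case- and whitespace-insensitive title comparison.
function normalizeTitle(title: string): string {
  return title.trim().toLowerCase().replace(/\s+/g, " ");
}

// Returns a live path only when the title matches AND the URL lives in the
// same top-level section as the page; otherwise no slug is applied.
function matchSlug(
  pageTitle: string,
  pageSection: string,
  liveUrls: LiveUrl[],
): string | undefined {
  const wanted = normalizeTitle(pageTitle);
  const hit = liveUrls.find(
    (u) => normalizeTitle(u.title) === wanted && topSection(u.path) === pageSection,
  );
  return hit?.path;
}
```

With two live "Getting started" pages in different sections, each converted page only picks up the URL from its own section; a title that matches only in another section returns undefined instead of a cross-section misroute.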

Docusaurus features wired up

  • Tabs — XML <para role="tabs"> + <procedure role="tab-content"> patterns become MDX <Tabs> / <TabItem> components.
  • Glossary — rendered as an HTML <dl> with anchor IDs. Inline glossterm references become <GlossTerm baseform="..."> components with a hover tooltip that clicks through to the glossary anchor.
  • Reusable content fragments via MDX partials — every xi:include target gets one file at src/partials/<readable-slug>.mdx (e.g. opening-the-workflow-editor.mdx). Section-context references render the partial as a JSX component; list-context references use a placeholder line that the Docusaurus markdown.preprocessor expands inline before MDX parsing — keeps step numbering continuous while preserving a single source of truth.
  • Topichead categories — Paligo's non-clickable navigation labels become _category_.json entries with link: null, so they expand children without navigating.
  • Sidebar labels driven by the live HTML toc (migration/live_nav_labels.json) so navigation reads identically to the published site — e.g. "Insights for the Build Cache" instead of just "Insights".
  • Image references rewritten to /img/_paligo/uuid-<hex>.<ext> using the Paligo HTML export as the canonical image store (665 files), since the XML's src/remap attrs often pointed to legacy filenames.
  • Portal landing page (src/pages/index.tsx) rebuilt as a React component with the six product cards, Bitrise brand styling, and search input wired for the Google Search Widget.
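A rough sketch of the MDX the Tabs conversion could emit, assuming convert.py has already reduced each <procedure role="tab-content"> to a tab label plus a Markdown body. Tabs and TabItem are the Docusaurus theme components imported from @theme/Tabs and @theme/TabItem; TabContent, tabValue, renderTabs, and TABS_IMPORTS are hypothetical names, not the script's actual code.

```typescript
// Hypothetical intermediate form: one entry per tab.
interface TabContent {
  label: string; // tab title, e.g. "iOS"
  markdown: string; // tab body, already converted to Markdown
}

// Stable TabItem value derived from the label ("Bitrise CLI" -> "bitrise-cli").
function tabValue(label: string): string {
  return label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function renderTabs(tabs: TabContent[]): string {
  const items = tabs.map((tab) =>
    [
      `<TabItem value="${tabValue(tab.label)}" label="${tab.label}">`,
      "", // blank lines keep the body parsed as Markdown blocks
      tab.markdown.trim(),
      "",
      "</TabItem>",
    ].join("\n"),
  );
  return ["<Tabs>", ...items, "</Tabs>"].join("\n");
}

// Each .mdx file that uses tabs needs these imports once near the top.
const TABS_IMPORTS = [
  "import Tabs from '@theme/Tabs';",
  "import TabItem from '@theme/TabItem';",
].join("\n");
```

The sketch assumes labels contain no double quotes; a real converter would need to escape them before interpolating into the label attribute.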

Integrations

  • OneTrust cookie consent banner (production only) via headTags
  • GTM via customFields.gtmId (env var)
  • Google Search Widget via customFields.genSearchWidgetConfigId (env var)
  • Intercom app ID wired via customFields.intercomAppId
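A minimal sketch of how these four integrations could be derived from the environment, assuming the function's result is spread into docusaurus.config.ts (for example with process.env as the argument). headTags and customFields are real Docusaurus 3 config fields; the environment variable names, the integrationConfig helper, and the exact OneTrust tag attributes are assumptions, and the OneTrust account's own snippet should be treated as authoritative.

```typescript
// Local mirrors of the two Docusaurus 3 config fields used here.
interface HeadTag {
  tagName: string;
  attributes: Record<string, string>;
}

interface IntegrationConfig {
  headTags: HeadTag[];
  customFields: Record<string, string | undefined>;
}

// Hypothetical env var names; the PR only says these values come from env vars.
function integrationConfig(env: Record<string, string | undefined>): IntegrationConfig {
  const isProd = env.NODE_ENV === "production";
  const headTags: HeadTag[] = [];
  const oneTrustId = env.ONETRUST_DOMAIN_ID;

  // OneTrust banner: production builds only.
  if (isProd && oneTrustId) {
    headTags.push({
      tagName: "script",
      attributes: {
        src: "https://cdn.cookielaw.org/scripttemplates/otSDKStub.js",
        "data-domain-script": oneTrustId,
        charset: "UTF-8",
      },
    });
  }

  return {
    headTags,
    // Read at runtime by the site's GTM, search widget, and Intercom code.
    customFields: {
      gtmId: env.GTM_ID,
      genSearchWidgetConfigId: env.GEN_SEARCH_WIDGET_CONFIG_ID,
      intercomAppId: env.INTERCOM_APP_ID,
    },
  };
}
```

Keeping the IDs in customFields rather than hard-coding them means a local build without the variables simply gets undefined values and no consent banner.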

Why this approach

  • Single source of truth for reusable content. The 688 unique reused chunks live in src/partials/ once. Editing the partial cascades to every consumer on next build — no more hunting through dozens of duplicated copies.
  • Build-time list expansion sidesteps the fundamental MDX/Markdown limitation that a <Component /> inside a numbered list renders as a block and breaks step numbering. The on-disk consumer file has 1. <Partial_X />; the build splices in the partial's actual list items before MDX parses.
  • Section-aware slug matcher prevents the collisions the original title-only matcher produced when pages in different sections share a common title such as "Getting started".
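A minimal sketch of the list-context splice as a pure string pass, assuming the placeholder is a list item whose only content is a self-closing Partial_* tag (the 1. <Partial_X /> form described above) and that each partial file contains nothing but list items. The partials map (component name to file text) and expandListPartials are hypothetical; a real preprocessor would read the files from src/partials/.

```typescript
// A list item whose whole content is a self-closing partial tag,
// e.g. "1. <Partial_OpeningTheWorkflowEditor />" (naming is hypothetical).
const LIST_PARTIAL = /^(\s*)\d+\.\s+<(Partial_\w+)\s*\/>\s*$/;

function expandListPartials(
  fileContent: string,
  partials: Record<string, string>,
): string {
  const out: string[] = [];
  for (const line of fileContent.split("\n")) {
    const m = LIST_PARTIAL.exec(line);
    const body = m ? partials[m[2]] : undefined;
    if (m === null || body === undefined) {
      out.push(line); // not a placeholder, or unknown partial: leave untouched
      continue;
    }
    const indent = m[1];
    for (const partialLine of body.trimEnd().split("\n")) {
      // Re-indent so a nested placeholder splices in at the right list depth.
      out.push(partialLine.length > 0 ? indent + partialLine : partialLine);
    }
  }
  return out.join("\n");
}
```

The splice keeps numbering continuous because CommonMark takes an ordered list's start number from its first item and ignores the numbers on later items, so the partial's own 1., 2. markers continue the surrounding sequence as long as they use the same delimiter and indentation.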

Reviewer notes

  • The legacy stack (paligo.js, webpack.config.js, middleware.js, build.js, src/html/, src/js/, etc.) is left in tree on purpose. Cutover happens in a separate PR after this one is verified end-to-end.
  • migration/docs/ is in .gitignore — it's the Python script's intermediate output, copied into docs/ by the bash pipeline.
  • static/img/_paligo/ holds the 665 Paligo-exported images. This is the largest part of the diff by raw line count.
  • src/partials/ has 592 generated MDX partial files. Each has a readable filename derived from the resource title (e.g. opening-the-workflow-editor.mdx).
  • The Docusaurus markdown.preprocessor in docusaurus.config.ts does several things in one pass: list-context partial expansion, JSX-tag escaping for placeholder text like <Git provider> or <connected-app-id>, and {kebab-case} placeholder escaping. Each step has an inline comment explaining the trigger pattern.
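A rough sketch of the two escaping steps, assuming backslash escapes (\< and \{) are an acceptable way to make MDX treat these characters as text. The regexes, the KNOWN_COMPONENTS allowlist, and the function names are hypothetical heuristics built around the two examples above, not the actual docusaurus.config.ts code; the fence detection is deliberately simplistic.

```typescript
// Hypothetical allowlist of real MDX components that must never be escaped.
const KNOWN_COMPONENTS = new Set(["Tabs", "TabItem", "GlossTerm"]);

// <connected-app-id>: lowercase kebab-case with at least one hyphen.
const KEBAB_TAG = /<([a-z][a-z0-9]*(?:-[a-z0-9]+)+)>/g;
// <Git provider>: two or more space-separated words, no attributes.
const WORDS_TAG = /<([A-Za-z]+(?: [A-Za-z]+)+)>/g;
// {connected-app-id}: MDX would otherwise evaluate this as a JS expression.
const KEBAB_BRACES = /\{([a-z][a-z0-9]*(?:-[a-z0-9]+)+)\}/g;

function escapeProse(text: string): string {
  return text
    .replace(KEBAB_TAG, "\\<$1>")
    .replace(WORDS_TAG, (match: string, inner: string) =>
      KNOWN_COMPONENTS.has(inner.split(" ")[0]) ? match : `\\<${inner}>`,
    )
    .replace(KEBAB_BRACES, "\\{$1\\}");
}

function escapePlaceholders(fileContent: string): string {
  let inFence = false;
  return fileContent
    .split("\n")
    .map((line) => {
      if (/^\s*(```|~~~)/.test(line)) {
        inFence = !inFence; // toggle on opening/closing fence
        return line;
      }
      if (inFence) return line; // leave fenced code untouched
      // Even-indexed segments are prose; odd-indexed ones are inline code.
      return line
        .split("`")
        .map((seg, i) => (i % 2 === 0 ? escapeProse(seg) : seg))
        .join("`");
    })
    .join("\n");
}
```

Real component tags such as <TabItem value="ios"> carry attributes with = and quotes, so neither tag pattern matches them; only attribute-free placeholders are escaped.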

Test plan

  • Run npm install && npx docusaurus build — should complete with zero broken-link errors
  • Confirm URL coverage: all 444 live URLs resolve
  • Spot-check the portal page (/) — six product cards render, search input is present, OneTrust banner shows in production builds
  • Spot-check a topichead category in the sidebar (e.g. "Getting started" under Bitrise Platform) — should expand/collapse on click without navigating
  • Spot-check a list-context partial (/en/bitrise-ci/workflows-and-pipelines/workflows/creating-a-workflow.html) — 7 steps render in continuous numbering, first 2 from "Opening the Workflow Editor"
  • Spot-check a glossary inline term — hovering shows the popover, clicking jumps to the glossary anchor
  • Edit src/partials/opening-the-workflow-editor.mdx, rebuild, confirm every consumer reflects the change

🤖 Generated with Claude Code

matenadasdi and others added 3 commits May 6, 2026 18:01
Replaces the custom Paligo CMS + Webpack pipeline with a Docusaurus 3 + MDX site
backed by an XML→Markdown migration script. Preserves every existing live URL
(444/444 coverage), the integrations layer (OneTrust, GTM, Intercom, Search
Widget), and the Bitrise visual identity.

Key components

- migration/convert.py — Paligo XML export → Markdown/MDX. Walks the structure
  tree, pulls content from canonical resources via UUID origin lookup,
  renders DocBook to Markdown (admonitions, tables, procedures, code blocks,
  cross-references). Detects topichead nodes (non-clickable categories) and
  linktype="import" subtrees (skipped — content is reused via xi:include).
- migration/build_url_map.py — extracts live page titles from the Paligo HTML
  output to drive slug assignment.
- migration/apply_slugs_v2.py — section-aware slug matcher; only applies a
  slug when the matched URL lives in the same top-level section, preventing
  cross-section misroutes for common titles like "Getting started".

Docusaurus features wired up

- Tabs / TabItem from XML <para role="tabs"> + <procedure role="tab-content">
  patterns, emitted as MDX components.
- Glossary as an HTML <dl> with anchor IDs; inline glossterms become
  <GlossTerm baseform="..."> components that show a hover tooltip and click
  through to the glossary anchor.
- Reusable content fragments via MDX partials. Every xi:include target gets
  one file at src/partials/<readable-slug>.mdx (e.g.
  opening-the-workflow-editor.mdx). Section-context references render the
  partial as a JSX component; list-context references use a placeholder line
  that the Docusaurus markdown.preprocessor expands inline before MDX parsing,
  which keeps step numbering continuous while preserving a single source of
  truth.
- Topichead components in the Paligo structure render as toggle-only sidebar
  categories (link: null) so they expand children without navigating, matching
  the live site behavior.
- Sidebar labels driven by migration/live_nav_labels.json extracted from the
  Paligo HTML toc, so navigation matches the published site (e.g. "Insights
  for the Build Cache" instead of just "Insights").
- Sidebar wiring: every category with an index.md gets link.type=doc; topichead
  categories get link: null.
- Image references rewritten to /img/_paligo/uuid-<hex>.<ext> using the Paligo
  HTML export as the canonical image store (665 files), since the XML's
  src/remap attributes often pointed to legacy filenames that don't exist on
  disk.
- Portal landing page (src/pages/index.tsx) rebuilt as a React component with
  six product cards, Bitrise brand styling, and the search input wired for the
  Google Search Widget.
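A minimal sketch of the category wiring described above, assuming the generator knows for each structure node whether it is a topichead and whether it has an index doc. label, position, and link (either null or { type: "doc", id }) are real _category_.json fields; StructureNode, toCategoryJson, and categoryFileContent are hypothetical names.

```typescript
// Hypothetical shape of a structure node after convert.py has parsed it.
interface StructureNode {
  label: string; // sidebar label, e.g. from live_nav_labels.json
  position: number; // order among siblings
  isTopichead: boolean; // Paligo non-clickable category
  indexDocId?: string; // doc id of the category's index.md, if present
}

// Subset of the fields Docusaurus reads from _category_.json.
interface CategoryJson {
  label: string;
  position: number;
  link?: { type: "doc"; id: string } | null;
}

function toCategoryJson(node: StructureNode): CategoryJson {
  const base: CategoryJson = { label: node.label, position: node.position };
  if (node.isTopichead) {
    // link: null -> the label only expands/collapses its children.
    return { ...base, link: null };
  }
  if (node.indexDocId !== undefined) {
    // Clicking the label opens the category's index page.
    return { ...base, link: { type: "doc", id: node.indexDocId } };
  }
  return base; // no link field: Docusaurus default behavior applies
}

// What a generator would write to <category-dir>/_category_.json.
function categoryFileContent(node: StructureNode): string {
  return JSON.stringify(toCategoryJson(node), null, 2) + "\n";
}
```

Setting link: null explicitly matters because Docusaurus can otherwise link a category to an index-style doc by convention, which would make a topichead navigable.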

Build pipeline

- docusaurus.config.ts handles markdown.preprocessor for: list-context partial
  expansion, JSX-tag escaping for placeholder text like <Git provider> or
  <connected-app-id>, and the kebab-case {placeholder} pattern that MDX would
  otherwise read as a JSX expression.
- format: 'detect' so .mdx files are parsed as MDX and .md files stay plain.
- Site builds clean (zero broken-link errors) at 444/444 URL coverage.

Removes nothing: the old paligo.js / webpack.config.js / middleware.js stay
in tree until cutover is approved.

Step-by-step instructions covering Node.js install, repo clone, npm install,
dev server, production build, and troubleshooting, written for someone with
no prior Docusaurus or Node.js knowledge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Single source of truth for anyone editing the docs, humans or LLMs. Ports
the Bitrise Style Guide (voice, capitalization, terminology, lists, titles,
punctuation) from Confluence into actionable rules with side-by-side examples.

Adds the codebase-specific mechanics that aren't on Confluence: frontmatter,
.md vs .mdx, partial imports and the build-time list-context expansion,
GlossTerm usage, image conventions, redirect rules, and the JSX-escape
pitfalls the migration surfaced.

Claude Code reads CLAUDE.md automatically every session, so well-written
guidance here becomes both a human reference and an LLM context primer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>