Compiler Passes

Overview

After the frontend parses markdown files into an intermediate representation (IR), SchemaFlux runs a pipeline of 12 ordered passes. Each pass transforms or enriches the IR. Passes may add data but must never remove data set by a previous pass.

Passes implement the Pass interface:

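// Pass is a single stage of the compiler pipeline.
type Pass interface {
    // Name identifies the pass.
    Name() string
    // Run transforms or enriches the IR in place.
    Run(p *ir.Program) error
}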

Passes can optionally declare dependencies by implementing DependencyAware:

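// DependencyAware is optional; a pass implements it to declare dependencies.
type DependencyAware interface {
    // Requires lists the passes that must run before this one.
    Requires() []string
}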

The registry validates ordering before execution begins -- if any pass is scheduled ahead of a pass it depends on, the build fails.
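
The ordering check can be pictured roughly as follows. This is a minimal sketch, not the registry's actual code: Program stands in for ir.Program, and validateOrder and namedPass are hypothetical names introduced only for illustration. It also assumes that Requires returns pass names.

```go
package main

import "fmt"

// Program stands in for ir.Program so the sketch compiles on its own.
type Program struct{}

// Pass and DependencyAware mirror the interfaces described above.
type Pass interface {
	Name() string
	Run(p *Program) error
}

type DependencyAware interface {
	Requires() []string
}

// namedPass is a hypothetical no-op pass used only for illustration.
type namedPass struct {
	name     string
	requires []string
}

func (n namedPass) Name() string       { return n.name }
func (n namedPass) Run(*Program) error { return nil }
func (n namedPass) Requires() []string { return n.requires }

// validateOrder returns an error if any pass is listed before a pass
// it requires. It assumes Requires returns pass names.
func validateOrder(passes []Pass) error {
	seen := make(map[string]bool, len(passes))
	for _, p := range passes {
		if dep, ok := p.(DependencyAware); ok {
			for _, req := range dep.Requires() {
				if !seen[req] {
					return fmt.Errorf("pass %q requires %q, which is not scheduled before it", p.Name(), req)
				}
			}
		}
		seen[p.Name()] = true
	}
	return nil
}
```

Checking the whole list up front means a misordered pipeline is rejected before any pass has touched the IR.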

Pass Order

The passes execute in this fixed order:

#   Pass             Description
1   SlugResolution   Generate URL-safe slugs from filenames or field values
2   Favorites        Resolve favorited entities from a JSON file
3   Sort             Order entities by configurable field and direction
4   Enrichment       Load external enrichment data from a JSON cache
5   Affiliate        Generate affiliate shopping links for configured providers
6   Taxonomy         Group entities by taxonomy fields, compute entries and letter groups
7   RelatedEntities  Score and rank related entities by shared taxonomy overlap
8   GraphEnrichment  Compute graph-level metadata (connections, clusters)
9   ContentAnalysis  Analyze body content: word count, reading time, TOC extraction
10  URLResolution    Compute canonical URLs, breadcrumbs, and relative paths
11  Schema           Generate JSON-LD structured data and Open Graph metadata
12  Validation       Validate required fields, field types, and enum constraints

Pass Details

1. SlugResolution

Generates a URL-safe slug for each entity. The slug source is configured by data.entity_slug.source:

  • "filename" -- derive the slug from the markdown filename (default)
  • "field:<name>" -- derive from a frontmatter field
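
The two slug sources could be handled along these lines. The exact normalization rules SchemaFlux applies are not documented here, so slugify below uses one common convention (lowercase ASCII letters and digits, hyphens between runs). Both function names are hypothetical.

```go
package main

import (
	"path/filepath"
	"strings"
)

// slugify lowercases s, keeps ASCII letters and digits, and collapses
// every other run of characters into a single hyphen. Leading and
// trailing separators are dropped.
func slugify(s string) string {
	var b strings.Builder
	pendingHyphen := false
	for _, r := range strings.ToLower(s) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			if pendingHyphen && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingHyphen = false
			b.WriteRune(r)
		} else {
			pendingHyphen = true
		}
	}
	return b.String()
}

// resolveSlug picks the slug input from a source setting of either
// "filename" or "field:<name>".
func resolveSlug(source, path string, fields map[string]string) string {
	if strings.HasPrefix(source, "field:") {
		return slugify(fields[strings.TrimPrefix(source, "field:")])
	}
	base := filepath.Base(path)
	return slugify(strings.TrimSuffix(base, filepath.Ext(base)))
}
```

With these rules, a file named "Grandma's Apple Pie.md" would resolve to grandma-s-apple-pie under the "filename" source.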

2. Favorites

Reads a JSON file specified by extra.favorites and marks matching entities as favorites. Favorited entities appear prominently on the homepage.

3. Sort

Sorts entities by a frontmatter field, configured by sort.field and sort.order (asc or desc). By default, entities are sorted alphabetically by title.
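
A sort of this kind might look like the sketch below. The Entity type and sortEntities are hypothetical, and the sketch assumes string-valued fields compared case-insensitively; the real pass may treat numbers or dates differently.

```go
package main

import (
	"sort"
	"strings"
)

// Entity is a hypothetical stand-in holding string frontmatter fields.
type Entity struct {
	Fields map[string]string
}

// sortEntities orders entities by field. An empty field falls back to
// "title", and any order other than "desc" is treated as ascending.
func sortEntities(entities []Entity, field, order string) {
	if field == "" {
		field = "title"
	}
	sort.SliceStable(entities, func(i, j int) bool {
		a := strings.ToLower(entities[i].Fields[field])
		b := strings.ToLower(entities[j].Fields[field])
		if order == "desc" {
			return a > b
		}
		return a < b
	})
}
```

A stable sort keeps entities with equal keys in their incoming order, which makes builds reproducible.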

4. Enrichment

Loads cached enrichment data from JSON files in the enrichment cache directory. This allows pre-computed data (e.g., AI-generated descriptions, nutritional info) to be merged into entity metadata without re-processing.

5. Affiliate

Generates affiliate links for configured providers. The pass scans entity fields at configured JSON paths (e.g., ingredients[].searchTerm) and builds URLs from provider templates, substituting affiliate tags read from environment variables.
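
Template substitution of this sort could be sketched as follows. The {term} and {tag} placeholder syntax, the function names, and the example host are all assumptions made for illustration; the actual template format is not specified here.

```go
package main

import (
	"net/url"
	"os"
	"strings"
)

// buildAffiliateURL fills a provider template. The {term} and {tag}
// placeholders are an assumed syntax. getenv is injected so the
// function can be exercised without touching the real environment.
func buildAffiliateURL(template, term, tagEnv string, getenv func(string) string) string {
	r := strings.NewReplacer(
		"{term}", url.QueryEscape(term),
		"{tag}", url.QueryEscape(getenv(tagEnv)),
	)
	return r.Replace(template)
}

// affiliateURLFromEnv reads the tag from the process environment.
func affiliateURLFromEnv(template, term, tagEnv string) string {
	return buildAffiliateURL(template, term, tagEnv, os.Getenv)
}
```

Reading the tag from the environment keeps affiliate identifiers out of the committed configuration.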

6. Taxonomy

The core grouping pass. For each configured taxonomy:

  • Collects entity field values
  • Groups entities into taxonomy entries
  • Filters entries below min_entities
  • Sorts entries alphabetically
  • Computes letter groups for A-Z navigation
  • Builds valid slug maps for fast lookup
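
The grouping steps above can be sketched roughly as follows. Entity, Entry, and buildTaxonomy are hypothetical names. The sketch covers collection, grouping, filtering, sorting, and letter groups, and leaves out the slug maps.

```go
package main

import (
	"sort"
	"strings"
	"unicode"
	"unicode/utf8"
)

// Entity is a hypothetical stand-in with multi-valued taxonomy fields.
type Entity struct {
	Slug   string
	Fields map[string][]string
}

// Entry is one taxonomy value together with the slugs of its members.
type Entry struct {
	Name     string
	Entities []string
}

// buildTaxonomy groups entities by the values of field, drops entries
// with fewer than minEntities members, sorts the rest alphabetically,
// and buckets entry names by first letter ("#" for non-letters).
func buildTaxonomy(entities []Entity, field string, minEntities int) ([]Entry, map[string][]string) {
	grouped := map[string][]string{}
	for _, e := range entities {
		for _, v := range e.Fields[field] {
			grouped[v] = append(grouped[v], e.Slug)
		}
	}
	entries := make([]Entry, 0, len(grouped))
	for name, slugs := range grouped {
		if len(slugs) >= minEntities {
			entries = append(entries, Entry{Name: name, Entities: slugs})
		}
	}
	sort.Slice(entries, func(i, j int) bool {
		a, b := strings.ToLower(entries[i].Name), strings.ToLower(entries[j].Name)
		if a != b {
			return a < b
		}
		return entries[i].Name < entries[j].Name // deterministic tie-break
	})
	letters := map[string][]string{}
	for _, en := range entries {
		key := "#"
		if r, _ := utf8.DecodeRuneInString(en.Name); unicode.IsLetter(r) {
			key = string(unicode.ToUpper(r))
		}
		letters[key] = append(letters[key], en.Name)
	}
	return entries, letters
}
```

Because letter groups are built from the already sorted entries, each bucket comes out in alphabetical order without a second sort.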

7. RelatedEntities

When related_entities.enabled is true, this pass computes related entities for each entity by scoring shared taxonomy membership. Entities that share more taxonomy values rank higher. The top N (default 3) are stored on the IR.
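
One simple way to score overlap is to count shared taxonomy values, as in the sketch below. The Entity shape, the flattened "taxonomy:value" keys, and the alphabetical tie-break are assumptions for illustration; the actual scoring may weight taxonomies differently.

```go
package main

import "sort"

// Entity is a hypothetical stand-in. Tags holds flattened
// "taxonomy:value" keys, e.g. "cuisine:italian".
type Entity struct {
	Slug string
	Tags map[string]bool
}

// related returns the slugs of up to topN entities that share the
// most taxonomy values with target. Entities with no overlap are
// skipped, and ties are broken by slug.
func related(target Entity, all []Entity, topN int) []string {
	type scored struct {
		slug  string
		score int
	}
	var candidates []scored
	for _, other := range all {
		if other.Slug == target.Slug {
			continue
		}
		score := 0
		for tag := range target.Tags {
			if other.Tags[tag] {
				score++
			}
		}
		if score > 0 {
			candidates = append(candidates, scored{other.Slug, score})
		}
	}
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].score != candidates[j].score {
			return candidates[i].score > candidates[j].score
		}
		return candidates[i].slug < candidates[j].slug
	})
	if len(candidates) > topN {
		candidates = candidates[:topN]
	}
	out := make([]string, 0, len(candidates))
	for _, c := range candidates {
		out = append(out, c.slug)
	}
	return out
}
```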

8. GraphEnrichment

Computes graph-level metadata: entity connections through shared taxonomy values, hierarchical relationships (domain/subdomain), and cluster membership. This data powers architecture visualizations on the homepage.

9. ContentAnalysis

Analyzes entity body content:

  • Word count
  • Reading time estimate (200 WPM)
  • Table of contents extraction from headings
  • Source code extraction (for code entity types)
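
The first three measurements above could be computed roughly as shown below. This is a simplified sketch with hypothetical names: it counts whitespace-separated tokens (markup included), rounds reading time up to whole minutes, recognizes only #-style headings, and does not skip code blocks. Source code extraction is left out.

```go
package main

import (
	"math"
	"strings"
)

// wordsPerMinute matches the 200 WPM reading speed stated above.
const wordsPerMinute = 200

// Heading is one table-of-contents entry.
type Heading struct {
	Level int
	Text  string
}

// Analysis holds the per-entity content metrics.
type Analysis struct {
	WordCount      int
	ReadingMinutes int
	TOC            []Heading
}

// analyze counts whitespace-separated tokens, estimates reading time
// (rounded up, which is an assumption), and collects headings that
// start with one to six '#' characters followed by a space.
func analyze(body string) Analysis {
	a := Analysis{WordCount: len(strings.Fields(body))}
	a.ReadingMinutes = int(math.Ceil(float64(a.WordCount) / wordsPerMinute))
	for _, line := range strings.Split(body, "\n") {
		text := strings.TrimLeft(line, "#")
		level := len(line) - len(text)
		if level >= 1 && level <= 6 && strings.HasPrefix(text, " ") {
			a.TOC = append(a.TOC, Heading{Level: level, Text: strings.TrimSpace(text)})
		}
	}
	return a
}
```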

10. URLResolution

Computes canonical URLs, relative URLs, and breadcrumb navigation for each entity. URLs are based on the entity slug and the site's base URL.
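
As an illustration, the three outputs might be derived as in the sketch below. The /<section>/<slug>/ layout, the "Home" root crumb, and all type and function names are assumptions; the real URL scheme depends on site configuration.

```go
package main

import "strings"

// Crumb is one breadcrumb link.
type Crumb struct {
	Label string
	URL   string
}

// URLs groups the resolved addresses for one entity.
type URLs struct {
	Canonical   string
	Relative    string
	Breadcrumbs []Crumb
}

// resolveURLs assumes a /<section>/<slug>/ layout under baseURL.
// A trailing slash on baseURL is tolerated.
func resolveURLs(baseURL, section, slug, title string) URLs {
	rel := "/" + section + "/" + slug + "/"
	root := strings.TrimRight(baseURL, "/")
	return URLs{
		Canonical: root + rel,
		Relative:  rel,
		Breadcrumbs: []Crumb{
			{Label: "Home", URL: "/"},
			{Label: section, URL: "/" + section + "/"},
			{Label: title, URL: rel},
		},
	}
}
```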

11. Schema

Generates JSON-LD structured data and Open Graph metadata for each entity. The pass uses the configured structured_data.field_mappings to map frontmatter fields to Schema.org properties. It also generates SVG share images.
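
A field-mapping step for JSON-LD could look like the following sketch. buildJSONLD and its parameters are hypothetical, and the mapping is assumed to be a flat frontmatter-field to Schema.org-property table; nested types and Open Graph tags are not covered.

```go
package main

import "encoding/json"

// buildJSONLD copies mapped frontmatter fields onto Schema.org
// property names. Fields missing from the entity are skipped.
// mappings is assumed to go from frontmatter field to property name.
func buildJSONLD(schemaType string, mappings map[string]string, fields map[string]interface{}) ([]byte, error) {
	doc := map[string]interface{}{
		"@context": "https://schema.org",
		"@type":    schemaType,
	}
	for field, prop := range mappings {
		if v, ok := fields[field]; ok {
			doc[prop] = v
		}
	}
	// json.Marshal writes map keys in sorted order, so the output is stable.
	return json.Marshal(doc)
}
```

For a Recipe entity whose title maps to name, this produces an object with @context, @type, and name, ready to embed in a script tag of type application/ld+json.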

12. Validation

The final pass validates all entities against the declared field schema:

  • Required fields must be present
  • Field values must match declared types
  • Enum fields must use allowed values
  • Diagnostics are emitted for validation failures
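
The checks above can be sketched as follows. FieldSpec, Diagnostic, and validate are hypothetical names, and the type vocabulary ("string", "number", "bool") is an assumption rather than SchemaFlux's documented set.

```go
package main

import "fmt"

// FieldSpec is a hypothetical schema entry for one frontmatter field.
type FieldSpec struct {
	Name     string
	Type     string   // "string", "number", or "bool"
	Required bool
	Enum     []string // allowed values; empty means unrestricted
}

// Diagnostic describes one validation failure.
type Diagnostic struct {
	Entity  string
	Field   string
	Message string
}

// typeOf maps a decoded value to the sketch's type vocabulary.
func typeOf(v interface{}) string {
	switch v.(type) {
	case string:
		return "string"
	case int, int64, float64:
		return "number"
	case bool:
		return "bool"
	}
	return "unknown"
}

// validate checks presence, type, and enum membership, in that order,
// and returns one diagnostic per failing field.
func validate(entity string, fields map[string]interface{}, schema []FieldSpec) []Diagnostic {
	var diags []Diagnostic
	for _, spec := range schema {
		v, ok := fields[spec.Name]
		if !ok {
			if spec.Required {
				diags = append(diags, Diagnostic{entity, spec.Name, "required field is missing"})
			}
			continue
		}
		if got := typeOf(v); got != spec.Type {
			diags = append(diags, Diagnostic{entity, spec.Name, fmt.Sprintf("expected %s, got %s", spec.Type, got)})
			continue
		}
		if s, isString := v.(string); isString && len(spec.Enum) > 0 {
			allowed := false
			for _, e := range spec.Enum {
				if e == s {
					allowed = true
					break
				}
			}
			if !allowed {
				diags = append(diags, Diagnostic{entity, spec.Name, fmt.Sprintf("value %q is not allowed", s)})
			}
		}
	}
	return diags
}
```

Collecting diagnostics instead of returning on the first failure lets a single build report every problem at once.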

IR Immutability

After all passes complete, the IR is considered frozen. The HTML backend reads the IR to generate output but never modifies it. This separation ensures backends can be swapped without affecting the transformation logic.

Timing

Each pass reports its execution duration. The compiler logs these timings at the end of a build, making it easy to identify performance bottlenecks.
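
Per-pass timing can be collected with a small wrapper around the run loop, as in this sketch. Program stands in for ir.Program, and Timing, funcPass, and runTimed are hypothetical names; how SchemaFlux actually records and prints durations is not shown here.

```go
package main

import (
	"fmt"
	"time"
)

// Program stands in for ir.Program so the sketch compiles on its own.
type Program struct{}

// Pass mirrors the interface described at the top of this page.
type Pass interface {
	Name() string
	Run(p *Program) error
}

// Timing records how long one pass took.
type Timing struct {
	Pass     string
	Duration time.Duration
}

// funcPass is a hypothetical adapter that turns a function into a Pass.
type funcPass struct {
	name string
	fn   func(*Program) error
}

func (f funcPass) Name() string         { return f.name }
func (f funcPass) Run(p *Program) error { return f.fn(p) }

// runTimed runs passes in order, records each duration, and stops at
// the first error. Timings gathered so far are still returned.
func runTimed(passes []Pass, p *Program) ([]Timing, error) {
	timings := make([]Timing, 0, len(passes))
	for _, pass := range passes {
		start := time.Now()
		err := pass.Run(p)
		timings = append(timings, Timing{Pass: pass.Name(), Duration: time.Since(start)})
		if err != nil {
			return timings, fmt.Errorf("pass %s: %w", pass.Name(), err)
		}
	}
	return timings, nil
}
```

Sorting the returned slice by Duration is one way to surface the slowest pass first.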