URL Matching

URL matching is how PageBridge connects Google Search Console URLs to Sanity documents. Understanding this process is key to getting accurate data in Studio.

How it works

The URLMatcher takes a list of URLs from GSC and attempts to match each one to a Sanity document:

  1. Fetch documents per content type — for each configured content type, queries Sanity documents and extracts their slug values
  2. Normalize URLs — strips www., query strings, and fragments from GSC URLs
  3. Try each content type — for each URL, tries to extract a slug using that content type's configured pathPrefix
  4. Exact match — checks if any Sanity document of that type has a matching slug
  5. Normalized match — tries with trailing slash variations
  6. Store diagnostics — if no match is found across all content types, stores the reason and similar slugs for debugging
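
The normalization and matching steps above can be sketched roughly as follows. This is a minimal illustration, not PageBridge's actual implementation: normalizeUrl, slashVariants, and matchSlug are hypothetical names, and the sketch assumes slugs are compared as plain strings.

```typescript
// Hypothetical helpers for illustration only; not PageBridge's real API.

/** Strip "www.", the query string, and the fragment from a GSC URL. */
function normalizeUrl(raw: string): string {
  const url = new URL(raw);
  url.hostname = url.hostname.replace(/^www\./, "");
  url.search = "";
  url.hash = "";
  return url.toString();
}

/** Return the slug without and with a trailing slash. */
function slashVariants(slug: string): string[] {
  const bare = slug.replace(/\/+$/, "");
  return [bare, `${bare}/`];
}

type Match =
  | { slug: string; confidence: "exact" | "normalized" }
  | { slug: null; confidence: "none" };

/** Try an exact match first, then fall back to trailing-slash variations. */
function matchSlug(candidate: string, knownSlugs: Set<string>): Match {
  if (knownSlugs.has(candidate)) {
    return { slug: candidate, confidence: "exact" };
  }
  for (const variant of slashVariants(candidate)) {
    if (knownSlugs.has(variant)) {
      return { slug: variant, confidence: "normalized" };
    }
  }
  return { slug: null, confidence: "none" };
}
```

In this sketch, a candidate of my-post/ checked against a document slug of my-post would come back with normalized confidence rather than exact.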

Configuration

URL matching behavior is controlled by the urlConfigs array in the gscSite document in Sanity. Each entry maps a content type to a specific URL structure:

// URLMatcher configuration (multiple content types with different paths)
{
  urlConfigs: [
    {
      contentType: 'post',
      slugField: 'slug',
      pathPrefix: '/blog/articles'     // Blog posts under /blog/articles
    },
    {
      contentType: 'page',
      slugField: 'slug'
      // No pathPrefix = root-level URLs like /about, /contact
    },
    {
      contentType: 'legalPage',
      slugField: 'slug',
      pathPrefix: '/legal'              // Legal content under /legal
    }
  ],
  baseUrl: 'https://example.com'        // Site base URL
}

Path prefix per content type

Each content type can have its own pathPrefix. For example, if blog posts are at https://example.com/blog/articles/my-post, set pathPrefix: '/blog/articles'. PageBridge will strip the base URL and prefix to extract my-post, then look for a matching document of that content type.

To match root-level URLs (like https://example.com/about), leave pathPrefix empty or omit it entirely.
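
As a rough sketch of the prefix stripping described here, the helper below removes the base URL and an optional prefix and returns what is left. extractSlug is a hypothetical name, and treating the entire remaining path as the slug for root-level content is an assumption of this sketch, not documented PageBridge behavior.

```typescript
// Hypothetical helper: strip the base URL and optional prefix, then
// return the remaining slug, or null when the URL does not fit.
function extractSlug(url: string, baseUrl: string, pathPrefix = ""): string | null {
  const base = baseUrl.replace(/\/+$/, "");
  if (!url.startsWith(base)) return null;

  const path = url.slice(base.length); // e.g. "/blog/articles/my-post"
  if (path !== "" && !path.startsWith("/")) return null; // different host

  const prefix = pathPrefix.replace(/\/+$/, "");
  const underPrefix =
    prefix === "" || path === prefix || path.startsWith(prefix + "/");
  if (!underPrefix) return null;

  // Drop the prefix plus any leading or trailing slashes.
  const rest = path.slice(prefix.length).replace(/^\/+|\/+$/g, "");
  return rest === "" ? null : rest;
}

extractSlug("https://example.com/blog/articles/my-post", "https://example.com", "/blog/articles"); // "my-post"
extractSlug("https://example.com/about", "https://example.com"); // "about"
```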

Slug field

PageBridge expects the slug field to be a Sanity slug type with a current property. The field name defaults to slug but can be changed with the slugField option in each urlConfigs entry.
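
For reference, a Sanity slug value is an object whose current property holds the slug string. The sketch below shows that shape and one way to read it by field name. readSlug is a hypothetical helper used for illustration, not part of PageBridge.

```typescript
// Shape of a Sanity slug value: an object with a `current` string.
interface SanitySlug {
  _type: "slug";
  current: string;
}

// Hypothetical helper: read the slug string from a document by field name.
function readSlug(doc: Record<string, unknown>, slugField = "slug"): string | undefined {
  const value = doc[slugField];
  if (typeof value === "object" && value !== null && "current" in value) {
    const current = (value as { current: unknown }).current;
    return typeof current === "string" ? current : undefined;
  }
  return undefined; // field missing or not a slug object
}

const slug: SanitySlug = { _type: "slug", current: "my-post" };
const post = { _type: "post", slug };
readSlug(post); // "my-post"
```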

Match confidence levels

Confidence   Description
exact        Slug extracted from URL exactly matches a Sanity document slug
normalized   Match found after normalizing trailing slashes or casing
fuzzy        Match found via Levenshtein distance (close but not exact)
none         No matching document found

Why URLs don't match

  • outside_path_prefix — the URL is not under the configured prefix (e.g., homepage, category pages, tag pages)
  • no_slug_extracted — could not parse a slug from the URL path
  • no_matching_document — slug was extracted but no Sanity document has that slug. Check for typos, deleted content, or draft-only documents.
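
To make the three reasons concrete, here is a minimal sketch of how a URL path could be classified. The reason strings come from the list above; explainUnmatched and its parameters are hypothetical, and the order of checks is an assumption rather than PageBridge's documented logic.

```typescript
type UnmatchedReason =
  | "outside_path_prefix"
  | "no_slug_extracted"
  | "no_matching_document";

// Hypothetical classifier. `path` is the URL path after the base URL has
// been removed, e.g. "/blog/articles/my-post". Returns null on a match.
function explainUnmatched(
  path: string,
  pathPrefix: string,
  slugs: Set<string>,
): UnmatchedReason | null {
  const prefix = pathPrefix.replace(/\/+$/, "");
  const underPrefix =
    prefix === "" || path === prefix || path.startsWith(prefix + "/");
  if (!underPrefix) return "outside_path_prefix";

  const slug = path.slice(prefix.length).replace(/^\/+|\/+$/g, "");
  if (slug === "") return "no_slug_extracted";

  return slugs.has(slug) ? null : "no_matching_document";
}
```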

Similar slug suggestions

For unmatched URLs, PageBridge calculates Levenshtein distance against all available slugs and suggests close matches. This helps identify typos or slight differences (e.g., my-postt vs my-post).
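
The following sketch shows the standard dynamic-programming Levenshtein distance and a simple way to rank suggestions with it. suggestSlugs, maxDistance, and limit are hypothetical names and defaults chosen for illustration. The thresholds PageBridge actually uses are not stated here.

```typescript
// Levenshtein distance using two rolling rows of the DP table.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: return the closest known slugs, nearest first.
function suggestSlugs(slug: string, known: string[], maxDistance = 2, limit = 3): string[] {
  return known
    .map((candidate) => ({ candidate, distance: levenshtein(slug, candidate) }))
    .filter(({ distance }) => distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance)
    .slice(0, limit)
    .map(({ candidate }) => candidate);
}

levenshtein("my-postt", "my-post"); // 1
```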

Debugging matches

Use the diagnose command to list unmatched URLs, or pass --diagnose-url to sync to inspect a single URL:

# See all unmatched URLs
pagebridge diagnose --site sc-domain:example.com

# Diagnose a specific URL during sync
pagebridge sync --site sc-domain:example.com --diagnose-url "https://example.com/blog/my-post"

See also