SKOS Mapping Decisions

GRCSchema crosswalks compliance vocabularies (DISA, NIST, RegGenome, …) through the canonical Shared vocabulary using five W3C-standard SKOS predicates: exactMatch, closeMatch, broadMatch, narrowMatch, and relatedMatch. Every mapping carries one of these predicates, and picking the right one matters: consumers reason about your data based on these labels, and auditors reason about your reasoning.

This page is the working summary of GRCSchema's published mapping framework: fourteen numbered decision rules and six anti-patterns, research-grounded and version-controlled, written up in Cougias, Auditable Compliance Crosswalks: A Rules-Based Approach to SKOS Mapping in the Era of LLM-Based Ontology Matching (2026), doi:10.13140/RG.2.2.15866.45768. That paper is the source of truth for how GRCSchema handles SKOS. The rules below cite their numbers so a challenged decision can point at the rule it applied.

The five predicates

exactMatch (R1)

Two terms are semantically interchangeable: identical extension and intension. A consumer may swap one for the other in any reasoning chain without changing the conclusion.

Example: shared:created ↔ dcterms:created. Both mean "the timestamp at which this artifact was created." Identical semantics, identical type, no meaningful gap.

exactMatch is the strongest claim in the system, and the most dangerous, because it is symmetric and transitive: assert A exactMatch B and B exactMatch C, and every conforming reasoner silently concludes A exactMatch C, whether or not any human ever examined that pair. Overclaims cascade.
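
To make the cascade concrete, here is a minimal sketch that computes the pairs a reasoner would infer from a set of asserted exactMatch statements. It is illustrative only: the function name and the ex: terms are hypothetical and are not part of GRCSchema's tooling.

```python
from itertools import combinations

def inferred_exact_matches(asserted):
    """Return pairs implied by symmetry and transitivity but never asserted."""
    parent = {}  # union-find: each exactMatch merges two equivalence classes

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for a, b in asserted:
        parent[find(a)] = find(b)

    # Group every term under the root of its equivalence class.
    classes = {}
    for term in list(parent):
        classes.setdefault(find(term), set()).add(term)

    asserted_pairs = {frozenset(pair) for pair in asserted}
    implied = set()
    for members in classes.values():
        for a, b in combinations(sorted(members), 2):
            if frozenset((a, b)) not in asserted_pairs:
                implied.add((a, b))
    return implied

# Two reviewed assertions produce a third pair nobody reviewed.
print(inferred_exact_matches([("ex:A", "ex:B"), ("ex:B", "ex:C")]))
# {('ex:A', 'ex:C')}
```

Each additional exactMatch can merge two whole classes, so the number of unreviewed pairs grows much faster than the number of assertions.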

So it carries two guards. R10: authoring an exactMatch requires the mapping's notes field to describe at least one reasoning case where substitution preserves truth. R8: the predicate is validated against a semantic-overlap score, and an exactMatch whose descriptions don't overlap strongly enough gets flagged and downgraded. Optimistic equivalence is the single most common crosswalk failure (anti-pattern A1), and when you're genuinely torn between exactMatch and closeMatch, the answer is closeMatch.
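
The two guards can be sketched as a single check. This is a hedged illustration, not GRCSchema's validator: the function name and return shape are assumptions, and the 0.9 default is the downgrade threshold quoted in the proposer section further down this page.

```python
def check_exact_match(notes, overlap_score, threshold=0.9):
    """Apply the R10 and R8 guards to a proposed exactMatch.

    Returns (predicate, findings). Hypothetical helper for illustration.
    """
    findings = []
    # R10: the notes field must describe a truth-preserving substitution case.
    if not notes or not notes.strip():
        findings.append("R10: notes must describe a substitution case")
    predicate = "exactMatch"
    # R8: weak description overlap flags the mapping and downgrades it.
    if overlap_score < threshold:
        findings.append("R8: overlap below threshold, downgraded")
        predicate = "closeMatch"
    return predicate, findings

print(check_exact_match("Sorting by either timestamp gives the same order.", 0.95))
# ('exactMatch', [])
```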

closeMatch (R2)

The two terms overlap substantially but differ in scope or granularity: interchangeable in some contexts, not all.

Example: cklb:id (DISA's integer checklist id) closeMatch shared:identifier. Both identify a record. They differ in datatype (integer vs URL) and addressability (local vs IRI), so they can't be swapped freely. For the practical purpose of "look this up," they're close enough.

One warning, straight from anti-pattern A3: closeMatch has formal semantics. It is not a polite way to say "I'm not sure." The distinction matters: doubt between exactMatch and closeMatch resolves to closeMatch, the weaker of the two (R11). But doubt about whether any meaningful relation exists at all resolves to relatedMatch or to no mapping, never to a closeMatch you can't defend.

broadMatch / narrowMatch (R3, R4)

The source term is hierarchically narrower than (broadMatch) or broader than (narrowMatch) the target. A specific implementation of a general concept maps narrowMatch in one direction and broadMatch in the other.

Example: cklb:groupId broadMatch shared:identifier – the CKLB groupId IS an identifier, but identifiers in the Shared vocabulary cover a wider scope.

These predicates are direction-sensitive, and inverting them is a known failure mode (anti-pattern A2). Hierarchical predicates also require structural evidence – shared lexical stems plus scope containment (R9) – not just a similarity score. Authoring tools accept a mapping entered from either end, and the JSON-LD output flips the predicate automatically when it states the mapping from the other side.
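
The direction rule can be sketched as a lookup table. In SKOS, broadMatch and narrowMatch are inverses of each other, while exactMatch, closeMatch, and relatedMatch are symmetric. The helper name below is hypothetical.

```python
# Predicate to use when the same mapping is stated from the other end.
INVERSE = {
    "broadMatch": "narrowMatch",
    "narrowMatch": "broadMatch",
    "exactMatch": "exactMatch",      # symmetric
    "closeMatch": "closeMatch",      # symmetric
    "relatedMatch": "relatedMatch",  # symmetric
}

def invert(source, predicate, target):
    """Restate (source predicate target) from the target's point of view."""
    return target, INVERSE[predicate], source

print(invert("cklb:groupId", "broadMatch", "shared:identifier"))
# ('shared:identifier', 'narrowMatch', 'cklb:groupId')
```

Swapping the endpoints without swapping the predicate is exactly the A2 inversion error.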

relatedMatch (R5)

The two terms are associatively linked: neither hierarchical nor equivalent.

Example: cklb:hostname relatedMatch shared:asset. A hostname names the asset a checklist was run on, but isn't itself an asset and isn't a kind of asset.

relatedMatch is the lowest-information predicate, and it earns its keep two ways: as the honest home for genuine association, and as the honest home for uncertainty (R11). What it must not become is a dumping ground for mappings nobody wanted to think about (anti-pattern A4).

When uncertain, go weaker (R11)

If you're torn between two predicates, pick the weaker one. The ladder runs exactMatch → closeMatch → relatedMatch, strongest to weakest. Stronger predicates carry stronger implications under reasoners, and a weak predicate can always be strengthened later with fresh deliberation. Strengthening is never done silently.
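
A minimal sketch of the ladder, assuming it is modeled as an ordered list. Both helper names are hypothetical. The hierarchical predicates are left out because R11's ladder, as stated above, covers only these three.

```python
# Weakest to strongest, per R11.
STRENGTH = ["relatedMatch", "closeMatch", "exactMatch"]

def weaker_of(a, b):
    """Resolve doubt between two candidate predicates by taking the weaker."""
    return min(a, b, key=STRENGTH.index)

def is_strengthening(old, new):
    """True when a re-grade moves up the ladder and so needs fresh deliberation."""
    return STRENGTH.index(new) > STRENGTH.index(old)

print(weaker_of("exactMatch", "closeMatch"))           # closeMatch
print(is_strengthening("relatedMatch", "closeMatch"))  # True
```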

Bridges and versions (R12)

Two of the three ways crosswalks decay are structural, and R12 exists for both. Re-versioning: underlying standards move (800-53 revisions, DCWF 1.0 to 2.0, FHIR R4 to R5), so every mapping's notes carry the versions it was authored against, the citation graph gets reconciled against current authority documents periodically, and an unresolved citation is treated as audit-blocking, not cosmetic. Bridge-layer collapse: when a known intermediate vocabulary sits between two endpoints (the DoD stack runs STIG → CCI → NIST 800-53, and assessment tooling keys on the CCI layer), don't author the layer-skipping shortcut just because the bridge isn't imported yet. File the bridge as pending and wait. A missing mapping is recoverable. A wrong-shape mapping that gets cited downstream is not.
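
The bridge-layer check can be sketched as follows, assuming the layer chain is known up front. The layer labels come from the DoD example above. The function name and the list representation are assumptions made for illustration.

```python
# Ordered layers of the DoD stack described above.
DOD_CHAIN = ["STIG", "CCI", "NIST 800-53"]

def skips_bridge(source_layer, target_layer, chain=DOD_CHAIN):
    """True when a mapping jumps over an intermediate layer in the chain."""
    i, j = chain.index(source_layer), chain.index(target_layer)
    return abs(i - j) > 1

print(skips_bridge("STIG", "NIST 800-53"))  # True: file the bridge as pending
print(skips_bridge("STIG", "CCI"))          # False: adjacent layers
```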

Compound requirements (R13)

SKOS models binary mappings: one source concept, one target concept. Compliance requirements are frequently conjunctive. A single control can demand minimum password length AND character-class diversity, and pretending that's several independent binary mappings loses the conjunction. When one source concept genuinely maps to a combination of targets, the mapping carries explicit n-ary annotation (R13).
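
To show why the conjunction matters, here is a purely hypothetical shape for an n-ary annotation. This page does not specify the format R13 prescribes, so every class name, field name, and ex: IRI below is an assumption made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CompoundMapping:
    """One source concept mapped to a combination of targets (hypothetical shape)."""
    source: str
    targets: tuple          # every target is part of the requirement
    operator: str = "AND"   # the conjunction is recorded explicitly
    rule: str = "R13"

    def satisfied_by(self, met_targets):
        """The source is covered only when all targets are met together."""
        return all(t in met_targets for t in self.targets)

password_control = CompoundMapping(
    source="ex:passwordControl",
    targets=("ex:minPasswordLength", "ex:characterClassDiversity"),
)
print(password_control.satisfied_by({"ex:minPasswordLength"}))  # False
```

Two independent binary mappings would report the control as half covered. The explicit conjunction reports it as not covered.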

Automated and LLM-proposed mappings (R7, R14)

Similarity scoring may propose predicates: very high similarity with an identical URI stem proposes exactMatch (reviewed carefully per R1), strong similarity proposes closeMatch, moderate proposes relatedMatch, and below that no mapping is proposed at all. These are proposals only. Hierarchical predicates always require structural evidence beyond similarity, and LLM-proposed predicates get no special trust: they pass through the same R8 validation gate as everything else.
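
The banding can be sketched as below. Only the 0.60 floor is quoted on this page, as the Tversky no-propose threshold. The 0.9 and 0.75 cut-offs are placeholder assumptions, not the calibrated R7 bands, and the function name is hypothetical.

```python
def propose_predicate(similarity, same_uri_stem,
                      exact_band=0.9, close_band=0.75, floor=0.60):
    """Propose a predicate from a similarity score, or None below the floor.

    exact_band and close_band are illustrative assumptions.
    Hierarchical predicates are never proposed from similarity alone.
    """
    if similarity < floor:
        return None  # no mapping is proposed at all
    if similarity >= exact_band and same_uri_stem:
        return "exactMatch"  # still reviewed per R1 and validated per R8
    if similarity >= close_band:
        return "closeMatch"
    return "relatedMatch"

print(propose_predicate(0.95, same_uri_stem=True))   # exactMatch
print(propose_predicate(0.95, same_uri_stem=False))  # closeMatch
print(propose_predicate(0.40, same_uri_stem=True))   # None
```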

How the proposer scores: Tversky first, embeddings as an opt-in extender

GRCSchema's auto-proposer (DR-006 Phase 3) is a hybrid. The first pass scores every candidate pair with Tversky similarity on tokenized descriptions: the scoring family R8 names explicitly for its validation gate, and the family R7's confidence bands were calibrated against (Faria 2013, Volz 2009). Tversky's bands are deterministic, reproducible across runs and across years, and inspectable: a challenger can see which tokens overlapped and which didn't, which is what makes a published mapping defensible at audit time. Pure cosine on embeddings can't make that promise. Providers silently update their embedding endpoints, dense vectors share so much natural-language baseline that the R7 thresholds lose their meaning without a separate calibration pass on our corpus, and "the cosine was 0.94" is a number nobody can argue with productively.
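
A minimal sketch of Tversky similarity on tokenized descriptions, to show what "inspectable" means in practice. The tokenizer and the alpha = beta = 0.5 weights are assumptions. With those weights the Tversky index reduces to the Dice coefficient, and GRCSchema's actual weights are not stated on this page.

```python
import re

def tokenize(text):
    """Lowercase bag-of-words tokenizer (an assumption, not the real one)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def tversky(a, b, alpha=0.5, beta=0.5):
    """Tversky index: |X & Y| / (|X & Y| + alpha*|X - Y| + beta*|Y - X|).

    Returns the score with the shared and differing tokens, so a
    challenger can see exactly what drove the number.
    """
    x, y = tokenize(a), tokenize(b)
    common = x & y
    denom = len(common) + alpha * len(x - y) + beta * len(y - x)
    score = len(common) / denom if denom else 0.0
    return score, sorted(common), sorted(x ^ y)

score, shared, differing = tversky(
    "timestamp at which the artifact was created",
    "date the artifact was created",
)
print(round(score, 3))  # 0.667
print(shared)           # ['artifact', 'created', 'the', 'was']
print(differing)        # ['at', 'date', 'timestamp', 'which']
```

The same inputs always produce the same score and the same token lists, which is what lets a challenge be argued token by token.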

But pure Tversky has a known failure mode that matters for compliance vocabularies specifically. DISA calls a thing an "asset," NIST calls it a "system," MITRE calls it a "target," and bag-of-words has nothing to say about any of it. The synonym-heavy pair scores near zero on Tversky, the proposer says "no mapping," and the cross-vocabulary bridge that should exist never gets authored. Silent missing mappings are the worst kind of audit gap because no audit cycle catches them: the bridge that should have existed is invisible to the auditor too.

So the proposer offers a second pass on demand. When the Tversky first pass returns fewer matches than the author expected, they can click "Find more matches with embeddings", which scores the remaining pairs (the ones that fell below Tversky's no-propose threshold) with embedding cosine on name + description. Embedding-derived proposals only land for pairs that (a) scored below Tversky's 0.60 threshold AND (b) scored above a calibrated embedding threshold. They arrive in the review table flagged as lower-confidence synonym candidates, and the author can accept them, override them, or skip them. R8 still applies: an embedding-proposed exactMatch whose Tversky-on-descriptions is below 0.9 gets downgraded to closeMatch exactly as if a human had proposed it.
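
The two-condition gate can be sketched as a filter. The vectors are supplied by the caller, so no embedding provider API is assumed. The 0.85 embedding threshold is a placeholder for the calibrated value, which this page does not state, and the function names and dictionary keys other than proposed_by and proposed_score are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def second_pass(pairs, tversky_floor=0.60, embedding_threshold=0.85):
    """Propose synonym candidates among pairs the Tversky pass dropped.

    pairs: iterable of (pair_id, tversky_score, vec_a, vec_b).
    embedding_threshold is an illustrative assumption.
    """
    proposals = []
    for pair_id, tversky_score, vec_a, vec_b in pairs:
        if tversky_score >= tversky_floor:
            continue  # condition (a) fails: the first pass already covered it
        score = cosine(vec_a, vec_b)
        if score > embedding_threshold:  # condition (b)
            proposals.append({
                "pair": pair_id,
                "proposed_by": "embedding",
                "proposed_score": score,
                "flag": "lower-confidence synonym candidate",
            })
    return proposals

pairs = [
    ("asset/system", 0.05, [1.0, 0.0], [1.0, 0.0]),      # dropped by Tversky, close vectors
    ("created/created", 0.95, [1.0, 0.0], [1.0, 0.0]),   # already proposed in pass one
    ("asset/hostname", 0.05, [1.0, 0.0], [0.0, 1.0]),    # dropped, and vectors disagree
]
print([p["pair"] for p in second_pass(pairs)])  # ['asset/system']
```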

The audit row records the proposing method, its score, and the rules that fired. The proposed_by column distinguishes similarity (Tversky) from embedding (the extender) from human (manual authoring) from llm (later, if a model-based proposer ships under R14). The proposed_score column records the score from the method that actually made the proposal. The rule_applied array records which numbered rules fired: ['R7','R8','R11'] for a closeMatch that started as an exactMatch and got R8-downgraded, ['R7','R9'] for a hierarchical predicate that satisfied the stem-and-scope check. When a challenge arrives a year from now, it can take the form "you applied R7 to a Tversky score of 0.93, but R8 should have downgraded it", and the maintainer can check that claim against the row: adjudicable, not a matter of taste.
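
An audit row along these lines could be modeled as below. The column names proposed_by, proposed_score, and rule_applied and the four proposer values come from the text above. The class itself, the remaining fields, and the example score are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PROPOSERS = {"similarity", "embedding", "human", "llm"}

@dataclass
class AuditRow:
    """Hypothetical in-memory shape of one audit row."""
    source: str
    target: str
    predicate: str
    proposed_by: str
    proposed_score: Optional[float]  # None when a human authored it directly
    rule_applied: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.proposed_by not in PROPOSERS:
            raise ValueError(f"unknown proposer: {self.proposed_by!r}")

# A closeMatch that started as an exactMatch and was downgraded under R8.
row = AuditRow(
    source="cklb:id",
    target="shared:identifier",
    predicate="closeMatch",
    proposed_by="similarity",
    proposed_score=0.88,  # illustrative value only
    rule_applied=["R7", "R8", "R11"],
)
print(row.rule_applied)  # ['R7', 'R8', 'R11']
```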

The framing matters: embeddings are an extender, not a replacement. The deterministic Tversky path is the canonical proposer because it's the one R7's bands were calibrated against, the one R8 requires for validation regardless, and the one a challenger can argue with token-by-token. The embedding pass is the synonym-recovery lift that compliance vocabularies need, bounded by the same validation gate the rest of the framework runs on, and surfaced as a deliberate authorial gesture rather than a default that has to be argued against per-row.

Decision flowchart

Are the two terms semantically interchangeable in every context? (R1)
├── Yes → exactMatch (write the R10 justification)
└── No
    │
    Is one term hierarchically narrower or broader than the other? (R3/R4)
    ├── Source is narrower → broadMatch
    ├── Source is broader  → narrowMatch
    └── No
        │
        Do they overlap substantially, differing mainly in scope
        or granularity? (R2)
        ├── Yes → closeMatch
        └── No
            │
            Genuinely associated? (R5)
            ├── Yes → relatedMatch
            └── Uncertain → relatedMatch, or no mapping at all (R11, A3)
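
The flowchart translates directly into a function, sketched here with hypothetical parameter names. The answers to the four questions remain human judgments, and the code only fixes the order in which they are asked. Where the flowchart allows either relatedMatch or no mapping under uncertainty, this sketch returns None unless association is affirmed.

```python
def choose_predicate(interchangeable, hierarchy, substantial_overlap, associated):
    """Walk the decision flowchart top to bottom.

    hierarchy: "narrower" or "broader" (source relative to target), or None.
    Returns a SKOS predicate name, or None for no mapping.
    """
    if interchangeable:
        return "exactMatch"   # R1, and the R10 justification must be written
    if hierarchy == "narrower":
        return "broadMatch"   # R3: source is narrower than target
    if hierarchy == "broader":
        return "narrowMatch"  # R4: source is broader than target
    if substantial_overlap:
        return "closeMatch"   # R2
    if associated:
        return "relatedMatch" # R5
    return None               # R11, A3: no defensible mapping

# cklb:groupId is an identifier, but shared:identifier covers more.
print(choose_predicate(False, "narrower", False, False))  # broadMatch
# A hostname names an asset without being one.
print(choose_predicate(False, None, False, True))         # relatedMatch
```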

How decisions get made

1. Author proposes a mapping in the schema-app's Mappings panel (property detail page → "+ Add mapping").
2. exactMatch requires notes (R10): one concrete reasoning case where substitution preserves truth.
3. Mapping is visible immediately to team members. To the public it appears only after both endpoint vocabularies reach O (Open for comments) or P (Published) status.
4. Public review window: at status O, the mapping appears in the public JSON-LD output but isn't yet canonical. This is when consumers can comment.
5. Promotion to P: when both endpoint vocabularies are Published, the mapping is canonical. Changing it after that requires bumping the vocabulary's version.
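
The visibility rules above reduce to a small function of the two endpoint statuses. The function name and the returned labels are hypothetical. Only the O and P status codes come from this page, and "X" below stands in for any other status.

```python
def mapping_visibility(status_a, status_b):
    """Derive a mapping's visibility from its two endpoint vocabulary statuses."""
    statuses = {status_a, status_b}
    if statuses == {"P"}:
        return "canonical"      # both endpoints Published
    if statuses <= {"O", "P"}:
        return "public-review"  # public in JSON-LD, not yet canonical
    return "team-only"          # at least one endpoint is neither O nor P

print(mapping_visibility("P", "P"))  # canonical
print(mapping_visibility("O", "P"))  # public-review
print(mapping_visibility("X", "P"))  # team-only
```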

Challenging a decision

Disagree with a published mapping? Contact us with the source IRI, the target IRI, the predicate you dispute, and the rule you believe was misapplied. Citing the rule number isn't ceremony – it's what makes the disagreement adjudicable instead of a matter of taste. The maintainer team reviews challenges and can re-grade a mapping, with the change and its rationale entering the same audit trail as the original decision.

Audit trail

Every mapping carries created_at, created_by, and notes. Every re-grade appends rather than overwrites. A crosswalk that can't show who decided what, under which rule, and when, is exactly the kind that doesn't survive its next audit – which is the argument of the practitioner companion piece, Why Your Compliance Crosswalk Won't Survive Its Next Audit.