Why a Multi-lensatic Data Format

Open any compliance program and the same unit of work wears five name tags. A regulation mandates it in legal language. A technical control specifies how it gets configured and graded. A workforce framework hands it to a role. A proficiency model grades how hard it is to do. And now the AI catalogs claim software can do it. Five vocabularies, written by five communities, for five purposes. One piece of work.

A unified data structure across those communities is impossible, and the impossibility is structural: the field is a federation. Every standards body encodes what its community cares about, and communities don't surrender that. They shouldn't have to. What can exist is a common data format: shared shapes and citable addresses for the elements of compliance data that travel between systems, without impinging on any content provider's vocabulary or any developer's extensions. Translation, not conquest. That's what GRCSchema.org is for.

So read the survey below as a witness list, not a leaderboard. Each format earned its place by answering a question its community actually has. They're grouped by the question they answer, because that grouping (not the alphabet) is how you'll reach for them in practice.

The regulatory mandate lens: what am I obligated to do?

Akoma Ntoso ("linked hearts" in the Akan language of West Africa) began as an initiative of the Africa i-Parliament Action Plan and defines XML representations of parliamentary, legislative, and judiciary documents. It became an OASIS standard in 2018 and seeded OASIS' LegalDocML.

LegalXML, managed by OASIS Open, includes LegalDocML (document structure for legislatures, courts, and contracts, based on Akoma Ntoso) and LegalRuleML (a rule interchange language that lets implementers structure, evaluate, and compare legal arguments).

StratML is the Strategy Markup Language, born from the US E-Gov Act's requirement that federal agencies publish strategic and performance plans in machine-readable form. Its goal: make goals, objectives, and stakeholders shareable, linkable, and analyzable.

RegGenome, spun out of the University of Cambridge's Regulatory Genome Project, publishes machine-readable regulatory content at corpus scale. Where the formats above structure individual documents, RegGenome structures the world's regulatory text as data.

The technical control lens: what must be configured, and how is it graded?

OSCAL, NIST's Open Security Controls Assessment Language, provides machine-readable representations of control catalogs, baselines, system security plans, and assessment results in XML, JSON, and YAML. FedRAMP's automation effort is built on it.
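
As a rough sketch of what "machine-readable control catalog" means in practice, the snippet below walks a tiny OSCAL-style catalog in its JSON form and lists every control id. The nesting (catalog, groups, controls, and controls inside controls) follows the published OSCAL catalog model. The sample ids and titles are invented placeholders, not an excerpt from any real catalog, and `iter_controls` is a hypothetical helper name.

```python
import json

# Tiny illustrative catalog. The structure follows the OSCAL catalog JSON
# model; the ids and titles are invented placeholders, not real content.
SAMPLE = """
{
  "catalog": {
    "uuid": "00000000-0000-4000-8000-000000000000",
    "metadata": {"title": "Illustrative catalog", "version": "0.0"},
    "groups": [
      {
        "id": "grp-a",
        "title": "Example group",
        "controls": [
          {
            "id": "ctl-1",
            "title": "Example control",
            "controls": [
              {"id": "ctl-1.1", "title": "Example enhancement"}
            ]
          },
          {"id": "ctl-2", "title": "Another example control"}
        ]
      }
    ]
  }
}
"""


def iter_controls(node):
    """Yield every control under a catalog, group, or control, depth-first."""
    for control in node.get("controls", []):
        yield control
        yield from iter_controls(control)
    for group in node.get("groups", []):
        yield from iter_controls(group)


catalog = json.loads(SAMPLE)["catalog"]
control_ids = [c["id"] for c in iter_controls(catalog)]
print(control_ids)
```

Because the same model is also published for XML and YAML, a walker like this would only need a different loader, not a different traversal.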

CCI, the DoD's Control Correlation Identifier, gives a standard identifier to each singular, actionable statement inside an IA control. CCI is the bridge between high-level policy and low-level technical implementation: it lets one configuration check trace upward to every framework that demanded it.
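
The upward trace can be pictured as two small lookup tables joined together. This is only a sketch: every identifier below is a made-up placeholder (real CCIs and control ids come from the published lists), and the table and function names are hypothetical.

```python
# Placeholder data: which CCI-style statements each configuration check verifies.
CHECK_TO_CCI = {
    "check-password-length": ["CCI-EXAMPLE-01"],
    "check-session-timeout": ["CCI-EXAMPLE-02", "CCI-EXAMPLE-03"],
}

# Placeholder data: which framework requirements each statement derives from.
CCI_TO_REQUIREMENTS = {
    "CCI-EXAMPLE-01": [("Framework A", "A-5.1"), ("Framework B", "B-12")],
    "CCI-EXAMPLE-02": [("Framework A", "A-7.3")],
    "CCI-EXAMPLE-03": [("Framework A", "A-7.3"), ("Framework C", "C-2.4")],
}


def trace_upward(check_id):
    """Return the sorted, de-duplicated requirements behind one check."""
    requirements = set()
    for cci in CHECK_TO_CCI.get(check_id, []):
        requirements.update(CCI_TO_REQUIREMENTS.get(cci, []))
    return sorted(requirements)


traced = trace_upward("check-session-timeout")
for framework, requirement in traced:
    print(f"{framework}: {requirement}")
```

The point of the intermediate identifier is visible in the shape of the data: the check never names a framework directly, so adding a new framework means adding rows to the second table only.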

CIS Benchmarks, from the Center for Internet Security, are consensus-developed secure configuration guidelines covering operating systems, cloud platforms, applications, and network devices. DISA's STIGs (Security Technical Implementation Guides) answer "what must be configured" for defense systems. The Benchmarks answer it for everyone else.

Secure Controls Framework (SCF) maintains a harmonized catalog of controls mapped to a large set of laws, regulations, and frameworks, with typed, strength-scored relationships in its Set Theory Relationship Mapping work.

NIST OLIR, the National Online Informative References program (home of the Derived Relationship Mapping tool), publishes NIST's own crosswalks between reference documents and focal documents like the Cybersecurity Framework.

Unified Compliance Framework pioneered large-scale regulatory harmonization and holds multiple patents on compliance mapping and dictionary structures.

Open Policy Agent approaches controls from the enforcement side: a general-purpose policy engine whose Rego language expresses policy as code across microservices, Kubernetes, CI/CD pipelines, and APIs.

The asset and vulnerability vocabularies: what exists, and what's wrong with it?

CPE, Common Platform Enumeration, is the structured naming scheme for IT systems, software, and packages, built on URI syntax.
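
To make "structured naming" concrete: CPE 2.3 names can be bound to a URI or to a colon-delimited formatted string, and the sketch below splits the formatted-string form into its eleven attributes. The vendor and product in the example are made up, `parse_cpe23` is a hypothetical helper, and the parser is deliberately simplified (it does not unescape values or validate them against the specification).

```python
import re

# Attribute order of the CPE 2.3 formatted-string binding.
CPE23_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(name):
    """Split a CPE 2.3 formatted string into a dict of its 11 attributes.

    Simplification: splits on colons not preceded by a backslash, and does
    not unescape values or check them against the specification.
    """
    parts = re.split(r"(?<!\\):", name)
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {name!r}")
    return dict(zip(CPE23_FIELDS, parts[2:]))


# Made-up vendor and product; "*" stands for ANY.
example = parse_cpe23(
    "cpe:2.3:a:example_vendor:example_product:1.2.3:*:*:*:*:*:*:*"
)
print(example["part"], example["vendor"], example["product"], example["version"])
```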

CVE records publicly known cybersecurity vulnerabilities: an ID, a description, and at least one public reference, consumed by security products and services worldwide, including the U.S. National Vulnerability Database.

SWID Tags, Software Identification Tags, identify installed software. The tag format is standardized in ISO/IEC 19770-2, and NIST's guidelines (NISTIR 8060) define what a well-formed, interoperable tag must carry.

The threat and incident lenses: who attacks, and what happened?

MITRE ATT&CK is the knowledge base of adversary tactics and techniques drawn from real-world observations, foundational to threat models across government and industry.

STIX is the language for exchanging cyber threat intelligence. TAXII is the application-layer protocol that carries it.

VERIS provides a common language for describing security incidents in a structured, repeatable way. It's the vocabulary underneath the Verizon DBIR.

OCSF, the Open Cybersecurity Schema Framework, is the open, vendor-agnostic schema for security event data, adopted across the major cloud and security platforms since 2022. If your telemetry travels between tools, it increasingly travels as OCSF.

The workforce and proficiency lenses: whose job is this, and what does it demand?

O*NET is the United States' primary source of occupational information: standardized descriptors for nearly 1,000 occupations covering the entire U.S. economy.

NICE, the NIST-led Workforce Framework for Cybersecurity, and the DoD's Cyber Workforce Framework (DCWF) define the work roles, tasks, knowledge, and skills of the cybersecurity workforce.

CTDL, the Credential Transparency Description Language, describes credentials, organizations, assessments, learning opportunities, and competencies richly enough to compare credentials against each other.

Rich Skill Descriptors build on CTDL-ASN so that achievements, pathways, and learner records can make machine-readable references to skills.

The dictionary and provenance layer: what do the words mean, and who said so?

SKOS, the W3C's Simple Knowledge Organization System, standardizes concepts, labels, and relationships for thesauri and classification schemes. Its mapping predicates (exactMatch, closeMatch, broadMatch, narrowMatch, relatedMatch) are how crosswalks state how well two terms correspond. See SKOS Mapping Decisions.
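
A minimal sketch of how a crosswalk row might carry one of those predicates, serialized as a single N-Triples statement. The SKOS namespace is the real one; the two term IRIs live under example.org and are invented for illustration, and the `Mapping` class is a hypothetical shape, not part of any library.

```python
from dataclasses import dataclass

SKOS = "http://www.w3.org/2004/02/skos/core#"
MAPPING_PREDICATES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


@dataclass(frozen=True)
class Mapping:
    source: str     # IRI of the term being mapped
    predicate: str  # local name of a SKOS mapping predicate
    target: str     # IRI of the term it is mapped to

    def __post_init__(self):
        if self.predicate not in MAPPING_PREDICATES:
            raise ValueError(f"unknown SKOS mapping predicate: {self.predicate}")

    def to_ntriples(self):
        """Render the mapping as one N-Triples statement."""
        return f"<{self.source}> <{SKOS}{self.predicate}> <{self.target}> ."


# Invented IRIs: two terms judged close, but not interchangeable.
mapping = Mapping(
    source="https://example.org/framework-a/term/access-review",
    predicate="closeMatch",
    target="https://example.org/framework-b/term/entitlement-review",
)
line = mapping.to_ntriples()
print(line)
```

Restricting the predicate to the five mapping properties keeps the strength of each correspondence explicit instead of collapsing every row into "maps to".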

Dictionary APIs from Merriam-Webster and Wordnik ship structured JSON for definitions, pronunciations, and etymologies. Their wire shapes, validated by millions of queries a day, inform how GRCSchema renders lexical data.

Bibliographic and citation formats carry the provenance discipline this whole field depends on: BibTeX for reference management, CSL for automating citation formatting, Zotero for structuring citation data as JSON, and IFLA's FRBR entity-relationship model for bibliographic records.

Document structure standards round it out: SGML (ISO 8879), the ancestor of generalized declarative markup, and ReSpec, the W3C's tooling for writing technical specifications.

Where GRCSchema.org fits

Every format above keeps its sovereignty. Nobody on this list is asked to change a word, adopt anyone else's identifiers, or map into a winner, because there will never be a winner, for the same reason there will never be a universal language.

What GRCSchema.org adds is the connective tissue: JSON-LD shapes with public, citable addresses, so that a record from any of these witnesses can travel into your systems carrying its meaning, its provenance, and its source's name. Walk in holding whichever vocabulary you have. The structure is what we share.
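
For readers who have not met JSON-LD, here is a minimal sketch of a record that carries an address, a type, and its provenance across a serialization boundary. Everything under example.org is an invented placeholder, including the `Control` type and the `name` property, and none of it is an actual GRCSchema.org shape. Only the JSON-LD keywords (`@context`, `@vocab`, `@id`, `@type`) and the Dublin Core terms namespace are real.

```python
import json

record = {
    "@context": {
        # Placeholder vocabulary; unprefixed terms such as "name" resolve here.
        "@vocab": "https://example.org/vocab#",
        "dcterms": "http://purl.org/dc/terms/",
    },
    # The citable address of this record (invented for illustration).
    "@id": "https://example.org/records/control-123",
    "@type": "Control",
    "name": "Example control statement",
    # Provenance: who published it, and which catalog it came from.
    "dcterms:publisher": "Example Standards Body",
    "dcterms:source": {"@id": "https://example.org/catalogs/example-catalog"},
}

# Round-trip through plain JSON, as a receiving system would see it.
wire = json.dumps(record, indent=2)
received = json.loads(wire)
print(received["@id"], "from", received["dcterms:publisher"])
```

The record stays ordinary JSON on the wire, so a consumer that ignores the `@`-prefixed keys can still read it, while one that understands them can resolve every term to an address.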

If a format belongs on this list and isn't here, tell us. The list grows the way the field does: one witness at a time.