The Disambiguation Schema

Disambiguating Persons, Organizations, and Groups is essential in GRC (governance, risk, and compliance) and SecOps.

The reasons disambiguation matters are covered in other documentation. Nearly every organization uses at least one identity management system, and larger organizations almost certainly run several at the same time.

Federated Identities you probably use

In a typical organization, you’ll have the following systems that need to associate real persons with their digital identities:

  • Human Resources, not just for payroll and benefits but also for identity badges, etc.
  • E-mail servers that are (hopefully) tied to a directory server such as LDAP or Active Directory.
  • Security systems that (hopefully) sign you on to your computers.
  • Physical access management (maybe) for electronic keys, etc.

Even the few identity management systems mentioned above each have their own schemas. Here are additional categories of identity sources you might have to integrate (we won’t go into depth on the schemas for each of these):

Microsoft

  • Active Directory (PowerShell)
  • Azure AD
  • Dynamics365
  • Exchange
  • Office 365
  • SQL Server
  • Windows Local

Oracle

  • E-Business Suite (EBS)
  • IDCS
  • RDBMS
  • Oracle HR

Infrastructure software

  • Database Application Table Connector (JDBC)
  • GIT Enterprise
  • LDAP (OpenLDAP, eDirectory, Active Directory, Apache DS)
  • Linux (RHEL, CentOS, Ubuntu)
  • Red Hat IPA
  • SCIM 2
  • Script Connector

Cloud Services

  • E-Business Suite (EBS)
  • IDCS
  • RDBMS
  • Freshdesk
  • Freshservice
  • Google G Suite
  • Salesforce.com
  • SAP S/4HANA
  • Slack (SCIM)
  • Tableau
  • Boomi (Read-only)
  • LastPass (Read-only)

HR systems

  • ADP
  • KinHR
  • Oracle HR
  • Paychex
  • SAP HR
  • Source Adapter (Multi-Format)
  • Workday

One would think that Federated Identity Management systems would exist to unite all of these identities. Fat chance. The problem is that there are many Federated Identity Management systems – some of which can coordinate their frameworks, rules, and policies with each other, and some of which can’t.

International Authority Sources

Suppose you need to track everything your staff have ever published. In that case, there’s a good chance that your people’s names (and their contributions) appear in one or more of the international library databases below. Each of these libraries, whether physical or virtual, maintains Authority Source Records[1] for the names and publications of people and organizations.

Code    Authority Source
BAV     Biblioteca Apostolica Vaticana
BNE     Biblioteca Nacional de España
BNF     Bibliothèque Nationale de France
DNB     Deutsche Nationalbibliothek
EGAXA   Bibliotheca Alexandrina (Egypt)
GRCS    GRCSchema.org federated repository
ICCU    Istituto Centrale per il Catalogo Unico
JPG     Getty Research Institute
LC      Library of Congress/NACO
LAC     Library and Archives Canada
NKC     National Library of the Czech Republic
NLA     National Library of Australia
NLIara  National Library of Israel (Arabic)
NLIcyr  National Library of Israel (Cyrillic)
NLIheb  National Library of Israel (Hebrew)
NLIlat  National Library of Israel (Latin)
NUKAT   The National Union Catalog of Poland
OCLC    Online Computer Library Center
PTBNP   Biblioteca Nacional de Portugal
SELIBR  National Library of Sweden
SWNL    Swiss National Library
VIAF    Virtual International Authority File

The best we can do is maintain a sort of Babel fish translator[2] that ties each person to every identity source. For every name associated with a real person, we can track each represented version of that name, along with the Authority Source for that version and the ID of the record in that Authority Source. The simplified JSON structure described in the tables below makes this easier to understand; the schema for adding Authority Sources to your data structure is straightforward.
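To make the Babel fish idea concrete, here is a minimal sketch in Python. It assumes each name version is a (name, Authority Source code, record ID) triple and that one source record can identify only one real person. The class names `NameVersion` and `BabelFish`, the person keys, and the record IDs are hypothetical placeholders, not part of the schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class NameVersion:
    """One represented version of a person's name, as held by one Authority Source."""
    name: str              # the name exactly as the Authority Source records it
    authority_source: str  # Authority Source code from the table above, e.g. "LC"
    record_id: str         # ID of the record inside that Authority Source


class BabelFish:
    """Hypothetical index that ties every name version back to one real person."""

    def __init__(self) -> None:
        # person key -> every name version known for that person
        self._versions: Dict[str, List[NameVersion]] = defaultdict(list)
        # (authority source, record ID) -> person key
        self._owner: Dict[Tuple[str, str], str] = {}

    def add(self, person: str, version: NameVersion) -> None:
        """Attach a name version to a person; a source record may belong to one person only."""
        key = (version.authority_source, version.record_id)
        owner = self._owner.get(key)
        if owner is not None and owner != person:
            raise ValueError(f"{key} is already assigned to {owner!r}")
        if owner is None:  # re-adding the same record to the same person is a no-op
            self._owner[key] = person
            self._versions[person].append(version)

    def person_for(self, authority_source: str, record_id: str) -> Optional[str]:
        """Translate a source-specific record ID into the person it identifies."""
        return self._owner.get((authority_source, record_id))

    def versions_of(self, person: str) -> List[NameVersion]:
        """Return every recorded version of a person's name (empty if unknown)."""
        return list(self._versions.get(person, []))


# Usage with placeholder IDs (not real authority records):
fish = BabelFish()
fish.add("person-001", NameVersion("Smith, Jane", "LC", "n00000000"))
fish.add("person-001", NameVersion("Jane Smith", "VIAF", "000000000"))
print(fish.person_for("VIAF", "000000000"))  # person-001
```

Keying on the (source, record ID) pair rather than on the name string is the design choice that matters here: two sources can spell the same person differently, but a record ID is only meaningful inside the source that issued it.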

This documentation covers the schema for disambiguation records, which are modeled as Things. Therefore, elementId, @id, coreMetaData, and context are always present.

Property          Expected Type  Description
liveStatus        String         A Boolean field of "live" (1::boolean) or "deprecated" (0::boolean).
elementId         String         A unique and persistent identifier for the record within the system's data set.
@id               URL            The full, unique link to the item, so that it is traversable by that property.
statusDate        Datetime       The date and time at which the Live Status was set.
coreMetaData      Object         The object representation of the Thing CoreMetaData.
authorityRecords  Object         An array of Authority Record IDs.
context           Context        The JSON-LD context for the item in question.
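A disambiguation record can be checked against the table above with a small validator. This is a sketch under stated assumptions: the record is parsed JSON, URL and Datetime values arrive as strings, authorityRecords is treated as a JSON array (following its description rather than its Object type), and the JSON-LD context may be a string, object, or array. The function name `validate_disambiguation_record` and all example values are hypothetical.

```python
from typing import Any, Dict, List

# Properties every Thing carries, per the text above.
REQUIRED_THING_KEYS = ("elementId", "@id", "coreMetaData", "context")

# Loose mapping of the "Expected Type" column onto parsed-JSON Python types.
# Assumptions: URL and Datetime are strings; authorityRecords is a JSON array;
# a JSON-LD context may be a string, an object, or an array.
EXPECTED_TYPES: Dict[str, Any] = {
    "liveStatus": str,
    "elementId": str,
    "@id": str,
    "statusDate": str,
    "coreMetaData": dict,
    "authorityRecords": list,
    "context": (str, dict, list),
}


def validate_disambiguation_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for key in REQUIRED_THING_KEYS:
        if key not in record:
            problems.append(f"missing required property: {key}")
    for key, expected in EXPECTED_TYPES.items():
        if key in record and not isinstance(record[key], expected):
            problems.append(f"{key} has unexpected type {type(record[key]).__name__}")
    return problems


# Usage with placeholder values:
record = {
    "elementId": "dis-0001",
    "@id": "https://example.org/things/dis-0001",
    "liveStatus": "live",
    "statusDate": "2024-01-01T00:00:00+00:00",
    "coreMetaData": {},
    "authorityRecords": [],
    "context": "https://example.org/context.jsonld",
}
print(validate_disambiguation_record(record))  # []
```

Returning a list of problems instead of raising on the first one lets an integration job report every defect in a record from one pass.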

Live Status

This is a simple Boolean field that annotates whether the Thing in question has been disambiguated (1) or not (0 or null).

Status Date

This is a date-time field for when the Live Status was set.
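Live Status and Status Date are meant to change together. The sketch below assumes the table's string values ("live" and "deprecated") stand for the Boolean states and, following the Live Status description, treats a missing or null value as not disambiguated. The helper names `is_live` and `set_live_status` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumption: the table's string values stand for the Boolean states.
LIVE_STATUS_VALUES = {"live": True, "deprecated": False}


def is_live(record: dict) -> bool:
    """Read liveStatus as a Boolean; a missing or null value counts as False."""
    status = record.get("liveStatus")
    if status is None:
        return False
    if status not in LIVE_STATUS_VALUES:
        raise ValueError(f"unknown liveStatus: {status!r}")
    return LIVE_STATUS_VALUES[status]


def set_live_status(record: dict, live: bool, now: Optional[datetime] = None) -> dict:
    """Set liveStatus and stamp statusDate (ISO 8601; UTC unless `now` says otherwise)."""
    stamp = now if now is not None else datetime.now(timezone.utc)
    record["liveStatus"] = "live" if live else "deprecated"
    record["statusDate"] = stamp.isoformat()
    return record


# Usage with a fixed timestamp so the output is predictable:
rec = set_live_status({}, True, datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
print(rec)  # {'liveStatus': 'live', 'statusDate': '2024-01-02T03:04:05+00:00'}
```

Routing every status change through one function keeps statusDate from drifting out of sync with liveStatus.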

Authority Records

This is an array of Disambiguated Record ID entries, each linked to an Authority Source. Each entry contains the element ID of the Authority Record and a pointer to the Authority Source that created the record.

Property          Expected Type  Description
elementId         String         A unique and persistent identifier for the record within the system's data set.
@id               URL            The full unique link to the item so it's traversable by that property.
authoritySource   Object         The object representation of the Thing AuthoritySource.
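Here is a minimal sketch of building authorityRecords entries from the three properties above and looking one up by its Authority Source. It assumes that the AuthoritySource object, being a Thing, carries an elementId; using the source codes from the earlier table ("LC", "VIAF") as those elementIds is a hypothetical choice for illustration. The helper names and URLs are placeholders.

```python
from typing import Any, Dict, List, Optional


def make_authority_record(element_id: str, record_url: str,
                          authority_source: Dict[str, Any]) -> Dict[str, Any]:
    """Build one authorityRecords entry from the three documented properties."""
    return {
        "elementId": element_id,              # ID of the Authority Record
        "@id": record_url,                    # traversable link to the record
        "authoritySource": authority_source,  # the AuthoritySource Thing
    }


def find_by_source(authority_records: List[Dict[str, Any]],
                   source_element_id: str) -> Optional[Dict[str, Any]]:
    """Return the first entry whose AuthoritySource has the given elementId, or None."""
    for entry in authority_records:
        if entry.get("authoritySource", {}).get("elementId") == source_element_id:
            return entry
    return None


# Usage with placeholder Things; real AuthoritySource objects carry more properties.
records = [
    make_authority_record("ar-0001", "https://example.org/authority-records/ar-0001",
                          {"elementId": "LC"}),
    make_authority_record("ar-0002", "https://example.org/authority-records/ar-0002",
                          {"elementId": "VIAF"}),
]
print(find_by_source(records, "VIAF")["elementId"])  # ar-0002
```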

Footnotes

  1. Authority Source Records are MARC 21 or UNIMARC Name Authority records that have been processed for improved uniformity.
  2. “The Babel fish is a small, bright yellow fish, which can be placed in someone's ear in order for them to be able to hear any language translated into their first language.” Douglas Adams, The Hitchhiker’s Guide to the Galaxy.