The Disambiguation Schema
Disambiguating Persons, Organizations, and Groups is essential in Governance, Risk, and Compliance (GRC) and Security Operations (SecOps).
The reasons it matters are covered in other documentation. Every organization uses at least one identity management system, and larger organizations almost certainly run several at the same time.
Federated Identities you probably use
In a typical organization, you’ll have the following systems that need to associate real persons with their digital identities:
- Human Resources systems, used not only for payroll and benefits but also for identity badges and similar credentials.
- E-mail servers that are (hopefully) tied to a directory server, such as an LDAP or Active Directory server.
- Security systems that (hopefully) sign you on to your computers.
- Physical access management systems (maybe), for managing electronic keys and similar devices.
Even the few identity management systems mentioned above each have their own schema. Below are additional categories of identity management sources you might have to integrate (we won’t go into depth on the schemas for these):
Microsoft
- Active Directory (PowerShell)
- Azure AD
- Dynamics 365
- Exchange
- Office 365
- SQL Server
- Windows Local
Oracle
- eBusiness Suite (EBS)
- IDCS
- RDBMS
- Oracle HR
Infrastructure software
- Database Application Table Connector (JDBC)
- GIT Enterprise
- LDAP (OpenLDAP, eDirectory, Active Directory, Apache DS)
- Linux (RHEL, CentOS, Ubuntu)
- Red Hat IPA
- SCIM 2.0
- Script Connector
Cloud Services
- eBusiness Suite (EBS)
- IDCS
- RDBMS
- Freshdesk
- Freshservice
- Google G Suite
- Salesforce.com
- SAP S/4HANA
- Slack (SCIM)
- Tableau
- Boomi (Read-only)
- LastPass (Read-only)
HR systems
- ADP
- KinHR
- Oracle HR
- Paychex
- SAP HR
- Source Adapter (Multi-Format)
- Workday
One would think that Federated Identity Management systems exist to unite all of these identities. Fat chance. The problem is that there are many Federated Identity Management systems: some can coordinate their frameworks, rules, and policies with each other, and some cannot.
International Authority Sources
Suppose you need to track the names (or contributions) of any of your staff who have ever published. In that case, there’s a good chance that their names and contributions appear in one or more of the international library databases below. Each of these libraries, whether physical or virtual, maintains Authority Source Records¹ for the names and publications of people and organizations.
Code | Authority Source |
---|---|
BAV | Biblioteca Apostolica Vaticana |
BNE | Biblioteca Nacional de España |
BNF | Bibliothèque Nationale de France |
DNB | Deutsche Nationalbibliothek |
EGAXA | Bibliotheca Alexandrina (Egypt) |
GRCS | GRCSchema.org federated repository |
ICCU | Istituto Centrale per il Catalogo Unico |
JPG | Getty Research Institute |
LC | Library of Congress/NACO |
LAC | Library and Archives Canada |
NKC | National Library of the Czech Republic |
NLA | National Library of Australia |
NLIara | National Library of Israel (Arabic) |
NLIcyr | National Library of Israel (Cyrillic) |
NLIheb | National Library of Israel (Hebrew) |
NLIlat | National Library of Israel (Latin) |
NUKAT | The National Union Catalog of Poland |
OCLC | Online Computer Library Center |
PTBNP | Biblioteca Nacional de Portugal |
SELIBR | National Library of Sweden |
SWNL | Swiss National Library |
VIAF | Virtual International Authority File |
The best we can do is maintain a sort of Babel fish translator² when tying each person to an identity source. For every name associated with a real person, we can track each represented version of that name, along with the Authority Source for that version and the ID of the corresponding record in that Authority Source. The schema for adding Authority Sources to your data structure is straightforward.
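The mapping described above can be sketched in a few lines. This is a minimal illustration, assuming a plain Python dictionary stands in for the JSON document; the field names (`names`, `value`, `authoritySource`, `recordId`), the helper functions, and the sample name and IDs are hypothetical placeholders, not part of the published schema. Only the source codes (LC, DNB, GRCS) come from the table above.

```python
from __future__ import annotations

import json

# Hypothetical record for one real person: each represented version of the
# name is paired with the Authority Source that issued it and the ID of the
# record in that source. Field names, the name, and the IDs are illustrative.
person = {
    "names": [
        {"value": "Doe, Jane", "authoritySource": "LC", "recordId": "example-lc-0001"},
        {"value": "Doe, Jane, 1970-", "authoritySource": "DNB", "recordId": "example-dnb-0001"},
        {"value": "Jane Doe", "authoritySource": "GRCS", "recordId": "example-grcs-0001"},
    ]
}


def names_by_source(record: dict) -> dict[str, list[str]]:
    """Group every name representation under its Authority Source code."""
    grouped: dict[str, list[str]] = {}
    for entry in record["names"]:
        grouped.setdefault(entry["authoritySource"], []).append(entry["value"])
    return grouped


def find_record_id(record: dict, source: str) -> str | None:
    """Return the record ID the given Authority Source uses for this person, if any."""
    for entry in record["names"]:
        if entry["authoritySource"] == source:
            return entry["recordId"]
    return None


print(json.dumps(person, indent=2))
print(names_by_source(person))
print(find_record_id(person, "DNB"))  # example-dnb-0001
```

Keeping the source code and record ID next to each name variant means any one identity source can be translated to another by looking up the shared person, which is the "Babel fish" role described above.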
This documentation covers the schema for disambiguation records as Things. Because they are Things, disambiguation records always include elementId, @id, coreMetaData, and context.
Property | Expected Type | Description |
---|---|---|
liveStatus | String | A status value, either "live" (1::boolean) or "deprecated" (0::boolean). |
elementId | String | A unique and persistent identifier for the record within the system's data set. |
@id | URL | The full, unique URL of the item, so that it can be traversed by this property. |
statusDate | Datetime | The date and time when liveStatus was set. |
coreMetaData | Object | The object representation of the Thing CoreMetaData. |
authorityRecords | Array | An array of Authority Records, each identifying a record and the Authority Source that created it. |
context | Context | The JSON-LD context for the item in question. |
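The property table above can be sketched as a Python dataclass. This is a minimal, hypothetical model rather than a reference implementation: the snake_case attribute names, the example.org URLs, and the choice to hold `context` as a plain URL string are assumptions. `liveStatus` is validated against the two string values the table lists, and `to_json()` emits the documented property names.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed liveStatus values from the property table, with the Boolean
# each one stands for.
LIVE_STATUS_VALUES = {"live": True, "deprecated": False}


@dataclass
class DisambiguationRecord:
    """Hypothetical model of a disambiguation Thing; mirrors the property table."""
    element_id: str
    id_url: str                        # serialized as "@id"
    live_status: str                   # "live" or "deprecated"
    status_date: datetime
    core_metadata: dict = field(default_factory=dict)
    authority_records: list[dict] = field(default_factory=list)
    context: dict | str = ""           # assumption: a context URL or inline object

    def __post_init__(self) -> None:
        # Reject anything other than the two documented status values.
        if self.live_status not in LIVE_STATUS_VALUES:
            raise ValueError(
                f"liveStatus must be 'live' or 'deprecated', got {self.live_status!r}"
            )

    def to_json(self) -> dict:
        """Serialize using the property names from the table."""
        return {
            "liveStatus": self.live_status,
            "elementId": self.element_id,
            "@id": self.id_url,
            "statusDate": self.status_date.isoformat(),
            "coreMetaData": self.core_metadata,
            "authorityRecords": self.authority_records,
            "context": self.context,
        }


record = DisambiguationRecord(
    element_id="example-0001",
    id_url="https://example.org/things/example-0001",
    live_status="live",
    status_date=datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc),
    context="https://example.org/context.jsonld",
)
print(json.dumps(record.to_json(), indent=2))
```

Validating `liveStatus` at construction time keeps a record from ever carrying a status outside the documented pair.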
Live Status
This is a simple Boolean field indicating whether the Thing in question has been disambiguated (1) or not (0 or null).
Status Date
This is a date-time field recording when the Live Status was set.
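Because statusDate records when liveStatus was set, the two fields should change together. A minimal sketch, assuming a Thing is held as a plain Python dict, the 1/0 encoding described under Live Status (a null value is stored as 0), and an ISO 8601 string for the date-time; `set_live_status` is a hypothetical helper name.

```python
from __future__ import annotations

from datetime import datetime, timezone


def set_live_status(thing: dict, disambiguated: bool | None,
                    now: datetime | None = None) -> dict:
    """Update liveStatus and statusDate together on a Thing (a plain dict here).

    1 means the Thing has been disambiguated; 0 means it has not. None is
    stored as 0, since null also means "not disambiguated".
    `now` can be injected for testing and defaults to the current UTC time.
    """
    stamp = now if now is not None else datetime.now(timezone.utc)
    thing["liveStatus"] = 1 if disambiguated else 0
    thing["statusDate"] = stamp.isoformat()  # ISO 8601 date-time string
    return thing


thing = {"elementId": "example-0001"}
set_live_status(thing, True, now=datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc))
print(thing)
# {'elementId': 'example-0001', 'liveStatus': 1, 'statusDate': '2024-03-01T09:30:00+00:00'}
```

If your implementation stores the property table's string form instead, the same helper applies with "live" and "deprecated" in place of 1 and 0.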
Authority Records
This is an array of disambiguated record references, each linked to an Authority Source. Each entry contains the elementId and @id of the Authority Record and a pointer to the Authority Source that created the record.
Property | Expected Type | Description |
---|---|---|
elementId | String | A unique and persistent identifier for the record within the system's data set. |
@id | URL | The full, unique URL of the item, so that it can be traversed by this property. |
authoritySource | Object | The object representation of the Thing AuthoritySource. |
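A short sketch of building an authorityRecords array from the three properties above. It assumes, hypothetically, that each `@id` is formed by appending the elementId to a base URL, and that an AuthoritySource object can be represented by just a `code` and `name` drawn from the authority source table; the real AuthoritySource Thing has its own schema, and `make_authority_record` and `records_from` are illustrative helper names.

```python
from __future__ import annotations

import json

# Hypothetical AuthoritySource objects. Only "code" and "name" are shown,
# taken from the authority source table; the real Thing carries more.
LC = {"code": "LC", "name": "Library of Congress/NACO"}
VIAF = {"code": "VIAF", "name": "Virtual International Authority File"}


def make_authority_record(element_id: str, base_url: str, source: dict) -> dict:
    """Build one authorityRecords entry from the three documented properties."""
    return {
        "elementId": element_id,
        # Assumption: @id is the base URL plus the elementId.
        "@id": f"{base_url.rstrip('/')}/{element_id}",
        "authoritySource": source,
    }


def records_from(authority_records: list[dict], code: str) -> list[str]:
    """Return the elementIds of every entry created by the given Authority Source."""
    return [r["elementId"] for r in authority_records
            if r["authoritySource"]["code"] == code]


authority_records = [
    make_authority_record("example-lc-0001", "https://example.org/authority-records/", LC),
    make_authority_record("example-viaf-0001", "https://example.org/authority-records/", VIAF),
]
print(json.dumps(authority_records, indent=2))
print(records_from(authority_records, "VIAF"))  # ['example-viaf-0001']
```

Embedding the AuthoritySource object in each entry lets a consumer filter or group records by source without a second lookup.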
Footnotes
1. Authority Source Records are MARC 21 or UNIMARC Name Authority records that have been processed for improved uniformity.
2. “The Babel fish is a small, bright yellow fish, which can be placed in someone's ear in order for them to be able to hear any language translated into their first language.” Douglas Adams, The Hitchhiker’s Guide to the Galaxy.