Dictionary JSON

The dictionary JSON shape is simple enough to write from scripts and notebooks, yet stable enough to be imported and exported through the governance API later.

{
  "profile_name": "infra_incidents",
  "profile_description": "Infrastructure incident terminology",
  "terms": [
    {
      "canonical_value": "kubernetes",
      "slot": "TOOL",
      "aliases": [
        "k8s",
        { "value": "kube", "confidence": 0.95 }
      ]
    }
  ],
  "profile_stop_list": [
    { "value": "tmp", "target": "alias", "reason": "too generic" }
  ],
  "global_stop_list": [
    { "value": "unknown", "target": "both", "reason": "global noise" }
  ]
}
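Because the shape is plain JSON, it can be produced from a script or notebook with nothing beyond the standard library. The following is a minimal sketch that rebuilds the example above as a Python dict and round-trips it through the json module; it uses no project-specific API, and the variable names are illustrative only.

```python
import json

# The same dictionary as the JSON example, written as a Python dict.
dictionary = {
    "profile_name": "infra_incidents",
    "profile_description": "Infrastructure incident terminology",
    "terms": [
        {
            "canonical_value": "kubernetes",
            "slot": "TOOL",
            "aliases": ["k8s", {"value": "kube", "confidence": 0.95}],
        }
    ],
    "profile_stop_list": [
        {"value": "tmp", "target": "alias", "reason": "too generic"}
    ],
    "global_stop_list": [
        {"value": "unknown", "target": "both", "reason": "global noise"}
    ],
}

# Serialize to JSON text, then parse it back to confirm nothing is lost.
serialized = json.dumps(dictionary, indent=2)
restored = json.loads(serialized)
```

The serialized string can be written to a file and kept under version control next to the scripts that generate it.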

A term defines a canonical value and its semantic slot.

{
  "canonical_value": "postgresql",
  "slot": "DATABASE",
  "aliases": ["pg", "postgres", "psql"]
}
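One way to picture how a term is used: every surface form (the canonical value and each alias) points back to the same canonical value and slot. The sketch below builds such a lookup table. The function build_lookup is hypothetical and written only for illustration, and lower-casing the keys is an assumption, not documented matching behavior.

```python
def build_lookup(terms):
    """Map each surface form to its (canonical_value, slot) pair."""
    lookup = {}
    for term in terms:
        target = (term["canonical_value"], term["slot"])
        # The canonical value resolves to itself.
        lookup[term["canonical_value"].lower()] = target
        for alias in term.get("aliases", []):
            # An alias may be a bare string or an object with a "value" key.
            value = alias if isinstance(alias, str) else alias["value"]
            lookup[value.lower()] = target
    return lookup


terms = [
    {
        "canonical_value": "postgresql",
        "slot": "DATABASE",
        "aliases": ["pg", "postgres", "psql"],
    }
]
lookup = build_lookup(terms)
```

With this table, lookup["pg"] and lookup["postgres"] both resolve to the postgresql term in the DATABASE slot.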

Aliases can be plain strings or objects that carry a value plus metadata such as confidence.

"k8s"
{ "value": "kube", "confidence": 0.95 }
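Code that reads a dictionary has to handle both alias forms. A minimal sketch of normalizing them into one shape follows; normalize_alias is a hypothetical helper, and treating a missing confidence as None is an assumption made for this example.

```python
from typing import Union


def normalize_alias(alias: Union[str, dict]) -> dict:
    """Return an alias in object form, whichever form it was written in."""
    if isinstance(alias, str):
        # A bare string carries no metadata, so confidence stays unset.
        return {"value": alias, "confidence": None}
    if isinstance(alias, dict) and isinstance(alias.get("value"), str):
        return {"value": alias["value"], "confidence": alias.get("confidence")}
    raise ValueError(f"unsupported alias entry: {alias!r}")


aliases = ["k8s", {"value": "kube", "confidence": 0.95}]
normalized = [normalize_alias(a) for a in aliases]
```

After normalization, downstream code can read entry["value"] without checking the type first.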

Stop lists reduce noisy matches. Use them for short values, generic words, or terms that look like aliases but are too ambiguous in your corpus.

{ "value": "tmp", "target": "alias", "reason": "too generic" }
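To make the effect of a stop-list entry concrete, here is a rough sketch of checking a candidate value against the profile and global lists. Several things here are assumptions rather than documented behavior: the helper is_stopped is hypothetical, the kind name "canonical" is invented as the counterpart of "alias", "both" is read as covering either kind, and comparison is case-insensitive.

```python
def is_stopped(value, kind, stop_lists):
    """Return True if value is blocked for the given kind.

    kind is "alias" or "canonical" (the latter name is an assumption).
    """
    needle = value.lower()
    for entries in stop_lists:
        for entry in entries:
            same_value = entry["value"].lower() == needle
            # Assumed semantics: "both" applies to aliases and canonical values.
            applies = entry["target"] in (kind, "both")
            if same_value and applies:
                return True
    return False


profile_stop_list = [{"value": "tmp", "target": "alias", "reason": "too generic"}]
global_stop_list = [{"value": "unknown", "target": "both", "reason": "global noise"}]
stop_lists = [profile_stop_list, global_stop_list]

blocked_alias = is_stopped("tmp", "alias", stop_lists)
blocked_everywhere = is_stopped("Unknown", "canonical", stop_lists)
```

Under these assumptions, tmp is rejected only as an alias, while unknown is rejected in either role.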

A typical workflow:

  1. Start with a small dictionary.
  2. Validate it locally.
  3. Run extraction on representative documents.
  4. Add stop-list entries for false positives.
  5. Import the dictionary into the governance platform when the UI/API workflow is ready.
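Steps 2 and 3 can be approximated in a notebook before any platform tooling is involved. The sketch below is a stand-in, not the project's own validator or extractor: validate_dictionary and extract are hypothetical helpers, the checks cover only the fields shown in the examples above, and matching is a naive case-insensitive whole-word search.

```python
import json
import re


def validate_dictionary(data):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not isinstance(data.get("profile_name"), str) or not data.get("profile_name"):
        problems.append("profile_name must be a non-empty string")
    terms = data.get("terms")
    if not isinstance(terms, list):
        return problems + ["terms must be a list"]
    for i, term in enumerate(terms):
        if not isinstance(term, dict):
            problems.append(f"terms[{i}] must be an object")
            continue
        for key in ("canonical_value", "slot"):
            if not isinstance(term.get(key), str) or not term.get(key):
                problems.append(f"terms[{i}].{key} must be a non-empty string")
        for j, alias in enumerate(term.get("aliases", [])):
            is_string = isinstance(alias, str)
            is_object = isinstance(alias, dict) and isinstance(alias.get("value"), str)
            if not (is_string or is_object):
                problems.append(f"terms[{i}].aliases[{j}] must be a string or an object with a value")
    return problems


def extract(text, data):
    """Naive extraction: report each surface form found as a whole word."""
    hits = []
    for term in data["terms"]:
        forms = [term["canonical_value"]]
        for alias in term.get("aliases", []):
            forms.append(alias if isinstance(alias, str) else alias["value"])
        for form in forms:
            if re.search(rf"\b{re.escape(form)}\b", text, flags=re.IGNORECASE):
                hits.append((form, term["canonical_value"], term["slot"]))
    return hits


raw = """
{
  "profile_name": "infra_incidents",
  "terms": [
    {
      "canonical_value": "kubernetes",
      "slot": "TOOL",
      "aliases": ["k8s", { "value": "kube", "confidence": 0.95 }]
    }
  ]
}
"""
data = json.loads(raw)
problems = validate_dictionary(data)
hits = extract("The k8s cluster restarted after the kube upgrade.", data)
```

Reviewing the hits against a handful of representative documents is a quick way to spot false positives that belong in a stop list (step 4).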