Dictionary JSON

The dictionary JSON shape is designed to be simple enough for scripts and notebooks, but stable enough to import/export through the governance API later.

Minimal shape

{
  "profile_name": "infra_incidents",
  "profile_description": "Infrastructure incident terminology",
  "terms": [
    {
      "canonical_value": "kubernetes",
      "slot": "TOOL",
      "aliases": [
        "k8s",
        { "value": "kube", "confidence": 0.95 }
      ]
    }
  ],
  "profile_stop_list": [
    { "value": "tmp", "target": "alias", "reason": "too generic" }
  ],
  "global_stop_list": [
    { "value": "unknown", "target": "both", "reason": "global noise" }
  ]
}

Terms

A term defines a canonical value, its semantic slot, and the aliases for that value.

{
  "canonical_value": "postgresql",
  "slot": "DATABASE",
  "aliases": ["pg", "postgres", "psql"]
}
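To make the role of a term concrete, here is a minimal Python sketch that turns a list of terms into a lookup table from each surface form to its canonical value and slot. The function name build_lookup is hypothetical, and the idea that extraction resolves aliases this way is an assumption for illustration, not a description of the actual extractor.

```python
from typing import Dict, List, Tuple


def build_lookup(terms: List[dict]) -> Dict[str, Tuple[str, str]]:
    """Map every canonical value and alias to (canonical_value, slot).

    Hypothetical helper: the real extractor may resolve aliases differently.
    """
    lookup: Dict[str, Tuple[str, str]] = {}
    for term in terms:
        target = (term["canonical_value"], term["slot"])
        # The canonical value resolves to itself.
        lookup[term["canonical_value"]] = target
        for alias in term.get("aliases", []):
            # Aliases may be plain strings or objects with a "value" key.
            value = alias if isinstance(alias, str) else alias["value"]
            lookup[value] = target
    return lookup


terms = [
    {
        "canonical_value": "postgresql",
        "slot": "DATABASE",
        "aliases": ["pg", "postgres", "psql"],
    }
]
lookup = build_lookup(terms)
print(lookup["pg"])
```

With the example term above, "pg", "postgres", "psql", and "postgresql" all resolve to the same canonical value and slot.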

Aliases

An alias is either a plain string or an object that holds the alias in "value" together with metadata such as "confidence".

"k8s"

{ "value": "kube", "confidence": 0.95 }
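Because the two alias forms can be mixed in one list, scripts that read the dictionary usually want to normalize them first. The sketch below is one way to do that in Python. The helper name normalize_alias is hypothetical, and returning None when no confidence is given is an assumption, since the format shown here does not state a default.

```python
from typing import Any, Optional, Tuple


def normalize_alias(alias: Any) -> Tuple[str, Optional[float]]:
    """Return (value, confidence) for either alias form.

    Assumption: a bare string carries no confidence, so None is returned.
    """
    if isinstance(alias, str):
        return alias, None
    if isinstance(alias, dict) and isinstance(alias.get("value"), str):
        return alias["value"], alias.get("confidence")
    raise ValueError(f"unsupported alias entry: {alias!r}")


aliases = ["k8s", {"value": "kube", "confidence": 0.95}]
normalized = [normalize_alias(a) for a in aliases]
print(normalized)  # [('k8s', None), ('kube', 0.95)]
```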

Stop lists

Stop lists reduce noisy matches. Use them for short values, generic words, or terms that look like aliases but are too ambiguous in your corpus.

{ "value": "tmp", "target": "alias", "reason": "too generic" }
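As an illustration of how a stop list might be applied, the following Python sketch drops aliases that appear in it. Several details are assumptions: that a "target" of "alias" restricts the entry to aliases, that "both" covers aliases as well as canonical values, and that matching is case-insensitive. The function names are hypothetical.

```python
from typing import List, Set


def stopped_values(stop_list: List[dict], kind: str) -> Set[str]:
    """Collect lowercased stop values whose target is `kind` or "both"."""
    return {
        entry["value"].lower()
        for entry in stop_list
        if entry.get("target") in (kind, "both")
    }


def filter_aliases(aliases: list, stop_list: List[dict]) -> list:
    """Keep only aliases that are not blocked by the stop list."""
    blocked = stopped_values(stop_list, "alias")
    kept = []
    for alias in aliases:
        # Aliases may be plain strings or objects with a "value" key.
        value = alias if isinstance(alias, str) else alias["value"]
        if value.lower() not in blocked:
            kept.append(alias)
    return kept


stop_list = [
    {"value": "tmp", "target": "alias", "reason": "too generic"},
    {"value": "unknown", "target": "both", "reason": "global noise"},
]
candidates = ["k8s", "tmp", {"value": "Unknown", "confidence": 0.5}]
kept = filter_aliases(candidates, stop_list)
print(kept)  # ['k8s']
```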

Recommended workflow

1. Start with a small dictionary.
2. Validate it locally.
3. Run extraction on representative documents.
4. Add stop-list entries for false positives.
5. Import the dictionary into the governance platform when the UI/API workflow is ready.
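The local validation step in the workflow above can be approximated with a few structural checks before any project tooling is involved. The Python sketch below is an assumption-heavy illustration: the function validate_dictionary is hypothetical, and the set of required fields is inferred from the examples on this page rather than from a published schema.

```python
import json
from typing import List


def validate_dictionary(data: dict) -> List[str]:
    """Return a list of problems; an empty list means the checks passed.

    Assumption: profile_name, terms, canonical_value, and slot are required.
    """
    problems: List[str] = []
    if not isinstance(data.get("profile_name"), str):
        problems.append("profile_name must be a string")
    terms = data.get("terms")
    if not isinstance(terms, list):
        return problems + ["terms must be a list"]
    for i, term in enumerate(terms):
        if not isinstance(term, dict):
            problems.append(f"terms[{i}] must be an object")
            continue
        for key in ("canonical_value", "slot"):
            if not isinstance(term.get(key), str):
                problems.append(f"terms[{i}].{key} must be a string")
        for j, alias in enumerate(term.get("aliases", [])):
            is_string = isinstance(alias, str)
            is_object = isinstance(alias, dict) and isinstance(alias.get("value"), str)
            if not (is_string or is_object):
                problems.append(f"terms[{i}].aliases[{j}] has an unsupported shape")
    return problems


raw = """
{
  "profile_name": "infra_incidents",
  "terms": [
    {
      "canonical_value": "kubernetes",
      "slot": "TOOL",
      "aliases": ["k8s", {"value": "kube", "confidence": 0.95}]
    }
  ]
}
"""
problems = validate_dictionary(json.loads(raw))
print(problems)  # []
```

Returning a list of messages instead of raising on the first error makes it easier to fix a hand-edited dictionary in one pass.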