
Graph-Hybrid Mode

Graph-hybrid is the recommended schema mode for FTM (FollowTheMoney) data. It combines LLM-based schema inference with semantic alignment via knowledge graph search, using embeddings to bridge terminology gaps between the LLM's natural language and the ontology's formal vocabulary.

Overview

| Aspect | Value |
| --- | --- |
| Ontology Required | Yes (loaded into graph) |
| LLM Calls for Schema | 2 (same as LLM mode) + 1 (consolidation) |
| Type Consistency | Excellent |
| Setup Time | High (ontology graph required) |
| Best For | FTM data, semantic alignment needed |

Pipeline

graph TD
    A[Phase 1: LLM-First Inference] --> B[Phase 2: Semantic Alignment]
    B --> C[Phase 3: Property Enrichment]
    C --> D[Phase 4: Consolidation]

    A -- "Unbiased — LLM does NOT see ontology" --> A
    B -- "Exact → alt-label → embedding → LLM rerank" --> B
    C -- "Data-driven: only properties in actual data" --> C
    D -- "LLM coherence review, protects ontology types" --> D

Prerequisites

# Step 1: Load ontology into graph (once)
aletheia build-ontology-graph \
  --use-case my_case \
  --knowledge-graph my_ontology

# Step 2: Build knowledge graph with graph-hybrid
aletheia build-knowledge-graph \
  --use-case my_case \
  --knowledge-graph my_graph \
  --schema-mode graph-hybrid \
  --ontology-graph my_ontology

Phase 1: Unbiased LLM Inference

The LLM analyzes sample data without seeing the ontology. This prevents the LLM from forcing data into ontology terms — it extracts what's naturally in the data.

Input: FTM entities from OpenSanctions

LLM infers (unbiased):
  Entity Types:
    - Organization (most common)
    - Sanction (designation records)
    - Person (individuals)

  Relationships:
    - HAS_ENTITY (sanction → target)
    - HAS_ALIAS (organization → alias)

Phase 2: Semantic Alignment

Each inferred type is matched against ontology concepts through a priority cascade:

| Priority | Method | Confidence | Example |
| --- | --- | --- | --- |
| 1 | Exact match (case-insensitive) | 0.95 | "Person" → Person |
| 1b | Alt-label match | 0.90 | "Company" → Organization (via alt-label) |
| 2 | Embedding similarity | Variable | "Airport" → Aerodrome General |
| 3 | LLM reranking | Variable | Close candidates ranked by an LLM |

Why the Priority Cascade Matters

Exact matches are processed first to prevent the embedding search from "stealing" names already in the ontology. Without this ordering, searching for "Person" might return "PersonOfInterest" as the top embedding result.

Duplicate Prevention

Once an ontology concept is claimed by an exact match, it cannot be assigned to another inferred type via embedding search. This ensures one-to-one mapping.
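
The cascade and the claiming rule can be illustrated with a minimal sketch. It is not Aletheia's implementation: `Concept`, `align_types`, and `toy_similarity` are hypothetical names, the similarity function stands in for a real embedding model, the LLM reranking step is omitted, and the sketch assumes alt-label and embedding matches also claim their concept.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical stand-in for an ontology concept."""
    name: str
    alt_labels: list[str] = field(default_factory=list)


def align_types(inferred, concepts, similarity, threshold=0.7):
    """Map each inferred type to (ontology_name, confidence), or None if unaligned."""
    result = {}
    claimed = set()  # ontology concepts that are already taken

    def claim(inferred_name, concept, confidence):
        result[inferred_name] = (concept.name, confidence)
        claimed.add(concept.name)

    # Priority 1: exact, case-insensitive matches run first for ALL types,
    # so a later embedding search cannot take a name that exists verbatim.
    by_name = {c.name.lower(): c for c in concepts}
    for name in inferred:
        hit = by_name.get(name.lower())
        if hit is not None and hit.name not in claimed:
            claim(name, hit, 0.95)

    # Priority 1b: alt-label matches.
    by_alt = {alt.lower(): c for c in concepts for alt in c.alt_labels}
    for name in inferred:
        hit = by_alt.get(name.lower())
        if name not in result and hit is not None and hit.name not in claimed:
            claim(name, hit, 0.90)

    # Priority 2: similarity search restricted to unclaimed concepts.
    for name in inferred:
        if name in result:
            continue
        free = [c for c in concepts if c.name not in claimed]
        best = max(free, key=lambda c: similarity(name, c.name), default=None)
        score = similarity(name, best.name) if best is not None else 0.0
        if best is not None and score >= threshold:
            claim(name, best, score)
        else:
            result[name] = None  # below threshold: the type is kept as-is
    return result


# Made-up scores standing in for embedding similarity.
FAKE_SCORES = {("Airport", "Aerodrome General"): 0.89, ("CustomType", "Vessel"): 0.54}


def toy_similarity(a, b):
    return FAKE_SCORES.get((a, b), 0.0)


concepts = [
    Concept("Person"),
    Concept("Organization", alt_labels=["Company"]),
    Concept("Aerodrome General"),
    Concept("Vessel"),
]
aligned = align_types(["Person", "Company", "Airport", "CustomType"], concepts, toy_similarity)
```

Running each priority level as a full pass over all inferred types, rather than running the whole cascade per type, is what lets exact matches claim their concepts before any similarity search happens.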

Relationship Alignment

Relationship types follow a similar cascade, with two additional priority levels (0 and 0b) that are checked before the exact match:

| Priority | Method | Example |
| --- | --- | --- |
| 0 | Source class name match | "ISSUED_BY" aligns to ontology class "Issues" |
| 0b | Alt-label match | Relationship alt-labels |
| 1 | Direct exact match | "HAS_ALIAS" → HAS_ALIAS |
| 2 | Name root match (strip affixes) | "OWNED_BY" matches "Ownership" |
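
The name root match can be pictured as stripping common affixes from both names and comparing what remains. The sketch below is an illustration only: the affix lists and the `name_root` and `roots_match` helpers are assumptions, and the rules Aletheia actually applies are not documented here.

```python
# Hypothetical affix lists; the tool's real rules may differ.
PREFIXES = ("has_", "is_")
SUFFIXES = ("_by", "_of", "_to", "_in", "_at")
ENDINGS = ("ership", "ship", "ing", "ed", "es", "s")  # longest first


def name_root(name: str) -> str:
    """Reduce a relationship or class name to a crude lowercase root."""
    root = name.strip().lower().replace(" ", "_")
    for prefix in PREFIXES:
        if root.startswith(prefix):
            root = root[len(prefix):]
            break
    for suffix in SUFFIXES:
        if root.endswith(suffix):
            root = root[: -len(suffix)]
            break
    for ending in ENDINGS:
        # Keep at least three characters so short names are not erased.
        if root.endswith(ending) and len(root) - len(ending) >= 3:
            root = root[: -len(ending)]
            break
    return root


def roots_match(inferred: str, ontology_name: str) -> bool:
    """True when both names reduce to the same root."""
    return name_root(inferred) == name_root(ontology_name)


owned_root = name_root("OWNED_BY")        # "owned_by" -> "owned" -> "own"
ownership_root = name_root("Ownership")   # "ownership" -> "own"
```

A rule-based stripper like this is deliberately crude; it only needs to be applied identically to both sides of the comparison.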

Unaligned Types

Types whose best match scores below the alignment confidence threshold (set with --alignment-confidence) are kept as-is in the schema:

✓ Airport → Aerodrome General (89%)
✓ Person → Person (95%)
⚠️ CustomType → (kept as-is, best match was 54%)

Alignment Report

{
  "mode": "graph-hybrid",
  "overall_confidence": 0.91,
  "entity_alignments": [
    {
      "inferred_name": "Airport",
      "ontology_name": "Aerodrome General",
      "confidence": 0.89,
      "alternatives": ["Runway", "Heliport"]
    }
  ],
  "failed_entities": [
    {
      "entity": "CustomType",
      "reason": "Best match (54%) below threshold (70%)"
    }
  ]
}
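
For auditing, a report of this shape can be summarized with a few lines of Python. This is a sketch that assumes the report contains only the fields shown in the example above; the `summarize` helper is hypothetical and not part of Aletheia.

```python
import json

# The example report from above, inlined so the sketch is self-contained.
report_json = """
{
  "mode": "graph-hybrid",
  "overall_confidence": 0.91,
  "entity_alignments": [
    {
      "inferred_name": "Airport",
      "ontology_name": "Aerodrome General",
      "confidence": 0.89,
      "alternatives": ["Runway", "Heliport"]
    }
  ],
  "failed_entities": [
    {
      "entity": "CustomType",
      "reason": "Best match (54%) below threshold (70%)"
    }
  ]
}
"""


def summarize(report: dict) -> list[str]:
    """Return one human-readable line per alignment decision."""
    lines = []
    for item in report.get("entity_alignments", []):
        alternatives = ", ".join(item.get("alternatives", [])) or "none"
        lines.append(
            f"{item['inferred_name']} -> {item['ontology_name']} "
            f"({item['confidence']:.0%}); alternatives: {alternatives}"
        )
    for item in report.get("failed_entities", []):
        lines.append(f"{item['entity']} unaligned: {item['reason']}")
    return lines


summary = summarize(json.loads(report_json))
for line in summary:
    print(line)
```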

Phase 3: Property Enrichment (Data-Driven)

For aligned entities, ontology properties are added — but only those present in the actual data. This prevents schema bloat when ontologies define many properties irrelevant to the dataset.

Phase 3: Property Enrichment
  ✓ Person: +8 properties (filtered from 32 in ontology)
     - birthDate, nationality, ...
  ✓ Organization: +5 properties (filtered from 45 in ontology)
     - jurisdiction, status, ...
  ⊘ Filtered: 64 properties not in data

Properties are sourced from:

  • Direct properties of the matched ontology class
  • Inherited properties from parent classes
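
The filtering idea can be sketched as a set intersection between the properties the ontology offers (direct plus inherited) and the keys that carry values in the data. `OntologyClass`, `all_properties`, and `enrich` are hypothetical names for illustration; how Aletheia inspects the data is not specified here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntologyClass:
    """Hypothetical ontology class with single inheritance."""
    name: str
    properties: set[str] = field(default_factory=set)
    parent: Optional["OntologyClass"] = None


def all_properties(cls: OntologyClass) -> set[str]:
    """Collect direct properties plus those inherited from parent classes."""
    props: set[str] = set()
    node: Optional[OntologyClass] = cls
    while node is not None:
        props |= node.properties
        node = node.parent
    return props


def enrich(cls: OntologyClass, records: list[dict]) -> tuple[set[str], set[str]]:
    """Split ontology properties into (kept, filtered) based on the data."""
    observed: set[str] = set()
    for record in records:
        # A property counts as present only if some record has a non-empty value.
        observed.update(k for k, v in record.items() if v not in (None, "", []))
    candidates = all_properties(cls)
    return candidates & observed, candidates - observed


thing = OntologyClass("Thing", {"name", "notes"})
person = OntologyClass("Person", {"birthDate", "nationality", "passportNumber"}, parent=thing)
records = [
    {"name": "A", "birthDate": "1970-01-01"},
    {"name": "B", "nationality": "fr", "passportNumber": ""},
]
kept, filtered = enrich(person, records)
```

Here `passportNumber` is filtered out because its only value is empty, and `notes` because it never appears in the sample records.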

Enriched Docstrings

Each aligned entity type gets an enriched docstring that includes property metadata:

Aircraft — A fixed-wing or rotary-wing aircraft.
Key attributes: registration (Aircraft registration mark), icaoCode (ICAO type designator)

These flow to Graphiti's extraction and deduplication prompts, giving the LLM concrete signal about what attributes to extract.
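
A docstring in the format shown above could be assembled as follows. This is a sketch under assumptions: `build_docstring` is a hypothetical helper, and the attribute descriptions are taken to be available as a plain name-to-description mapping.

```python
SEPARATOR = " \u2014 "  # the dash used between type name and description


def build_docstring(type_name: str, description: str, attributes: dict[str, str]) -> str:
    """Combine a type description with per-attribute metadata."""
    lines = [f"{type_name}{SEPARATOR}{description}"]
    if attributes:
        rendered = ", ".join(f"{name} ({desc})" for name, desc in attributes.items())
        lines.append(f"Key attributes: {rendered}")
    return "\n".join(lines)


doc = build_docstring(
    "Aircraft",
    "A fixed-wing or rotary-wing aircraft.",
    {
        "registration": "Aircraft registration mark",
        "icaoCode": "ICAO type designator",
    },
)
print(doc)
```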

Phase 4: Consolidation

Consolidation is the common final step applied to all schema modes. An LLM reviews the complete schema for:

  • Redundant types — merges semantically similar entity or relationship types
  • Over-specialized types — consolidates overly narrow types
  • Naming inconsistencies — normalizes type names

Ontology-aligned types are protected from removal.

Edge Type Map

The generated schema includes an edge type map controlling which relationship types are valid between entity pairs. All types are placed in a ("Entity", "Entity") catch-all entry, ensuring valid types are never rejected because the source/target labels don't match exact pairs.
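
The effect of the catch-all entry can be shown with a small sketch. It assumes the map is a dictionary keyed by (source label, target label) pairs and that every node carries the generic "Entity" label alongside its specific type; `build_edge_type_map` and `allowed_edge_types` are hypothetical helpers, not Aletheia or Graphiti functions.

```python
EdgeTypeMap = dict[tuple[str, str], list[str]]


def build_edge_type_map(relationship_types: list[str]) -> EdgeTypeMap:
    """Place every relationship type under a single catch-all pair."""
    return {("Entity", "Entity"): sorted(set(relationship_types))}


def allowed_edge_types(
    edge_type_map: EdgeTypeMap, source_labels: set[str], target_labels: set[str]
) -> list[str]:
    """Return the relationship types permitted between two labelled nodes."""
    allowed: list[str] = []
    for (source, target), names in edge_type_map.items():
        if source in source_labels and target in target_labels:
            allowed.extend(n for n in names if n not in allowed)
    return allowed


catch_all = build_edge_type_map(["HAS_ENTITY", "HAS_ALIAS"])
strict = {("Person", "Organization"): ["HAS_ALIAS"]}

# A Sanction -> Organization edge: accepted by the catch-all map,
# rejected by a map that lists only exact pairs.
with_catch_all = allowed_edge_types(catch_all, {"Entity", "Sanction"}, {"Entity", "Organization"})
with_strict = allowed_edge_types(strict, {"Entity", "Sanction"}, {"Entity", "Organization"})
```

The trade-off is that the catch-all map no longer constrains which entity pairs a relationship may connect; it only constrains the set of relationship names.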

Pros and Cons

Advantages

  • Semantic understanding — matches by meaning, not string similarity
  • Language agnostic — "Persona" matches "Person", "Airport" matches "Aerodrome"
  • LLM discovery — finds patterns not in ontology
  • Transparency — alignment report shows every decision
  • Data-driven enrichment — only relevant ontology properties added

Disadvantages

  • Requires ontology graph — extra setup step
  • Ontology concepts may be lost — if the LLM doesn't discover them, they won't appear
  • Embedding dependency — requires consistent embedding model

When to Use

  • FTM/OpenSanctions data — designed for this use case
  • Semantic alignment needed — domain terms don't match ontology terms
  • LLM discovery valuable — want to find patterns in data
  • Transparency required — need to audit alignment decisions

When NOT to Use

  • Must preserve all ontology concepts — use ontology-first instead
  • No ontology available — use llm mode
  • Quick prototyping — use none or llm

Comparison with Ontology-First

| Aspect | Graph-Hybrid | Ontology-First |
| --- | --- | --- |
| Primary source | LLM inference | Ontology |
| LLM role | Discovery + alignment | Enhancement only |
| All ontology concepts? | No (only what LLM finds) | Yes (guaranteed) |
| Discovers new types | Yes | Yes |
| Data-driven filtering | Property enrichment | Schema pruning |
| Non-reified relationships | No | Yes |

If you need every ontology concept guaranteed in the schema, use ontology-first.

Troubleshooting

Low Alignment Confidence

If too many types aren't aligning:

  1. Lower threshold: --alignment-confidence 0.6
  2. Check ontology: Ensure concepts exist in ontology graph
  3. Review alignment report: See what alternatives were considered

Missing Ontology Concepts

If important ontology concepts are missing from schema:

→ Use ontology-first mode instead (guarantees all ontology concepts)

Wrong Alignments

If alignments are incorrect:

  1. Raise threshold: --alignment-confidence 0.8
  2. Review alternatives: Check alignment report for better options
  3. Extend ontology: Add more specific classes or alt-labels