Graph-Hybrid Mode¶

Graph-hybrid is the recommended schema mode for FTM (FollowTheMoney) data. It combines LLM-based schema inference with semantic alignment via knowledge graph search, using embeddings to bridge the gap between the LLM's natural-language type names and the ontology's formal vocabulary.

Overview¶

Aspect	Value
Ontology Required	Yes (loaded into graph)
LLM Calls for Schema	2 (same as LLM mode) + 1 (consolidation)
Type Consistency	Excellent
Setup Time	High (ontology graph required)
Best For	FTM data, semantic alignment needed

Pipeline¶

graph TD
    A[Phase 1: LLM-First Inference] --> B[Phase 2: Semantic Alignment]
    B --> C[Phase 3: Property Enrichment]
    C --> D[Phase 4: Consolidation]

    A -- "Unbiased — LLM does NOT see ontology" --> A
    B -- "Exact → alt-label → embedding → LLM rerank" --> B
    C -- "Data-driven: only properties in actual data" --> C
    D -- "LLM coherence review, protects ontology types" --> D

Prerequisites¶

# Step 1: Load ontology into graph (once)
aletheia build-ontology-graph \
  --use-case my_case \
  --knowledge-graph my_ontology

# Step 2: Build knowledge graph with graph-hybrid
aletheia build-knowledge-graph \
  --use-case my_case \
  --knowledge-graph my_graph \
  --schema-mode graph-hybrid \
  --ontology-graph my_ontology

Phase 1: Unbiased LLM Inference¶

The LLM analyzes sample data without seeing the ontology. With no ontology terms to force the data into, it reports the entity types and relationships that are naturally present in the data.

Input: FTM entities from OpenSanctions

LLM infers (unbiased):
  Entity Types:
    - Organization (most common)
    - Sanction (designation records)
    - Person (individuals)

  Relationships:
    - HAS_ENTITY (sanction → target)
    - HAS_ALIAS (organization → alias)

Phase 2: Semantic Alignment¶

Each inferred type is matched against ontology concepts through a priority cascade:

Priority	Method	Confidence	Example
1	Exact match (case-insensitive)	0.95	"Person" → Person
1b	Alt-label match	0.90	"Company" → Organization (via alt-label)
2	Embedding similarity	Variable	"Airport" → Aerodrome General
3	LLM reranking	Variable	Close candidates ranked by an LLM
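The cascade above can be sketched roughly as follows. This is a minimal illustration, not the tool's implementation: the `ONTOLOGY` dictionary, the toy three-dimensional vectors, the `align` function, and the 0.70 default threshold are all assumptions made for the example, and the LLM reranking step (priority 3) is left out.

```python
import math

# Hypothetical toy ontology: concept name -> alt-labels and a made-up embedding.
ONTOLOGY = {
    "Person": {"alt_labels": ["Individual"], "vec": [1.0, 0.0, 0.0]},
    "Organization": {"alt_labels": ["Company"], "vec": [0.0, 1.0, 0.0]},
    "Aerodrome General": {"alt_labels": [], "vec": [0.0, 0.0, 1.0]},
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def align(name, vec, threshold=0.70):
    """Return (ontology_name, confidence, method); ontology_name is None if unaligned."""
    key = name.casefold()
    # Priority 1: exact, case-insensitive name match.
    for concept in ONTOLOGY:
        if concept.casefold() == key:
            return concept, 0.95, "exact"
    # Priority 1b: alt-label match.
    for concept, info in ONTOLOGY.items():
        if key in (label.casefold() for label in info["alt_labels"]):
            return concept, 0.90, "alt_label"
    # Priority 2: embedding similarity; the similarity doubles as the confidence.
    best, score = max(
        ((c, cosine(vec, info["vec"])) for c, info in ONTOLOGY.items()),
        key=lambda pair: pair[1],
    )
    if score >= threshold:
        return best, score, "embedding"
    return None, score, "unaligned"
```

For example, `align("person", [0.0, 0.0, 0.0])` stops at the exact-match step and never consults the embeddings, while `align("Airport", [0.1, 0.1, 0.9])` falls through to the similarity search.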

Why the Priority Cascade Matters¶

Exact matches are processed first to prevent the embedding search from "stealing" names already in the ontology. Without this ordering, searching for "Person" might return "PersonOfInterest" as the top embedding result.

Duplicate Prevention¶

Once an ontology concept is claimed by an exact match, it cannot be assigned to another inferred type via embedding search. This ensures one-to-one mapping.
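One way to picture this rule is a two-pass assignment in which exact matches claim their concepts first. The sketch below is an assumption-laden illustration: `assign_one_to_one` and the shape of `candidates` (a stand-in for embedding search results) are hypothetical, and it also marks concepts claimed by embedding matches, which goes slightly beyond what the text states.

```python
def assign_one_to_one(inferred, ontology, candidates):
    """Map inferred type names to ontology concepts without reusing a concept.

    candidates: inferred name -> list of (concept, score) pairs, best first,
    standing in for embedding search results (hypothetical structure).
    """
    by_lower = {concept.casefold(): concept for concept in ontology}
    mapping, claimed = {}, set()
    # Pass 1: exact matches claim their concepts before any embedding lookup.
    for name in inferred:
        concept = by_lower.get(name.casefold())
        if concept is not None:
            mapping[name] = concept
            claimed.add(concept)
    # Pass 2: remaining names take their best-scoring unclaimed candidate.
    for name in inferred:
        if name in mapping:
            continue
        for concept, _score in candidates.get(name, []):
            if concept not in claimed:
                mapping[name] = concept
                claimed.add(concept)
                break
    return mapping


mapping = assign_one_to_one(
    inferred=["Individual", "Person"],
    ontology=["Person", "PersonOfInterest"],
    candidates={"Individual": [("Person", 0.93), ("PersonOfInterest", 0.81)]},
)
```

Here "Individual" would have preferred "Person" by similarity, but "Person" is already claimed by the exact match, so it falls back to the next unclaimed candidate. A real pipeline would additionally apply the confidence threshold before accepting that fallback.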

Relationship Alignment¶

Relationship types follow a similar cascade with two additional priority levels:

Priority	Method	Example
0	Source class name match	"ISSUED_BY" aligns to ontology class "Issues"
0b	Alt-label match	Relationship alt-labels
1	Direct exact match	"HAS_ALIAS" → HAS_ALIAS
2	Name root match (strip affixes)	"OWNED_BY" matches "Ownership"
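The name root match in the last row can be approximated with a crude affix stripper. The word lists and the `name_root` and `roots_match` helpers below are hypothetical and chosen only to make the "OWNED_BY" / "Ownership" example work; the actual affix rules are not documented here.

```python
LEADING = {"has", "is"}          # assumed prefix tokens to drop
TRAILING = {"by", "of", "to"}    # assumed suffix tokens to drop
ENDINGS = ("ship", "ing", "ed", "er", "s")  # assumed word endings to strip


def name_root(name: str) -> str:
    """Reduce a relationship or class name to a rough lowercase root."""
    tokens = [t for t in name.lower().split("_") if t]
    if len(tokens) > 1 and tokens[0] in LEADING:
        tokens = tokens[1:]
    if len(tokens) > 1 and tokens[-1] in TRAILING:
        tokens = tokens[:-1]
    root = "".join(tokens)
    # Strip endings repeatedly, but never shorten the root below 3 characters.
    stripped = True
    while stripped:
        stripped = False
        for ending in ENDINGS:
            if root.endswith(ending) and len(root) - len(ending) >= 3:
                root = root[: -len(ending)]
                stripped = True
                break
    return root


def roots_match(relationship: str, ontology_class: str) -> bool:
    """True when both names reduce to the same root."""
    return name_root(relationship) == name_root(ontology_class)
```

With these lists, both "OWNED_BY" and "Ownership" reduce to the root `own`, so `roots_match("OWNED_BY", "Ownership")` is true. A heuristic this simple will produce false positives on short roots, which is presumably why it sits at the lowest priority.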

Unaligned Types¶

Inferred types whose best match falls below the confidence threshold are kept in the schema under their inferred names:

✓ Airport → Aerodrome General (89%)
✓ Person → Person (95%)
⚠️ CustomType → (kept as-is, best match was 54%)

Alignment Report¶

{
  "mode": "graph-hybrid",
  "overall_confidence": 0.91,
  "entity_alignments": [
    {
      "inferred_name": "Airport",
      "ontology_name": "Aerodrome General",
      "confidence": 0.89,
      "alternatives": ["Runway", "Heliport"]
    }
  ],
  "failed_entities": [
    {
      "entity": "CustomType",
      "reason": "Best match (54%) below threshold (70%)"
    }
  ]
}
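A report in this shape is easy to audit with a few lines of Python. The sketch below embeds the sample report as a string, since where the tool writes the report is not stated here; `summarize_report` and its `review_below` cutoff are hypothetical helpers.

```python
import json

REPORT_JSON = """
{
  "mode": "graph-hybrid",
  "overall_confidence": 0.91,
  "entity_alignments": [
    {
      "inferred_name": "Airport",
      "ontology_name": "Aerodrome General",
      "confidence": 0.89,
      "alternatives": ["Runway", "Heliport"]
    }
  ],
  "failed_entities": [
    {
      "entity": "CustomType",
      "reason": "Best match (54%) below threshold (70%)"
    }
  ]
}
"""


def summarize_report(report, review_below=0.90):
    """Return (alignments worth a manual look, failed entities)."""
    to_review = [
        (a["inferred_name"], a["ontology_name"], a["confidence"], a.get("alternatives", []))
        for a in report.get("entity_alignments", [])
        if a["confidence"] < review_below
    ]
    failed = [(f["entity"], f["reason"]) for f in report.get("failed_entities", [])]
    return to_review, failed


to_review, failed = summarize_report(json.loads(REPORT_JSON))
for inferred, matched, confidence, alternatives in to_review:
    print(f"review: {inferred} -> {matched} ({confidence:.0%}), alternatives: {alternatives}")
for entity, reason in failed:
    print(f"unaligned: {entity}: {reason}")
```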

Phase 3: Property Enrichment (Data-Driven)¶

For aligned entities, ontology properties are added — but only those present in the actual data. This prevents schema bloat when ontologies define many properties irrelevant to the dataset.

Phase 3: Property Enrichment
  ✓ Person: +8 properties (filtered from 32 in ontology)
     - birthDate, nationality, ...
  ✓ Organization: +5 properties (filtered from 45 in ontology)
     - jurisdiction, status, ...
  ⊘ Filtered: 64 properties not in data

Properties are sourced from:

Direct properties of the matched ontology class
Inherited properties from parent classes
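The filtering idea can be sketched as follows, assuming a hypothetical in-memory ontology where each class lists its own properties and an optional parent. The `enrich_properties` function, the dictionary layout, and the rule that empty values do not count as "present" are illustrative assumptions.

```python
def enrich_properties(concept, ontology, records):
    """Return (kept, filtered) ontology properties for `concept`.

    Properties come from the class itself and from its parent chain;
    only those observed with a non-empty value in `records` are kept.
    Assumes the parent chain is acyclic.
    """
    available = []
    current = concept
    while current is not None:
        node = ontology[current]
        available.extend(p for p in node["properties"] if p not in available)
        current = node.get("parent")
    observed = {
        key
        for record in records
        for key, value in record.items()
        if value not in (None, "", [])
    }
    kept = [p for p in available if p in observed]
    filtered = [p for p in available if p not in observed]
    return kept, filtered


ontology = {
    "Thing": {"parent": None, "properties": ["name", "notes"]},
    "Person": {"parent": "Thing", "properties": ["birthDate", "nationality", "passportNumber"]},
}
records = [
    {"name": "A", "birthDate": "1970-01-01"},
    {"name": "B", "nationality": "fr", "passportNumber": ""},
]
kept, filtered = enrich_properties("Person", ontology, records)
```

In this toy data, `passportNumber` only ever appears empty and `notes` never appears, so both are filtered out while the inherited `name` property is kept.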

Enriched Docstrings¶

Each aligned entity type gets an enriched docstring that includes property metadata:

Aircraft — A fixed-wing or rotary-wing aircraft.
Key attributes: registration (Aircraft registration mark), icaoCode (ICAO type designator)

These docstrings flow into Graphiti's extraction and deduplication prompts, giving the LLM a concrete signal about which attributes to extract.
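A docstring in the format shown above could be assembled along these lines. The `enriched_docstring` helper and its argument shapes are hypothetical; the dash is written as the escape `\u2014` only to reproduce the sample's separator.

```python
def enriched_docstring(name, description, properties):
    """Build a two-line docstring: a summary line plus key attributes.

    properties: list of (property_name, property_description) pairs.
    """
    header = f"{name} \u2014 {description}"
    if not properties:
        return header
    attrs = ", ".join(
        f"{prop} ({desc})" if desc else prop for prop, desc in properties
    )
    return f"{header}\nKey attributes: {attrs}"


doc = enriched_docstring(
    "Aircraft",
    "A fixed-wing or rotary-wing aircraft.",
    [
        ("registration", "Aircraft registration mark"),
        ("icaoCode", "ICAO type designator"),
    ],
)
```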

Phase 4: Consolidation¶

Consolidation is the final step, common to all schema modes. An LLM reviews the complete schema for:

Redundant types — merges semantically similar entity or relationship types
Over-specialized types — consolidates overly narrow types
Naming inconsistencies — normalizes type names

Ontology-aligned types are protected from removal.
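How the protection is enforced is not described here; one plausible reading is that merge proposals from the LLM are filtered before they are applied. The sketch below assumes proposals arrive as hypothetical `(drop, keep)` pairs and that `protected` holds the ontology-aligned type names.

```python
def apply_merges(schema_types, proposed_merges, protected):
    """Apply (drop, keep) merge proposals, never dropping a protected type.

    Returns the resulting type list and the proposals that were skipped.
    """
    result = list(schema_types)
    skipped = []
    for drop, keep in proposed_merges:
        if drop in protected:
            # Ontology-aligned types survive even if the LLM suggests a merge.
            skipped.append((drop, keep))
            continue
        if drop in result and keep in result:
            result.remove(drop)
    return result, skipped


types, skipped = apply_merges(
    schema_types=["Person", "Individual", "Organization", "Company"],
    proposed_merges=[("Individual", "Person"), ("Organization", "Company")],
    protected={"Person", "Organization"},
)
```

Here "Individual" is merged into "Person", while the proposal to drop "Organization" is skipped because it is ontology-aligned.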

Edge Type Map¶

The generated schema includes an edge type map that controls which relationship types are valid between pairs of entity types. All relationship types are placed under a single catch-all ("Entity", "Entity") entry, so a valid type is never rejected just because the source and target labels do not match a more specific pair.
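As a rough picture, such a map is a dictionary keyed by (source label, target label) pairs. The helpers below are hypothetical, and the lookup rule in `allowed` (specific pair first, then the catch-all) is an assumption about how a consumer might read the map, not a description of Graphiti's internals.

```python
def build_edge_type_map(relationship_types):
    """Place every relationship type under the generic ("Entity", "Entity") pair."""
    return {("Entity", "Entity"): sorted(set(relationship_types))}


def allowed(edge_type_map, source_label, target_label, relationship):
    """Check a relationship against the specific pair and the catch-all entry."""
    specific = edge_type_map.get((source_label, target_label), [])
    catch_all = edge_type_map.get(("Entity", "Entity"), [])
    return relationship in specific or relationship in catch_all


edge_type_map = build_edge_type_map(["HAS_ENTITY", "HAS_ALIAS", "HAS_ALIAS"])
```

Because everything lives under the catch-all key, `allowed(edge_type_map, "Sanction", "Person", "HAS_ENTITY")` is true even though no ("Sanction", "Person") entry exists. The trade-off is that the map no longer constrains which entity types a relationship may connect.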

Pros and Cons¶

Advantages¶

Semantic understanding — matches by meaning, not string similarity
Language agnostic — "Persona" matches "Person", "Airport" matches "Aerodrome"
LLM discovery — finds patterns not in ontology
Transparency — alignment report shows every decision
Data-driven enrichment — only relevant ontology properties added

Disadvantages¶

Requires ontology graph — extra setup step
Ontology concepts may be lost — if the LLM doesn't discover them, they won't appear
Embedding dependency — requires consistent embedding model

When to Use¶

FTM/OpenSanctions data — designed for this use case
Semantic alignment needed — domain terms don't match ontology terms
LLM discovery valuable — want to find patterns in data
Transparency required — need to audit alignment decisions

When NOT to Use¶

Must preserve all ontology concepts — use ontology-first instead
No ontology available — use llm mode
Quick prototyping — use none or llm

Comparison with Ontology-First¶

Aspect	Graph-Hybrid	Ontology-First
Primary source	LLM inference	Ontology
LLM role	Discovery + alignment	Enhancement only
All ontology concepts?	No (only what LLM finds)	Yes (guaranteed)
Discovers new types	Yes	Yes
Data-driven filtering	Property enrichment	Schema pruning
Non-reified relationships	No	Yes

If you need every ontology concept guaranteed in the schema, use ontology-first.

Troubleshooting¶

Low Alignment Confidence¶

If too many types aren't aligning:

Lower threshold: --alignment-confidence 0.6
Check ontology: Ensure concepts exist in ontology graph
Review alignment report: See what alternatives were considered

Missing Ontology Concepts¶

If important ontology concepts are missing from the schema:

→ Use ontology-first mode instead (guarantees all ontology concepts)

Wrong Alignments¶

If alignments are incorrect:

Raise threshold: --alignment-confidence 0.8
Review alternatives: Check alignment report for better options
Extend ontology: Add more specific classes or alt-labels

Ontology-First Mode — Ontology as primary source
Hybrid Mode — String-based alignment
LLM Mode — No alignment
Overview — Comparison of all modes