Graph-Hybrid Mode¶
The graph-hybrid mode is the recommended mode for FTM data. It combines LLM-based schema inference with semantic alignment via knowledge graph search, using embeddings to bridge terminology gaps between the LLM's natural language and the ontology's formal vocabulary.
Overview¶
| Aspect | Value |
|---|---|
| Ontology Required | Yes (loaded into graph) |
| LLM Calls for Schema | 2 (same as LLM mode) + 1 (consolidation) |
| Type Consistency | Excellent |
| Setup Time | High (ontology graph required) |
| Best For | FTM data, semantic alignment needed |
Pipeline¶
```mermaid
graph TD
    A[Phase 1: LLM-First Inference] --> B[Phase 2: Semantic Alignment]
    B --> C[Phase 3: Property Enrichment]
    C --> D[Phase 4: Consolidation]
    A -- "Unbiased — LLM does NOT see ontology" --> A
    B -- "Exact → alt-label → embedding → LLM rerank" --> B
    C -- "Data-driven: only properties in actual data" --> C
    D -- "LLM coherence review, protects ontology types" --> D
```
Prerequisites¶
```bash
# Step 1: Load ontology into graph (once)
aletheia build-ontology-graph \
  --use-case my_case \
  --knowledge-graph my_ontology

# Step 2: Build knowledge graph with graph-hybrid
aletheia build-knowledge-graph \
  --use-case my_case \
  --knowledge-graph my_graph \
  --schema-mode graph-hybrid \
  --ontology-graph my_ontology
```
Phase 1: Unbiased LLM Inference¶
The LLM analyzes sample data without seeing the ontology. This prevents the LLM from forcing data into ontology terms — it extracts what's naturally in the data.
```text
Input: FTM entities from OpenSanctions

LLM infers (unbiased):
  Entity Types:
    - Organization (most common)
    - Sanction (designation records)
    - Person (individuals)
  Relationships:
    - HAS_ENTITY (sanction → target)
    - HAS_ALIAS (organization → alias)
```
Phase 2: Semantic Alignment¶
Each inferred type is matched against ontology concepts through a priority cascade:
| Priority | Method | Confidence | Example |
|---|---|---|---|
| 1 | Exact match (case-insensitive) | 0.95 | "Person" → Person |
| 1b | Alt-label match | 0.90 | "Company" → Organization (via alt-label) |
| 2 | Embedding similarity | Variable | "Airport" → Aerodrome General |
| 3 | LLM reranking | Variable | Close candidates ranked by an LLM |
Why the Priority Cascade Matters¶
Exact matches are processed first to prevent the embedding search from "stealing" names already in the ontology. Without this ordering, searching for "Person" might return "PersonOfInterest" as the top embedding result.
Duplicate Prevention¶
Once an ontology concept is claimed by an exact match, it cannot be assigned to another inferred type via embedding search. This ensures one-to-one mapping.
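The cascade and the claiming rule can be sketched in a few lines of Python. This is an illustrative sketch, not the project's implementation: `Concept`, `cosine`, and `align` are hypothetical names, the embeddings are toy two-dimensional vectors, the 0.70 threshold is taken from the example report, and the LLM reranking step is omitted. It also assumes that alt-label and embedding matches claim a concept the same way exact matches do.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Optional


@dataclass
class Concept:
    """Hypothetical stand-in for an ontology concept stored in the graph."""
    name: str
    alt_labels: list[str] = field(default_factory=list)
    embedding: list[float] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def align(
    inferred: dict[str, list[float]],
    concepts: list[Concept],
    threshold: float = 0.70,  # assumed default, mirrors the example report
) -> dict[str, tuple[Optional[str], float]]:
    """Map each inferred type name to (ontology name or None, confidence)."""
    result: dict[str, tuple[Optional[str], float]] = {}
    claimed: set[str] = set()  # ontology concepts already assigned

    def claim(inferred_name: str, concept: Concept, confidence: float) -> None:
        result[inferred_name] = (concept.name, confidence)
        claimed.add(concept.name)

    # Priority 1: exact, case-insensitive name match for ALL types first,
    # so the embedding search cannot take a name that exists in the ontology.
    for name in inferred:
        for c in concepts:
            if c.name not in claimed and c.name.lower() == name.lower():
                claim(name, c, 0.95)
                break

    # Priority 1b: alt-label match.
    for name in inferred:
        if name in result:
            continue
        for c in concepts:
            labels = {label.lower() for label in c.alt_labels}
            if c.name not in claimed and name.lower() in labels:
                claim(name, c, 0.90)
                break

    # Priority 2: embedding similarity over unclaimed concepts only.
    for name, vector in inferred.items():
        if name in result:
            continue
        free = [c for c in concepts if c.name not in claimed]
        best = max(free, key=lambda c: cosine(vector, c.embedding), default=None)
        score = cosine(vector, best.embedding) if best is not None else 0.0
        if best is not None and score >= threshold:
            claim(name, best, score)
        else:
            result[name] = (None, score)  # stays unaligned
    return result
```

Running the exact-match pass over every inferred type before any embedding search is what keeps "Person" from being paired with a near neighbour such as "PersonOfInterest", even when their vectors are almost identical.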
Relationship Alignment¶
Relationship types follow a similar cascade with two additional priority levels:
| Priority | Method | Example |
|---|---|---|
| 0 | Source class name match | "ISSUED_BY" aligns to ontology class "Issues" |
| 0b | Alt-label match | Relationship alt-labels |
| 1 | Direct exact match | "HAS_ALIAS" → HAS_ALIAS |
| 2 | Name root match (strip affixes) | "OWNED_BY" matches "Ownership" |
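As a rough illustration of the name root match in the last row, the sketch below strips common relationship affixes and a simple verb ending, then checks whether the ontology name starts with what is left. The affix lists, the crude stemming rule, and the function names `name_root` and `root_matches` are assumptions made for this example; the actual affix handling may differ.

```python
# Hypothetical affix lists; the real set is not documented here.
PREFIXES = ("HAS_", "IS_")
SUFFIXES = ("_BY", "_OF", "_TO", "_IN", "_AT")
ENDINGS = ("ed", "es", "s")


def name_root(relationship: str) -> str:
    """Reduce a relationship name such as OWNED_BY to a lowercase root."""
    root = relationship.upper()
    for prefix in PREFIXES:
        if root.startswith(prefix):
            root = root[len(prefix):]
            break
    for suffix in SUFFIXES:
        if root.endswith(suffix):
            root = root[: -len(suffix)]
            break
    root = root.replace("_", "").lower()
    # Crude stemming: drop one common ending if at least 3 letters remain.
    for ending in ENDINGS:
        if root.endswith(ending) and len(root) - len(ending) >= 3:
            root = root[: -len(ending)]
            break
    return root


def root_matches(relationship: str, ontology_name: str) -> bool:
    """True if the ontology name begins with the relationship's root."""
    root = name_root(relationship)
    normalized = ontology_name.lower().replace(" ", "")
    return len(root) >= 3 and normalized.startswith(root)
```

With these rules, `root_matches("OWNED_BY", "Ownership")` holds because both reduce to the root `own`, while `root_matches("HAS_ALIAS", "Ownership")` does not.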
Unaligned Types¶
Types whose best match falls below the confidence threshold keep their inferred name in the schema:
```text
✓ Airport → Aerodrome General (89%)
✓ Person → Person (95%)
⚠️ CustomType → (kept as-is, best match was 54%)
```
Alignment Report¶
```json
{
  "mode": "graph-hybrid",
  "overall_confidence": 0.91,
  "entity_alignments": [
    {
      "inferred_name": "Airport",
      "ontology_name": "Aerodrome General",
      "confidence": 0.89,
      "alternatives": ["Runway", "Heliport"]
    }
  ],
  "failed_entities": [
    {
      "entity": "CustomType",
      "reason": "Best match (54%) below threshold (70%)"
    }
  ]
}
```
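To audit alignment decisions, a report of this shape can be summarized with the standard `json` module. The sketch below embeds the example report as a string; the `summarize` helper is hypothetical, and it assumes only the keys shown in the example.

```python
import json

# The example report from above, embedded so the sketch is self-contained.
REPORT_JSON = """
{
  "mode": "graph-hybrid",
  "overall_confidence": 0.91,
  "entity_alignments": [
    {"inferred_name": "Airport", "ontology_name": "Aerodrome General",
     "confidence": 0.89, "alternatives": ["Runway", "Heliport"]}
  ],
  "failed_entities": [
    {"entity": "CustomType",
     "reason": "Best match (54%) below threshold (70%)"}
  ]
}
"""


def summarize(report: dict) -> list[str]:
    """Return one human-readable line per alignment decision."""
    lines = []
    for a in report.get("entity_alignments", []):
        alternatives = ", ".join(a.get("alternatives", [])) or "none"
        lines.append(
            f'{a["inferred_name"]} -> {a["ontology_name"]} '
            f'({a["confidence"]:.0%}); alternatives: {alternatives}'
        )
    for failed in report.get("failed_entities", []):
        lines.append(f'{failed["entity"]} unaligned: {failed["reason"]}')
    return lines


if __name__ == "__main__":
    for line in summarize(json.loads(REPORT_JSON)):
        print(line)
```

Listing the alternatives next to each accepted match makes it easier to spot cases where a rejected candidate would have been the better fit.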
Phase 3: Property Enrichment (Data-Driven)¶
For aligned entities, ontology properties are added — but only those present in the actual data. This prevents schema bloat when ontologies define many properties irrelevant to the dataset.
```text
Phase 3: Property Enrichment
  ✓ Person: +8 properties (filtered from 32 in ontology)
    - birthDate, nationality, ...
  ✓ Organization: +5 properties (filtered from 45 in ontology)
    - jurisdiction, status, ...
  ⊘ Filtered: 64 properties not in data
```
Properties are sourced from:
- Direct properties of the matched ontology class
- Inherited properties from parent classes
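The filtering idea can be sketched as follows: gather the direct and inherited properties of the matched class, then keep only those that appear with a non-empty value in the sampled records. The data structures (`direct`, `parents`) and the function names are assumptions for illustration, and the property names in the usage note are examples only.

```python
def collect_properties(
    cls: str,
    direct: dict[str, list[str]],
    parents: dict[str, str],
) -> list[str]:
    """Direct properties first, then inherited ones, walking up the hierarchy."""
    properties: list[str] = []
    visited: set[str] = set()
    current = cls
    while current is not None and current not in visited:
        visited.add(current)  # guards against cycles in the hierarchy
        for prop in direct.get(current, []):
            if prop not in properties:
                properties.append(prop)
        current = parents.get(current)
    return properties


def enrich(
    cls: str,
    records: list[dict],
    direct: dict[str, list[str]],
    parents: dict[str, str],
) -> tuple[list[str], list[str]]:
    """Split ontology properties into (kept, filtered) based on observed data."""
    # A property counts as observed if any record carries a non-empty value.
    observed = {
        key
        for record in records
        for key, value in record.items()
        if value not in (None, "", [])
    }
    candidates = collect_properties(cls, direct, parents)
    kept = [p for p in candidates if p in observed]
    filtered = [p for p in candidates if p not in observed]
    return kept, filtered
```

For example, with `direct = {"Person": ["birthDate", "nationality", "height"], "LegalEntity": ["name", "taxNumber"]}` and `parents = {"Person": "LegalEntity"}`, records that only carry `name`, `birthDate`, and `nationality` would keep those three properties and filter out `height` and `taxNumber`.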
Enriched Docstrings¶
Each aligned entity type gets an enriched docstring that includes property metadata:
```text
Aircraft — A fixed-wing or rotary-wing aircraft.
Key attributes: registration (Aircraft registration mark), icaoCode (ICAO type designator)
```
These flow to Graphiti's extraction and deduplication prompts, giving the LLM concrete signal about what attributes to extract.
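A docstring in the format shown above could be assembled as in this minimal sketch. The function name `build_docstring` and its inputs are hypothetical; how the resulting text is attached to the entity type definition is not covered here.

```python
def build_docstring(
    type_name: str,
    description: str,
    attributes: dict[str, str],
) -> str:
    """Combine a type description with per-property metadata."""
    # \u2014 is the dash separator used in the example docstring.
    header = f"{type_name} \u2014 {description}"
    if not attributes:
        return header
    described = ", ".join(
        f"{name} ({meaning})" for name, meaning in attributes.items()
    )
    return f"{header}\nKey attributes: {described}"


if __name__ == "__main__":
    print(
        build_docstring(
            "Aircraft",
            "A fixed-wing or rotary-wing aircraft.",
            {
                "registration": "Aircraft registration mark",
                "icaoCode": "ICAO type designator",
            },
        )
    )
```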
Phase 4: Consolidation¶
Consolidation is the final step and is common to all schema modes. An LLM reviews the complete schema for:
- Redundant types — merges semantically similar entity or relationship types
- Over-specialized types — consolidates overly narrow types
- Naming inconsistencies — normalizes type names
Ontology-aligned types are protected from removal.
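One way to picture the protection rule: the LLM proposes merges, and any proposal that would remove an ontology-aligned type is ignored. The sketch below assumes the proposals arrive as a mapping from the type to drop to the type that replaces it; that format and the name `apply_consolidation` are illustrative assumptions.

```python
def apply_consolidation(
    types: list[str],
    proposed_merges: dict[str, str],
    protected: set[str],
) -> list[str]:
    """Apply merge proposals, never removing a protected (aligned) type."""
    consolidated: list[str] = []
    for type_name in types:
        target = proposed_merges.get(type_name)
        if target is None or type_name in protected:
            keep = type_name  # no proposal, or the type is ontology-aligned
        else:
            keep = target  # redundant type folds into its replacement
        if keep not in consolidated:
            consolidated.append(keep)
    return consolidated
```

For instance, with `Person` and `Organization` protected, a proposal to fold `Individual` into `Person` is applied, while a proposal to replace `Organization` with some other name is skipped.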
Edge Type Map¶
The generated schema includes an edge type map controlling which relationship types are valid between entity pairs. All types are placed in a ("Entity", "Entity") catch-all entry, ensuring valid types are never rejected because the source/target labels don't match exact pairs.
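The catch-all entry can be pictured as a dictionary keyed by (source label, target label) pairs. The sketch below builds such a map and shows a lookup that falls back to the `("Entity", "Entity")` entry; the fallback logic and the function names are assumptions about how a consumer might read the map, not a description of the actual lookup.

```python
EdgeTypeMap = dict[tuple[str, str], list[str]]


def build_edge_type_map(relationship_types: list[str]) -> EdgeTypeMap:
    """Place every relationship type under the catch-all entity pair."""
    return {("Entity", "Entity"): sorted(set(relationship_types))}


def is_allowed(
    edge_type_map: EdgeTypeMap,
    source_label: str,
    target_label: str,
    relationship: str,
) -> bool:
    """Check the specific pair first, then fall back to the catch-all."""
    specific = edge_type_map.get((source_label, target_label), [])
    catch_all = edge_type_map.get(("Entity", "Entity"), [])
    return relationship in specific or relationship in catch_all
```

Because every type lives under the catch-all key, `HAS_ENTITY` is accepted between a `Sanction` and an `Organization` even though no `("Sanction", "Organization")` entry exists.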
Pros and Cons¶
Advantages¶
- Semantic understanding — matches by meaning, not string similarity
- Language agnostic — "Persona" matches "Person", "Airport" matches "Aerodrome"
- LLM discovery — finds patterns not in ontology
- Transparency — alignment report shows every decision
- Data-driven enrichment — only relevant ontology properties added
Disadvantages¶
- Requires ontology graph — extra setup step
- Ontology concepts may be lost — if the LLM doesn't discover them, they won't appear
- Embedding dependency — requires consistent embedding model
When to Use¶
- FTM/OpenSanctions data — designed for this use case
- Semantic alignment needed — domain terms don't match ontology terms
- LLM discovery valuable — want to find patterns in data
- Transparency required — need to audit alignment decisions
When NOT to Use¶
- Must preserve all ontology concepts — use ontology-first instead
- No ontology available — use llm mode
- Quick prototyping — use none or llm
Comparison with Ontology-First¶
| Aspect | Graph-Hybrid | Ontology-First |
|---|---|---|
| Primary source | LLM inference | Ontology |
| LLM role | Discovery + alignment | Enhancement only |
| All ontology concepts? | No (only what LLM finds) | Yes (guaranteed) |
| Discovers new types | Yes | Yes |
| Data-driven filtering | Property enrichment | Schema pruning |
| Non-reified relationships | No | Yes |
If you need every ontology concept guaranteed in the schema, use ontology-first.
Troubleshooting¶
Low Alignment Confidence¶
If too many types aren't aligning:
- Lower threshold: --alignment-confidence 0.6
- Check ontology: Ensure concepts exist in ontology graph
- Review alignment report: See what alternatives were considered
Missing Ontology Concepts¶
If important ontology concepts are missing from schema:
→ Use ontology-first mode instead (guarantees all ontology concepts)
Wrong Alignments¶
If alignments are incorrect:
- Raise threshold: --alignment-confidence 0.8
- Review alternatives: Check alignment report for better options
- Extend ontology: Add more specific classes or alt-labels
Related¶
- Ontology-First Mode — Ontology as primary source
- Hybrid Mode — String-based alignment
- LLM Mode — No alignment
- Overview — Comparison of all modes