Aletheia Ontology Integration¶
Aletheia uses ontologies to guide entity and relationship extraction, ensuring consistency and domain alignment. This page explains how the integration works.
How Ontologies Improve Extraction¶
Without Ontology¶
The LLM decides entity types based on its training:
Text: "The Boeing 737-800 operated by Lufthansa..."
LLM extraction (unpredictable):
- Entity: "Boeing 737-800" (type: "airplane" or "aircraft" or "vehicle"?)
- Entity: "Lufthansa" (type: "company" or "airline" or "operator"?)
- Relationship: "operated by" or "flown by" or "belongs to"?
With Ontology¶
The ontology constrains and guides the LLM:
Ontology defines:
- Aircraft (with properties: type, registration)
- Operator (with properties: name, country)
- HAS_OPERATOR relationship
LLM extraction (consistent):
- Entity: "Boeing 737-800" (type: Aircraft)
- Entity: "Lufthansa" (type: Operator)
- Relationship: HAS_OPERATOR
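The constraint can be pictured as a simple membership check on the extraction output. The sketch below is illustrative only: `check_extraction` and the two allow-lists are hypothetical names, not part of Aletheia's API, and the real pipeline guides the LLM during extraction rather than only validating afterwards.

```python
# Hypothetical sketch: these names are illustrative, not Aletheia's API.
# The allowed types mirror the ontology described above.
ALLOWED_ENTITY_TYPES = {"Aircraft", "Operator"}
ALLOWED_RELATIONSHIPS = {"HAS_OPERATOR"}


def check_extraction(entities, relationships):
    """Return a list of problems found in an extraction result.

    entities: iterable of (name, entity_type) pairs
    relationships: iterable of relationship type names
    """
    problems = []
    for name, entity_type in entities:
        if entity_type not in ALLOWED_ENTITY_TYPES:
            problems.append(f"entity {name!r} has unknown type {entity_type!r}")
    for rel in relationships:
        if rel not in ALLOWED_RELATIONSHIPS:
            problems.append(f"unknown relationship {rel!r}")
    return problems


# Extraction that follows the ontology: nothing to report.
consistent = check_extraction(
    [("Boeing 737-800", "Aircraft"), ("Lufthansa", "Operator")],
    ["HAS_OPERATOR"],
)

# Free-form extraction: every type and relationship is flagged.
unconstrained = check_extraction(
    [("Boeing 737-800", "airplane"), ("Lufthansa", "airline")],
    ["flown by"],
)

print(consistent)
print(len(unconstrained))
```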
Integration Approaches¶
Aletheia supports multiple ways to use ontologies:
1. Schema Modes¶
| Mode | Ontology Usage | Best For |
|---|---|---|
| none | No schema | Quick exploration |
| llm / inference | LLM infers from samples | Unknown domains |
| ontology | Strict ontology adherence | Formal compliance |
| hybrid | LLM + ontology validation (string matching) | Balanced approach |
| graph-hybrid | LLM-first + semantic alignment via ontology graph | Recommended for FTM |
| ontology-first | Ontology primary, LLM enhancement, data-driven pruning | Complete ontologies |
All modes except none apply a Phase 4 consolidation step where an LLM reviews the final schema for redundancies and naming inconsistencies. Ontology-derived types are protected from removal.
See Schema Inference for detailed explanations.
2. Graph-Hybrid Mode (Recommended)¶
This mode stores the ontology in the graph database and uses semantic similarity to align extracted entities:
┌─────────────────┐     ┌──────────────────┐
│  Ontology TTL   │────►│  Ontology Graph  │
│  (eccairs.ttl)  │     │  (FalkorDB)      │
└─────────────────┘     └────────┬─────────┘
                                 │
                                 │ Semantic
                                 │ Alignment
                                 ▼
┌─────────────────┐     ┌──────────────────┐
│  Source Text    │────►│  Knowledge Graph │
│  (incidents)    │     │  (FalkorDB)      │
└─────────────────┘     └──────────────────┘
Advantages:
- Ontology concepts have embeddings for semantic matching
- "plane" aligns to Aircraft even without an exact match
- Supports ontology evolution without re-ingestion
Step-by-Step Integration¶
Step 1: Prepare Ontology File¶
Ontologies should be in Turtle (.ttl) format:
@prefix ex: <http://example.org/aviation#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Occurrence a owl:Class ;
    rdfs:label "Occurrence" ;
    rdfs:comment "An aviation safety incident or accident" .

ex:Aircraft a owl:Class ;
    rdfs:label "Aircraft" ;
    rdfs:subClassOf ex:Occurrence ;
    rdfs:comment "An aircraft involved in an occurrence" .

ex:hasAircraft a owl:ObjectProperty ;
    rdfs:domain ex:Occurrence ;
    rdfs:range ex:Aircraft ;
    rdfs:label "has aircraft" .
Place the file in your use case directory: use_cases/<name>/ontology/<name>.ttl
Step 2: Load Ontology to Graph¶
This creates a graph containing:
- Nodes: One per ontology class (with embeddings)
- Edges: SUBCLASS_OF relationships preserving hierarchy
- Metadata: URI, labels, comments
Verify with:
Step 3: Build Knowledge Graph with Ontology¶
aletheia build-knowledge-graph \
--use-case aviation_safety \
--knowledge-graph aviation_safety \
--schema-mode graph-hybrid \
--ontology-graph aviation_ontology
The --ontology-graph parameter tells Aletheia where to find the ontology for alignment.
Step 4: Verify Alignment¶
Check that extracted entities align with ontology types:
redis-cli GRAPH.QUERY aviation_safety \
"MATCH (n:Entity) RETURN labels(n), count(*) ORDER BY count(*) DESC"
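To turn the label counts from that query into a single number, one option is to compute the share of entities that carry at least one ontology type as a label. This is a rough sketch under two assumptions: that ontology types appear as node labels next to the base Entity label, and that the rows have already been read into `(labels, count)` pairs. The function name and the sample rows are made up for illustration and are not real output.

```python
def alignment_coverage(rows, ontology_types, base_label="Entity"):
    """Return the fraction of entities labelled with an ontology type.

    rows: iterable of (labels, count) pairs, one per distinct label set
    ontology_types: set of type names defined by the ontology
    """
    aligned = 0
    total = 0
    for labels, count in rows:
        total += count
        # Ignore the shared base label; only specific types matter here.
        specific = set(labels) - {base_label}
        if specific & ontology_types:
            aligned += count
    return aligned / total if total else 0.0


# Made-up rows for illustration only (not real query output).
sample_rows = [
    (["Entity", "Aircraft"], 6),
    (["Entity", "Operator"], 3),
    (["Entity", "Vehicle"], 1),
]
coverage = alignment_coverage(sample_rows, {"Aircraft", "Operator"})
print(f"{coverage:.0%} of entities carry an ontology type")
```

A low value suggests revisiting the items under Troubleshooting, such as the embedding model or the --ontology-graph parameter.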
Ontology Loading Details¶
What Gets Loaded¶
From a TTL file, Aletheia extracts:
| OWL Element | Graph Representation |
|---|---|
| owl:Class | Entity node with ontology_type |
| rdfs:subClassOf | SUBCLASS_OF edge |
| rdfs:label | Node name property |
| rdfs:comment | Node summary property |
| Class URI | Node uri property |
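As a rough picture of this mapping, the sketch below turns one parsed class into a node dictionary and a list of SUBCLASS_OF edges. The property names (`uri`, `name`, `summary`, `ontology_type`) come from the table above; the function `class_to_graph` and the tuple-based edge format are assumptions made for illustration, not the loader's actual data structures.

```python
def class_to_graph(uri, label, comment, parent_uris, ontology_type="class"):
    """Map one ontology class to a node dict and its SUBCLASS_OF edges."""
    node = {
        "uri": uri,                      # class URI
        "name": label,                   # rdfs:label
        "summary": comment,              # rdfs:comment
        "ontology_type": ontology_type,  # see Classification
    }
    # One (child, relation, parent) triple per rdfs:subClassOf statement.
    edges = [(uri, "SUBCLASS_OF", parent) for parent in parent_uris]
    return node, edges


# Values taken from the Step 1 Turtle example.
node, edges = class_to_graph(
    uri="http://example.org/aviation#Aircraft",
    label="Aircraft",
    comment="An aircraft involved in an occurrence",
    parent_uris=["http://example.org/aviation#Occurrence"],
)
print(node["name"], edges)
```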
Classification¶
The GenericOntologyLoader classifies each class using transitive ancestry:
| Type | Criteria | Example |
|---|---|---|
| class | Concrete entity type | Runway, Engine |
| abstract_class | Has subclasses, or matches known abstract patterns | Thing, LegalEntity |
| relationship_class | Any ancestor is Interval (checked transitively) | Ownership, Interest |
For multi-level hierarchies (e.g., Ownership → Interest → Interval), the loader checks the full ancestry chain. A ModelingProfile can provide explicit classification hints to override heuristics.
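The transitive check can be sketched as a walk up the parent links. This is a simplified illustration, not the GenericOntologyLoader implementation: the function names, the `parents` mapping, the Company class, and the order in which the rules are applied are all assumptions. The order was chosen so that Interest, which has a subclass, still comes out as a relationship class as in the table.

```python
def ancestors(cls, parents):
    """Yield every ancestor of cls, following parent links transitively."""
    seen = set()
    stack = list(parents.get(cls, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles and repeated visits
        seen.add(current)
        yield current
        stack.extend(parents.get(current, ()))


def classify(cls, parents, abstract_names=frozenset({"Thing"})):
    """Classify a class name using the criteria from the table above."""
    if "Interval" in ancestors(cls, parents):
        return "relationship_class"
    has_subclasses = any(cls in direct for direct in parents.values())
    if has_subclasses or cls in abstract_names:
        return "abstract_class"
    return "class"


# Direct parents only; the walk above supplies the transitive closure.
parents = {
    "Ownership": ["Interest"],
    "Interest": ["Interval"],
    "Company": ["LegalEntity"],
    "LegalEntity": ["Thing"],
    "Runway": [],
}

for name in ("Ownership", "Interest", "LegalEntity", "Runway"):
    print(name, classify(name, parents))
```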
Embeddings¶
Each ontology class gets an embedding vector based on:
- Class label (rdfs:label)
- Class description (rdfs:comment)
This enables semantic matching during extraction:
Extracted: "airplane" → Nearest ontology class: Aircraft (0.92 similarity)
Extracted: "carrier" → Nearest ontology class: Operator (0.88 similarity)
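A nearest-class lookup of this kind is commonly done with cosine similarity. The sketch below assumes that metric and uses tiny hand-made vectors in place of real embeddings, so its scores will not match the figures above; `nearest_class` is a hypothetical helper, not an Aletheia function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_class(vector, class_embeddings):
    """Return (class_name, similarity) for the closest ontology class."""
    scored = (
        (name, cosine_similarity(vector, embedding))
        for name, embedding in class_embeddings.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Toy 3-dimensional vectors standing in for real embeddings.
class_embeddings = {
    "Aircraft": [0.9, 0.1, 0.0],
    "Operator": [0.1, 0.9, 0.1],
}
airplane = [0.8, 0.2, 0.1]  # pretend embedding of the extracted term "airplane"

name, score = nearest_class(airplane, class_embeddings)
print(name, round(score, 2))
```

In practice a similarity threshold decides whether the best match is close enough to accept; the value to use depends on the embedding model.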
Supported Ontology Formats¶
Turtle (.ttl)¶
Primary format, recommended:
RDF/XML (.rdf, .owl)¶
Supported via rdflib conversion:
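If you prefer to convert a file to Turtle yourself, rdflib can do it in a few lines. This is a minimal sketch, assuming rdflib is installed; the function name and the file names in the usage comment are placeholders, and this is not necessarily how Aletheia performs the conversion internally.

```python
def convert_rdfxml_to_turtle(source_path, target_path):
    """Parse an RDF/XML file and write it back out as Turtle.

    Returns the number of triples that were converted.
    """
    # rdflib is a third-party package; importing it here keeps this
    # module importable on systems where it is not installed.
    from rdflib import Graph

    graph = Graph()
    graph.parse(source_path, format="xml")  # RDF/XML input
    graph.serialize(destination=target_path, format="turtle")
    return len(graph)


# Example usage with placeholder file names:
# triples = convert_rdfxml_to_turtle("ontology.owl", "ontology.ttl")
```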
FollowTheMoney¶
Special handling for FTM YAML schemas:
Configuration Options¶
Environment Variables¶
# LLM for ontology analysis
ALETHEIA_REASONING_MODEL=o3-mini # Complex analysis
ALETHEIA_FAST_MODEL=gpt-4o-mini # Extraction
# Database
FALKORDB_HOST=localhost
FALKORDB_PORT=6379
Ontology Loader Options¶
The GenericOntologyLoader accepts:
| Parameter | Description | Default |
|---|---|---|
| skip_value_classes | Skip enumeration classes | True |
| include_properties | Load property definitions | False |
| max_hierarchy_depth | Limit subclass traversal | 10 |
Troubleshooting¶
"No ontology classes found"¶
Cause: TTL file doesn't use owl:Class declarations.
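Fix: Ensure each class is declared explicitly as an owl:Class (for example, ex:Aircraft a owl:Class), as in the Turtle example in Step 1.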
"Entities not aligning to ontology"¶
Cause: Ontology graph not specified or embedding mismatch.
Fix:
1. Verify that the ontology graph exists: redis-cli GRAPH.LIST
2. Use the same embedding model for the ontology and the knowledge graph
3. Check that the --ontology-graph parameter is correct
"Too many abstract classes"¶
Cause: Classification heuristics too aggressive.
Fix: Create a ModelingProfile in your use case's ontology directory to explicitly classify problematic classes. The profile provides hints that override the heuristic rules.
Next Steps¶
- Schema Inference Modes - All extraction strategies
- ECCAIRS Ontology - Aviation safety example
- FTM Ontology - Sanctions/investigations example