Aletheia Ontology Integration

Aletheia uses ontologies to guide entity and relationship extraction, ensuring consistency and domain alignment. This page explains how the integration works.

How Ontologies Improve Extraction

Without Ontology

The LLM decides entity types based on its training:

Text: "The Boeing 737-800 operated by Lufthansa..."

LLM extraction (unpredictable):
- Entity: "Boeing 737-800" (type: "airplane" or "aircraft" or "vehicle"?)
- Entity: "Lufthansa" (type: "company" or "airline" or "operator"?)
- Relationship: "operated by" or "flown by" or "belongs to"?

With Ontology

The ontology constrains and guides the LLM:

Ontology defines:
- Aircraft (with properties: type, registration)
- Operator (with properties: name, country)
- HAS_OPERATOR relationship

LLM extraction (consistent):
- Entity: "Boeing 737-800" (type: Aircraft)
- Entity: "Lufthansa" (type: Operator)
- Relationship: HAS_OPERATOR
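
As a rough illustration of the constraint described above, the sketch below checks an extraction result against a fixed set of ontology types. The names ALLOWED_ENTITY_TYPES, ALLOWED_RELATIONSHIPS, and validate_extraction are hypothetical and are not part of Aletheia's API; how Aletheia enforces the ontology internally is not shown here.

```python
# Hypothetical type sets, taken from the example ontology above.
ALLOWED_ENTITY_TYPES = {"Aircraft", "Operator"}
ALLOWED_RELATIONSHIPS = {"HAS_OPERATOR"}


def validate_extraction(entities, relationships):
    """Return a list of problems; an empty list means the result conforms."""
    problems = []
    for name, entity_type in entities:
        if entity_type not in ALLOWED_ENTITY_TYPES:
            problems.append(f"entity {name!r} has unknown type {entity_type!r}")
    for relationship in relationships:
        if relationship not in ALLOWED_RELATIONSHIPS:
            problems.append(f"unknown relationship {relationship!r}")
    return problems


# Extraction that follows the ontology.
consistent = validate_extraction(
    [("Boeing 737-800", "Aircraft"), ("Lufthansa", "Operator")],
    ["HAS_OPERATOR"],
)

# Extraction where the LLM picked its own labels.
unconstrained = validate_extraction(
    [("Boeing 737-800", "airplane"), ("Lufthansa", "Operator")],
    ["flown by"],
)

print(consistent)
print(unconstrained)
```

The second call reports two problems: the free-form entity type and the free-form relationship name.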

Integration Approaches

Aletheia supports multiple ways to use ontologies:

1. Schema Modes

Mode            | Ontology Usage                                         | Best For
----------------|--------------------------------------------------------|--------------------
none            | No schema                                              | Quick exploration
llm / inference | LLM infers from samples                                | Unknown domains
ontology        | Strict ontology adherence                              | Formal compliance
hybrid          | LLM + ontology validation (string matching)            | Balanced approach
graph-hybrid    | LLM-first + semantic alignment via ontology graph      | Recommended for FTM
ontology-first  | Ontology primary, LLM enhancement, data-driven pruning | Complete ontologies

All modes except none apply a Phase 4 consolidation step where an LLM reviews the final schema for redundancies and naming inconsistencies. Ontology-derived types are protected from removal.
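
The protection rule can be pictured with a small sketch. It assumes each schema type carries an origin tag ("ontology" or "llm") and that the reviewing LLM returns a set of type names it proposes to remove; both the data shape and the function name apply_consolidation are hypothetical, not Aletheia's actual implementation.

```python
def apply_consolidation(schema_types, proposed_removals):
    """Drop types flagged as redundant, except ontology-derived ones.

    schema_types maps a type name to its origin: "ontology" or "llm".
    proposed_removals is the set of names the reviewer wants removed.
    """
    kept = {}
    for name, origin in schema_types.items():
        if name in proposed_removals and origin != "ontology":
            continue  # LLM-inferred type flagged as redundant: remove it
        kept[name] = origin  # ontology-derived types always survive
    return kept


schema = {"Aircraft": "ontology", "Airplane": "llm", "Operator": "ontology"}
result = apply_consolidation(schema, {"Airplane", "Operator"})
print(result)
```

Here "Airplane" is removed as a redundant LLM-inferred type, while "Operator" is kept even though it was proposed for removal.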

See Schema Inference for detailed explanations.

The graph-hybrid mode stores the ontology in the graph database and uses semantic similarity to align extracted entities with ontology classes:

┌─────────────────┐     ┌──────────────────┐
│  Ontology TTL   │────►│  Ontology Graph  │
│  (eccairs.ttl)  │     │  (FalkorDB)      │
└─────────────────┘     └────────┬─────────┘
                                 │ Semantic
                                 │ Alignment
┌─────────────────┐     ┌──────────────────┐
│  Source Text    │────►│  Knowledge Graph │
│  (incidents)    │     │  (FalkorDB)      │
└─────────────────┘     └──────────────────┘

Advantages:
- Ontology concepts have embeddings for semantic matching
- "plane" aligns to Aircraft even without exact match
- Supports ontology evolution without re-ingestion

Step-by-Step Integration

Step 1: Prepare Ontology File

Ontologies should be in Turtle (.ttl) format:

@prefix ex: <http://example.org/aviation#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Occurrence a owl:Class ;
    rdfs:label "Occurrence" ;
    rdfs:comment "An aviation safety incident or accident" .

ex:Aircraft a owl:Class ;
    rdfs:label "Aircraft" ;
    rdfs:comment "An aircraft involved in an occurrence" .

ex:hasAircraft a owl:ObjectProperty ;
    rdfs:domain ex:Occurrence ;
    rdfs:range ex:Aircraft ;
    rdfs:label "has aircraft" .

Place the file in your use case directory: use_cases/<name>/ontology/<name>.ttl

Step 2: Load Ontology to Graph

aletheia build-ontology-graph \
  --use-case aviation_safety \
  --knowledge-graph aviation_ontology

This creates a graph containing:
- Nodes: One per ontology class (with embeddings)
- Edges: SUBCLASS_OF relationships preserving hierarchy
- Metadata: URI, labels, comments

Verify with:

redis-cli GRAPH.QUERY aviation_ontology \
  "MATCH (n) RETURN n.name, n.ontology_type LIMIT 10"

Step 3: Build Knowledge Graph with Ontology

aletheia build-knowledge-graph \
  --use-case aviation_safety \
  --knowledge-graph aviation_safety \
  --schema-mode graph-hybrid \
  --ontology-graph aviation_ontology

The --ontology-graph parameter tells Aletheia where to find the ontology for alignment.

Step 4: Verify Alignment

Check that extracted entities align with ontology types:

redis-cli GRAPH.QUERY aviation_safety \
  "MATCH (n:Entity) RETURN labels(n), count(*) ORDER BY count(*) DESC"

Ontology Loading Details

What Gets Loaded

From a TTL file, Aletheia extracts:

OWL Element     | Graph Representation
----------------|-------------------------------
owl:Class       | Entity node with ontology_type
rdfs:subClassOf | SUBCLASS_OF edge
rdfs:label      | Node name property
rdfs:comment    | Node summary property
Class URI       | Node uri property
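
To make the mapping concrete, here is a minimal sketch that turns one class declaration into node properties and one subclass statement into an edge. The helper names class_to_node and subclass_edge, the default ontology_type value, and the http://example.org/Thing URI are assumptions for illustration only; the loader's real data structures may differ.

```python
def class_to_node(uri, label, comment, ontology_type="class"):
    """Map OWL class attributes onto the node properties listed above."""
    return {
        "uri": uri,                      # class URI
        "name": label,                   # rdfs:label
        "summary": comment,              # rdfs:comment
        "ontology_type": ontology_type,  # result of classification
    }


def subclass_edge(child_uri, parent_uri):
    """Represent rdfs:subClassOf as a (source, type, target) edge."""
    return (child_uri, "SUBCLASS_OF", parent_uri)


node = class_to_node(
    "http://example.org/aviation#Aircraft",
    "Aircraft",
    "An aircraft involved in an occurrence",
)
edge = subclass_edge("http://example.org/Person", "http://example.org/Thing")
print(node)
print(edge)
```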

Classification

The GenericOntologyLoader classifies each class using transitive ancestry:

Type               | Criteria                                           | Example
-------------------|----------------------------------------------------|--------------------
class              | Concrete entity type                               | Runway, Engine
abstract_class     | Has subclasses, or matches known abstract patterns | Thing, LegalEntity
relationship_class | Any ancestor is Interval (checked transitively)    | Ownership, Interest

For multi-level hierarchies (e.g., Ownership → Interest → Interval), the loader checks the full ancestry chain. A ModelingProfile can provide explicit classification hints to override heuristics.
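
The transitive check can be sketched as follows. This is an illustration under stated assumptions, not the GenericOntologyLoader's code: the hierarchy is a plain dict of direct parents, the "Company" class is made up for the example, the relationship check is assumed to take precedence over the abstract check (consistent with Interest being listed as a relationship_class above), the "known abstract patterns" rule is omitted, and hints stands in for whatever a ModelingProfile provides.

```python
# Direct parents only; "Company" is a hypothetical class for illustration.
PARENTS = {
    "Ownership": ["Interest"],
    "Interest": ["Interval"],
    "Interval": [],
    "LegalEntity": [],
    "Company": ["LegalEntity"],
    "Runway": [],
}


def ancestors(name, parents):
    """Collect all transitive ancestors of a class (safe against cycles)."""
    seen = set()
    stack = list(parents.get(name, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


def classify(name, parents, hints=None):
    """Classify a class; explicit hints override the heuristics."""
    if hints and name in hints:
        return hints[name]
    if "Interval" in ancestors(name, parents):
        return "relationship_class"
    if any(name in direct for direct in parents.values()):
        return "abstract_class"  # some other class lists it as a parent
    return "class"


for cls in ("Ownership", "Interest", "LegalEntity", "Runway"):
    print(cls, classify(cls, PARENTS))
```

Ownership is two levels below Interval and is still classified as a relationship_class, which is the point of walking the full ancestry chain. Passing hints={"LegalEntity": "class"} would override the abstract_class result for that one class.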

Embeddings

Each ontology class gets an embedding vector based on:
- Class label (rdfs:label)
- Class description (rdfs:comment)

This enables semantic matching during extraction:

Extracted: "airplane"  →  Nearest ontology class: Aircraft (0.92 similarity)
Extracted: "carrier"   →  Nearest ontology class: Operator (0.88 similarity)
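
A nearest-class lookup of this kind can be sketched with cosine similarity. The three-dimensional vectors below are toy values chosen by hand, not real embeddings, and the use of cosine similarity is an assumption; the page does not say which similarity measure Aletheia applies. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy stand-ins for the embeddings of two ontology classes.
ONTOLOGY_VECTORS = {
    "Aircraft": [0.9, 0.1, 0.0],
    "Operator": [0.1, 0.9, 0.1],
}


def nearest_class(vector, classes):
    """Return (class_name, similarity) for the most similar ontology class."""
    scored = ((name, cosine(vector, v)) for name, v in classes.items())
    return max(scored, key=lambda pair: pair[1])


# Toy vector standing in for the embedding of the extracted term "airplane".
name, score = nearest_class([0.8, 0.2, 0.1], ONTOLOGY_VECTORS)
print(name, round(score, 2))
```

In practice a similarity threshold would likely be needed so that terms with no good match are not forced onto an unrelated class.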

Supported Ontology Formats

Turtle (.ttl)

Primary format, recommended:

@prefix ex: <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
ex:Person a owl:Class .

RDF/XML (.rdf, .owl)

Supported via rdflib conversion:

<owl:Class rdf:about="http://example.org/Person"/>

FollowTheMoney

Special handling for FTM YAML schemas:

Person:
  extends: Thing
  properties:
    - name
    - birthDate

Configuration Options

Environment Variables

# LLM for ontology analysis
ALETHEIA_REASONING_MODEL=o3-mini  # Complex analysis
ALETHEIA_FAST_MODEL=gpt-4o-mini   # Extraction

# Database
FALKORDB_HOST=localhost
FALKORDB_PORT=6379
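
For scripts that sit alongside Aletheia, the same variables can be read with the standard library. This is a minimal sketch: the load_settings helper is hypothetical, and the fallback values for host and port simply mirror the example above rather than documented defaults.

```python
import os


def load_settings(env=None):
    """Read the variables shown above from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return {
        "reasoning_model": env.get("ALETHEIA_REASONING_MODEL"),
        "fast_model": env.get("ALETHEIA_FAST_MODEL"),
        # Fallbacks are assumptions that mirror the example values.
        "falkordb_host": env.get("FALKORDB_HOST", "localhost"),
        "falkordb_port": int(env.get("FALKORDB_PORT", "6379")),
    }


settings = load_settings({"ALETHEIA_FAST_MODEL": "gpt-4o-mini", "FALKORDB_PORT": "6380"})
print(settings)
```

Passing an explicit mapping, as in the last lines, keeps the function easy to test without touching the real environment.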

Ontology Loader Options

The GenericOntologyLoader accepts:

Parameter           | Description               | Default
--------------------|---------------------------|--------
skip_value_classes  | Skip enumeration classes  | True
include_properties  | Load property definitions | False
max_hierarchy_depth | Limit subclass traversal  | 10

Troubleshooting

"No ontology classes found"

Cause: TTL file doesn't use owl:Class declarations.

Fix: Ensure classes are declared:

ex:MyClass a owl:Class .  # Required
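
A quick way to spot this problem before loading is to scan the file for class declarations. The sketch below is a rough, standard-library-only heuristic and not a Turtle parser: it only recognizes the prefixed forms "a owl:Class" and "rdf:type owl:Class" directly after the subject, and the function name find_class_declarations is hypothetical. A full check would parse the file with a proper RDF library.

```python
import re

# Matches "<subject> a owl:Class" or "<subject> rdf:type owl:Class".
CLASS_DECLARATION = re.compile(r"(\S+)\s+(?:a|rdf:type)\s+owl:Class\b")


def find_class_declarations(ttl_text):
    """Return the subjects that are declared as owl:Class in the text."""
    return CLASS_DECLARATION.findall(ttl_text)


good = """
ex:Occurrence a owl:Class ;
    rdfs:label "Occurrence" .
ex:Aircraft a owl:Class ;
    rdfs:label "Aircraft" .
"""

bad = """
ex:Occurrence rdfs:label "Occurrence" .
ex:hasAircraft a owl:ObjectProperty .
"""

print(find_class_declarations(good))
print(find_class_declarations(bad))
```

An empty result for a file, as with the second sample, suggests the loader would also find no classes.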

"Entities not aligning to ontology"

Cause: The ontology graph was not specified, or the ontology graph and knowledge graph were built with different embedding models.

Fix:
1. Verify ontology graph exists: redis-cli GRAPH.LIST
2. Use same embedding model for ontology and knowledge graph
3. Check --ontology-graph parameter is correct

"Too many abstract classes"

Cause: Classification heuristics too aggressive.

Fix: Create a ModelingProfile in your use case's ontology directory to explicitly classify problematic classes. The profile provides hints that override the heuristic rules.

Next Steps