Ontology-First Mode

The ontology-first mode treats the ontology as the authoritative source while allowing an LLM to discover additional patterns. All ontology concepts are preserved in the schema — nothing is lost because the LLM failed to discover it from samples.

Overview

| Aspect | Value |
| --- | --- |
| Ontology Required | Yes (loaded into graph) |
| LLM Calls for Schema | Optional (enhancement only) |
| Type Consistency | Excellent |
| Setup Time | High (ontology graph required) |
| Best For | Complete ontologies where you can't afford to lose concepts |

Why Ontology-First?

The Graph-Hybrid Problem

In graph-hybrid mode, the LLM discovers types from sample data, then aligns them with the ontology. If the LLM doesn't discover a type, it won't appear in the schema:

Ontology defines: Person, Organization, Sanction, HAS_ALIAS, OWNS, MEMBER_OF

LLM discovers (from samples):
  - Person ✓
  - Organization ✓
  - Sanction ✓
  - HAS_ALIAS ✓
  - OWNS ✗ (not in samples)
  - MEMBER_OF ✗ (not in samples)

Graph-hybrid result:
  Missing: OWNS, MEMBER_OF  ← Lost because LLM didn't find them

The Ontology-First Solution

Ontology-first mode loads all ontology concepts first, then optionally enhances the schema with LLM discoveries:

Phase 1: Load ALL from ontology
  - Person, Organization, Sanction, HAS_ALIAS, OWNS, MEMBER_OF

Phase 2: LLM enhancement (optional)
  - Discovers: CustomType (not in ontology)

Phase 3: Merge
  - Person, Organization, Sanction, HAS_ALIAS, OWNS, MEMBER_OF, CustomType
  ← All ontology concepts preserved + LLM additions

Phase 4: Consolidation
  - LLM reviews final schema for redundancies
  ← Ontology types protected from removal

Pipeline

graph TD
    A[Phase 1: Load Ontology Schema] --> B[Phase 2: LLM Enhancement]
    B --> C[Phase 3: Merge + Data-Driven Pruning]
    C --> D[Phase 4: Consolidation]

    A -- "Reified classes<br/>Non-reified object properties<br/>Datatype properties" --> A
    B -- "Patterns not in ontology<br/>Directive hints prevent duplicates" --> B
    C -- "Remove types absent from data<br/>Protect extensions + non-reified" --> C
    D -- "LLM coherence review<br/>Protect ontology types" --> D

Prerequisites

# Load ontology into graph (once)
aletheia build-ontology-graph \
  --use-case my_case \
  --knowledge-graph my_ontology

Usage

aletheia build-knowledge-graph \
  --use-case my_case \
  --knowledge-graph my_graph \
  --schema-mode ontology-first \
  --ontology-graph my_ontology

Phase 1: Load Ontology Schema

Phase 1 extracts the complete type system from the ontology. Unlike graph-hybrid (which only gets what the LLM finds), ontology-first loads everything.

Entity Types

The ontology loader classifies each OWL/RDFS class using transitive ancestry:

| Classification | Rule | Result |
| --- | --- | --- |
| Entity class | Concrete class (no subclasses, not a known abstract pattern) | Becomes an entity type |
| Relationship class | Any ancestor is Interval (checked transitively) | Becomes an edge type |
| Abstract class | Has subclasses, or matches known patterns (Thing, LegalEntity) | Excluded from schema |

Transitive ancestry matters for multi-level hierarchies. For example, given the chain Ownership → Interest → Interval, the loader checks the full chain, not just direct parents, so Ownership is still classified as a relationship class.
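
A minimal sketch of the classification rules, assuming a simple parent map. The names `PARENTS`, `ancestors`, and `classify` are hypothetical and not the loader's actual API, and the rule order (relationship check first) is also an assumption, since the table does not state precedence:

```python
from __future__ import annotations

# Hypothetical parent map: class name -> direct superclasses.
PARENTS: dict[str, list[str]] = {
    "Thing": [],
    "LegalEntity": ["Thing"],
    "Person": ["LegalEntity"],
    "Interval": [],
    "Interest": ["Interval"],
    "Ownership": ["Interest"],
}
ABSTRACT_PATTERNS = {"Thing", "LegalEntity"}  # known abstract names from the table


def ancestors(cls: str, parents: dict[str, list[str]]) -> set[str]:
    """Collect every ancestor of cls by walking the full parent chain."""
    seen: set[str] = set()
    stack = list(parents.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def classify(cls: str, parents: dict[str, list[str]]) -> str:
    """Return 'relationship', 'abstract', or 'entity' for one class."""
    # Assumption: the Interval check takes precedence over the abstract check.
    if "Interval" in ancestors(cls, parents):
        return "relationship"
    has_subclasses = any(cls in direct for direct in parents.values())
    if has_subclasses or cls in ABSTRACT_PATTERNS:
        return "abstract"
    return "entity"


ownership_kind = classify("Ownership", PARENTS)  # Interval is two levels up
person_kind = classify("Person", PARENTS)
```

Checking only direct parents would miss Ownership here, because its direct parent is Interest rather than Interval.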

Relationship Types

Phase 1 extracts relationships from two sources:

Reified relationship classes: OWL classes that model relationships as entities (e.g., Ownership connects an owner to an asset). These are common in FollowTheMoney (FTM) ontologies.

Non-reified object properties: direct object properties between entity classes (e.g., locatedIn linking Airport to Country). These are extracted via get_non_reified_relationships(), and each edge type name is derived as follows:

| Derivation Rule | Example | Result |
| --- | --- | --- |
| ModelingProfile override | Explicit mapping in profile | Custom name |
| Multi-word label | "located in" | LOCATED_IN |
| camelCase property | addressEntity | HAS_ADDRESS |
| Fallback | foo | HAS_FOO |
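
The derivation rules can be sketched roughly as below. This is an illustration, not the loader's implementation: `derive_edge_name` and its parameters are hypothetical, and the camelCase rule (keep the leading lowercase word and prefix it with `HAS_`) is a guess that merely reproduces the `addressEntity` example.

```python
from __future__ import annotations

import re


def derive_edge_name(
    prop: str,
    label: str | None = None,
    overrides: dict[str, str] | None = None,
) -> str:
    """Derive an edge type name for a non-reified object property."""
    # 1. An explicit profile mapping wins over everything else.
    if overrides and prop in overrides:
        return overrides[prop]
    # 2. A multi-word label becomes UPPER_SNAKE_CASE.
    if label and len(label.split()) > 1:
        return "_".join(label.upper().split())
    # 3./4. Assumed rule: keep the leading lowercase word and prefix HAS_.
    #       For a single-word name such as "foo" this is the fallback case.
    head = re.match(r"[a-z0-9]+", prop)
    root = head.group(0) if head else prop
    return f"HAS_{root.upper()}"


located = derive_edge_name("locatedIn", label="located in")
address = derive_edge_name("addressEntity")
fallback = derive_edge_name("foo")
```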

When both directions of a relationship are declared (via owl:inverseOf), the loader keeps the direction whose domain class is more general and discards its inverse, so the pair does not produce two duplicate edge types.

Enriched Entity Type Docstrings

Each entity type's docstring is enriched with property metadata from the ontology:

Aircraft — A fixed-wing or rotary-wing aircraft involved in an occurrence.
Key attributes: registration (Aircraft registration mark), icaoCode (ICAO aircraft type designator)

These enriched docstrings flow into both Graphiti's extraction prompts and its deduplication prompts, giving the LLM a concrete signal about which attributes to look for.
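
As an illustration of the enrichment step, the sketch below appends a "Key attributes" line to an existing class description. The helper name `enrich_docstring` and the input shape (a mapping of property name to description) are assumptions made for this example:

```python
def enrich_docstring(description: str, attributes: dict[str, str]) -> str:
    """Append a 'Key attributes' line built from datatype property metadata."""
    if not attributes:
        return description
    parts = [f"{name} ({text})" for name, text in attributes.items()]
    return f"{description}\nKey attributes: {', '.join(parts)}"


aircraft_doc = enrich_docstring(
    "A fixed-wing or rotary-wing aircraft involved in an occurrence.",
    {
        "registration": "Aircraft registration mark",
        "icaoCode": "ICAO aircraft type designator",
    },
)
```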

What Gets Loaded

| From Ontology | Included |
| --- | --- |
| Entity classes (owl:Class) | Yes |
| Reified relationship classes | Converted to edge types |
| Non-reified object properties | Converted to edge types |
| Datatype properties | Yes (as entity properties) |
| Class hierarchy | Yes (for classification) |
| Abstract classes | Filtered out |

Phase 2: LLM Enhancement

The LLM analyzes sample data to find patterns not in the ontology. This phase is optional but catches edge cases the ontology doesn't cover.

What the LLM Sees

The prompt provides:

  • Sample data from the parser
  • A list of committed relationships — ontology-derived types that the LLM must not duplicate

Exclusion Rules

The LLM's discoveries are filtered to prevent overlap with the ontology:

| Excluded | Reason |
| --- | --- |
| Ontology entity types | Already loaded in Phase 1 |
| Abstract classes | Not concrete types |
| Relationship classes | Already converted to edge types |
| Verb-form duplicates | e.g., OWNED_BY when Ownership already exists |

Directive Hints

Phase 2 shows the LLM all committed relationship types from Phase 1 with an explicit directive: do not create duplicates. This solves a key problem — without directive hints, the LLM often generates HAS_OPERATOR alongside the ontology's existing OPERATED_BY, producing redundant types.
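
A directive hint could be assembled along these lines. The wording of the hint and the function name `build_directive_hint` are hypothetical; the actual prompt text is not shown in this document:

```python
def build_directive_hint(committed: list[str]) -> str:
    """Render committed ontology relationship types as a prompt section."""
    lines = [
        "The following relationship types are already committed from the ontology.",
        "Do not create duplicates, inverses, or renamed variants of them:",
    ]
    lines += [f"- {name}" for name in sorted(committed)]
    return "\n".join(lines)


hint = build_directive_hint(["OPERATED_BY", "OCCURRED_AT", "INVOLVED_AIRCRAFT"])
```

Listing the committed names explicitly gives the LLM something concrete to compare against before it proposes a type such as HAS_OPERATOR.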

Phase 3: Merge and Data-Driven Pruning

Phase 3 combines the ontology base with LLM discoveries, then prunes the schema against actual data.

Merge Rules

  1. Ontology concepts are authoritative — never replaced by LLM discoveries
  2. LLM can only add — new types that don't exist in the ontology
  3. Duplicates resolve to the ontology version — if the LLM finds "Person", the ontology's "Person" wins
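
The three merge rules amount to an "ontology wins, LLM may only add" dictionary merge. The sketch below assumes schemas are plain dictionaries keyed by type name and compares names case-insensitively; both are assumptions for illustration:

```python
def merge_schema(ontology: dict[str, dict], discovered: dict[str, dict]) -> dict[str, dict]:
    """Merge LLM discoveries into the ontology base without overriding it."""
    merged = dict(ontology)  # rule 1: ontology definitions are kept as-is
    known = {name.lower() for name in ontology}
    for name, definition in discovered.items():
        if name.lower() in known:
            continue  # rule 3: on a duplicate, the ontology version wins
        merged[name] = definition  # rule 2: the LLM can only add new types
        known.add(name.lower())
    return merged


ontology_types = {"Person": {"source": "ontology"}, "OWNS": {"source": "ontology"}}
llm_types = {"Person": {"source": "llm"}, "CustomType": {"source": "llm"}}
merged = merge_schema(ontology_types, llm_types)
```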

Reconciliation Against Non-Reified Types

Before merging, LLM-discovered relationship types are checked against non-reified ontology types. If an LLM discovery matches an existing non-reified type (by target entity or name root), it is discarded.

Data-Driven Pruning

After merging, the schema is pruned against the parser's schema_distribution — a map of entity types actually present in the data:

Entity pruning: Types not matching any key in schema_distribution are removed.

Relationship pruning: Types whose source_class isn't in the data are removed. Three categories are protected from pruning:

| Protected Category | Why |
| --- | --- |
| LLM-discovered types (no source_class) | May represent patterns not tied to a single ontology class |
| Extension types (from_extension=True) | Defined in ontology extension files |
| Non-reified types (from_non_reified=True) | Derived from object properties, not class presence |
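
The pruning rules can be sketched as two filters. The `RelationshipType` dataclass below is a hypothetical stand-in that reuses the field names from the table (`source_class`, `from_extension`, `from_non_reified`), and "matching" is assumed to mean exact key membership in `schema_distribution`:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RelationshipType:
    name: str
    source_class: str | None = None  # None marks an LLM-discovered type
    from_extension: bool = False
    from_non_reified: bool = False


def prune_entities(entity_types: list[str], schema_distribution: dict[str, int]) -> list[str]:
    """Keep only entity types that appear in the data."""
    return [name for name in entity_types if name in schema_distribution]


def prune_relationships(
    rels: list[RelationshipType], schema_distribution: dict[str, int]
) -> list[RelationshipType]:
    """Drop relationships whose source class is absent, unless protected."""
    kept = []
    for rel in rels:
        protected = rel.source_class is None or rel.from_extension or rel.from_non_reified
        if protected or rel.source_class in schema_distribution:
            kept.append(rel)
    return kept


distribution = {"Person": 120, "Ownership": 15}
relationships = [
    RelationshipType("OWNS", source_class="Ownership"),
    RelationshipType("MEMBER_OF", source_class="Membership"),  # absent from data
    RelationshipType("HAS_WEATHER"),  # LLM-discovered
    RelationshipType("LOCATED_IN", source_class="Airport", from_non_reified=True),
]
kept_names = [rel.name for rel in prune_relationships(relationships, distribution)]
kept_entities = prune_entities(["Person", "Airport"], distribution)
```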

Phase 4: Consolidation

Consolidation is the common final step for all schema modes. An LLM reviews the complete schema for coherence:

  • Merges semantically similar types
  • Removes over-specialized types
  • Normalizes naming inconsistencies

Ontology-derived and extension types are protected from removal — the LLM can merge LLM-discovered types but cannot delete anything that came from the ontology.

Deduplication Against Non-Reified Types

During consolidation, relationship types are checked for overlap with non-reified types using two heuristics:

  1. Target entity match — case-insensitive comparison of target entity names
  2. Name root match — strip common affixes (HAS_, IS_, _OF, _BY) and compare roots
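
A minimal sketch of the two heuristics, assuming each relationship type is a dictionary with `name` and `target` keys and that a match on either heuristic counts as overlap. The helper names are hypothetical:

```python
PREFIXES = ("HAS_", "IS_")
SUFFIXES = ("_OF", "_BY")


def name_root(name: str) -> str:
    """Strip one common prefix and one common suffix from a type name."""
    root = name.upper()
    for prefix in PREFIXES:
        if root.startswith(prefix):
            root = root[len(prefix):]
            break
    for suffix in SUFFIXES:
        if root.endswith(suffix):
            root = root[: -len(suffix)]
            break
    return root


def overlaps(candidate: dict[str, str], existing: dict[str, str]) -> bool:
    """Return True if either heuristic reports an overlap."""
    same_target = candidate["target"].lower() == existing["target"].lower()
    same_root = name_root(candidate["name"]) == name_root(existing["name"])
    return same_target or same_root


operated_by = {"name": "OPERATED_BY", "target": "Operator"}
has_operator = {"name": "HAS_OPERATOR", "target": "operator"}
is_duplicate = overlaps(has_operator, operated_by)
```

In this example the name roots differ (OPERATOR versus OPERATED), so the overlap is caught by the target entity match alone, which is one reason to apply both heuristics.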

Edge Type Map

The generated schema includes an edge type map that controls which relationship types are valid between entity pairs. All types are placed in a ("Entity", "Entity") catch-all entry, ensuring valid types are never rejected because the source/target labels don't match exact pairs.

Edge docstrings use an "e.g.," prefix for source/target examples so the LLM treats them as guidance rather than strict constraints.
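
The catch-all entry described above can be pictured as a single-key mapping from an entity-label pair to the list of allowed edge type names. The builder function below is a hypothetical sketch of that shape, not the project's generated code:

```python
from __future__ import annotations


def build_edge_type_map(edge_type_names: list[str]) -> dict[tuple[str, str], list[str]]:
    """Place every edge type under one ("Entity", "Entity") catch-all key."""
    return {("Entity", "Entity"): sorted(edge_type_names)}


edge_type_map = build_edge_type_map(["OPERATED_BY", "OCCURRED_AT", "INVOLVED_AIRCRAFT"])
```

Because every type sits under the generic pair, an edge is not rejected just because its endpoints carry more specific labels than the map lists.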

Pros and Cons

Advantages

  • All ontology concepts preserved — nothing from the ontology is lost
  • Expert knowledge retained — domain model takes precedence
  • LLM augmentation — can still discover new patterns
  • Data-driven focus — pruning removes irrelevant types
  • Predictable — you know exactly what the ontology provides

Disadvantages

  • Requires complete ontology — works best with well-maintained ontologies
  • Larger initial schemas — before pruning, all ontology classes are included
  • Less LLM flexibility — ontology constrains what the LLM can discover

When to Use

  • Can't afford to lose concepts: ontology types must be in the schema
  • Complete ontology exists: a comprehensive domain model is available
  • The "HAS_ALIAS problem": ontology relationship types (such as OWNS or MEMBER_OF in the example above) that the LLM fails to discover from samples
  • Regulatory compliance: the schema must match a formal specification

When NOT to Use

  • Ontology is incomplete — use graph-hybrid for better alignment
  • No ontology — use llm mode
  • LLM discovery is primary goal — use llm or graph-hybrid

Comparison: Ontology Modes

| Aspect | ontology | ontology-first | graph-hybrid |
| --- | --- | --- | --- |
| Primary source | Ontology only | Ontology | LLM |
| LLM involvement | None | Enhancement | Discovery + alignment |
| All ontology concepts | Yes | Yes | Only if LLM discovers them |
| Discovers new types | No | Yes | Yes |
| Data-driven pruning | No | Yes | Yes (property enrichment) |
| Non-reified relationships | No | Yes | No |
| Setup complexity | Medium | High | High |

Example: Aviation Domain

# Load aviation ontology
aletheia build-ontology-graph \
  --use-case aviation_safety \
  --knowledge-graph aviation_ontology

# Build with ontology-first
aletheia build-knowledge-graph \
  --use-case aviation_safety \
  --knowledge-graph aviation_graph \
  --schema-mode ontology-first \
  --ontology-graph aviation_ontology

Ontology provides: Occurrence, Aircraft, Airport, Operator, OCCURRED_AT, INVOLVED_AIRCRAFT, OPERATED_BY

LLM might discover: WeatherCondition (not in ontology), HAS_WEATHER (new relationship)

Final schema: Complete aviation ontology + weather concepts, pruned to types present in data.