Ontology-First Mode

The ontology-first mode treats the ontology as the authoritative source while allowing an LLM to discover additional patterns. Every ontology concept is loaded into the schema up front, so nothing is lost merely because the LLM failed to discover it from samples. Types are removed later only by data-driven pruning (Phase 3), which drops types absent from the data.

Overview

Aspect	Value
Ontology Required	Yes (loaded into graph)
LLM Calls for Schema	Optional (enhancement only)
Type Consistency	Excellent
Setup Time	High (ontology graph required)
Best For	Complete ontologies where you can't afford to lose concepts

Why Ontology-First?

The Graph-Hybrid Problem

In graph-hybrid mode, the LLM discovers types from sample data, then aligns them with the ontology. If the LLM doesn't discover a type, it won't appear in the schema:

Ontology defines: Person, Organization, Sanction, HAS_ALIAS, OWNS, MEMBER_OF

LLM discovers (from samples):
  - Person ✓
  - Organization ✓
  - Sanction ✓
  - HAS_ALIAS ✓
  - OWNS ✗ (not in samples)
  - MEMBER_OF ✗ (not in samples)

Graph-hybrid result:
  Missing: OWNS, MEMBER_OF  ← Lost because LLM didn't find them

The Ontology-First Solution

Ontology-first mode loads all ontology concepts first, then optionally enhances the schema with the LLM:

Phase 1: Load ALL from ontology
  - Person, Organization, Sanction, HAS_ALIAS, OWNS, MEMBER_OF

Phase 2: LLM enhancement (optional)
  - Discovers: CustomType (not in ontology)

Phase 3: Merge
  - Person, Organization, Sanction, HAS_ALIAS, OWNS, MEMBER_OF, CustomType
  ← All ontology concepts preserved + LLM additions

Phase 4: Consolidation
  - LLM reviews final schema for redundancies
  ← Ontology types protected from removal
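
The difference between the two modes can be illustrated with plain set operations. This is a minimal sketch, not Aletheia's implementation: the sets below stand in for real schema objects, and the variable names are made up for the example.

```python
# Illustrative only: plain sets stand in for the real schema objects.
ontology_types = {"Person", "Organization", "Sanction", "HAS_ALIAS", "OWNS", "MEMBER_OF"}
llm_discovered = {"Person", "Organization", "Sanction", "HAS_ALIAS", "CustomType"}

# Graph-hybrid: an ontology type survives only if the LLM also found it in the samples.
lost_in_graph_hybrid = ontology_types - llm_discovered

# Ontology-first: start from the full ontology, then add whatever the LLM found on top.
ontology_first_schema = ontology_types | (llm_discovered - ontology_types)

print(sorted(lost_in_graph_hybrid))
print(sorted(ontology_first_schema))
```

The ontology set is always a subset of the ontology-first result, whatever the LLM returns.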

Pipeline

graph TD
    A[Phase 1: Load Ontology Schema] --> B[Phase 2: LLM Enhancement]
    B --> C[Phase 3: Merge + Data-Driven Pruning]
    C --> D[Phase 4: Consolidation]

    A -- "Reified classes<br/>Non-reified object properties<br/>Datatype properties" --> A
    B -- "Patterns not in ontology<br/>Directive hints prevent duplicates" --> B
    C -- "Remove types absent from data<br/>Protect extensions + non-reified" --> C
    D -- "LLM coherence review<br/>Protect ontology types" --> D

Prerequisites

# Load ontology into graph (once)
aletheia build-ontology-graph \
  --use-case my_case \
  --knowledge-graph my_ontology

Usage

aletheia build-knowledge-graph \
  --use-case my_case \
  --knowledge-graph my_graph \
  --schema-mode ontology-first \
  --ontology-graph my_ontology

Phase 1: Load Ontology Schema

Phase 1 extracts the complete type system from the ontology. Unlike graph-hybrid (which only gets what the LLM finds), ontology-first loads everything.

Entity Types

The ontology loader classifies each OWL/RDFS class using transitive ancestry:

Classification	Rule	Result
Entity class	Concrete class (no subclasses, not a known abstract pattern)	Becomes an entity type
Relationship class	Any ancestor is `Interval` (checked transitively)	Becomes an edge type
Abstract class	Has subclasses, or matches known patterns (`Thing`, `LegalEntity`)	Excluded from schema

Transitive ancestry matters for multi-level hierarchies. For example, given the chain Ownership → Interest → Interval, the loader classifies Ownership as a relationship class because it checks the full chain, not just direct parents.
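
The classification rules above could be sketched roughly as follows. This is not the loader's actual code: the `PARENTS` map, the function names, and the precedence of the abstract check over the relationship check are all assumptions made for illustration.

```python
# Hypothetical subclass map: class -> direct parents (e.g. read from rdfs:subClassOf).
PARENTS = {
    "Ownership": ["Interest"],
    "Interest": ["Interval"],
    "Interval": ["Thing"],
    "Person": ["LegalEntity"],
    "Organization": ["LegalEntity"],
    "LegalEntity": ["Thing"],
    "Thing": [],
}
ABSTRACT_PATTERNS = {"Thing", "LegalEntity"}
RELATIONSHIP_ROOT = "Interval"


def ancestors(cls, parents=PARENTS):
    """Return every ancestor of cls, following parent links transitively."""
    seen = set()
    stack = list(parents.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def classify(cls, parents=PARENTS):
    """Classify a class as 'abstract', 'relationship', or 'entity'."""
    has_subclasses = any(cls in direct for direct in parents.values())
    # Assumption: the abstract check takes precedence over the other rules.
    if has_subclasses or cls in ABSTRACT_PATTERNS:
        return "abstract"
    # Relationship classes have Interval somewhere in their ancestor chain.
    if RELATIONSHIP_ROOT in ancestors(cls, parents):
        return "relationship"
    return "entity"


for name in ("Ownership", "Interest", "Person", "LegalEntity"):
    print(name, "->", classify(name))
```

Checking only direct parents would classify Ownership as an entity here, because its direct parent is Interest rather than Interval.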

Relationship Types

Phase 1 extracts relationships from two sources:

Reified relationship classes — OWL classes that model relationships as entities (e.g., Ownership connects an owner to an asset). These are common in FTM ontologies.

Non-reified object properties — Direct object properties between entity classes (e.g., locatedIn linking Airport to Country). These are extracted via get_non_reified_relationships():

Derivation Rule	Example	Result
ModelingProfile override	Explicit mapping in profile	Custom name
Multi-word label	"located in"	`LOCATED_IN`
camelCase property	`addressEntity`	`HAS_ADDRESS`
Fallback	`foo`	`HAS_FOO`
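
The derivation table can be read as an ordered rule chain. The sketch below is an approximation, not the body of get_non_reified_relationships(): `derive_edge_name` and its parameters are hypothetical, and dropping a trailing "Entity" word in the camelCase rule is a guess inferred from the `addressEntity` example.

```python
import re


def derive_edge_name(prop, label=None, overrides=None):
    """Derive an edge type name from an object property (illustrative rules only)."""
    overrides = overrides or {}
    # 1. An explicit profile override always wins.
    if prop in overrides:
        return overrides[prop]
    # 2. A multi-word label is upper-cased and joined with underscores.
    if label and len(label.split()) > 1:
        return "_".join(label.upper().split())
    # 3. camelCase property: split into words and prefix with HAS_.
    words = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", prop).split()
    if len(words) > 1:
        # Assumption: a trailing "Entity" word carries no meaning and is dropped.
        if words[-1].lower() == "entity":
            words = words[:-1]
        return "HAS_" + "_".join(word.upper() for word in words)
    # 4. Fallback: prefix the bare property name.
    return "HAS_" + prop.upper()


print(derive_edge_name("locatedIn", label="located in"))
print(derive_edge_name("addressEntity"))
print(derive_edge_name("foo"))
```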

When both directions of a relationship are declared (via owl:inverseOf), the loader keeps the property with the more general domain and discards its inverse, which avoids duplicate edge types.

Enriched Entity Type Docstrings

Each entity type's docstring is enriched with property metadata from the ontology:

Aircraft — A fixed-wing or rotary-wing aircraft involved in an occurrence.
Key attributes: registration (Aircraft registration mark), icaoCode (ICAO aircraft type designator)

These enriched docstrings flow into both Graphiti's extraction and deduplication prompts, giving the LLM concrete signals about which attributes to look for.
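
As a rough illustration of the enrichment step, the helper below appends a "Key attributes" line to an existing description. The function name and the attribute mapping are assumptions for this example; only the output format is taken from the Aircraft docstring shown above.

```python
def enrich_docstring(base_docstring, attributes):
    """Append a 'Key attributes' line built from datatype property metadata."""
    if not attributes:
        return base_docstring
    rendered = ", ".join(f"{name} ({description})" for name, description in attributes.items())
    return f"{base_docstring}\nKey attributes: {rendered}"


aircraft_doc = enrich_docstring(
    "A fixed-wing or rotary-wing aircraft involved in an occurrence.",
    {
        "registration": "Aircraft registration mark",
        "icaoCode": "ICAO aircraft type designator",
    },
)
print(aircraft_doc)
```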

What Gets Loaded

From Ontology	Included
Entity classes (`owl:Class`)	Yes
Reified relationship classes	Converted to edge types
Non-reified object properties	Converted to edge types
Datatype properties	Yes (as entity properties)
Class hierarchy	Yes (for classification)
Abstract classes	Filtered out

Phase 2: LLM Enhancement

The LLM analyzes sample data to find patterns not in the ontology. This phase is optional but catches edge cases the ontology doesn't cover.

What the LLM Sees

The prompt provides:

Sample data from the parser
A list of committed relationships — ontology-derived types that the LLM must not duplicate

Exclusion Rules

The LLM's discoveries are filtered to prevent overlap with the ontology:

Excluded	Reason
Ontology entity types	Already loaded in Phase 1
Abstract classes	Not concrete types
Relationship classes	Already converted to edge types
Verb-form duplicates	e.g., `OWNED_BY` when `Ownership` already exists

Directive Hints

Phase 2 shows the LLM all committed relationship types from Phase 1 with an explicit directive: do not create duplicates. This solves a key problem — without directive hints, the LLM often generates HAS_OPERATOR alongside the ontology's existing OPERATED_BY, producing redundant types.
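
One way to picture a directive hint is as a block of prompt text listing the committed types. The sketch below is hypothetical: the function name and the prompt wording are invented for illustration and are not the prompt Aletheia actually sends.

```python
def build_enhancement_prompt(samples, committed_relationships):
    """Assemble a Phase 2 style prompt with a 'do not duplicate' directive."""
    committed = "\n".join(f"- {name}" for name in sorted(committed_relationships))
    sample_text = "\n".join(samples)
    return (
        "Propose entity and relationship types found in the sample data.\n"
        "The following relationship types are already committed. "
        "Do NOT create duplicates or renamed variants of them:\n"
        f"{committed}\n\n"
        "Sample data:\n"
        f"{sample_text}"
    )


prompt = build_enhancement_prompt(
    ["Flight 12 was operated by Acme Air."],
    {"OPERATED_BY", "OCCURRED_AT"},
)
print(prompt)
```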

Phase 3: Merge and Data-Driven Pruning

Phase 3 combines the ontology base with LLM discoveries, then prunes the schema against actual data.

Merge Rules

Ontology concepts are authoritative — never replaced by LLM discoveries
LLM can only add — new types that don't exist in the ontology
Duplicates resolve to the ontology version — if the LLM finds "Person", the ontology's "Person" wins
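
The three merge rules amount to an "ontology wins, LLM may only add" dictionary merge. The following is a minimal sketch under the assumption that each schema is a mapping from type name to a definition; the `merge_schemas` name and the definition fields are hypothetical.

```python
def merge_schemas(ontology, llm):
    """Merge LLM discoveries into the ontology schema without replacing anything."""
    merged = dict(ontology)  # ontology entries are authoritative and kept as-is
    for name, definition in llm.items():
        if name not in merged:  # the LLM can only add names the ontology lacks
            merged[name] = definition
    return merged


ontology_schema = {"Person": {"source": "ontology", "doc": "A natural person."}}
llm_schema = {
    "Person": {"source": "llm", "doc": "Someone mentioned in the text."},
    "WeatherCondition": {"source": "llm", "doc": "Weather at the time of an event."},
}
merged_schema = merge_schemas(ontology_schema, llm_schema)
print(merged_schema["Person"]["source"], sorted(merged_schema))
```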

Reconciliation Against Non-Reified Types

Before merging, LLM-discovered relationship types are checked against non-reified ontology types. If an LLM discovery matches an existing non-reified type (by target entity or name root), it is discarded.

Data-Driven Pruning

After merging, the schema is pruned against the parser's schema_distribution — a map of entity types actually present in the data:

Entity pruning: Types not matching any key in schema_distribution are removed.

Relationship pruning: Types whose source_class isn't in the data are removed. Three categories are protected from pruning:

Protected Category	Why
LLM-discovered types (no `source_class`)	May represent patterns not tied to a single ontology class
Extension types (`from_extension=True`)	Defined in ontology extension files
Non-reified types (`from_non_reified=True`)	Derived from object properties, not class presence
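
The pruning rules and the three protected categories could be expressed along these lines. This is a sketch only: the `EdgeType` dataclass and the `prune` function are hypothetical, and exact key matching against schema_distribution is an assumption (the real matching may be more lenient).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EdgeType:
    name: str
    source_class: Optional[str] = None  # None marks an LLM-discovered type
    from_extension: bool = False
    from_non_reified: bool = False


def is_protected(edge):
    """True for the three categories that pruning must never remove."""
    return edge.source_class is None or edge.from_extension or edge.from_non_reified


def prune(entity_types, edge_types, schema_distribution):
    """Drop types absent from the data, keeping protected edge types."""
    present = set(schema_distribution)
    kept_entities = [name for name in entity_types if name in present]
    kept_edges = [
        edge for edge in edge_types
        if is_protected(edge) or edge.source_class in present
    ]
    return kept_entities, kept_edges


entities, edges = prune(
    ["Aircraft", "Airport", "Sanction"],
    [
        EdgeType("OWNS", source_class="Ownership"),
        EdgeType("HAS_WEATHER"),
        EdgeType("LOCATED_IN", source_class="Country", from_non_reified=True),
    ],
    {"Aircraft": 120, "Airport": 40},
)
print(entities, [edge.name for edge in edges])
```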

Phase 4: Consolidation

Consolidation is the final step, shared by all schema modes. An LLM reviews the complete schema for coherence:

Merges semantically similar types
Removes over-specialized types
Normalizes naming inconsistencies

Ontology-derived and extension types are protected from removal — the LLM can merge LLM-discovered types but cannot delete anything that came from the ontology.

Deduplication Against Non-Reified Types

During consolidation, relationship types are checked for overlap with non-reified types using two heuristics:

Target entity match — case-insensitive comparison of target entity names
Name root match — strip common affixes (HAS_, IS_, _OF, _BY) and compare roots
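
The two heuristics might look like this in code. The sketch assumes relationship types are plain dicts with `name` and `target` keys and that either heuristic alone is enough to flag an overlap; both are illustrative choices, not confirmed details of the implementation.

```python
PREFIXES = ("HAS_", "IS_")
SUFFIXES = ("_OF", "_BY")


def name_root(name):
    """Strip one common prefix and one common suffix from a relationship name."""
    root = name.upper()
    for prefix in PREFIXES:
        if root.startswith(prefix):
            root = root[len(prefix):]
            break
    for suffix in SUFFIXES:
        if root.endswith(suffix):
            root = root[:-len(suffix)]
            break
    return root


def overlaps(candidate, existing):
    """Flag a candidate that matches an existing type by target entity or name root."""
    same_target = candidate["target"].lower() == existing["target"].lower()
    same_root = name_root(candidate["name"]) == name_root(existing["name"])
    return same_target or same_root


print(name_root("HAS_MEMBER"), name_root("MEMBER_OF"))
print(overlaps({"name": "HAS_OPERATOR", "target": "operator"},
               {"name": "OPERATED_BY", "target": "Operator"}))
```

In the second call the name roots differ (OPERATOR versus OPERATED), so only the target entity match catches the overlap, which suggests why the two heuristics are used together.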

Edge Type Map

The generated schema includes an edge type map that controls which relationship types are valid between entity pairs. All types are placed in a ("Entity", "Entity") catch-all entry, ensuring valid types are never rejected because the source/target labels don't match exact pairs.
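
A catch-all map of this kind can be pictured as a dictionary keyed by (source, target) label pairs. The sketch below assumes that shape and a lookup that falls back to the catch-all entry; both function names are hypothetical and the lookup logic is an illustration, not the behavior of any specific library call.

```python
CATCH_ALL = ("Entity", "Entity")


def build_edge_type_map(edge_type_names):
    """Place every edge type under the single catch-all (Entity, Entity) key."""
    return {CATCH_ALL: sorted(edge_type_names)}


def allowed_edge_types(edge_type_map, source_label, target_label):
    """Edge types valid for a pair: exact-pair entries plus the catch-all entry."""
    exact = edge_type_map.get((source_label, target_label), [])
    fallback = edge_type_map.get(CATCH_ALL, [])
    return sorted(set(exact) | set(fallback))


edge_type_map = build_edge_type_map(
    ["OCCURRED_AT", "INVOLVED_AIRCRAFT", "OPERATED_BY", "HAS_WEATHER"]
)
print(edge_type_map)
print(allowed_edge_types(edge_type_map, "Aircraft", "Operator"))
```

Because no exact pairs are listed, every edge type stays valid for any source and target labels.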

Edge docstrings use an "e.g.," prefix for source/target examples so the LLM treats them as guidance rather than strict constraints.

Pros and Cons

Advantages

All ontology concepts preserved — nothing from the ontology is lost
Expert knowledge retained — domain model takes precedence
LLM augmentation — can still discover new patterns
Data-driven focus — pruning removes irrelevant types
Predictable — you know exactly what the ontology provides

Disadvantages

Requires complete ontology — works best with well-maintained ontologies
Larger initial schemas — before pruning, all ontology classes are included
Less LLM flexibility — ontology constrains what the LLM can discover

When to Use

Can't afford to lose concepts — ontology types must be in schema
Complete ontology exists — comprehensive domain model available
HAS_ALIAS problem — relationship types not discovered by LLM from samples
Regulatory compliance — schema must match a formal specification

When NOT to Use

Ontology is incomplete — use graph-hybrid for better alignment
No ontology — use llm mode
LLM discovery is primary goal — use llm or graph-hybrid

Comparison: Ontology Modes

Aspect	ontology	ontology-first	graph-hybrid
Primary source	Ontology only	Ontology	LLM
LLM involvement	None	Enhancement	Discovery + alignment
All ontology concepts	Yes	Yes	Only if LLM discovers them
Discovers new types	No	Yes	Yes
Data-driven pruning	No	Yes	Yes (property enrichment)
Non-reified relationships	No	Yes	No
Setup complexity	Medium	High	High

Example: Aviation Domain

# Load aviation ontology
aletheia build-ontology-graph \
  --use-case aviation_safety \
  --knowledge-graph aviation_ontology

# Build with ontology-first
aletheia build-knowledge-graph \
  --use-case aviation_safety \
  --knowledge-graph aviation_graph \
  --schema-mode ontology-first \
  --ontology-graph aviation_ontology

Ontology provides: Occurrence, Aircraft, Airport, Operator, OCCURRED_AT, INVOLVED_AIRCRAFT, OPERATED_BY

LLM might discover: WeatherCondition (not in ontology), HAS_WEATHER (new relationship)

Final schema: Complete aviation ontology + weather concepts, pruned to types present in data.

Ontology Mode — Pure ontology, no LLM
Graph-Hybrid Mode — LLM primary with alignment
Overview — Comparison of all modes