Ontology-First Mode¶
The ontology-first mode treats the ontology as the authoritative source while allowing an LLM to discover additional patterns. All ontology concepts are preserved in the schema, so no concept is dropped just because the LLM failed to discover it in the samples.
Overview¶
| Aspect | Value |
|---|---|
| Ontology Required | Yes (loaded into graph) |
| LLM Calls for Schema | Optional (enhancement only) |
| Type Consistency | Excellent |
| Setup Time | High (ontology graph required) |
| Best For | Complete ontologies where you can't afford to lose concepts |
Why Ontology-First?¶
The Graph-Hybrid Problem¶
In graph-hybrid mode, the LLM discovers types from sample data, then aligns them with the ontology. If the LLM doesn't discover a type, it won't appear in the schema:
Ontology defines: Person, Organization, Sanction, HAS_ALIAS, OWNS, MEMBER_OF
LLM discovers (from samples):
- Person ✓
- Organization ✓
- Sanction ✓
- HAS_ALIAS ✓
- OWNS ✗ (not in samples)
- MEMBER_OF ✗ (not in samples)
Graph-hybrid result:
Missing: OWNS, MEMBER_OF ← Lost because LLM didn't find them
The Ontology-First Solution¶
ontology-first loads all ontology concepts first, then optionally enhances with the LLM:
Phase 1: Load ALL from ontology
- Person, Organization, Sanction, HAS_ALIAS, OWNS, MEMBER_OF
Phase 2: LLM enhancement (optional)
- Discovers: CustomType (not in ontology)
Phase 3: Merge
- Person, Organization, Sanction, HAS_ALIAS, OWNS, MEMBER_OF, CustomType
← All ontology concepts preserved + LLM additions
Phase 4: Consolidation
- LLM reviews final schema for redundancies
← Ontology types protected from removal
Pipeline¶
graph TD
A[Phase 1: Load Ontology Schema] --> B[Phase 2: LLM Enhancement]
B --> C[Phase 3: Merge + Data-Driven Pruning]
C --> D[Phase 4: Consolidation]
A -- "Reified classes<br/>Non-reified object properties<br/>Datatype properties" --> A
B -- "Patterns not in ontology<br/>Directive hints prevent duplicates" --> B
C -- "Remove types absent from data<br/>Protect extensions + non-reified" --> C
D -- "LLM coherence review<br/>Protect ontology types" --> D
Prerequisites¶
# Load ontology into graph (once)
aletheia build-ontology-graph \
--use-case my_case \
--knowledge-graph my_ontology
Usage¶
aletheia build-knowledge-graph \
--use-case my_case \
--knowledge-graph my_graph \
--schema-mode ontology-first \
--ontology-graph my_ontology
Phase 1: Load Ontology Schema¶
Phase 1 extracts the complete type system from the ontology. Unlike graph-hybrid (which only gets what the LLM finds), ontology-first loads everything.
Entity Types¶
The ontology loader classifies each OWL/RDFS class using transitive ancestry:
| Classification | Rule | Result |
|---|---|---|
| Entity class | Concrete class (no subclasses, not a known abstract pattern) | Becomes an entity type |
| Relationship class | Any ancestor is Interval (checked transitively) | Becomes an edge type |
| Abstract class | Has subclasses, or matches known patterns (Thing, LegalEntity) | Excluded from schema |
Transitive ancestry matters for multi-level hierarchies. For example, Ownership → Interest → Interval — the loader checks the full chain, not just direct parents.
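The classification rules above can be sketched in a few lines. This is a minimal illustration, not the loader's actual code: the `PARENTS` map, the `KNOWN_ABSTRACT` set, and the choice to test the `Interval` rule before the abstract rule are all assumptions made for the example.

```python
# Hypothetical class hierarchy: child -> list of direct parents.
PARENTS = {
    "Interest": ["Interval"],
    "Ownership": ["Interest"],
    "LegalEntity": ["Thing"],
    "Person": ["LegalEntity"],
    "Organization": ["LegalEntity"],
}
# Hypothetical set of names treated as abstract patterns.
KNOWN_ABSTRACT = {"Thing", "LegalEntity"}


def ancestors(cls, parents):
    """Return every ancestor of cls by walking the parent chain transitively."""
    seen = set()
    stack = list(parents.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def classify(cls, parents):
    """Classify a class as 'relationship', 'abstract', or 'entity'."""
    # Assumed precedence: the Interval check runs first.
    if "Interval" in ancestors(cls, parents):
        return "relationship"
    has_subclasses = any(cls in direct for direct in parents.values())
    if has_subclasses or cls in KNOWN_ABSTRACT:
        return "abstract"
    return "entity"


print(classify("Ownership", PARENTS))    # reached via Ownership -> Interest -> Interval
print(classify("Person", PARENTS))
print(classify("LegalEntity", PARENTS))
```

Checking only direct parents would miss `Ownership` here, because `Interval` is two levels up the chain.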
Relationship Types¶
Phase 1 extracts relationships from two sources:
Reified relationship classes — OWL classes that model relationships as entities (e.g., Ownership connects an owner to an asset). These are common in FTM ontologies.
Non-reified object properties — Direct object properties between entity classes (e.g., locatedIn linking Airport to Country). These are extracted via get_non_reified_relationships():
| Derivation Rule | Example | Result |
|---|---|---|
| ModelingProfile override | Explicit mapping in profile | Custom name |
| Multi-word label | "located in" | LOCATED_IN |
| camelCase property | addressEntity | HAS_ADDRESS |
| Fallback | foo | HAS_FOO |
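A rough sketch of how these derivation rules might be applied in order. The function name `derive_edge_name`, its parameters, and the `overrides` dict standing in for a ModelingProfile are hypothetical. The camelCase rule is inferred from the single `addressEntity` example (keep the first word, prefix `HAS_`), so the real logic may differ.

```python
import re


def derive_edge_name(prop_name, label=None, overrides=None):
    """Derive an UPPER_SNAKE edge type name for an object property."""
    overrides = overrides or {}
    # 1. An explicit profile mapping wins.
    if prop_name in overrides:
        return overrides[prop_name]
    # 2. A multi-word label is upper-cased and joined with underscores.
    if label and len(label.split()) > 1:
        return "_".join(label.upper().split())
    # 3. camelCase: assumed here to keep the first word and prefix HAS_.
    words = re.findall(r"[A-Z]?[a-z0-9]+", prop_name)
    if len(words) > 1:
        return "HAS_" + words[0].upper()
    # 4. Fallback: prefix the whole property name with HAS_.
    return "HAS_" + prop_name.upper()


print(derive_edge_name("locatedIn", label="located in"))
print(derive_edge_name("addressEntity"))
print(derive_edge_name("foo"))
```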
When both directions of a relationship are declared (via owl:inverseOf), the loader keeps the direction with the more general domain and discards its inverse, so the pair does not produce duplicate edge types.
Enriched Entity Type Docstrings¶
Each entity type's docstring is enriched with property metadata from the ontology:
Aircraft — A fixed-wing or rotary-wing aircraft involved in an occurrence.
Key attributes: registration (Aircraft registration mark), icaoCode (ICAO aircraft type designator)
These enriched docstrings flow to Graphiti's extraction AND deduplication prompts, giving the LLM concrete signal about what attributes to look for.
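As an illustration of the enrichment step, the sketch below rebuilds the Aircraft docstring from a description and a list of attribute pairs. The helper `enrich_docstring` and its input shape are assumptions for this example. The output format simply mirrors the sample shown above.

```python
SEPARATOR = " \u2014 "  # em dash, matching the docstring format shown above


def enrich_docstring(name, description, attributes):
    """Append a 'Key attributes' line built from (attribute, description) pairs."""
    doc = f"{name}{SEPARATOR}{description}"
    if attributes:
        attrs = ", ".join(f"{attr} ({desc})" for attr, desc in attributes)
        doc += f"\nKey attributes: {attrs}"
    return doc


aircraft_doc = enrich_docstring(
    "Aircraft",
    "A fixed-wing or rotary-wing aircraft involved in an occurrence.",
    [
        ("registration", "Aircraft registration mark"),
        ("icaoCode", "ICAO aircraft type designator"),
    ],
)
print(aircraft_doc)
```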
What Gets Loaded¶
| From Ontology | Included |
|---|---|
| Entity classes (owl:Class) | Yes |
| Reified relationship classes | Converted to edge types |
| Non-reified object properties | Converted to edge types |
| Datatype properties | Yes (as entity properties) |
| Class hierarchy | Yes (for classification) |
| Abstract classes | Filtered out |
Phase 2: LLM Enhancement¶
The LLM analyzes sample data to find patterns not in the ontology. This phase is optional but catches edge cases the ontology doesn't cover.
What the LLM Sees¶
The prompt provides:
- Sample data from the parser
- A list of committed relationships — ontology-derived types that the LLM must not duplicate
Exclusion Rules¶
The LLM's discoveries are filtered to prevent overlap with the ontology:
| Excluded | Reason |
|---|---|
| Ontology entity types | Already loaded in Phase 1 |
| Abstract classes | Not concrete types |
| Relationship classes | Already converted to edge types |
| Verb-form duplicates | e.g., OWNED_BY when Ownership already exists |
Directive Hints¶
Phase 2 shows the LLM all committed relationship types from Phase 1 with an explicit directive: do not create duplicates. This solves a key problem — without directive hints, the LLM often generates HAS_OPERATOR alongside the ontology's existing OPERATED_BY, producing redundant types.
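A directive hint amounts to listing the committed types in the prompt next to an explicit instruction. The sketch below shows one way such a prompt could be assembled. The function name and the prompt wording are hypothetical and are not taken from the actual implementation.

```python
def build_enhancement_prompt(samples, committed_relationships):
    """Build a Phase 2 style prompt that lists committed types as off-limits."""
    committed = "\n".join(f"- {name}" for name in sorted(committed_relationships))
    sample_text = "\n".join(samples)
    return (
        "Analyze the sample data and propose entity or relationship types "
        "that are missing from the schema.\n\n"
        "Committed relationship types (do NOT create duplicates or "
        "inverse/verb-form variants of these):\n"
        f"{committed}\n\n"
        f"Sample data:\n{sample_text}\n"
    )


prompt = build_enhancement_prompt(
    samples=["Flight 123 operated by Acme Air departed in heavy fog."],
    committed_relationships={"OPERATED_BY", "OCCURRED_AT", "INVOLVED_AIRCRAFT"},
)
print(prompt)
```

With `OPERATED_BY` listed explicitly, the model has a concrete reason not to propose `HAS_OPERATOR` for the same pattern.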
Phase 3: Merge and Data-Driven Pruning¶
Phase 3 combines the ontology base with LLM discoveries, then prunes the schema against actual data.
Merge Rules¶
- Ontology concepts are authoritative — never replaced by LLM discoveries
- LLM can only add — new types that don't exist in the ontology
- Duplicates resolve to the ontology version — if the LLM finds "Person", the ontology's "Person" wins
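The three merge rules reduce to "start from the ontology, then add only unseen names". A minimal sketch, assuming both schemas are plain dicts keyed by type name (the real schema objects are likely richer):

```python
def merge_schemas(ontology_types, llm_types):
    """Merge two {name: definition} dicts; ontology entries always win."""
    merged = dict(ontology_types)
    for name, definition in llm_types.items():
        # setdefault only adds names the ontology did not already define.
        merged.setdefault(name, definition)
    return merged


ontology = {"Person": "ontology definition", "OWNS": "ontology definition"}
discovered = {"Person": "LLM definition", "CustomType": "LLM definition"}
merged = merge_schemas(ontology, discovered)
print(merged)
```

Here `Person` keeps its ontology definition and `CustomType` is added alongside it.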
Reconciliation Against Non-Reified Types¶
Before merging, LLM-discovered relationship types are checked against non-reified ontology types. If an LLM discovery matches an existing non-reified type (by target entity or name root), it is discarded.
Data-Driven Pruning¶
After merging, the schema is pruned against the parser's schema_distribution — a map of entity types actually present in the data:
Entity pruning: Types not matching any key in schema_distribution are removed.
Relationship pruning: Types whose source_class isn't in the data are removed. Three categories are protected from pruning:
| Protected Category | Why |
|---|---|
| LLM-discovered types (no source_class) | May represent patterns not tied to a single ontology class |
| Extension types (from_extension=True) | Defined in ontology extension files |
| Non-reified types (from_non_reified=True) | Derived from object properties, not class presence |
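The pruning rules and their three exemptions can be sketched as a pair of filters. The `RelType` dataclass and the `prune` function are hypothetical stand-ins that reuse the field names from the table above. Exact key matching against `schema_distribution` is an assumption, since the real matching may be looser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelType:
    name: str
    source_class: Optional[str] = None   # None marks an LLM-discovered type
    from_extension: bool = False
    from_non_reified: bool = False


def prune(entity_types, rel_types, schema_distribution):
    """Drop types absent from the data, keeping the protected categories."""
    present = set(schema_distribution)
    kept_entities = [e for e in entity_types if e in present]
    kept_rels = [
        r for r in rel_types
        if r.source_class is None          # LLM-discovered: protected
        or r.from_extension                # extension type: protected
        or r.from_non_reified              # non-reified type: protected
        or r.source_class in present       # source class appears in the data
    ]
    return kept_entities, kept_rels
```

For example, with `schema_distribution = {"Aircraft": 10}`, a relationship whose `source_class` is `"Ownership"` is dropped unless one of its protection flags is set.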
Phase 4: Consolidation¶
Consolidation is the final step shared by all schema modes. An LLM reviews the complete schema for coherence:
- Merges semantically similar types
- Removes over-specialized types
- Normalizes naming inconsistencies
Ontology-derived and extension types are protected from removal — the LLM can merge LLM-discovered types but cannot delete anything that came from the ontology.
Deduplication Against Non-Reified Types¶
During consolidation, relationship types are checked for overlap with non-reified types using two heuristics:
- Target entity match: case-insensitive comparison of target entity names
- Name root match: strip common affixes (HAS_, IS_, _OF, _BY) and compare roots
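Both heuristics are simple string comparisons. The sketch below is one possible reading: the helper names are hypothetical, and stripping at most one prefix and one suffix is an assumption. How the two checks are combined is left open because the text does not say.

```python
PREFIXES = ("HAS_", "IS_")
SUFFIXES = ("_OF", "_BY")


def name_root(name):
    """Strip at most one known prefix and one known suffix from a type name."""
    root = name.upper()
    for prefix in PREFIXES:
        if root.startswith(prefix):
            root = root[len(prefix):]
            break
    for suffix in SUFFIXES:
        if root.endswith(suffix):
            root = root[: -len(suffix)]
            break
    return root


def root_matches(name_a, name_b):
    """Name root match: compare names after affix stripping."""
    return name_root(name_a) == name_root(name_b)


def target_matches(target_a, target_b):
    """Target entity match: case-insensitive comparison."""
    return target_a.lower() == target_b.lower()


print(root_matches("HAS_MEMBER", "MEMBER_OF"))      # both reduce to MEMBER
print(root_matches("HAS_OPERATOR", "OPERATED_BY"))  # OPERATOR vs OPERATED
print(target_matches("Operator", "operator"))
```

Under this reading, `HAS_OPERATOR` and `OPERATED_BY` do not share a name root, so a pair like that would have to be caught by the target entity check instead.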
Edge Type Map¶
The generated schema includes an edge type map that controls which relationship types are valid between entity pairs. All types are placed in a ("Entity", "Entity") catch-all entry, ensuring valid types are never rejected because the source/target labels don't match exact pairs.
Edge docstrings use an "e.g.," prefix for source/target examples so the LLM treats them as guidance rather than strict constraints.
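The catch-all layout described above can be pictured as a dict with a single key. This sketch assumes the map is keyed by `(source_label, target_label)` tuples with lists of edge type names as values. The helper name is hypothetical.

```python
def build_edge_type_map(edge_type_names):
    """Place every edge type under one ("Entity", "Entity") catch-all entry."""
    return {("Entity", "Entity"): sorted(edge_type_names)}


edge_type_map = build_edge_type_map({"OPERATED_BY", "OCCURRED_AT", "HAS_WEATHER"})
print(edge_type_map)
```

Since there is only one entry and every node is assumed to carry the generic `Entity` label, a lookup never fails because a specific pair such as `("Aircraft", "Operator")` was not enumerated.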
Pros and Cons¶
Advantages¶
- All ontology concepts preserved — nothing from the ontology is lost
- Expert knowledge retained — domain model takes precedence
- LLM augmentation — can still discover new patterns
- Data-driven focus — pruning removes irrelevant types
- Predictable — you know exactly what the ontology provides
Disadvantages¶
- Requires complete ontology — works best with well-maintained ontologies
- Larger initial schemas — before pruning, all ontology classes are included
- Less LLM flexibility — ontology constrains what the LLM can discover
When to Use¶
- Can't afford to lose concepts — ontology types must be in schema
- Complete ontology exists — comprehensive domain model available
- HAS_ALIAS problem — relationship types not discovered by LLM from samples
- Regulatory compliance — schema must match a formal specification
When NOT to Use¶
- Ontology is incomplete: use graph-hybrid for better alignment
- No ontology: use llm mode
- LLM discovery is the primary goal: use llm or graph-hybrid
Comparison: Ontology Modes¶
| Aspect | ontology | ontology-first | graph-hybrid |
|---|---|---|---|
| Primary source | Ontology only | Ontology | LLM |
| LLM involvement | None | Enhancement | Discovery + alignment |
| All ontology concepts | Yes | Yes | Only if LLM discovers them |
| Discovers new types | No | Yes | Yes |
| Data-driven pruning | No | Yes | Yes (property enrichment) |
| Non-reified relationships | No | Yes | No |
| Setup complexity | Medium | High | High |
Example: Aviation Domain¶
# Load aviation ontology
aletheia build-ontology-graph \
--use-case aviation_safety \
--knowledge-graph aviation_ontology
# Build with ontology-first
aletheia build-knowledge-graph \
--use-case aviation_safety \
--knowledge-graph aviation_graph \
--schema-mode ontology-first \
--ontology-graph aviation_ontology
Ontology provides: Occurrence, Aircraft, Airport, Operator, OCCURRED_AT, INVOLVED_AIRCRAFT, OPERATED_BY
LLM might discover: WeatherCondition (not in ontology), HAS_WEATHER (new relationship)
Final schema: Complete aviation ontology + weather concepts, pruned to types present in data.
Related¶
- Ontology Mode — Pure ontology, no LLM
- Graph-Hybrid Mode — LLM primary with alignment
- Overview — Comparison of all modes