Schema Inference¶
This section describes how Aletheia infers and manages schemas for knowledge graph construction.

What is Schema Inference?¶
When building a knowledge graph, the LLM extracts entities and relationships from text. Schema inference determines what vocabulary of types the LLM should use during extraction.
Without schema guidance:
Input: "Hamas is a terrorist organization designated by the US State Department"
LLM decides freely:
- Entity types: Person? Organization? TerroristGroup? GovernmentAgency?
- Relationship types: DESIGNATED_BY? SANCTIONS? IS_A? ASSOCIATED_WITH?
- Properties: ???
With schema guidance:
Input: "Hamas is a terrorist organization designated by the US State Department"
Using defined schema:
- Entity: "Hamas" (type: Organization)
- Entity: "US State Department" (type: Organization)
- Relationship: SANCTION (US State Department → Hamas)
Why Schema Matters¶
Without Schema: Chaos¶
In one real evaluation, unconstrained extraction produced 579 unique relationship types with massive semantic overlap:
| Variants | Should Be |
|---|---|
| LOCATED_IN, IS_LOCATED_IN, BASED_IN, SITUATED_IN | LOCATED_IN |
| DESIGNATED_BY, SANCTIONED_BY, LISTED_BY | SANCTION |
| WORKS_FOR, EMPLOYED_BY, WORKS_AT | EMPLOYED_BY |
This fragmentation destroys retrieval precision: a query written against one relationship name misses every result stored under one of its variants.
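To make the cost of this fragmentation concrete, here is a minimal sketch of the clean-up a fixed vocabulary makes unnecessary. The `CANONICAL` mapping and the `normalize_relationship` helper are hypothetical names used for illustration, not part of Aletheia; the variants are the ones listed in the table above.

```python
# Hypothetical mapping built from the variants in the table above.
CANONICAL = {
    "LOCATED_IN": "LOCATED_IN",
    "IS_LOCATED_IN": "LOCATED_IN",
    "BASED_IN": "LOCATED_IN",
    "SITUATED_IN": "LOCATED_IN",
    "DESIGNATED_BY": "SANCTION",
    "SANCTIONED_BY": "SANCTION",
    "LISTED_BY": "SANCTION",
    "WORKS_FOR": "EMPLOYED_BY",
    "EMPLOYED_BY": "EMPLOYED_BY",
    "WORKS_AT": "EMPLOYED_BY",
}


def normalize_relationship(name: str) -> str:
    """Map a raw relationship name to its canonical form, if one is known."""
    key = name.strip().upper().replace(" ", "_")
    # Unknown names pass through unchanged (only their casing is normalized).
    return CANONICAL.get(key, key)


raw_edges = ["IS_LOCATED_IN", "based_in", "LISTED_BY", "WORKS_AT", "OWNS"]
normalized = [normalize_relationship(edge) for edge in raw_edges]

distinct_before = len(set(raw_edges))
distinct_after = len(set(normalized))
print(f"{distinct_before} distinct names before, {distinct_after} after")
```

A post-hoc mapping like this has to be maintained by hand and only covers variants someone has already seen, which is why constraining the vocabulary at extraction time is the more robust option.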
With Schema: Consistency¶
A well-defined schema ensures:
- Type consistency: Same concepts always use same names
- Relationship clarity: Clear, queryable relationship vocabulary
- Property standardization: Consistent attribute names across entities
- Better retrieval: Queries find all relevant results
Available Schema Modes¶
Aletheia provides six schema modes (plus inference, an alias for llm) to balance automation against control:
| Mode | Description | Ontology Required | Recommended For |
|---|---|---|---|
| none | No schema, Graphiti defaults | No | Quick prototyping |
| llm | Two-stage LLM inference | No | Unknown data |
| ontology | Strict ontology adherence | Yes | Formal domains |
| hybrid | LLM + ontology validation | Yes | Balanced approach |
| graph-hybrid | LLM + semantic alignment | Yes | FTM data |
| ontology-first | Ontology primary, LLM enhancement | Yes | Complete ontologies |
Decision Guide¶
graph TD
A[Do you have an ontology?] -->|No| B[Need consistent types?]
A -->|Yes| C[Is ontology complete/authoritative?]
B -->|No| D[none]
B -->|Yes| E[llm]
C -->|Yes, use it exactly| F[ontology-first]
C -->|No, LLM should discover| G[Need semantic alignment?]
G -->|Yes| H[graph-hybrid]
G -->|No| I[hybrid]

Quick recommendations:
| Scenario | Mode |
|---|---|
| Exploring new data quickly | none |
| Unknown data, no ontology | llm |
| FTM/OpenSanctions data | graph-hybrid |
| Aviation or another domain with a formal ontology | ontology-first |
| Need strict schema control | ontology |
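The decision guide can also be read as a small function. The sketch below only mirrors the branches of the flowchart; `choose_schema_mode` and its parameter names are hypothetical and are not an Aletheia API. Note that the flowchart never reaches the ontology mode, which the table above reserves for cases that need strict schema control.

```python
def choose_schema_mode(
    has_ontology: bool,
    need_consistent_types: bool = False,
    ontology_is_authoritative: bool = False,
    need_semantic_alignment: bool = False,
) -> str:
    """Return the schema mode suggested by the decision flowchart."""
    if not has_ontology:
        # Without an ontology, the only question is whether types must be consistent.
        return "llm" if need_consistent_types else "none"
    if ontology_is_authoritative:
        # A complete ontology is used as the primary source.
        return "ontology-first"
    # Otherwise the LLM discovers types, with or without semantic alignment.
    return "graph-hybrid" if need_semantic_alignment else "hybrid"


print(choose_schema_mode(has_ontology=True, need_semantic_alignment=True))
```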
Core Concepts¶
Entity Types¶
Entity types define the kinds of nodes in your knowledge graph:
from pydantic import BaseModel

class Organization(BaseModel):
    """A corporation, government body, or other organization."""
    jurisdiction: str | None = None
    incorporation_date: str | None = None
    status: str | None = None
Entity types are:
- PascalCase names (Person, Organization, Aircraft)
- Pydantic models with typed properties
- Passed to Graphiti's entity_types parameter
Relationship Types¶
Relationship types define the kinds of edges:
class Sanction(BaseModel):
    """A sanction designation between entities."""
    pass

# Usage in EDGE_TYPES dict (HasAlias and Owns are defined the same way as Sanction)
EDGE_TYPES = {
    "SANCTION": Sanction,
    "HAS_ALIAS": HasAlias,
    "OWNS": Owns,
}
Relationship types are:
- UPPER_SNAKE_CASE names (SANCTION, HAS_ALIAS, OWNS)
- Pydantic models (usually empty, properties optional)
- Passed to Graphiti's edge_types parameter
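The two naming conventions are easy to check mechanically. The following is a minimal sketch under a strict reading of PascalCase and UPPER_SNAKE_CASE; `check_schema_names` and both regular expressions are illustrative assumptions, and Aletheia may validate names differently or not at all.

```python
import re

# Strict reading: one or more capitalized words, e.g. Person, HasAlias.
PASCAL_CASE = re.compile(r"(?:[A-Z][a-z0-9]+)+")
# Upper-case words joined by single underscores, e.g. SANCTION, HAS_ALIAS.
UPPER_SNAKE_CASE = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")


def check_schema_names(entity_types: dict, edge_types: dict) -> list[str]:
    """Return a message for every key that breaks its naming convention."""
    problems = []
    for name in entity_types:
        if not PASCAL_CASE.fullmatch(name):
            problems.append(f"entity type {name!r} is not PascalCase")
    for name in edge_types:
        if not UPPER_SNAKE_CASE.fullmatch(name):
            problems.append(f"edge type {name!r} is not UPPER_SNAKE_CASE")
    return problems


# Placeholder values stand in for the Pydantic model classes.
issues = check_schema_names(
    {"Person": object, "organization": object},
    {"SANCTION": object, "hasAlias": object},
)
for issue in issues:
    print(issue)
```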
Generated Schema Files¶
Schema inference produces Python files in schemas/<graph_name>/:
schemas/
└── my_graph/
    ├── __init__.py
    ├── schema_v1.py     # Generated Pydantic models
    └── metadata.json    # Provenance information
Example schema_v1.py:
"""Generated schema for knowledge graph."""
from pydantic import BaseModel, Field

# Entity Types
class Person(BaseModel):
    """A natural person."""
    birth_date: str | None = None
    nationality: str | None = None

class Organization(BaseModel):
    """An organization or company."""
    jurisdiction: str | None = None

# Relationship Types
class Sanction(BaseModel):
    """A sanction designation."""
    pass

# Exports
ENTITY_TYPES = {
    "Person": Person,
    "Organization": Organization,
}
EDGE_TYPES = {
    "SANCTION": Sanction,
}
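Because the generated schema is an ordinary Python module, its exports can be read back with the standard library. The sketch below shows one way to do that; `load_schema` is a hypothetical helper, not an Aletheia function, and it assumes only that the file defines ENTITY_TYPES and EDGE_TYPES as in the example above.

```python
import importlib.util
import sys
from pathlib import Path


def load_schema(path):
    """Import a generated schema file and return (ENTITY_TYPES, EDGE_TYPES)."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(f"schema_{path.stem}", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load schema from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as the importlib recipe recommends,
    # so classes defined in the file can resolve their own module.
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    return module.ENTITY_TYPES, module.EDGE_TYPES
```

A call such as `entity_types, edge_types = load_schema("schemas/my_graph/schema_v1.py")` would return the two dictionaries, which are the values passed to Graphiti's entity_types and edge_types parameters.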
Ontologies¶
For modes that use ontologies (ontology, hybrid, graph-hybrid, ontology-first), you need:
- TTL/OWL file defining classes and relationships
- Ontology graph loaded into the database
# Load ontology into graph (once)
aletheia build-ontology-graph \
    --use-case my_case \
    --knowledge-graph my_ontology
See Ontology Mode for details on ontology format and loading.
Schema Inference Pipeline¶
┌─────────────────────────────────────────────────────────┐
│ INPUT │
├─────────────────────────────────────────────────────────┤
│ Sample Data (from parser) │
│ + Ontology (if applicable) │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ MODE-SPECIFIC PROCESSING │
├─────────────────────────────────────────────────────────┤
│ - none: Use Graphiti defaults │
│ - llm/inference: Two-stage LLM analysis │
│ - ontology: Extract from TTL/OWL │
│ - hybrid: LLM + ontology string validation │
│ - graph-hybrid: LLM + semantic alignment via graph │
│ - ontology-first: Ontology base + LLM enhancement │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ PHASE 4: CONSOLIDATION (all modes) │
├─────────────────────────────────────────────────────────┤
│ LLM reviews schema for redundancies │
│ Merges semantically similar types │
│ Protects ontology/extension types from removal │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ OUTPUT │
├─────────────────────────────────────────────────────────┤
│ SchemaDefinition: │
│ - entity_types: List[EntityTypeDefinition] │
│ - relationship_types: List[RelationshipTypeDefinition] │
│ - Edge type map with ("Entity","Entity") catch-all │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ CODE GENERATION │
├─────────────────────────────────────────────────────────┤
│ schemas/<graph_name>/schema_v1.py │
│ - Pydantic models (with enriched docstrings) │
│ - ENTITY_TYPES + EDGE_TYPES + EDGE_TYPE_MAP │
│ - CoerciveBaseModel for scalar/list handling │
└─────────────────────────────────────────────────────────┘
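The "edge type map with ("Entity","Entity") catch-all" in the OUTPUT box can be pictured as a dictionary keyed by (source type, target type) pairs. The sketch below is an assumption about its shape, not Aletheia's actual generator: the `RelationshipSpec` dataclass, its `source` and `target` fields, and `build_edge_type_map` are hypothetical names, and listing every relationship under the catch-all pair is one plausible policy among several.

```python
from dataclasses import dataclass
from typing import Optional

# Pair that matches any source and target entity type.
CATCH_ALL = ("Entity", "Entity")


@dataclass(frozen=True)
class RelationshipSpec:
    """Hypothetical stand-in for a relationship type definition."""
    name: str
    source: Optional[str] = None
    target: Optional[str] = None


def build_edge_type_map(specs):
    """Group relationship names by (source, target) pair, plus a catch-all."""
    edge_map = {CATCH_ALL: []}
    for spec in specs:
        if spec.source and spec.target:
            edge_map.setdefault((spec.source, spec.target), []).append(spec.name)
        # Assumed policy: every relationship also stays available under the catch-all.
        if spec.name not in edge_map[CATCH_ALL]:
            edge_map[CATCH_ALL].append(spec.name)
    return edge_map


specs = [
    RelationshipSpec("SANCTION", "Organization", "Organization"),
    RelationshipSpec("OWNS", "Person", "Organization"),
    RelationshipSpec("HAS_ALIAS"),  # no endpoint types known
]
edge_type_map = build_edge_type_map(specs)
for pair, names in edge_type_map.items():
    print(pair, names)
```

With a catch-all entry, a relationship whose endpoint types were not anticipated can still be assigned a type from the schema vocabulary instead of falling outside it.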
Learn More¶
- None Mode - No schema, quick prototyping
- LLM Mode - Automatic schema discovery with prompts
- Ontology Mode - Strict ontology adherence
- Hybrid Mode - LLM + ontology validation
- Graph-Hybrid Mode - Semantic alignment (recommended)
- Ontology-First Mode - Ontology as primary source