Schema Inference

This section provides comprehensive documentation on how Aletheia infers and manages schemas for knowledge graph construction.

Schema Inference Overview

What is Schema Inference?

When building a knowledge graph, the LLM extracts entities and relationships from text. Schema inference determines what vocabulary of types the LLM should use during extraction.

Without schema guidance:

Input: "Hamas is a terrorist organization designated by the US State Department"

LLM decides freely:
  - Entity types: Person? Organization? TerroristGroup? GovernmentAgency?
  - Relationship types: DESIGNATED_BY? SANCTIONS? IS_A? ASSOCIATED_WITH?
  - Properties: ???

With schema guidance:

Input: "Hamas is a terrorist organization designated by the US State Department"

Using defined schema:
  - Entity: "Hamas" (type: Organization)
  - Entity: "US State Department" (type: Organization)
  - Relationship: SANCTION (US State Department → Hamas)
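To make the difference concrete, the sketch below checks an extraction against a fixed vocabulary. It is illustrative only. off_schema and the two ALLOWED_* sets are hypothetical names, and a post-hoc check like this is a different technique from the one described here, where the vocabulary guides the LLM during extraction. It simply measures how far free-form output drifts from the schema.

```python
# Hypothetical vocabulary, taken from the example schema on this page.
ALLOWED_ENTITY_TYPES = {"Person", "Organization"}
ALLOWED_EDGE_TYPES = {"SANCTION", "HAS_ALIAS", "OWNS"}


def off_schema(entities, relationships):
    """Return the entity and relationship types that fall outside the schema.

    entities: list of (name, type) pairs
    relationships: list of (source, type, target) triples
    """
    bad_entities = sorted({t for _, t in entities if t not in ALLOWED_ENTITY_TYPES})
    bad_edges = sorted({t for _, t, _ in relationships if t not in ALLOWED_EDGE_TYPES})
    return bad_entities, bad_edges


# Schema-guided extraction of the example sentence stays inside the vocabulary.
guided = off_schema(
    [("Hamas", "Organization"), ("US State Department", "Organization")],
    [("US State Department", "SANCTION", "Hamas")],
)
print(guided)  # ([], [])

# Free-form extraction drifts into types the schema does not define.
free = off_schema(
    [("Hamas", "TerroristGroup"), ("US State Department", "GovernmentAgency")],
    [("Hamas", "DESIGNATED_BY", "US State Department")],
)
print(free)  # (['GovernmentAgency', 'TerroristGroup'], ['DESIGNATED_BY'])
```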

Why Schema Matters

Without Schema: Chaos

In one real evaluation, unconstrained extraction produced 579 unique relationship types with massive semantic overlap:

| Variants found | Canonical type |
|---|---|
| LOCATED_IN, IS_LOCATED_IN, BASED_IN, SITUATED_IN | LOCATED_IN |
| DESIGNATED_BY, SANCTIONED_BY, LISTED_BY | SANCTION |
| WORKS_FOR, EMPLOYED_BY, WORKS_AT | EMPLOYED_BY |

This fragmentation hurts retrieval recall: queries miss relevant results because the same relationship is stored under dozens of different names.
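A minimal sketch of collapsing the variants in the table into their canonical names, assuming a hand-written synonym map. CANONICAL and canonicalize are hypothetical names, not Aletheia APIs.

```python
from collections import Counter

# Hypothetical synonym map drawn from the table: variant -> canonical type
CANONICAL = {
    "IS_LOCATED_IN": "LOCATED_IN",
    "BASED_IN": "LOCATED_IN",
    "SITUATED_IN": "LOCATED_IN",
    "DESIGNATED_BY": "SANCTION",
    "SANCTIONED_BY": "SANCTION",
    "LISTED_BY": "SANCTION",
    "WORKS_FOR": "EMPLOYED_BY",
    "WORKS_AT": "EMPLOYED_BY",
}


def canonicalize(rel_type):
    """Return the canonical name for a relationship type; unknown names pass through."""
    return CANONICAL.get(rel_type, rel_type)


extracted = ["BASED_IN", "LOCATED_IN", "SITUATED_IN", "LISTED_BY", "WORKS_AT", "WORKS_FOR"]
before = Counter(extracted)
after = Counter(canonicalize(r) for r in extracted)
print(len(before), len(after))  # 6 3
print(after["LOCATED_IN"])      # 3
```

Renaming alone is not enough for the passive variants. DESIGNATED_BY, SANCTIONED_BY, and LISTED_BY point from the listed party to the designating body, while SANCTION in the earlier example runs from the US State Department to Hamas, so a real mapping would also need to swap source and target.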

With Schema: Consistency

A well-defined schema ensures:

  • Type consistency: the same concept always gets the same type name
  • Relationship clarity: a clear, queryable relationship vocabulary
  • Property standardization: consistent attribute names across entities
  • Better retrieval: queries no longer miss facts stored under synonyms

Available Schema Modes

Aletheia provides six schema modes, plus inference as an alias for llm, to balance automation against control:

| Mode | Description | Ontology required | Recommended for |
|---|---|---|---|
| none | No schema; Graphiti defaults | No | Quick prototyping |
| llm | Two-stage LLM inference | No | Unknown data |
| ontology | Strict ontology adherence | Yes | Formal domains |
| hybrid | LLM + ontology validation | Yes | Balanced approach |
| graph-hybrid | LLM + semantic alignment | Yes | FTM data |
| ontology-first | Ontology primary, LLM enhancement | Yes | Complete ontologies |
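The modes table can be mirrored in code to validate a mode name before running inference. This is a sketch under two assumptions: SchemaMode, resolve_mode, and NEEDS_ONTOLOGY are hypothetical names, and the alias is taken to be inference for llm, as the pipeline diagram's llm/inference label suggests.

```python
from enum import Enum


class SchemaMode(str, Enum):
    """The six schema modes from the table."""
    NONE = "none"
    LLM = "llm"
    ONTOLOGY = "ontology"
    HYBRID = "hybrid"
    GRAPH_HYBRID = "graph-hybrid"
    ONTOLOGY_FIRST = "ontology-first"


# Assumption: "inference" is the alias for "llm".
ALIASES = {"inference": SchemaMode.LLM}

# Modes whose "Ontology required" column is "Yes".
NEEDS_ONTOLOGY = {
    SchemaMode.ONTOLOGY,
    SchemaMode.HYBRID,
    SchemaMode.GRAPH_HYBRID,
    SchemaMode.ONTOLOGY_FIRST,
}


def resolve_mode(name: str, ontology_loaded: bool) -> SchemaMode:
    """Normalize a mode name (resolving aliases) and check its ontology prerequisite."""
    key = name.strip().lower()
    # SchemaMode(key) raises ValueError for names that are not modes
    mode = ALIASES[key] if key in ALIASES else SchemaMode(key)
    if mode in NEEDS_ONTOLOGY and not ontology_loaded:
        raise ValueError(f"mode {mode.value!r} requires an ontology")
    return mode


print(resolve_mode("inference", ontology_loaded=False).value)  # llm
```

With ontology_loaded=False, a call such as resolve_mode("hybrid", ontology_loaded=False) raises ValueError, matching the "Ontology required" column.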

Decision Guide

graph TD
    A[Do you have an ontology?] -->|No| B[Need consistent types?]
    A -->|Yes| C[Is ontology complete/authoritative?]

    B -->|No| D[none]
    B -->|Yes| E[llm]

    C -->|Yes, use it exactly| F[ontology-first]
    C -->|No, LLM should discover| G[Need semantic alignment?]

    G -->|Yes| H[graph-hybrid]
    G -->|No| I[hybrid]
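The decision graph is small enough to express as a function. choose_mode and its parameter names are hypothetical, not an Aletheia API; each branch mirrors one node of the graph. The graph has no path to the strict ontology mode, so that case is covered only by the recommendations table that follows.

```python
def choose_mode(
    has_ontology: bool,
    need_consistent_types: bool = True,
    ontology_authoritative: bool = False,
    need_semantic_alignment: bool = False,
) -> str:
    """Walk the decision graph and return a schema mode name."""
    if not has_ontology:
        # Node B: without an ontology, only the consistency question matters
        return "llm" if need_consistent_types else "none"
    if ontology_authoritative:
        # Node C -> F: the ontology is complete, so it drives the schema
        return "ontology-first"
    # Node C -> G: the LLM should discover types; decide how to reconcile them
    return "graph-hybrid" if need_semantic_alignment else "hybrid"


print(choose_mode(has_ontology=False, need_consistent_types=False))  # none
print(choose_mode(has_ontology=True, need_semantic_alignment=True))  # graph-hybrid
```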

Quick recommendations:

| Scenario | Mode |
|---|---|
| Exploring new data quickly | none |
| Unknown data, no ontology | llm |
| FTM/OpenSanctions data | graph-hybrid |
| Aviation or another domain with a formal ontology | ontology-first |
| Need strict schema control | ontology |

Core Concepts

Entity Types

Entity types define the kinds of nodes in your knowledge graph:

from pydantic import BaseModel

class Organization(BaseModel):
    """A corporation, government body, or other organization."""
    jurisdiction: str | None = None
    incorporation_date: str | None = None
    status: str | None = None

Entity types are:

  • PascalCase names (Person, Organization, Aircraft)
  • Pydantic models with typed properties
  • Passed to Graphiti's entity_types parameter

Relationship Types

Relationship types define the kinds of edges:

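from pydantic import BaseModel

class Sanction(BaseModel):
    """A sanction designation between entities."""

class HasAlias(BaseModel):
    """An alternative name for an entity."""

class Owns(BaseModel):
    """An ownership relationship between entities."""

# Usage in EDGE_TYPES dict: relationship name -> model
EDGE_TYPES = {
    "SANCTION": Sanction,
    "HAS_ALIAS": HasAlias,
    "OWNS": Owns,
}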

Relationship types are:

  • UPPER_SNAKE_CASE names (SANCTION, HAS_ALIAS, OWNS)
  • Pydantic models (usually empty; properties are optional)
  • Passed to Graphiti's edge_types parameter
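The naming conventions for entity and relationship types are easy to check mechanically. The sketch below is a hypothetical lint (naming_problems is not part of Aletheia). Its PascalCase pattern is deliberately loose and also accepts all-caps names such as SANCTION.

```python
import re

# Loose checks for the two naming conventions
PASCAL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")                    # Person, Organization
UPPER_SNAKE_CASE = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")   # SANCTION, HAS_ALIAS


def naming_problems(entity_names, edge_names):
    """Return a message for each name that breaks the entity or relationship convention."""
    problems = [
        f"entity type {name!r} is not PascalCase"
        for name in entity_names
        if not PASCAL_CASE.fullmatch(name)
    ]
    problems += [
        f"edge type {name!r} is not UPPER_SNAKE_CASE"
        for name in edge_names
        if not UPPER_SNAKE_CASE.fullmatch(name)
    ]
    return problems


found = naming_problems(["Person", "Organization", "aircraft"], ["SANCTION", "HAS_ALIAS", "ownedBy"])
for message in found:
    print(message)
```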

Generated Schema Files

Schema inference writes its output to schemas/<graph_name>/:

schemas/
└── my_graph/
    ├── __init__.py
    ├── schema_v1.py      # Generated Pydantic models
    └── metadata.json     # Provenance information
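The file name schema_v1.py suggests versioned schemas. If later runs add schema_v2.py, schema_v3.py, and so on (an assumption; only schema_v1.py is shown here), a loader has to pick the highest version numerically rather than alphabetically. latest_schema_file is a hypothetical helper.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumption: later versions follow the schema_v1.py naming pattern
VERSION_RE = re.compile(r"schema_v(\d+)\.py")


def latest_schema_file(schemas_root: Path, graph_name: str) -> Optional[Path]:
    """Return the highest-numbered schema_vN.py under schemas_root/graph_name, or None."""
    graph_dir = schemas_root / graph_name
    if not graph_dir.is_dir():
        return None
    versions = []
    for path in graph_dir.glob("schema_v*.py"):
        match = VERSION_RE.fullmatch(path.name)
        if match:
            versions.append((int(match.group(1)), path))
    # Compare version numbers as integers so schema_v10.py beats schema_v9.py
    return max(versions)[1] if versions else None


# Demo against a throwaway directory laid out like schemas/my_graph/
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "my_graph").mkdir()
    for name in ("schema_v1.py", "schema_v2.py", "schema_v10.py", "metadata.json"):
        (root / "my_graph" / name).write_text("")
    newest = latest_schema_file(root, "my_graph")
    missing = latest_schema_file(root, "other_graph")
    print(newest.name, missing)  # schema_v10.py None
```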

Example schema_v1.py:

"""Generated schema for knowledge graph."""
from pydantic import BaseModel, Field

# Entity Types
class Person(BaseModel):
    """A natural person."""
    birth_date: str | None = None
    nationality: str | None = None

class Organization(BaseModel):
    """An organization or company."""
    jurisdiction: str | None = None

# Relationship Types
class Sanction(BaseModel):
    """A sanction designation."""
    pass

# Exports
ENTITY_TYPES = {
    "Person": Person,
    "Organization": Organization,
}

EDGE_TYPES = {
    "SANCTION": Sanction,
}
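Because the generated file is plain Python, its exports can be loaded with the standard importlib machinery and then handed to Graphiti's entity_types and edge_types parameters. load_schema is a hypothetical helper, and the demo writes a Pydantic-free stand-in file so it runs without dependencies; a real schema_v1.py needs pydantic installed.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_schema(schema_path: Path):
    """Import a generated schema file and return its ENTITY_TYPES and EDGE_TYPES dicts."""
    spec = importlib.util.spec_from_file_location(f"generated_{schema_path.stem}", schema_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {schema_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file, defining its model classes
    return module.ENTITY_TYPES, module.EDGE_TYPES


# Stand-in for a generated file, without the pydantic import
STUB = "\n".join([
    "class Person: pass",
    "class Sanction: pass",
    "ENTITY_TYPES = {'Person': Person}",
    "EDGE_TYPES = {'SANCTION': Sanction}",
])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "schema_v1.py"
    path.write_text(STUB)
    entity_types, edge_types = load_schema(path)
    print(sorted(entity_types), sorted(edge_types))  # ['Person'] ['SANCTION']
```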

Ontologies

For modes that use ontologies (ontology, hybrid, graph-hybrid, ontology-first), you need:

  1. A TTL/OWL file defining classes and relationships
  2. The ontology graph, loaded into the database

# Load ontology into graph (once)
aletheia build-ontology-graph \
  --use-case my_case \
  --knowledge-graph my_ontology

See Ontology Mode for details on ontology format and loading.

Schema Inference Pipeline

┌─────────────────────────────────────────────────────────┐
│                    INPUT                                │
├─────────────────────────────────────────────────────────┤
│  Sample Data (from parser)                              │
│  + Ontology (if applicable)                             │
└─────────────────────────────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────┐
│          MODE-SPECIFIC PROCESSING                       │
├─────────────────────────────────────────────────────────┤
│  - none: Use Graphiti defaults                          │
│  - llm/inference: Two-stage LLM analysis                │
│  - ontology: Extract from TTL/OWL                       │
│  - hybrid: LLM + ontology string validation             │
│  - graph-hybrid: LLM + semantic alignment via graph     │
│  - ontology-first: Ontology base + LLM enhancement      │
└─────────────────────────────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────┐
│          PHASE 4: CONSOLIDATION (all modes)             │
├─────────────────────────────────────────────────────────┤
│  LLM reviews schema for redundancies                    │
│  Merges semantically similar types                      │
│  Protects ontology/extension types from removal         │
└─────────────────────────────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────┐
│                    OUTPUT                               │
├─────────────────────────────────────────────────────────┤
│  SchemaDefinition:                                      │
│  - entity_types: List[EntityTypeDefinition]             │
│  - relationship_types: List[RelationshipTypeDefinition] │
│  - Edge type map with ("Entity","Entity") catch-all     │
└─────────────────────────────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────┐
│                 CODE GENERATION                         │
├─────────────────────────────────────────────────────────┤
│  schemas/<graph_name>/schema_v1.py                      │
│  - Pydantic models (with enriched docstrings)           │
│  - ENTITY_TYPES + EDGE_TYPES + EDGE_TYPE_MAP            │
│  - CoerciveBaseModel for scalar/list handling           │
└─────────────────────────────────────────────────────────┘
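The consolidation and output stages can be sketched as two small functions. Both are assumptions rather than Aletheia's implementation. In Aletheia an LLM proposes the merges, whereas consolidate below only applies a given proposal deterministically, refusing to remove protected (ontology or extension) types. build_edge_type_map assumes a Graphiti-style map from (source type, target type) pairs to lists of edge type names, and that the ("Entity", "Entity") catch-all lists every edge type.

```python
def consolidate(types, merges, protected):
    """Apply merge proposals (redundant type -> surviving type) without removing protected types.

    types: set of type names currently in the schema
    merges: dict mapping a redundant type to the type it should be folded into
    protected: names that came from the ontology or its extensions
    """
    kept = set(types)
    applied = {}
    for duplicate, target in merges.items():
        # Skip self-merges, protected types, and proposals that reference missing types
        if duplicate == target or duplicate in protected:
            continue
        if duplicate not in kept or target not in kept:
            continue
        kept.discard(duplicate)
        applied[duplicate] = target
    # Follow chains so each merged type points at a survivor (A -> B -> C becomes A -> C)
    for duplicate in applied:
        target = applied[duplicate]
        while target in applied:
            target = applied[target]
        applied[duplicate] = target
    return kept, applied


def build_edge_type_map(edge_types, signatures):
    """Build an edge type map keyed by (source type, target type), plus a catch-all.

    signatures: dict of (source_type, target_type) -> list of edge type names
    """
    edge_map = {pair: list(names) for pair, names in signatures.items()}
    # Catch-all: any edge type may connect any two entities
    edge_map[("Entity", "Entity")] = sorted(edge_types)
    return edge_map


kept, applied = consolidate(
    {"LOCATED_IN", "BASED_IN", "SANCTION", "DESIGNATED_BY", "OWNS"},
    {"BASED_IN": "LOCATED_IN", "SANCTION": "DESIGNATED_BY"},
    protected={"SANCTION"},  # e.g. SANCTION comes from the ontology
)
print(sorted(kept))  # ['DESIGNATED_BY', 'LOCATED_IN', 'OWNS', 'SANCTION']
print(applied)       # {'BASED_IN': 'LOCATED_IN'}

edge_map = build_edge_type_map(kept, {("Organization", "Organization"): ["SANCTION", "OWNS"]})
print(edge_map[("Entity", "Entity")])  # ['DESIGNATED_BY', 'LOCATED_IN', 'OWNS', 'SANCTION']
```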

Learn More