# Aletheia vs FalkorDB GraphRAG-SDK

This document compares Aletheia's approach to ontology handling and schema inference with FalkorDB's GraphRAG-SDK.

## Overview
Both frameworks build knowledge graphs from unstructured data using LLMs for entity and relationship extraction. They differ in how they handle ontologies and schema discovery.
| Framework | Philosophy |
|---|---|
| GraphRAG-SDK | "Let the LLM figure it out" |
| Aletheia | "LLM proposes, ontology validates" |
## GraphRAG-SDK Approach

### Ontology Autodiscovery

GraphRAG-SDK uses single-pass LLM extraction from raw documents. A fixed system prompt instructs the LLM to output JSON schema directly.

```python
# GraphRAG-SDK: ontology auto-discovered from documents
ontology = Ontology.from_sources(
    sources=[PDF("report.pdf"), URL("https://example.com")],
    model=model,
)
```
### Key Characteristics
| Aspect | Implementation |
|---|---|
| Discovery Method | Single-pass LLM extraction |
| Prompt Strategy | Fixed system prompt outputs JSON directly |
| Entity Definition | Runtime Python objects (label, attributes, description) |
| Relationship Handling | Extracted alongside entities in same pass |
| Validation | Post-hoc LLM correction via `FIX_ONTOLOGY_PROMPT` |
| Merging | Document ontologies merged via `o.merge_with()` |
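The merge step can be illustrated with a small, hypothetical sketch (plain Python, not GraphRAG-SDK code): per-document ontologies are unified by entity label, and attribute sets are combined.

```python
# Illustrative sketch of merge semantics in the spirit of `o.merge_with()`.
# Ontologies are modeled as {label: set_of_attribute_names} dicts.

def merge_ontologies(a: dict, b: dict) -> dict:
    """Merge two ontologies: same-label entities are unified, attributes combined."""
    merged = {label: set(attrs) for label, attrs in a.items()}
    for label, attrs in b.items():
        merged.setdefault(label, set()).update(attrs)
    return merged

doc1 = {"Person": {"name"}, "Company": {"name", "industry"}}
doc2 = {"Person": {"name", "birth_date"}, "Product": {"name"}}

combined = merge_ontologies(doc1, doc2)
# "Person" gains "birth_date"; "Product" is added alongside "Company".
```

Without a validating ontology, every document can introduce new labels this way, which is where the schema-drift risk discussed below comes from.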
### Ontology Sources

GraphRAG-SDK supports three ways to obtain an ontology:

- `from_sources()` - LLM extracts from documents
- `from_kg_graph()` - Extract from an existing FalkorDB graph
- `from_ttl()` - Parse RDF/Turtle files
### Prompt Architecture
The system prompt instructs:
- Capture entities, relationships, and attributes
- Use basic types (e.g., "person" not "mathematician")
- Maintain consistent entity references
- Output JSON inline with no spaces
## Aletheia Approach

### Two-Stage Meta-Prompt Architecture

Aletheia separates domain analysis from schema extraction:

- **Stage 1: Domain Analysis** - the LLM acts as a "knowledge graph architect" to generate a domain-specific extraction prompt
- **Stage 2: Schema Extraction** - the generated prompt is used to extract a structured schema
```python
# Aletheia: schema inference with ontology alignment
engine = SchemaInferenceEngine(
    llm_client=client,
    schema_mode=SchemaMode.GRAPH_HYBRID,
    ontology=ontology,
    parser=parser,
)
schema = await engine.extract_schema(db_name, sample_data_dir)
```
### Schema Modes

Aletheia offers six distinct schema inference modes (plus an `inference` alias for `llm`):

| Mode | Description |
|---|---|
| `none` | Use Graphiti defaults |
| `llm` | Two-stage LLM inference |
| `ontology` | Extract schema from an ontology file |
| `hybrid` | LLM + ontology validation |
| `graph-hybrid` | LLM-first + semantic graph alignment |
| `ontology-first` | Ontology as primary, LLM enhances |
All modes except `none` apply a Phase 4 consolidation step: an LLM reviews the final schema for redundancies and merges semantically similar types. Ontology-derived types are protected from removal.
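The consolidation step can be sketched as follows. This is a hypothetical illustration: the real Phase 4 uses an LLM to judge semantic similarity, while here `difflib.SequenceMatcher` stands in for that judgment.

```python
# Sketch of Phase 4 consolidation: fold near-duplicate type names together,
# but never drop a protected (ontology-derived) type.
from difflib import SequenceMatcher

def consolidate(types: list[str], protected: set[str],
                threshold: float = 0.8) -> list[str]:
    kept: list[str] = []
    for t in types:
        dup = next(
            (k for k in kept
             if SequenceMatcher(None, t.lower(), k.lower()).ratio() >= threshold),
            None,
        )
        if dup is None or t in protected:
            kept.append(t)  # protected types always survive consolidation
    return kept

types = ["Organization", "Organisation", "Person", "Company"]
result = consolidate(types, protected={"Organization"})
# "Organisation" folds into "Organization"; "Person" and "Company" are kept.
```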
### Graph-Hybrid Mode (Recommended)

The `graph-hybrid` mode provides the best balance of flexibility and rigor:

```text
┌─────────────────────────────────────────────────────────┐
│ Phase 1: LLM-First Inference (Unbiased)                 │
│ • Extract schema without ontology guidance              │
│ • Avoids anchoring bias from seeing ontology first      │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│ Phase 2: Semantic Alignment via Knowledge Graph         │
│ • Vector search matches LLM concepts → ontology         │
│ • Confidence scores for each alignment                  │
│ • Unaligned concepts flagged for review                 │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│ Phase 3: Property Enrichment (Data-Driven)              │
│ • Add ontology properties that appear in actual data    │
│ • Filter properties not present in source data          │
│ • Prevents schema bloat                                 │
└─────────────────────────────────────────────────────────┘
```
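Phase 2's alignment can be illustrated with a toy sketch. All names and vectors below are made up; the actual pipeline runs vector search inside FalkorDB over real embeddings, but the thresholding and review-flagging logic has the same shape.

```python
# Toy sketch of semantic alignment: match each LLM-inferred concept to its
# nearest ontology class by cosine similarity, keep a confidence score, and
# flag low-confidence concepts for human review.
from math import sqrt

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Toy embeddings; in practice these come from an embedding model.
ontology_classes = {"Person": [1.0, 0.0, 0.1], "Organization": [0.0, 1.0, 0.1]}
llm_concepts = {"Employee": [0.9, 0.1, 0.1], "Widget": [0.1, 0.1, 0.9]}

THRESHOLD = 0.7  # analogous to alignment_confidence
aligned, unaligned = {}, []
for concept, vec in llm_concepts.items():
    best, score = max(
        ((cls, cosine(vec, cvec)) for cls, cvec in ontology_classes.items()),
        key=lambda pair: pair[1],
    )
    if score >= THRESHOLD:
        aligned[concept] = (best, score)  # keep the per-concept confidence
    else:
        unaligned.append(concept)  # flagged for review
```

The point of the blind Phase 1 followed by this alignment pass is that the LLM's naming never gets anchored by the ontology, yet every concept it proposes is still reconciled against the ontology afterwards.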
### Ontology as Knowledge Graph

Aletheia loads OWL/TTL ontologies into FalkorDB as searchable knowledge graphs:

```shell
# Load the ontology into a graph (once)
aletheia build-ontology-graph \
    --use-case my_case \
    --knowledge-graph my_ontology

# Build the data graph with alignment
aletheia build-knowledge-graph \
    --use-case my_case \
    --knowledge-graph my_graph \
    --schema-mode graph-hybrid \
    --ontology-graph my_ontology
```
## Detailed Comparison

### Ontology Role
| Aspect | GraphRAG-SDK | Aletheia |
|---|---|---|
| Purpose | IS the extraction schema | Guides/validates extraction schema |
| Relationship | Ontology = Schema | Ontology → Schema (separate artifacts) |
| Authority | LLM-derived | Domain expert-defined |
### Formal Ontology Support
| Aspect | GraphRAG-SDK | Aletheia |
|---|---|---|
| OWL support | Limited (TTL import only) | First-class (loaded to graph) |
| Ontology storage | Runtime Python objects | FalkorDB knowledge graph |
| Searchability | No | Vector + BFS search |
| Reusability | Per-session | Persistent, shared across runs |
### LLM Bias Mitigation
| Aspect | GraphRAG-SDK | Aletheia |
|---|---|---|
| Anchoring bias | Not mitigated - schema comes from a single prompt | Graph-hybrid: LLM infers blind, then aligns |
| Hallucination control | Post-hoc fix prompts | Ontology alignment validation |
| Schema drift | Risk across documents | Ontology provides anchor |
### Alignment Mechanism
| Aspect | GraphRAG-SDK | Aletheia |
|---|---|---|
| Method | String matching via merge | Semantic search (cosine + BFS) |
| Confidence tracking | No | Yes - per-concept scores |
| Alignment reports | No | Yes - JSON reports with rationale |
| Failed alignments | Silent merge | Explicit warnings, review flags |
### Property Handling
| Aspect | GraphRAG-SDK | Aletheia |
|---|---|---|
| Property source | LLM extraction only | LLM + ontology enrichment |
| Filtering | None | Data-driven (only properties in data) |
| Schema bloat | Risk | Prevented by filtering |
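The data-driven filtering can be sketched in a few lines (hypothetical helper name; the actual enrichment logic lives inside Aletheia's Phase 3):

```python
# Sketch of data-driven property enrichment: ontology properties are added
# only when they actually occur in the sampled records, which is what keeps
# the schema from bloating with unused fields.

def enrich_properties(llm_props: set, ontology_props: set,
                      sample_records: list) -> set:
    observed = {key for record in sample_records for key in record}
    # Start from what the LLM extracted, add only ontology properties seen in data.
    return set(llm_props) | {p for p in ontology_props if p in observed}

sample = [{"name": "Acme", "industry": "Mining"}, {"name": "Widget Co"}]
props = enrich_properties(
    llm_props={"name"},
    ontology_props={"industry", "lei_code", "founded"},  # candidates from the ontology
    sample_records=sample,
)
# "industry" is added (seen in data); "lei_code" and "founded" are filtered out.
```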
### Schema Persistence
| Aspect | GraphRAG-SDK | Aletheia |
|---|---|---|
| Storage | Runtime Python objects | Generated Python modules |
| Location | Memory only | `schemas/<graph_name>.py` |
| Versioning | No | Git-trackable files |
| Reuse | Rebuild each session | Load existing schema |
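As a rough illustration of why generated modules are git-trackable, a `schemas/<graph_name>.py` file might contain something like the following. This is a hypothetical layout; the actual format emitted by Aletheia's generator may differ.

```python
# Hypothetical shape of a generated schema module: the inferred schema lives
# in a plain Python file that can be diffed, reviewed, and versioned in git,
# then loaded on later runs instead of being re-inferred.

ENTITY_TYPES = {
    "Person": {"name": "str", "birth_date": "date"},
    "Organization": {"name": "str", "industry": "str"},
}

EDGE_TYPES = {
    "WORKS_FOR": ("Person", "Organization"),
}
```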
## Code Examples

### GraphRAG-SDK: Basic Usage

```python
from graphrag_sdk import Ontology, KnowledgeGraph
from graphrag_sdk.source import URL

# Auto-discover an ontology from web pages
ontology = Ontology.from_sources(
    sources=[URL("https://example.com/article")],
    model=model,
)

# Create the knowledge graph
kg = KnowledgeGraph(
    name="my_graph",
    ontology=ontology,
    model=model,
)

# Populate it from sources
kg.process_sources([URL("https://example.com/data")])
```
### Aletheia: Graph-Hybrid Mode

```python
from pathlib import Path

from aletheia.core.schema import SchemaInferenceEngine, SchemaMode

# Create the engine with an ontology
engine = SchemaInferenceEngine(
    llm_client=client,
    schema_mode=SchemaMode.GRAPH_HYBRID,
    ontology=ontology,  # loaded from OWL/TTL
    parser=parser,
    alignment_confidence=0.7,
)

# Extract the schema with alignment
schema = await engine.extract_schema(
    db_name="my_graph",
    sample_data_dir=Path("data/"),
)

# Access the alignment report
report = engine._last_alignment_report
print(f"Aligned: {report.successful_count}")
print(f"Failed: {report.failed_count}")
```
## When to Use Which
| Scenario | Recommended |
|---|---|
| Quick prototyping from documents | GraphRAG-SDK |
| Domain with formal ontology (FTM, FIBO, etc.) | Aletheia |
| Need audit trail / alignment reports | Aletheia |
| Multi-source data requiring entity resolution | Aletheia |
| Simple RAG chatbot | GraphRAG-SDK |
| Regulatory compliance / explainability | Aletheia |
| Ad-hoc document analysis | GraphRAG-SDK |
| Production knowledge graph with governance | Aletheia |
## Summary
GraphRAG-SDK optimizes for simplicity and speed. It works well for general-purpose document analysis where schema consistency is less critical.
Aletheia optimizes for rigor and auditability. It excels in domains with established ontologies where schema fidelity and alignment transparency matter.
The key insight is that the two frameworks occupy different points on the flexibility-rigor spectrum: GraphRAG-SDK prioritizes developer experience, while Aletheia prioritizes domain-modeling correctness.