Architecture¶
Aletheia is a GraphRAG evaluation framework and knowledge graph builder. This document describes its architecture and how components interact.
High-Level Architecture¶
┌─────────────────────────────────────────────────────────────────┐
│ Data Sources │
│ (FTM JSON, MuSiQue, Custom formats) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Use Cases │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Parser │ │ Episode │ │ Ontology │ │
│ │ │──│ Builder │ │ Loader │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Aletheia Core │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌─────────────┐ │
│ │ Config │ │ Graph │ │ Schema │ │ Evaluation │ │
│ │ (DB, LLM) │ │ Builder │ │ Inference │ │ (RAGAS) │ │
│ └────────────┘ └────────────┘ └────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Graphiti (Fork) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Episode │ │ Entity │ │ Search │ │
│ │ Processing │ │ Resolution │ │ API │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
│ ┌──────────────┐ ┌──────────────────────────────────────────┐ │
│ │ Community │ │ MCP Server │ │
│ │ Detection │ │ (15 tools, self-describing connectors) │ │
│ └──────────────┘ └──────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Graph Database │
│ (Neo4j / FalkorDB) │
└─────────────────────────────────────────────────────────────────┘
Project Structure¶
aletheia/
├── aletheia/
│ ├── cli/ # CLI commands
│ │ ├── main.py # Entry point
│ │ ├── build.py # Build commands
│ │ └── evaluate.py # Evaluation commands
│ ├── core/
│ │ ├── config/ # Database + LLM configuration
│ │ ├── episodes/ # Episode builder registry
│ │ ├── evaluation/ # RAGAS integration + grounding verification
│ │ ├── graph/ # Graph builder
│ │ ├── ontology/ # GenericOntologyLoader, ModelingProfile
│ │ ├── parsing/ # Base parser
│ │ ├── schema/ # Schema inference engine (7 modes)
│ │ └── tracking/ # Ingestion progress
│ └── ...
├── use_cases/
│ ├── anticorruption/ # EU financial sanctions (FTM)
│ ├── terrorist_orgs/ # Multi-authority FTO designations (FTM)
│ ├── aviation_safety/ # European aviation incidents
│ ├── safety_recommendations/ # EASA safety recommendations
│ ├── airworthiness_directives/# EASA airworthiness directives
│ ├── operation_tango/ # Multi-dataset investigation (FTM)
│ └── evaluation/ # MuSiQue evaluation benchmark
├── schemas/ # Auto-generated schemas (never edit manually)
├── prompts/ # Dynamic extraction prompts
└── docs/ # Documentation (MkDocs Material)
Generated files
Files in schemas/ and prompts/ are regenerated by schema inference on every run. Never edit them manually — fix the inputs (parser, ontology, inference code) instead.
Component Details¶
CLI Layer¶
The CLI (aletheia/cli/) provides commands for building graphs, running evaluations, and inspecting state:
- `main.py` — Click command group registration
- `build.py` — `build-ontology-graph`, `build-knowledge-graph`, `list-use-cases`, `list-graphs`, `show-graph`
- `evaluate.py` — `evaluate-ragas` with grounding modes and community search
Core Layer¶
Config¶
Dual LLM configuration (reasoning model + fast model), database driver creation, embedding model setup.
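The dual-model split routes expensive reasoning (schema inference, evaluation) to a stronger model and bulk extraction to a cheaper one. A minimal sketch of the idea, assuming illustrative field and model names that are not Aletheia's actual API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualLLMConfig:
    """Illustrative dual-model configuration; field names and defaults are
    hypothetical, not Aletheia's real config schema."""

    reasoning_model: str = "reasoning-large"  # slow, for schema inference / evaluation
    fast_model: str = "fast-small"            # cheap, for bulk extraction

    def model_for(self, task: str) -> str:
        # Route heavy reasoning tasks to the larger model, everything else
        # to the fast one.
        if task in {"schema_inference", "evaluation"}:
            return self.reasoning_model
        return self.fast_model
```

In practice the task-to-model routing would be read from environment or file configuration rather than hard-coded.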
Schema Inference¶
The schema inference engine (aletheia/core/schema/) supports 7 modes and a common Phase 4 consolidation step:
| Mode | Primary Source | LLM Role |
|---|---|---|
| `none` | Graphiti defaults | None |
| `llm` / `inference` | LLM | Full inference |
| `ontology` | Ontology file | None |
| `hybrid` | LLM + ontology validation | Inference + validation |
| `graph-hybrid` | LLM + ontology graph | Inference + semantic alignment |
| `ontology-first` | Ontology + LLM | Enhancement only |
Key components:
- `inference.py` — Main engine: mode dispatch, Phase 4 consolidation, data-driven pruning
- `models.py` — `EntityTypeDefinition`, `RelationshipTypeDefinition`, enriched docstrings, edge type map with `("Entity", "Entity")` catch-all
- `coercion.py` — `CoerciveBaseModel` that fixes LLM scalar/list type mismatches
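The scalar/list coercion can be sketched without pydantic: if a field annotation expects a list but the LLM returned a scalar, wrap it; if it expects a scalar but the LLM returned a one-element list, unwrap it. The function and class names below are illustrative, not `coercion.py`'s actual API:

```python
from typing import Any, get_origin, get_type_hints


def coerce_fields(cls: type, raw: dict[str, Any]) -> dict[str, Any]:
    """Sketch of the scalar/list mismatch fix performed by CoerciveBaseModel
    (hypothetical standalone form). Compares each raw value against the
    class's type annotations and wraps/unwraps as needed."""
    hints = get_type_hints(cls)
    fixed: dict[str, Any] = {}
    for name, value in raw.items():
        expected = hints.get(name)
        if expected is not None and get_origin(expected) is list and not isinstance(value, list):
            value = [value]  # annotation wants a list, LLM sent a scalar
        elif (
            expected is not None
            and get_origin(expected) is not list
            and isinstance(value, list)
            and len(value) == 1
        ):
            value = value[0]  # annotation wants a scalar, LLM sent a 1-element list
        fixed[name] = value
    return fixed


class EntityPayload:
    # Annotation-only class standing in for an extraction model.
    aliases: list[str]
    name: str
```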
Ontology¶
- `GenericOntologyLoader` — Loads TTL/OWL ontologies, classifies classes using transitive ancestry (`Entity`, `Relationship`, `Abstract`), extracts non-reified object properties as relationship types
- `ModelingProfile` — Optional explicit classification hints per ontology
Evaluation¶
- RAGAS metrics — Context Precision, Context Recall, Faithfulness, Answer Similarity
- Grounding verification — Three modes (`strict`, `lenient`, `off`) to detect parametric knowledge leakage
- Community search — Optional hierarchical context from entity clusters built via label propagation
Graph Builder¶
Orchestrates the ingestion pipeline: parse → build episodes → infer schema → call Graphiti `add_episode`. Supports `--build-communities` for community detection and `--resume` to continue an interrupted ingestion run.
Use Case Layer¶
Each use case (use_cases/<name>/) is self-contained:
use_cases/terrorist_orgs/
├── __init__.py # Registration
├── parser.py # Data parser
├── episode_builder.py # Markdown episode builder
├── ontology/ # TTL/OWL files
│ └── followthemoney.ttl
├── data/ # Source data
│ └── entities.ftm.json
├── evaluation_questions.json # RAGAS evaluation questions
└── mcp_config.yaml # MCP server configuration
Graphiti Integration¶
Aletheia uses a maintained fork of Graphiti (david-morales/aletheia-graphiti, branch: aletheia) that includes 16 cherry-picked upstream PRs and 12 custom fixes for entity extraction, node dedup, and edge resolution.
Graphiti handles:
- Episode processing — Text → entities + relationships
- Entity resolution — Deduplication and merging
- Search API — Semantic and graph-based search (BFS, cosine similarity, community)
- Community detection — Label propagation clustering with hierarchical summaries
MCP Server¶
The Graphiti fork includes an MCP server with 15 tools across 6 groups:
| Group | Tools |
|---|---|
| Semantic Discovery | search, explore_node |
| Schema & Ontology | get_schema, search_ontology, explore_ontology |
| Graph Profiling | profile_graph (property coverage, language detection, relationship validation) |
| Cypher Analytics | run_cypher (read-only, 4-stage security pipeline) |
| Community Intelligence | build_communities |
| Data Management | add_memory, get_episodes, get_episode_context, delete_entity_edge, delete_episode, clear_graph, get_status |
Each connector is self-describing: a DomainProfile auto-discovers entity types, edge types, counts, and samples at startup, generating domain-specific tool descriptions and MCP resources. Four domain configs are defined in use_cases/<name>/mcp_config.yaml with a shared base config.
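The auto-discovery idea can be sketched as a small profile object that folds discovered graph statistics into the tool descriptions it serves to MCP clients. The class and field names below are illustrative, not the fork's actual `DomainProfile` API:

```python
from dataclasses import dataclass, field


@dataclass
class DomainProfileSketch:
    """Hypothetical stand-in for the fork's DomainProfile: holds entity-type
    counts discovered at startup and renders them into tool descriptions."""

    name: str
    entity_types: dict[str, int] = field(default_factory=dict)  # type -> count

    def tool_description(self, base: str) -> str:
        # Append discovered domain stats so a generic tool description
        # becomes domain-specific without hand-written text per use case.
        types = ", ".join(f"{t} ({n})" for t, n in sorted(self.entity_types.items()))
        return f"{base} Domain '{self.name}' contains: {types}."
```

At startup the real profile would also sample nodes and edge types; the rendering step is the same shape.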
Data Flow¶
Ingestion¶
1. Parser.parse() → Iterator[Entity]
2. episode_builder(entity) → markdown text
3. SchemaInferenceEngine.extract → entity_types, edge_types (mode-specific + Phase 4)
4. graphiti.add_episode(...) → graph updates (entities, edges, communities)
5. Tracking records progress → resume support
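The five steps above reduce to a single loop. A minimal sketch, where `add_episode` stands in for `graphiti.add_episode` and `tracker` is a set of already-ingested ids backing `--resume` (both simplifications are assumptions):

```python
from collections.abc import Callable, Iterable


def ingest(
    entities: Iterable[dict],
    build_episode: Callable[[dict], str],
    add_episode: Callable[[str], object],
    tracker: set[str],
) -> int:
    """Illustrative ingestion loop: parse output in, graph updates out.
    Returns the number of newly added episodes."""
    added = 0
    for entity in entities:
        eid = entity["id"]
        if eid in tracker:  # resume support: skip already-ingested entities
            continue
        add_episode(build_episode(entity))  # schema inference elided for brevity
        tracker.add(eid)
        added += 1
    return added
```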
Evaluation¶
1. Load questions from JSON
2. For each question:
a. graphiti.search_(query) → context (nodes + edges + communities)
b. LLM generates answer → grounded response with citations
c. Grounding verification → accept/reject based on evidence
d. RAGAS metrics calculation → precision, recall, faithfulness, similarity
3. Aggregate and output results
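The per-question steps can likewise be sketched as one loop, with `search`, `answer`, and `verify` standing in for retrieval, generation, and grounding verification (all three callables are assumptions; metric computation is elided):

```python
from collections.abc import Callable


def evaluate(
    questions: list[str],
    search: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
    verify: Callable[[str, list[str]], bool],
) -> tuple[list[dict], float]:
    """Illustrative evaluation loop: returns per-question records plus the
    fraction of answers that passed grounding verification."""
    results = []
    for q in questions:
        context = search(q)                # retrieval (nodes + edges + communities)
        response = answer(q, context)      # grounded generation
        results.append({
            "question": q,
            "answer": response,
            "grounded": verify(response, context),
        })
    accepted = sum(r["grounded"] for r in results)
    return results, accepted / max(len(results), 1)
```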
Key Interfaces¶
Parser¶
class Parser(Protocol):
    def __init__(self, data_dir: Path): ...

    def parse(self) -> Iterator[Entity]: ...

    @property
    def schema_distribution(self) -> dict[str, int]:
        """Entity type counts in data — drives data-driven pruning."""
        ...
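A concrete implementation of the protocol might look like the following sketch. The JSON-lines format, the `schema` field, and the `JsonLinesParser` name are illustrative assumptions, not part of Aletheia:

```python
import json
from collections import Counter
from collections.abc import Iterator
from pathlib import Path


class JsonLinesParser:
    """Hypothetical Parser implementation: reads one JSON entity per line
    from every *.json file in the data directory, counting schemas as it
    goes so schema_distribution can drive data-driven pruning."""

    def __init__(self, data_dir: Path):
        self.data_dir = data_dir
        self._counts: Counter = Counter()

    def parse(self) -> Iterator[dict]:
        for path in sorted(self.data_dir.glob("*.json")):
            for line in path.read_text().splitlines():
                entity = json.loads(line)
                self._counts[entity["schema"]] += 1
                yield entity

    @property
    def schema_distribution(self) -> dict[str, int]:
        return dict(self._counts)
```

Note that the distribution is only complete after `parse()` has been fully consumed.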
Episode Builder¶
def episode_builder(entity: Entity) -> str:
    """Convert entity to markdown episode."""
    ...

register_episode_builder(
    "my_case",
    episode_builder,
    source_description="Description of data source",
)
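A filled-in builder might render an FTM-style entity as a markdown episode like so. The property layout and `person_episode_builder` name are illustrative assumptions:

```python
def person_episode_builder(entity: dict) -> str:
    """Hypothetical builder turning an FTM-style entity dict into a
    markdown episode: heading from the name, then schema and remaining
    properties as bullet lines."""
    props = entity.get("properties", {})
    lines = [f"# {props.get('name', ['Unknown'])[0]}"]
    lines.append(f"Schema: {entity['schema']}")
    for key, values in sorted(props.items()):
        if key != "name":  # name already used as the heading
            lines.append(f"- {key}: {', '.join(values)}")
    return "\n".join(lines)
```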
Ontology Loader¶
class OntologyLoader(Protocol):
    def __init__(self, ontology_dir: Path): ...

    def load(self) -> Ontology: ...
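The transitive-ancestry classification that `GenericOntologyLoader` performs can be sketched without an RDF library: walk `rdfs:subClassOf` links upward until a known root is reached. The `parents` map and the fallback behavior are assumptions for illustration:

```python
ROOTS = ("Entity", "Relationship", "Abstract")


def classify(cls: str, parents: dict[str, str]) -> str:
    """Illustrative transitive-ancestry classification: follow direct
    superclass links (parents maps class -> superclass) until one of the
    three root categories is reached. Classes with no known root are
    treated as Abstract (i.e. not extracted)."""
    seen = set()
    current = cls
    while current not in ROOTS:
        if current in seen or current not in parents:
            return "Abstract"  # cycle or unknown ancestry: play it safe
        seen.add(current)
        current = parents[current]
    return current
```

The real loader reads these links from TTL/OWL files and additionally honors `ModelingProfile` overrides.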
Learn More¶
- Creating Use Cases — Build your own use case
- Schema Modes — Schema inference modes and decision tree
- MCP Connectors — Domain-aware MCP servers
- Contributing — Development workflow