Graphiti MCP Server vs Aletheia Evaluation¶
This FAQ explains the differences between using the Graphiti MCP server directly (e.g., with Claude Desktop) versus using Aletheia's evaluation framework, and when to use each.
The Question¶
"I discovered the Graphiti MCP server. How does it compare to Aletheia? What's the difference between Claude Desktop answering questions via MCP versus Aletheia's evaluation pipeline?"
Quick Answer¶
- Graphiti MCP Server: A retrieval and analytics interface that lets an LLM access your knowledge graph. The LLM synthesizes answers using its own judgment.
- Aletheia: A controlled evaluation framework that forces grounded answers with programmatic verification and quality metrics.
Example: Complex Aviation Query¶
To illustrate the differences, consider this question:
"What incidents in Barcelona or Madrid involved Boeing aircraft and resulted in emergency declarations (MAYDAY or PAN-PAN)?"
Graphiti MCP Server (Aletheia Fork)¶
Available Tools¶
The MCP server exposes 15 tools across 6 groups:
| Group | Tools | Purpose |
|---|---|---|
| Semantic Discovery | search, explore_node | Semantic search with BFS traversal, entity-centric exploration |
| Schema & Ontology | get_schema, search_ontology, explore_ontology | Graph structure discovery, ontology concept search |
| Graph Profiling | profile_graph | Property coverage, language detection, relationship validation |
| Cypher Analytics | run_cypher | Read-only Cypher queries with 4-stage security pipeline |
| Community Intelligence | build_communities | Label propagation clustering |
| Data Management | add_memory, get_episodes, get_episode_context, delete_entity_edge, delete_episode, clear_graph, get_status | CRUD operations and health checks |
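For programmatic access outside Claude Desktop, each tool is invoked through the standard MCP tools/call JSON-RPC method. A minimal sketch of the request envelope (the helper name and argument values are illustrative; only the envelope shape comes from the MCP specification):

```python
import json

def tools_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

# Example: invoke the server's semantic search tool.
req = tools_call("search", {"query": "Boeing incidents Barcelona"})
```

The same envelope shape works for any of the 15 tools; only the name and arguments change.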
Self-Describing Connectors¶
Each MCP server instance introspects its own graph at startup via DomainProfile, generating domain-specific:
- Tool descriptions — the LLM sees aviation-specific guidance when connected to the aviation graph
- Entity catalogs — types, counts, and sample entities
- MCP resources — graphiti://domain_summary, graphiti://entity_catalog, graphiti://relationship_types
Four domain connectors are configured in Aletheia: Aviation Safety, Safety Recommendations, Airworthiness Directives, and Terrorist Organizations.
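The real introspection queries the live graph at startup; as a rough sketch of the idea, a hypothetical DomainProfile could build its entity catalog from label counts like this (the field names and rendering are assumptions, not the actual API):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class DomainProfile:
    """Hypothetical shape; the real DomainProfile introspects the live graph."""
    entity_counts: Counter

    def entity_catalog(self) -> str:
        # One line per entity type, most frequent first.
        return "\n".join(f"{label}: {n}"
                         for label, n in self.entity_counts.most_common())

# Counts stand in for what a startup-time graph query would return.
profile = DomainProfile(Counter({"Occurrence": 1200, "Aircraft": 340, "Airport": 88}))
```

The rendered catalog is what the LLM would see as the graphiti://entity_catalog resource.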
How Claude Desktop Answers¶
User: "What incidents in Barcelona or Madrid involved Boeing aircraft
and resulted in emergency declarations (MAYDAY or PAN-PAN)?"
Claude Desktop (via MCP):
1. Calls search(query="Barcelona Madrid Boeing MAYDAY PAN-PAN")
2. Optionally calls run_cypher() for structured analytics
3. Optionally calls explore_node() to traverse from a known entity
4. Uses parametric knowledge + returned results to synthesize answer
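The flow above can be sketched with a stubbed tool-call helper (the stub and its canned results are hypothetical; a real client would round-trip each call to the MCP server):

```python
def call_tool(name, **kwargs):
    """Stub standing in for an MCP client round-trip; returns canned results."""
    canned = {
        "search": [{"fact": "Occurrence-2024-0157-EU occurred at Barcelona Airport"}],
        "run_cypher": [{"n": 1}],
    }
    return canned.get(name, [])

# 1. Semantic search across the graph.
hits = call_tool("search", query="Barcelona Madrid Boeing MAYDAY PAN-PAN")
# 2. Optional structured analytics via read-only Cypher.
counts = call_tool("run_cypher", query="MATCH (o:Occurrence) RETURN count(o) AS n")
# 3-4. The LLM synthesizes from the returned facts (plus its parametric knowledge).
answer = "; ".join(h["fact"] for h in hits)
```

Note that step 4 is where the limitations below arise: nothing constrains the synthesis to the returned facts.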
MCP Limitations for Evaluation¶
| Limitation | Impact |
|---|---|
| No grounding verification | LLM may hallucinate details not in results |
| No evaluation metrics | No way to measure retrieval quality |
| No evidence citations | Can't trace answer to source |
| Parametric knowledge leakage | LLM's training data may pollute answer |
Aletheia Evaluation Framework¶
Pipeline Architecture¶
Question
│
▼ [1. Query Analysis]
Detect patterns: "Barcelona OR Madrid", "Boeing", "MAYDAY"
→ Generate SearchFilters (node_labels, edge_types)
│
▼ [2. Graphiti Search]
Multi-method search with:
- Cosine similarity + BFS traversal
- RRF reranking
- Optional community search
- Custom search filters
│
▼ [3. Evidence Transformation]
transform_search_results(results) → EvidenceUnit[]
Each unit has: eid, subject_name, predicate, object_name, fact
│
▼ [4. Context Building]
Atomic evidence units with citation IDs
Optional COMMUNITY_CONTEXT section
│
▼ [5. LLM Answer Generation]
Structured JSON with mandatory citations
│
▼ [6. Grounding Verification]
Programmatic check that answer entities appear in cited evidence
│
▼ [7. RAGAS Evaluation]
Quality metrics: precision, recall, faithfulness, similarity
│
▼ [8. Grounding Report]
Pass rates, rejection reasons, diagnostics
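Stage 2's RRF reranking is standard reciprocal rank fusion over the rank lists produced by each search method; a minimal sketch (k=60 is the conventional constant, and Graphiti's internal implementation may differ):

```python
from collections import defaultdict

def rrf(rank_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rank_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A result ranked highly by both cosine similarity and BFS traversal rises to the top.
fused = rrf([["e1", "e2", "e3"], ["e1", "e3", "e4"]])
```

Because the formula uses only ranks, not raw scores, it fuses methods whose score scales are incomparable (embedding distance vs. graph traversal depth).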
Example Evidence Output¶
EVIDENCE:
[E1]
subject: Occurrence-2024-0157-EU
predicate: OCCURRED_AT
object: Barcelona Airport
fact: The aviation incident Occurrence-2024-0157-EU occurred at Barcelona Airport
source: aviation_safety
[E2]
subject: Occurrence-2024-0157-EU
predicate: INVOLVED_AIRCRAFT
object: Boeing 737-800
fact: Boeing 737-800 registration EC-LUT was the aircraft involved
source: aviation_safety
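The evidence format above maps naturally onto a small dataclass; a sketch assuming the EvidenceUnit fields listed in stage 3 (the render method is illustrative):

```python
from dataclasses import dataclass

@dataclass
class EvidenceUnit:
    # Field names follow stage 3 of the pipeline; rendering is an assumption.
    eid: str
    subject_name: str
    predicate: str
    object_name: str
    fact: str
    source: str

    def render(self) -> str:
        return (f"[{self.eid}]\n"
                f"  subject: {self.subject_name}\n"
                f"  predicate: {self.predicate}\n"
                f"  object: {self.object_name}\n"
                f"  fact: {self.fact}\n"
                f"  source: {self.source}")

e1 = EvidenceUnit("E1", "Occurrence-2024-0157-EU", "OCCURRED_AT",
                  "Barcelona Airport",
                  "The aviation incident Occurrence-2024-0157-EU occurred at Barcelona Airport",
                  "aviation_safety")
```

Keeping each unit atomic, with a stable eid, is what makes the citation checks in the next section mechanical.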
Grounding Verification¶
Aletheia programmatically verifies that:
- All cited evidence IDs are valid
- Entities mentioned in the answer appear in the cited evidence
- INSUFFICIENT_CONTEXT is returned when evidence is lacking
Three grounding modes control strictness:
| Mode | Behavior |
|---|---|
| strict (default) | Rejects ungrounded answers |
| lenient | Warns but includes all answers |
| off | No verification |
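A minimal sketch of how the three modes could gate an answer (the function shape and substring-based entity matching are assumptions; the real verifier is more thorough):

```python
def verify_grounding(answer_entities, cited_ids, evidence, mode="strict"):
    """Check that cited IDs exist and answer entities appear in the cited facts."""
    if mode == "off":
        return True, []
    valid_ids = {e["eid"] for e in evidence}
    problems = [f"unknown citation {cid}" for cid in cited_ids
                if cid not in valid_ids]
    cited_text = " ".join(e["fact"] for e in evidence if e["eid"] in cited_ids)
    problems += [f"ungrounded entity {ent}" for ent in answer_entities
                 if ent not in cited_text]
    if mode == "lenient":
        return True, problems        # warn, but keep the answer
    return not problems, problems    # strict: reject on any problem

evidence = [{"eid": "E1", "fact": "Boeing 737-800 landed at Barcelona Airport"}]
ok, issues = verify_grounding(["Boeing 737-800"], ["E1"], evidence)
```

The key property is that the check is programmatic: an answer citing evidence that does not mention its entities fails in strict mode no matter how fluent the prose is.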
Side-by-Side Comparison¶
| Dimension | Graphiti MCP Server | Aletheia Evaluation |
|---|---|---|
| Tools | 15 (search, Cypher, schema, profiling, ontology, CRUD) | Programmatic pipeline |
| Search | search (configurable methods + reranker) | Multi-method with custom filters |
| Analytics | run_cypher (read-only Cypher queries) | RAGAS metrics |
| Schema Awareness | get_schema discovers graph structure | Schema inference engine (7 modes) |
| Context Format | Raw search results | Atomic evidence units with IDs |
| Answer Generation | LLM's discretion | Structured JSON with mandatory citations |
| Grounding | None | Programmatic verification gate |
| Parametric Knowledge | Can leak into answer | Blocked by prompt + verification |
| Evaluation Metrics | None | RAGAS (precision, recall, faithfulness, similarity) |
| Self-Description | DomainProfile generates per-domain guidance | N/A |
When to Use Which¶
Use MCP Server¶
- Interactive exploration of knowledge graphs
- Conversational memory for agentic workflows
- Structured analytics via Cypher queries
- Quick answers where auditability is less critical
- Building AI agents that need persistent memory
- Schema discovery and ontology browsing
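One of the use cases above is structured analytics via run_cypher, whose 4-stage security pipeline is not detailed in this FAQ. A minimal sketch of the core idea behind such a guard, assuming write-clause blocking is one of the stages (the clause list is illustrative):

```python
import re

# Assumed clause list; a real pipeline would layer further checks on top.
WRITE_CLAUSES = re.compile(r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b",
                           re.IGNORECASE)

def is_read_only(cypher: str) -> bool:
    """Single-stage sketch: reject any query containing a write clause."""
    return WRITE_CLAUSES.search(cypher) is None

safe = is_read_only("MATCH (o:Occurrence)-[:OCCURRED_AT]->(a:Airport) RETURN a.name")
unsafe = is_read_only("MATCH (n) DETACH DELETE n")
```

Regex screening alone is not sufficient in production; it is shown here only to convey why the exposed tool can be treated as read-only.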
Use Aletheia Evaluation¶
- Measuring retrieval system quality
- Ensuring answers are grounded in evidence
- Preventing parametric knowledge leakage
- Generating quality metrics for system improvements
- Production systems requiring auditability
- Comparing search configurations (baseline comparisons)
Integration¶
The MCP server and Aletheia are complementary:
1. Build your knowledge graph with Aletheia (parsers, episode builders, schema inference)
2. Evaluate retrieval quality with Aletheia's RAGAS pipeline
3. Serve the graph to LLMs via MCP connectors for interactive use
4. Analyze the graph with Cypher queries via the MCP server
The same graph database powers both — Aletheia builds and evaluates it, MCP serves it.
Summary¶
| If you need... | Use... |
|---|---|
| Quick interactive Q&A | MCP Server |
| Structured graph analytics | MCP Server (run_cypher) |
| Measured retrieval quality | Aletheia |
| Grounded, auditable answers | Aletheia |
| Agentic memory | MCP Server |
| Production evaluation | Aletheia |