Building Knowledge Graphs

This guide covers the complete workflow for building knowledge graphs in Aletheia.

Workflow Overview

graph LR
    A[Source Data] --> B[Parser]
    B --> C[Episode Builder]
    C --> D[Graphiti]
    D --> E[Knowledge Graph]

Step 1: Choose a Schema Mode

Before building, decide on a schema mode:

Mode             Best For
none             Quick prototyping, unknown data
llm / inference  Data exploration, no ontology
ontology         Strict formal domains
hybrid           LLM + ontology string validation
graph-hybrid     Production FTM data (recommended)
ontology-first   Complete ontologies, can't lose concepts
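
The table above can be mirrored in a small lookup so that a script rejects a mistyped mode before a long build starts. This is only a sketch: `SCHEMA_MODES` and `check_schema_mode` are hypothetical names, not part of Aletheia. The mode strings are copied from the table, and treating `llm` and `inference` as two spellings of the same mode is an assumption based on how the table lists them.

```python
# Hypothetical helper; mode names and descriptions are copied from the table above.
SCHEMA_MODES = {
    "none": "Quick prototyping, unknown data",
    "llm": "Data exploration, no ontology",
    "inference": "Data exploration, no ontology",  # assumed alias of "llm"
    "ontology": "Strict formal domains",
    "hybrid": "LLM + ontology string validation",
    "graph-hybrid": "Production FTM data (recommended)",
    "ontology-first": "Complete ontologies, can't lose concepts",
}


def check_schema_mode(mode: str) -> str:
    """Return the normalized mode name, or raise ValueError if it is unknown."""
    normalized = mode.strip().lower()
    if normalized not in SCHEMA_MODES:
        known = ", ".join(sorted(SCHEMA_MODES))
        raise ValueError(f"Unknown schema mode {mode!r}; expected one of: {known}")
    return normalized
```

A call such as `check_schema_mode("graph_hybrid")` (underscore instead of hyphen) raises immediately instead of failing partway through ingestion.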

Step 2: Prepare Ontology (for graph-hybrid)

If using graph-hybrid mode, first load the ontology:

aletheia build-ontology-graph \
  --use-case terrorist_orgs \
  --knowledge-graph terrorist_orgs_ontology

This creates nodes for entity types and relationship types that guide extraction.

Step 3: Build the Graph

Basic Build

aletheia build-knowledge-graph \
  --use-case terrorist_orgs \
  --knowledge-graph terrorist_orgs \
  --schema-mode graph-hybrid \
  --ontology-graph terrorist_orgs_ontology
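
When Step 2 and Step 3 run from a script rather than by hand, the two commands can be assembled as argument lists and executed in order. The sketch below uses only the flags shown in this guide. The function names are hypothetical, and it assumes the `aletheia` CLI is on `PATH` and exits with a non-zero status on failure (which is what `check=True` relies on).

```python
import shlex
import subprocess
from typing import List, Optional


def ontology_command(use_case: str, ontology_graph: str) -> List[str]:
    """Argument list for Step 2 (loading the ontology graph)."""
    return [
        "aletheia", "build-ontology-graph",
        "--use-case", use_case,
        "--knowledge-graph", ontology_graph,
    ]


def build_command(use_case: str, knowledge_graph: str, schema_mode: str,
                  ontology_graph: Optional[str] = None) -> List[str]:
    """Argument list for Step 3 (building the knowledge graph)."""
    cmd = [
        "aletheia", "build-knowledge-graph",
        "--use-case", use_case,
        "--knowledge-graph", knowledge_graph,
        "--schema-mode", schema_mode,
    ]
    if ontology_graph is not None:
        cmd += ["--ontology-graph", ontology_graph]
    return cmd


def run_pipeline(use_case: str, knowledge_graph: str, ontology_graph: str) -> None:
    """Run both steps in order; check=True stops at the first failing step."""
    steps = [
        ontology_command(use_case, ontology_graph),
        build_command(use_case, knowledge_graph, "graph-hybrid", ontology_graph),
    ]
    for cmd in steps:
        print("Running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("terrorist_orgs", "terrorist_orgs", "terrorist_orgs_ontology")` would issue the same two commands shown in Steps 2 and 3.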

With Community Building

Communities cluster related entities for hierarchical queries:

aletheia build-knowledge-graph \
  --use-case terrorist_orgs \
  --knowledge-graph terrorist_orgs \
  --schema-mode graph-hybrid \
  --ontology-graph terrorist_orgs_ontology \
  --build-communities

Reset and Rebuild

To start fresh:

aletheia build-knowledge-graph \
  --use-case terrorist_orgs \
  --knowledge-graph terrorist_orgs \
  --schema-mode graph-hybrid \
  --reset

Warning: Data Loss

--reset deletes all existing data in the graph.
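
Because `--reset` is destructive, a wrapper script may want to require explicit confirmation before adding the flag. The following is a minimal sketch of one such guard. `with_reset` is a hypothetical helper, not something Aletheia provides, and the "type the graph name" convention is just one possible design.

```python
from typing import List


def with_reset(cmd: List[str], graph_name: str, typed_confirmation: str) -> List[str]:
    """Return a copy of cmd with --reset appended, but only if the caller
    typed the graph name exactly; otherwise raise PermissionError."""
    if typed_confirmation != graph_name:
        raise PermissionError(
            f"Refusing to reset {graph_name!r}: confirmation did not match"
        )
    return [*cmd, "--reset"]
```

In an interactive script the third argument could come from `input("Type the graph name to confirm: ")`, so an accidental Enter keypress never triggers a reset.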

Resume Interrupted Build

If a build is interrupted:

aletheia build-knowledge-graph \
  --use-case terrorist_orgs \
  --knowledge-graph terrorist_orgs \
  --schema-mode graph-hybrid \
  --resume

Aletheia tracks progress and resumes from the last successful episode.
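
For unattended builds, the resume step can be automated with a small retry loop. The sketch below runs the build once and, after each failure, waits and re-runs it with `--resume`. It assumes the CLI returns exit code 0 on success and non-zero when interrupted, which this guide does not state. `build_with_resume` and its parameters are hypothetical.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def build_with_resume(
    base_cmd: Sequence[str],
    max_attempts: int = 3,
    wait_seconds: float = 60.0,
    run: Callable[[List[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run the build once, then retry with --resume after each failure.

    Returns the number of attempts used; raises RuntimeError if all fail.
    """
    for attempt in range(1, max_attempts + 1):
        # The first attempt runs the command as given; later ones add --resume.
        cmd = list(base_cmd) if attempt == 1 else [*base_cmd, "--resume"]
        if run(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            sleep(wait_seconds)
    raise RuntimeError(f"Build still failing after {max_attempts} attempts")
```

The `run` and `sleep` parameters exist so the loop can be exercised without invoking the real CLI. The base command should not contain `--reset`, since each retry is meant to continue the earlier run rather than start over.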

Monitoring Progress

During ingestion, Aletheia displays:

Building knowledge graph...
  [=====>              ] 25% (250/1000 episodes)
  Elapsed: 5m 23s | Remaining: ~16m
  Errors: 3
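
The remaining-time figure in the sample output is consistent with a simple linear extrapolation from the episodes completed so far. The sketch below parses the episode counter from a progress line and reproduces that arithmetic. How Aletheia actually computes its estimate is not documented here, so the formula is an assumption, and the function names are hypothetical.

```python
import re
from typing import Tuple

PROGRESS_RE = re.compile(r"\((\d+)/(\d+) episodes\)")


def parse_progress(line: str) -> Tuple[int, int]:
    """Extract (done, total) from a line like '... 25% (250/1000 episodes)'."""
    match = PROGRESS_RE.search(line)
    if match is None:
        raise ValueError(f"No episode counter found in: {line!r}")
    return int(match.group(1)), int(match.group(2))


def estimate_remaining(elapsed_seconds: float, done: int, total: int) -> float:
    """Linear extrapolation: remaining = elapsed * (total - done) / done."""
    if done <= 0:
        raise ValueError("Need at least one finished episode to extrapolate")
    return elapsed_seconds * (total - done) / done


done, total = parse_progress("  [=====>              ] 25% (250/1000 episodes)")
remaining = estimate_remaining(5 * 60 + 23, done, total)  # elapsed: 5m 23s
print(f"{done}/{total} done, about {remaining / 60:.0f} minutes remaining")
# prints: 250/1000 done, about 16 minutes remaining
```

With 250 of 1000 episodes done in 323 seconds, the remaining 750 episodes extrapolate to 969 seconds, which matches the "~16m" shown above.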

Handling Errors

Some episodes may fail to ingest. View errors with:

aletheia show-errors --knowledge-graph terrorist_orgs

Common error causes:

Error                 Cause                     Solution
Invalid entity IDs    Cross-episode references  Expected - Graphiti resolves via entity resolution
Token limit exceeded  Episode too large         Split into smaller episodes
Rate limit            API throttling            Wait and resume
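
For the "Token limit exceeded" case, the episode text has to be split before it is ingested again. The following sketch splits on paragraph boundaries and falls back to fixed-size word windows for a single oversized paragraph. It uses a word count as a stand-in for tokens, which is only an approximation because the real limit depends on the model's tokenizer. `split_episode` and `max_words` are hypothetical names.

```python
from typing import List


def split_episode(text: str, max_words: int = 500) -> List[str]:
    """Split text into chunks of at most max_words words, breaking on
    paragraph boundaries (blank lines) where possible."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    chunks: List[str] = []
    current: List[str] = []  # paragraphs collected for the chunk in progress
    current_words = 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        # A single oversized paragraph is cut into fixed-size word windows.
        if len(words) > max_words:
            if current:
                chunks.append("\n\n".join(current))
                current, current_words = [], 0
            for start in range(0, len(words), max_words):
                chunks.append(" ".join(words[start:start + max_words]))
            continue
        # Close the current chunk if this paragraph would overflow it.
        if current_words + len(words) > max_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(paragraph.strip())
        current_words += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each returned chunk can then be emitted as its own episode by the parser.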

Verifying the Graph

After building, verify the graph:

# Show statistics
aletheia show-graph --knowledge-graph terrorist_orgs

# Query directly (FalkorDB)
redis-cli GRAPH.QUERY terrorist_orgs "MATCH (n) RETURN labels(n), count(*)"

# Query directly (Neo4j)
cypher-shell -d terrorist_orgs "MATCH (n) RETURN labels(n), count(*)"
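
Both queries return one row per distinct label combination, so a node carrying several labels appears under its full label list. The sketch below sums those rows into a count per individual label. It assumes the rows have already been fetched with whatever client is in use and converted to `(labels, count)` pairs. `count_per_label` is a hypothetical helper, and the label names in the usage note are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def count_per_label(rows: Iterable[Tuple[Sequence[str], int]]) -> Counter:
    """Sum node counts per individual label.

    A node with labels ['Entity', 'Person'] is counted under both labels.
    """
    totals: Counter = Counter()
    for labels, count in rows:
        for label in labels:
            totals[label] += count
    return totals
```

For example, `count_per_label([(["Entity", "Person"], 3), (["Entity", "Organization"], 2)])` yields 5 for `Entity`, 3 for `Person`, and 2 for `Organization`.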

Best Practices

1. Start Small

Test with a subset first:

# In your parser, limit records while testing
from itertools import islice
from typing import Iterator

def parse(self) -> Iterator[Entity]:
    # Yield only the first 100 records; remove the limit for a full build
    yield from islice(self._parse_all(), 100)

2. Use graph-hybrid for FTM

The graph-hybrid mode is optimized for FTM data:

  • Ontology provides structure
  • LLM handles edge cases
  • Semantic alignment improves consistency

3. Build Communities

Communities improve retrieval for:

  • "What organizations are related to X?"
  • "What are the main entity clusters?"
  • High-level summarization queries

4. Monitor Token Usage

LLM extraction consumes tokens for every episode, so track usage during a build:

  • Use --verbose to see per-episode token counts
  • Consider smaller episodes for large documents
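
Before a build, it can help to flag the episodes most likely to be expensive. The sketch below estimates tokens from character counts using the common rule of thumb of roughly four characters per token for English text. That ratio is an assumption that varies by tokenizer and language, and both function names are hypothetical.

```python
from typing import Dict, List


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate derived from the character count."""
    return int(len(text) / chars_per_token) + 1


def oversized_episodes(episodes: Dict[str, str], budget: int) -> List[str]:
    """Return names of episodes whose estimated tokens exceed budget,
    largest first."""
    sizes = {name: estimate_tokens(body) for name, body in episodes.items()}
    over = [name for name, size in sizes.items() if size > budget]
    return sorted(over, key=lambda name: sizes[name], reverse=True)
```

The per-episode counts reported by `--verbose` are the authoritative numbers; an estimate like this is only useful for spotting outliers ahead of time.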
