Building Knowledge Graphs

This guide covers the complete workflow for building knowledge graphs in Aletheia.

Workflow Overview

graph LR
    A[Source Data] --> B[Parser]
    B --> C[Episode Builder]
    C --> D[Graphiti]
    D --> E[Knowledge Graph]

Step 1: Choose a Schema Mode

Before building, decide on a schema mode:

Mode	Best For
`none`	Quick prototyping, unknown data
`llm` / `inference`	Data exploration, no ontology
`ontology`	Strict formal domains
`hybrid`	LLM + ontology string validation
`graph-hybrid`	Production FTM data (recommended)
`ontology-first`	Complete ontologies, can't lose concepts

Step 2: Prepare Ontology (for graph-hybrid)

If using graph-hybrid mode, first load the ontology:

aletheia build-ontology-graph \
  --use-case terrorist_orgs \
  --knowledge-graph terrorist_orgs_ontology

This creates nodes for entity types and relationship types that guide extraction.

Step 3: Build the Graph

Basic Build

aletheia build-knowledge-graph \
  --use-case terrorist_orgs \
  --knowledge-graph terrorist_orgs \
  --schema-mode graph-hybrid \
  --ontology-graph terrorist_orgs_ontology

With Community Building

Communities cluster related entities for hierarchical queries:

aletheia build-knowledge-graph \
  --use-case terrorist_orgs \
  --knowledge-graph terrorist_orgs \
  --schema-mode graph-hybrid \
  --ontology-graph terrorist_orgs_ontology \
  --build-communities

Reset and Rebuild

To start fresh:

aletheia build-knowledge-graph \
  --use-case terrorist_orgs \
  --knowledge-graph terrorist_orgs \
  --schema-mode graph-hybrid \
  --reset

Data Loss

--reset deletes all existing data in the graph.

Resume Interrupted Build

If a build is interrupted:

aletheia build-knowledge-graph \
  --use-case terrorist_orgs \
  --knowledge-graph terrorist_orgs \
  --schema-mode graph-hybrid \
  --resume

Aletheia tracks progress and resumes from the last successful episode.
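
How Aletheia stores its progress is not described in this guide. As a rough illustration of the general idea, the sketch below records the index of the next episode in a JSON file and skips everything before it on the following run. The checkpoint file, the `ingest` callback, and `run_build` itself are hypothetical names, not part of Aletheia.

```python
import json
from pathlib import Path
from typing import Callable, Sequence


def run_build(
    episodes: Sequence[str],
    ingest: Callable[[str], None],
    checkpoint: Path,
) -> int:
    """Ingest episodes in order, resuming after the last recorded success.

    Hypothetical sketch: returns the number of episodes ingested in this run.
    """
    # Index of the next episode to process; 0 when no checkpoint exists.
    start = 0
    if checkpoint.exists():
        start = json.loads(checkpoint.read_text())["next_index"]

    done = 0
    for index in range(start, len(episodes)):
        ingest(episodes[index])  # may raise; the checkpoint then stays put
        # Record progress only after the episode succeeded.
        checkpoint.write_text(json.dumps({"next_index": index + 1}))
        done += 1
    return done
```

Because the checkpoint is written only after an episode succeeds, an interrupted run restarts at the episode that failed rather than after it.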

Monitoring Progress

During ingestion, Aletheia displays:

Building knowledge graph...
  [=====>              ] 25% (250/1000 episodes)
  Elapsed: 5m 23s | Remaining: ~16m
  Errors: 3
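
The remaining-time figure in this sample is consistent with a linear extrapolation from the average time per episode so far. Whether Aletheia computes it exactly this way is an assumption; `estimate_remaining` is a hypothetical helper used only to check the arithmetic.

```python
def estimate_remaining(elapsed_seconds: float, done: int, total: int) -> float:
    """Linear estimate of the seconds left, from the average pace so far."""
    if done <= 0:
        raise ValueError("need at least one completed episode to estimate")
    seconds_per_episode = elapsed_seconds / done
    return seconds_per_episode * (total - done)


# Values from the sample output: 5m 23s elapsed, 250 of 1000 episodes done.
remaining = estimate_remaining(5 * 60 + 23, 250, 1000)
print(f"~{round(remaining / 60)}m")  # prints "~16m"
```

At 323 seconds for 250 episodes, the pace is about 1.29 seconds per episode, so the 750 remaining episodes need roughly 969 seconds.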

Handling Errors

Some episodes may fail to ingest. View errors with:

aletheia show-errors --knowledge-graph terrorist_orgs

Common error causes:

Error	Cause	Solution
Invalid entity IDs	Cross-episode references	Expected; Graphiti reconciles these through entity resolution
Token limit exceeded	Episode too large	Split into smaller episodes
Rate limit	API throttling	Wait and resume
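
For the rate-limit case, "wait and resume" can be automated with an exponential backoff loop around the `--resume` command from the Resume Interrupted Build section. This is a sketch under assumptions: that the CLI exits with a non-zero status when a build stops early (this guide does not confirm it), and that the delays chosen here suit your API provider. `resume_with_backoff` and `run_resume` are hypothetical helpers.

```python
import subprocess
import time
from typing import Callable

RESUME_COMMAND = [
    "aletheia", "build-knowledge-graph",
    "--use-case", "terrorist_orgs",
    "--knowledge-graph", "terrorist_orgs",
    "--schema-mode", "graph-hybrid",
    "--resume",
]


def run_resume() -> int:
    """Run the resume command and return its exit status."""
    return subprocess.run(RESUME_COMMAND).returncode


def resume_with_backoff(
    run: Callable[[], int] = run_resume,
    max_attempts: int = 5,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry until the command exits 0, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        if run() == 0:  # assumption: exit status 0 means the build finished
            return True
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 30s, 60s, 120s, ...
    return False
```

Passing `run` and `sleep` as parameters keeps the retry logic testable without invoking the CLI or actually waiting.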

Verifying the Graph

After building, verify the graph:

# Show statistics
aletheia show-graph --knowledge-graph terrorist_orgs

# Query directly (FalkorDB)
redis-cli GRAPH.QUERY terrorist_orgs "MATCH (n) RETURN labels(n), count(*)"

# Query directly (Neo4j)
cypher-shell -d terrorist_orgs "MATCH (n) RETURN labels(n), count(*)"

Best Practices

1. Start Small

Test with a subset first:

from itertools import islice
from typing import Iterator

# In your parser, limit records for testing
def parse(self) -> Iterator[Entity]:
    # Yield only the first 100 records, then stop reading
    yield from islice(self._parse_all(), 100)

2. Use graph-hybrid for FTM

The graph-hybrid mode is optimized for FTM data:

Ontology provides structure
LLM handles edge cases
Semantic alignment improves consistency

3. Build Communities

Communities improve retrieval for:

"What organizations are related to X?"
"What are the main entity clusters?"
High-level summarization queries

4. Monitor Token Usage

LLM extraction consumes tokens on every episode, so keep an eye on usage:

Use --verbose to see per-episode token counts
Consider smaller episodes for large documents
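
One way to produce smaller episodes is to split long text on paragraph boundaries under an approximate token budget. The sketch below assumes roughly four characters per token, a common rule of thumb for English text that varies by model and tokenizer. `split_into_episodes` and the default budget are illustrative, not Aletheia settings.

```python
from typing import List

CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def split_into_episodes(text: str, max_tokens: int = 2000) -> List[str]:
    """Group paragraphs into chunks that stay under an approximate token budget.

    A single paragraph larger than the budget is kept whole as its own chunk.
    """
    budget = max_tokens * CHARS_PER_TOKEN
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and size + len(paragraph) > budget:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(paragraph)
        size += len(paragraph)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries keeps related sentences together, which tends to matter more for extraction quality than hitting the budget exactly.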

Learn More

Schema Modes - Detailed mode comparison
CLI Reference - All command options
Troubleshooting - Common issues