Creating Use Cases

This guide walks through creating a new use case in Aletheia.

Overview

A use case is a self-contained data domain. To create one:

  1. Create directory structure
  2. Implement a parser
  3. Implement an episode builder
  4. Register components
  5. (Optional) Add an ontology

Directory Structure

mkdir -p use_cases/my_case/{data,ontology}

use_cases/my_case/
├── __init__.py          # Registration
├── parser.py            # Data parser
├── episode_builder.py   # Markdown builder
├── ontology/            # (Optional) Ontology files
│   └── schema.ttl
└── data/                # Source data
    └── records.json

Step 1: Implement the Parser

The parser transforms source data into entities:

# use_cases/my_case/parser.py
from pathlib import Path
from dataclasses import dataclass
from typing import Iterator
import json


@dataclass
class MyEntity:
    """Entity from my data source."""
    id: str
    name: str
    type: str
    properties: dict


class MyParser:
    """Parser for my data format."""

    def __init__(self, data_dir: Path):
        self.data_dir = data_dir

    def parse(self) -> Iterator[MyEntity]:
        """Parse data files and yield entities."""
        data_file = self.data_dir / "records.json"

        with open(data_file, encoding="utf-8") as f:
            records = json.load(f)

        for record in records:
            yield MyEntity(
                id=record["id"],
                name=record["name"],
                type=record["type"],
                properties=record.get("properties", {}),
            )
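To try the parser, it helps to see the shape of `records.json` it expects: a JSON array of objects with `id`, `name`, `type`, and an optional `properties` object. The sketch below writes a small sample file to a temporary directory and reads it back with the same logic as `MyParser.parse`, inlined as a function so the snippet runs on its own. The sample records and the `parse_records` name are illustrative assumptions, not part of Aletheia.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator


@dataclass
class MyEntity:
    id: str
    name: str
    type: str
    properties: dict = field(default_factory=dict)


def parse_records(data_dir: Path) -> Iterator[MyEntity]:
    """Same logic as MyParser.parse, inlined for a standalone demo."""
    with open(data_dir / "records.json", encoding="utf-8") as f:
        records = json.load(f)
    for record in records:
        yield MyEntity(
            id=record["id"],
            name=record["name"],
            type=record["type"],
            properties=record.get("properties", {}),
        )


# Hypothetical sample data; the second record omits "properties".
SAMPLE = [
    {"id": "p1", "name": "Ada Example", "type": "Person",
     "properties": {"country": "UK"}},
    {"id": "o1", "name": "Example Corp", "type": "Organization"},
]

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "records.json").write_text(json.dumps(SAMPLE), encoding="utf-8")
    entities = list(parse_records(data_dir))

for entity in entities:
    print(entity.id, entity.type, entity.properties)
```

A record missing `id`, `name`, or `type` raises `KeyError` here, because those keys are read with `record[...]` rather than `record.get(...)`.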

Step 2: Implement the Episode Builder

The episode builder converts entities to markdown:

# use_cases/my_case/episode_builder.py
from .parser import MyEntity


def build_episode(entity: MyEntity) -> str:
    """Convert entity to markdown episode."""

    # Build properties section
    props_lines = []
    for key, value in entity.properties.items():
        props_lines.append(f"- **{key.title()}**: {value}")
    props_section = "\n".join(props_lines) if props_lines else "- No properties"

    return f"""
# Entity: {entity.name}

## Metadata
- **ID**: {entity.id}
- **Type**: {entity.type}

## Properties
{props_section}

## Context
This is a {entity.type} entity named {entity.name}.
""".strip()
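To check what the builder produces, you can call it on a hand-made entity. The snippet below repeats the builder with a stand-in `MyEntity` so that it runs without the package; the sample values are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MyEntity:
    """Stand-in for the entity defined in parser.py."""
    id: str
    name: str
    type: str
    properties: dict = field(default_factory=dict)


def build_episode(entity: MyEntity) -> str:
    """Same builder as above, repeated so the snippet runs on its own."""
    props_lines = [
        f"- **{key.title()}**: {value}"
        for key, value in entity.properties.items()
    ]
    props_section = "\n".join(props_lines) if props_lines else "- No properties"
    return f"""
# Entity: {entity.name}

## Metadata
- **ID**: {entity.id}
- **Type**: {entity.type}

## Properties
{props_section}

## Context
This is a {entity.type} entity named {entity.name}.
""".strip()


episode = build_episode(
    MyEntity(id="p1", name="Ada Example", type="Person",
             properties={"country": "UK"})
)
print(episode)
```

Note that `str.title()` capitalizes after every non-letter, so a key such as `birth_date` is rendered as `Birth_Date`. If your property keys use underscores, `key.replace("_", " ").title()` may read better.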

Step 3: Register Components

Register the parser and episode builder:

# use_cases/my_case/__init__.py
from .parser import MyParser, MyEntity
from aletheia.core.ontology import GenericOntologyLoader
from aletheia.core.episodes import register_episode_builder
from .episode_builder import build_episode

# Export parser class
Parser = MyParser

# Export ontology loader
Ontology = GenericOntologyLoader

# Register episode builder
register_episode_builder(
    "my_case",
    build_episode,
    source_description="My custom data source",
)
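This guide does not show how `register_episode_builder` works internally. As a rough mental model only, the following sketches a generic name-to-callable registry with hypothetical names (`_BUILDERS`, `register_builder`, `get_builder`). It is not Aletheia's implementation; it only illustrates why the call sits at module level in `__init__.py`, where it runs once when the package is imported.

```python
from typing import Callable, NamedTuple


class BuilderEntry(NamedTuple):
    build: Callable[[object], str]
    source_description: str


# Hypothetical module-level registry; NOT Aletheia's actual implementation.
_BUILDERS: dict[str, BuilderEntry] = {}


def register_builder(use_case: str, build: Callable[[object], str],
                     source_description: str = "") -> None:
    """Store a builder under its use-case name, rejecting duplicates."""
    if use_case in _BUILDERS:
        raise ValueError(f"builder already registered for {use_case!r}")
    _BUILDERS[use_case] = BuilderEntry(build, source_description)


def get_builder(use_case: str) -> BuilderEntry:
    """Look up the builder registered for a use case."""
    return _BUILDERS[use_case]


# In a real package this call would run as a side effect of the import.
register_builder("my_case", lambda entity: f"# Entity: {entity}",
                 source_description="My custom data source")

entry = get_builder("my_case")
print(entry.source_description)
print(entry.build("Ada Example"))
```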

Step 4: (Optional) Add Ontology

For graph-hybrid mode, add an ontology:

# use_cases/my_case/ontology/schema.ttl
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <http://example.org/mycase#> .

# Classes
:Person a owl:Class ;
    rdfs:label "Person" .

:Organization a owl:Class ;
    rdfs:label "Organization" .

# Properties
:name a owl:DatatypeProperty ;
    rdfs:domain :Person ;
    rdfs:range xsd:string .

# Relationships
:worksFor a owl:ObjectProperty ;
    rdfs:domain :Person ;
    rdfs:range :Organization .

Step 5: Test the Use Case

List Use Cases

aletheia list-use-cases
# Should show: my_case

Build Graph

aletheia build-knowledge-graph \
  --use-case my_case \
  --knowledge-graph my_case_graph \
  --schema-mode none

Verify

aletheia show-graph --knowledge-graph my_case_graph

Reusing the FTM Parser

For FTM data, reuse the existing parser:

# use_cases/my_ftm_case/__init__.py
from use_cases.anticorruption.parser import FTMParser, FTMEntity
from aletheia.core.ontology import GenericOntologyLoader
from aletheia.core.episodes import register_episode_builder
from use_cases.anticorruption.episode_builder import build_ftm_episode_content

# Reuse FTM parser
Parser = FTMParser
Ontology = GenericOntologyLoader

# Reuse FTM episode builder
register_episode_builder(
    "my_ftm_case",
    build_ftm_episode_content,
    source_description="OpenSanctions FTM data",
)

Advanced: Custom Entity Resolution

To guide entity resolution, mark the entity's name and known aliases explicitly in the episode text:

def build_episode(entity: MyEntity) -> str:
    """Episode with explicit entity markers for resolution."""

    # Add explicit entity markers
    entities = [f"[[{entity.name}]]"]
    for alias in entity.properties.get("aliases", []):
        entities.append(f"[[{alias}]]")

    return f"""
# Entity: {entity.name}

## Known As
{', '.join(entities)}

## Properties
...
"""
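If alias lists in the source data contain duplicates or case variants of the primary name, the marker list will repeat itself. A minimal sketch of a deduplicating helper follows. `known_as_line` is a hypothetical name, and this guide does not say how the downstream resolver treats repeated markers, so treat the deduplication as optional tidying.

```python
def known_as_line(name: str, aliases: list[str]) -> str:
    """Join the name and its aliases as [[...]] markers, skipping repeats."""
    seen: set[str] = set()
    markers = []
    for label in [name, *aliases]:
        key = label.strip().casefold()  # compare case-insensitively
        if key and key not in seen:
            seen.add(key)
            markers.append(f"[[{label.strip()}]]")
    return ", ".join(markers)


print(known_as_line("Ada Example", ["A. Example", "ada example", "Ada E."]))
# [[Ada Example]], [[A. Example]], [[Ada E.]]
```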

Advanced: Custom Relationship Extraction

Add relationship hints for edge extraction:

def build_episode(entity: MyEntity) -> str:
    """Episode with relationship hints."""

    relationships = []
    for rel in entity.properties.get("relationships", []):
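        relationships.append(
            f"- {entity.name} {rel['type']} {rel['target']}"
        )
    # Join outside the f-string: backslashes inside f-string expressions
    # are a syntax error before Python 3.12.
    rel_section = "\n".join(relationships) if relationships else "No relationships"

    return f"""
# Entity: {entity.name}

## Relationships
{rel_section}
""".strip()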
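The builder above expects each item in `properties["relationships"]` to be a mapping with `type` and `target` keys, and raises `KeyError` if either is missing. The sketch below shows that assumed data shape and a more tolerant variant that skips incomplete entries. The helper name `relationship_lines` and the sample data are hypothetical.

```python
def relationship_lines(name: str, relationships: list[dict]) -> str:
    """Render one bullet per relationship, skipping incomplete entries."""
    lines = [
        f"- {name} {rel['type']} {rel['target']}"
        for rel in relationships
        if rel.get("type") and rel.get("target")
    ]
    return "\n".join(lines) if lines else "No relationships"


sample = [
    {"type": "works for", "target": "Example Corp"},
    {"type": "owns"},  # missing target, so this entry is skipped
]
print(relationship_lines("Ada Example", sample))
# - Ada Example works for Example Corp
```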

Step 6: (Optional) Add MCP Config

For MCP server integration, add a config file referencing the shared base:

# use_cases/my_case/mcp_config.yaml
base: ../../mcp-base-config.yaml

graphiti:
  group_id: my_case
  ontology_graph: my_case_ontology

The base: path is resolved relative to the overlay file's directory. See MCP Connectors for details.
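The resolution rule can be illustrated with `pathlib`. In this sketch `resolve_base` is a hypothetical helper and `/repo` a made-up checkout location; it is not Aletheia's config loader and only shows what "relative to the overlay file's directory" means.

```python
from pathlib import Path


def resolve_base(overlay_path: Path, base: str) -> Path:
    """Resolve `base` against the directory containing the overlay file."""
    return (overlay_path.parent / base).resolve()


overlay = Path("/repo/use_cases/my_case/mcp_config.yaml")
base_path = resolve_base(overlay, "../../mcp-base-config.yaml")
print(base_path)
```

With the directory layout used in this guide, `../../` climbs from `use_cases/my_case/` to the directory that contains `use_cases/`, so the base file is expected there regardless of the working directory from which the server is started.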

Step 7: (Optional) Implement schema_distribution

If using ontology-first or graph-hybrid modes with data-driven pruning, implement schema_distribution on your parser:

@property
def schema_distribution(self) -> dict[str, int]:
    """Return entity type counts from the data."""
    # Placeholder counts; compute these from your parsed records.
    return {"Person": 42, "Organization": 15, "Sanction": 30}

The returned counts drive data-driven pruning: entity types that do not appear in the distribution are removed from the schema.
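In practice the counts would be derived from the parsed data rather than hard-coded. A minimal sketch using `collections.Counter` follows. The `Entity` stand-in and the `prune_schema` function are hypothetical: `prune_schema` only illustrates the pruning idea described here and is not Aletheia's pruning code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entity:
    """Stand-in entity with just the field the count needs."""
    type: str


def schema_distribution(entities) -> dict[str, int]:
    """Count how many entities of each type appear in the data."""
    return dict(Counter(entity.type for entity in entities))


def prune_schema(schema_types: list[str], distribution: dict[str, int]) -> list[str]:
    """Hypothetical pruning step: keep only types that occur in the data."""
    return [t for t in schema_types if distribution.get(t, 0) > 0]


entities = [Entity("Person"), Entity("Person"), Entity("Organization")]
distribution = schema_distribution(entities)
print(distribution)
print(prune_schema(["Person", "Organization", "Sanction"], distribution))
```

Here `Sanction` is dropped because no entity of that type occurs in the sample data.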

Step 8: (Optional) Add Evaluation Data

For counterfactual testing (parametric knowledge detection), add an evaluation/ directory:

use_cases/my_case/evaluation/
├── substitutions.json             # Entity substitution maps
├── counterfactual_testset.json    # Generated test set
└── generate_counterfactual_testset.py  # Generator script

The substitutions.json file defines domain-specific entity swaps used by the counterfactual mutation framework in aletheia.core.evaluation.counterfactual. See use_cases/terrorist_orgs/evaluation/ for a working example.

Checklist

  • [ ] Directory structure created
  • [ ] Parser implemented and returns entities
  • [ ] Episode builder returns valid markdown
  • [ ] Components registered in __init__.py
  • [ ] (Optional) Ontology TTL files added
  • [ ] (Optional) MCP config with base reference
  • [ ] (Optional) schema_distribution property on parser
  • [ ] (Optional) Counterfactual substitution data in evaluation/
  • [ ] Use case appears in list-use-cases
  • [ ] Graph builds successfully
  • [ ] Entities extracted correctly
