
Question Format

This guide covers how to design and format evaluation questions for Aletheia.

JSON Structure

{
  "questions": [
    {
      "id": "q1",
      "question": "What alias is used for al-Shabaab?",
      "answer": "al-Hijra",
      "answer_aliases": ["Al-Hijra", "al Hijra"]
    }
  ]
}

Fields

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| id | Yes | string | Unique identifier |
| question | Yes | string | The question to ask |
| answer | Yes | string | Expected gold answer |
| answer_aliases | No | array | Alternative correct forms |
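
The field rules above can be checked before running an evaluation. The following is a minimal sketch, not part of Aletheia: `validate_questions` is a hypothetical helper, and it assumes that `answer_aliases` holds strings and that duplicate `id` values should be rejected.

```python
import json

REQUIRED_FIELDS = ("id", "question", "answer")


def validate_questions(doc):
    """Return a list of human-readable problems found in a questions document."""
    questions = doc.get("questions")
    if not isinstance(questions, list):
        return ["top-level 'questions' must be an array"]

    problems = []
    seen_ids = set()
    for index, item in enumerate(questions):
        label = f"questions[{index}]"
        if not isinstance(item, dict):
            problems.append(f"{label}: must be an object")
            continue

        # Required fields must be present and be non-empty strings.
        for field in REQUIRED_FIELDS:
            value = item.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{label}: '{field}' must be a non-empty string")

        # answer_aliases is optional; assumed here to be an array of strings.
        aliases = item.get("answer_aliases", [])
        if not isinstance(aliases, list) or not all(isinstance(a, str) for a in aliases):
            problems.append(f"{label}: 'answer_aliases' must be an array of strings")

        # Assumption: ids must be unique across the file.
        qid = item.get("id")
        if isinstance(qid, str):
            if qid in seen_ids:
                problems.append(f"{label}: duplicate id '{qid}'")
            seen_ids.add(qid)
    return problems


sample = json.loads("""
{
  "questions": [
    {"id": "q1", "question": "What alias is used for al-Shabaab?",
     "answer": "al-Hijra", "answer_aliases": ["Al-Hijra", "al Hijra"]},
    {"id": "q1", "question": "When was Hamas first designated as an FTO?"}
  ]
}
""")
for problem in validate_questions(sample):
    print(problem)
```

In this sample the second entry is reported twice: once for the missing `answer` and once for reusing the id `q1`.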

Question Types

1. Alias Lookup

Test retrieval of entity properties:

{
  "id": "alias-1",
  "question": "What is another name for the PKK?",
  "answer": "Kurdistan Workers' Party",
  "answer_aliases": ["Kurdistan Workers Party", "Partiya Karkerên Kurdistan"]
}

Tests: Node property retrieval, entity resolution

2. Entity Existence

Test simple entity lookup:

{
  "id": "exist-1",
  "question": "Is Hamas designated as a terrorist organization?",
  "answer": "Yes, Hamas is designated as a Foreign Terrorist Organization by the US State Department."
}

Tests: Entity retrieval, basic search

3. Relationship Queries

Test edge traversal:

{
  "id": "rel-1",
  "question": "What authority sanctioned Hezbollah?",
  "answer": "The US State Department designated Hezbollah as an FTO.",
  "answer_aliases": ["US State Department", "State Department"]
}

Tests: Edge retrieval, relationship understanding

4. Geographic Filtering

Test attribute-based filtering:

{
  "id": "geo-1",
  "question": "What Irish organizations are proscribed by the UK?",
  "answer": "The Real IRA and Continuity IRA are proscribed Irish organizations."
}

Tests: Multi-attribute filtering, set retrieval

5. Multi-hop Reasoning

Test graph traversal:

{
  "id": "multi-1",
  "question": "What is the parent organization of AQIM, which was formerly known as GSPC?",
  "answer": "al-Qaeda is the parent organization of AQIM.",
  "answer_aliases": ["al-Qaeda", "Al-Qaeda"]
}

Tests: Multi-hop traversal, relationship chaining

6. Temporal Queries

Test temporal attributes:

{
  "id": "temp-1",
  "question": "When was Hamas first designated as an FTO?",
  "answer": "Hamas was designated as an FTO in 1997."
}

Tests: Temporal property retrieval

Design Guidelines

Do

  1. Be specific - Questions should have definite answers
  2. Use domain terminology - Match the language in your data
  3. Include answer aliases - Account for spelling variations
  4. Test different capabilities - Mix question types

Don't

  1. Ask about common knowledge - LLMs may answer from training data
  2. Use ambiguous questions - Answers should be verifiable
  3. Require external knowledge - Answers should be in your graph
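
To illustrate why answer aliases matter, here is a rough sketch of a lenient string matcher. It is an assumption for illustration only and does not describe how Aletheia scores answers: `normalize` and `matches_gold` are hypothetical helpers that treat a prediction as correct when the gold answer or any alias appears in it after case, accent, and punctuation differences are removed.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and collapse punctuation/whitespace to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def matches_gold(prediction, question):
    """True if the gold answer or any alias occurs as whole words in the prediction."""
    accepted = [question["answer"], *question.get("answer_aliases", [])]
    haystack = f" {normalize(prediction)} "
    for form in accepted:
        needle = normalize(form)
        if needle and f" {needle} " in haystack:
            return True
    return False


question = {
    "id": "alias-1",
    "question": "What is another name for the PKK?",
    "answer": "Kurdistan Workers' Party",
    "answer_aliases": ["Kurdistan Workers Party", "Partiya Karkerên Kurdistan"],
}
print(matches_gold("It is also called the Kurdistan Workers Party.", question))
print(matches_gold("The PKK is also known as Hamas.", question))
```

Normalization already absorbs hyphen and apostrophe variants such as "al-Hijra" versus "al Hijra", so explicit aliases are most valuable for forms that differ in wording, such as a name in another language.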

Question Types to Avoid for GraphRAG

Some question types are better suited for SQL than GraphRAG:

| Question Type | Example | Why It's Hard |
| --- | --- | --- |
| Counting | "How many orgs are designated?" | Requires aggregation |
| Ranking | "What's the oldest designation?" | Requires sorting |
| Comparison | "Which org has more aliases?" | Requires computation |

Either exclude such questions from the set or use "INSUFFICIENT_CONTEXT" as their expected answer.
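
One way to catch such questions while drafting a set is a keyword screen. The sketch below is a heuristic assumption, not an Aletheia feature: the patterns in `AGGREGATION_PATTERNS` and the helper `flag_aggregation` are hypothetical, cover only phrasing like the examples above, and will miss other wordings.

```python
import re

# Hypothetical keyword patterns, one per question type in the table above.
AGGREGATION_PATTERNS = {
    "counting": re.compile(r"\bhow many\b|\bnumber of\b", re.IGNORECASE),
    "ranking": re.compile(
        r"\b(oldest|newest|earliest|latest|largest|smallest)\b", re.IGNORECASE
    ),
    "comparison": re.compile(
        r"\b(more|fewer|less) \w+ than\b|\bwhich \w+ has (more|fewer)\b",
        re.IGNORECASE,
    ),
}


def flag_aggregation(question_text):
    """Return the names of the hard-to-answer types the question appears to match."""
    return [
        name
        for name, pattern in AGGREGATION_PATTERNS.items()
        if pattern.search(question_text)
    ]


for text in (
    "How many orgs are designated?",
    "What's the oldest designation?",
    "Which org has more aliases?",
    "What is another name for the PKK?",
):
    print(text, "->", flag_aggregation(text) or "ok")
```

A flagged question is only a candidate for review; a human should still decide whether to drop it or keep it with "INSUFFICIENT_CONTEXT" as the expected answer.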

terrorist_orgs Dataset

The terrorist_orgs use case includes 70 curated questions:

| Category | Count | Description |
| --- | --- | --- |
| Alias lookup | 7 | Entity property retrieval |
| Entity existence | 7 | Simple entity lookup |
| Geographic filter | 10 | Location-based queries |
| Cross-jurisdiction | 2 | Multi-authority queries |
| Set operations | 2 | AND/OR queries |
| Temporal | 3 | Date-based queries |
| Multi-hop | 4 | 2-3 hop traversal |
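
A per-category tally like the one above can be recomputed from a questions file to confirm the mix of question types. This is a minimal sketch under one assumption: that ids carry a category prefix such as `alias-1` or `geo-1`, as in the examples on this page. Whether the shipped dataset follows that naming is not confirmed here, and `count_by_prefix` is a hypothetical helper.

```python
from collections import Counter


def count_by_prefix(questions):
    """Tally questions by the part of the id before the last hyphen."""
    return Counter(q["id"].rsplit("-", 1)[0] for q in questions)


# Illustrative entries only; real ids come from the questions file.
questions = [
    {"id": "alias-1"},
    {"id": "alias-2"},
    {"id": "geo-1"},
    {"id": "multi-1"},
]
for prefix, count in sorted(count_by_prefix(questions).items()):
    print(f"{prefix}: {count}")
```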

Usage:

aletheia evaluate-ragas \
  --knowledge-graph terrorist_orgs \
  --questions use_cases/terrorist_orgs/evaluation_questions.json
