Question Format

This guide covers how to design and format evaluation questions for Aletheia.

JSON Structure

{
  "questions": [
    {
      "id": "q1",
      "question": "What alias is used for al-Shabaab?",
      "answer": "al-Hijra",
      "answer_aliases": ["Al-Hijra", "al Hijra"]
    }
  ]
}

Fields

Field	Required	Type	Description
`id`	Yes	string	Unique identifier
`question`	Yes	string	The question to ask
`answer`	Yes	string	Expected gold answer
`answer_aliases`	No	array of strings	Alternative accepted forms of the answer (spellings, abbreviations, translations)
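Before running an evaluation, it can help to check a questions file against the rules in the table above. The following is a minimal standalone sketch, not Aletheia's own loader: it checks that `questions` is an array, that `id`, `question`, and `answer` are strings, that `answer_aliases` (if present) is an array of strings, and that every `id` is unique. The function name `validate_questions` is hypothetical.

```python
import json
from typing import Any

# Required fields and their expected Python types, per the Fields table.
REQUIRED_FIELDS = {"id": str, "question": str, "answer": str}


def validate_questions(data: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed questions file (empty if valid)."""
    questions = data.get("questions")
    if not isinstance(questions, list):
        return ["top-level 'questions' must be an array"]

    problems: list[str] = []
    seen_ids: set[str] = set()
    for index, q in enumerate(questions):
        if not isinstance(q, dict):
            problems.append(f"questions[{index}] must be an object")
            continue
        # Required fields must be present and of the right type.
        for name, expected in REQUIRED_FIELDS.items():
            if not isinstance(q.get(name), expected):
                problems.append(f"questions[{index}].{name} must be a {expected.__name__}")
        # 'answer_aliases' is optional, but if present it must be a list of strings.
        aliases = q.get("answer_aliases", [])
        if not isinstance(aliases, list) or not all(isinstance(a, str) for a in aliases):
            problems.append(f"questions[{index}].answer_aliases must be an array of strings")
        # Ids must be unique across the whole file.
        qid = q.get("id")
        if isinstance(qid, str):
            if qid in seen_ids:
                problems.append(f"duplicate id {qid!r}")
            seen_ids.add(qid)
    return problems


sample = json.loads(
    '{"questions": [{"id": "q1", "question": "What alias is used for al-Shabaab?",'
    ' "answer": "al-Hijra", "answer_aliases": ["Al-Hijra", "al Hijra"]}]}'
)
print(validate_questions(sample))
```

An empty list means the file passed every check; otherwise each entry names the offending question by its index.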

Question Types

1. Alias Lookup

Test retrieval of entity properties:

{
  "id": "alias-1",
  "question": "What is another name for the PKK?",
  "answer": "Kurdistan Workers' Party",
  "answer_aliases": ["Kurdistan Workers Party", "Partiya Karkerên Kurdistan"]
}

Tests: Node property retrieval, entity resolution
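Aliases matter because a correct response may spell the answer differently from the gold `answer`. The sketch below is a hypothetical lenient matcher, not how Aletheia or RAGAS actually scores answers: it normalizes case, accents, and punctuation, then accepts a response if it contains the gold answer or any alias as whole words. The names `normalize` and `matches_gold` are illustrative.

```python
import re
import unicodedata
from collections.abc import Sequence


def normalize(text: str) -> str:
    """Lowercase, strip accents, and turn punctuation into single spaces."""
    # Decompose accented characters (e.g. "ê" -> "e" + combining mark) and drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat hyphens, apostrophes, and other punctuation as word separators.
    cleaned = re.sub(r"[^\w\s]", " ", without_marks.lower())
    return " ".join(cleaned.split())


def matches_gold(prediction: str, answer: str, aliases: Sequence[str] = ()) -> bool:
    """True if the prediction contains the answer or any alias as whole words."""
    padded = f" {normalize(prediction)} "
    forms = [normalize(form) for form in (answer, *aliases)]
    # Padding with spaces prevents matches inside longer words.
    return any(f" {form} " in padded for form in forms if form)


print(matches_gold(
    "Its Kurdish name is Partiya Karkeren Kurdistan.",
    "Kurdistan Workers' Party",
    ["Kurdistan Workers Party", "Partiya Karkerên Kurdistan"],
))
```

Without the alias list, the response above would not match, even though it names the right organization.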

2. Entity Existence

Test simple entity lookup:

{
  "id": "exist-1",
  "question": "Is Hamas designated as a terrorist organization?",
  "answer": "Yes, Hamas is designated as a Foreign Terrorist Organization by the US State Department."
}

Tests: Entity retrieval, basic search

3. Relationship Queries

Test edge traversal:

{
  "id": "rel-1",
  "question": "What authority sanctioned Hezbollah?",
  "answer": "The US State Department designated Hezbollah as an FTO.",
  "answer_aliases": ["US State Department", "State Department"]
}

Tests: Edge retrieval, relationship understanding

4. Geographic Filtering

Test attribute-based filtering:

{
  "id": "geo-1",
  "question": "What Irish organizations are proscribed by the UK?",
  "answer": "The Real IRA and Continuity IRA are proscribed Irish organizations."
}

Tests: Multi-attribute filtering, set retrieval
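A geographic filter question only has a correct answer if the retriever applies every constraint at once. The sketch below shows that logic over hypothetical in-memory nodes; the `OrgNode` class, its `nationality` and `proscribed_by` attributes, and the toy values are illustrative assumptions, not Aletheia's graph schema or real designation data.

```python
from dataclasses import dataclass, field


@dataclass
class OrgNode:
    """Hypothetical organization node with the attributes a filter question needs."""
    name: str
    nationality: str
    proscribed_by: set[str] = field(default_factory=set)


def filter_orgs(nodes: list[OrgNode], nationality: str, authority: str) -> list[str]:
    """Names of nodes matching BOTH the nationality and the proscribing authority."""
    return sorted(
        n.name for n in nodes
        if n.nationality == nationality and authority in n.proscribed_by
    )


# Toy attribute values for illustration only.
TOY_NODES = [
    OrgNode("Real IRA", "Irish", {"UK"}),
    OrgNode("Continuity IRA", "Irish", {"UK"}),
    OrgNode("PKK", "Kurdish", {"UK"}),
]

print(filter_orgs(TOY_NODES, "Irish", "UK"))
```

Returning a sorted set of names reflects that the gold answer is a set: a response that lists only one of the matching organizations is incomplete.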

5. Multi-hop Reasoning

Test graph traversal:

{
  "id": "multi-1",
  "question": "What is the parent organization of AQIM, which was formerly known as GSPC?",
  "answer": "al-Qaeda is the parent organization of AQIM.",
  "answer_aliases": ["al-Qaeda", "Al-Qaeda"]
}

Tests: Multi-hop traversal, relationship chaining
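The AQIM question chains two edges: GSPC was renamed AQIM, and AQIM's parent organization is al-Qaeda. The sketch below shows how a multi-hop answer comes from composing single-edge lookups. The edge list and the relation names `renamed_to` and `parent_organization` are hypothetical, not Aletheia's actual graph schema.

```python
from collections import defaultdict

# Hypothetical edge list: (source, relation, target).
EDGES = [
    ("GSPC", "renamed_to", "AQIM"),
    ("AQIM", "parent_organization", "al-Qaeda"),
]


def build_index(edges):
    """Map (source, relation) -> list of targets for constant-time hop lookups."""
    index = defaultdict(list)
    for source, relation, target in edges:
        index[(source, relation)].append(target)
    return index


def follow_path(index, start, relations):
    """Follow a chain of relations from start; return every entity reached at the end."""
    frontier = [start]
    for relation in relations:
        # Each hop expands every entity in the frontier along one relation.
        frontier = [t for entity in frontier for t in index.get((entity, relation), [])]
    return frontier


index = build_index(EDGES)
print(follow_path(index, "GSPC", ["renamed_to", "parent_organization"]))
```

If any hop is missing from the graph, the frontier becomes empty, which is why multi-hop questions expose gaps in retrieval that single-edge questions do not.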

6. Temporal Queries

Test temporal attributes:

{
  "id": "temp-1",
  "question": "When was Hamas first designated as an FTO?",
  "answer": "Hamas was designated as an FTO in 1997."
}

Tests: Temporal property retrieval

Design Guidelines

Do

Be specific - Questions should have definite answers
Use domain terminology - Match the language in your data
Include answer aliases - Account for spelling variations
Test different capabilities - Mix question types
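One way to check that a question set mixes types is to count questions per type. The sketch below assumes the id convention used in the examples above ("alias-1", "geo-1", and so on), where the part before the final "-<number>" names the type; that convention and the `EXPECTED_TYPES` set are assumptions, not something Aletheia enforces.

```python
from collections import Counter

# Assumed convention, taken from the example ids: "<type>-<number>".
EXPECTED_TYPES = {"alias", "exist", "rel", "geo", "multi", "temp"}


def type_coverage(questions: list[dict]) -> Counter:
    """Count questions per type prefix (the id text before the last '-')."""
    return Counter(q["id"].rsplit("-", 1)[0] for q in questions)


def missing_types(questions: list[dict], expected: set[str] = EXPECTED_TYPES) -> set[str]:
    """Expected question types that have no questions at all."""
    return expected - set(type_coverage(questions))


questions = [{"id": "alias-1"}, {"id": "alias-2"}, {"id": "geo-1"}, {"id": "multi-1"}]
print(type_coverage(questions))
print(sorted(missing_types(questions)))
```

A report like this makes it easy to spot a question set that only exercises one capability, such as alias lookup.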

Don't

Ask about common knowledge - LLMs may answer from training data
Use ambiguous questions - Each question should have one answer you can verify
Require external knowledge - Answers should be in your graph

Question Types to Avoid for GraphRAG

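Some question types are better suited to SQL than to GraphRAG. GraphRAG retrieves only a limited slice of the graph as context, so it cannot reliably aggregate, sort, or compare across every matching entity: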

Question Type	Example	Why It's Hard
Counting	"How many orgs are designated?"	Requires aggregation
Ranking	"What's the oldest designation?"	Requires sorting
Comparison	"Which org has more aliases?"	Requires computation

Either exclude these questions from your set, or keep them with "INSUFFICIENT_CONTEXT" as the expected answer.
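When curating a large question set, a rough keyword screen can surface candidates for the three hard types in the table above. The sketch below is a hypothetical heuristic (the patterns, `hard_type`, and `triage` are not part of Aletheia): it flags counting, ranking, and comparison phrasing, then either drops flagged questions or keeps them with "INSUFFICIENT_CONTEXT" as the gold answer. Keyword matching will miss some cases and misfire on others, so flagged questions still need a human review.

```python
import re
from typing import Optional

# Hypothetical keyword patterns for the hard question types.
HARD_TYPE_PATTERNS = {
    "counting": re.compile(r"\bhow many\b", re.IGNORECASE),
    "ranking": re.compile(r"\b(oldest|newest|earliest|latest)\b", re.IGNORECASE),
    "comparison": re.compile(r"\bwhich\b.*\b(more|fewer|less)\b", re.IGNORECASE),
}


def hard_type(question: str) -> Optional[str]:
    """Return the first hard question type whose pattern matches, or None."""
    for name, pattern in HARD_TYPE_PATTERNS.items():
        if pattern.search(question):
            return name
    return None


def triage(questions: list[dict], relabel: bool = False) -> list[dict]:
    """Drop flagged questions, or (relabel=True) keep them with INSUFFICIENT_CONTEXT as gold."""
    result = []
    for q in questions:
        if hard_type(q["question"]) is None:
            result.append(q)
        elif relabel:
            # Copy so the caller's dict is not mutated; aliases no longer apply.
            relabeled = {k: v for k, v in q.items() if k != "answer_aliases"}
            relabeled["answer"] = "INSUFFICIENT_CONTEXT"
            result.append(relabeled)
    return result


print(hard_type("Which org has more aliases?"))
```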

terrorist_orgs Dataset

The terrorist_orgs use case includes 70 curated questions. The categories below account for 35 of them:

Category	Count	Description
Alias lookup	7	Entity property retrieval
Entity existence	7	Simple entity lookup
Geographic filter	10	Location-based queries
Cross-jurisdiction	2	Multi-authority queries
Set operations	2	AND/OR queries
Temporal	3	Date-based queries
Multi-hop	4	2-3 hop traversal

Usage:

aletheia evaluate-ragas \
  --knowledge-graph terrorist_orgs \
  --questions use_cases/terrorist_orgs/evaluation_questions.json

Learn More

Running Evaluations - How to run evaluations
RAGAS Metrics - Understanding metrics
Avoiding Parametric Knowledge - Prevent contamination