Question Format¶
This guide covers how to design and format evaluation questions for Aletheia.
JSON Structure¶
{
  "questions": [
    {
      "id": "q1",
      "question": "What alias is used for al-Shabaab?",
      "answer": "al-Hijra",
      "answer_aliases": ["Al-Hijra", "al Hijra"]
    }
  ]
}
Fields¶
| Field | Required | Type | Description |
|---|---|---|---|
| id | Yes | string | Unique identifier |
| question | Yes | string | The question to ask |
| answer | Yes | string | Expected gold answer |
| answer_aliases | No | array of strings | Alternative correct forms |
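The field rules above can be checked before running an evaluation. The following is a minimal sketch, assuming the file has the shape shown under JSON Structure; `validate_questions` is a hypothetical helper name, not part of Aletheia, and the uniqueness check simply takes "Unique identifier" literally.

```python
import json

REQUIRED_FIELDS = ("id", "question", "answer")


def validate_questions(payload: dict) -> list:
    """Return a list of problems found; an empty list means the payload looks valid."""
    questions = payload.get("questions")
    if not isinstance(questions, list):
        return ['top-level "questions" must be an array']

    problems = []
    seen_ids = set()
    for index, item in enumerate(questions):
        label = f"questions[{index}]"
        if not isinstance(item, dict):
            problems.append(f"{label}: must be an object")
            continue
        # Required fields must be non-empty strings.
        for field in REQUIRED_FIELDS:
            value = item.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{label}: '{field}' must be a non-empty string")
        # answer_aliases is optional, but must be an array of strings when present.
        aliases = item.get("answer_aliases", [])
        if not isinstance(aliases, list) or not all(isinstance(a, str) for a in aliases):
            problems.append(f"{label}: 'answer_aliases' must be an array of strings")
        # ids must be unique across the file.
        question_id = item.get("id")
        if isinstance(question_id, str):
            if question_id in seen_ids:
                problems.append(f"{label}: duplicate id '{question_id}'")
            seen_ids.add(question_id)
    return problems


sample = json.loads("""
{
  "questions": [
    {
      "id": "q1",
      "question": "What alias is used for al-Shabaab?",
      "answer": "al-Hijra",
      "answer_aliases": ["Al-Hijra", "al Hijra"]
    }
  ]
}
""")
print(validate_questions(sample))  # []
```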
Question Types¶
1. Alias Lookup¶
Test retrieval of entity properties:
{
"id": "alias-1",
"question": "What is another name for the PKK?",
"answer": "Kurdistan Workers' Party",
"answer_aliases": ["Kurdistan Workers Party", "Partiya Karkerên Kurdistan"]
}
Tests: Node property retrieval, entity resolution
2. Entity Existence¶
Test simple entity lookup:
{
"id": "exist-1",
"question": "Is Hamas designated as a terrorist organization?",
"answer": "Yes, Hamas is designated as a Foreign Terrorist Organization by the US State Department."
}
Tests: Entity retrieval, basic search
3. Relationship Queries¶
Test edge traversal:
{
"id": "rel-1",
"question": "What authority sanctioned Hezbollah?",
"answer": "The US State Department designated Hezbollah as an FTO.",
"answer_aliases": ["US State Department", "State Department"]
}
Tests: Edge retrieval, relationship understanding
4. Geographic Filtering¶
Test attribute-based filtering:
{
"id": "geo-1",
"question": "What Irish organizations are proscribed by the UK?",
"answer": "The Real IRA and Continuity IRA are proscribed Irish organizations."
}
Tests: Multi-attribute filtering, set retrieval
5. Multi-hop Reasoning¶
Test graph traversal:
{
"id": "multi-1",
"question": "What is the parent organization of AQIM, which was formerly known as GSPC?",
"answer": "al-Qaeda is the parent organization of AQIM.",
"answer_aliases": ["al-Qaeda", "Al-Qaeda"]
}
Tests: Multi-hop traversal, relationship chaining
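To make "multi-hop" concrete, the example question needs two lookups: resolve the former name GSPC to AQIM, then follow the parent edge from AQIM. The toy sketch below uses plain dictionaries built from that example; it is illustrative only and says nothing about how Aletheia stores or traverses its graph. The names `FORMER_NAMES`, `PARENT_OF`, and `parent_via_former_name` are hypothetical.

```python
from typing import Optional

# Toy in-memory graph built from the example question; illustrative only.
FORMER_NAMES = {"GSPC": "AQIM"}      # hop 1: former name -> current entity
PARENT_OF = {"AQIM": "al-Qaeda"}     # hop 2: entity -> parent organization


def parent_via_former_name(former_name: str) -> Optional[str]:
    """Resolve a former name to its entity, then follow the parent edge."""
    entity = FORMER_NAMES.get(former_name)
    if entity is None:
        return None
    return PARENT_OF.get(entity)


print(parent_via_former_name("GSPC"))  # al-Qaeda
```

A question that can be answered from a single node or edge does not exercise this chaining, so it belongs under one of the earlier question types.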
6. Temporal Queries¶
Test temporal attributes:
{
"id": "temp-1",
"question": "When was Hamas first designated as an FTO?",
"answer": "Hamas was designated as an FTO in 1997."
}
Tests: Temporal property retrieval
Design Guidelines¶
Do¶
- Be specific - Questions should have definite answers
- Use domain terminology - Match the language in your data
- Include answer aliases - Account for spelling variations
- Test different capabilities - Mix question types
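One way to sanity-check whether your aliases cover the variations you expect is a simple normalized comparison. This is a minimal sketch, not how RAGAS or Aletheia score answers; `normalize` and `matches_gold` are hypothetical helper names, and the substring match is deliberately loose.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse punctuation and whitespace to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def matches_gold(response: str, question: dict) -> bool:
    """True if the gold answer or any alias appears in the response after normalization."""
    candidates = [question["answer"], *question.get("answer_aliases", [])]
    padded_response = f" {normalize(response)} "
    return any(
        f" {normalize(candidate)} " in padded_response
        for candidate in candidates
        if normalize(candidate)
    )


question = {
    "id": "alias-1",
    "question": "What is another name for the PKK?",
    "answer": "Kurdistan Workers' Party",
    "answer_aliases": ["Kurdistan Workers Party", "Partiya Karkerên Kurdistan"],
}
print(matches_gold("It is the Kurdistan Workers Party.", question))  # True
print(matches_gold("I could not find that entity.", question))       # False
```

Normalization absorbs differences in case, hyphens, and apostrophes, so aliases are most valuable for forms that differ in substance, such as a name in another language.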
Don't¶
- Ask about common knowledge - LLMs may answer from training data
- Use ambiguous questions - Answers should be verifiable
- Require external knowledge - Answers should be in your graph
Question Types to Avoid for GraphRAG¶
Some question types need aggregation, sorting, or computation across many records, which suits SQL better than GraphRAG:
| Question Type | Example | Why It's Hard |
|---|---|---|
| Counting | "How many orgs are designated?" | Requires aggregation |
| Ranking | "What's the oldest designation?" | Requires sorting |
| Comparison | "Which org has more aliases?" | Requires computation |
Either exclude these questions from the set or use "INSUFFICIENT_CONTEXT" as their expected answer.
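A rough keyword screen can help spot such questions while drafting a set. The sketch below is a heuristic under stated assumptions: the patterns are hypothetical, cover only the three types in the table, and will miss or over-flag some phrasings, so treat hits as prompts for manual review.

```python
import re

# Hypothetical keyword patterns; tune them to your own question set.
AGGREGATION_PATTERNS = {
    "counting": re.compile(r"\bhow many\b|\bnumber of\b", re.IGNORECASE),
    "ranking": re.compile(r"\b(oldest|newest|earliest|latest|most|least)\b", re.IGNORECASE),
    "comparison": re.compile(r"\b(more|fewer|less)\b", re.IGNORECASE),
}


def flag_aggregation(question_text: str) -> list:
    """Return the names of the aggregation-style patterns the question matches."""
    return [
        name
        for name, pattern in AGGREGATION_PATTERNS.items()
        if pattern.search(question_text)
    ]


print(flag_aggregation("How many orgs are designated?"))      # ['counting']
print(flag_aggregation("What's the oldest designation?"))     # ['ranking']
print(flag_aggregation("Which org has more aliases?"))        # ['comparison']
print(flag_aggregation("What is another name for the PKK?"))  # []
```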
terrorist_orgs Dataset¶
The terrorist_orgs use case includes 70 curated questions:
| Category | Count | Description |
|---|---|---|
| Alias lookup | 7 | Entity property retrieval |
| Entity existence | 7 | Simple entity lookup |
| Geographic filter | 10 | Location-based queries |
| Cross-jurisdiction | 2 | Multi-authority queries |
| Set operations | 2 | AND/OR queries |
| Temporal | 3 | Date-based queries |
| Multi-hop | 4 | 2-3 hop traversal |
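To check the category mix of your own question file, you can tally questions by id prefix. This sketch assumes ids follow the `<category>-<n>` pattern used in the examples on this page (such as `alias-1` or `geo-1`); whether the shipped dataset follows that convention is not confirmed here, and `count_by_prefix` is a hypothetical helper. The sample ids below are made up for illustration.

```python
from collections import Counter


def count_by_prefix(questions: list) -> Counter:
    """Count questions by the part of the id before the last hyphen."""
    return Counter(q["id"].rsplit("-", 1)[0] for q in questions)


# Made-up ids for illustration only.
examples = [{"id": "alias-1"}, {"id": "exist-1"}, {"id": "geo-1"}, {"id": "geo-2"}]
counts = count_by_prefix(examples)
print(counts["geo"])    # 2
print(counts["alias"])  # 1
```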
Usage:
aletheia evaluate-ragas \
  --knowledge-graph terrorist_orgs \
  --questions use_cases/terrorist_orgs/evaluation_questions.json
Learn More¶
- Running Evaluations - How to run evaluations
- RAGAS Metrics - Understanding metrics
- Avoiding Parametric Knowledge - Prevent contamination