
Avoiding Parametric Knowledge

Parametric knowledge contamination occurs when an LLM answers from its training data rather than retrieved context. This guide explains how to detect and prevent it.

The Problem

Question: What is the capital of France?
Retrieved Context: [Information about French geography]
LLM Answer: Paris

Did the LLM answer from context or from memory?

For common knowledge, you can't tell. This makes evaluation unreliable.

Detection Signs

1. High Answer Similarity + Low Faithfulness

| Metric | Score | Interpretation |
|---|---|---|
| Answer Similarity | 0.95 | Correct answer |
| Faithfulness | 0.40 | Not grounded in context |

The LLM "knew" the answer but didn't use the context.
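This pattern can be flagged automatically. A minimal sketch, with illustrative thresholds (0.85 and 0.5 are assumptions, not part of RAGAS or aletheia):

```python
def flag_contamination(answer_similarity: float, faithfulness: float,
                       sim_threshold: float = 0.85,
                       faith_threshold: float = 0.5) -> bool:
    """Flag likely parametric-knowledge use: the answer matches the
    reference well, yet is poorly grounded in the retrieved context.
    Thresholds are illustrative and should be tuned per dataset."""
    return answer_similarity >= sim_threshold and faithfulness <= faith_threshold

# Scores from the table above: high similarity, low faithfulness.
flag_contamination(0.95, 0.40)  # -> True
```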

2. Correct Answers with No Context

If retrieval returns nothing but answers are correct, the LLM is using parametric knowledge.

3. Grounding Rejections

In strict mode, look for:

- uncited_entities - Answer mentions things not in evidence
- answer_without_citations - No evidence cited
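As a rough illustration (the per-answer result records here are hypothetical, not the tool's actual output format), rejection reasons can be tallied across a run to see which failure mode dominates:

```python
from collections import Counter

# Hypothetical per-answer grounding results; the reason strings match
# the strict-mode rejection names described above.
results = [
    {"grounded": False, "reason": "uncited_entities"},
    {"grounded": True, "reason": None},
    {"grounded": False, "reason": "answer_without_citations"},
]

# Count rejections by reason for the whole run.
rejections = Counter(r["reason"] for r in results if not r["grounded"])
```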

Prevention Strategies

1. Use Domain-Specific Data

Choose data the LLM hasn't seen:

| Good | Bad |
|---|---|
| Internal company documents | Wikipedia content |
| Recent news (post-training cutoff) | Historical facts |
| Obscure sanctions data | Common knowledge |
| Proprietary databases | Public datasets |

2. Use Grounding Verification

aletheia evaluate-ragas \
  --grounding-mode strict \
  --questions questions.json

Strict mode rejects answers not grounded in evidence.

3. Generate Synthetic Questions

Create questions from your graph that can only be answered with your specific data:

# Extract facts from your graph
facts = query_graph_facts(graphiti, group_id)

# Generate questions
for fact in facts:
    question = f"What is the relationship between {fact.source} and {fact.target}?"
    # This question requires your specific graph data
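Generated questions can then be written out for the CLI's `--questions` flag. The exact schema aletheia expects is not shown here, so the `question`/`ground_truth` fields below are an assumption; a minimal sketch:

```python
import json

# Hypothetical fact records; a real pipeline would pull these from the graph.
facts = [
    {"source": "Acme Holdings Ltd", "target": "US-FTO219"},
]

questions = [
    {
        "question": f"What is the relationship between {f['source']} and {f['target']}?",
        # Ground truth would come from the graph edge itself.
        "ground_truth": f"{f['source']} is designated under {f['target']}.",
    }
    for f in facts
]

with open("questions.json", "w") as fh:
    json.dump(questions, fh, indent=2)
```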

4. Use Recent Data

Data from after the LLM's training cutoff:

# Download latest sanctions data (updated weekly)
curl -o entities.ftm.json \
  "https://data.opensanctions.org/datasets/latest/..."

5. Use Obscure Data

Even historical data can work if it's obscure:

| Likely Known | Likely Unknown |
|---|---|
| Major terrorist groups | Obscure shell companies |
| Famous sanctions | Specific sanction programs |
| Country capitals | Entity aliases |

Testing for Contamination

Method 1: No-Context Test

Run evaluation with empty context:

# In evaluation code
context = ""  # Empty context
answer = llm.generate(question, context)

# If answers are correct, LLM is using parametric knowledge

Method 2: Wrong-Context Test

Provide deliberately wrong context:

# Swap context between questions
context_for_q1 = retrieve(q2)
answer = llm.generate(q1, context_for_q1)

# Correct answer = parametric knowledge

Method 3: Compare Grounding Modes

# Strict mode
aletheia evaluate-ragas --grounding-mode strict ...
# Result: Answer Similarity = 0.60

# Off mode
aletheia evaluate-ragas --grounding-mode off ...
# Result: Answer Similarity = 0.90

# Large gap = parametric knowledge contamination
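The gap between the two modes can be checked programmatically. A minimal sketch, with an illustrative 0.2 threshold that is an assumption, not part of any tool:

```python
def contamination_gap(similarity_off: float, similarity_strict: float,
                      threshold: float = 0.2) -> bool:
    """A large drop in answer similarity when grounding is enforced
    suggests the model was answering from memory, not from context.
    The threshold is illustrative; calibrate it on a clean baseline."""
    return (similarity_off - similarity_strict) >= threshold

# Scores from the example above: 0.90 (off) vs 0.60 (strict).
contamination_gap(0.90, 0.60)  # -> True
```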

The terrorist_orgs Dataset

The terrorist_orgs dataset is designed to minimize contamination:

  1. Specific aliases - LLMs don't know all aliases
  2. Program IDs - US-FTO219 vs general knowledge
  3. Cross-jurisdictions - UK proscriptions less known
  4. Recent designations - Post-training cutoff

aletheia evaluate-ragas \
  --knowledge-graph terrorist_orgs \
  --questions use_cases/terrorist_orgs/evaluation_questions.json \
  --grounding-mode strict

Best Practices Summary

  1. Always use grounding verification in production evaluations
  2. Start with strict mode to establish baseline
  3. Prefer obscure/recent data for evaluation
  4. Generate questions from your graph for guaranteed coverage
  5. Monitor grounding pass rates - declining rates suggest contamination
  6. Curate question sets to remove obviously-known questions
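Point 5 can be automated. A minimal sketch, assuming per-answer boolean grounding results collected across successive evaluation runs (the helper names are hypothetical):

```python
def grounding_pass_rate(results: list[bool]) -> float:
    """Fraction of answers that passed strict grounding checks in one run."""
    return sum(results) / len(results) if results else 0.0

def declining(rates: list[float], window: int = 3) -> bool:
    """True if the last `window` pass rates are strictly decreasing -
    a rough signal of growing contamination or retrieval drift."""
    recent = rates[-window:]
    return len(recent) == window and all(a > b for a, b in zip(recent, recent[1:]))
```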

Learn More