Aviation Safety - Presentation Demo

Aletheia is a GraphRAG platform designed for domain-specific analytical reasoning over knowledge graphs.

It provides the following capabilities:

Data ingestion into multiple domain-specific knowledge graphs
Schema inference based on LLMs and ontologies
Entity and relationship extraction, normalization, and deduplication
Hybrid search: graph traversal plus semantic embeddings
Temporal relationships that can be queried as of a specific point in time
Embedded ontology access for domain-aware reasoning
Direct Cypher analytics with validation and sanitization
Cross-graph reasoning across independent knowledge domains
MCP connectors with self-describing, schema-aware tool interfaces

A. Why Safety Analysts Need Every Detail

The Swiss Cheese Model

Aviation safety operates on a fundamental principle: accidents don't happen because of a single failure. James Reason's Swiss Cheese Model explains how accidents occur in complex systems through the alignment of multiple weaknesses.

                  HAZARD
                    │
                    ▼
            ┌───────────────┐
            │   ░░░░░░░░░   │  ← Layer 1: Procedures
            │ ░░░░░  ░░░░░░ │     (hole: outdated checklist)
            └───────────────┘
                    │
                    ▼
            ┌───────────────┐
            │  ░░░░░░░░░░░  │  ← Layer 2: Technology
            │ ░░░░  ░░░░░░░ │     (hole: sensor calibration)
            └───────────────┘
                    │
                    ▼
            ┌───────────────┐
            │   ░░░░░░░░░   │  ← Layer 3: Training
            │ ░░░░░░░  ░░░░ │     (hole: CRM gap)
            └───────────────┘
                    │
                    ▼
              ⚠️ ACCIDENT
        (when holes align)

Each "slice of cheese" represents a defense layer (procedures, technology, training, supervision), and the "holes" are inherent weaknesses. An accident occurs only when holes in multiple layers align, allowing a hazard to traverse the entire system.

Component	Description
Defense Layers	Barriers like regulations, equipment, training, supervision
Holes (Weaknesses)	Human errors, technical failures, design flaws
Active Failures	Direct actions that trigger events (e.g., skipping a checklist item)
Latent Failures	Deep systemic defects waiting to be activated (e.g., poorly designed procedure)
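The "holes align" condition can be made concrete with a minimal sketch: each defense layer is modelled as the set of hazard trajectories it fails to stop, and an accident is possible only along a trajectory that appears in every layer's set. The three layer names follow the diagram; the trajectories t1 to t5 are hypothetical.

```python
from functools import reduce

# Each defense layer is modelled as the set of hazard trajectories it fails
# to stop (its "holes"). Trajectory labels t1..t5 are hypothetical.
layers = {
    "Procedures": {"t1", "t2", "t4"},   # e.g. outdated checklist
    "Technology": {"t2", "t3", "t4"},   # e.g. sensor calibration
    "Training":   {"t4", "t5"},         # e.g. CRM gap
}

def accident_trajectories(layers: dict[str, set[str]]) -> set[str]:
    """Trajectories that pass through a hole in every layer (holes aligned)."""
    return reduce(set.intersection, layers.values())

def blocking_layers(trajectory: str, layers: dict[str, set[str]]) -> list[str]:
    """Layers that stop a given trajectory (it is not one of their holes)."""
    return [name for name, holes in layers.items() if trajectory not in holes]

print(accident_trajectories(layers))   # {'t4'}
print(blocking_layers("t2", layers))   # ['Training']
```

Removing t4 from any single layer (for example, by fixing the outdated checklist) empties the intersection, which is why the model favours strengthening layers over blaming the last actor in the chain.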

Systemic Thinking

The model shifts focus from individual blame to systemic analysis. Errors are symptoms of deeper organizational and technical weaknesses that must be identified to prevent recurrence.

Why This Matters for GraphRAG

Safety analysts investigating incidents need to:

Trace causal chains — From latent conditions through active failures to the accident
Identify contributing factors — Weather, human factors, maintenance, equipment
Connect related incidents — Similar failures across different contexts
Find patterns — Recurring weaknesses in defense layers

This requires navigating relationships across multiple entities — exactly what GraphRAG enables. A simple keyword search cannot connect a maintenance procedure gap in Lyon with a similar QA failure in Palma de Mallorca. A knowledge graph can.
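A minimal sketch of why a graph can make that connection: a breadth-first search over an adjacency list finds the multi-hop path between the Lyon and Palma de Mallorca incidents through a shared latent condition. All node names are hypothetical, and the in-memory dict stands in for the graph database.

```python
from collections import deque

# Hypothetical undirected toy graph: two incidents linked through a shared
# latent condition that neither report mentions explicitly.
graph = {
    "Incident:Lyon": ["Factor:maintenance procedure gap", "Aerodrome:Lyon"],
    "Incident:Palma": ["Factor:QA failure", "Aerodrome:Palma de Mallorca"],
    "Factor:maintenance procedure gap": ["Incident:Lyon", "Latent:QA oversight"],
    "Factor:QA failure": ["Incident:Palma", "Latent:QA oversight"],
    "Latent:QA oversight": ["Factor:maintenance procedure gap", "Factor:QA failure"],
    "Aerodrome:Lyon": ["Incident:Lyon"],
    "Aerodrome:Palma de Mallorca": ["Incident:Palma"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest node path, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:      # walk parent links back to start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None

print(shortest_path(graph, "Incident:Lyon", "Incident:Palma"))
```

The four-hop path runs through "Latent:QA oversight", a node that neither report names; a keyword or similarity search over the two texts has nothing to match on.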

B. Why GraphRAG and not RAG?

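Classic RAG retrieves information from a vector store: text is embedded, and the passages whose embeddings lie closest to the question's embedding are returned, so conceptually similar content is found even when no keywords match. Vector search measures proximity in semantic meaning, not structure or causality.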

RAG is fast, scalable, and excellent for recall, but it doesn't understand relationships. RAG can tell you what looks alike, not what belongs together.
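A toy illustration of that limitation, using hand-made 3-dimensional vectors in place of real embeddings (all values are hypothetical): cosine similarity ranks passages by how alike they read, and nothing in the score says whether two incidents share a cause, an aircraft, or an operator.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical toy embeddings; real models use hundreds of dimensions.
query = [0.9, 0.1, 0.0]  # "hydraulic leak on approach"
passages = {
    "hydraulic fluid leak during approach": [0.85, 0.15, 0.05],
    "hydraulic line chafing found on ground": [0.6, 0.4, 0.2],
    "bird strike on runway 25L": [0.1, 0.2, 0.95],
}

# Rank by similarity only: the score has no notion of cause or shared entities.
ranked = sorted(passages, key=lambda p: cosine(query, passages[p]), reverse=True)
print(ranked)
```

The second passage might describe the root cause of the first (for instance, the same fatigued line), but a similarity score cannot express that link; a typed edge in a graph can.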

The Need for Structure

To move beyond pure semantic retrieval, AI needs a structured representation of meaning.

A way to connect facts, follow relationships, and reason across them.

That's where Knowledge Graphs come in.

C. Introducing EASA ECCAIRS for Aviation Safety

ECCAIRS (European Coordination Centre for Accident and Incident Reporting Systems) is the European framework used to collect, manage, and analyze aviation safety occurrences.

It covers:

Standardized procedures for reporting incidents by aviation stakeholders
Centralized and consolidated occurrence database where reports are stored and shared
Common aviation safety taxonomy that structures how events, causes, and contributing factors are described

Together, these elements ensure consistency, comparability, and regulatory alignment in safety data across Europe.

What is an Occurrence?

An occurrence is the formal record of any aviation safety-related incident or event. It combines:

Component	Description
Structured section	Based on the ECCAIRS taxonomy - encodes factual data such as event type, phases of flight, actors, and contributing factors
Narrative section	Free text in natural language describing what actually happened, sometimes written in languages other than English

The structured data enables large-scale analysis and comparison, while the narrative preserves context, nuance, and operational detail that predefined fields cannot fully capture.

Domain Complexity

Aviation safety occurrences form a highly specialized niche domain, characterized by dense technical language, extensive use of acronyms, and domain-specific jargon rooted in operations, engineering, and regulation.

ECCAIRS in Aletheia, post-2026-05-02

The full ECCAIRS Aviation 7.1.0.2 taxonomy (540 classes plus ~204,000 controlled-value individuals: occurrence categories, ATA chapters, flight phases, weather phenomena, …) is loaded as a searchable knowledge graph (ontology_eccairs). search_ontology("SCF-NP") returns the System/Component Failure (Non-Powerplant) individual; search_ontology("ATA 29 hydraulic") returns the hydraulic-power chapter and its sub-chapters. The analyst can therefore phrase questions directly in ECCAIRS codes (see Q6 in Section K), and the system decodes them against the actual ontology.

D. ECCAIRS Ontology

Although it is published as a taxonomy rather than a formal ontology, the ECCAIRS model can be directly transformed into an ontology that describes the aviation safety domain.

It defines core domain concepts such as:

Occurrence
Runway
Aircraft
Engine
Flight Crew Member
Aerodrome

The resulting ontology represents a highly specialized domain vocabulary, using precise and formal terminology. For instance, Aerodrome is the canonical term covering airports, heliports, and other landing areas, reflecting the regulatory need for unambiguous and standardized language.

Ontology graph by the numbers: 540 classes, ~204,592 controlled-value individuals, ~205,132 nodes total in ontology_eccairs.

E. Occurrence Dataset

The aviation safety corpus has been engineered for analyst-grade demonstration. It contains 100 markdown incident reports under use_cases/aviation_safety/data/:

22 hand-authored cluster reports — four deliberate patterns planted to support multi-hop and cross-graph reasoning:

Cluster	Reports	Pattern
A — Operator systemic	6	Iberia A320-family hydraulic-line fatigue near a mounting bracket; all maintenance traces back to the Madrid base
B — Aircraft fleet defect	5	Embraer ERJ-195 DMC software anomaly under a specific approach + flap configuration; spans 5 operators / 5 airports
C — Airport environment	7	Frankfurt EDDF runway 25L approach bird strikes during the autumn migration window
D — Recommendation recurrence	4	EASA-SR-2024-091-03 issued from a TAP/Lisbon source incident; recurs at Vienna, Zurich, Amsterdam at operators that did not implement it

68 LLM-generated noise reports — produced by scripts/generate_aviation_corpus.py from a committed parameter table, validated through AviationSafetyParser. The noise raises retrieval pressure without diluting cluster signals.
9 multilingual reports (Spanish, French, German, Italian) embedded in the noise to demonstrate cross-language semantic search.

Structured fields use ECCAIRS canonical labels (Flight phase ∈ {Approach, En route, Landing, Manoeuvring, Standing, Take-off, Taxi, …}; Aircraft damage ∈ {None, Minor, Substantial, Destroyed, …}; Injury level ∈ {None, Minor, Serious, Fatal, …}). The cheatsheet lives at use_cases/aviation_safety/ontology_controlled_values.md.
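A minimal sketch of how a report's structured fields could be checked against these canonical labels. Only the values quoted above are included (the real lists are longer; see the cheatsheet), and the helper name and field keys are hypothetical, not the AviationSafetyParser API.

```python
# Subsets of ECCAIRS canonical labels quoted in this document; the full
# controlled-value lists live in ontology_controlled_values.md.
CONTROLLED_VALUES = {
    "Flight phase": {"Approach", "En route", "Landing", "Manoeuvring",
                     "Standing", "Take-off", "Taxi"},
    "Aircraft damage": {"None", "Minor", "Substantial", "Destroyed"},
    "Injury level": {"None", "Minor", "Serious", "Fatal"},
}

def invalid_fields(report: dict[str, str]) -> dict[str, str]:
    """Return {field: value} for controlled fields holding a non-canonical value.

    Hypothetical helper: fields without a controlled vocabulary are ignored.
    """
    return {
        field: value
        for field, value in report.items()
        if field in CONTROLLED_VALUES and value not in CONTROLLED_VALUES[field]
    }

report = {"Flight phase": "Takeoff", "Aircraft damage": "Minor", "Operator": "Iberia"}
print(invalid_fields(report))  # {'Flight phase': 'Takeoff'}
```

Flagging "Takeoff" in favour of "Take-off" matters because a single spelling variant is enough to make an exact-match filter silently miss the report.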

Sample Document

Incident Report 2024-0856-EU

Metadata

Field	Value
Incident ID	2024-0856-EU
Date	2024-11-25
Time	19:33 UTC
Location	Barcelona-El Prat Airport (LEBL)
Country	Spain
Flight Phase	Cruise / En route

Aircraft

Field	Value
Type	Boeing 787-9
Registration	EC-NBM
Operator	Air Europa

Incident Description (narrative originally written in Spanish)

During cruise at FL390 over the western Mediterranean, approximately 120 nm east of Barcelona, the crew detected an unusual smell of electrical smoke in the flight deck. The captain immediately ordered the use of oxygen masks and declared an emergency (MAYDAY).

The smoke/fire/fumes procedure from the QRH (Quick Reference Handbook) was carried out. The crew identified that the smell was coming from the in-flight entertainment (IFE) system in the cabin. The IFE system was switched off and the smell gradually dissipated.

Findings

Primary Cause: Electrical short circuit in the IFE system power distribution unit
Contributing Factors: Aging electrical components, possible moisture ingress
Human Factors: Correct application of emergency procedures by flight crew

F. The Challenge

The challenge for Aletheia is to produce accurate answers even though both the input data and the questions are multilingual, inconsistently phrased, and dense with domain jargon.

Data Challenges

Aspect	Challenges
Both	Multiple languages, typos, abbreviations, inconsistent phrasing, highly technical aviation jargon
Input Data	Combines structured and unstructured information, may not align with ECCAIRS ontology (e.g., airport vs. aerodrome)
Questions	Can be ambiguous (e.g., referring to cities instead of aerodromes), may require multi-hop or temporal reasoning

Example Query

A question such as "What incidents in Barcelona or Madrid involved Boeing aircraft and resulted in emergency declarations (MAYDAY or PAN-PAN)?" must resolve to a single incident matching all criteria:

Field	Value
Location	Barcelona-El Prat Airport (LEBL)
Date	2024-11-25
Aircraft	Boeing 787-9 (EC-NBM)
Operator	Air Europa
Emergency	MAYDAY declared
Cause	Electrical short circuit in IFE power distribution unit
Outcome	Diverted to Barcelona, emergency landing, no injuries

The Madrid incident (2024-0734-EU) involved an Airbus A330-200, not Boeing, so it doesn't match.
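The resolution this query needs can be sketched in a few lines, assuming a hypothetical in-memory alias table from city names to ICAO codes (LEBL is Barcelona-El Prat, LEMD is Madrid-Barajas). The records use only fields this document gives; the Madrid record's aerodrome is an assumption and its emergency status is not stated, so it is left unknown.

```python
# Hypothetical alias table: city names an analyst might type -> ICAO codes.
CITY_TO_AERODROMES = {"barcelona": {"LEBL"}, "madrid": {"LEMD"}}
EMERGENCY_CALLS = {"MAYDAY", "PAN-PAN"}

occurrences = [
    {"id": "2024-0856-EU", "aerodrome": "LEBL", "aircraft": "Boeing 787-9",
     "emergency": "MAYDAY"},
    # Assumed aerodrome (Madrid-Barajas); emergency status not given here.
    {"id": "2024-0734-EU", "aerodrome": "LEMD", "aircraft": "Airbus A330-200",
     "emergency": None},
]

def answer(cities, manufacturer):
    """Resolve cities to aerodromes, then require all three criteria at once."""
    aerodromes = set().union(*(CITY_TO_AERODROMES[c.lower()] for c in cities))
    return [
        o["id"] for o in occurrences
        if o["aerodrome"] in aerodromes
        and o["aircraft"].startswith(manufacturer)
        and o["emergency"] in EMERGENCY_CALLS
    ]

print(answer(["Barcelona", "Madrid"], "Boeing"))  # ['2024-0856-EU']
```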

G. The Knowledge Graphs

Aletheia's aviation safety use case spans three independent knowledge graphs, each built from different data sources and covering a distinct regulatory domain.

Three Domains, One Analytical Platform

Graph	Domain	Source	Purpose
Aviation Safety	2024 European occurrences	Synthetic ECCAIRS-style reports	What happened — incidents, aircraft, operators, locations
Safety Recommendations	EASA Safety Recommendations	EASA SR database	What was recommended — investigation findings and corrective actions
Airworthiness Directives	EASA Airworthiness Directives	EASA AD database	What was mandated — mandatory inspections, modifications, and operational limits

These three datasets mirror the real-world safety lifecycle: an incident triggers an investigation that produces recommendations, which may escalate to mandatory directives. In practice, analysts must cross-reference these databases manually. Aletheia connects them through knowledge graphs.

No Pre-Built Links

The three knowledge graphs are built independently from separate data sources. There are no pre-built cross-references between them. Aletheia bridges the gaps through entity resolution, semantic search, and ontology-informed reasoning at query time.
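A minimal sketch of query-time bridging, assuming the graphs spell the same aircraft type differently. A hypothetical normalization key maps "Airbus A320-214" and "A320" to one value so that records from independent graphs can be paired. Aletheia's actual bridging uses semantic search and the ontology, so this string-based key is a simplified stand-in; the occurrence record is hypothetical, while the AD applicability is the one quoted in Q9.

```python
import re

def type_key(name: str) -> str:
    """Hypothetical normalization: reduce a type string to its base model code."""
    match = re.search(r"\b([A-Z]{1,2}\d{3})", name.upper())
    return match.group(1) if match else name.upper()

# Hypothetical record from the aviation_safety graph.
occurrences = [{"id": "OCC-1", "aircraft_type": "Airbus A320-214"}]
# EASA AD 2024-0097 applicability as described in Q9 (A318/A319/A320/A321).
directives = [{"id": "2024-0097", "applies_to": ["A318", "A319", "A320", "A321"]}]

def matching_directives(occurrences, directives):
    """Pair each occurrence with every directive whose applicability covers its type."""
    pairs = []
    for occ in occurrences:
        key = type_key(occ["aircraft_type"])
        for ad in directives:
            if key in {type_key(t) for t in ad["applies_to"]}:
                pairs.append((occ["id"], ad["id"]))
    return pairs

print(matching_directives(occurrences, directives))  # [('OCC-1', '2024-0097')]
```

Matching on type alone only surfaces candidate directives; deciding whether one actually covers the observed failure mode still requires comparing content, as Q9 shows.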

Aviation Safety Graph

100 European aviation occurrences from 2024, with multilingual narratives (English, Spanish, French, German, Italian).

Entities:

Type	Count
Occurrence	100
Aircraft	~95
AerodromeGeneral	~60
Operator	~55
Manufacturer	~30
Country	~20

Relationships:

Relationship	Description	Count
INVOLVED_AIRCRAFT	Occurrence → Aircraft	9
OCCURRED_AT	Occurrence → Aerodrome	10
LOCATED_IN	Aerodrome → Country	10
OPERATED_BY	Aircraft → Operator	10
MANUFACTURED_BY	Aircraft → Manufacturer	10

Safety Recommendations Graph

~243 EASA Safety Recommendations linked to 138 occurrences, issued by 37 investigation authorities across 24 countries.

Entities:

Type	Count
Occurrence	138
Aircraft	237
SafetyRecommendation	~243
Manufacturer	59
SafetyInvestigationAuthority	37
Country	24

Relationships:

Relationship	Description	Count
STEMS_FROM	SafetyRecommendation → Occurrence	242
ISSUES	SafetyInvestigationAuthority → SafetyRecommendation	158
INVOLVES	Occurrence → Aircraft	147
LOCATES	SafetyInvestigationAuthority → Country	137
MANUFACTURES	Manufacturer → Aircraft	131

Airworthiness Directives Graph

~1,000 EASA Airworthiness Directives covering 331 aircraft types, referencing 228 technical documents, approved by 137 design approval holders.

Entities:

Type	Count
AirworthinessDirective	~1,000
AircraftType	331
Document	228
DesignApprovalHolder	137

Relationships:

Relationship	Description	Count
APPROVED_BY	AirworthinessDirective → DesignApprovalHolder	759
APPLIES_TO	AirworthinessDirective → AircraftType	677
REFERENCES	AirworthinessDirective → Document	199
SUPERSEDES	AirworthinessDirective → AirworthinessDirective	81
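SUPERSEDES edges form chains, so "which directive is currently in force?" becomes a walk to the newest link. A minimal sketch over hypothetical AD identifiers, assuming each directive is superseded at most once:

```python
# Hypothetical SUPERSEDES edges: newer AD -> the AD it supersedes.
supersedes = {"AD-C": "AD-B", "AD-B": "AD-A", "AD-E": "AD-D"}

def current_directive(ad: str, supersedes: dict[str, str]) -> str:
    """Follow the chain from an AD to the newest directive that replaces it."""
    superseded_by = {old: new for new, old in supersedes.items()}
    seen = {ad}
    while ad in superseded_by:
        ad = superseded_by[ad]
        if ad in seen:  # guard against a malformed cycle in the data
            raise ValueError(f"SUPERSEDES cycle at {ad}")
        seen.add(ad)
    return ad

print(current_directive("AD-A", supersedes))  # AD-C
```

In Cypher the same walk is a variable-length match over SUPERSEDES, which keeps the chain logic inside a single query.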

Scale

Combined, the three graphs contain over 2,000 entities and 5,000+ typed relationships. The Airworthiness Directives graph alone is larger than many production GraphRAG deployments.

Knowledge Graph Visualization

Aviation Safety Knowledge Graph

FalkorDB Graph Explorer

H. Building the Knowledge Graphs

When building a knowledge graph, an LLM is used to extract entities and relationships from text.

Without schema guidance, this process easily leads to inconsistent entity types, ambiguous relationships, and fragmented vocabularies, resulting in a graph that lacks structural coherence.

Why Schema Matters

Benefit	Description
Type consistency	Identical concepts use the same canonical types
Relationship clarity	Relationships follow a clear, queryable vocabulary
Property standardization	Attributes are named consistently across entities
Improved retrieval	Queries reliably return all relevant results

Schema Inference in Aletheia

Schema inference determines the vocabulary of entity types, relationships, and properties that the LLM must use during extraction.

The most robust approach combines LLM-assisted schema inference with domain ontologies. In this model:

LLM-driven schema discovery
Semantic alignment with existing ontology through knowledge graph search and embeddings
Automatic resolution of terminology mismatches

This hybrid approach enables the extraction of high-confidence schemas, enriched and constrained by authoritative domain knowledge.

Inferred Schemas

Each knowledge graph has its own domain-specific schema, inferred through the same ontology-guided process.

Aviation Safety Schema

Knowledge Graph: aviation_safety_quality

✅ Entity types: 6
   - Occurrence
   - Aircraft
   - AerodromeGeneral
   - Operator
   - Manufacturer
   - Country

✅ Relationship types: 5
   - INVOLVED_AIRCRAFT
   - OCCURRED_AT
   - OPERATED_BY
   - MANUFACTURED_BY
   - LOCATED_IN

Safety Recommendations Schema

Knowledge Graph: safety_recommendations

✅ Entity types: 6
   - SafetyRecommendation
   - Occurrence
   - Aircraft
   - SafetyInvestigationAuthority
   - Manufacturer
   - Country

✅ Relationship types: 5
   - STEMS_FROM
   - ISSUES
   - INVOLVES
   - MANUFACTURES
   - LOCATES

Airworthiness Directives Schema

Knowledge Graph: airworthiness_directives

✅ Entity types: 4
   - AirworthinessDirective
   - AircraftType
   - Document
   - DesignApprovalHolder

✅ Relationship types: 4
   - APPLIES_TO
   - APPROVED_BY
   - REFERENCES
   - SUPERSEDES
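An inferred schema can be read as a set of allowed (source type, relationship, target type) triples. A minimal sketch, using the Airworthiness Directives schema and the directions from its relationship table, of how an extracted edge could be checked before it is written; the function is hypothetical, not Aletheia's extraction code.

```python
# Allowed triples for airworthiness_directives, from its relationship table.
AD_SCHEMA = {
    ("AirworthinessDirective", "APPLIES_TO", "AircraftType"),
    ("AirworthinessDirective", "APPROVED_BY", "DesignApprovalHolder"),
    ("AirworthinessDirective", "REFERENCES", "Document"),
    ("AirworthinessDirective", "SUPERSEDES", "AirworthinessDirective"),
}

def check_edge(source_type: str, rel: str, target_type: str) -> str:
    """Return 'ok', 'reversed' (valid only the other way round) or 'unknown'."""
    if (source_type, rel, target_type) in AD_SCHEMA:
        return "ok"
    if (target_type, rel, source_type) in AD_SCHEMA:
        return "reversed"
    return "unknown"

print(check_edge("AircraftType", "APPLIES_TO", "AirworthinessDirective"))  # reversed
```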

Ingestion Process

Reproducible materialization

The 100-report corpus is reproducible from committed sources: 22 hand-authored cluster seeds in cluster_seeds/ plus a 68-row parameter YAML at scripts/aviation_noise_params.yaml. Run python scripts/generate_aviation_corpus.py to regenerate data/ from scratch (~10 min plus LLM costs for the noise reports).

I. ECCAIRS Ontology Graph

Ontologies encode expert knowledge about a domain.

In the context of knowledge graphs, an ontology serves as a schema or blueprint that describes:

Component	Description
Classes	The types of entities (e.g., Person, Organization, Aircraft)
Properties	Attributes of entities (e.g., name, date, location)
Relationships	How entities connect to each other
Constraints	Rules about valid combinations

The Terminology Problem

Without ontology alignment, inconsistencies arise:

User Language	Ontology Term
Airport	Aerodrome
Aircraft, Plane, Airplane	Aircraft
Operator, Airline, Carrier	Operator
Occurrence, Incident, Event	Occurrence

Ontology Graph Solution

How can we link an entity (like airport) with the corresponding ontology class (aerodrome)?

Aletheia provides the ability to build an ontology graph — a semantic projection of a formal ontology that enables the system to access domain knowledge through embeddings and graph search.

The ontology graph is built before the knowledge graph, ensuring schema inference can rely on it.

During schema inference, when detecting an "Airport" entity, Aletheia queries the ontology graph using semantic search to find the matching canonical class ("Aerodrome").
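Aletheia performs this lookup with embeddings. As a dependency-free stand-in, the sketch below scores each ontology class by word overlap between the detected label and a short class description. The descriptions are hypothetical paraphrases of the terminology table, and the overlap score is much closer to a synonym lookup than to semantic search; it only shows why matching must go through class descriptions (or embeddings), since "Airport" and "Aerodrome" share no common word.

```python
import re
from typing import Optional

# Hypothetical class descriptions; in Aletheia these are ontology nodes
# searched by embedding similarity, not word overlap.
ONTOLOGY_CLASSES = {
    "Aerodrome": "aerodrome airport heliport landing area",
    "Aircraft": "aircraft plane airplane helicopter",
    "Operator": "operator airline carrier",
    "Occurrence": "occurrence incident accident event",
}

def tokens(text: str) -> set[str]:
    """Lower-case alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def canonical_class(label: str) -> Optional[str]:
    """Class whose description shares the most words with the label, or None."""
    scores = {cls: len(tokens(label) & tokens(desc))
              for cls, desc in ONTOLOGY_CLASSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

for label in ["Airport", "Airline", "Plane", "Runway"]:
    print(label, "->", canonical_class(label))
```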

Ontology Graph in FalkorDB

ECCAIRS Ontology Structure

Ontology in the MCP Connector

The ECCAIRS ontology graph is not just a build-time tool. Each Aletheia MCP connector provides embedded ontology access through search_ontology and explore_ontology tools, enabling the LLM to decode domain terminology at query time — directly from within any data connector.

Searching the ECCAIRS ontology

The ontology graph is queried like any other Aletheia graph. Two examples used live in the demo:

search_ontology("SCF-NP")
→ "SCF-NP: System/component failure or malfunction [non-powerplant]"
   (Occurrence categoryValue individual)

search_ontology("ATA 29 hydraulic power")
→ "2900 Hydraulic power system" (level 2)
→ "2910 Hydraulic main system" (level 3)
→ "Hydraulic main system line/fitting" (level 4)

These individual nodes are wired by INSTANCE_OF edges to their parent value classes (e.g., Occurrence_categoryValue, Ata_chapterValue), so kg_explore_node can pivot from an individual to its peers in the same enumeration.
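A minimal sketch of that pivot over a hypothetical in-memory edge list: from an individual, follow INSTANCE_OF up to its value class, then back down to the other individuals of the same class. The SCF-NP and ATA labels come from the examples above; the SCF-PP and BIRD entries are illustrative peers, and their exact labels in the loaded ontology may differ.

```python
from collections import defaultdict

# Hypothetical INSTANCE_OF edges: individual label -> parent value class.
INSTANCE_OF = {
    "SCF-NP: System/component failure or malfunction [non-powerplant]": "Occurrence_categoryValue",
    "SCF-PP: System/component failure or malfunction [powerplant]": "Occurrence_categoryValue",
    "BIRD: Birdstrike": "Occurrence_categoryValue",
    "2900 Hydraulic power system": "Ata_chapterValue",
    "2910 Hydraulic main system": "Ata_chapterValue",
}

# Reverse index: value class -> its individuals (the enumeration).
members = defaultdict(list)
for individual, value_class in INSTANCE_OF.items():
    members[value_class].append(individual)

def peers(individual: str) -> list[str]:
    """Other individuals sharing the same value class."""
    value_class = INSTANCE_OF[individual]
    return [m for m in members[value_class] if m != individual]

print(peers("2900 Hydraulic power system"))  # ['2910 Hydraulic main system']
```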

J. The MCP Connector

Aletheia exposes each knowledge graph through a self-describing MCP (Model Context Protocol) connector — a standardized interface that allows any LLM to discover, search, traverse, and analyze the graph.

Each connector is domain-aware: it advertises what entity types, relationship types, and query patterns exist in its graph, enabling the LLM to generate precise queries without guessing at graph structure.

Tool Surface

Each connector provides the following tools, organized into five capability groups:

Group	Tools	Purpose
Semantic Discovery	`search`, `explore_node`	Find entities by meaning, traverse graph neighborhoods
Schema & Ontology	`get_schema`, `search_ontology`, `explore_ontology`	Understand graph structure, decode domain terminology
Cypher Analytics	`run_cypher`	Count, aggregate, compare, compute — direct analytical queries
Community Intelligence	`build_communities`	Cluster entities, detect thematic groups
Data Management	`add_memory`, `get_episodes`, `get_episode_context`, `delete_entity_edge`, `delete_episode`, `clear_graph`, `get_status`	Ingest, inspect, maintain

Search Capabilities

The search tool provides unified access to nodes, edges, and communities with advanced filtering:

Feature	Description
Entity type filtering	Restrict results to specific types (e.g., only `SafetyRecommendation` entities)
Edge type filtering	Search for specific relationships (e.g., only `STEMS_FROM` edges)
Reranker strategies	Five ranking algorithms: reciprocal rank fusion, diversity ranking, cross-encoder precision, graph proximity, and importance ranking
Graph-proximity search	Bias results toward a known entity's neighborhood using `center_node_uuid`
BFS from known entities	Breadth-first graph traversal starting from specific nodes
Temporal filtering	Query edges valid at a specific point in time
Community search	Search across detected entity clusters
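Of the five rerankers, reciprocal rank fusion is the simplest to show. A minimal sketch of the standard formula, where each item scores the sum of 1 / (k + rank) over the input rankings, using k = 60 (the commonly used default; the constant Aletheia uses is not stated here). The result lists are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists; each list adds 1 / (k + rank) to every item it contains."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for one query from two retrievers.
semantic = ["SR-1", "SR-2", "SR-3"]    # embedding similarity
graph_bfs = ["SR-3", "SR-1", "SR-4"]   # traversal from a known node
print(reciprocal_rank_fusion([semantic, graph_bfs]))  # ['SR-1', 'SR-3', 'SR-2', 'SR-4']
```

Because RRF uses only ranks, it can merge retrievers whose raw scores are on incompatible scales, such as cosine similarity and graph distance.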

Cypher Analytics

The run_cypher tool executes read-only Cypher queries with a four-stage validation pipeline:

LLM generates Cypher
        │
        ▼
┌─ Stage 1: LLM Fixups ──────────────┐
│  Smart quotes, unicode, missing    │
│  RETURN clause, direction fixes    │
└────────────────────────────────────┘
        │
        ▼
┌─ Stage 2: FalkorDB Dialect ────────┐
│  Auto-fix: date(), toLower, etc.   │
│  Reject: APOC, pattern comp., etc. │
└────────────────────────────────────┘
        │
        ▼
┌─ Stage 3: Security Whitelist ──────┐
│  Only: MATCH, WHERE, RETURN, WITH  │
│  Reject all write operations       │
└────────────────────────────────────┘
        │
        ▼
┌─ Stage 4: Safety Injection ────────┐
│  Auto-inject LIMIT, timeout        │
└────────────────────────────────────┘
        │
        ▼
   GRAPH.RO_QUERY (database-level
   read-only enforcement)

Results are returned as typed JSON envelopes (scalar, tabular, graph, or path) with metadata including row count, truncation status, execution time, and a list of any auto-corrections applied to the query.
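A simplified sketch of Stages 3 and 4 plus the auto-corrections metadata. The write-keyword check and the default LIMIT of 100 are hypothetical, and a regex check like this is easy to fool (for example, it also rejects the word SET inside a string literal), which is why database-level read-only enforcement through GRAPH.RO_QUERY remains the final guard.

```python
import re

# Hypothetical Stage 3 check: reject any write clause, case-insensitively.
WRITE_CLAUSES = re.compile(r"\b(CREATE|MERGE|DELETE|SET|REMOVE|DROP)\b", re.IGNORECASE)
DEFAULT_LIMIT = 100  # hypothetical default

def sanitize(cypher: str) -> dict:
    """Reject writes (Stage 3), inject a LIMIT if missing (Stage 4)."""
    corrections = []
    query = cypher.strip().rstrip(";")
    if WRITE_CLAUSES.search(query):
        raise ValueError("write operations are not allowed")
    if not re.search(r"\bLIMIT\s+\d+\s*$", query, re.IGNORECASE):
        query = f"{query} LIMIT {DEFAULT_LIMIT}"
        corrections.append(f"injected LIMIT {DEFAULT_LIMIT}")
    # The real envelope also carries row count, truncation, and timing.
    return {"query": query, "auto_corrections": corrections}

print(sanitize("MATCH (o:Occurrence) RETURN count(o);"))
```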

Embedded Ontology Access

Every data connector includes search_ontology and explore_ontology tools that query the ECCAIRS ontology directly. The LLM can decode aviation shorthand (e.g., "SCF-NP" → "System/Component Failure - Non-Powerplant", "ATA 36" → "Pneumatic system") without switching to a separate ontology server.

Self-Describing Connectors

Each connector advertises its domain schema in the tool metadata. For example, the Safety Recommendations connector tells the LLM:

Entity types: Aircraft, Country, Manufacturer, Occurrence,
              SafetyInvestigationAuthority, SafetyRecommendation

Edge types:   INVOLVES, ISSUES, LOCATES, MANUFACTURES, STEMS_FROM

This means the LLM knows, before making any query, that STEMS_FROM connects SafetyRecommendation to Occurrence, that ISSUES connects SafetyInvestigationAuthority to SafetyRecommendation, and so on. It can therefore write precisely filtered searches and valid Cypher queries without first probing the graph structure.

Discovery + Analytics

The combination of semantic search for discovery ("find me relevant entities") and Cypher for analytics ("count, aggregate, prove") means the LLM can handle both exploratory and computational questions through the same connector.

K. Aletheia in Action

The demo follows an analyst's actual workflow over a quarter's worth of incidents: triage the corpus, surface anomalies, narrow to one, escalate to a structured investigation, and close on a budget-bounded decision. The four blocks correspond to the analyst's mental model — "What's recurring? Why? What should we do? What can we afford?" — not to a tour of capabilities.

Each block is presenter-ready. Italics under a query describe what the system does before output appears. Admonition boxes labelled Live presenter mark moments where latency or autonomy needs framing for the audience.

Chat structure

The 21 questions run across six chats. Each boundary is a real topic shift — splitting the chats keeps each LLM context window focused on what the analyst is doing right now and avoids cross-talk between unrelated tool results.

Chat	Questions	What's in it	Why a fresh chat
1	Q1–Q4	Triage table → save → live edit → chart	The four artifacts share Q1's aggregation result; one continuous turn keeps that data in context.
2	Q5–Q6	Map + ECCAIRS jargon pivot	New visual pivots; Q6's `search_ontology` calls don't need Q1–Q4's chart context cluttering the prompt.
3	Q7–Q10	Iberia operator deep-dive (timeline → profile → cross-graph → save)	Block boundary. The two new connectors (`safety_recommendations`, `airworthiness_directives`) come into play; reset to start with a clean tool surface.
4	Q11–Q12	Network graph as independent UI artifact + save	`graph_build` returns a large structured payload; isolating it from Q7–Q10's Cypher results keeps the LLM context under the 200K-token ceiling.
5	Q13–Q17	Structured investigation: briefing → elicitation → reasoning agent → save	The briefing's elicitation flow and Phronesis investigation are one continuous conversation by design.
6	Q18–Q21	Euboulia decision pipeline (8 stages)	Decision flow is one continuous conversation by design — each pipeline stage's output feeds the next.

Inline markers in the script (🆕 New chat) flag each transition.

Block 1 — "Patterns of the quarter"

Single graph (aviation_safety). Goal: triage 2024 occurrences, surface a mosaic of anomalies, demonstrate the workspace UX (table → chart → map, save artifacts, edit live), and end on the ontology-jargon question that picks one anomaly to drill into.

Block opener. "We've ingested every 2024 occurrence into the aviation safety graph — 100 reports across the EU. Before drilling anywhere, I want to know what patterns are recurring this quarter."

Q1 — Triage table

"What recurring failure patterns do you see in our 2024 occurrences? Group by primary cause family and operator. Include incident count, severity distribution, and the airports involved."

The system runs a Cypher aggregation; the table renders inline in the chat.

Capability	What's being tested
Schema-aware aggregation	Group-by across multiple entity types
Severity bucketing	The `Aircraft Damage` and `Injuries` fields are read as ECCAIRS canonical tokens
Airport rollup	Multi-airport patterns surface in the same row

The top three patterns surface visibly: the Iberia hydraulic cluster, the Embraer ERJ-195 DMC software anomaly, and the Frankfurt EDDF autumn bird strikes.
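The shape of Q1's aggregation, sketched in Python over a few hypothetical rows rather than the real corpus: group by (cause family, operator), then collect the incident count, a severity distribution, and the set of airports. Field names and values are illustrative; in the demo the same grouping runs as a Cypher aggregation.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (cause family, operator, aircraft damage, airport).
rows = [
    ("hydraulic", "Operator A", "Minor", "Airport 1"),
    ("hydraulic", "Operator A", "None", "Airport 2"),
    ("hydraulic", "Operator A", "Minor", "Airport 3"),
    ("bird strike", "Operator B", "None", "Airport 4"),
]

groups = defaultdict(lambda: {"count": 0, "severity": Counter(), "airports": set()})
for cause, operator, damage, airport in rows:
    g = groups[(cause, operator)]
    g["count"] += 1
    g["severity"][damage] += 1
    g["airports"].add(airport)

# Largest patterns first, as in the triage table.
for (cause, operator), g in sorted(groups.items(), key=lambda kv: -kv[1]["count"]):
    print(cause, operator, g["count"], dict(g["severity"]), sorted(g["airports"]))
```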

Q2 — Save the table to the workspace

"Save that as a document called 'Recurring Patterns 2024' under a folder named '2025 Annual Review'."

A new folder appears in the workspace tree; the table renders as a saved markdown document.

📁 Workspace now contains:
- 2025 Annual Review/Recurring Patterns 2024.md

Q3 — Live edit the saved document

"Keep only the top three patterns. Drop the long tail."

The saved document re-renders in place; the workspace tree timestamp updates.

Capability	What's being tested
In-place artifact editing	The saved doc is a live artifact, not a snapshot

Q4 — Chart by operator

"Now show me a chart: severity-weighted incident frequency, top operators on the X axis, the three patterns stacked."

A bar chart renders; the chart is itself savable.

Capability	What's being tested
Workbench chart visualization	Native chart from a structured query result
Stacked-by-pattern reading	Operator + cause-family is a visual two-key cross-tab

The Iberia bar towers over the others on the hydraulic stack. Operators of Embraer ERJ-195 cluster vertically. Frankfurt EDDF arrivals cluster on the Lufthansa/Ryanair/KLM/Eurowings/Condor bars.

🆕 New chat — chat 2 of 6

Open a fresh chat in the workbench before Q5. Q1–Q4 share Q1's aggregation result and the saved triage doc; Q5's map and Q6's ontology pivot start a different visual+ontology thread, and a fresh chat keeps the LLM context focused on the geocoder + ECCAIRS lookups.

Q5 — Map the incidents

"Plot the incidents on a map, color-coded by pattern."

A map renders with markers colored by cluster; the geocoder resolves airport names to coordinates.

Capability	What's being tested
Geocoding from airport names + ICAO codes	No pre-stored coordinates — resolved via the geocoder MCP
Map UI as an independent visualization component	First-class artifact, savable

The Iberia hydraulic cluster traces an Iberian-peninsula + southern-European footprint. The ERJ-195 DMC events spread across northern Europe. The Frankfurt cluster is a single dense pin.

Q6 — ECCAIRS jargon focus pivot (showpiece)

"Any SCF-NP events on Airbus A320-family aircraft involving the hydraulic system during take-off and landing in 2024? Decode SCF-NP and the flight-phase codes against the ECCAIRS ontology. Which operator concentrates the cluster?"

The system queries the ECCAIRS ontology graph: search_ontology("SCF-NP") returns the System/Component Failure (Non-Powerplant) individual; search_ontology("Take-off"/"Landing") returns the matching Flight_phaseValue individuals. Then a Cypher query against aviation_safety filters by aircraft-type prefix and flight-phase tokens. The narrative answer is written in analyst register, with codes decoded inline.

Capability	What's being tested
Ontology-as-decoder	ECCAIRS individuals (post-PR1) are real searchable nodes; the LLM does not rely on pretraining for code decoding
Multi-attribute Cypher filter	Aircraft type + flight phase + cause family in a single aggregation
Domain register	The answer reads as a safety analyst would write it, not as a chatbot

The cluster surfaces clearly: Iberia, six A320-family hydraulic-line incidents, six airports.

Live presenter

The ontology decode is the technical showpiece of Block 1. If search_ontology returns slowly (rare; it's vector-indexed but the first call after a cold start can take ~2s), use the moment to point out that ECCAIRS as a graph means every code in the analyst's vocabulary is queryable — not just the ones the LLM was trained on.

Block close. "Iberia is the strongest signal. Let me drill into it."

Block 2 — "Iberia deep dive"

Multi-graph (aviation_safety + safety_recommendations + airworthiness_directives). Goal: build the operator picture, surface the regulatory gap, save findings, and visualize the network as an independent UI component.

Block opener. "Iberia, six hydraulic-line incidents on A320-family aircraft. I want the timeline first, then the operator's risk profile, then I'll cross to the regulators to see if anyone is covering this specific failure mode."

🆕 New chat — chat 3 of 6

Open a fresh chat in the workbench before Q7. Block 2 starts here — two new connectors (safety_recommendations, airworthiness_directives) come online; a clean chat means the LLM sees the full multi-graph tool surface without competing context from Block 1's aviation_safety-only queries.

Q7 — Operator timeline

"Show me Iberia's complete 2024 hydraulic timeline: incident, aircraft, airport, date, runway phase, severity."

A chronological table renders; one row per incident.

Capability	What's being tested
Filtered Cypher with order-by-date	A focused timeline pulled from a 100-report corpus
Multi-column projection	Each row carries six fields the analyst needs at a glance

Q8 — Operator profile

"Build Iberia's risk profile from this data: fleet composition involved, common contributing factors, maintenance organization signals, geographic spread."

The system aggregates contributing factors, maintenance references, and locations into a structured profile section. Execution is direct — the analyst does not want a briefing yet.

Capability	What's being tested
Direct execution on request	"Build" is an action verb; no briefing escalation
Multi-aspect aggregation	Profile spans four dimensions in one synthesis

Prompt-design rationale

The wording "Build Iberia's risk profile from this data" is deliberate. The system has an Operator Risk Profile briefing template; if the analyst said "Profile Iberia" alone, the assistant might escalate to briefing creation. The phrase "from this data" combined with the imperative "Build" keeps execution direct. Briefing escalation is for Block 3.

Q9 — Cross-graph SR + AD coverage

"Cross-reference the safety_recommendations and airworthiness_directives graphs: any open EASA Safety Recommendation covering hydraulic-line fatigue near mounting-bracket attachments? Any active Airworthiness Directive on A320-family hydraulic-line inspection intervals or mounting-bracket attachments?"

The system fans out two parallel queries — one against safety_recommendations, one against airworthiness_directives. Results land in a two-column comparison.

Capability	What's being tested
Cross-graph fan-out	Two graphs queried in one analyst turn
Coverage assessment	The synthesis reads as "regulation X is adjacent but doesn't cover this specific failure mode"

The result reveals the regulatory gap. SR coverage of fixed-wing A320 hydraulic-line fatigue near mounting brackets is absent (the existing hydraulic SRs are AS 350 helicopter servo-transparency cases). AD coverage is adjacent: EASA AD 2024-0097 targets A318/A319/A320/A321 main-landing-gear door actuator fittings — but not the hydraulic-line/mounting-bracket pattern Iberia is showing.
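
The fan-out in Q9 issues two independent queries at the same time and merges the results into one comparison. The sketch below shows that pattern with Python's `concurrent.futures`. The two search functions are hypothetical stubs that stand in for the connector calls. The "absent / adjacent / covered" verdict assumes each hit already carries a flag saying whether it addresses the specific failure mode; in the demo, that judgment comes from the LLM's synthesis.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(queries):
    """Run one zero-argument query callable per graph concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        futures = {graph: pool.submit(fn) for graph, fn in queries.items()}
        return {graph: fut.result() for graph, fut in futures.items()}


def coverage_verdict(hits):
    """Classify regulatory coverage of one failure mode from a graph's hits."""
    if not hits:
        return "absent"
    if any(hit["covers_failure_mode"] for hit in hits):
        return "covered"
    return "adjacent"  # related documents exist, but none targets this pattern


# Hypothetical stubs standing in for the two connector searches.
def search_safety_recommendations():
    return []


def search_airworthiness_directives():
    return [{"id": "AD-example", "covers_failure_mode": False}]


results = fan_out({
    "safety_recommendations": search_safety_recommendations,
    "airworthiness_directives": search_airworthiness_directives,
})
comparison = {graph: coverage_verdict(hits) for graph, hits in results.items()}
```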

Categorical-filter pitfall

kg_get_schema returns property names, not values. Different graphs use different vocabularies (status='Active' vs 'EFFECTIVE' vs 'Open'). Before equality-filtering on status, severity, phase, etc., the assistant calls kg_profile_graph first — every property with distinct_count < 20 returns its top values with frequencies, so the filter uses the corpus's actual canonical value. A wrong value silently zeros the result and produces a false "no coverage" conclusion.
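
Here is a minimal sketch of the profile-then-filter discipline, assuming records arrive as plain dictionaries. `profile_properties` and `canonical_value` are hypothetical helpers, not kg_profile_graph's implementation; the threshold of 20 distinct values mirrors the rule above. Case-folding only fixes spelling variants such as 'effective' versus 'EFFECTIVE'. A genuine vocabulary mismatch such as 'Active' versus 'EFFECTIVE' returns no match, and that should be reported as an unknown filter value, not read as zero results.

```python
from collections import Counter


def profile_properties(records, max_distinct=20, top_k=None):
    """Summarize low-cardinality properties: value -> frequency, most common first.

    Properties with max_distinct or more distinct values (IDs, free text) are
    skipped, mirroring the distinct_count < 20 rule. top_k=None keeps every value.
    """
    counters = {}
    for rec in records:
        for prop, val in rec.items():
            counters.setdefault(prop, Counter())[val] += 1
    return {
        prop: counts.most_common(top_k)
        for prop, counts in counters.items()
        if len(counts) < max_distinct
    }


def canonical_value(profile, prop, wanted):
    """Return the corpus's own spelling of `wanted`, or None if it is not present."""
    for value, _count in profile.get(prop, []):
        if str(value).casefold() == wanted.casefold():
            return value
    return None  # unknown value: flag it, never treat it as "zero results"


# Made-up toy corpus: 25 unique IDs, two status values.
records = [
    {"id": f"AD-{n}", "status": "SUPERSEDED" if n % 5 == 0 else "EFFECTIVE"}
    for n in range(25)
]
profile = profile_properties(records)
```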

Q10 — Save the findings

"Save these findings as 'Iberia hydraulic pattern' under the Annual Review folder."

A new document is added to the workspace.

📁 Workspace now contains:
- 2025 Annual Review/Recurring Patterns 2024.md
- 2025 Annual Review/Iberia hydraulic pattern.md

🆕 New chat — chat 4 of 6

Open a fresh chat in the workbench before Q11. graph_build returns a structured payload of nodes + edges that can run into the thousands of tokens; isolating it from Q7–Q10's Cypher results keeps the prompt under the 200K-token ceiling. A fresh chat also forces the LLM to re-read the schema before building the graph, which catches stale assumptions.

Q11 — Network graph (independent UI component)

"Show me Iberia's 2024 incident network as a graph: the operator at the center, first-degree links to aircraft and airports, second-degree to contributing factors and maintenance events."

The graph viewer opens as a separate workspace artifact. Edges are typed; the analyst can pan and zoom; clicking a node opens its details.

Capability	What's being tested
Graph-viz UI as independent component	Like the map, the graph is a first-class artifact — not a one-off chat output
Mixed-degree neighborhood	First-degree (operator → aircraft/airport) + second-degree (contributing factors, maintenance) in one render

This is the visual moment of Block 2. The graph shows the six aircraft fanning out from Iberia, the airports fanning out from each aircraft, and the contributing factors collapsing into a few common nodes — visually the systemic pattern is unmistakable.
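
The mixed-degree neighborhood is a breadth-first expansion from the operator node, cut off at depth 2. The sketch below runs it over a toy edge list whose node names and relationship types are hypothetical. A shared node, such as a common contributing factor, is discovered once but keeps every incoming edge, which is why such nodes collapse into visible hubs.

```python
from collections import deque


def neighborhood(edges, center, max_depth=2):
    """Breadth-first expansion from `center`, stopping at `max_depth` hops.

    Returns ({node: depth}, edges whose endpoints are both inside the cut).
    Edges are followed in both directions, as a layout view would.
    """
    adjacency = {}
    for src, _rel, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)

    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the requested degree
        for nbr in adjacency.get(node, []):
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)

    kept = [(s, r, d) for s, r, d in edges if s in depth and d in depth]
    return depth, kept


# Toy graph: node names and relationship types are hypothetical.
edges = [
    ("Iberia", "OPERATES", "AC-1"),
    ("Iberia", "OPERATES", "AC-2"),
    ("Iberia", "SERVES", "APT-1"),
    ("AC-1", "HAS_FACTOR", "bracket fatigue"),
    ("AC-2", "HAS_FACTOR", "bracket fatigue"),
    ("AC-2", "HAD_MAINTENANCE", "MX-7"),
    ("MX-7", "PERFORMED_BY", "MRO-X"),  # third degree: cut off
]
depth, kept = neighborhood(edges, "Iberia")
```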

Live presenter

Pause on the graph. The shared maintenance-organization node and the shared contributing-factor nodes are what the cross-graph queries (Q9) couldn't surface. Visual decomposition is the strength of an interactive graph view.

Recovery aside

If the graph viewer renders blank: refresh the workspace pane — the layout engine occasionally drops the first frame on cold load.

Q12 — Save the graph

"Save the graph in the Iberia folder."

The assistant offers two folder layouts: (a) create a new Iberia/ folder at the workspace root, or (b) save inside the existing 2025 Annual Review/ folder. The script chooses (a), so Iberia/ sits at the root alongside 2025 Annual Review/; the workspace listings below assume this layout. The assistant then writes the graph as Iberia/2024-incident-network.graph.

📁 Workspace now contains:
- 2025 Annual Review/Recurring Patterns 2024.md
- 2025 Annual Review/Iberia hydraulic pattern.md
- Iberia/2024-incident-network.graph

Block close. "I have the operator picture and the regulatory gap. Time to escalate this to a structured investigation that leaves a PROV-O audit trail."

Block 3 — "Structured investigation"

Briefing creation + reasoning agent. Goal: introduce briefings explicitly. The analyst escalates to a scoped investigation with PROV-O auditing.

Block opener. "Manual queries will only get me so far. I want a structured, auditable investigation. The graph layer should plan and execute, and the briefing keeps the scope honest."

🆕 New chat — chat 5 of 6

Open a fresh chat in the workbench before Q13. The briefing flow + Phronesis investigation that follows is one continuous conversation through Q17 — do not open another chat mid-flow. A clean starting context is essential because the elicitation cards (Q14) and the reasoning loop (Q15) both build on the briefing's state machine; carrying over Block 2's tool results would confuse the briefing extractor.

Q13 — Trigger briefing creation

"I want to launch a structured investigation. Build me a consolidated risk briefing for Iberia across all three graphs."

The assistant offers the available aviation templates as structured options. The picker shows the five aviation-profile templates (Operator Risk Profile, Fleet Defect Investigation, Regulatory Gap Analysis, Recommendation Lifecycle Trace, Airport Safety Profile) plus the generic Investigation Briefing. The analyst selects Operator Risk Profile.

Capability	What's being tested
Briefing template registry	Profile-aware: only the aviation templates, plus the generic Investigation Briefing, appear
Structured option presentation	Not free text; the picker is the right UI for the choice
Briefing-vs-direct disambiguation	The phrase "launch a structured investigation" is what triggers escalation
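
The sketch below shows profile-gated template filtering. It assumes each template carries a profile tag and that generic templates are always offered. The five aviation template names come from the Q13 picker. The `Template` class, the profile tags and the non-aviation entry are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    profile: str  # hypothetical tag: "aviation", "generic", or another domain


REGISTRY = [
    Template("Operator Risk Profile", "aviation"),
    Template("Fleet Defect Investigation", "aviation"),
    Template("Regulatory Gap Analysis", "aviation"),
    Template("Recommendation Lifecycle Trace", "aviation"),
    Template("Airport Safety Profile", "aviation"),
    Template("Investigation Briefing", "generic"),
    Template("Example Other-Domain Template", "other"),  # hypothetical, should be hidden
]


def templates_for(profile, registry=REGISTRY):
    """Templates offered in the picker: the active profile's own plus generic ones."""
    return [t.name for t in registry if t.profile in (profile, "generic")]


options = templates_for("aviation")
```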

Q14 — Elicitation walkthrough (narrative beat)

The assistant's elicitation flow runs before any heavyweight reasoning starts. Phase A is graph-grounded reconnaissance; Phase B is a sequence of structured-options cards, one section at a time. Card 1 is a free-text confirmation; each of the other cards lists two to four mutually exclusive options, and the analyst clicks the matching choice.

Phase A — Reconnaissance (no analyst input). The assistant runs semantic_search against all three graphs and explore_node on the verified Iberia node, then auto-fills the Entry Points table (Iberia → Operator, the involved aircraft types, the airport ICAOs) and the Domain Terms section (decoded ECCAIRS codes from search_ontology).

Phase B — Section cards (analyst chooses one option per card). The script's recommended choices below produce the comprehensive cross-graph briefing the rest of the demo expects.

#	Section	Question	Options	Recommended choice
1	Initial question	"Use the assistant's suggested investigation question, or enter your own?"	(free-text confirmation)	"yes, that question works"
2	Objective	"Focus on all risk dimensions, or prioritize one area?"	All risk dimensions / Maintenance-focused / Fleet-specific risk / Regulatory gap	All risk dimensions
3	Scope — temporal	"RFI mentions 2024 specifically. Should we…"	2024 data only / 2024 with historical context / Extended window (2023-2024)	2024 data only
4	Scope — geographic	"Iberia incidents at LEMD, BCN, PMI, BIO, and one diversion to LYS. Geographic scope?"	Spanish operations focus / Madrid hub-centric / All incident locations	All incident locations
5	Scope — fleet	"2024 incidents involve A320, A321, A330, A350. Fleet scope?"	All Airbus types / Narrow-body focus (A320 family) / Incident-specific aircraft	Narrow-body focus
6	Graph Strategy	"Cross-reference strategy across the three graphs?"	Exhaustive coverage / Gap detection / Lifecycle chain / Exhaustive with gap escalation	Exhaustive coverage
7	Expected Output — format	"Output format for a 7-incident risk profile?"	Risk matrix / Comprehensive report / Tabular analysis / Executive narrative	Comprehensive report
8	Expected Output — depth	"Detail level?"	Detailed (specific incidents + citations) / Comprehensive (deep-dive + cross-references)	Comprehensive
9	Expected Output — must-include	"Which elements must appear?"	Systemic clusters + maintenance signals / Fleet-specific regulatory coverage / Prioritized recommendations / All comprehensive elements	All comprehensive elements
10	Constraints	"Any specific exclusions?"	2024 scope limitation / Iberia mainline only / No specific constraints / Other constraints	No specific constraints

Phase C — Approval. The completed briefing renders with all eight sections filled (Objective, Scope, Entry Points, Domain Terms, Graph Strategy, Expected Output, Constraints, Analyst Decisions). The analyst types "approved, proceed with the investigation"; the briefing transitions Draft → IN_REVIEW → APPROVED. The assistant then asks once more "Would you like to start the investigation now?" — the analyst confirms with "yes, start the investigation now".

Capability	What's being tested
Elicitation as graph-grounded dialogue	Recon happens before asking the analyst anything
Auto-populated sections	Domain Terms decoded from the ontology graph; Entry Points verified from the connector
Cross-graph scope expansion	"Exhaustive coverage" choice (card 6) commits the briefing to query all three graphs
Confirmation gate	The agent never starts the reasoning loop without an explicit "yes, start the investigation" — the briefing-approved state is necessary but not sufficient
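
The confirmation gate has two parts. The briefing must reach APPROVED through the allowed transitions, and the analyst must separately say "start". The sketch below assumes the three states from Phase C (written here in upper case) and forward-only transitions. The class and function names are hypothetical.

```python
from enum import Enum


class BriefingState(Enum):
    DRAFT = "DRAFT"
    IN_REVIEW = "IN_REVIEW"
    APPROVED = "APPROVED"


# Forward-only transitions (an assumption; revisions back to DRAFT are not modeled).
ALLOWED = {
    BriefingState.DRAFT: {BriefingState.IN_REVIEW},
    BriefingState.IN_REVIEW: {BriefingState.APPROVED},
    BriefingState.APPROVED: set(),
}


class Briefing:
    def __init__(self):
        self.state = BriefingState.DRAFT

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state


def may_start_investigation(briefing, analyst_said_start):
    """Approval is necessary but not sufficient; an explicit 'start' is also required."""
    return briefing.state is BriefingState.APPROVED and analyst_said_start


briefing = Briefing()
briefing.transition(BriefingState.IN_REVIEW)
briefing.transition(BriefingState.APPROVED)
```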

Live presenter

The Operator Risk Profile briefing has the anchor sections the audience needs to see: Objective, Scope, Entry Points, Graph Strategy. Walk through them on screen as the elicitation completes; this is the moment the audience sees the difference between "chat with the data" and "investigate the data". The 10 cards typically take about 3 minutes. Each click triggers an update_briefing_content call, and you can point at it as on-screen evidence that the briefing document is being filled live, not buffered.

Q15 — Watch the investigation run (no prompt; agent autonomy)

The reasoning agent (Phronesis) runs autonomously. The investigation graph populates incrementally in the workspace as the planner-executor loop progresses through plan → execute → observe → converge. Tool calls (Cypher, semantic search, ontology lookups) are visible inline; nothing is silent. The status panel shows the loop state: Planning → Executing → Synthesizing → Verifying → Complete.

Capability	What's being tested
Reasoning autonomy with audit	The agent makes decisions; PROV-O records every one
Live investigation graph	The audience watches the case being built — ToolCall and Finding nodes appear in real time
Adaptive replanning	If a tool returns an unexpected result shape, the planner replans — the cycle counter increments, the investigation does not crash
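
The sketch below shows a plan, execute, observe loop in which an unexpected result shape triggers a replan instead of an exception. It is not Phronesis's implementation. The planner, the tool stubs and the "results must be a list" shape check are all hypothetical. The counter counts planning rounds, not tool calls.

```python
def run_investigation(planner, tools, max_cycles=20):
    """Plan -> execute -> observe loop with replanning on unexpected result shapes.

    `planner(findings, failed_step)` returns a list of steps, or [] to converge.
    The cycle counter counts planning rounds, not individual tool calls.
    """
    findings, failed, cycle = [], None, 0
    while cycle < max_cycles:
        cycle += 1
        plan = planner(findings, failed)        # plan
        if not plan:
            break                               # converge: nothing left to do
        failed = None
        for step in plan:                       # execute
            result = tools[step["tool"]](**step.get("args", {}))
            if not isinstance(result, list):    # observe: unexpected shape
                failed = step                   # abandon this plan and replan
                break
            findings.extend(result)
    return findings, cycle


# Hypothetical planner and tool stubs for illustration.
def toy_planner(findings, failed):
    if failed is not None:
        return [{"tool": "search", "args": {"query": "hydraulic"}}]  # fall back
    if not findings:
        return [{"tool": "run_cypher", "args": {"query": "MATCH (n) RETURN n"}}]
    return []


toy_tools = {
    "run_cypher": lambda query: {"error": "unexpected shape"},  # not a list
    "search": lambda query: ["finding: shared contributing factor"],
}
findings, cycles = run_investigation(toy_planner, toy_tools)
```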

Live presenter

For the Operator Risk Profile briefing the investigation typically runs 3–5 minutes and executes 10–20 autonomous tool calls (mostly run_cypher against aviation_safety, with search against safety_recommendations/airworthiness_directives for cross-graph coverage). The cycle counter (e.g. "Planning (cycle 8)") reflects replans, not raw step count — a healthy investigation may go through 15–20 cycles. Don't interrupt it. The pace is part of the value: this is what "think for me" looks like — measured, inspectable, with intermediate observations recorded.

Operational rule

Phronesis caches MCP sessions to each connector at startup. If a connector is restarted while Phronesis is up, its session goes stale (the connector returns "Unknown tool" on the first call). Whenever a connector restarts, restart aletheia-reasoning too before starting an investigation. Automatic reconnection on a "Session terminated" error is a known follow-up in aletheia/discovery/client.py.
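
The auto-reconnect follow-up could take roughly this shape. On a stale-session error, the client discards the cached session, reopens it and retries the call once. This is a hypothetical, synchronous sketch. The session interface (`call_tool(name, args)`), the error markers and the fake connector are illustrative assumptions. They do not reproduce the MCP SDK's client API or the code in aletheia/discovery/client.py.

```python
STALE_MARKERS = ("Unknown tool", "Session terminated")  # assumed error texts


def is_stale(exc):
    return any(marker in str(exc) for marker in STALE_MARKERS)


class ReconnectingClient:
    """Caches one session; on a stale-session error, reopens it and retries once."""

    def __init__(self, open_session):
        self._open_session = open_session
        self._session = open_session()
        self.reconnects = 0

    def call_tool(self, name, args):
        try:
            return self._session.call_tool(name, args)
        except Exception as exc:
            if not is_stale(exc):
                raise                                    # genuine errors propagate
            self._session = self._open_session()         # drop the cached session
            self.reconnects += 1
            return self._session.call_tool(name, args)   # single retry


# Fake connector for illustration: a restart bumps its generation, which
# invalidates sessions opened before the restart.
class FakeConnector:
    def __init__(self):
        self.generation = 0


class FakeSession:
    def __init__(self, connector):
        self.connector = connector
        self.generation = connector.generation

    def call_tool(self, name, args):
        if self.generation != self.connector.generation:
            raise RuntimeError(f"Unknown tool: {name}")
        return {"tool": name, "ok": True}


connector = FakeConnector()
client = ReconnectingClient(lambda: FakeSession(connector))
connector.generation += 1  # simulate a connector restart
result = client.call_tool("run_cypher", {"query": "RETURN 1"})
```

The single retry matters. An "Unknown tool" error caused by a genuinely wrong tool name still surfaces on the second attempt instead of looping.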

Q16 — Show the investigation graph

"Show me the investigation graph."

The PROV-O investigation graph renders as an independent UI component, like the network graph in Block 2 but scoped to the run.

Capability	What's being tested
PROV-O investigation graph	Each plan/execute/observe step is a node; their wiring is the audit trail
Replay-ready	The analyst can step through the investigation post-hoc
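
The sketch below shows how investigation steps can be wired into a PROV-O audit trail. It uses only standard vocabulary terms: `prov:Activity`, `prov:Entity`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasInformedBy` and `prov:value`. The recorder class, the `urn:investigation:` IRI scheme and the plain-tuple triple list are hypothetical; a production system might use an RDF library instead. Replay walks the `prov:wasInformedBy` links back from the last step.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class ProvRecorder:
    """Record tool calls as prov:Activity and findings as prov:Entity triples."""

    def __init__(self, run_id):
        self.base = f"urn:investigation:{run_id}:"  # hypothetical IRI scheme
        self.triples = []
        self._last_activity = None
        self._counter = 0

    def _mint(self, kind):
        self._counter += 1
        return f"{self.base}{kind}/{self._counter}"

    def tool_call(self, tool_name, used=()):
        act = self._mint("toolcall")
        self.triples.append((act, RDF_TYPE, PROV + "Activity"))
        self.triples.append((act, RDFS_LABEL, tool_name))
        for entity in used:                       # inputs this step consumed
            self.triples.append((act, PROV + "used", entity))
        if self._last_activity is not None:       # ordering between steps
            self.triples.append((act, PROV + "wasInformedBy", self._last_activity))
        self._last_activity = act
        return act

    def finding(self, activity, text):
        ent = self._mint("finding")
        self.triples.append((ent, RDF_TYPE, PROV + "Entity"))
        self.triples.append((ent, PROV + "wasGeneratedBy", activity))
        self.triples.append((ent, PROV + "value", text))
        return ent

    def replay(self):
        """Recover step order by walking prov:wasInformedBy back from the last step."""
        informed_by = {s: o for s, p, o in self.triples if p == PROV + "wasInformedBy"}
        chain, act = [], self._last_activity
        while act is not None:
            chain.append(act)
            act = informed_by.get(act)
        return chain[::-1]


rec = ProvRecorder("demo")
step1 = rec.tool_call("run_cypher")
fact = rec.finding(step1, "shared maintenance organization")
step2 = rec.tool_call("search", used=[fact])
```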

Q17 — Save the briefing and investigation

"Save the briefing and the investigation results in the Iberia folder."

Two artifacts are written to the existing Iberia/ folder (created in Q12). The assistant generates each as a markdown file: the briefing as the structured Operator Risk Profile document; the investigation results as the synthesized analyst-facing report (Executive Summary, Cluster analyses, Regulatory Gap, Recommendations, Claims).

📁 Workspace now contains:
- 2025 Annual Review/Recurring Patterns 2024.md
- 2025 Annual Review/Iberia hydraulic pattern.md
- Iberia/2024-incident-network.graph
- Iberia/Iberia Risk Profile Briefing 2024.md
- Iberia/Iberia 2024 Consolidated Risk Assessment.md

Block close. "The investigation surfaced an open EASA Safety Recommendation directly applicable to the maintenance-audit gap, plus two more from the broader regulatory inventory that several operators in our 2024 corpus haven't yet implemented. Now I need to decide which we can fund."

Block 4 — "Decision: which recommendations do we fund?"

Decision agent (Euboulia). Goal: turn the analyst's accumulated portfolio (gap-analysis findings + EASA-derived recommendations) into a budget-bounded action plan.

Block opener. "From triage and the Iberia deep dive I have a candidate portfolio of safety actions, plus the EASA recommendations the investigation surfaced. Five-million-euro annual budget. Let me frame this for the decision agent."

🆕 New chat — chat 6 of 6

Open a fresh chat in the workbench before Q18. The Euboulia 8-stage decision pipeline that follows is one continuous conversation through Q21 — do not open another chat mid-pipeline. A clean starting context is essential because each pipeline stage's output is the next stage's input (analyzer → extractor → formulator → coder → assembler → executor → verifier → interpreter); contaminating it with Block 3's Phronesis state would derail the formulation.

Q18 — Frame the decision

"We have a candidate portfolio of safety actions: the Iberia maintenance audit, the ERJ-195 DMC software bulletin push, the Frankfurt 25L wildlife management upgrade, and three EASA-derived recommendations from the gap analysis. Each has an estimated implementation cost and an expected risk reduction. Help me decide which to implement under our €5M annual safety improvement budget — maximize total risk reduction."

Euboulia (the decision agent) is invoked. The 8-stage pipeline begins: analyzer → extractor → formulator → coder → assembler → executor → verifier → interpreter. The first three stages are gated by short approval prompts ("Does this problem statement capture what you're trying to solve?", "Are the extracted parameters correct and complete?", "Are the constraints and objective correctly identified?") — the analyst clicks through with Yes, proceed, Parameters look good, Constraints and objective are correct.

The analyst supplies the per-action input data when the executor stage asks for it:

#	Action	Cost (€)	Risk reduction
1	Iberia maintenance audit	1,200,000	100
2	ERJ-195 DMC software bulletin push	350,000	25
3	Frankfurt 25L wildlife management upgrade	2,800,000	60
4	EASA recommendation #1 (from gap analysis)	600,000	40
5	EASA recommendation #2 (from gap analysis)	1,500,000	50
6	EASA recommendation #3 (from gap analysis)	450,000	20

MaxBudget = 5,000,000. Total cost if all six were selected: €6.9M — the budget binds, so the optimizer must choose. (Risk reduction is a dimensionless score; the units are arbitrary as long as they are consistent across rows.)

Capability	What's being tested
Decision agent invocation	A natural-language portfolio question becomes a formal optimization problem
Inline parameter extraction	The parameters (Cost, RiskReduction, MaxBudget) are surfaced in Stage 2 (extractor) for analyst review before solving
Stage gating with cheap approvals	Each early stage has a one-click confirmation; the analyst can revise before the solver runs

Q19 — Inspect the formulation

"Walk me through the formulation: what's the objective function, what are the constraints, what are the decision variables?"

The decision agent renders its problem as KaTeX — visible math, not a black box. The Problem panel header classifies it as BIP — Binary Integer Program, 1 binary variable, 2 constraints, 4 parameters (N, Cost, RiskReduction, MaxBudget), solver CVXPY (CBC backend).

The formulation is a 0-1 knapsack:

Decision variables: $\text{Select}_i \in \{0, 1\}$ for $i \in \{1, \ldots, N\}$, one per candidate ($N = 6$ here); each action is either selected or not.
Objective: $\max \sum_{i=1}^{N} \text{RiskReduction}_i \cdot \text{Select}_i$
Budget constraint: $\sum_{i=1}^{N} \text{Cost}_i \cdot \text{Select}_i \leq \text{MaxBudget}$, with $\text{MaxBudget} = 5{,}000{,}000$.

The formulate_clause stage emits each clause with a confidence score (5/5 expected for this problem). After the analyst accepts ("Model is correct"), the coder stage generates the CVXPY Python script, which the analyst can inspect under the Code tab before it runs.
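
With six candidates, the knapsack is small enough to solve by trying all $2^6 = 64$ selections. The sketch below uses that exhaustive search as an independent cross-check of the formulation against the Q18 data. It is not the CVXPY script the coder stage generates. Exhaustive search grows as $2^N$, so for larger portfolios a MILP solver such as the CBC backend named above is the appropriate tool.

```python
from itertools import product

# Q18 input data, in table order (Iberia audit ... EASA recommendation #3).
COST = [1_200_000, 350_000, 2_800_000, 600_000, 1_500_000, 450_000]
RISK_REDUCTION = [100, 25, 60, 40, 50, 20]
MAX_BUDGET = 5_000_000


def solve_knapsack(cost, value, budget):
    """Exhaustive 0-1 knapsack: evaluate every binary selection vector."""
    best_value, best_select = -1, None
    for select in product((0, 1), repeat=len(cost)):
        if sum(c * s for c, s in zip(cost, select)) > budget:
            continue  # violates the budget constraint
        total = sum(v * s for v, s in zip(value, select))
        if total > best_value:
            best_value, best_select = total, select
    return best_value, best_select


print(solve_knapsack(COST, RISK_REDUCTION, MAX_BUDGET))  # (235, (1, 1, 0, 1, 1, 1))
```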

Capability	What's being tested
Transparent LP/MILP	The optimization is visible — the analyst sees what's being solved
Decision-as-conversation	Formulation review is an ordinary turn, not a hidden step
Per-clause confidence	The formulator publishes its uncertainty so low-confidence clauses can be revised before the solver runs

Live presenter

The KaTeX rendering is the audience-facing proof that this is real optimization, not LLM hand-waving. Pause briefly on the math, then click the Code tab — the generated CVXPY script is also visible. The audience sees the entire chain: prose → math → code → solver result.

Q20 — Run the optimization

"Run the optimization. Show me the optimal portfolio with the rationale for each item picked and rejected."

Stages 6–8 of the pipeline run: executor solves the BIP via CVXPY in a sandboxed subprocess, verifier confirms the solution against the constraints, interpreter renders the analyst-facing artifact. The Solution panel header reads optimal; the Objective Value reads 235.

The optimal portfolio for the Q18 input data:

#	Action	Cost (€)	Risk reduction	Decision
1	Iberia maintenance audit	1,200,000	100	✅ Selected
2	ERJ-195 DMC software bulletin push	350,000	25	✅ Selected
3	Frankfurt 25L wildlife management upgrade	2,800,000	60	❌ Rejected
4	EASA recommendation #1	600,000	40	✅ Selected
5	EASA recommendation #2	1,500,000	50	✅ Selected
6	EASA recommendation #3	450,000	20	✅ Selected

Portfolio summary: 5 of 6 actions implemented · total cost €4.1M of €5M (82% utilization, €900K slack) · total risk reduction 235 units · 1/1 constraints verified.

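Why Frankfurt 25L was excluded: the other five cost €4.1M together and deliver risk reduction 235, which leaves €0.9M of budget, and at €2.8M Frankfurt does not fit in that slack. Making room for it means dropping selected items worth at least 70 risk-reduction points (the cheapest option is EASA #2 plus #3, which frees €1.95M). That is more than the 60 points Frankfurt would add, so the best portfolio that includes Frankfurt scores 225, not 235. The interpreter renders this rationale inline in plain English. It is derived from the optimizer's solution values (the selected variables and the constraint slack), not from an LLM rationalization.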

Capability	What's being tested
End-to-end BIP via Euboulia	Eight pipeline stages, visible per-stage progress
Verifier as a separate stage	The solver's optimality claim is checked against the constraints before the interpreter renders the artifact
Per-item rationale in plain English	The output is operationally readable, not an LP solver dump
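
A verifier needs only the claimed solution and the problem data. It checks that every variable is binary, recomputes cost and objective, and confirms the budget holds. The sketch below does this with the Q18 data. The helper names are hypothetical, and the rejection messages are derived from the remaining slack alone.

```python
COST = [1_200_000, 350_000, 2_800_000, 600_000, 1_500_000, 450_000]
RISK_REDUCTION = [100, 25, 60, 40, 50, 20]
MAX_BUDGET = 5_000_000


def verify(select, cost, value, budget):
    """Re-check a claimed solution independently of the solver that produced it."""
    if any(s not in (0, 1) for s in select):
        raise ValueError("decision variables must be binary")
    total_cost = sum(c * s for c, s in zip(cost, select))
    return {
        "feasible": total_cost <= budget,
        "objective": sum(v * s for v, s in zip(value, select)),
        "total_cost": total_cost,
        "slack": budget - total_cost,
        "utilization": total_cost / budget,
    }


def rejection_reasons(select, cost, budget):
    """For each rejected item, say whether it would even fit in the leftover budget."""
    slack = budget - sum(c * s for c, s in zip(cost, select))
    return {
        i: ("fits in slack" if cost[i] <= slack
            else f"exceeds slack by {cost[i] - slack:,}")
        for i, s in enumerate(select)
        if s == 0
    }


claimed = (1, 1, 0, 1, 1, 1)  # the portfolio reported in Q20
report = verify(claimed, COST, RISK_REDUCTION, MAX_BUDGET)
reasons = rejection_reasons(claimed, COST, MAX_BUDGET)
```

Index 2 is Frankfurt 25L. Its reason comes out as "exceeds slack by 1,900,000", which matches €2.8M against the €0.9M left in the budget.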

Q21 — Save the decision

"Save the decision file as '2026 Safety Investment Plan' in the Annual Review folder."

The decision is persisted to the Annual Review folder as a structured document. This is the artifact the analyst hands to leadership.

📁 Workspace at the end of the demo:
- 2025 Annual Review/Recurring Patterns 2024.md
- 2025 Annual Review/Iberia hydraulic pattern.md
- 2025 Annual Review/2026 Safety Investment Plan.md
- Iberia/2024-incident-network.graph
- Iberia/Iberia Risk Profile Briefing 2024.md
- Iberia/Iberia 2024 Consolidated Risk Assessment.md

Block close. "That's the case for the budget meeting. The triage gave me the population. The Iberia investigation gave me the audit trail. The decision agent gave me the portfolio. The system did not decide — I did. But every step is reproducible."

L. Summary

The demo follows an analyst's arc: triage → deep dive → structured investigation → decision. Each block adds capabilities; the cumulative work product is a folder of saved artifacts.

Capabilities by block

Block	Graphs	Capabilities introduced
1 — Patterns of the quarter	aviation_safety	Triage aggregation, workspace doc save + live edit, chart, map, ECCAIRS jargon decode via the searchable ontology
2 — Iberia deep dive	+ safety_recommendations + airworthiness_directives	Operator timeline + profile (direct execution, not briefing), cross-graph fan-out, regulatory gap detection, network graph as independent UI
3 — Structured investigation	three graphs + ontology	Briefing template registry (profile-gated), elicitation flow (recon → one-question-at-a-time → entry-point verification → auto-decoded domain terms → cross-graph detection → approval), reasoning-agent autonomy with PROV-O audit, investigation graph as independent UI
4 — Decision: which recommendations do we fund?	three graphs + ontology + decision agent	LP/MILP optimization via Euboulia, KaTeX-rendered formulation, per-item rationale, decision file as the closing artifact

The arc

Block 1: "Find me something." Block 2: "Connect these." Block 3: "Investigate this." Block 4: "Decide for me."

The system did not decide. The analyst did. But every step is reproducible from saved artifacts and PROV-O records.

Workspace at the end

2025 Annual Review/
├── Recurring Patterns 2024.md
├── Iberia hydraulic pattern.md
└── 2026 Safety Investment Plan.md
Iberia/
├── 2024-incident-network.graph
├── Iberia Risk Profile Briefing 2024.md
└── Iberia 2024 Consolidated Risk Assessment.md

That's the case for the budget meeting.