Aviation Safety - Technical Reference¶

This page documents the technical implementation of the aviation safety use case.

Architecture¶

use_cases/aviation_safety/
├── __init__.py                    # Use case registration
├── parser.py                      # Markdown incident parser
├── episode_builder.py             # Episode text builder
├── data/                          # Markdown incident reports
│   ├── incident_2024_0157.md
│   ├── incident_2024_0289.md
│   └── ...
├── ontology/
│   ├── aviation_safety.ttl        # Basic ontology
│   └── eccairs_aviation.ttl       # ECCAIRS-derived (optional)
├── eccairs_taxonomy/              # ECCAIRS XML parser
│   └── eccairs_parser.py
├── evaluation_questions.json      # Full evaluation set (50q)
└── evaluation_questions_curated.json  # Curated set (35q)

Components¶

Parser¶

The AviationSafetyParser parses structured markdown incident reports:

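# use_cases/aviation_safety/parser.py
from collections.abc import Iterator
from pathlib import Path

# BaseParser and the private helpers used below are omitted from this excerpt.
class AviationSafetyParser(BaseParser):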
    """Parser for aviation safety markdown incident reports."""

    def parse_all(self) -> Iterator[IncidentRecord]:
        """Parse all markdown files in data directory."""
        for file in self.data_dir.glob("*.md"):
            yield self.parse_file(file)

    def parse_file(self, path: Path) -> IncidentRecord:
        """Parse a single incident report."""
        content = path.read_text()
        sections = self._split_sections(content)

        return IncidentRecord(
            id=self._extract_id(sections),
            date=self._extract_field(sections, "Metadata", "Date"),
            location=self._extract_field(sections, "Metadata", "Location"),
            aircraft=self._parse_aircraft(sections),
            description=sections.get("Incident Description", ""),
            findings=self._parse_findings(sections),
            # ...
        )
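
The parser excerpt calls a `_split_sections` helper that is not shown. As a rough illustration only, the sketch below assumes that sections are delimited by `## ` headings and that the helper returns a mapping from heading text to body text. The function name `split_sections` and the sample report are hypothetical, and the real implementation may differ.

```python
import re

# Matches second-level headings only ("## Metadata"), not "#" or "###".
HEADING = re.compile(r"^##\s+(.+?)\s*$")


def split_sections(content: str) -> dict[str, str]:
    """Map each '## Heading' to the text beneath it (hypothetical helper)."""
    sections: dict[str, str] = {}
    current = None
    buffer: list[str] = []
    for line in content.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section before starting a new one.
            if current is not None:
                sections[current] = "\n".join(buffer).strip()
            current = match.group(1)
            buffer = []
        elif current is not None:
            buffer.append(line)
    if current is not None:
        sections[current] = "\n".join(buffer).strip()
    return sections


# Made-up sample report, for illustration only.
sample = (
    "# Incident Report 2024-0157-EU\n\n"
    "## Metadata\n- **Date**: 2024-02-15\n\n"
    "## Incident Description\nNarrative text.\n"
)
print(split_sections(sample))
```

Under these assumptions, `sections.get("Incident Description", "")` in `parse_file` returns the narrative text, or an empty string when the section is absent.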

Data Models¶

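from dataclasses import dataclass, field

# Outcome, WeatherConditions, and SafetyRecommendation are omitted from this excerpt.
# In the module, the nested dataclasses must be defined before IncidentRecord,
# because default_factory=AircraftInfo is evaluated when the class is created.
@dataclass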
class IncidentRecord:
    """Aviation safety incident record."""
    id: str
    date: str = ""
    time: str = ""
    location: str = ""
    country: str = ""
    flight_phase: str = ""
    aircraft: AircraftInfo = field(default_factory=AircraftInfo)
    description: str = ""
    outcome: Outcome = field(default_factory=Outcome)
    weather: WeatherConditions = field(default_factory=WeatherConditions)
    findings: Findings = field(default_factory=Findings)
    safety_recommendations: list[SafetyRecommendation] = field(default_factory=list)

@dataclass
class AircraftInfo:
    """Aircraft information from incident report."""
    aircraft_type: str = ""
    registration: str = ""
    operator: str = ""

@dataclass
class Findings:
    """Incident findings and analysis."""
    primary_cause: str = ""
    contributing_factors: list[str] = field(default_factory=list)
    human_factors: str = ""
    wildlife: dict[str, str] = field(default_factory=dict)

Episode Builder¶

The episode builder converts parsed records to markdown for Graphiti:

# use_cases/aviation_safety/episode_builder.py
def build_episode_content(record: IncidentRecord) -> str:
    """Build markdown episode from incident record."""
    lines = [
        f"# Aviation Safety Occurrence: {record.id}",
        "",
        "## Occurrence Metadata",
        f"- **Occurrence ID**: {record.id}",
        f"- **Date**: {record.date}",
        f"- **Location**: {record.location}",
        f"- **Country**: {record.country}",
        f"- **Flight Phase**: {record.flight_phase}",
    ]

    if record.aircraft.aircraft_type:
        lines.extend([
            "",
            "## Aircraft",
            f"- **Type**: {record.aircraft.aircraft_type}",
            f"- **Registration**: {record.aircraft.registration}",
            f"- **Operator**: {record.aircraft.operator}",
        ])

    lines.extend([
        "",
        "## Occurrence Description",
        record.description,
    ])

    if record.findings.primary_cause:
        lines.extend([
            "",
            "## Findings",
            "### Primary Cause",
            record.findings.primary_cause,
        ])

    return "\n".join(lines)

Episode Output Example¶

# Aviation Safety Occurrence: 2024-0412-EU

## Occurrence Metadata
- **Occurrence ID**: 2024-0412-EU
- **Date**: 2024-03-22
- **Location**: En route, 45 nm west of Barcelona (LEBL)
- **Country**: Spain
- **Flight Phase**: Cruise

## Aircraft
- **Type**: Embraer ERJ-195
- **Registration**: CS-TTW
- **Operator**: TAP Air Portugal

## Occurrence Description
The aircraft encountered severe clear air turbulence at FL380 without
prior warning. Two cabin crew members sustained minor injuries...

## Findings
### Primary Cause
Unpredicted clear air turbulence associated with jetstream boundary

Registration¶

# use_cases/aviation_safety/__init__.py
from .parser import AviationSafetyParser, IncidentRecord
from aletheia.core.ontology import GenericOntologyLoader
from aletheia.core.episodes import register_episode_builder
from .episode_builder import build_episode_content

Parser = AviationSafetyParser
Ontology = GenericOntologyLoader

DATA_DIR = "use_cases/aviation_safety/data"
ONTOLOGY_DIR = "use_cases/aviation_safety/ontology"

register_episode_builder(
    "aviation_safety",
    build_episode_content,
    source_description="Aviation safety data",
)

Data Pipeline¶

Markdown Incident Reports
         │
         ▼ [AviationSafetyParser]
    IncidentRecord objects
         │
         ▼ [Episode Builder]
    Markdown episodes
         │
         ▼ [Graphiti]
    Knowledge Graph
         │
         ├──► Nodes: Occurrence, Aircraft, Airport, Operator, Manufacturer, Country
         └──► Edges: HAS_AIRCRAFT, HAS_OPERATOR, HAS_AIRPORT, LOCATED_IN, MANUFACTURED_BY

Graph Schema¶

Node Types¶

Type	Count	Description
Occurrence	10	Aviation incidents
Aircraft	11	Aircraft involved
Airport	10	Locations
Operator	9	Airlines
Country	5	Countries
Manufacturer	5	Aircraft makers
Episodic	10	Source episodes

Edge Types¶

Type	Count	Description
MENTIONS	60	Episode → Entity
RELATES_TO	15	General relationships
HAS_OPERATOR	10	Occurrence → Operator
LOCATED_IN	10	Airport → Country
HAS_AIRPORT	10	Occurrence → Airport
HAS_AIRCRAFT	10	Occurrence → Aircraft
MANUFACTURED_BY	10	Aircraft → Manufacturer

ECCAIRS Taxonomy¶

Converting ECCAIRS XML to OWL¶

python use_cases/aviation_safety/eccairs_taxonomy/eccairs_parser.py \
  "use_cases/aviation_safety/eccairs_taxonomy/Eccairs Aviation 7.0.0.1.xml" \
  -o use_cases/aviation_safety/ontology/eccairs_aviation.ttl \
  -v

Parser Features¶

The ECCAIRSTaxonomyParser converts ECCAIRS XML (UTF-16) to OWL:

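import xml.etree.ElementTree as ET
from collections.abc import Iterator
from pathlib import Path

class ECCAIRSTaxonomyParser:
    """Convert ECCAIRS XML taxonomy to OWL ontology."""

    def parse(self) -> Iterator[ECCAIRSEntity]: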
        """Parse ECCAIRS XML file."""
        tree = ET.parse(self.xml_path)
        root = tree.getroot()

        for entity in root.findall(".//ENTITY"):
            yield ECCAIRSEntity(
                id=entity.get("Id"),
                name=entity.get("Description"),
                attributes=self._parse_attributes(entity),
            )

    def to_ttl(self, output_path: Path) -> None:
        """Generate OWL ontology."""
        # Creates owl:Class for each ENTITY
        # Creates owl:DatatypeProperty for attributes
        # Creates owl:ObjectProperty for references

Generated Ontology¶

@prefix eccairs: <http://eccairs.jrc.ec.europa.eu/ontology#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

eccairs:Occurrence a owl:Class ;
    rdfs:label "Occurrence" ;
    rdfs:comment "An aviation safety occurrence" .

eccairs:Aircraft a owl:Class ;
    rdfs:label "Aircraft" ;
    rdfs:comment "An aircraft involved in an occurrence" .

eccairs:hasAircraft a owl:ObjectProperty ;
    rdfs:domain eccairs:Occurrence ;
    rdfs:range eccairs:Aircraft .

Markdown Format¶

Required Sections¶

Section	Required	Content
`# Incident Report {ID}`	Yes	Title with ID
`## Metadata`	Yes	Date, location, flight phase
`## Aircraft`	Yes	Type, registration, operator
`## Incident Description`	Yes	Narrative text
`## Outcome`	No	Injuries, damage
`## Weather Conditions`	No	Visibility, wind
`## Findings`	No	Primary cause, contributing factors
`## Safety Recommendations`	No	EASA recommendations

Field Extraction¶

Fields are extracted from bullet points:

## Metadata
- **Date**: 2024-02-15
- **Location**: Paris CDG (LFPG)

Parser regex:

pattern = r'\*\*([^*]+)\*\*:\s*(.+)'
# Captures: ("Date", "2024-02-15")
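
The following sketch applies the pattern above to the sample `## Metadata` bullets. The helper name `extract_fields` is hypothetical; only the regular expression comes from this page.

```python
import re

# Same pattern as above: bold key, colon, then the rest of the line as value.
FIELD_PATTERN = re.compile(r'\*\*([^*]+)\*\*:\s*(.+)')


def extract_fields(section_text: str) -> dict[str, str]:
    """Collect '- **Key**: value' bullets into a dict (hypothetical helper)."""
    return {
        key.strip(): value.strip()
        for key, value in FIELD_PATTERN.findall(section_text)
    }


metadata = """- **Date**: 2024-02-15
- **Location**: Paris CDG (LFPG)"""

fields = extract_fields(metadata)
print(fields)
# {'Date': '2024-02-15', 'Location': 'Paris CDG (LFPG)'}
```

One caveat with this pattern: `\s*` also matches newlines, so a bullet with an empty value would capture the following line as its value.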

Evaluation Questions¶

Question Format¶

{
  "questions": [
    {
      "id": "av_q1",
      "question": "What caused incident 2024-0157-EU?",
      "answer": "Hydraulic pump failure due to manufacturing defect",
      "type": "cause_lookup"
    }
  ]
}
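
A question file in this format can be loaded and sanity-checked with a few lines of Python. This is a sketch under the assumption that every question carries the four keys shown above; `load_questions` and `REQUIRED_KEYS` are hypothetical names, not part of the project.

```python
import json
from collections import Counter

# Assumed to be mandatory, based on the example above.
REQUIRED_KEYS = {"id", "question", "answer", "type"}


def load_questions(raw_json: str) -> list[dict]:
    """Parse the question file and reject entries with missing keys."""
    questions = json.loads(raw_json)["questions"]
    for question in questions:
        missing = REQUIRED_KEYS - question.keys()
        if missing:
            raise ValueError(
                f"Question {question.get('id', '?')} is missing: {sorted(missing)}"
            )
    return questions


raw = """
{
  "questions": [
    {
      "id": "av_q1",
      "question": "What caused incident 2024-0157-EU?",
      "answer": "Hydraulic pump failure due to manufacturing defect",
      "type": "cause_lookup"
    }
  ]
}
"""

questions = load_questions(raw)
type_counts = Counter(question["type"] for question in questions)
print(type_counts)
```

In practice, `raw` would be read from `evaluation_questions.json` or `evaluation_questions_curated.json`.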

Question Type Distribution (Curated)¶

Type	Count	%
cause_lookup	10	28.6%
incident_at_location	8	22.9%
entity_description	6	17.1%
aircraft_lookup	5	14.3%
operator_lookup	4	11.4%
recommendation_lookup	2	5.7%
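
The percentages in the table follow directly from the counts. The short check below recomputes them; the variable names are illustrative.

```python
# Counts copied from the table above.
CURATED_COUNTS = {
    "cause_lookup": 10,
    "incident_at_location": 8,
    "entity_description": 6,
    "aircraft_lookup": 5,
    "operator_lookup": 4,
    "recommendation_lookup": 2,
}

total = sum(CURATED_COUNTS.values())
shares = {
    question_type: round(100 * count / total, 1)
    for question_type, count in CURATED_COUNTS.items()
}

print(total)   # 35
print(shares)
```

The total matches the 35 questions of the curated set.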

Configuration¶

Environment Variables¶

# Database
FALKORDB_HOST=localhost
FALKORDB_PORT=6379

# LLM
OPENAI_API_KEY=sk-...

# Embeddings (optional)
EMBEDDING_PROVIDER=local
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
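
As an illustration of how these variables might be consumed, the sketch below reads them into a small settings object. The `Settings` class and `load_settings` function are hypothetical, and the fallback values for the database simply mirror the example above; the project's actual defaults may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed above."""
    falkordb_host: str
    falkordb_port: int
    openai_api_key: Optional[str]
    embedding_provider: Optional[str]
    embedding_model: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment (or from a mapping, for testing)."""
    env = os.environ if env is None else env
    return Settings(
        # Fallbacks are assumptions taken from the example values.
        falkordb_host=env.get("FALKORDB_HOST", "localhost"),
        falkordb_port=int(env.get("FALKORDB_PORT", "6379")),
        openai_api_key=env.get("OPENAI_API_KEY"),
        # Embedding settings are optional, so they may be None.
        embedding_provider=env.get("EMBEDDING_PROVIDER"),
        embedding_model=env.get("EMBEDDING_MODEL"),
    )


settings = load_settings({"FALKORDB_PORT": "6380", "EMBEDDING_PROVIDER": "local"})
print(settings)
```

Passing a plain mapping instead of `os.environ` keeps the function easy to test without touching the process environment.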

Extending the Use Case¶

Adding New Incidents¶

1. Create a markdown file that follows the format described under "Markdown Format".
2. Name it incident_YYYY_NNNN.md.
3. Include all required sections.
4. Rebuild the graph with --reset.
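
Before rebuilding, a new report can be checked against the naming rule and the required sections. The sketch below is one possible pre-flight check, not part of the project; `check_incident` is a hypothetical helper, and the required headings are taken from the "Required Sections" table.

```python
import re

# incident_YYYY_NNNN.md, e.g. incident_2024_0157.md
FILENAME_PATTERN = re.compile(r"^incident_\d{4}_\d{4}\.md$")
REQUIRED_SECTIONS = ("## Metadata", "## Aircraft", "## Incident Description")


def check_incident(filename: str, content: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks valid."""
    problems = []
    if not FILENAME_PATTERN.match(filename):
        problems.append(f"unexpected file name: {filename}")
    lines = [line.strip() for line in content.splitlines()]
    if not any(line.startswith("# Incident Report ") for line in lines):
        problems.append("missing title: # Incident Report {ID}")
    for heading in REQUIRED_SECTIONS:
        if heading not in lines:
            problems.append(f"missing section: {heading}")
    return problems


# Made-up draft that lacks the Aircraft section.
draft = "# Incident Report 2024-0157-EU\n## Metadata\n## Incident Description\n"
print(check_incident("incident_2024_0157.md", draft))
# ['missing section: ## Aircraft']
```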

Custom Entity Extraction¶

Modify the episode builder to emit additional fields, so that they appear in the episode text and can be extracted as entities:

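def build_episode_content(record: IncidentRecord) -> str:
    lines: list[str] = []  # ... existing sections, built as in the Episode Builder section ...

    # Add custom entity extraction
    if record.findings.wildlife:
        lines.append(f"- **Bird Species**: {record.findings.wildlife.get('species')}")

    return "\n".join(lines)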

Adding ECCAIRS Attributes¶

1. Generate a new ontology from the ECCAIRS XML.
2. Load the ontology graph.
3. Rebuild the knowledge graph in graph-hybrid mode.