Understanding Ontologies
An ontology is a formal representation of knowledge within a domain. It defines the types of entities that exist, their properties, and the relationships between them.
What is an Ontology?
In the context of knowledge graphs, an ontology serves as a schema or blueprint that describes:
- Classes: The types of entities (e.g., Person, Organization, Aircraft)
- Properties: Attributes of entities (e.g., name, date, location)
- Relationships: How entities connect to each other (e.g., WORKS_FOR, LOCATED_IN)
- Constraints: Rules about valid combinations (e.g., a Person can have only one birth date)
Ontology (Schema)            Knowledge Graph (Data)
─────────────────            ─────────────────────
Person                  →    John Smith
  - name: string               - name: "John Smith"
  - birthDate: date            - birthDate: 1985-03-15

Organization            →    Acme Corp
  - name: string               - name: "Acme Corp"
  - founded: date              - founded: 2010-01-01

WORKS_FOR               →    John Smith ──WORKS_FOR──► Acme Corp
  - since: date                - since: 2020-06-01
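The split between schema and data can be sketched in a few lines of Python. This is only an illustration, not a real ontology library: the `ONTOLOGY` dictionary and the `validate` helper are hypothetical names, and the property types simply mirror the diagram.

```python
from datetime import date

# Hypothetical in-memory ontology: class name -> {property name: expected type}.
ONTOLOGY = {
    "Person": {"name": str, "birthDate": date},
    "Organization": {"name": str, "founded": date},
    "WORKS_FOR": {"since": date},
}


def validate(class_name, properties):
    """Return a list of problems found when checking data against the schema."""
    schema = ONTOLOGY.get(class_name)
    if schema is None:
        return [f"unknown class: {class_name}"]
    problems = []
    for key, value in properties.items():
        if key not in schema:
            problems.append(f"unknown property: {key}")
        elif not isinstance(value, schema[key]):
            problems.append(f"{key} should be {schema[key].__name__}")
    return problems


john = {"name": "John Smith", "birthDate": date(1985, 3, 15)}
print(validate("Person", john))
print(validate("Person", {"name": "John Smith", "age": 40}))
```

The ontology says what a valid `Person` looks like; the knowledge graph holds individual records such as `john` that are checked against it.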
Why Ontologies Matter
1. Consistency
Without an ontology, the same concept might be extracted differently:
| Without Ontology | With Ontology |
|---|---|
| "company", "firm", "business", "corp" | Organization |
| "works at", "employed by", "staff of" | WORKS_FOR |
| "located in", "based in", "HQ in" | LOCATED_IN |
An ontology ensures that semantically equivalent concepts map to the same canonical type.
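One simple way to picture this mapping is a synonym table that is inverted into a lookup. The sketch below assumes the surface forms from the table above; `CANONICAL_TYPES` and `canonicalize` are hypothetical names, and a real extraction pipeline would likely rely on an LLM or a classifier rather than exact string matches.

```python
from typing import Optional

# Hypothetical synonym table built from the examples in the table above.
CANONICAL_TYPES = {
    "Organization": {"company", "firm", "business", "corp"},
    "WORKS_FOR": {"works at", "employed by", "staff of"},
    "LOCATED_IN": {"located in", "based in", "hq in"},
}

# Invert to surface form -> canonical type for direct lookup.
SURFACE_TO_TYPE = {
    surface: canonical
    for canonical, surfaces in CANONICAL_TYPES.items()
    for surface in surfaces
}


def canonicalize(term: str) -> Optional[str]:
    """Map an extracted term to its canonical type, or None if it is unknown."""
    return SURFACE_TO_TYPE.get(term.strip().lower())


print(canonicalize("Firm"))
print(canonicalize("employed by"))
```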
2. Interoperability
Ontologies enable data from different sources to be integrated:
Source A: "Boeing 737-800"    ─┐
Source B: "B738"              ─┼──► Aircraft (type: Boeing 737-800)
Source C: "737-800 aircraft"  ─┘
When multiple systems use the same ontology, their data shares the same types and relationship names, so it can be combined without a separate schema-mapping step.
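As a rough sketch of the integration step, records from several sources can be grouped under one canonical aircraft type. The alias table and `merge_sources` function are hypothetical; labels that have no known mapping are set aside instead of being guessed at.

```python
from collections import defaultdict

# Hypothetical alias table: source-specific designators -> canonical aircraft type.
AIRCRAFT_ALIASES = {
    "boeing 737-800": "Boeing 737-800",
    "b738": "Boeing 737-800",
    "737-800 aircraft": "Boeing 737-800",
}


def merge_sources(records):
    """Group (source, raw_label) pairs by canonical aircraft type."""
    merged = defaultdict(list)
    for source, raw_label in records:
        canonical = AIRCRAFT_ALIASES.get(raw_label.lower())
        if canonical is None:
            # Unmapped labels are kept apart rather than guessed at.
            merged["UNRESOLVED"].append((source, raw_label))
        else:
            merged[canonical].append((source, raw_label))
    return dict(merged)


records = [
    ("Source A", "Boeing 737-800"),
    ("Source B", "B738"),
    ("Source C", "737-800 aircraft"),
]
print(merge_sources(records))
```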
3. Domain Expertise Capture
Ontologies encode expert knowledge about a domain:
- Aviation: An "Occurrence" involves Aircraft, happens at an Airport, has a Flight Phase
- Financial Crime: A "Sanction" targets an Entity, issued by an Authority, has a Program ID
- Healthcare: A "Diagnosis" relates to a Patient, made by a Physician, has an ICD code
This expertise guides extraction and ensures domain-relevant relationships are captured.
4. Query Intelligence
With an ontology, systems can understand that:
- A query for "airlines" should match entities of type Operator
- A query for "plane crashes" relates to Occurrence with certain Events
- A query for "sanctioned companies" means Organization with SANCTION relationship
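A toy version of this query interpretation might look like the following. Everything here is an assumption made for illustration: the entity ids, the `QUERY_RULES` table, and the choice to model SANCTION as an edge pointing from the issuing authority to the sanctioned entity.

```python
# Hypothetical toy graph: entity id -> type, plus (subject, relation, object) edges.
ENTITIES = {
    "acme": "Organization",
    "globex": "Organization",
    "skyways": "Operator",
    "authority_1": "Authority",
}
EDGES = [("authority_1", "SANCTION", "acme")]

# Hypothetical query readings: phrase -> (entity type, required incoming relation).
QUERY_RULES = {
    "airlines": ("Operator", None),
    "sanctioned companies": ("Organization", "SANCTION"),
}


def search(phrase):
    """Return sorted entity ids matching the ontology-level reading of a phrase."""
    entity_type, relation = QUERY_RULES[phrase]
    matches = {eid for eid, etype in ENTITIES.items() if etype == entity_type}
    if relation is not None:
        # Keep only entities that are the target of the required relation.
        targeted = {obj for _, rel, obj in EDGES if rel == relation}
        matches &= targeted
    return sorted(matches)


print(search("airlines"))
print(search("sanctioned companies"))
```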
Ontology vs Schema vs Taxonomy
| Concept | Definition | Example |
|---|---|---|
| Taxonomy | Hierarchical classification | Aircraft → Commercial → Wide-body → Boeing 787 |
| Schema | Data structure definition | {name: string, date: date} |
| Ontology | Formal knowledge model with reasoning | Aircraft subClassOf Vehicle; if X manufactured Y, then Y hasManufacturer X |
An ontology is the most expressive, supporting:
- Inheritance: A CommercialAircraft inherits properties from Aircraft
- Inference: If A is part of B, and B is located in C, then A is located in C
- Constraints: An Occurrence must have exactly one primary_cause
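These three capabilities can each be sketched in plain Python. The hierarchy, property names, and helper functions below are hypothetical; OWL reasoners implement the same ideas far more generally.

```python
# Hypothetical class hierarchy and per-class property declarations.
SUBCLASS_OF = {"CommercialAircraft": "Aircraft", "Aircraft": "Vehicle"}
DECLARED_PROPERTIES = {
    "Vehicle": {"manufacturer"},
    "Aircraft": {"registration"},
    "CommercialAircraft": {"seat_count"},
}


def inherited_properties(cls):
    """Inheritance: collect properties declared on a class and all its ancestors."""
    props = set()
    while cls is not None:
        props |= DECLARED_PROPERTIES.get(cls, set())
        cls = SUBCLASS_OF.get(cls)
    return props


def infer_located_in(part_of, located_in):
    """Inference: A partOf B and B locatedIn C implies A locatedIn C (to a fixed point)."""
    facts = set(located_in)
    changed = True
    while changed:
        changed = False
        for a, b in part_of:
            for b2, c in list(facts):
                if b == b2 and (a, c) not in facts:
                    facts.add((a, c))
                    changed = True
    return facts


def check_exactly_one(record, field):
    """Constraint: the field must hold exactly one value."""
    return len(record.get(field, [])) == 1


print(inherited_properties("CommercialAircraft"))
print(infer_located_in({("terminal_b", "jfk")}, {("jfk", "new_york")}))
print(check_exactly_one({"primary_cause": ["engine failure"]}, "primary_cause"))
```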
Standard Ontology Formats
OWL (Web Ontology Language)
The W3C standard language for ontologies. OWL documents are commonly serialized as Turtle (.ttl), as in this example:
@prefix ex: <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Classes
ex:Aircraft a owl:Class ;
    rdfs:label "Aircraft" ;
    rdfs:comment "A vehicle capable of flight" .

ex:Operator a owl:Class ;
    rdfs:label "Operator" ;
    rdfs:comment "An organization that operates aircraft" .

# Object property linking an aircraft to its operator
ex:operatedBy a owl:ObjectProperty ;
    rdfs:domain ex:Aircraft ;
    rdfs:range ex:Operator .
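Note that in RDFS and OWL, `rdfs:domain` and `rdfs:range` drive inference and do not reject data: using `ex:operatedBy` between two resources entails that the subject is an `ex:Aircraft` and the object an `ex:Operator`. The sketch below applies that entailment to plain tuples. The `PROPERTY_SIGNATURES` table, the `infer_types` helper, and the sample resources are hypothetical. No RDF library is used, to keep the example self-contained.

```python
# Domain and range of each object property, mirroring the Turtle above.
PROPERTY_SIGNATURES = {
    "ex:operatedBy": ("ex:Aircraft", "ex:Operator"),
}


def infer_types(triples):
    """Apply RDFS domain/range entailment: derive rdf:type facts from property use."""
    inferred = set()
    for subject, predicate, obj in triples:
        signature = PROPERTY_SIGNATURES.get(predicate)
        if signature is None:
            continue  # property not declared in this toy ontology
        domain, range_ = signature
        inferred.add((subject, "rdf:type", domain))
        inferred.add((obj, "rdf:type", range_))
    return inferred


# Hypothetical instance data.
data = [("ex:N12345", "ex:operatedBy", "ex:SkyWays")]
for fact in sorted(infer_types(data)):
    print(fact)
```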
FollowTheMoney (FTM)
A schema designed for investigative journalism and anti-corruption work:
Person:
  properties:
    - name
    - birthDate
    - nationality
Organization:
  properties:
    - name
    - jurisdiction
    - registrationNumber
Ownership:
  properties:
    - owner: Person | Organization
    - asset: Organization
    - percentage
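The `owner: Person | Organization` line is a typed role: only certain entity types may fill it. A minimal check of that idea is sketched below. It does not use the followthemoney library or its API. The `OWNERSHIP_ROLES` table and `validate_ownership` helper are hypothetical, and the 0 to 100 range for `percentage` is an assumption.

```python
# Allowed entity types for each role of an Ownership, per the simplified schema above.
OWNERSHIP_ROLES = {
    "owner": {"Person", "Organization"},
    "asset": {"Organization"},
}


def validate_ownership(ownership, entity_types):
    """Check role types and the (assumed) percentage range for one Ownership record."""
    errors = []
    for role, allowed in OWNERSHIP_ROLES.items():
        entity_id = ownership.get(role)
        actual = entity_types.get(entity_id)
        if actual not in allowed:
            errors.append(f"{role} must be one of {sorted(allowed)}, got {actual}")
    percentage = ownership.get("percentage")
    if percentage is not None and not 0 <= percentage <= 100:
        errors.append("percentage must be between 0 and 100")
    return errors


# Hypothetical entities: id -> type.
entity_types = {"p1": "Person", "o1": "Organization"}
print(validate_ownership({"owner": "p1", "asset": "o1", "percentage": 51}, entity_types))
print(validate_ownership({"owner": "o1", "asset": "p1"}, entity_types))
```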
Domain-Specific Ontologies
Different domains have established ontologies:
| Domain | Ontology | Description |
|---|---|---|
| Aviation Safety | ECCAIRS | EU standard for occurrence reporting |
| Financial Crime | FollowTheMoney | Investigative journalism standard |
| Biomedicine | SNOMED CT | Clinical terminology |
| Geography | GeoNames | Place names and relationships |
| General | Schema.org | Web-wide entity types |
Using established ontologies provides:
- Standardization: Industry-accepted terminology
- Completeness: Years of domain expert refinement
- Interoperability: Data exchange with other systems
Aletheia's Ontology Processing
Aletheia's GenericOntologyLoader processes ontologies with several capabilities beyond basic OWL parsing:
- Transitive class classification — Determines whether each class is an entity type, relationship type, or abstract class by checking the full ancestry chain (not just direct parents).
- Non-reified relationship extraction — Object properties between entity classes are extracted as relationship types alongside reified relationship classes.
- ModelingProfile support — Optional explicit classification hints that override heuristic rules, useful when an ontology's structure doesn't fit the default patterns.
- Enriched docstrings — Property names and descriptions from the ontology are appended to entity type docstrings, giving the LLM concrete extraction signal.
See Integration for the full workflow.
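To make the idea of transitive classification with explicit overrides concrete, here is a small sketch. It is not Aletheia's implementation and does not use the GenericOntologyLoader API: the class names, the root markers, and the `classify` function are all hypothetical, and the `overrides` argument only stands in for the role a ModelingProfile plays.

```python
# Hypothetical parent links; a class may have several parents.
PARENTS = {
    "Airline": ["Operator"],
    "Operator": ["Entity"],
    "Ownership": ["Interest"],
    "Interest": ["Relationship"],
    "Thing": [],
}
# Hypothetical root markers used by the heuristic.
ENTITY_ROOTS = {"Entity"}
RELATIONSHIP_ROOTS = {"Relationship"}


def ancestors(cls):
    """Return every ancestor of cls, following the full chain (cycle-safe)."""
    seen = set()
    stack = list(PARENTS.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


def classify(cls, overrides=None):
    """Classify as 'entity', 'relationship', or 'abstract'; explicit overrides win."""
    if overrides and cls in overrides:
        return overrides[cls]
    lineage = ancestors(cls) | {cls}
    if lineage & RELATIONSHIP_ROOTS:
        return "relationship"
    if lineage & ENTITY_ROOTS:
        return "entity"
    return "abstract"


print(classify("Airline"))
print(classify("Ownership"))
print(classify("Thing", overrides={"Thing": "entity"}))
```

Checking only direct parents would leave `Airline` and `Ownership` unclassified here, since neither names a root class as its immediate parent.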
Next Steps
- Business Value of Ontologies — ROI and practical benefits
- Aletheia Integration — How Aletheia uses ontologies for extraction
- Use case specific ontologies: