Reasoning
DAG ID: reason_koncludix
Schedule: Manual trigger only
File: dags/reason_koncludix.py
Reasoner: Konclude, driven through the Koncludix wrapper.
What It Does
Performs OWL 2 DL reasoning over the MSE-KG to materialise implicit knowledge that is only derivable from the ontological axioms defined in MWO, NFDIcore, and BFO. The reasoner computes the deductive closure of the ABox with respect to the TBox, generating inferred class assertions, subclass relationships, property assertions, and inverse-property entailments that are not explicitly stated in the input graph but are logically entailed by the ontology.
The pipeline uses Konclude, a high-performance OWL 2 reasoner that supports the $\mathcal{SROIQ}(\mathcal{D})$ description logic. Konclude is invoked through the Koncludix Python wrapper, which runs a small set of SPARQL extraction jobs against Konclude (classes, object/datatype properties, sub-property hierarchies, class assertions) and recombines the XML results into a single Turtle file of inferences.
Why Konclude?
Konclude is a fast, optimisation-focused OWL 2 reasoner. The previous pipeline used Openllet and Sunlet variants; both have been retired in favour of Konclude, which is now the default reasoner for the core pipeline and all harvesters. The retired DAG scripts (reason-spreadsheets.py, reason_openlletnew.py) are kept on disk for the reproducibility of older releases but are not part of the production pipeline.
Task Chain
graph LR
A["init_data_dir"] --> B["pre_filter<br/>━━━━━━━━━━━<br/>ROBOT remove"]
A --> R["retrieve_nfdicore_extension"]
B --> M["merge_expand<br/>━━━━━━━━━━━<br/>ROBOT merge + expand"]
R --> M
M --> C["reasoning<br/>━━━━━━━━━━━<br/>Konclude (Koncludix)"]
C --> E["mark_reason_success"]
style A fill:#e8eaf6,stroke:#283593
style B fill:#fff3e0,stroke:#e65100
style R fill:#fff3e0,stroke:#e65100
style M fill:#ede7f6,stroke:#4527a0
style C fill:#e3f2fd,stroke:#1565c0
style E fill:#e8f5e9,stroke:#2e7d32
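
The dependencies in the diagram can also be written down as plain data and checked for a valid execution order. The following is a minimal sketch using only the standard library, not the DAG definition itself: the task names are taken from the diagram, while `DEPENDENCIES` and `execution_waves` are hypothetical names introduced for illustration.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (read off the diagram).
DEPENDENCIES = {
    "init_data_dir": set(),
    "pre_filter": {"init_data_dir"},
    "retrieve_nfdicore_extension": {"init_data_dir"},
    "merge_expand": {"pre_filter", "retrieve_nfdicore_extension"},
    "reasoning": {"merge_expand"},
    "mark_reason_success": {"reasoning"},
}


def execution_waves(dependencies):
    """Group tasks into waves; tasks within one wave have no mutual dependency."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(number, ", ".join(wave))
```

The second wave contains both `pre_filter` and `retrieve_nfdicore_extension`, which reflects that the two tasks can run in parallel before `merge_expand` joins them.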
Step 1: Axiom Pre-Filtering
Before reasoning, the pipeline uses ROBOT to remove axioms that make reasoning intractable and terms that are deprecated:
robot remove --input input.ttl \
--term http://purl.obolibrary.org/obo/RO_0000057 \
--axioms SubPropertyChainOf \
remove \
--term http://purl.obolibrary.org/obo/BFO_0000118 \
--term http://purl.obolibrary.org/obo/BFO_0000181 \
--term http://purl.obolibrary.org/obo/BFO_0000138 \
--term http://purl.obolibrary.org/obo/BFO_0000136 \
--output filtered.ttl
| Removed Term | Reason |
|---|---|
| RO_0000057 (SubPropertyChainOf only) | Property chain axioms on has_participant cause reasoning complexity explosion |
| BFO_0000118 | Deprecated BFO class |
| BFO_0000181 | Deprecated BFO class |
| BFO_0000138 | Deprecated BFO class |
| BFO_0000136 | Deprecated BFO class |
Why filter before reasoning?
SubPropertyChainOf axioms on has_participant (RO_0000057) interact with the large ABox to produce a combinatorial explosion in reasoning time. Removing these chain axioms preserves the core semantics while keeping reasoning tractable. Deprecated BFO terms are removed to prevent spurious inferences from obsolete class definitions.
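
For illustration, the chained `robot remove` call shown above could be assembled from Python roughly as follows. This is a sketch under assumptions: the function names and the way the DAG actually invokes ROBOT are hypothetical, and only the flags and term IRIs come from the command above.

```python
import subprocess

OBO = "http://purl.obolibrary.org/obo/"
HAS_PARTICIPANT = OBO + "RO_0000057"
DEPRECATED_BFO_TERMS = [
    OBO + "BFO_0000118",
    OBO + "BFO_0000181",
    OBO + "BFO_0000138",
    OBO + "BFO_0000136",
]


def build_remove_command(robot, input_ttl, output_ttl):
    """Return the argument list for the two chained `remove` invocations."""
    # First `remove`: drop only the property-chain axioms on has_participant.
    cmd = [
        robot, "remove", "--input", input_ttl,
        "--term", HAS_PARTICIPANT,
        "--axioms", "SubPropertyChainOf",
        "remove",  # second chained `remove`: the deprecated BFO terms
    ]
    for term in DEPRECATED_BFO_TERMS:
        cmd += ["--term", term]
    cmd += ["--output", output_ttl]
    return cmd


def run_pre_filter(robot, input_ttl, output_ttl):
    """Run the filter and raise CalledProcessError if ROBOT exits non-zero."""
    subprocess.run(build_remove_command(robot, input_ttl, output_ttl), check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems with the IRIs, and `check=True` makes a failed ROBOT run fail the task instead of passing silently.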
Step 2: Merge with NFDIcore Extension
In parallel to the pre-filter step, the pipeline fetches the current NFDIcore extension ontology from the URL stored in the Airflow Variable nfdicore_extension and writes it next to the filtered input. ROBOT is then used to merge the filtered MSE-KG with the NFDIcore extension and to expand any macro axioms:
robot merge \
--input spreadsheets-filtered.ttl \
--input nfdicore-extension.owl \
expand --annotate-expansion-axioms true \
--output spreadsheets-expanded.ttl
The expanded file is the input that Konclude consumes.
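
A rough sketch of this step in Python, assuming the extension is fetched with a plain HTTP download and ROBOT is started as a subprocess. The helper names are hypothetical, and the lookup of the `nfdicore_extension` Airflow Variable is left to the caller, so the URL is passed in as an argument.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_extension(url, dest):
    """Download the extension ontology from `url` to `dest` and return the path."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def build_merge_expand_command(robot, filtered_ttl, extension_owl, expanded_ttl):
    """Return the argument list for the chained `merge` and `expand` commands."""
    return [
        robot, "merge",
        "--input", str(filtered_ttl),
        "--input", str(extension_owl),
        "expand", "--annotate-expansion-axioms", "true",
        "--output", str(expanded_ttl),
    ]
```

The resulting list can be handed to `subprocess.run(..., check=True)` once both inputs exist on disk.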
Step 3: Konclude Reasoning via Koncludix
The reasoner is invoked through the Koncludix Python wrapper:
from common.koncludix import koncludix
koncludix(
    binary="/opt/Konclude/Binaries/Konclude",
    input_file="spreadsheets-expanded.ttl",
    output_file="spreadsheets_inferences.ttl",
    work_dir="./koncludix",
)
The wrapper drives Konclude through a small set of SPARQL extraction jobs and merges the per-job XML results into one Turtle file containing only the inferred axioms.
Extracted Axiom Types
| Axiom Type | Description | Example |
|---|---|---|
| ClassAssertion | Inferred rdf:type statements | A person bearing an AgentRole is inferred to be an Agent |
| SubClassOf | Inferred class subsumption | ArtificialIntelligence ⊑ ComputerScience |
| SubPropertyOf | Inferred property hierarchies | Specialised participation relations |
| PropertyAssertion | Inferred object/data property values | Inverse of participates_in yields has_participant |
| InverseProperties | Materialised inverse property pairs | RO_0000056 ↔ RO_0000057 |
Why these axiom types?
This selection covers the axioms needed for SPARQL query answering: class assertions enable ?x a ?Class patterns, property assertions enable ?x ?prop ?y traversals, and subsumption enables hierarchical queries. Axiom types like DisjointClasses or EquivalentClasses are not extracted because they are schema-level (TBox) axioms that are already present in the input ontology.
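
To make the idea of materialisation concrete, the toy sketch below derives two of the inference kinds listed above (inverse-property assertions and class assertions via subsumption) from a handful of triples. It is a deliberately naive fixpoint loop for illustration only, not how Konclude works internally; the `ex:` identifiers and the function name are hypothetical, and only the RO_0000056/RO_0000057 pair comes from the table.

```python
RDF_TYPE = "rdf:type"


def materialise(asserted, inverse_of, sub_class_of):
    """Return triples entailed by the two toy rules but not already asserted."""
    closure = set(asserted)
    changed = True
    while changed:  # repeat until no rule adds a new triple
        changed = False
        for s, p, o in list(closure):
            derived = set()
            if p in inverse_of:  # p(s, o) entails inverse(p)(o, s)
                derived.add((o, inverse_of[p], s))
            if p == RDF_TYPE and o in sub_class_of:  # s : C and C ⊑ D entail s : D
                derived.add((s, RDF_TYPE, sub_class_of[o]))
            if not derived <= closure:
                closure |= derived
                changed = True
    return closure - set(asserted)


asserted = {
    ("ex:alice", "obo:RO_0000056", "ex:experiment1"),  # participates_in
    ("ex:alice", RDF_TYPE, "ex:Researcher"),
}
inverse_of = {
    "obo:RO_0000056": "obo:RO_0000057",
    "obo:RO_0000057": "obo:RO_0000056",
}
sub_class_of = {"ex:Researcher": "ex:Agent"}

inferred = materialise(asserted, inverse_of, sub_class_of)
```

Here `inferred` holds only the new statements, mirroring the fact that the inferences file contains inferred axioms rather than a copy of the input.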
The downstream pipeline validates that the produced file is well-formed Turtle (not accidentally RDF/XML) by inspecting the file header.
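
The exact check is not reproduced here. A header inspection of this kind might look like the following heuristic sketch, which assumes the inferences file starts with prefix declarations; `looks_like_turtle` and `sniff_bytes` are hypothetical names.

```python
TURTLE_DIRECTIVES = ("@prefix", "@base", "prefix", "base")


def looks_like_turtle(path, sniff_bytes=2048):
    """Heuristically decide from the first bytes whether a file is Turtle.

    Returns False for an XML header (RDF/XML) and for empty files.
    """
    with open(path, "rb") as handle:
        head = handle.read(sniff_bytes)
    text = head.decode("utf-8", errors="replace").lstrip("\ufeff")
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and Turtle comments
        if stripped.startswith(("<?xml", "<rdf:RDF")):
            return False  # RDF/XML saved under a .ttl name
        # The first meaningful line decides: a directive means Turtle.
        return stripped.lower().startswith(TURTLE_DIRECTIVES)
    return False
```

A Turtle file that uses only full IRIs and no directives would be rejected by this sketch, so a stricter pipeline would parse the file with an RDF library instead.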
Input
| Source | Description |
|---|---|
| matwerk_sharedfs | Shared filesystem path |
| matwerk_last_successful_merge_run | Source directory (if source_run_dir not in conf) |
| nfdicore_extension | URL of the NFDIcore extension OWL file |
| koncludebin | Path to the Konclude executable |
| robotcmd | Path to the ROBOT executable |
Conf parameters (from triggering DAG or UI):
| Parameter | Default | Description |
|---|---|---|
| artifact | spreadsheets | Name of the artifact being reasoned |
| in_ttl | spreadsheets_asserted.ttl | Input TTL filename |
| source_run_dir | (from Variable) | Custom source directory |
| target_run_dir | (auto-created) | Custom target directory |
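
The defaulting rules in the table can be summarised as a small function. This is a sketch, not the DAG's code: `resolve_conf` is a hypothetical helper, `variables` stands in for the Airflow Variable store as a plain dict, and a `target_run_dir` of `None` is assumed to mean "create one automatically".

```python
def resolve_conf(conf, variables):
    """Apply the documented defaults to a (possibly empty) conf dict."""
    conf = conf or {}
    return {
        "artifact": conf.get("artifact", "spreadsheets"),
        "in_ttl": conf.get("in_ttl", "spreadsheets_asserted.ttl"),
        # Fall back to the last successful merge run when no directory is given.
        "source_run_dir": conf.get("source_run_dir")
        or variables["matwerk_last_successful_merge_run"],
        # None signals that the caller should create a fresh run directory.
        "target_run_dir": conf.get("target_run_dir"),
    }
```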
Output
| Output | Description |
|---|---|
| {artifact}-filtered.ttl | Pre-processed TTL (problematic axioms removed) |
| {artifact}-expanded.ttl | Merged with NFDIcore extension and ROBOT-expanded |
| {artifact}_inferences.ttl | Konclude reasoning output, materialised as Turtle |
Variables set on success:
- matwerk_last_successful_reason_run (if artifact is spreadsheets)
- matwerk_last_successful_reason_run__{artifact} (always)
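
The naming rule for these Variables can be expressed as a short helper. This is a sketch: `success_variables` is a hypothetical function, and storing the run directory as the value is an assumption made by analogy with matwerk_last_successful_merge_run.

```python
def success_variables(artifact, run_dir):
    """Return the Variable names to update on success, mapped to their value."""
    # The per-artifact Variable is always written.
    updates = {f"matwerk_last_successful_reason_run__{artifact}": run_dir}
    # The unsuffixed Variable is reserved for the spreadsheets artifact.
    if artifact == "spreadsheets":
        updates["matwerk_last_successful_reason_run"] = run_dir
    return updates
```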
Downstream
No DAG is triggered automatically. Trigger validation_checks manually after this DAG succeeds.