Reasoning

DAG ID: reason_koncludix
Schedule: Manual trigger only
File: dags/reason_koncludix.py
Reasoner: Konclude, driven through the Koncludix wrapper.

What It Does

Performs OWL 2 DL reasoning over the MSE-KG to materialise implicit knowledge that is only derivable from the ontological axioms defined in MWO, NFDIcore, and BFO. The reasoner computes the deductive closure of the ABox with respect to the TBox, generating inferred class assertions, subclass relationships, property assertions, and inverse-property entailments that are not explicitly stated in the input graph but are logically entailed by the ontology.

The pipeline uses Konclude, a high-performance OWL 2 reasoner that supports the $\mathcal{SROIQ}(\mathcal{D})$ description logic. Konclude is invoked through the Koncludix Python wrapper, which drives Konclude through a small set of SPARQL extraction jobs (classes, object/datatype properties, sub-property hierarchies, class assertions) and recombines the XML results into a single inferences Turtle file.

Why Konclude?

Konclude is a fast, optimisation-focused OWL 2 reasoner. The previous pipeline used Openllet and Sunlet variants; both have been retired in favour of Konclude, which is now the default reasoner for the core pipeline and all harvesters. The retired DAG scripts (reason-spreadsheets.py, reason_openlletnew.py) are kept on disk for the reproducibility of older releases but are not part of the production pipeline.

Task Chain

graph LR
    A["init_data_dir"] --> B["pre_filter<br/>━━━━━━━━━━━<br/>ROBOT remove"]
    A --> R["retrieve_nfdicore_extension"]
    B --> M["merge_expand<br/>━━━━━━━━━━━<br/>ROBOT merge + expand"]
    R --> M
    M --> C["reasoning<br/>━━━━━━━━━━━<br/>Konclude (Koncludix)"]
    C --> E["mark_reason_success"]

    style A fill:#e8eaf6,stroke:#283593
    style B fill:#fff3e0,stroke:#e65100
    style R fill:#fff3e0,stroke:#e65100
    style M fill:#ede7f6,stroke:#4527a0
    style C fill:#e3f2fd,stroke:#1565c0
    style E fill:#e8f5e9,stroke:#2e7d32
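
The dependencies in the diagram above can be checked with a small standalone sketch. It is not the DAG definition itself: the task names are transcribed from the diagram, while `UPSTREAM` and `execution_order` are hypothetical names introduced here for illustration.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start,
# transcribed from the edges of the task-chain diagram.
UPSTREAM = {
    "init_data_dir": set(),
    "pre_filter": {"init_data_dir"},
    "retrieve_nfdicore_extension": {"init_data_dir"},
    "merge_expand": {"pre_filter", "retrieve_nfdicore_extension"},
    "reasoning": {"merge_expand"},
    "mark_reason_success": {"reasoning"},
}


def execution_order(upstream):
    """Return one valid linear execution order for the task graph."""
    return list(TopologicalSorter(upstream).static_order())


order = execution_order(UPSTREAM)
print(" -> ".join(order))
```

Only `pre_filter` and `retrieve_nfdicore_extension` are free to run in either order (or in parallel); every other task has a single valid position.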

Step 1: Axiom Pre-Filtering

Before reasoning, the pipeline uses ROBOT to remove axioms that cause reasoning difficulties or are deprecated:

robot remove --input input.ttl \
  --term http://purl.obolibrary.org/obo/RO_0000057 \
  --axioms SubPropertyChainOf \
  remove \
  --term http://purl.obolibrary.org/obo/BFO_0000118 \
  --term http://purl.obolibrary.org/obo/BFO_0000181 \
  --term http://purl.obolibrary.org/obo/BFO_0000138 \
  --term http://purl.obolibrary.org/obo/BFO_0000136 \
  --output filtered.ttl

| Removed Term | Reason |
|---|---|
| RO_0000057 (SubPropertyChainOf only) | Property chain axioms on has_participant cause reasoning complexity explosion |
| BFO_0000118 | Deprecated BFO class |
| BFO_0000181 | Deprecated BFO class |
| BFO_0000138 | Deprecated BFO class |
| BFO_0000136 | Deprecated BFO class |

Why filter before reasoning?

SubPropertyChainOf axioms on has_participant (RO_0000057) interact with the large ABox to produce a combinatorial explosion in reasoning time. Removing these chain axioms preserves the core semantics while keeping reasoning tractable. Deprecated BFO terms are removed to prevent spurious inferences from obsolete class definitions.
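
As a rough illustration, the chained `robot remove` call from this step could be assembled in Python as an argument list. This is a sketch, not the code in `dags/reason_koncludix.py`: the function name `build_remove_command` and its parameters are hypothetical, and only the terms and flags come from the command shown in this step.

```python
OBO = "http://purl.obolibrary.org/obo/"

# Deprecated BFO terms listed in the pre-filter step.
DEPRECATED_BFO_TERMS = ["BFO_0000118", "BFO_0000181", "BFO_0000138", "BFO_0000136"]


def build_remove_command(robot, input_ttl, output_ttl):
    """Assemble the chained `robot remove` call as an argument list."""
    # First `remove`: drop only the SubPropertyChainOf axioms on RO_0000057.
    cmd = [
        robot, "remove", "--input", input_ttl,
        "--term", OBO + "RO_0000057",
        "--axioms", "SubPropertyChainOf",
        "remove",  # second chained `remove`: drop the deprecated BFO terms
    ]
    for term in DEPRECATED_BFO_TERMS:
        cmd += ["--term", OBO + term]
    cmd += ["--output", output_ttl]
    return cmd


command = build_remove_command("robot", "input.ttl", "filtered.ttl")
print(" ".join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing a list rather than a shell string avoids quoting problems when a run directory contains spaces.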

Step 2: Merge with NFDIcore Extension

In parallel to the pre-filter step, the pipeline fetches the current NFDIcore extension ontology from the URL stored in the Airflow Variable nfdicore_extension and writes it next to the filtered input. ROBOT is then used to merge the filtered MSE-KG with the NFDIcore extension and to expand any macro axioms:

robot merge \
  --input spreadsheets-filtered.ttl \
  --input nfdicore-extension.owl \
  expand --annotate-expansion-axioms true \
  --output spreadsheets-expanded.ttl

The expanded file is the input that Konclude consumes.
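
A minimal sketch of this step, under stated assumptions: `fetch_extension` and `build_merge_expand_command` are hypothetical helper names, the download uses only the standard library, and in the real DAG the URL would come from the Airflow Variable `nfdicore_extension` rather than a local file URI.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_extension(url, filtered_ttl, name="nfdicore-extension.owl"):
    """Download the extension ontology into the directory of the filtered input."""
    target = Path(filtered_ttl).parent / name
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


def build_merge_expand_command(robot, filtered_ttl, extension, expanded_ttl):
    """Assemble the `robot merge ... expand` call as an argument list."""
    return [
        robot, "merge",
        "--input", str(filtered_ttl),
        "--input", str(extension),
        "expand", "--annotate-expansion-axioms", "true",
        "--output", str(expanded_ttl),
    ]


# Demo with a local file URI standing in for the real extension URL.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.owl"
    source.write_bytes(b"<rdf:RDF/>")
    run_dir = Path(tmp) / "run"
    run_dir.mkdir()
    filtered = run_dir / "spreadsheets-filtered.ttl"
    extension = fetch_extension(source.as_uri(), filtered)
    fetched_name = extension.name
    fetched_bytes = extension.read_bytes()
    command = build_merge_expand_command(
        "robot", filtered.name, extension.name, "spreadsheets-expanded.ttl"
    )
```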

Step 3: Konclude Reasoning via Koncludix

The reasoner is invoked through the Koncludix Python wrapper:

from common.koncludix import koncludix

koncludix(
    binary       = "/opt/Konclude/Binaries/Konclude",
    input_file   = "spreadsheets-expanded.ttl",
    output_file  = "spreadsheets_inferences.ttl",
    work_dir     = "./koncludix",
)

The wrapper drives Konclude through a small set of SPARQL extraction jobs and merges the per-job XML results into one Turtle file containing only the inferred axioms.

Extracted Axiom Types

| Axiom Type | Description | Example |
|---|---|---|
| ClassAssertion | Inferred rdf:type statements | A person bearing an AgentRole is inferred to be an Agent |
| SubClassOf | Inferred class subsumption | ArtificialIntelligence ⊑ ComputerScience |
| SubPropertyOf | Inferred property hierarchies | Specialised participation relations |
| PropertyAssertion | Inferred object/data property values | Inverse of participates_in yields has_participant |
| InverseProperties | Materialised inverse property pairs | RO_0000056 ↔ RO_0000057 |

Why these axiom types?

This selection covers the axioms needed for SPARQL query answering: class assertions enable ?x a ?Class patterns, property assertions enable ?x ?prop ?y traversals, and subsumption enables hierarchical queries. Axiom types like DisjointClasses or EquivalentClasses are not extracted because they are schema-level (TBox) axioms that are already present in the input ontology.
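
To make the effect of materialisation concrete, here is a toy forward-chaining sketch over plain tuples. It is not how Konclude computes entailments and covers only two rule shapes (type propagation along a subclass link and inverse-property pairs). The `ex:` names and the `materialise` function are hypothetical; only the RO_0000056/RO_0000057 pair comes from the table above.

```python
def materialise(triples, subclass_of, inverse_of):
    """Return the triples entailed by subclass and inverse-property rules.

    Repeats both rules until no new triple appears, then returns only the
    triples that were not already asserted.
    """
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closure):
            new = set()
            if p == "rdf:type":
                # x a C, C subClassOf D  =>  x a D
                new |= {(s, "rdf:type", sup) for sup in subclass_of.get(o, ())}
            if p in inverse_of:
                # x p y, p inverseOf q  =>  y q x
                new.add((o, inverse_of[p], s))
            if not new <= closure:
                closure |= new
                changed = True
    return closure - set(triples)


asserted = {
    ("ex:alice", "rdf:type", "ex:Researcher"),
    ("ex:alice", "RO_0000056", "ex:experiment1"),  # participates_in
}
subclass_of = {"ex:Researcher": ["ex:Agent"]}
inverse_of = {"RO_0000056": "RO_0000057", "RO_0000057": "RO_0000056"}

inferred = materialise(asserted, subclass_of, inverse_of)
```

With the inferred triples stored alongside the asserted ones, a pattern such as `?x a ex:Agent` or `?e RO_0000057 ?x` matches without the query engine doing any reasoning of its own.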

The downstream pipeline validates that the produced file is well-formed Turtle (not accidentally RDF/XML) by inspecting the file header.
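
One way such a header check could look is sketched below. The exact markers the downstream pipeline inspects are not documented here, so the two byte prefixes and the function name `looks_like_rdfxml` are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def looks_like_rdfxml(path, probe_bytes=512):
    """Heuristically detect RDF/XML content from the first bytes of a file."""
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    # Skip an optional byte-order mark and leading whitespace.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    head = head.lstrip()
    # A bare "<" is not enough: Turtle may legitimately start with "<http://...>".
    return head.startswith((b"<?xml", b"<rdf:RDF"))


with tempfile.TemporaryDirectory() as tmp:
    turtle = Path(tmp) / "ok.ttl"
    turtle.write_bytes(b"@prefix ex: <http://example.org/> .\n")
    iri_first = Path(tmp) / "iri_first.ttl"
    iri_first.write_bytes(b"<http://example.org/a> a <http://example.org/B> .\n")
    mislabelled = Path(tmp) / "mislabelled.ttl"
    mislabelled.write_bytes(b'<?xml version="1.0"?>\n<rdf:RDF/>\n')

    turtle_flagged = looks_like_rdfxml(turtle)
    iri_first_flagged = looks_like_rdfxml(iri_first)
    mislabelled_flagged = looks_like_rdfxml(mislabelled)
```

This only rules out the RDF/XML mix-up; confirming that the file is valid Turtle would still require a real parser.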

Input

| Source | Description |
|---|---|
| matwerk_sharedfs | Shared filesystem path |
| matwerk_last_successful_merge_run | Source directory (if source_run_dir not in conf) |
| nfdicore_extension | URL of the NFDIcore extension OWL file |
| koncludebin | Path to the Konclude executable |
| robotcmd | Path to the ROBOT executable |

Conf parameters (from triggering DAG or UI):

| Parameter | Default | Description |
|---|---|---|
| artifact | spreadsheets | Name of the artifact being reasoned |
| in_ttl | spreadsheets_asserted.ttl | Input TTL filename |
| source_run_dir | (from Variable) | Custom source directory |
| target_run_dir | (auto-created) | Custom target directory |
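
The defaulting rules above can be sketched as a small function. It is an illustration only: `resolve_conf` is a hypothetical name, `variables` is a plain dict standing in for Airflow Variables, and how the DAG actually creates the target directory is not covered.

```python
DEFAULTS = {
    "artifact": "spreadsheets",
    "in_ttl": "spreadsheets_asserted.ttl",
}


def resolve_conf(conf, variables):
    """Apply the documented defaults to a (possibly empty) trigger conf."""
    conf = dict(conf or {})
    resolved = {key: conf.get(key, default) for key, default in DEFAULTS.items()}
    # Fall back to the last successful merge run when no source is given.
    resolved["source_run_dir"] = (
        conf.get("source_run_dir") or variables["matwerk_last_successful_merge_run"]
    )
    # None signals that the DAG should create a fresh target directory.
    resolved["target_run_dir"] = conf.get("target_run_dir")
    return resolved


variables = {"matwerk_last_successful_merge_run": "/shared/runs/merge-001"}  # example value
manual = resolve_conf(None, variables)
custom = resolve_conf({"artifact": "harvester_x", "source_run_dir": "/tmp/src"}, variables)
```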

Output

| Output | Description |
|---|---|
| {artifact}-filtered.ttl | Pre-processed TTL (problematic axioms removed) |
| {artifact}-expanded.ttl | Merged with NFDIcore extension and ROBOT-expanded |
| {artifact}_inferences.ttl | Konclude reasoning output, materialised as Turtle |

Variables set on success:

  • matwerk_last_successful_reason_run (if artifact is spreadsheets)
  • matwerk_last_successful_reason_run__{artifact} (always)
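
The rule for which Variables get updated can be written as a short sketch. The function name `success_variables` is hypothetical, and storing the run directory as the value is an assumption; the page only specifies the Variable names.

```python
def success_variables(artifact, run_dir):
    """Return the Variable updates to apply after a successful run."""
    # The per-artifact Variable is always written.
    updates = {f"matwerk_last_successful_reason_run__{artifact}": run_dir}
    # The unsuffixed Variable is reserved for the default artifact.
    if artifact == "spreadsheets":
        updates["matwerk_last_successful_reason_run"] = run_dir
    return updates


default_updates = success_variables("spreadsheets", "/shared/runs/reason-042")
other_updates = success_variables("harvester_x", "/shared/runs/reason-043")
```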

Downstream

No DAG is triggered automatically. Trigger validation_checks after this DAG succeeds.