
OntoCheck: Query-Driven Ontology Assessments for Scientific Domain Applications

¹Case Western Reserve University, ²University of Central Florida

* These authors contributed equally to this work.

TL;DR

OntoCheck is an open-source Python tool that automates ontology quality assessment. It combines 17 task-agnostic metrics (labeling, structural, accessibility, and naming conventions) with a novel query-driven methodology that measures how well an ontology supports your SPARQL queries, yielding Relevance (vocabulary coverage) and Accuracy (utilization density) scores. Install via pip install OntoCheck and evaluate any OWL/RDF ontology against your competency questions in minutes.

Abstract

As materials science and other fields increasingly adopt FAIR data principles, ontologies have become essential for encoding the semantics of scientific investigations. Yet evaluating ontology quality remains a manual, technically demanding bottleneck. Current frameworks emphasize structural correctness but do not assess practical utility against the real-world queries posed by domain scientists. To address this, we introduce OntoCheck, an open-source tool that unifies domain-agnostic structural metrics with a novel, query-driven assessment methodology. By analyzing SPARQL queries derived from natural-language competency questions, OntoCheck compares the terms the queries require against an ontology's full vocabulary, yielding complementary metrics for vocabulary coverage and utilization density. This lets domain scientists and data engineers assess an ontology's practical suitability for their specific application needs. We validate OntoCheck at scale against DBpedia using the LC-QuAD KGQA benchmark and demonstrate its utility across several materials science and geospatial ontologies, automating quality assessment for the reproducible deployment of semantic infrastructure in FAIR workflows.

Why Task-Based Ontology Assessment?

Traditional ontology assessment focuses on structural correctness—valid URIs, no circular dependencies, no orphaned classes. But a structurally sound ontology can still fail a researcher's use case if it lacks the vocabulary needed for their queries.

Consider a domain scientist who wants to know the maximum grain size across perovskite samples. Because the ontology defines the terms the query needs (e.g., mds:GrainSize, mds:Sample, and mds:PerovskiteStructure), a query engine can retrieve the answer. In contrast, a query for the average phase fraction of the alpha-titanium phase in Ti64 samples is unanswerable because the ontology lacks the terms it requires. Structural metrics score the ontology identically for both queries; only a task-based metric distinguishes the query it supports from the one it does not.

Figure 1. A successful versus unsuccessful query-answering case caused by differences in an ontology's fitness for use. Green ovals, blue rounded boxes, and white rounded boxes denote classes, instances, and literals, respectively. Query 1 (Q1) returns the desired answers because the ontology contains the terms used in the query, whereas Query 2 (Q2) is unanswerable.
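The precondition illustrated by this example can be sketched as a set difference: a query can only be answered if every domain term it references is defined in the ontology. The term sets below are hypothetical (mds:PhaseFraction and mds:AlphaTitaniumPhase are illustrative names, not confirmed MDS-Onto terms), and the check is necessary but not sufficient, since the knowledge graph must also hold matching data.

```python
from __future__ import annotations


def unanswerable_terms(query_terms: set[str], ontology_terms: set[str]) -> set[str]:
    """Return the query terms the ontology does not define (empty set = vocabulary covered)."""
    return query_terms - ontology_terms


# Hypothetical term sets mirroring the grain-size (Q1) and phase-fraction (Q2) queries.
ontology_terms = {"mds:Sample", "mds:GrainSize", "mds:PerovskiteStructure"}
q1_terms = {"mds:Sample", "mds:GrainSize", "mds:PerovskiteStructure"}
q2_terms = {"mds:Sample", "mds:PhaseFraction", "mds:AlphaTitaniumPhase"}  # illustrative names

for name, terms in [("Q1", q1_terms), ("Q2", q2_terms)]:
    missing = unanswerable_terms(terms, ontology_terms)
    status = "vocabulary covered" if not missing else f"missing {sorted(missing)}"
    print(name, status)
```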

Key Contributions

  1. Unified metric suite. OntoCheck implements 17 task-agnostic metrics spanning labeling, structural, accessibility, and naming convention categories, consolidating and extending prior assessment frameworks into a single open-source tool.
  2. Novel task-based assessment methodology. By ingesting competency questions expressed as SPARQL queries and systematically comparing the required vocabulary against the terms defined in a candidate ontology, OntoCheck derives two complementary metrics:
    • Relevance — quantifies vocabulary coverage (what fraction of task-required terms the ontology defines)
    • Accuracy — quantifies utilization density (what fraction of ontology terms the task queries actually use)
  3. Validated at scale. We evaluate OntoCheck on the DBpedia knowledge graph using the LC-QuAD benchmark (5,000 queries, 98.94% Relevance) and demonstrate its applicability to materials science and geospatial ontologies; the tool supports four configurable assessment modes.

Assessment Modes

OntoCheck is configurable in four assessment modes, controlled by a declarative configuration C = (O, Q, M, G), where O specifies ontologies to be assessed, Q is a set of competency queries, M refers to the evaluation metrics, and G is a ground-truth knowledge graph.
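As a hypothetical illustration (not OntoCheck's API), the configuration can be modeled as a small record whose populated components determine the mode, following the configurations listed in the table. The class, field names, and validation rules below are invented for this sketch; the OntoCheck CLI accepts an explicit --mode flag instead of inferring it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AssessmentConfig:
    """Hypothetical container for the configuration C = (O, Q, M, G)."""
    ontologies: Sequence[str]                 # O: ontology files to assess
    queries: Optional[Sequence[str]] = None   # Q: competency queries (SPARQL)
    metrics: Sequence[str] = ("all",)         # M: evaluation metrics
    ground_truth: Optional[str] = None        # G: ground-truth knowledge graph

    def mode(self) -> int:
        """Infer the assessment mode from which components are present."""
        if not self.ontologies:
            raise ValueError("at least one ontology (O) is required")
        if not self.queries:
            if self.ground_truth is not None:
                raise ValueError("a ground-truth graph (G) needs queries (Q)")
            return 1  # (O, -, M, -)
        if self.ground_truth is not None:
            return 2  # (O, Q, M, G)
        # (O[], Q, M, -) when several ontologies are merged, else (O, Q, M, -)
        return 4 if len(self.ontologies) > 1 else 3
```

For example, AssessmentConfig(["xrd.ttl", "capacitors.ttl"], queries=[...]) would be classified as Mode 4.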

| Mode | Name | Configuration | Description |
|---|---|---|---|
| 1 | Task-agnostic | (O, -, M, -) | Structural, labeling, accessibility, and naming metrics |
| 2 | Task-specific Web | (O, Q, M, G) | Validation against KGQA benchmarks (e.g., LC-QuAD / DBpedia) |
| 3 | Task-based Scientific | (O, Q, M, -) | Domain ontology vs. competency questions |
| 4 | Cross-Domain | (O[], Q, M, -) | Merged ontologies vs. cross-domain questions |

Knowledge graphs backed by an ontology can also be evaluated in Modes 3 and 4.

Assessment Examples

Mode 1: Task-Agnostic Assessment

The semanticConnection metric is applied to a subset (Equipment) of the built-in Capacitors ontology (54 classes, 8 root hierarchy chains). The assessment reveals that 6 of 8 root classes are grounded in CCO or BFO, while two domain-specific roots — mds:Thermocouple and mds:VoltageRating — remain disconnected, directly flagging the portions of the ontology that require further alignment for integration with external knowledge graphs.
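The root-grounding idea behind semanticConnection can be sketched over a simple child-to-parent map. This sketch assumes single inheritance and prefixed class names instead of a parsed RDF graph, and treats a root as grounded when its direct parent carries a CCO or BFO prefix. Apart from mds:Thermocouple and mds:VoltageRating, the class names are hypothetical, and OntoCheck's actual criterion may differ.

```python
from __future__ import annotations


def semantic_connection(
    subclass_of: dict[str, str | None],
    domain_prefix: str = "mds:",
    upper_prefixes: tuple[str, ...] = ("cco:", "bfo:"),
) -> dict[str, bool]:
    """Map each domain root class to True if its parent is an upper-level (CCO/BFO) class."""
    domain = {c for c in subclass_of if c.startswith(domain_prefix)}
    domain |= {p for p in subclass_of.values() if p and p.startswith(domain_prefix)}
    report: dict[str, bool] = {}
    for cls in sorted(domain):
        parent = subclass_of.get(cls)
        # A root is a domain class whose parent is missing or lies outside the domain namespace.
        if parent is None or not parent.startswith(domain_prefix):
            report[cls] = parent is not None and parent.startswith(upper_prefixes)
    return report


hierarchy = {
    "mds:Capacitor": "cco:Artifact",          # hypothetical alignment to CCO
    "mds:CeramicCapacitor": "mds:Capacitor",  # not a root: its parent is a domain class
    "mds:Thermocouple": None,                 # no upper-level parent declared
    "mds:VoltageRating": None,
}
print(semantic_connection(hierarchy))
```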

Mode 2: Task-Specific Web Ontology (DBpedia / LC-QuAD)

OntoCheck is validated at scale against the DBpedia ontology (3,814 domain classes and properties) using the LC-QuAD 1.0 benchmark (5,000 queries, 38 workloads). Across all queries, 470 unique ontology terms appear in Ta, of which 465 are present in To, yielding an overall Relevance of 98.94% and an Accuracy of 12.19%. Per-workload analysis shows 28 of 38 workloads achieve 100% Relevance.
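Plugging the reported counts into the Relevance and Accuracy definitions reproduces the overall figures, taking |To| = 3,814, the DBpedia class-and-property count given above:

```latex
\text{Relevance} = \frac{|T_a \cap T_o|}{|T_a|} = \frac{465}{470} \approx 0.9894 = 98.94\%,
\qquad
\text{Accuracy} = \frac{|T_a \cap T_o|}{|T_o|} = \frac{465}{3814} \approx 0.1219 = 12.19\%
```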

Mode 3: Task-Based Scientific Ontology Assessment

OntoCheck is demonstrated across four materials science sub-domain ontologies and two geospatial ontologies, each evaluated against 20 competency questions. Representative results are shown below.

Materials Science Ontologies
| Ontology | CQ | Competency Question | Ta size | Ta∩To size | Relevance | Accuracy |
|---|---|---|---|---|---|---|
| EBSD (89 terms) | CQ1 | What is the material composition of this sample? | 2 | 1 | 50% | 1.1% |
| | CQ16 | What are the polishing voltage, motor speed setting, and frequency mode of the electropolish system? | 8 | 8 | 100% | 9.0% |
| | CQ20 | Which samples from the same batch manufactured by LPBF with laser power above 200 W had total surface removal depth less than 100 μm? | 7 | 6 | 86% | 6.7% |
| Capacitors (87 terms) | CQ1 | What types of electrical connectors are available and what is the function of each? | 1 | 1 | 100% | 1.1% |
| | CQ9 | What are the manufacturer-specified dimensions of the capacitor? | 4 | 4 | 100% | 4.6% |
| | CQ19 | What solder process type, solder material, and iron temperature were used? | 4 | 4 | 100% | 4.6% |
| MatProc (127 terms) | CQ5 | What parameters are required for Laser Powder Bed Fusion? | 2 | 2 | 100% | 1.6% |
| | CQ18 | Which process is known by the abbreviation “SPS”? | 0 | 0 | 0% | 0.0% |
| | CQ20 | Which processes share both a temperature parameter and a pressure parameter? | 3 | 3 | 100% | 2.4% |
| XRD (318 terms) | CQ7 | What is the dislocation density obtained from diffraction line profile analysis? | 2 | 2 | 100% | 0.6% |
| | CQ10 | What crystal structure does the alpha-titanium phase have? | 3 | 3 | 100% | 0.9% |
| | CQ15 | What types of solid-state phase transformations are defined? | 1 | 1 | 100% | 0.3% |
Geospatial Ontologies
| Ontology | CQ | Competency Question | Ta size | Ta∩To size | Relevance | Accuracy |
|---|---|---|---|---|---|---|
| GeoOutage (10 terms) | CQ10 | Which NTL image is associated with this outage record? | 2 | 2 | 100% | 20.0% |
| | CQ15 | Which outage records in Lee County, Florida had more than 100 outages around Hurricane Ian? | 4 | 4 | 100% | 40.0% |
| | CQ19 | Which NTL image and outage map correspond to the highest-outage record for each county? | 5 | 5 | 100% | 50.0% |
| Geospatial (100 terms) | CQ2 | What satellite is the ASTER sensor mounted on? | 4 | 4 | 100% | 4.0% |
| | CQ10 | What satellites were launched in 2020? | 3 | 3 | 100% | 3.0% |
| | CQ12 | What geospatial data exists in the neighborhood of geohash “dpnq”? | 11 | 11 | 100% | 11.0% |

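Each Relevance and Accuracy entry in these tables follows from its counts and the ontology's term count shown in parentheses. A small helper, assuming the tables round Relevance to whole percents and Accuracy to one decimal, reproduces several rows; treating an empty Ta as 0% Relevance is an assumption chosen to match the MatProc CQ18 row.

```python
from __future__ import annotations


def table_scores(ta_size: int, overlap: int, ontology_size: int) -> tuple[str, str]:
    """Format Relevance (whole percent) and Accuracy (one decimal) from raw counts."""
    relevance = overlap / ta_size if ta_size else 0.0  # assumed convention for an empty Ta
    accuracy = overlap / ontology_size
    return f"{relevance:.0%}", f"{accuracy:.1%}"


print(table_scores(7, 6, 89))  # EBSD CQ20: 6 of 7 task terms defined; 6 of 89 ontology terms used
```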
Mode 4: Cross-Domain Ontology Assessment

OntoCheck supports cross-domain assessment by merging multiple ontologies. Two case studies demonstrate this capability for scientists working at the intersection of multiple sub-domains.
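Assuming that merging amounts to taking the union of the ontologies' domain term sets (the exact merge procedure is not described here), a cross-domain question that no single sub-domain ontology fully covers can reach full Relevance against the merged vocabulary, while Accuracy is divided by a larger To. All term names below are hypothetical.

```python
from __future__ import annotations


def merge_term_sets(*term_sets: set[str]) -> set[str]:
    """Merged vocabulary To: union of each ontology's domain terms (shared terms counted once)."""
    return set().union(*term_sets)


def relevance(task_terms: set[str], ontology_terms: set[str]) -> float:
    return len(task_terms & ontology_terms) / len(task_terms) if task_terms else 0.0


# Hypothetical vocabularies for two sub-domain ontologies.
xrd = {"mds:XRayDetector", "mds:LatticeParameter", "mds:Sample"}
capacitors = {"mds:Capacitor", "mds:Capacitance", "mds:Sample"}
# A hypothetical cross-domain question touching both sub-domains.
task = {"mds:XRayDetector", "mds:Capacitor"}

merged = merge_term_sets(xrd, capacitors)
print(len(merged))  # 5, because "mds:Sample" is shared
print(relevance(task, xrd), relevance(task, capacitors), relevance(task, merged))  # 0.5 0.5 1.0
```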

Case Study 1: XRD + Capacitors (404 terms)
| CQ | Competency Question | Relevance | Accuracy |
|---|---|---|---|
| CQ2 | What X-ray detector was used in this experiment? | 100% | 0.5% |
| CQ6 | What are the lattice parameters determined by Rietveld refinement of this diffraction pattern? | 100% | 0.5% |
| CQ10 | What is the capacitance value and its tolerance for the capacitor being tested? | 100% | 0.7% |
Case Study 2: XRD + Geospatial (417 terms)
| CQ | Competency Question | Relevance | Accuracy |
|---|---|---|---|
| CQ9 | What bands contain near-infrared information from the ASTER sensor? | 100% | 1.2% |
| CQ5 | What is the chemical formula of this sample? | 100% | 0.5% |
| CQ6 | What are the lattice parameters determined by Rietveld refinement of this diffraction pattern? | 100% | 0.5% |

Comparative Assessment of Materials Science Ontologies

OntoCheck enables direct comparison of competing ontologies against a shared 50-question benchmark spanning seven categories. Ten representative questions are shown below.

| ID | Category | Competency Question | MDS-Onto | PMDCo | AM-Onto | EMMO | CHAMEO |
|---|---|---|---|---|---|---|---|
| Q02 | Material Identity | What is the identifier/name assigned to each material specimen, and by whom was it assigned? | 100 | 100 | 0 | 100 | 33 |
| Q13 | Manufacturing | What intermediate processing steps occurred between raw material acquisition and the final specimen, and in what sequence? | 100 | 67 | 67 | 100 | 100 |
| Q21 | Characterization | What specimen preparation steps were carried out before a characterization measurement? | 100 | 100 | 0 | 100 | 100 |
| Q23 | Characterization | What is the measurement uncertainty or tolerance associated with a reported characterization result? | 100 | 100 | 0 | 100 | 50 |
| Q35 | Provenance | Who performed the manufacturing or characterization process? | 67 | 100 | 33 | 100 | 67 |
| Q36 | Provenance | What institution or facility owns or operates the equipment used in a characterization or manufacturing experiment? | 100 | 67 | 33 | 67 | 100 |
| Q43 | Simulation | What physical assumptions or simplifications were made in a computational model of a material or process? | 67 | 67 | 100 | 67 | 67 |
| Q44 | Simulation | At what spatial or temporal scale does a given simulation operate? | 67 | 100 | 100 | 100 | 33 |
| Q46 | Degradation | What degradation or damage mechanisms are represented for a material under service or test conditions? | 67 | 100 | 33 | 100 | 67 |
| Q50 | Degradation | What failure mode or end-of-life criterion was defined, and was it reached during the study period? | 100 | 67 | 67 | 100 | 33 |
| Overall Average (%) | | | 96.7 | 95.3 | 77.3 | 96.0 | 76.8 |

Task-Based Metrics

Given an ontology O and a set of competency-question SPARQL queries Q, OntoCheck derives an ontology term set To (all domain-namespace classes and properties in O) and a task term set Ta (all domain-namespace terms referenced across the queries in Q). Two complementary metrics are computed:

Relevance = |Ta ∩ To| / |Ta|
Accuracy = |Ta ∩ To| / |To|
  • Relevance measures the fraction of task-required terms that the ontology defines—does the ontology cover the vocabulary needed for the intended workload?
  • Accuracy measures the fraction of ontology terms utilized by the task queries—how much of the ontology is actually used by the workload?
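Both formulas reduce to plain set operations once Ta and To are known. The sketch below assumes that domain terms appear in queries as prefixed names (for example mds:GrainSize) and collects them with a regular expression; full IRIs in angle brackets are not handled, and returning 0 Relevance for an empty Ta is a convention chosen here. It is an illustration, not OntoCheck's own implementation, and the query text and property names are hypothetical.

```python
from __future__ import annotations

import re
from typing import Iterable


def extract_terms(query: str, prefixes: Iterable[str]) -> set[str]:
    """Collect prefixed names such as 'mds:GrainSize' whose prefix is a domain prefix."""
    prefixes = list(prefixes)
    if not prefixes:
        return set()
    alternation = "|".join(re.escape(p) for p in prefixes)
    # Domain prefix, a colon, then a local name starting with a letter or underscore.
    # "PREFIX mds: <...>" declarations have no local name after the colon, so they are skipped.
    pattern = re.compile(rf"(?<![\w:])({alternation}):([A-Za-z_][\w\-]*)")
    return {f"{prefix}:{local}" for prefix, local in pattern.findall(query)}


def task_based_metrics(
    queries: Iterable[str], ontology_terms: Iterable[str], prefixes: Iterable[str]
) -> dict:
    """Relevance = |Ta ∩ To| / |Ta| and Accuracy = |Ta ∩ To| / |To|."""
    prefixes = list(prefixes)
    ta: set[str] = set()
    for query in queries:
        ta |= extract_terms(query, prefixes)  # Ta: union of terms over all queries
    to = set(ontology_terms)                   # To: the ontology's domain terms
    overlap = ta & to
    return {
        "relevance": len(overlap) / len(ta) if ta else 0.0,
        "accuracy": len(overlap) / len(to) if to else 0.0,
        "missing": sorted(ta - to),  # task terms the ontology does not define
    }


# Hypothetical queries and vocabulary (property names are invented for this sketch).
q1 = """PREFIX mds: <https://example.org/mds#>
SELECT (MAX(?v) AS ?maxGrain) WHERE {
  ?s a mds:Sample ; mds:hasPart ?p, ?g .
  ?p a mds:PerovskiteStructure .
  ?g a mds:GrainSize ; mds:hasValue ?v .
}"""
q2 = """PREFIX mds: <https://example.org/mds#>
SELECT (AVG(?f) AS ?avgFraction) WHERE {
  ?s a mds:Sample ; mds:hasPart ?ph .
  ?ph a mds:AlphaTitaniumPhase ; mds:phaseFraction ?f .
}"""
ontology_terms = {
    "mds:Sample", "mds:GrainSize", "mds:PerovskiteStructure", "mds:hasPart", "mds:hasValue",
    "mds:Batch", "mds:Capacitor", "mds:Thermocouple", "mds:VoltageRating", "mds:Process",
}

result = task_based_metrics([q1, q2], ontology_terms, ["mds"])
print(f"Relevance: {result['relevance']:.2%}")  # 5 of 7 task terms are defined
print(f"Accuracy:  {result['accuracy']:.2%}")   # 5 of 10 ontology terms are used
print("Missing:", result["missing"])
```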

Task-Agnostic Metrics (17 metrics)

Labeling
  • checkLabel — Human-readable identifiers
  • altLabelCheck — Synonym coverage
  • definitionCheck — Formal definitions
Structural
  • isolatedElements — Orphaned classes
  • classConnections — Disconnected subgraphs
  • missingDomainRange — Undeclared restrictions
  • leafNodeCheck — Leaf nodes
  • semanticConnection — Higher-level ontology grounding (e.g., CCO, BFO)
Accessibility
  • sparqlEndpoint — Endpoint reachability
  • rdfDump — RDF dump availability
  • humanLicense — License information
  • externalLinks — External link validity
Naming Convention
  • classCapitalCheck — Capitalization standards
  • classSpaceCheck — Spaces in identifiers
  • spellCheck — Spelling in labels/definitions
  • duplicateLabels — Duplicate label detection
  • searchClass — Class string search
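To make the naming-convention category concrete, here is a toy version of three of these checks over a hypothetical mapping from class identifiers to labels. The specific rules (an uppercase first letter for class local names, case-insensitive label comparison) are assumptions for illustration; OntoCheck's actual criteria may differ.

```python
from __future__ import annotations

from collections import defaultdict


def naming_checks(labels: dict[str, str]) -> dict[str, object]:
    """Toy naming checks over a {class identifier: label} map."""
    # classCapitalCheck-style rule (assumed): class local names start with an uppercase letter.
    not_capitalized = sorted(c for c in labels if not c.split(":", 1)[-1][:1].isupper())
    # classSpaceCheck-style rule: identifiers must not contain spaces.
    has_space = sorted(c for c in labels if " " in c)
    # duplicateLabels-style rule: one label (ignoring case and outer whitespace) on several classes.
    by_label: dict[str, list[str]] = defaultdict(list)
    for cls, label in labels.items():
        by_label[label.strip().lower()].append(cls)
    duplicates = {lbl: sorted(cs) for lbl, cs in by_label.items() if len(cs) > 1}
    return {"not_capitalized": not_capitalized, "has_space": has_space, "duplicate_labels": duplicates}


labels = {
    "mds:Sample": "sample",
    "mds:grainSize": "grain size",     # lowercase local name
    "mds:Grain Size": "Grain Size",    # space in the identifier, duplicate label
    "mds:Thermocouple": "thermocouple",
}
report = naming_checks(labels)
print(report)
```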

Getting Started

Installation
pip install OntoCheck
Command-Line Interface
# Mode 1: Run task-agnostic metrics
ontocheck path/to/ontology.ttl --metrics all

# Mode 3: Task-based scientific assessment
ontocheck path/to/ontology.ttl \
    --mode 3 \
    --questions competency_questions.json \
    --domain-prefixes mds

# Mode 4: Cross-domain assessment
ontocheck xrd.ttl capacitors.ttl \
    --mode 4 \
    --questions cross_domain_questions.json \
    --domain-prefixes mds
Python API
from ontocheck import run_ontology_assessment, run_task_based_assessment

# Mode 1: Task-agnostic assessment
run_ontology_assessment(
    ttl_file="path/to/ontology.ttl",
    metrics="all",
)

# Mode 3: Task-based scientific assessment
result = run_task_based_assessment(
    ttl_files="path/to/ontology.ttl",
    questions="competency_questions.json",
    domain_prefixes=["mds"],
)

print(f"Relevance: {result['relevance']:.2%}")
print(f"Accuracy:  {result['accuracy']:.2%}")

Coming Soon

We are developing a web-based interface where users can evaluate ontologies against their own task term sets (Ta) or custom metrics — no coding required. Users will be able to configure evaluations, define custom metrics, and receive structured assessment reports directly in their browser.

Additional planned capabilities include:

  • Integration with LLMs for automated SPARQL generation from natural-language competency questions
  • Community-submitted competency question sets and metric contributions incorporated into the standard suite
  • User-defined feeders for pluggable ontology and knowledge graph sources

Built for the Community

OntoCheck is conceived as a community resource. We actively encourage collaboration, contribution of new metrics, and submission of domain competency question sets, in the shared interest of building robust, reusable semantic infrastructure for FAIR scientific data. Whether you want to evaluate ontologies in your domain, propose new assessment metrics, or contribute competency question benchmarks — we would love to hear from you.

Get in touch:
Roger H. French, Yinghui Wu
Case Western Reserve University

Acknowledgements

We are grateful to the MDS-Onto user community across several universities and organizations. As early users of OntoCheck, they provided feedback and real-world use cases that directly shaped the tool's development. This material is based upon research in the Materials Data Science for Stockpile Stewardship Center of Excellence (MDS3 COE), and supported by the Department of Energy's National Nuclear Security Administration under Award Number DE-NA0004104. All authors thank the CWRU University Technology Center and the UCF Advanced Research Computing Center for their High Performance Computing (HPC) resources, which were utilized in this work.

BibTeX

@article{kundu2025ontocheck,
  title={OntoCheck: Query-Driven Ontology Assessments for Scientific Domain Applications},
  author={Rishabh Kundu and Redad Mehdi and Van D. Tran and Ethan Frakes and
          Abhishek Daundkar and Maliesha Sumudumalie and Vibha S. Mandayam and
          Jacob A. Lample and Mengjie Li and Laura S. Bruckman and
          Erika I. Barcelos and Alp Sehirlioglu and Roger H. French and Yinghui Wu},
  year={2025},
  url={https://github.com/cwru-sdle/OntoCheck}
}