OntoCheck: Query-Driven Ontology Assessments

TL;DR

OntoCheck is an open-source Python tool that automates ontology quality assessment. It combines 17 task-agnostic metrics (labeling, structural, accessibility, and naming convention) with a novel query-driven methodology that measures how well an ontology supports your SPARQL queries, yielding Relevance (vocabulary coverage) and Accuracy (utilization density) scores. Install it with `pip install OntoCheck` and evaluate any OWL/RDF ontology against your competency questions in minutes.

Abstract

As materials science and other fields increasingly adopt FAIR data principles, ontologies have become essential for encoding the semantics of scientific investigations. Yet evaluating ontology quality remains a manual, technically demanding bottleneck. Current frameworks emphasize structural correctness but fail to assess practical utility against the real-world queries posed by domain scientists. To address this, we introduce OntoCheck, an open-source tool unifying domain-agnostic structural metrics with a novel, query-driven assessment methodology. By analyzing SPARQL queries derived from natural-language competency questions, OntoCheck compares the required query terms against an ontology's full vocabulary to yield complementary metrics for vocabulary coverage and utilization density. Crucially, this empowers domain scientists and data engineers to assess the practical usability and suitability of an ontology based on their specific application needs. We validate OntoCheck at scale against DBpedia using the LC-QuAD KGQA benchmark, and demonstrate its utility across several materials science and geospatial ontologies, automating quality assessment for the reproducible deployment of semantic infrastructure in FAIR workflows.

Why Task-Based Ontology Assessment?

Traditional ontology assessment focuses on structural correctness—valid URIs, no circular dependencies, no orphaned classes. But a structurally sound ontology can still fail a researcher's use case if it lacks the vocabulary needed for their queries.

Consider a domain scientist who wants to know the maximum grain size across perovskite samples. Because the ontology contains terms relevant to the query (i.e., mds:GrainSize, mds:Sample, and mds:PerovskiteStructure), a query engine is able to retrieve the answer. However, a query for the average phase fraction of the alpha-titanium phase of Ti64 samples is unanswerable because the ontology does not contain terms relevant to the query. This example clearly demonstrates the importance of metrics that measure an ontology's task-based competency.

**Figure 1.** A successful versus unsuccessful query-answering case caused by differences in an ontology's fitness for use. Green ovals, blue rounded boxes, and white rounded boxes denote classes, instances, and literals, respectively. Query 1 (Q1) returns the desired answers because the ontology contains the terms used in the query, whereas Query 2 (Q2) is unanswerable.

Key Contributions

Unified metric suite. OntoCheck implements 17 task-agnostic metrics spanning labeling, structural, accessibility, and naming convention categories, consolidating and extending prior assessment frameworks into a single open-source tool.
Novel task-based assessment methodology. By ingesting competency questions expressed as SPARQL queries and systematically comparing the required vocabulary against the terms defined in a candidate ontology, OntoCheck derives two complementary metrics:
- Relevance — quantifies vocabulary coverage (what fraction of task-required terms the ontology defines)
- Accuracy — quantifies utilization density (what fraction of ontology terms the task queries actually use)
Validated at scale. We verify OntoCheck through large-scale evaluation on the DBpedia knowledge graph using the LC-QuAD benchmark (5,000 queries, 98.94% Relevance), and demonstrate its applicability to materials science and geospatial ontologies in four configurable assessment modes.

Assessment Modes

OntoCheck is configurable in four assessment modes, controlled by a declarative configuration C = (O, Q, M, G), where O specifies ontologies to be assessed, Q is a set of competency queries, M refers to the evaluation metrics, and G is a ground-truth knowledge graph.

Mode	Name	Configuration	Description
1	Task-agnostic	`(O, -, M, -)`	Structural, labeling, accessibility, and naming metrics
2	Task-specific Web	`(O, Q, M, G)`	Validation against KGQA benchmarks (e.g., LC-QuAD / DBpedia)
3	Task-based Scientific	`(O, Q, M, -)`	Domain ontology vs. competency questions^†
4	Cross-Domain	`(O[], Q, M, -)`	Merged ontologies vs. cross-domain questions^†

^† Knowledge graphs backed by an ontology can also be evaluated in Modes 3 and 4.
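The table above can be read as a rule that maps the components present in C to a mode number. The following is a minimal sketch of that rule, not OntoCheck's actual configuration API: the `AssessmentConfig` class, its field names, and `infer_mode` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AssessmentConfig:
    """Hypothetical container for C = (O, Q, M, G); not OntoCheck's real class."""
    ontologies: Sequence[str]                 # O: one or more ontology files
    metrics: Sequence[str]                    # M: metric names to evaluate
    queries: Optional[Sequence[str]] = None   # Q: competency queries, if any
    ground_truth: Optional[str] = None        # G: ground-truth knowledge graph, if any


def infer_mode(config: AssessmentConfig) -> int:
    """Return the mode number from the table for a given configuration."""
    if not config.ontologies:
        raise ValueError("at least one ontology (O) is required")
    if not config.queries:
        if config.ground_truth is not None:
            # (O, -, M, G) does not appear in the table, so reject it here.
            raise ValueError("a ground-truth graph (G) requires queries (Q)")
        return 1  # (O, -, M, -): task-agnostic
    if config.ground_truth is not None:
        return 2  # (O, Q, M, G): validation against a KGQA benchmark
    # Without G, the number of ontologies separates Mode 3 from Mode 4.
    return 4 if len(config.ontologies) > 1 else 3


cross_domain = AssessmentConfig(
    ontologies=["xrd.ttl", "capacitors.ttl"],
    metrics=["all"],
    queries=["SELECT * WHERE { ?s ?p ?o }"],
)
print(infer_mode(cross_domain))  # 4
```

Deriving the mode from the configuration keeps the four modes as views of one declarative object rather than four separate entry points. The CLI shown later takes an explicit `--mode` flag, so this inference is purely illustrative.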

Assessment Examples

Mode 1: Task-Agnostic Assessment

The semanticConnection metric is applied to a subset (Equipment) of the built-in Capacitors ontology (54 classes, 8 root hierarchy chains). The assessment reveals that 6 of 8 root classes are grounded in CCO or BFO, while two domain-specific roots — mds:Thermocouple and mds:VoltageRating — remain disconnected, directly flagging the portions of the ontology that require further alignment for integration with external knowledge graphs.
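One way to picture this check is to follow subclass links upward from each class and report the topmost ancestors that do not belong to an upper-level ontology. The sketch below is an assumption about how such a check could work, not the semanticConnection implementation. The `cco:`/`bfo:` prefix test, the function names, and the toy hierarchy are hypothetical, and only the two disconnected roots are taken from the result above.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Prefixes treated as upper-level ontologies (an assumption for this sketch).
UPPER_PREFIXES: Tuple[str, ...] = ("cco:", "bfo:")


def find_roots(cls: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Follow subclass-of links upward and return the topmost ancestors of cls."""
    roots: Set[str] = set()
    stack: List[str] = [cls]
    seen: Set[str] = set()
    while stack:
        node = stack.pop()
        if node in seen:  # guards against cycles in the hierarchy
            continue
        seen.add(node)
        supers = parents.get(node, [])
        if supers:
            stack.extend(supers)
        else:
            roots.add(node)  # no declared superclass: this is a root
    return roots


def ungrounded_roots(classes: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return roots that are not terms of an upper-level ontology."""
    all_roots: Set[str] = set()
    for cls in classes:
        all_roots |= find_roots(cls, parents)
    return {root for root in all_roots if not root.startswith(UPPER_PREFIXES)}


# Toy hierarchy: child -> list of direct superclasses. Every name except
# mds:Thermocouple and mds:VoltageRating is made up for illustration.
toy_parents = {
    "mds:ReflowOven": ["mds:Oven"],
    "mds:Oven": ["cco:ExampleUpperClass"],
    "mds:TypeKThermocouple": ["mds:Thermocouple"],
}
toy_classes = list(toy_parents) + ["mds:Thermocouple", "mds:VoltageRating"]

print(sorted(ungrounded_roots(toy_classes, toy_parents)))
# ['mds:Thermocouple', 'mds:VoltageRating']
```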

Mode 2: Task-Specific Web Ontology (DBpedia / LC-QuAD)

OntoCheck is validated at scale against the DBpedia ontology (3,814 domain classes and properties) using the LC-QuAD 1.0 benchmark (5,000 queries, 38 workloads). Across all queries, 470 unique ontology terms appear in the task term set T_a, of which 465 are present in the ontology term set T_o, yielding an overall Relevance of 98.94% and an Accuracy of 12.19%. Per-workload analysis shows that 28 of 38 workloads achieve 100% Relevance.
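As a check on these figures, the two scores can be recomputed from the counts using the definitions in the Task-Based Metrics section. This assumes |T_o| is the 3,814 domain classes and properties quoted for the DBpedia ontology:

```latex
\[
\text{Relevance} = \frac{|T_a \cap T_o|}{|T_a|} = \frac{465}{470} \approx 98.94\%,
\qquad
\text{Accuracy} = \frac{|T_a \cap T_o|}{|T_o|} = \frac{465}{3814} \approx 12.19\%
\]
```

The low Accuracy is expected for a broad ontology such as DBpedia: the benchmark queries touch only a small share of the available vocabulary, even though nearly every term they need is defined.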

Mode 3: Task-Based Scientific Ontology Assessment

OntoCheck is demonstrated across four materials science sub-domain ontologies and two geospatial ontologies, each evaluated against 20 competency questions. Representative results are shown below.

Materials Science Ontologies

Ontology	CQ	Competency Question	|T_a|	|T_a ∩ T_o|	Rel.	Acc.
EBSD (89 terms)	CQ1	What is the material composition of this sample?	2	1	50%	1.1%
	CQ16	What are the polishing voltage, motor speed setting, and frequency mode of the electropolish system?	8	8	100%	9.0%
	CQ20	Which samples from the same batch manufactured by LPBF with laser power above 200W had total surface removal depth less than 100 μm?	7	6	86%	6.7%
Capacitors (87 terms)	CQ1	What types of electrical connectors are available and what is the function of each?	1	1	100%	1.1%
	CQ9	What are the manufacturer-specified dimensions of the capacitor?	4	4	100%	4.6%
	CQ19	What solder process type, solder material, and iron temperature were used?	4	4	100%	4.6%
MatProc (127 terms)	CQ5	What parameters are required for Laser Powder Bed Fusion?	2	2	100%	1.6%
	CQ18	Which process is known by the abbreviation “SPS”?	0	0	0%	0.0%
	CQ20	Which processes share both a temperature parameter and a pressure parameter?	3	3	100%	2.4%
XRD (318 terms)	CQ7	What is the dislocation density obtained from diffraction line profile analysis?	2	2	100%	0.6%
	CQ10	What crystal structure does the alpha-titanium phase have?	3	3	100%	0.9%
	CQ15	What types of solid-state phase transformations are defined?	1	1	100%	0.3%

Geospatial Ontologies

Ontology	CQ	Competency Question	|T_a|	|T_a ∩ T_o|	Rel.	Acc.
GeoOutage (10 terms)	CQ10	Which NTL image is associated with this outage record?	2	2	100%	20.0%
	CQ15	Which outage records in Lee County, Florida had more than 100 outages around Hurricane Ian?	4	4	100%	40.0%
	CQ19	Which NTL image and outage map correspond to the highest-outage record for each county?	5	5	100%	50.0%
Geospatial (100 terms)	CQ2	What satellite is the ASTER sensor mounted on?	4	4	100%	4.0%
	CQ10	What satellites were launched in 2020?	3	3	100%	3.0%
	CQ12	What geospatial data exists in the neighborhood of geohash “dpnq”?	11	11	100%	11.0%

Mode 4: Cross-Domain Ontology Assessment

OntoCheck supports cross-domain assessment by merging multiple ontologies. Two case studies demonstrate this capability for scientists working at the intersection of multiple sub-domains.

Case Study 1: XRD + Capacitors (404 terms)

CQ	Competency Question	Rel.	Acc.
CQ2	What X-ray detector was used in this experiment?	100%	0.5%
CQ6	What are the lattice parameters determined by Rietveld refinement of this diffraction pattern?	100%	0.5%
CQ10	What is the capacitance value and its tolerance for the capacitor being tested?	100%	0.7%

Case Study 2: XRD + Geospatial (417 terms)

CQ	Competency Question	Rel.	Acc.
CQ9	What bands contain near infrared information from the ASTER sensor?	100%	1.2%
CQ5	What is the chemical formula of this sample?	100%	0.5%
CQ6	What are the lattice parameters determined by Rietveld refinement of this diffraction pattern?	100%	0.5%

Comparative Assessment of Materials Science Ontologies

OntoCheck enables direct comparison of competing ontologies against a shared 50-question benchmark spanning seven categories. Ten representative questions are shown below.

ID	Category	Competency Question	MDS-Onto	PMDCo	AM-Onto	EMMO	CHAMEO
Q02	Material Identity	What is the identifier/name assigned to each material specimen, and by whom was it assigned?	100	100	0	100	33
Q13	Manufacturing	What intermediate processing steps occurred between raw material acquisition and the final specimen, and in what sequence?	100	67	67	100	100
Q21	Characterization	What specimen preparation steps were carried out before a characterization measurement?	100	100	0	100	100
Q23	Characterization	What is the measurement uncertainty or tolerance associated with a reported characterization result?	100	100	0	100	50
Q35	Provenance	Who performed the manufacturing or characterization process?	67	100	33	100	67
Q36	Provenance	What institution or facility owns or operates the equipment used in a characterization or manufacturing experiment?	100	67	33	67	100
Q43	Simulation	What physical assumptions or simplifications were made in a computational model of a material or process?	67	67	100	67	67
Q44	Simulation	At what spatial or temporal scale does a given simulation operate?	67	100	100	100	33
Q46	Degradation	What degradation or damage mechanisms are represented for a material under service or test conditions?	67	100	33	100	67
Q50	Degradation	What failure mode or end-of-life criterion was defined, and was it reached during the study period?	100	67	67	100	33
Overall Average (%)			96.7	95.3	77.3	96.0	76.8

Task-Based Metrics

Given an ontology O and a set of competency-question SPARQL queries Q, OntoCheck derives an ontology term set T_o (all domain-namespace classes and properties in O) and a task term set T_a (all domain-namespace terms referenced across the queries in Q). Two complementary metrics are computed:

Relevance = |T_a ∩ T_o| / |T_a|
Accuracy = |T_a ∩ T_o| / |T_o|

Relevance measures the fraction of task-required terms that the ontology defines—does the ontology cover the vocabulary needed for the intended workload?
Accuracy measures the fraction of ontology terms utilized by the task queries—how much of the ontology is actually used by the workload?
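The two definitions can be sketched end to end with the standard library. This is a simplified illustration under stated assumptions, not OntoCheck's extraction logic. It only recognizes prefixed names such as `mds:GrainSize` and ignores full IRIs in angle brackets. The function names, the toy query, the `https://example.org/mds#` namespace, and the fourth ontology term are hypothetical. Returning 0 for an empty T_a is also an assumption, chosen to be consistent with the MatProc CQ18 row (|T_a| = 0, reported as 0%).

```python
import re
from typing import Dict, Iterable, Set


def extract_terms(queries: Iterable[str], domain_prefixes: Iterable[str]) -> Set[str]:
    """Collect prefixed names (e.g. mds:Sample) whose prefix is a domain prefix."""
    alternatives = "|".join(re.escape(p) for p in domain_prefixes)
    if not alternatives:
        return set()
    # A prefix, a colon, then a local name. "PREFIX mds: <...>" declarations
    # do not match because no local name follows the colon.
    pattern = re.compile(rf"(?<![\w:])(?:{alternatives}):[A-Za-z_][\w\-]*")
    terms: Set[str] = set()
    for query in queries:
        terms.update(pattern.findall(query))
    return terms


def relevance_and_accuracy(task_terms: Set[str], ontology_terms: Set[str]) -> Dict[str, float]:
    """Relevance = |T_a ∩ T_o| / |T_a| and Accuracy = |T_a ∩ T_o| / |T_o|."""
    shared = task_terms & ontology_terms
    # Empty sets would divide by zero; report 0.0 instead (an assumption).
    relevance = len(shared) / len(task_terms) if task_terms else 0.0
    accuracy = len(shared) / len(ontology_terms) if ontology_terms else 0.0
    return {"relevance": relevance, "accuracy": accuracy}


# Toy query built from the terms named in the grain-size example.
toy_query = """
PREFIX mds: <https://example.org/mds#>
SELECT (MAX(?g) AS ?maxGrainSize) WHERE {
  ?s a mds:Sample , mds:PerovskiteStructure ;
     mds:GrainSize ?g .
}
"""
# Toy T_o: the three terms above plus one made-up, unused term.
toy_ontology_terms = {
    "mds:Sample", "mds:PerovskiteStructure", "mds:GrainSize", "mds:UnusedTerm",
}

toy_task_terms = extract_terms([toy_query], ["mds"])
scores = relevance_and_accuracy(toy_task_terms, toy_ontology_terms)
print(f"Relevance: {scores['relevance']:.2%}")  # Relevance: 100.00%
print(f"Accuracy:  {scores['accuracy']:.2%}")   # Accuracy:  75.00%
```

The two scores share a numerator and differ only in the denominator. Relevance drops when the queries need terms the ontology lacks, while Accuracy drops as the ontology grows beyond what the queries use.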

Task-Agnostic Metrics (17 metrics)

Labeling

checkLabel — Human-readable identifiers
altLabelCheck — Synonym coverage
definitionCheck — Formal definitions

Structural

isolatedElements — Orphaned classes
classConnections — Disconnected subgraphs
missingDomainRange — Undeclared restrictions
leafNodeCheck — Leaf nodes
semanticConnection — Higher-level ontology grounding (e.g., CCO, BFO)

Accessibility

sparqlEndpoint — Endpoint reachability
rdfDump — RDF dump availability
humanLicense — License information
externalLinks — External link validity

Naming Convention

classCapitalCheck — Capitalization standards
classSpaceCheck — Spaces in identifiers
spellCheck — Spelling in labels/definitions
duplicateLabels — Duplicate label detection
searchClass — Class string search
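To give a feel for what the naming checks look at, here is a minimal sketch of three of them on plain Python data. The exact rules OntoCheck applies are not described here, so the conventions below are assumptions: class names should start with an uppercase letter and contain no spaces, and labels are compared case-insensitively. The function names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def starts_lowercase(names: Iterable[str]) -> List[str]:
    """Class names whose first character is lowercase (cf. classCapitalCheck)."""
    return [name for name in names if name[:1].islower()]


def contains_space(names: Iterable[str]) -> List[str]:
    """Class names that contain a space character (cf. classSpaceCheck)."""
    return [name for name in names if " " in name]


def duplicate_labels(labels: Dict[str, str]) -> Dict[str, List[str]]:
    """Group classes sharing a label, ignoring case and outer whitespace
    (cf. duplicateLabels)."""
    by_label: Dict[str, List[str]] = defaultdict(list)
    for cls, label in labels.items():
        by_label[label.strip().lower()].append(cls)
    return {label: sorted(classes) for label, classes in by_label.items() if len(classes) > 1}


# Made-up sample data for illustration only.
sample_names = ["Capacitor", "solderProcess", "Iron Temperature"]
sample_labels = {
    "mds:Capacitor": "Capacitor",
    "mds:Cap": "capacitor ",
    "mds:Oven": "Oven",
}

print(starts_lowercase(sample_names))   # ['solderProcess']
print(contains_space(sample_names))     # ['Iron Temperature']
print(duplicate_labels(sample_labels))  # {'capacitor': ['mds:Cap', 'mds:Capacitor']}
```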

Getting Started

Installation

pip install OntoCheck

Command-Line Interface

# Mode 1: Run task-agnostic metrics
ontocheck path/to/ontology.ttl --metrics all

# Mode 3: Task-based scientific assessment
ontocheck path/to/ontology.ttl \
    --mode 3 \
    --questions competency_questions.json \
    --domain-prefixes mds

# Mode 4: Cross-domain assessment
ontocheck xrd.ttl capacitors.ttl \
    --mode 4 \
    --questions cross_domain_questions.json \
    --domain-prefixes mds

Python API

from ontocheck import run_ontology_assessment, run_task_based_assessment

# Mode 1: Task-agnostic assessment
run_ontology_assessment(
    ttl_file="path/to/ontology.ttl",
    metrics="all",
)

# Mode 3: Task-based scientific assessment
result = run_task_based_assessment(
    ttl_files="path/to/ontology.ttl",
    questions="competency_questions.json",
    domain_prefixes=["mds"],
)

print(f"Relevance: {result['relevance']:.2%}")
print(f"Accuracy:  {result['accuracy']:.2%}")

Coming Soon

We are developing a web-based interface where users can evaluate ontologies against their own task term sets (T_a) or custom metrics — no coding required. Users will be able to configure evaluations, define custom metrics, and receive structured assessment reports directly in their browser.

Additional planned capabilities include:

Integration with LLMs for automated SPARQL generation from natural-language competency questions
Community-submitted competency question sets and metric contributions incorporated into the standard suite
User-defined feeders for pluggable ontology and knowledge graph sources

Built for the Community

OntoCheck is conceived as a community resource. We actively encourage collaboration, contribution of new metrics, and submission of domain competency question sets, in the shared interest of building robust, reusable semantic infrastructure for FAIR scientific data. Whether you want to evaluate ontologies in your domain, propose new assessment metrics, or contribute competency question benchmarks — we would love to hear from you.

Get in touch:
Roger H. French • Yinghui Wu
Case Western Reserve University

Acknowledgements

We are grateful to the MDS-Onto user community, spanning several universities and organizations, whose members are also early users of OntoCheck and whose feedback and real-world use cases have directly shaped the tool's development. This material is based upon research in the Materials Data Science for Stockpile Stewardship Center of Excellence (MDS³ COE), and supported by the Department of Energy's National Nuclear Security Administration under Award Number DE-NA0004104. All authors thank the CWRU University Technology Center and the UCF Advanced Research Computing Center for their High Performance Computing (HPC) resources, which were utilized in this work.

BibTeX

@article{kundu2025ontocheck,
  title={OntoCheck: Query-Driven Ontology Assessments for Scientific Domain Applications},
  author={Rishabh Kundu and Redad Mehdi and Van D. Tran and Ethan Frakes and
          Abhishek Daundkar and Maliesha Sumudumalie and Vibha S. Mandayam and
          Jacob A. Lample and Mengjie Li and Laura S. Bruckman and
          Erika I. Barcelos and Alp Sehirlioglu and Roger H. French and Yinghui Wu},
  year={2025},
  url={https://github.com/cwru-sdle/OntoCheck}
}