1 | | -# Relationship to OWL template languages |
2 | | - |
3 | | -Although LinkML is robust and stable, LinkML-OWL is alpha software and incomplete. For now, to convert from TSV to OWL you should use a dedicated environment: |
4 | | - |
5 | | - * dosdp-tools |
6 | | - * robot-templates |
7 | | - * ottr |
8 | | - |
9 | | -For most purposes, these frameworks are also simpler and have less |
10 | | -overhead; they treat ontology generation as a *string templating* |
11 | | -problem, and the emphasis is on the generation of axioms from |
12 | | -templates over formal descriptions of the source input file. |
13 | | - |
14 | | -In contrast, linkml-owl leverages the linkml framework for rich |
15 | | -modeling of the source data structures used to generate the ontology, |
16 | | -in particular: |
17 | | - |
18 | | - * Clear computable description of [cardinality](https://linkml.io/linkml/schemas/slots.html#slot-cardinality): which columns are required, which columns are multivalued, etc. |
19 | | - * Ability to use arbitrarily nested JSON trees or RDF graphs as input |
20 | | - - TSVs can still be used for "flat" schemas |
21 | | - * Use of [semantic enumerations](https://linkml.io/linkml/intro/tutorial06.html) |
22 | | - - for example, a field value may be restricted to two ontology terms such as "off" or "on" |
23 | | - * [Translation](https://linkml.io/linkml/schemas/generators.html) of source schema to other formalisms such as JSON-Schema, JSON-LD Contexts, shape languages, SQL, ... |
24 | | - * Flexible [validation](https://linkml.io/linkml/data/validating-data.html) of source input files leveraging any combination of JSON-Schema, SHACL, or ShEx |
25 | | - * Powerful abilities to infer missing values |
26 | | - * For example, populate a stereotypical textual definition based on slot values |
27 | | - * [Generation of markdown documentation](https://linkml.io/linkml/generators/markdown.html) from source schemas |
28 | | - |
29 | | -An example of a domain where this kind of rich data modeling of input |
30 | | -data is useful is the generation of chemical entity ontologies from data. See |
31 | | -the [chemrof](https://chemkg.github.io/chemrof/) project. |
32 | | - |
33 | | -The overall philosophy of linkml-owl is **composability of distinct parts**. It is a relatively lightweight library that |
34 | | -is only concerned with mapping or templating from a source dataset to OWL. It delegates other aspects to other libraries, |
35 | | -in particular the following are seen as separate concerns: |
36 | | - |
37 | | -- Validation of input |
38 | | -- Organizing templates hierarchically |
39 | | -- Specifying complex rules for inferring membership of a template |
40 | | -- Template reuse, including reuse of core slots, and an [import](https://linkml.io/linkml/schemas/imports.html) mechanism |
41 | | -- Generation of documentation |
42 | | -- Automatic filling in of default values, and checking of consistency between dependent values |
43 | | -- Lexical manipulation, including pre-populating labels, synonyms, and text definitions |
| 1 | +# Comparison with other frameworks |
| 2 | + |
| 3 | +LinkML-OWL is one of several approaches for generating OWL ontologies from |
| 4 | +structured data. This page compares it to hand-written OWL, ROBOT templates, |
| 5 | +DOSDP (Dead Simple OWL Design Patterns), and OTTR (Reasonable Ontology Templates). |
| 6 | + |
| 7 | +## At a glance |
| 8 | + |
| 9 | +| Feature | Hand-written OWL | ROBOT templates | DOSDP | OTTR | **LinkML-OWL** | |
| 10 | +|---|---|---|---|---|---| |
| 11 | +| Input format | OWL syntax | TSV | TSV/YAML | RDF/stOTTR | YAML, JSON, TSV, RDF | |
| 12 | +| Nested/hierarchical data | No | No | No | Yes | **Yes** | |
| 13 | +| Schema validation | OWL profile checks | None | Minimal | Type checking | **Full (JSON-Schema, SHACL, ShEx)** | |
| 14 | +| Cardinality constraints | N/A | None | None | Limited | **Required, multivalued, ranges** | |
| 15 | +| Documentation generation | N/A | None | None | None | **Markdown, JSON-Schema docs** | |
| 16 | +| Enum/value set support | N/A | Manual | Limited | N/A | **Semantic enums with `meaning`** | |
| 17 | +| Template language | N/A | String substitution | YAML-based | stOTTR | **Jinja2 + annotations** | |
| 18 | +| Ecosystem | Protege | ROBOT CLI | DOSDP-tools | Lutra | **LinkML toolchain** | |
| 19 | + |
| 20 | +## Side-by-side: defining an anatomy class |
| 21 | + |
| 22 | +### Goal |
| 23 | + |
| 24 | +Define "lens of camera-type eye" as equivalent to "lens AND part-of some camera-type eye." |
| 25 | + |
| 26 | +### Hand-written OWL (Functional Syntax) |
| 27 | + |
| 28 | +```owl |
| 29 | +Prefix( rdfs: = <http://www.w3.org/2000/01/rdf-schema#> ) |
| 30 | +Prefix( UBERON: = <http://purl.obolibrary.org/obo/UBERON_> ) |
| 31 | +Prefix( BFO: = <http://purl.obolibrary.org/obo/BFO_> ) |
| 32 | +Prefix( IAO: = <http://purl.obolibrary.org/obo/IAO_> ) |
| 33 | + |
| 34 | +Ontology( |
| 35 | + Declaration( Class( UBERON:0004801 ) ) |
| 36 | + Declaration( Class( UBERON:0000389 ) ) |
| 37 | + Declaration( Class( UBERON:0000019 ) ) |
| 38 | + Declaration( ObjectProperty( BFO:0000050 ) ) |
| 39 | + |
| 40 | + AnnotationAssertion( rdfs:label UBERON:0004801 |
| 41 | + "lens of camera-type eye" ) |
| 42 | + AnnotationAssertion( IAO:0000115 UBERON:0004801 |
| 43 | + "The transparent structure in the eye that focuses light." ) |
| 44 | + |
| 45 | + EquivalentClasses( |
| 46 | + UBERON:0004801 |
| 47 | + ObjectIntersectionOf( |
| 48 | + UBERON:0000389 |
| 49 | + ObjectSomeValuesFrom( BFO:0000050 UBERON:0000019 ) |
| 50 | + ) |
| 51 | + ) |
| 52 | +) |
| 53 | +``` |
| 54 | + |
| 55 | +**Downsides:** Verbose. Every entity needs explicit declarations. Prefix management |
| 56 | +is manual. Easy to make syntax errors. No validation of data integrity. Adding 100 |
| 57 | +classes means 100 copies of this pattern. |
| 58 | + |
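A small Python sketch makes the repetition concrete. It is not part of any of the tools compared here: it fills a hypothetical `string.Template` copy of the axioms above once per input row. This removes the copy-paste, but it is still plain string substitution, so nothing checks that `genus` names a class or that a label containing a double quote is escaped.

```python
from string import Template

# Hypothetical per-class template copied from the hand-written axioms above.
# Every placeholder must be supplied, or Template.substitute raises KeyError.
CLASS_BLOCK = Template(
    'Declaration( Class( $id ) )\n'
    'AnnotationAssertion( rdfs:label $id "$label" )\n'
    'AnnotationAssertion( IAO:0000115 $id "$definition" )\n'
    'EquivalentClasses( $id\n'
    '  ObjectIntersectionOf( $genus\n'
    '    ObjectSomeValuesFrom( BFO:0000050 $whole ) ) )'
)


def render_classes(rows: list[dict]) -> str:
    """Emit one copy of the axiom block per input row."""
    return "\n\n".join(CLASS_BLOCK.substitute(row) for row in rows)


rows = [
    {
        "id": "UBERON:0004801",
        "label": "lens of camera-type eye",
        "definition": "The transparent structure in the eye that focuses light.",
        "genus": "UBERON:0000389",
        "whole": "UBERON:0000019",
    },
]

print(render_classes(rows))
```

Adding a hundred classes becomes a hundred rows instead of a hundred pasted blocks, which is exactly the step every template-based tool below formalizes.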
| 59 | +### ROBOT template |
| 60 | + |
| 61 | +**Template (TSV):** |
| 62 | + |
| 63 | +| ID | LABEL | DEFINITION | EquivalentTo | |
| 64 | +|---|---|---|---| |
| 65 | +| ID | LABEL | A IAO:0000115 | EC % | |
| 66 | +| UBERON:0004801 | lens of camera-type eye | The transparent structure... | UBERON:0000389 and (BFO:0000050 some UBERON:0000019) | |
| 67 | + |
| 68 | +```bash |
| 69 | +robot template --template lens.tsv --output lens.owl |
| 70 | +``` |
| 71 | + |
| 72 | +**Downsides:** The Manchester Syntax expression in the `EquivalentTo` column is |
| 73 | +a raw string — no validation until ROBOT parses it. No schema for the TSV |
| 74 | +(columns are ad-hoc). Nested data (e.g. parts-with-counts) cannot be represented |
| 75 | +in a flat TSV. No reuse of column definitions across templates. |
| 76 | + |
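In a ROBOT template, the second row holds template strings: the leading token says what to build (`EC` for an equivalent class, `SC` for a superclass, `A` for an annotation) and `%` stands for the cell value. The function below is a hypothetical sketch of that substitution step only; ROBOT itself then parses the resulting text as Manchester Syntax. It shows why a mistyped CURIE passes through untouched until that parse.

```python
def expand_template_cell(template: str, cell: str) -> tuple[str, str]:
    """Sketch of a ROBOT-style template string such as 'EC %'.

    The first token names the axiom type; every '%' in the rest is
    replaced by the cell value. The result is only text: nothing here
    parses or checks it as Manchester Syntax.
    """
    axiom_type, _, pattern = template.partition(" ")
    return axiom_type, pattern.replace("%", cell)


good = "UBERON:0000389 and (BFO:0000050 some UBERON:0000019)"
typo = "UBERON:0000389 and (BFO:0000050 some UBERON:000019)"  # one digit short

print(expand_template_cell("EC %", good))
print(expand_template_cell("EC %", typo))  # accepted just the same
print(expand_template_cell("SC 'part of' some %", "UBERON:0000019"))
```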
| 77 | +### DOSDP |
| 78 | + |
| 79 | +**Pattern (YAML):** |
| 80 | + |
| 81 | +```yaml |
| 82 | +pattern_name: anatomical_structure_part_of |
| 83 | +classes: |
| 84 | + anatomical_structure: UBERON:0000061 |
| 85 | + whole: UBERON:0000061 |
| 86 | +relations: |
| 87 | + part_of: BFO:0000050 |
| 88 | + |
| 89 | +vars: |
| 90 | + anatomical_structure: "'anatomical_structure'" |
| 91 | + whole: "'anatomical_structure'" |
| 92 | + |
| 93 | +name: |
| 94 | + text: "%s of %s" |
| 95 | + vars: |
| 96 | + - anatomical_structure |
| 97 | + - whole |
| 98 | + |
| 99 | +def: |
| 100 | + text: "A %s that is part of a %s." |
| 101 | + vars: |
| 102 | + - anatomical_structure |
| 103 | + - whole |
| 104 | + |
| 105 | +equivalentTo: |
| 106 | + text: "%s and 'part_of' some %s" |
| 107 | + vars: |
| 108 | + - anatomical_structure |
| 109 | + - whole |
| 110 | +``` |
| 111 | + |
| 112 | +**Data (TSV):** |
| 113 | + |
| 114 | +| defined_class | anatomical_structure | whole | |
| 115 | +|---|---|---| |
| 116 | +| UBERON:0004801 | UBERON:0000389 | UBERON:0000019 | |
| 117 | + |
| 118 | +**Downsides:** Patterns are expressed using a custom YAML DSL. The OWL |
| 119 | +expression is still a string template. No schema validation of the input TSV. |
| 120 | +Cannot handle nested data or variable-length lists of differentiae. |
| 121 | + |
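DOSDP `text` fields are printf-style: each `%s` is filled, in order, from the listed `vars`, as the `name` and `def` fields above show. The sketch below imitates that for the data row above, assuming a hypothetical `LABELS` dictionary in place of the label lookup dosdp-tools performs against the ontology. CURIEs are substituted directly into the equivalence text here for readability; either way the result is a string that still has to be parsed as OWL.

```python
# Hypothetical label lookup; dosdp-tools would read labels from an ontology.
LABELS = {
    "UBERON:0000389": "lens",
    "UBERON:0000019": "camera-type eye",
}


def fill(text: str, var_names: list[str], row: dict, render) -> str:
    """printf-style fill: the i-th %s receives render(row[var_names[i]])."""
    return text % tuple(render(row[name]) for name in var_names)


row = {
    "defined_class": "UBERON:0004801",
    "anatomical_structure": "UBERON:0000389",
    "whole": "UBERON:0000019",
}
pattern_vars = ["anatomical_structure", "whole"]

name = fill("%s of %s", pattern_vars, row, LABELS.__getitem__)
definition = fill("A %s that is part of a %s.", pattern_vars, row, LABELS.__getitem__)
equivalent_to = fill("%s and 'part_of' some %s", pattern_vars, row, lambda curie: curie)

print(name)           # lens of camera-type eye
print(definition)     # A lens that is part of a camera-type eye.
print(equivalent_to)  # UBERON:0000389 and 'part_of' some UBERON:0000019
```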
| 122 | +### LinkML-OWL |
| 123 | + |
| 124 | +**Schema (YAML):** |
| 125 | + |
| 126 | +```yaml |
| 127 | +classes: |
| 128 | + DefinedAnatomicalStructure: |
| 129 | + slots: |
| 130 | + - id |
| 131 | + - label |
| 132 | + - definition |
| 133 | + - genus |
| 134 | + - differentia_part_of |
| 135 | + slot_usage: |
| 136 | + genus: |
| 137 | + slot_uri: rdfs:subClassOf |
| 138 | + range: AnatomicalStructure |
| 139 | + required: true |
| 140 | + annotations: |
| 141 | + owl: EquivalentClasses, IntersectionOf |
| 142 | + differentia_part_of: |
| 143 | + slot_uri: BFO:0000050 |
| 144 | + range: AnatomicalStructure |
| 145 | + annotations: |
| 146 | + owl: EquivalentClasses, IntersectionOf, ObjectSomeValuesFrom |
| 147 | +``` |
| 148 | + |
| 149 | +**Data (YAML):** |
| 150 | + |
| 151 | +```yaml |
| 152 | +- id: UBERON:0004801 |
| 153 | + label: lens of camera-type eye |
| 154 | + definition: The transparent structure in the eye that focuses light. |
| 155 | + genus: UBERON:0000389 |
| 156 | + differentia_part_of: UBERON:0000019 |
| 157 | +``` |
| 158 | + |
| 159 | +```bash |
| 160 | +linkml-data2owl -s anatomy-schema.yaml -C DefinedAnatomicalStructure data.yaml -o lens.ofn |
| 161 | +``` |
| 162 | + |
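One way to read the `owl` annotations above: every slot tagged `IntersectionOf` contributes an operand to a single `EquivalentClasses` axiom, and `ObjectSomeValuesFrom` wraps the slot's value in an existential restriction on its `slot_uri`. The Python below is a simplified, hypothetical interpreter of just these keywords (it is not linkml-owl's implementation); for the data record above it reproduces the axiom from the hand-written OWL example.

```python
# Simplified, hypothetical reading of the slot_usage above.
SLOT_USAGE = {
    "genus": {
        "slot_uri": "rdfs:subClassOf",
        "owl": ["EquivalentClasses", "IntersectionOf"],
    },
    "differentia_part_of": {
        "slot_uri": "BFO:0000050",
        "owl": ["EquivalentClasses", "IntersectionOf", "ObjectSomeValuesFrom"],
    },
}


def equivalence_axiom(instance: dict, slot_usage: dict) -> str:
    """Collect every IntersectionOf operand into one EquivalentClasses axiom."""
    operands = []
    for slot, spec in slot_usage.items():
        if "IntersectionOf" not in spec["owl"] or slot not in instance:
            continue
        values = instance[slot] if isinstance(instance[slot], list) else [instance[slot]]
        for value in values:
            if "ObjectSomeValuesFrom" in spec["owl"]:
                # Wrap the filler in an existential restriction on the slot's property.
                operands.append(f"ObjectSomeValuesFrom( {spec['slot_uri']} {value} )")
            else:
                # A plain IntersectionOf slot contributes the named class itself.
                operands.append(value)
    return f"EquivalentClasses( {instance['id']} ObjectIntersectionOf( {' '.join(operands)} ) )"


lens = {
    "id": "UBERON:0004801",
    "label": "lens of camera-type eye",
    "genus": "UBERON:0000389",
    "differentia_part_of": "UBERON:0000019",
}
print(equivalence_axiom(lens, SLOT_USAGE))
```

A real generator also has to handle cases this sketch ignores, such as a class that ends up with only one intersection operand.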
| 163 | +**Advantages:** |
| 164 | + |
| 165 | +- The schema *is* the documentation: slot names, ranges, cardinality, and descriptions are all formal |
| 166 | +- Input data can be validated against the schema (for example with the generated JSON-Schema) before OWL generation |
| 167 | +- The same schema generates JSON-Schema, SHACL shapes, SQL DDL, and Markdown docs |
| 168 | +- Nested/hierarchical data is fully supported |
| 169 | +- Semantic enums map directly to ontology terms |
| 170 | +- OWL mapping is declarative (annotation keywords), not string-based |
| 171 | + |
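The `required` and `multivalued` metadata is what makes checks before OWL generation possible. LinkML's generated validators do this properly; the sketch below only illustrates the idea with a hand-written, hypothetical `SLOTS` table (treating `id` as required is an assumption, since the schema excerpt does not show the `id` slot). A misspelled `genus` key is reported instead of silently producing a class with no genus.

```python
# Hypothetical slot metadata echoing `required` / `multivalued` in the schema above.
SLOTS = {
    "id": {"required": True, "multivalued": False},
    "label": {"required": False, "multivalued": False},
    "definition": {"required": False, "multivalued": False},
    "genus": {"required": True, "multivalued": False},
    "differentia_part_of": {"required": False, "multivalued": False},
}


def cardinality_errors(instance: dict, slots: dict) -> list[str]:
    errors = []
    for name, meta in slots.items():
        value = instance.get(name)
        if meta["required"] and value in (None, "", []):
            errors.append(f"{name}: required slot is missing")
        if isinstance(value, list) and not meta["multivalued"]:
            errors.append(f"{name}: single-valued slot given a list")
    # Keys that are not declared slots usually indicate a typo in the input.
    for name in instance:
        if name not in slots:
            errors.append(f"{name}: not declared in the schema")
    return errors


good = {"id": "UBERON:0004801", "label": "lens of camera-type eye",
        "genus": "UBERON:0000389", "differentia_part_of": "UBERON:0000019"}
bad = {"id": "UBERON:0004801", "label": "lens of camera-type eye",
       "genis": "UBERON:0000389", "differentia_part_of": ["UBERON:0000019"]}

print(cardinality_errors(good, SLOTS))
print(cardinality_errors(bad, SLOTS))
```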
| 172 | +## Side-by-side: disease by location (Mondo-style pattern) |
| 173 | + |
| 174 | +DOSDP is widely used for Mondo disease patterns. This comparison shows the |
| 175 | +same "disease by anatomical location" pattern in ROBOT templates, DOSDP, and LinkML-OWL. |
| 176 | + |
| 177 | +### Goal |
| 178 | + |
| 179 | +Define "brain disease" as equivalent to "nervous system disorder AND disease-has-location some brain." |
| 180 | + |
| 181 | +### ROBOT template |
| 182 | + |
| 183 | +| ID | LABEL | DEFINITION | EquivalentTo | |
| 184 | +|---|---|---|---| |
| 185 | +| ID | LABEL | A IAO:0000115 | EC % | |
| 186 | +| MONDO:0005560 | brain disease | A disease affecting the brain. | MONDO:0005071 and (RO:0004026 some UBERON:0000955) | |
| 187 | + |
| 188 | +One row per class. The Manchester Syntax in `EquivalentTo` is a raw string — a typo |
| 189 | +(e.g. misspelling a CURIE) is only caught when ROBOT tries to parse it. |
| 190 | + |
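A pre-flight check run before `robot template` can catch some of these typos earlier. This is not a ROBOT feature; it is a hypothetical sketch that assumes every prefix in use has seven-digit numeric local IDs, which holds for the CURIEs in this example but is not a general rule.

```python
import re

# Matches prefixed numeric identifiers such as MONDO:0005071.
CURIE = re.compile(r"\b([A-Za-z][A-Za-z0-9_]*):(\d+)\b")

# Hypothetical expectations: known prefixes and the width of their numeric local IDs.
EXPECTED_DIGITS = {"MONDO": 7, "UBERON": 7, "RO": 7, "BFO": 7, "IAO": 7}


def curie_problems(expression: str) -> list[str]:
    """Flag unknown prefixes and wrongly sized local IDs in a Manchester cell."""
    problems = []
    for prefix, local_id in CURIE.findall(expression):
        if prefix not in EXPECTED_DIGITS:
            problems.append(f"unknown prefix in {prefix}:{local_id}")
        elif len(local_id) != EXPECTED_DIGITS[prefix]:
            problems.append(f"{prefix}:{local_id} should have {EXPECTED_DIGITS[prefix]} digits")
    return problems


print(curie_problems("MONDO:0005071 and (RO:0004026 some UBERON:0000955)"))
print(curie_problems("MONDO:0005071 and (RO:0004026 some UBERON:000955)"))
print(curie_problems("MONDO:0005071 and (RO:0004026 some UBRON:0000955)"))
```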
| 191 | +### DOSDP |
| 192 | + |
| 193 | +**Pattern:** |
| 194 | + |
| 195 | +```yaml |
| 196 | +pattern_name: disease_by_location |
| 197 | +classes: |
| 198 | + disease: MONDO:0000001 |
| 199 | + location: UBERON:0000061 |
| 200 | +relations: |
| 201 | + disease_has_location: RO:0004026 |
| 202 | + |
| 203 | +vars: |
| 204 | + disease: "'disease'" |
| 205 | + location: "'location'" |
| 206 | + |
| 207 | +name: |
| 208 | + text: "%s disease" |
| 209 | + vars: |
| 210 | + - location |
| 211 | + |
| 212 | +equivalentTo: |
| 213 | + text: "%s and 'disease_has_location' some %s" |
| 214 | + vars: |
| 215 | + - disease |
| 216 | + - location |
| 217 | +``` |
| 218 | + |
| 219 | +**Data (TSV):** |
| 220 | + |
| 221 | +| defined_class | disease | location | |
| 222 | +|---|---|---| |
| 223 | +| MONDO:0005560 | MONDO:0005071 | UBERON:0000955 | |
| 224 | + |
| 225 | +This is exactly the use case where DOSDP shines: flat TSVs that each instantiate a single |
| 226 | +pattern. However: |
| 227 | + |
| 228 | +- Each pattern requires its own YAML + TSV pair |
| 229 | +- No type checking on the TSV columns — UBERON vs MONDO CURIEs are just strings |
| 230 | +- Adding a second differentia (e.g. a cause in addition to the location) requires a new pattern file |
| 231 | +- The OWL expression is still embedded as a string template |
| 232 | + |
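To dosdp-tools, a row with the `disease` and `location` columns swapped is just as well formed as the correct one; it would yield an axiom whose genus is an anatomy term and whose location is a disease. The sketch below adds the missing column typing by hand, using a hypothetical prefix-per-column table checked with the standard `csv` module before the pattern is applied. In the LinkML version this information lives in each slot's `range` instead.

```python
import csv
import io

# Hypothetical column typing that the DOSDP pattern itself does not express.
COLUMN_PREFIX = {"defined_class": "MONDO", "disease": "MONDO", "location": "UBERON"}

TSV = (
    "defined_class\tdisease\tlocation\n"
    "MONDO:0005560\tMONDO:0005071\tUBERON:0000955\n"
    "MONDO:0005560\tUBERON:0000955\tMONDO:0005071\n"  # columns swapped by mistake
)


def column_type_errors(tsv_text: str) -> list[str]:
    errors = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, prefix in COLUMN_PREFIX.items():
            value = row[column]
            if not value.startswith(prefix + ":"):
                errors.append(f"line {line_no}: {column}={value} is not a {prefix} CURIE")
    return errors


print(column_type_errors(TSV))
```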
| 233 | +### LinkML-OWL |
| 234 | + |
| 235 | +**Schema:** |
| 236 | + |
| 237 | +```yaml |
| 238 | +classes: |
| 239 | + DiseaseByLocation: |
| 240 | + slots: |
| 241 | + - id |
| 242 | + - label |
| 243 | + - definition |
| 244 | + - subclass_of |
| 245 | + - location |
| 246 | + slot_usage: |
| 247 | + subclass_of: |
| 248 | + required: true |
| 249 | + annotations: |
| 250 | + owl: EquivalentClasses, IntersectionOf |
| 251 | + location: |
| 252 | + required: true |
| 253 | + slot_uri: RO:0004026 |
| 254 | + range: AnatomicalStructure |
| 255 | + annotations: |
| 256 | + owl: EquivalentClasses, IntersectionOf, ObjectSomeValuesFrom |
| 257 | +``` |
| 258 | + |
| 259 | +**Data:** |
| 260 | + |
| 261 | +```yaml |
| 262 | +- id: MONDO:0005560 |
| 263 | + label: brain disease |
| 264 | + definition: A disease affecting the brain. |
| 265 | + subclass_of: |
| 266 | + - MONDO:0005071 |
| 267 | + location: UBERON:0000955 |
| 268 | +``` |
| 269 | + |
| 270 | +**Advantages over DOSDP here:** |
| 271 | + |
| 272 | +- `location` has a declared `range` — the schema enforces that this slot takes anatomy CURIEs |
| 273 | +- Adding a second differentia is just adding another slot to the same class — no new pattern file |
| 274 | +- The same schema can generate JSON-Schema for validating the data TSV/YAML before OWL generation |
| 275 | +- Multiple metaclasses (DiseaseByLocation, DiseaseByAgent, DiseaseWithInheritance) coexist in one schema |
| 276 | + with shared slots, inheritance, and consistent validation |
| 277 | + |
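Because slots are defined once and reused, adding a differentia means adding one entry to a slot list rather than writing a new pattern file. The sketch below models that with a hypothetical shared registry and two metaclasses; `location` and `RO:0004026` come from the schema above, while `has_agent`, `ex:has_agent`, `ex:SomeAgent`, `ex:0001`, and the `DiseaseByLocationAndAgent` metaclass are placeholders invented for illustration.

```python
# Hypothetical slot registry shared by several metaclasses.
SLOTS = {
    "subclass_of": {"property": None},          # contributes the genus class itself
    "location": {"property": "RO:0004026"},     # disease has location
    "has_agent": {"property": "ex:has_agent"},  # hypothetical second differentia
}

# Each metaclass is just an ordered list of shared slots.
METACLASSES = {
    "DiseaseByLocation": ["subclass_of", "location"],
    "DiseaseByLocationAndAgent": ["subclass_of", "location", "has_agent"],
}


def render(metaclass: str, instance: dict) -> str:
    """Build one EquivalentClasses axiom from the slots a metaclass declares."""
    operands = []
    for slot in METACLASSES[metaclass]:
        prop = SLOTS[slot]["property"]
        values = instance[slot] if isinstance(instance[slot], list) else [instance[slot]]
        for value in values:
            operands.append(value if prop is None else f"ObjectSomeValuesFrom( {prop} {value} )")
    return f"EquivalentClasses( {instance['id']} ObjectIntersectionOf( {' '.join(operands)} ) )"


brain_disease = {"id": "MONDO:0005560", "subclass_of": ["MONDO:0005071"], "location": "UBERON:0000955"}
combined = {"id": "ex:0001", "subclass_of": "MONDO:0005071",
            "location": "UBERON:0000955", "has_agent": "ex:SomeAgent"}

print(render("DiseaseByLocation", brain_disease))
print(render("DiseaseByLocationAndAgent", combined))
```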
| 278 | +## Where other tools may be better |
| 279 | + |
| 280 | +LinkML-OWL is not always the right choice: |
| 281 | + |
| 282 | +- **Simple, flat term lists**: If you have a simple TSV of terms with labels and |
| 283 | + parent classes, ROBOT templates are simpler with less setup overhead. |
| 284 | +- **Existing DOSDP infrastructure**: Projects already using DOSDP-tools with |
| 285 | + established patterns may not benefit from migrating. |
| 286 | +- **Pure OWL editing**: For interactive, visual ontology editing, Protege |
| 287 | + remains the standard tool. |
| 288 | +- **RDF-native workflows**: If your source data is already RDF, tools like |
| 289 | + SPARQL CONSTRUCT or OTTR may integrate more naturally. |
| 290 | + |
| 291 | +## When to choose LinkML-OWL |
| 292 | + |
| 293 | +LinkML-OWL is most valuable when: |
| 294 | + |
| 295 | +1. **Your source data is complex** — nested structures, variable-length lists, |
| 296 | + cross-references between entities |
| 297 | +2. **You need data validation** — catch errors before they become bad axioms |
| 298 | +3. **You generate multiple outputs** — the same schema can produce OWL, JSON-Schema, |
| 299 | + SQL, documentation, and SHACL shapes |
| 300 | +4. **You have design patterns that repeat** — define the pattern once in the schema, |
| 301 | + instantiate it many times in data |
| 302 | +5. **You want to auto-generate labels and definitions** — use `string_serialization` |
| 303 | + to populate annotation slots from other slot values |
| 304 | +6. **Your axioms are complex** — Jinja templates handle arbitrary OWL Functional Syntax, |
| 305 | + including GCIs, axiom annotations, and nested class expressions |
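LinkML's `string_serialization` fills `{slot}` placeholders from other slot values. The Python below mimics that behavior with `str.format_map`; it is a sketch, not LinkML's implementation, and `location_label` is a hypothetical slot holding the label of `location`. With it, the label and definition used in the Mondo example can be derived rather than typed by hand.

```python
import string


def serialize(template: str, instance: dict) -> str:
    """Fill {slot} placeholders from an instance; fail loudly on a missing slot."""
    fields = [field for _, field, _, _ in string.Formatter().parse(template) if field]
    missing = [field for field in fields if field not in instance]
    if missing:
        raise KeyError(f"no value for slot(s): {', '.join(missing)}")
    return template.format_map(instance)


# `location_label` is a hypothetical slot holding the label of `location`.
instance = {"id": "MONDO:0005560", "location": "UBERON:0000955", "location_label": "brain"}

print(serialize("{location_label} disease", instance))                  # brain disease
print(serialize("A disease affecting the {location_label}.", instance))  # A disease affecting the brain.
```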