Commit bb042e7

Merge pull request #53 from linkml/docs/improve-examples
Improve documentation examples and comparison
2 parents aac698d + 1c1b76e commit bb042e7

17 files changed

Lines changed: 2113 additions & 657 deletions

docs/comparison.md

Lines changed: 305 additions & 43 deletions
@@ -1,43 +1,305 @@
-# Relationship to OWL template languages
-
-Although LinkML is robust and stable, LinkML-OWL is alpha software and incomplete. For now, to convert from TSV to OWL you should for now use a dedicated environment:
-
-* dosdp-tools
-* robot-templates
-* ottr
-
-For most purposes, these frameworks are also simpler and less
-overhead, they treat ontology generation as a *string templating*
-problem, and the emphasis is on the generation of axioms from
-templates over formal descriptions of the source input file.
-
-In contrast, linkml-owl leverages the linkml framework for rich
-modeling of the source data structures used to generate the ontology,
-in particular:
-
-* Clear computable description of [cardinality](https://linkml.io/linkml/schemas/slots.html#slot-cardinality) which columns are required, which columns are multivalued etc
-* Ability to use arbitrarily nested JSON trees or RDF graphs as input
-    - TSVs can still be used for "flat" schemas
-* Use of [semantic enumerations](https://linkml.io/linkml/intro/tutorial06.html)
-    - for example, a field value may be restricted to two ontology terms such as "off" or "on"
-* [Translation](https://linkml.io/linkml/schemas/generators.html) of source schema to other formalisms such as JSON-Schema, JSON-LD Contexts, shape languages, SQL, ...
-* Flexible [validation](https://linkml.io/linkml/data/validating-data.html) of source input files leveraging any combination of JSON-Schema, SHACL, or ShEx
-* Powerful abilities to infer missing values
-    * For example, populate a stereotypical textual definition based on slot values
-* [Generation of markdown documentation](https://linkml.io/linkml/generators/markdown.html) from source schemas
-
-An example of a domain where this kind of rich data modeling of input
-data includes generation of chemical entity ontologies from data. See
-the [chemrof](https://chemkg.github.io/chemrof/) project.
-
-The overall philosophy of linkml-owl is **composability of distinct parts**. It is a relatively lightweight library that
-is only concerned with mapping or templating from a source dataset to OWL. It delegates other aspects to other libraries,
-in particular the following are seen as separate concerns:
-
-- Validation of input
-- Organizing templates hierarchically
-- Specifying complex rules for inferring membership of a template
-- Template reuse, including reuse of core slots, and an [import](https://linkml.io/linkml/schemas/imports.html) mechanism
-- Generation of documentation
-- Automatic filling in of default values, and checking of consistency between dependent values
-- Lexical manipulation, including pre-populating labels, synomyms, and text definitions
+# Comparison with other frameworks
+
+LinkML-OWL is one of several approaches for generating OWL ontologies from
+structured data. This page compares it to hand-written OWL, ROBOT templates,
+DOSDP (Dead Simple OWL Design Patterns), and OTTR (Reasonable Ontology Templates).
+
+## At a glance
+
+| Feature | Hand-written OWL | ROBOT templates | DOSDP | OTTR | **LinkML-OWL** |
+|---|---|---|---|---|---|
+| Input format | OWL syntax | TSV | TSV/YAML | RDF/stOTTR | YAML, JSON, TSV, RDF |
+| Nested/hierarchical data | No | No | No | Yes | **Yes** |
+| Schema validation | OWL profile checks | None | Minimal | Type checking | **Full (JSON-Schema, SHACL, ShEx)** |
+| Cardinality constraints | N/A | None | None | Limited | **Required, multivalued, ranges** |
+| Documentation generation | N/A | None | None | None | **Markdown, JSON-Schema docs** |
+| Enum/value set support | N/A | Manual | Limited | N/A | **Semantic enums with `meaning`** |
+| Template language | N/A | String substitution | YAML-based | stOTTR | **Jinja2 + annotations** |
+| Ecosystem | Protege | ROBOT CLI | DOSDP-tools | Lutra | **LinkML toolchain** |
+
+## Side-by-side: defining an anatomy class
+
+### Goal
+
+Define "lens of camera-type eye" as equivalent to "lens AND part-of some camera-type eye."
+
+### Hand-written OWL (Functional Syntax)
+
+```owl
+Prefix( rdfs: = <http://www.w3.org/2000/01/rdf-schema#> )
+Prefix( UBERON: = <http://purl.obolibrary.org/obo/UBERON_> )
+Prefix( BFO: = <http://purl.obolibrary.org/obo/BFO_> )
+Prefix( IAO: = <http://purl.obolibrary.org/obo/IAO_> )
+
+Ontology(
+  Declaration( Class( UBERON:0004801 ) )
+  Declaration( Class( UBERON:0000389 ) )
+  Declaration( Class( UBERON:0000019 ) )
+  Declaration( ObjectProperty( BFO:0000050 ) )
+
+  AnnotationAssertion( rdfs:label UBERON:0004801
+    "lens of camera-type eye" )
+  AnnotationAssertion( IAO:0000115 UBERON:0004801
+    "The transparent structure in the eye that focuses light." )
+
+  EquivalentClasses(
+    UBERON:0004801
+    ObjectIntersectionOf(
+      UBERON:0000389
+      ObjectSomeValuesFrom( BFO:0000050 UBERON:0000019 )
+    )
+  )
+)
+```
+
+**Downsides:** Verbose. Every entity needs explicit declarations. Prefix management
+is manual. Easy to make syntax errors. No validation of data integrity. Adding 100
+classes means 100 copies of this pattern.
+
+### ROBOT template
+
+**Template (TSV):**
+
+| ID | LABEL | DEFINITION | EquivalentTo |
+|---|---|---|---|
+| ID | LABEL | A IAO:0000115 | EC % |
+| UBERON:0004801 | lens of camera-type eye | The transparent structure... | UBERON:0000389 and (BFO:0000050 some UBERON:0000019) |
+
+```bash
+robot template --template lens.tsv --output lens.owl
+```
+
+**Downsides:** The Manchester Syntax expression in the `EquivalentTo` column is
+a raw string, with no validation until ROBOT parses it. No schema for the TSV
+(columns are ad-hoc). Nested data (e.g. parts-with-counts) cannot be represented
+in a flat TSV. No reuse of column definitions across templates.
+
+### DOSDP
+
+**Pattern (YAML):**
+
+```yaml
+pattern_name: anatomical_structure_part_of
+classes:
+  anatomical_structure: UBERON:0000061
+  whole: UBERON:0000061
+relations:
+  part_of: BFO:0000050
+
+vars:
+  anatomical_structure: "'anatomical_structure'"
+  whole: "'anatomical_structure'"
+
+name:
+  text: "%s of %s"
+  vars:
+    - anatomical_structure
+    - whole
+
+def:
+  text: "A %s that is part of a %s."
+  vars:
+    - anatomical_structure
+    - whole
+
+equivalentTo:
+  text: "%s and ('part_of' some %s)"
+  vars:
+    - anatomical_structure
+    - whole
+```
+
+**Data (TSV):**
+
+| defined_class | anatomical_structure | whole |
+|---|---|---|
+| UBERON:0004801 | UBERON:0000389 | UBERON:0000019 |
+
+**Downsides:** Patterns are expressed using a custom YAML DSL. The OWL
+expression is still a string template. No schema validation of the input TSV.
+Cannot handle nested data or variable-length lists of differentiae.
+
+### LinkML-OWL
+
+**Schema (YAML):**
+
+```yaml
+classes:
+  DefinedAnatomicalStructure:
+    slots:
+      - id
+      - label
+      - definition
+      - genus
+      - differentia_part_of
+    slot_usage:
+      genus:
+        slot_uri: rdfs:subClassOf
+        range: AnatomicalStructure
+        required: true
+        annotations:
+          owl: EquivalentClasses, IntersectionOf
+      differentia_part_of:
+        slot_uri: BFO:0000050
+        range: AnatomicalStructure
+        annotations:
+          owl: EquivalentClasses, IntersectionOf, ObjectSomeValuesFrom
+```
+
+**Data (YAML):**
+
+```yaml
+- id: UBERON:0004801
+  label: lens of camera-type eye
+  definition: The transparent structure in the eye that focuses light.
+  genus: UBERON:0000389
+  differentia_part_of: UBERON:0000019
+```
+
+```bash
+linkml-data2owl -s anatomy-schema.yaml -C DefinedAnatomicalStructure data.yaml -o lens.ofn
+```
+
+**Advantages:**
+
+- The schema *is* the documentation: slot names, ranges, cardinality, and descriptions are all formal
+- Input data is validated against the schema before OWL generation
+- The same schema generates JSON-Schema, SHACL shapes, SQL DDL, and Markdown docs
+- Nested/hierarchical data is fully supported
+- Semantic enums map directly to ontology terms
+- OWL mapping is declarative (annotation keywords), not string-based
+
+## Side-by-side: disease by location (Mondo-style pattern)
+
+DOSDP is widely used for Mondo disease patterns. This comparison shows the same
+"disease by anatomical location" pattern in ROBOT templates, DOSDP, and LinkML-OWL.
+
+### Goal
+
+Define "brain disease" as equivalent to "nervous system disorder AND disease-has-location some brain."
+
+### ROBOT template
+
+| ID | LABEL | DEFINITION | EquivalentTo |
+|---|---|---|---|
+| ID | LABEL | A IAO:0000115 | EC % |
+| MONDO:0005560 | brain disease | A disease affecting the brain. | MONDO:0005071 and (RO:0004026 some UBERON:0000955) |
+
+One row per class. The Manchester Syntax in `EquivalentTo` is a raw string, so a typo
+(e.g. misspelling a CURIE) is only caught when ROBOT tries to parse it.
+
+### DOSDP
+
+**Pattern:**
+
+```yaml
+pattern_name: disease_by_location
+classes:
+  disease: MONDO:0000001
+  location: UBERON:0000061
+relations:
+  disease_has_location: RO:0004026
+
+vars:
+  disease: "'disease'"
+  location: "'location'"
+
+name:
+  text: "%s disease"
+  vars:
+    - location
+
+equivalentTo:
+  text: "%s and ('disease_has_location' some %s)"
+  vars:
+    - disease
+    - location
+```
+
+**Data (TSV):**
+
+| defined_class | disease | location |
+|---|---|---|
+| MONDO:0005560 | MONDO:0005071 | UBERON:0000955 |
+
+This is the kind of pattern DOSDP handles well, particularly for flat, single-pattern
+TSVs. However:
+
+- Each pattern requires its own YAML + TSV pair
+- No type checking on the TSV columns: UBERON and MONDO CURIEs are just strings
+- Adding a second differentia (e.g. a cause) requires a new pattern file
+- The OWL expression is still embedded as a string template
+
+### LinkML-OWL
+
+**Schema:**
+
+```yaml
+classes:
+  DiseaseByLocation:
+    slots:
+      - id
+      - label
+      - definition
+      - subclass_of
+      - location
+    slot_usage:
+      subclass_of:
+        required: true
+        annotations:
+          owl: EquivalentClasses, IntersectionOf
+      location:
+        required: true
+        slot_uri: RO:0004026
+        range: AnatomicalStructure
+        annotations:
+          owl: EquivalentClasses, IntersectionOf, ObjectSomeValuesFrom
+```
+
+**Data:**
+
+```yaml
+- id: MONDO:0005560
+  label: brain disease
+  definition: A disease affecting the brain.
+  subclass_of:
+    - MONDO:0005071
+  location: UBERON:0000955
+```
+
+**Advantages over DOSDP here:**
+
+- `location` has a declared `range`, so the schema enforces that this slot takes anatomy CURIEs
+- Adding a second differentia is just adding another slot to the same class; no new pattern file is needed
+- The same schema can generate JSON-Schema for validating the data TSV/YAML before OWL generation
+- Multiple metaclasses (DiseaseByLocation, DiseaseByAgent, DiseaseWithInheritance) coexist in one schema
+  with shared slots, inheritance, and consistent validation
+
+## Where other tools may be better
+
+LinkML-OWL is not always the right choice:
+
+- **Simple, flat term lists**: If you have a simple TSV of terms with labels and
+  parent classes, ROBOT templates are simpler with less setup overhead.
+- **Existing DOSDP infrastructure**: Projects already using DOSDP-tools with
+  established patterns may not benefit from migrating.
+- **Pure OWL editing**: For interactive, visual ontology editing, Protege
+  remains the standard tool.
+- **RDF-native workflows**: If your source data is already RDF, tools like
+  SPARQL CONSTRUCT or OTTR may integrate more naturally.
+
+## When to choose LinkML-OWL
+
+LinkML-OWL is most valuable when:
+
+1. **Your source data is complex**: nested structures, variable-length lists,
+   cross-references between entities
+2. **You need data validation**: catch errors before they become bad axioms
+3. **You generate multiple outputs**: the same schema can produce OWL, JSON-Schema,
+   SQL, documentation, and SHACL shapes
+4. **You have design patterns that repeat**: define the pattern once in the schema,
+   instantiate it many times in data
+5. **You want to auto-generate labels and definitions**: use `string_serialization`
+   to populate annotation slots from other slot values
+6. **Your axioms are complex**: Jinja templates handle arbitrary OWL Functional Syntax,
+   including GCIs, axiom annotations, and nested class expressions
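
The declarative `owl:` annotations in the LinkML-OWL schemas above aim at the same axiom shape as the hand-written OWL: a genus plus existential restrictions combined under `EquivalentClasses`. The following is a minimal Python sketch of that assembly, not linkml-owl's actual implementation; the function name `equivalent_classes_axiom` and its parameters are hypothetical and introduced only for illustration.

```python
from typing import Mapping, Sequence


def equivalent_classes_axiom(
    defined_class: str,
    genus: Sequence[str],
    differentiae: Mapping[str, Sequence[str]],
) -> str:
    """Render an EquivalentClasses axiom in OWL Functional Syntax.

    genus: CURIEs of the named classes in the intersection.
    differentiae: maps an object-property CURIE to its filler class CURIEs;
    each pair becomes ObjectSomeValuesFrom( property filler ).
    (Hypothetical helper, not part of linkml-owl.)
    """
    operands = list(genus)
    for prop, fillers in differentiae.items():
        for filler in fillers:
            operands.append(f"ObjectSomeValuesFrom( {prop} {filler} )")
    if not operands:
        raise ValueError("at least one genus or differentia is required")
    if len(operands) == 1:
        # A single operand needs no intersection wrapper.
        expression = operands[0]
    else:
        expression = "ObjectIntersectionOf( " + " ".join(operands) + " )"
    return f"EquivalentClasses( {defined_class} {expression} )"


# The "lens of camera-type eye" record from the anatomy example.
lens = equivalent_classes_axiom(
    "UBERON:0004801",
    genus=["UBERON:0000389"],
    differentiae={"BFO:0000050": ["UBERON:0000019"]},
)
print(lens)
```

In this sketch, adding a second differentia amounts to adding another key to `differentiae`, which mirrors adding another slot to the schema class.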
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+- id: UBERON:0000468
+  label: organism
+  definition: A biological entity that is an individual living system.
+
+- id: UBERON:0000033
+  label: head
+  definition: The upper part of the body.
+  part_of:
+    - UBERON:0000468
+
+- id: UBERON:0000970
+  label: eye
+  definition: An organ of sight.
+  synonym:
+    - oculus
+  part_of:
+    - UBERON:0000033
+
+- id: UBERON:0000019
+  label: camera-type eye
+  definition: An eye that forms an image through a single lens.
+  part_of:
+    - UBERON:0000033
+  develops_from:
+    - UBERON:0003072
+
+- id: UBERON:0003072
+  label: optic cup
+  definition: An embryonic structure that develops into the eye.
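
Because the records in this data file refer to one another through `part_of` and `develops_from`, one simple check before OWL generation is that every referenced identifier is defined in the same file. Below is a plain-Python sketch, assuming the records have already been loaded into dictionaries; it is not LinkML's validator, and `dangling_references` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Mapping

# The five records from the data file above, reduced to the fields checked here.
records = [
    {"id": "UBERON:0000468", "label": "organism"},
    {"id": "UBERON:0000033", "label": "head", "part_of": ["UBERON:0000468"]},
    {"id": "UBERON:0000970", "label": "eye", "part_of": ["UBERON:0000033"]},
    {
        "id": "UBERON:0000019",
        "label": "camera-type eye",
        "part_of": ["UBERON:0000033"],
        "develops_from": ["UBERON:0003072"],
    },
    {"id": "UBERON:0003072", "label": "optic cup"},
]

# Slots whose values are expected to be identifiers of other records.
REFERENCE_SLOTS = ("part_of", "develops_from")


def dangling_references(rows: Iterable[Mapping]) -> Dict[str, List[str]]:
    """Map each record id to the references that point at undefined ids."""
    rows = list(rows)
    known = {row["id"] for row in rows}
    problems: Dict[str, List[str]] = {}
    for row in rows:
        for slot in REFERENCE_SLOTS:
            for target in row.get(slot, []):
                if target not in known:
                    problems.setdefault(row["id"], []).append(f"{slot} -> {target}")
    return problems


print(dangling_references(records))
```

An empty result means every `part_of` and `develops_from` target resolves to a record in the file.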
