Summary
I'd like to propose adding support to pycsw for two UK metadata profiles that are widely used in UK government and research catalogues but not currently covered:
- UK GEMINI 2.3: the [Association for Geographic Information][agi]'s Geo-spatial Metadata Interoperability Initiative, the UK government's recommended standard for describing geographic data, used across data.gov.uk, Defra Data Services Platform, Natural England, BGS, EIDC, etc.
- MEDIN 3.1.2: the [Marine Environmental Data and Information Network][medin]'s discovery metadata standard, used by the MEDIN portal and the UK marine Data Archive Centres (BODC, BGS, UKHO, MEDIN, etc.). MEDIN is explicitly a marine profile of GEMINI 2.3.
Both are technically constraint-based profiles of ISO 19115/19139, which is already supported in pycsw via the apiso profile. Both are validated via Schematron rather than by extending the XSD. Both are published under [CC-BY 4.0][ccby].
Before opening any code PRs I'd like to agree the shape with maintainers, because the contribution touches the profile-plugin registration logic.
Background: how GEMINI and MEDIN relate to ISO 19139
Both standards:
- Use the existing ISO 19139 XML schema and namespace (http://www.isotc211.org/2005/gmd). MEDIN's [MedinMetadataProfile_v3.1.2.xsd][medin-xsd] is a one-line wrapper that just xs:includes the gmd application schema and uses targetNamespace="http://www.isotc211.org/2005/gmd", i.e. it adds nothing to the schema. GEMINI ships no XSD at all.
- Are rooted at <gmd:MD_Metadata>, i.e. they share apiso's typename.
- Are enforced entirely via Schematron rules layered on top of ISO 19139:
  - [GEMINI 2.3 Schematron][gemini-sch] (CC-BY 4.0, by BGS under contract to AGI)
  - [MEDIN 3.1.2 Schematron][medin-sch] (CC-BY 4.0, by SeaZone Solutions for MEDIN)
- Stack: MEDIN explicitly maps onto and tightens GEMINI (the MEDIN repo ships a MEDIN_3.1.2_GEMINI_2.3_INSPIRE_Mapping.xlsx), which in turn tightens INSPIRE / ISO 19139.
Structurally this is similar to the optional INSPIRE extension already inside apiso, toggled by config['metadata']['inspire']['enabled'].
The architectural question
Because GEMINI, MEDIN, and apiso all share the same XML namespace and the same typename (gmd:MD_Metadata), they cannot cleanly coexist as three independent Profile subclasses without revisiting the registration logic.
Profile.__init__ does:
model['typenames'][self.typename] = self.repository
which overwrites on each load — so whichever profile is listed last in server.profiles: wins for typename-based dispatch.
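
To make the collision concrete, here is a minimal, self-contained sketch. Only the assignment into model['typenames'] is taken from the snippet above; the toy Profile class, its constructor arguments, and the repository dicts are hypothetical stand-ins, not pycsw's actual code.

```python
# Toy stand-in for pycsw's Profile base class (hypothetical, for illustration).
class Profile:
    def __init__(self, name, typename, repository, model):
        self.name = name
        self.typename = typename
        self.repository = repository
        # Same registration as quoted above: keyed by typename only.
        model['typenames'][self.typename] = self.repository


model = {'typenames': {}}

# Load order as it might be listed in server.profiles.
for profile_name in ('apiso', 'ukgemini', 'medin'):
    Profile(profile_name, 'gmd:MD_Metadata', {'profile': profile_name}, model)

# Three profiles were loaded, but only one typename entry survives.
print(model['typenames'])
```

Because all three profiles register under gmd:MD_Metadata, the dict ends up with a single entry owned by the last profile loaded.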
CSW clients differentiate profiles by outputSchema, not typename. The practical question is therefore: what URI does each UK profile advertise as its outputSchema?
- ISO 19139 / apiso uses http://www.isotc211.org/2005/gmd.
- GEMINI does not formally define one. There's prior art in some UK catalogues using https://www.agi.org.uk/gemini/2.3 or similar; happy to coordinate with AGI for a blessed URI if useful.
- MEDIN similarly does not define one; https://medin.org.uk/discovery-metadata/3.1.2 would be a candidate.
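
As a rough illustration of outputSchema-based dispatch, the lookup could be as small as the sketch below. The GEMINI and MEDIN URIs are the unofficial candidates listed above, and the mapping and helper name are hypothetical rather than existing pycsw structures.

```python
# Candidate outputSchema URIs mapped to profile names. The two UK URIs are
# placeholders from the discussion above, not URIs blessed by AGI or MEDIN.
OUTPUT_SCHEMAS = {
    'http://www.isotc211.org/2005/gmd': 'apiso',
    'https://www.agi.org.uk/gemini/2.3': 'ukgemini',
    'https://medin.org.uk/discovery-metadata/3.1.2': 'medin',
}


def profile_for_output_schema(uri, default=None):
    """Return the profile name advertising this outputSchema, if any."""
    return OUTPUT_SCHEMAS.get(uri, default)


print(profile_for_output_schema('https://www.agi.org.uk/gemini/2.3'))
```

Unlike the typename key, each URI is unique per profile, so nothing is overwritten regardless of load order.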
Design options
I see three viable shapes:
A. Two independent sibling profiles (ukgemini, medin), each mirroring iso19115p3.py's structure (~600 lines each).
- Pros: lowest risk to existing code; reviewable independently; matches the existing precedent of how iso19115p3 was added.
- Cons: substantial duplication of apiso's queryables, since 90%+ of the gmd XPath mappings are identical across all three. Doesn't solve the typename collision.
B. A lightweight intermediate base (e.g. iso19139_constrained) that apiso, ukgemini, and medin can compose from. The differences (outputSchema URI, additional queryables, schematron file(s), extended-capabilities content) become subclass overrides.
- Pros: removes duplication; gives a clean home for future ISO-19139 national profiles (Spain NEM, Germany GDI-DE, Australia ANZLIC, etc.).
- Cons: meaningful change to apiso internals; needs careful migration to avoid breaking existing apiso deployments.
C. A schematron-validation extension inside apiso, modelled on the existing INSPIRE block. Configurable via something like config['metadata']['profiles']['ukgemini'] / ['medin'] toggles that load the relevant .sch files for transactional validation but reuse apiso's outputSchema and queryables.
- Pros: smallest, least invasive PR.
- Cons: doesn't surface GEMINI/MEDIN as discoverable profiles in GetCapabilities; reduces value for CSW clients that negotiate by outputSchema.
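
For option C, the toggle handling might look something like this sketch. The 'enabled' sub-key is assumed by analogy with the INSPIRE toggle, and the .sch paths and the enabled_schematrons helper are hypothetical.

```python
# Hypothetical map of profile name -> Schematron files it needs. MEDIN lists
# the GEMINI rules too, because it tightens GEMINI.
AVAILABLE = {
    'ukgemini': ['ukgemini/schemas/gemini.sch'],
    'medin': ['ukgemini/schemas/gemini.sch', 'medin/schemas/medin.sch'],
}


def enabled_schematrons(config, available):
    """Return the .sch paths for every profile toggled on in the config."""
    toggles = config.get('metadata', {}).get('profiles', {})
    paths = []
    for name, files in available.items():
        if toggles.get(name, {}).get('enabled', False):
            paths.extend(files)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(paths))


config = {'metadata': {'profiles': {'ukgemini': {'enabled': True},
                                    'medin': {'enabled': True}}}}
print(enabled_schematrons(config, AVAILABLE))
```

With both toggles on, the shared GEMINI rules are loaded once, followed by the MEDIN-specific rules.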
I'd lean toward B done lightweight, staged across three PRs:
- Refactor: introduce the small base / extension mechanism with apiso migrated to it as a no-op (existing apiso behaviour preserved, all existing apiso tests still pass).
- Add ukgemini as the first new consumer, with Schematron validation and the AGI sample records as test fixtures.
- Add medin, sharing the GEMINI machinery and adding MEDIN-specific Schematron rules plus a small set of MEDIN-specific queryables (notably vertical extent, which MEDIN makes mandatory).
But I'd very much welcome a steer before committing to a shape — happy to do A or C instead if that's preferred.
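
To give a feel for what "B done lightweight" could mean, here is a rough sketch of the class shape. Every class, attribute, and method name is hypothetical, as are the .sch file names and the medin:VerticalExtentMinimum queryable; the XPaths are illustrative ISO 19139 paths and the UK outputSchema URIs are the unofficial candidates discussed earlier.

```python
class Iso19139Constrained:
    """Hypothetical shared base for ISO 19139 constraint-based profiles."""
    typename = 'gmd:MD_Metadata'
    outputschema = 'http://www.isotc211.org/2005/gmd'
    schematron_files = ()
    queryables = {
        'apiso:Title': 'gmd:identificationInfo/gmd:MD_DataIdentification/'
                       'gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString',
    }

    @classmethod
    def all_queryables(cls):
        """Merge queryables from the base class down to the most specific one."""
        merged = {}
        for klass in reversed(cls.__mro__):
            merged.update(vars(klass).get('queryables', {}))
        return merged

    @classmethod
    def all_schematron_files(cls):
        """Collect Schematron files in inheritance order (base rules first)."""
        files = []
        for klass in reversed(cls.__mro__):
            files.extend(vars(klass).get('schematron_files', ()))
        return tuple(files)


class Apiso(Iso19139Constrained):
    """No-op migration: inherits everything unchanged."""


class UkGemini(Iso19139Constrained):
    outputschema = 'https://www.agi.org.uk/gemini/2.3'
    schematron_files = ('gemini.sch',)


class Medin(UkGemini):
    outputschema = 'https://medin.org.uk/discovery-metadata/3.1.2'
    schematron_files = ('medin.sch',)
    queryables = {
        'medin:VerticalExtentMinimum':
            'gmd:identificationInfo/gmd:MD_DataIdentification/gmd:extent/gmd:EX_Extent/'
            'gmd:verticalElement/gmd:EX_VerticalExtent/gmd:minimumValue/gco:Real',
    }


print(Medin.all_schematron_files())
print(sorted(Medin.all_queryables()))
```

Here Medin inherits GEMINI's rules and adds its own, while Apiso stays identical to the base, which is the property the first refactoring PR would need to preserve.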
Specific questions for maintainers
- Preferred design option (A, B, C, or something else)?
- Synthetic outputSchema URIs: is there an established convention in pycsw for profiles whose source standard doesn't formally define one?
- Schematron at runtime: pycsw already depends on lxml, which provides etree.Schematron. Acceptable to validate via lxml on transactional inserts / harvest, or is there a preferred validation hook?
- Bundled Schematron files: both upstream Schematrons are CC-BY 4.0, compatible with MIT redistribution provided attribution is preserved. Acceptable to bundle them under pycsw/plugins/profiles/<name>/schemas/, or should they be optional downloads at deploy time? My preference would be to bundle, with a NOTICE file recording attribution.
- write_record synthesis: should the GEMINI/MEDIN profiles re-synthesize records from queryables for non-full esn (as apiso does), or only echo stored XML? GEMINI/MEDIN have constraints on gmd:metadataStandardName / gmd:metadataStandardVersion that synthesis would need to respect, and a few mandatory elements (e.g. MEDIN vertical extent) that don't currently live in pycsw's core mappings.
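
On the write_record question, one way to guard synthesized output is a small post-check like the sketch below. It uses only the standard library, assumes the values are encoded as gco:CharacterString, and takes the expected name and version as parameters; the sample record and its "Example Standard" / "1.0" values are placeholders, not the strings GEMINI or MEDIN actually require.

```python
import xml.etree.ElementTree as ET

NS = {
    'gmd': 'http://www.isotc211.org/2005/gmd',
    'gco': 'http://www.isotc211.org/2005/gco',
}


def standard_matches(record_xml, expected_name, expected_version):
    """Check a record's metadataStandardName/Version against expected values."""
    root = ET.fromstring(record_xml)
    name = root.findtext(
        'gmd:metadataStandardName/gco:CharacterString', namespaces=NS)
    version = root.findtext(
        'gmd:metadataStandardVersion/gco:CharacterString', namespaces=NS)
    # Missing elements come back as None and are treated as a mismatch.
    return ((name or '').strip() == expected_name
            and (version or '').strip() == expected_version)


# Placeholder record; the values are illustrative only.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:metadataStandardName>
    <gco:CharacterString>Example Standard</gco:CharacterString>
  </gmd:metadataStandardName>
  <gmd:metadataStandardVersion>
    <gco:CharacterString>1.0</gco:CharacterString>
  </gmd:metadataStandardVersion>
</gmd:MD_Metadata>"""

print(standard_matches(SAMPLE, 'Example Standard', '1.0'))
```

A check of this kind would only cover the two standard-name elements; the full constraint set would still need the Schematron rules.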
Scope of this issue
This issue is to agree the shape only. Once there's a steer on the questions above I'll open the implementing PR(s) and link them back here.
References
pycsw
- apiso: https://github.com/geopython/pycsw/blob/master/pycsw/plugins/profiles/apiso/apiso.py
- iso19115p3: https://github.com/geopython/pycsw/blob/master/pycsw/plugins/profiles/iso19115p3/iso19115p3.py
UK GEMINI
MEDIN
- https://github.com/medin-marine/Discovery-Standard-public-content
[agi]: https://www.agi.org.uk/uk-gemini/
[medin]: https://medin.org.uk/data-standards/medin-discovery-metadata-standard
[ccby]: https://creativecommons.org/licenses/by/4.0/
[gemini-sch]: https://github.com/agiorguk/gemini-schematron
[medin-sch]: https://github.com/medin-marine/Discovery-Standard-public-content/tree/main/medin_schematron
[medin-xsd]: https://github.com/medin-marine/Discovery-Standard-public-content/blob/main/medin_xsd/MedinMetadataProfile_v3.1.2.xsd