Add support for ingesting an HCX (hierarchical CX2) network, resolve the chromosome for every gene referenced in the hierarchy, and annotate each hierarchy node with a tally of chromosome counts for all genes contained in that node’s subtree. Emit the updated hierarchy as CX2 wrapped in an updateNetwork action.
- `chromloc/annotate.py` is being added to replace the former `updatenetwork` demo logic.
- `cli.py` will expose HCX-specific options (interaction UUID, gene attributes, chromosome map path).
- Hierarchy handling, NDEx fetching, chromosome tallies, and styling are implemented per design.
- HCX hierarchy nodes carry `HCX::members`, which lists node IDs belonging to a separate “interaction network”.
- The NDEx UUID of that interaction network is read in priority order: node attribute `HCX::interactionNetworkUUID`, else network attribute `HCX::interactionNetworkUUID`.
- Gene identifiers come from those member node records in the interaction network (preferred attribute order: `represents`, then `name`, then a configurable fallback).
- Hierarchy structure follows the HCX spec (parent/child via the hierarchy aspect). We treat leaves as biological entities (genes) and internal nodes as groupings.
- Species defaults to human (GRCh38); chromosome set assumed human (chr1–chr22, chrX, chrY, chrM). Keep option to override species for future-proofing, but default behavior is human.
- Chromosome resolution should work offline when given a local annotation file, and online (optional) via MyGene.info or Ensembl REST when permitted.
- `--interaction-gene-attrs`: comma-separated list of candidate attributes on the interaction network nodes (default `represents,name`).
- `--interaction-uuid-attr`: attribute name used to read the NDEx UUID when present on hierarchy nodes (default `HCX::interactionNetworkUUID`).
- `--interaction-uuid-network-attr`: network-level fallback attribute for the UUID (default: same as above).
- `--gene-list-delim`: delimiter used when a single string holds multiple genes (default `,`).
- `--species`: organism code (default `human` / GRCh38; controls the chromosome list used for per-chromosome attributes).
- `--chrom-map`: path to a local TSV/JSON gene→chromosome map (defaults to the packaged JSON built by `build_gene_chr_map.py`).
- `--cache-file`: optional JSON cache of resolved gene → chromosome mappings (primarily speeds up repeated runs).
- `--ndex-server` / `--ndex-username` / `--ndex-password`, or a token environment variable, for fetching interaction networks when they are not embedded.
- `--progress`: keep the existing progress messages and extend them with additional checkpoints.
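A minimal `argparse` sketch of the option surface listed above. It assumes `cli.py` uses `argparse`. The defaults mirror the list, and `--ndex-server` defaults to `None` as a placeholder. `build_parser` and `gene_attrs` are hypothetical helper names. The sketch adds the two escape-hatch flags (`--allow-missing-uuid`, `--allow-nonhierarchy`) that the error-handling rules below rely on, and it omits the existing positional/input arguments.

```python
import argparse

UUID_ATTR = "HCX::interactionNetworkUUID"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of the HCX-specific options for cli.py."""
    parser = argparse.ArgumentParser(prog="cytoscape-chromloc")
    parser.add_argument("--interaction-gene-attrs", default="represents,name",
                        help="comma-separated candidate gene attributes on interaction nodes")
    parser.add_argument("--interaction-uuid-attr", default=UUID_ATTR)
    parser.add_argument("--interaction-uuid-network-attr", default=UUID_ATTR)
    parser.add_argument("--gene-list-delim", default=",")
    parser.add_argument("--species", default="human")
    parser.add_argument("--chrom-map", default=None,
                        help="gene->chromosome TSV/JSON; None means the packaged map")
    parser.add_argument("--cache-file", default=None)
    parser.add_argument("--ndex-server", default=None)  # placeholder default
    parser.add_argument("--ndex-username", default=None)
    parser.add_argument("--ndex-password", default=None)
    parser.add_argument("--progress", action="store_true")
    parser.add_argument("--allow-missing-uuid", action="store_true")
    parser.add_argument("--allow-nonhierarchy", action="store_true")
    return parser


def gene_attrs(args: argparse.Namespace) -> list:
    """Split the comma list, trimming whitespace and dropping empty entries."""
    return [a.strip() for a in args.interaction_gene_attrs.split(",") if a.strip()]
```

Keeping the gene attributes as one comma string on the command line and splitting it in a helper keeps the flag easy to type while giving the resolver an ordered preference list.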
- Implement a single resolver: `LocalResolver` loads a gene→chromosome map from a TSV file (`symbol\tchr`) or a JSON dict (the default path, generated by `scripts/build_gene_chr_map.py`).
- Add normalization: uppercase symbols, strip version suffixes, and handle aliases via an optional HGNC alias file when one is provided.
- A caching layer reads/writes JSON to minimize repeat lookups within a run (mainly to avoid re-reading large maps).
- NDEx fetcher utility: a small wrapper around `ndex2.client.Ndex2` to pull interaction networks by UUID; it respects the server/auth flags and optionally caches results on disk to avoid repeated downloads.
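A sketch of `LocalResolver` with the normalization rules described above. It makes two assumptions. First, a "version suffix" is a trailing `.<digits>`. Second, the alias table arrives as a plain `alias -> symbol` dict, for example one parsed from an HGNC export. `normalize_symbol` is a hypothetical helper name. The TSV reader tolerates an optional `symbol\tchromosome` header line.

```python
import csv
import json
import re
from pathlib import Path
from typing import Optional

# Assumption: a version suffix is a trailing ".<digits>" (e.g. "ENSG00000141510.17").
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def normalize_symbol(symbol: str) -> str:
    """Trim, uppercase, and strip a trailing version suffix."""
    return _VERSION_SUFFIX.sub("", symbol.strip().upper())


class LocalResolver:
    """Gene -> chromosome lookup backed by a TSV (symbol<TAB>chr) or a JSON dict."""

    def __init__(self, path, aliases: Optional[dict] = None):
        self._map = {normalize_symbol(k): str(v) for k, v in self._load(Path(path)).items()}
        # Optional alias -> official symbol table, e.g. parsed from an HGNC export.
        self._aliases = {normalize_symbol(k): normalize_symbol(v)
                         for k, v in (aliases or {}).items()}

    @staticmethod
    def _load(path: Path) -> dict:
        if path.suffix.lower() == ".json":
            with path.open(encoding="utf-8") as fh:
                return json.load(fh)
        with path.open(encoding="utf-8", newline="") as fh:
            rows = csv.reader(fh, delimiter="\t")
            # Skip short rows and an optional "symbol<TAB>chromosome" header line.
            return {r[0]: r[1] for r in rows if len(r) >= 2 and r[0].lower() != "symbol"}

    def resolve(self, symbol: str) -> Optional[str]:
        """Return the chromosome for a symbol (or a known alias), else None."""
        key = normalize_symbol(symbol)
        if key not in self._map:
            key = self._aliases.get(key, key)
        return self._map.get(key)
```

Normalizing both the map keys and the query means `tp53`, `TP53`, and `TP53.1` all hit the same entry. Aliases are consulted only when the direct lookup misses.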
- Load the hierarchy network with `RawCX2NetworkFactory`.
- Determine the interaction network UUID for each hierarchy node:
  - If the node has `HCX::interactionNetworkUUID`, use it; otherwise use the network attribute `HCX::interactionNetworkUUID`.
  - If no UUID is found, raise an error (unless `--allow-missing-uuid` is set).
- Fetch interaction network (CX/HCX) from NDEx using the UUID (respect server/auth CLI options). Cache the fetched network by UUID to avoid repeat downloads.
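A sketch of the UUID lookup order and a per-UUID memo for downloads. `interaction_uuid` and `CachedFetcher` are hypothetical names. `fetch` is injected as any callable that maps a UUID to parsed CX2, which keeps the logic testable without a server. With the `ndex2` client, that callable might wrap `Ndex2.get_network_as_cx2_stream(uuid).json()` in recent releases, but check this against the installed `ndex2` version. This cache is in memory only; the optional on-disk cache would wrap the same idea.

```python
from typing import Callable, Optional

UUID_ATTR = "HCX::interactionNetworkUUID"


def interaction_uuid(node_attrs: dict, network_attrs: dict,
                     node_attr: str = UUID_ATTR, network_attr: str = UUID_ATTR,
                     allow_missing: bool = False) -> Optional[str]:
    """Node-level attribute wins; the network-level attribute is the fallback."""
    uuid = node_attrs.get(node_attr) or network_attrs.get(network_attr)
    if not uuid:
        if allow_missing:  # mirrors --allow-missing-uuid
            return None
        raise ValueError(f"{node_attr} missing at both node and network level")
    return uuid


class CachedFetcher:
    """In-memory memo: each UUID is downloaded at most once per run."""

    def __init__(self, fetch: Callable[[str], list]):
        self._fetch = fetch  # any callable UUID -> parsed CX2 (list of aspects)
        self._cache: dict = {}

    def get(self, uuid: str) -> list:
        if uuid not in self._cache:
            self._cache[uuid] = self._fetch(uuid)
        return self._cache[uuid]
```

Because most HCX hierarchies point every node at the same interaction network, the memo usually reduces fetching to a single download.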
- Index interaction nodes → genes:
  - Build a map `interaction_node_id -> gene_id` using the first non-empty attribute from `--interaction-gene-attrs`.
  - Support `--gene-list-delim` when the attribute is a string containing multiple symbols.
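A sketch of the interaction-node index. It assumes CX2 node records of the form `{"id": ..., "v": {attr: value}}`. An attribute may hold either a list or a delimited string. `index_interaction_genes` is a hypothetical name. Values are taken verbatim, so a `represents` value carrying a namespace prefix would need extra stripping that is not shown here.

```python
def index_interaction_genes(nodes: list, gene_attrs: list, delim: str = ","):
    """Map interaction node id -> gene symbols; the first non-empty attribute wins."""
    index, missing_attr = {}, 0
    for node in nodes:
        values = node.get("v", {})
        raw = next((values[a] for a in gene_attrs if values.get(a)), None)
        if raw is None:
            missing_attr += 1  # no usable gene attribute on this node
            continue
        if isinstance(raw, list):
            genes = [str(g).strip() for g in raw]
        else:
            genes = [g.strip() for g in str(raw).split(delim)]
        index[node["id"]] = [g for g in genes if g]
    return index, missing_attr
```

Returning the count of nodes without a usable attribute lets the caller report it alongside the other warning counters.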
- Extract hierarchy structure: read HCX hierarchy aspect to build parent→children adjacency and find root(s). Validate that metaData lists hierarchy elements.
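A sketch of building the parent→children adjacency and finding roots. It assumes the hierarchy is encoded as CX2-style edges `{"s": parent_id, "t": child_id}`. If the HCX file uses the opposite direction, flip `s` and `t`. `build_hierarchy` is a hypothetical name. A root is any node that never appears as an edge target.

```python
from collections import defaultdict


def build_hierarchy(node_ids, edges):
    """Return (parent -> children adjacency, sorted roots).

    Assumes each edge is {"s": parent_id, "t": child_id}.
    """
    children = defaultdict(list)
    has_parent = set()
    for edge in edges:
        children[edge["s"]].append(edge["t"])
        has_parent.add(edge["t"])
    roots = sorted(n for n in node_ids if n not in has_parent)
    if not roots:
        raise ValueError("no root found: hierarchy is empty or every node has a parent")
    return dict(children), roots
```

If no root exists in a non-empty hierarchy, every node has a parent, which implies a cycle. That case gets the same fail-fast treatment as a missing hierarchy aspect.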
- Collect node genes via `HCX::members`:
  - For each hierarchy node, read `HCX::members` (a list of interaction node IDs). Translate each ID to gene(s) via the interaction map. Drop missing IDs and increment a warning counter.
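A sketch of translating `HCX::members` into genes. It assumes the member IDs share the same type (for example `int`) as the keys of the interaction index; string-typed IDs would need converting first. `member_genes` is a hypothetical name, and the index is assumed to map node IDs to lists of symbols.

```python
def member_genes(hierarchy_nodes: list, interaction_index: dict):
    """Translate each hierarchy node's HCX::members into gene symbols."""
    direct, missing_ids = {}, 0
    for node in hierarchy_nodes:
        genes = []
        for member_id in node.get("v", {}).get("HCX::members") or []:
            if member_id in interaction_index:
                genes.extend(interaction_index[member_id])
            else:
                missing_ids += 1  # reported later as a warning count
        direct[node["id"]] = genes
    return direct, missing_ids
```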
- Post-order aggregation:
  - Traverse the hierarchy bottom-up, accumulating each child's genes into its parent. Use a multiset (counts) rather than a plain union so multiplicity is retained; store both a set for uniqueness and counts for frequency.
  - Store per node: `gene_ids` (set), `gene_counts` (`Counter`), and `chrom_counts` (dict of chr → count).
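A sketch of the single-pass post-order aggregation using an explicit stack, which avoids Python's recursion limit on deep trees. `aggregate` is a hypothetical name. The sketch assumes `children` maps parent → child IDs and `direct_genes` maps node → its own gene list. A node shared by two parents (DAG) is aggregated once, and a cycle raises an error.

```python
from collections import Counter


def aggregate(roots, children: dict, direct_genes: dict):
    """Post-order accumulation of gene multisets from descendants into ancestors."""
    counts: dict = {}
    entered: set = set()
    for root in roots:
        stack = [(root, False)]
        while stack:
            node, expanded = stack.pop()
            if node in counts:
                continue  # shared child in a DAG: already aggregated once
            if not expanded:
                if node in entered:
                    raise ValueError(f"cycle detected at hierarchy node {node}")
                entered.add(node)
                stack.append((node, True))  # revisit after all children finish
                stack.extend((c, False) for c in children.get(node, []) if c not in counts)
                continue
            total = Counter(direct_genes.get(node, []))
            for child in children.get(node, []):
                total.update(counts[child])  # multiset union keeps multiplicity
            counts[node] = total
    gene_ids = {node: set(c) for node, c in counts.items()}
    return gene_ids, counts
```

In a DAG, a gene reachable through two children is counted twice in `gene_counts` but appears once in `gene_ids`. This is the set/frequency split the plan asks for.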
- Chromosome resolution:
  - For every unique gene encountered across the hierarchy, resolve its chromosome once via the resolver + cache.
  - Treat unknowns as `chrUnknown` and track them separately to avoid data loss.
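A sketch of the resolve-once table. `resolve` can be any callable that maps a symbol to a chromosome string or `None`, such as a resolver's lookup method. `resolve_all` is a hypothetical name. The normalization to `chr`-prefixed names, including `MT` → `chrM`, is an assumption chosen to match the attribute names used for the tallies.

```python
from typing import Callable, Iterable, Optional


def resolve_all(genes: Iterable[str], resolve: Callable[[str], Optional[str]]):
    """Resolve each unique gene exactly once; failures become 'chrUnknown'."""
    table = {}
    for gene in set(genes):
        chrom = resolve(gene)
        if not chrom:
            table[gene] = "chrUnknown"
            continue
        # Normalize "17" / "chr17" / "MT" to the chr-prefixed names used for attributes.
        chrom = str(chrom).removeprefix("chr")
        table[gene] = "chrM" if chrom == "MT" else f"chr{chrom}"
    unresolved = sorted(g for g, c in table.items() if c == "chrUnknown")
    return table, unresolved
```

The sorted `unresolved` list gives both the count for `chromosomeMappingUnresolved` and a stable list for progress messages.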
- Tally building:
  - For each node, translate its `gene_counts` to chromosome counts using the shared resolution table.
  - Representation (CX2-friendly):
    - Per-chromosome numeric node attributes: `chr1_count`, `chr2_count`, …, `chr22_count`, `chrX_count`, `chrY_count`, `chrM_count` (integers, 0 if none). If `--species` overrides the human default, generate the chromosome list from the resolver's metadata instead.
    - Summary attribute `chromosomeCounts` (`list_of_string`), e.g., `chr1=12`, `chrX=3`, `chrUnknown=1`.
    - Optional `chromosomeCountsJson` (`string`) with a JSON-encoded map for consumers that prefer structured data.
  - Update `attributeDeclarations` to declare all new per-chromosome attributes plus the summary attributes.
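A sketch of turning one node's `gene_counts` into the attribute values described above, plus the matching declarations. `HUMAN_CHROMS`, `chromosome_attributes`, and `node_declarations` are hypothetical names. The declaration shape `{"d": <type>}` reflects the CX2 `attributeDeclarations` convention as understood here; verify it against the CX2 specification.

```python
import json
from collections import Counter

# Default human chromosome list: chr1-chr22, chrX, chrY, chrM.
HUMAN_CHROMS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def chromosome_attributes(gene_counts: Counter, table: dict, chroms=HUMAN_CHROMS) -> dict:
    """Per-node values: chr*_count integers, a list_of_string summary, and a JSON string."""
    chrom_counts = Counter()
    for gene, n in gene_counts.items():
        chrom_counts[table.get(gene, "chrUnknown")] += n
    attrs = {f"{c}_count": chrom_counts.get(c, 0) for c in chroms}
    # Summary keeps every non-zero bucket in a fixed order, with chrUnknown and
    # any off-list chromosomes appended after the standard ones.
    extra = sorted(set(chrom_counts) - set(chroms))
    ordered = [c for c in chroms + extra if chrom_counts.get(c)]
    attrs["chromosomeCounts"] = [f"{c}={chrom_counts[c]}" for c in ordered]
    attrs["chromosomeCountsJson"] = json.dumps({c: chrom_counts[c] for c in ordered})
    return attrs


def node_declarations(chroms=HUMAN_CHROMS) -> dict:
    """attributeDeclarations entry covering the new node attributes."""
    decl = {f"{c}_count": {"d": "integer"} for c in chroms}
    decl["chromosomeCounts"] = {"d": "list_of_string"}
    decl["chromosomeCountsJson"] = {"d": "string"}
    return {"nodes": decl}
```

Emitting every `chr*_count` key, even when it is zero, keeps the pie-chart mapping stable across nodes.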
- Emit updated CX2 preserving all existing aspects; only add/modify node attributes and metaData counts.
- Warn (progress message) when the hierarchy aspect is missing or malformed; fail fast unless `--allow-nonhierarchy` is passed.
- Error when `HCX::interactionNetworkUUID` is missing at both node and network level, unless `--allow-missing-uuid` is set.
- Warn when `HCX::members` references an interaction node ID not found in the fetched network; count and report these.
- Warn when a member node lacks any usable gene attribute; count unresolved genes separately.
- Record the number of unresolved genes in a network attribute, `chromosomeMappingUnresolved`, for transparency.
- Validate that `attributeDeclarations` include the new attributes; adjust metaData `elementCount` where needed.
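A sketch of the final CX2 bookkeeping. It assumes the parsed document is a list of single-aspect dicts, preceded by a `CXVersion` header and followed by a trailing `status` aspect, and that `metaData` entries look like `{"name": ..., "elementCount": ...}`. `finalize_cx2` and `_aspect` are hypothetical names. `node_decls` is a mapping of attribute name → declaration. Missing aspects are created just before `status`.

```python
def _aspect(cx2: list, name: str) -> list:
    """Return an aspect's element list, creating it just before 'status' if absent."""
    for entry in cx2:
        if name in entry:
            return entry[name]
    pos = len(cx2) - 1 if cx2 and "status" in cx2[-1] else len(cx2)
    cx2.insert(pos, {name: []})
    return cx2[pos][name]


def finalize_cx2(cx2: list, node_decls: dict, unresolved: int) -> None:
    """Declare new attributes, record unresolved genes, and resync metaData counts."""
    if not any("metaData" in entry for entry in cx2):
        raise ValueError("CX2 document has no metaData aspect")
    decls = _aspect(cx2, "attributeDeclarations")
    if not decls:
        decls.append({})
    decls[0].setdefault("nodes", {}).update(node_decls)
    decls[0].setdefault("networkAttributes", {})["chromosomeMappingUnresolved"] = {"d": "integer"}

    net = _aspect(cx2, "networkAttributes")
    if not net:
        net.append({})
    net[0]["chromosomeMappingUnresolved"] = unresolved

    # Every list-valued aspect except metaData/status gets an up-to-date elementCount.
    meta = _aspect(cx2, "metaData")
    by_name = {m["name"]: m for m in meta}
    for entry in cx2:
        for name, elements in entry.items():
            if name in ("metaData", "status") or not isinstance(elements, list):
                continue
            if name not in by_name:
                by_name[name] = {"name": name}
                meta.append(by_name[name])
            by_name[name]["elementCount"] = len(elements)
```

Recomputing every count from the aspects actually present is simpler than patching counts incrementally, and it also covers any aspect this step creates.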
- Unit tests (pytest) for:
  - Extraction of the NDEx UUID from node vs. network attributes.
  - Mapping `HCX::members` IDs to genes given an interaction network fixture.
  - Gene extraction from nodes (single vs. list attribute, delimiter handling).
  - Resolver behavior with a local map (including alias handling and normalization).
  - Hierarchy aggregation (post-order) on a small synthetic HCX fixture.
  - CX2 mutation: ensure node attributes are added and declared, and metaData is updated.
- Integration smoke test: run the CLI on `foo.cx2` (non-hierarchical), expecting a clear error message.
- Rename the Python package directory from `updatenetwork` to `chromloc` (or a similarly concise, descriptive name aligned with the repo).
- Rename `update.py` → `annotate.py` (core logic) and keep `cli.py` under the same name with updated imports, to reflect the chromosome-location annotation purpose.
- Update `pyproject.toml` entry points and any imports to the new package/module names.
- Keep the console script name user-facing, e.g., `cytoscape-chromloc`.
- Add a resolver module (`chromloc/resolvers.py`) with Local/MyGene/Ensembl implementations plus a cache wrapper.
- Add an NDEx fetcher helper (`chromloc/ndex_fetch.py`) to download and cache interaction networks by UUID.
- Extend `annotate.py` to:
  - Parse CLI options (passed through `run_update`).
  - Build the hierarchy graph, aggregate genes, resolve chromosomes, and annotate nodes.
  - Keep the existing progress messaging pattern.
  - Generate per-chromosome count attributes for the human chromosome list by default (extendable to other species).
- Update `cli.py` for the new arguments and pass them into `run_update` (now imported from `chromloc.annotate`).
- Update `requirements.txt` to include the `ndex2` client (and `requests` if the fetcher needs it explicitly).
- Write a pytest suite under `tests/` with fixtures for hierarchy CX2 snippets.
- Document usage in `README.rst` (new options, expected outputs, example command).
- Add a small helper script `scripts/build_gene_chr_map.py` that:
  - Downloads/reads a public gene annotation source (e.g., NCBI gene info or an Ensembl BioMart export) for human.
  - Produces a UTF-8 TSV (`symbol\tchromosome`) and a JSON dictionary (`{symbol: chromosome}`).
  - Normalizes symbols to uppercase, strips version suffixes, and skips non-standard chromosomes unless `--include-alt` is passed.
  - Accepts `--input` (path or URL), `--output-tsv`, `--output-json`, `--species` (default human), and `--allow-ambiguous` (keep the first mapping when multiple chromosomes exist).
  - Is accompanied by a brief README section on how to refresh the dataset and the recommended refresh cadence.
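A sketch of the script's core. It is source-agnostic: it takes `(symbol, chromosome)` pairs, so any annotation source can feed it. `build_map`, `read_ncbi_gene_info`, and `write_outputs` are hypothetical names. The sketch makes three assumptions. The NCBI `gene_info` reader expects tab-separated columns named `Symbol` and `chromosome`. Multi-valued entries such as `X|Y` count as non-standard. Without `--allow-ambiguous`, a symbol that maps to two chromosomes raises an error; silently dropping the symbol would be an equally valid reading of the plan.

```python
import csv
import json
import re

STANDARD = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}
_VERSION = re.compile(r"\.\d+$")


def build_map(pairs, include_alt=False, allow_ambiguous=False):
    """pairs: iterable of (symbol, chromosome) strings from any annotation source."""
    mapping = {}
    for symbol, chrom in pairs:
        symbol = _VERSION.sub("", symbol.strip().upper())
        chrom = chrom.strip().removeprefix("chr")
        chrom = "MT" if chrom == "M" else chrom
        if not symbol or (chrom not in STANDARD and not include_alt):
            continue  # skip non-standard chromosomes unless --include-alt
        if symbol in mapping and mapping[symbol] != chrom:
            if not allow_ambiguous:
                raise ValueError(f"{symbol} maps to {mapping[symbol]} and {chrom}")
            continue  # --allow-ambiguous: keep the first mapping
        mapping[symbol] = chrom
    return mapping


def read_ncbi_gene_info(path):
    """Assumed gene_info layout: tab-separated with 'Symbol' and 'chromosome' columns."""
    with open(path, encoding="utf-8", newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            yield row["Symbol"], row["chromosome"]


def write_outputs(mapping, tsv_path, json_path):
    """Write the UTF-8 TSV (with header) and the JSON dict, both sorted by symbol."""
    with open(tsv_path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["symbol", "chromosome"])
        writer.writerows(sorted(mapping.items()))
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(mapping, fh, indent=2, sort_keys=True)
```

Separating parsing from map building keeps the normalization and ambiguity rules testable without downloading a full annotation file.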
- Add/modify the `visualProperties` aspect to define a node fill pie chart using the per-chromosome count attributes.
  - Use Cytoscape pie-chart custom graphics: map `chr1_count`…`chr22_count`, `chrX_count`, `chrY_count`, `chrM_count` to pie slices in a fixed order for consistent color mapping.
  - Define a discrete palette (e.g., a ColorBrewer qualitative set) and store it in the style properties.
  - Ensure the style references attributes by name; if attributes are missing or zero, their slices render with zero size.
- If an existing style is present, append a new style entry (do not overwrite). Set a network attribute flag, `chromosomePieStyleApplied=true`, to avoid reapplying.
- Keep sizing/opacity unchanged; only set the node fill (custom graphics) to the pie chart. Text/labels remain as-is.
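A sketch of the pie-chart value and the append-once rule. It assumes the Cytoscape desktop chart serialization `org.cytoscape.PieChart:` followed by JSON with `cy_dataColumns` and `cy_colors` keys, stored under `NODE_CUSTOMGRAPHICS_1`. Confirm both against a style exported from Cytoscape before relying on them. The style-entry shape is hypothetical. The palette uses evenly spaced hues from `colorsys` as a stand-in for a ColorBrewer set, because 25 slices exceed the size of most qualitative sets. `palette`, `pie_chart_value`, and `apply_pie_style` are hypothetical names.

```python
import colorsys
import json

CHROMS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def palette(n: int) -> list:
    """n distinct hex colors from evenly spaced hues (stand-in for a ColorBrewer set)."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.55, 0.9)
        colors.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colors


def pie_chart_value(chroms=CHROMS) -> str:
    """Assumed Cytoscape chart serialization: 'org.cytoscape.PieChart:' + JSON."""
    spec = {
        "cy_dataColumns": [f"{c}_count" for c in chroms],  # fixed order -> stable colors
        "cy_colors": palette(len(chroms)),
    }
    return "org.cytoscape.PieChart:" + json.dumps(spec)


def apply_pie_style(network_attrs: dict, styles: list) -> bool:
    """Append (never overwrite) a style entry; the network flag prevents reapplying."""
    if network_attrs.get("chromosomePieStyleApplied"):
        return False
    # Hypothetical entry shape; map it onto the real visualProperties layout.
    styles.append({"NODE_CUSTOMGRAPHICS_1": pie_chart_value()})
    network_attrs["chromosomePieStyleApplied"] = True
    return True
```

Deriving colors from the position in a fixed chromosome list means `chr7` gets the same color in every network this tool annotates.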
- Chromosome map staleness: provide a documented script to refresh the local map and recommend a refresh cadence.
- Hierarchy variability: handle missing aspects with a clear error; make attribute names configurable.
- CX2 type constraints: use `list_of_string` plus a JSON string to stay within supported types.
- Performance on large trees: single-pass post-order traversal with memoized resolutions; avoid repeated NDEx fetches via caching.