Goal

Add support for ingesting an HCX (hierarchical CX2) network, resolve the chromosome for every gene referenced in the hierarchy, and annotate each hierarchy node with a tally of chromosome counts for all genes contained in that node’s subtree. Emit the updated hierarchy as CX2 wrapped in an updateNetwork action.

Current State (quick read)

chromloc/annotate.py is being added to replace the former updatenetwork demo logic.
cli.py will expose HCX-specific options (interaction UUID, gene attrs, chrom map path).
Hierarchy handling, NDEx fetching, chromosome tallies, and styling are implemented per design.

Assumptions & Open Questions

HCX hierarchy nodes carry HCX::members that list node IDs belonging to a separate “interaction network”.
The NDEx UUID of that interaction network is read in priority order: node attribute HCX::interactionNetworkUUID, else network attribute HCX::interactionNetworkUUID.
Gene identifiers come from those member node records in the interaction network (preferred attribute order: represents, then name, then configurable fallback).
Hierarchy structure follows the HCX spec (parent/child via hierarchy aspect). We treat leaves as biological entities (genes) and internal nodes as groupings.
Species defaults to human (GRCh38); chromosome set assumed human (chr1–chr22, chrX, chrY, chrM). Keep option to override species for future-proofing, but default behavior is human.
Chromosome resolution should work offline when given a local annotation file, and online (optional) via MyGene.info or Ensembl REST when permitted.

Proposed CLI/User Inputs

--interaction-gene-attrs: comma list of candidate attributes on the interaction network nodes (default represents,name).
--interaction-uuid-attr: attribute name to read NDEx UUID when present on hierarchy nodes (default HCX::interactionNetworkUUID).
--interaction-uuid-network-attr: network-level fallback attribute for UUID (default same as above).
--gene-list-delim: delimiter when a single string holds multiple genes (default ,).
--species: organism code (default human / GRCh38; controls chromosome list used for per-chromosome attributes).
--chrom-map: path to local TSV/JSON gene→chromosome map (default to packaged JSON built by build_gene_chr_map.py).
--cache-file: optional JSON cache for resolved genes → chromosome (primarily speeds repeated runs).
--ndex-server / --ndex-username / --ndex-password or token env var for fetching interaction networks when not embedded.
--progress: keep existing progress messages; extend with additional checkpoints.
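
The options above could be wired up with argparse roughly as follows. This is a minimal sketch, not the project's actual cli.py: the function name build_parser, the helper _comma_list, and the use of None as "fall back to the packaged map" are assumptions, and --progress is left out because the text does not say whether it is a flag.

```python
import argparse

HCX_UUID_ATTR = "HCX::interactionNetworkUUID"


def _comma_list(value: str) -> list:
    """Split a comma-separated option value into a list of trimmed names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering the HCX-specific options listed above."""
    parser = argparse.ArgumentParser(prog="cytoscape-chromloc")
    # argparse also applies `type` to string defaults, so the default is a list.
    parser.add_argument("--interaction-gene-attrs", type=_comma_list,
                        default="represents,name")
    parser.add_argument("--interaction-uuid-attr", default=HCX_UUID_ATTR)
    parser.add_argument("--interaction-uuid-network-attr", default=HCX_UUID_ATTR)
    parser.add_argument("--gene-list-delim", default=",")
    parser.add_argument("--species", default="human")
    # None here is an assumption meaning "use the packaged JSON map".
    parser.add_argument("--chrom-map", default=None)
    parser.add_argument("--cache-file", default=None)
    parser.add_argument("--ndex-server", default=None)
    parser.add_argument("--ndex-username", default=None)
    parser.add_argument("--ndex-password", default=None)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Passing the parsed namespace into run_update keeps cli.py thin, which matches the split between cli.py and annotate.py described elsewhere in this document.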

Data Source Strategy

Implement a single resolver:
- LocalResolver loads a gene→chromosome map from TSV (symbol\tchr) or JSON dict (the default path generated by scripts/build_gene_chr_map.py).
Add normalization: uppercase symbols, strip version suffixes, handle aliases via optional HGNC alias file when provided.
Caching layer writes/reads JSON to minimize repeat lookups within a run (mainly to avoid re-reading large maps).
NDEx fetcher utility: small wrapper around ndex2.client.Ndex2 to pull interaction networks by UUID; respects server/auth flags and caches results on disk (optional) to avoid repeated downloads.
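
A minimal sketch of the local resolver described above, assuming chromosome values in the map are already written as chr17, chrX, and so on. The method names (from_file, resolve), the in-memory memo dict, and the alias dictionary shape (alias → canonical symbol) are illustrative assumptions; the on-disk JSON cache and the NDEx fetcher are not shown.

```python
import json
from pathlib import Path
from typing import Dict, Optional

UNKNOWN = "chrUnknown"


def normalize_symbol(symbol: str) -> str:
    """Uppercase, trim, and drop a trailing numeric version suffix ('.12')."""
    s = symbol.strip().upper()
    base, dot, version = s.rpartition(".")
    if dot and base and version.isdigit():
        return base
    return s


class LocalResolver:
    """Resolve gene symbols to chromosomes from an in-memory map (sketch)."""

    def __init__(self, mapping: Dict[str, str],
                 aliases: Optional[Dict[str, str]] = None):
        self._map = {normalize_symbol(k): v for k, v in mapping.items()}
        self._aliases = {normalize_symbol(k): normalize_symbol(v)
                         for k, v in (aliases or {}).items()}
        self._memo: Dict[str, str] = {}  # per-run memoization only

    @classmethod
    def from_file(cls, path, aliases=None) -> "LocalResolver":
        """Load a JSON dict or a two-column TSV (symbol<TAB>chromosome)."""
        path = Path(path)
        text = path.read_text(encoding="utf-8")
        if path.suffix.lower() == ".json":
            return cls(json.loads(text), aliases)
        mapping = {}
        for line in text.splitlines():
            parts = line.split("\t")
            if len(parts) < 2 or line.startswith("#"):
                continue  # skip blank, malformed, or comment lines
            mapping[parts[0]] = parts[1].strip()
        return cls(mapping, aliases)

    def resolve(self, gene: str) -> str:
        """Return the chromosome for a gene, or chrUnknown if unmapped."""
        key = normalize_symbol(gene)
        if key not in self._memo:
            canonical = self._aliases.get(key, key)
            self._memo[key] = self._map.get(canonical, UNKNOWN)
        return self._memo[key]
```

For example, `LocalResolver({"TP53": "chr17"}).resolve("tp53")` returns `"chr17"`, and an unmapped symbol returns `"chrUnknown"` rather than raising, so callers can count unresolved genes without special-casing.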

Hierarchy Processing Design

Load hierarchy network with RawCX2NetworkFactory.
Determine interaction network UUID for each hierarchy node:
- If node has HCX::interactionNetworkUUID, use it; else use network attribute HCX::interactionNetworkUUID.
- If no UUID is found, error (unless --allow-missing-uuid is set).
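
The node-then-network lookup above might look like the following sketch. It assumes node and network attributes have already been flattened into plain dictionaries; the function name, the exception class, and the allow_missing parameter (mirroring --allow-missing-uuid) are hypothetical.

```python
from typing import Mapping, Optional

DEFAULT_UUID_ATTR = "HCX::interactionNetworkUUID"


class MissingInteractionUUID(ValueError):
    """Raised when neither the node nor the network carries a UUID."""


def resolve_interaction_uuid(
    node_attrs: Mapping[str, object],
    network_attrs: Mapping[str, object],
    node_attr: str = DEFAULT_UUID_ATTR,
    network_attr: str = DEFAULT_UUID_ATTR,
    allow_missing: bool = False,
) -> Optional[str]:
    """Prefer the node-level UUID, then fall back to the network-level one."""
    for source, key in ((node_attrs, node_attr), (network_attrs, network_attr)):
        value = source.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    if allow_missing:
        return None  # caller decides how to handle nodes without a UUID
    raise MissingInteractionUUID(
        f"no {node_attr!r} on node and no {network_attr!r} on network"
    )
```

Treating an empty or whitespace-only string as missing is a design choice in this sketch; it avoids passing a blank UUID to the NDEx fetcher.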
Fetch interaction network (CX/HCX) from NDEx using the UUID (respect server/auth CLI options). Cache the fetched network by UUID to avoid repeat downloads.
Index interaction nodes → genes:
- Build a map interaction_node_id -> gene_id using the first non-empty attribute from --interaction-gene-attrs.
- Support gene-list-delim when the attribute is a string containing multiple symbols.
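
As an illustration of the indexing step, the sketch below builds the node-to-gene map from CX2-style node records of the form {"id": ..., "v": {...}}. The function names are hypothetical, and mapping each node ID to a list of genes (rather than a single gene_id) is an assumption made so that delimited multi-gene values fit the same structure.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def split_genes(value: object, delim: str = ",") -> List[str]:
    """Turn a string (possibly delimited) or a list into clean gene names."""
    if isinstance(value, str):
        parts = value.split(delim)
    elif isinstance(value, (list, tuple)):
        parts = [str(p) for p in value]
    else:
        return []  # attribute absent or of an unsupported type
    return [p.strip() for p in parts if p.strip()]


def index_interaction_genes(
    nodes: Iterable[dict],
    gene_attrs: Sequence[str] = ("represents", "name"),
    delim: str = ",",
) -> Tuple[Dict[object, List[str]], int]:
    """Map node ID -> genes using the first non-empty candidate attribute.

    Returns the index plus the number of nodes with no usable gene attribute.
    """
    index: Dict[object, List[str]] = {}
    missing = 0
    for node in nodes:
        attrs = node.get("v", {})
        genes: List[str] = []
        for attr in gene_attrs:
            genes = split_genes(attrs.get(attr), delim)
            if genes:
                break  # first non-empty attribute wins
        if genes:
            index[node["id"]] = genes
        else:
            missing += 1
    return index, missing
```

The returned missing count can feed the warning counter for member nodes that lack any usable gene attribute.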
Extract hierarchy structure: read HCX hierarchy aspect to build parent→children adjacency and find root(s). Validate that metaData lists hierarchy elements.
Collect node genes via HCX::members:
- For each hierarchy node, read HCX::members (list of interaction node IDs). Translate each ID to gene(s) via the interaction map. Drop missing IDs with a warning counter.
Post-order aggregation:
- Traverse the hierarchy bottom-up, accumulating each child’s genes into its parent. Use a multiset (counts) rather than a plain union so multiplicity is retained, and keep a set alongside it for uniqueness checks.
- Store per-node: gene_ids (set), gene_counts (Counter), and chrom_counts (dict of chr → count).
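
The bottom-up accumulation can be sketched with an iterative post-order traversal, which avoids recursion limits on deep hierarchies. The inputs are assumed to be a parent → children adjacency dict and a dict of each node's directly listed genes (already translated from HCX::members); the function name and argument shapes are hypothetical, and the hierarchy is assumed to be acyclic.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def aggregate_genes(
    children: Dict[Hashable, List[Hashable]],
    direct_genes: Dict[Hashable, Iterable[str]],
    roots: Iterable[Hashable],
) -> Dict[Hashable, Counter]:
    """Return node -> Counter of genes over the node's whole subtree."""
    gene_counts: Dict[Hashable, Counter] = {}
    for root in roots:
        stack = [(root, False)]
        while stack:
            node, expanded = stack.pop()
            if node in gene_counts:
                continue  # already finished (shared child or repeated root)
            if not expanded:
                # Revisit this node after all of its children are done.
                stack.append((node, True))
                for child in children.get(node, ()):
                    if child not in gene_counts:
                        stack.append((child, False))
            else:
                total = Counter(direct_genes.get(node, ()))
                for child in children.get(node, ()):
                    total.update(gene_counts[child])
                gene_counts[node] = total
    return gene_counts
```

The unique gene set for a node is simply `set(gene_counts[node])`. One thing to check against real HCX data: if a parent's HCX::members already lists the members of its descendants, the multiset will count those genes more than once, while the set view is unaffected.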
Chromosome resolution:
- For every unique gene encountered across the hierarchy, resolve chromosome once via resolver + cache.
- Treat unknowns as chrUnknown and track separately to avoid data loss.
Tally building:
- For each node, translate its gene_counts to chromosome counts using the shared resolution table.
- Representation (CX2-friendly):
  - Per-chromosome numeric node attributes: chr1_count, chr2_count, …, chr22_count, chrX_count, chrY_count, chrM_count (integers, 0 if none). If species overrides human, generate the chromosome list from the resolver’s metadata instead.
  - Summary attribute chromosomeCounts (list_of_string) e.g., chr1=12, chrX=3, chrUnknown=1.
  - Optional chromosomeCountsJson (string) with JSON-encoded map for consumers that prefer structured data.
- Update attributeDeclarations to declare all new per-chromosome attributes plus the summary attributes.
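
To make the attribute representation concrete, here is a sketch that turns one node's gene_counts into the per-chromosome and summary attributes listed above. The function name is hypothetical, and folding any chromosome outside the configured list into chrUnknown is an assumption of this sketch rather than something the design specifies.

```python
import json
from collections import Counter
from typing import Dict, Mapping, Sequence

HUMAN_CHROMS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]
UNKNOWN = "chrUnknown"


def chromosome_attributes(
    gene_counts: Mapping[str, int],
    gene_to_chrom: Mapping[str, str],
    chroms: Sequence[str] = tuple(HUMAN_CHROMS),
) -> Dict[str, object]:
    """Build the node attributes for one hierarchy node."""
    tally: Counter = Counter()
    for gene, n in gene_counts.items():
        chrom = gene_to_chrom.get(gene, UNKNOWN)
        if chrom not in chroms:
            chrom = UNKNOWN  # assumption: off-list chromosomes count as unknown
        tally[chrom] += n

    # Fixed-order integer attributes, zero when a chromosome has no genes.
    attrs: Dict[str, object] = {f"{c}_count": tally[c] for c in chroms}

    # Summary attributes list only non-zero entries, unknowns last.
    order = list(chroms) + [UNKNOWN]
    nonzero = {c: tally[c] for c in order if tally[c] > 0}
    attrs["chromosomeCounts"] = [f"{c}={n}" for c, n in nonzero.items()]
    attrs["chromosomeCountsJson"] = json.dumps(nonzero)
    return attrs
```

With two TP53 occurrences, one XIST, and one unmapped gene, the summary list would read `["chr17=2", "chrX=1", "chrUnknown=1"]` while all other chrN_count attributes stay at 0.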
Emit updated CX2 preserving all existing aspects; only add/modify node attributes and metaData counts.

Error Handling & Validation

Warn (progress message) when hierarchy aspect is missing or malformed; fail fast unless --allow-nonhierarchy is passed.
Error when HCX::interactionNetworkUUID is missing at both node and network level unless --allow-missing-uuid is set.
Warn when HCX::members references an interaction node ID not found in the fetched network; count and report these.
Warn when a member node lacks any usable gene attribute; count unresolved genes separately.
Record the number of unresolved genes; add a network attribute chromosomeMappingUnresolved for transparency.
Validate that attributeDeclarations include the new attributes; adjust metaData elementCount where needed.
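
The declaration check could be sketched as below. It operates on the single object inside the CX2 attributeDeclarations aspect, where each section maps an attribute name to a small dict whose "d" key holds the data type. The function name is hypothetical, and the choice of "integer" for the count attributes and for chromosomeMappingUnresolved is an assumption.

```python
from typing import Dict, List, Sequence


def declare_chromosome_attributes(
    declarations: Dict[str, dict],
    chroms: Sequence[str],
) -> List[str]:
    """Add any missing declarations in place; return the names that were added.

    Raises ValueError if an attribute is already declared with another type.
    """
    wanted_nodes = {f"{c}_count": "integer" for c in chroms}
    wanted_nodes["chromosomeCounts"] = "list_of_string"
    wanted_nodes["chromosomeCountsJson"] = "string"
    wanted = {
        "nodes": wanted_nodes,
        "networkAttributes": {"chromosomeMappingUnresolved": "integer"},
    }

    added: List[str] = []
    for section, attributes in wanted.items():
        declared = declarations.setdefault(section, {})
        for name, dtype in attributes.items():
            existing = declared.get(name)
            if existing is None:
                declared[name] = {"d": dtype}
                added.append(name)
            elif existing.get("d") != dtype:
                raise ValueError(
                    f"{section}.{name} already declared as {existing.get('d')!r}"
                )
    return added
```

Running this before writing node values means a type clash with a pre-existing attribute surfaces as an error instead of producing CX2 that a consumer rejects.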

Testing Plan

Unit tests (pytest) for:
- Extraction of NDEx UUID from node vs network attributes.
- Mapping HCX::members IDs to genes given an interaction network fixture.
- Gene extraction from nodes (single vs list attribute, delimiter handling).
- Resolver behavior with local map (including alias handling and normalization).
- Hierarchy aggregation (post-order) on a small synthetic HCX fixture.
- CX2 mutation: ensure node attributes are added and declared, metaData updated.
Integration smoke test: run CLI on foo.cx2 (non-hierarchical) expecting a clear error message.

Package/Module Renaming

Rename Python package directory from updatenetwork to chromloc (or similar concise, descriptive name aligned with the repo).
Rename update.py → annotate.py (core logic) to reflect the chromosome-location annotation purpose; keep cli.py under its current name and update its imports.
Update pyproject.toml entry points and any imports to the new package/module names.
Keep console script name user-facing, e.g., cytoscape-chromloc.

Implementation Steps

Add resolver module (chromloc/resolvers.py) with Local/MyGene/Ensembl implementations + cache wrapper.
Add NDEx fetcher helper (chromloc/ndex_fetch.py) to download and cache interaction networks by UUID.
Extend annotate.py to:
- Parse CLI options (passed through run_update).
- Build hierarchy graph, aggregate genes, resolve chromosomes, and annotate nodes.
- Keep existing progress messaging pattern.
- Generate per-chromosome count attributes for the human chromosome list by default (extendable for other species).
Update cli.py for new arguments and pass them into run_update (now importing from chromloc.annotate).
Update requirements.txt to include ndex2 client (and requests if the fetcher needs it explicitly).
Write pytest suite under tests/ with fixtures for hierarchy CX2 snippets.
Document usage in README.rst (new options, expected outputs, example command).
Add a small helper script scripts/build_gene_chr_map.py that:
- Downloads/reads a public gene annotation source (e.g., NCBI gene info or Ensembl BioMart export) for human.
- Produces a UTF-8 TSV symbol\tchromosome and a JSON dictionary {symbol: chromosome}.
- Normalizes symbols to uppercase, strips version suffixes, and skips non-standard chromosomes unless --include-alt is passed.
- Accepts --input (path or URL), --output-tsv, --output-json, --species (default human), and --allow-ambiguous (keep first mapping when multiple chromosomes exist).
- Includes a brief README section on how to refresh the dataset and recommended refresh cadence.
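
The core of the helper script might reduce to a function like the one below, which takes already-parsed (symbol, chromosome) pairs and returns the JSON-ready dictionary. Downloading, file parsing, and version-suffix stripping are omitted. The names build_map and normalize_chrom are hypothetical, and two behaviors are assumptions of this sketch: chromosomes are emitted with a chr prefix (MT becomes chrM), and genes mapped to more than one chromosome are dropped unless allow_ambiguous is set.

```python
from typing import Dict, Iterable, Tuple

# Human primary chromosomes, without the "chr" prefix.
STANDARD = {str(i) for i in range(1, 23)} | {"X", "Y", "M"}


def normalize_chrom(raw: str) -> str:
    """Strip an optional 'chr' prefix, uppercase, and map MT to M."""
    c = raw.strip()
    if c.lower().startswith("chr"):
        c = c[3:]
    c = c.upper()
    return "M" if c == "MT" else c


def build_map(
    rows: Iterable[Tuple[str, str]],
    include_alt: bool = False,
    allow_ambiguous: bool = False,
) -> Dict[str, str]:
    """Build {SYMBOL: chrN} from (symbol, chromosome) pairs."""
    mapping: Dict[str, str] = {}
    ambiguous = set()
    for symbol, raw_chrom in rows:
        sym = symbol.strip().upper()
        chrom = normalize_chrom(raw_chrom)
        if not sym or not chrom:
            continue
        if chrom not in STANDARD and not include_alt:
            continue  # skip alt contigs and patches by default
        label = f"chr{chrom}"
        first_seen = mapping.setdefault(sym, label)  # first mapping is kept
        if first_seen != label:
            ambiguous.add(sym)
    if not allow_ambiguous:
        for sym in ambiguous:
            del mapping[sym]
    return mapping
```

The resulting dictionary can be written with `json.dump` for the packaged default and as tab-separated lines for the TSV output.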

Visualization (pie chart style)

Add/modify visualProperties aspect to define a node fill pie chart using the per-chromosome count attributes.
- Use Cytoscape pie-chart custom graphics: map chr1_count…chr22_count, chrX_count, chrY_count, chrM_count to pie slices in a fixed order for consistent color mapping.
- Define a discrete palette (e.g., ColorBrewer qualitative set) and store in style properties.
- Ensure the style references attributes by name; if an attribute is missing or zero, its slice contributes nothing to the pie.
If an existing style is present, append a new style entry (do not overwrite). Provide a network attribute flag chromosomePieStyleApplied=true to avoid reapplying.
Keep sizing/opacity unchanged; only set node fill (custom graphics) to the pie chart. Text/labels remain as-is.

Risks & Mitigations

Chromosome map staleness: provide documented script to refresh the local map and recommend cadence.
Hierarchy variability: tolerate missing aspects by clear error; make attribute names configurable.
CX2 type constraints: use list_of_string + JSON string to stay within supported types.
Performance on large trees: single-pass post-order traversal with memoized resolutions; cache fetched NDEx networks by UUID so each one is downloaded at most once.
