Expand distributed indexing, match numpy indexing scheme by ClaudiaComito · Pull Request #938 · helmholtz-analytics/heat

ClaudiaComito · 2022-03-24T05:23:18Z

Description

This pull request introduces a significant overhaul of distributed indexing within dndarray.py, specifically targeting the __getitem__ and __setitem__ methods. The primary objective is to achieve full NumPy indexing compliance in a distributed environment while minimizing MPI overhead and memory footprint.

The logic has been refactored to identify zero-communication paths ("early out") and to route heavy unordered advanced indexing through (hopefully) optimized communication.
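As a rough illustration of the two routes, the sketch below maps requested global indices to their owning ranks and checks whether a request can be served without communication. It assumes a contiguous block distribution along the split axis; the helper names `owners` and `needs_communication` and the `counts` list are hypothetical and are not taken from dndarray.py.

```python
from bisect import bisect_right
from itertools import accumulate


def owners(global_indices, counts):
    """Map each global index on the split axis to the rank that holds it.

    counts[r] is the number of elements held by rank r (contiguous blocks).
    """
    # ends[r] is the first global index that no longer belongs to rank r
    ends = list(accumulate(counts))
    return [bisect_right(ends, i) for i in global_indices]


def needs_communication(global_indices, counts, rank):
    """Return True if any requested index lives on a rank other than `rank`."""
    return any(owner != rank for owner in owners(global_indices, counts))


counts = [4, 3, 3]  # 10 elements spread over 3 ranks
print(owners([0, 3, 4, 9], counts))
print(needs_communication([4, 5, 6], counts, rank=1))  # all local to rank 1
print(needs_communication([2, 5], counts, rank=1))     # index 2 lives on rank 0
```

When `needs_communication` is false on every rank, the request corresponds to the "early out" case; otherwise the owner list is the kind of information a rank would need before exchanging indices and data.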

The following table shows the distribution semantics of the DNDarray indexing operations. The columns give, in order: whether the indexed array is distributed, the operation, whether the key is distributed, whether the value is distributed (setitem only), and whether the result is distributed.

| Array distributed | Operation | Key distributed | Value distributed | Result distributed |
|---|---|---|---|---|
| no | `array[key]` | no | -- | no |
| no | `array[key]` | yes | -- | yes |
| yes | `array[key]` | no | -- | no |
| yes | `array[key]` | yes | -- | yes |
| no | `array[key] = value` | no | no | no |
| no | `array[key] = value` | no | yes (but why?) | no |
| no | `array[key] = value` | yes (but why?) | no | no |
| no | `array[key] = value` | yes | yes | no |
| yes | `array[key] = value` | no | no | yes |
| yes | `array[key] = value` | no | yes | yes |
| yes | `array[key] = value` | yes | no | yes |
| yes | `array[key] = value` | yes | yes | yes |
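Read row by row, the table as it currently stands reduces to two rules: for `array[key]` the result is distributed exactly when the key is, and for `array[key] = value` the array keeps its own distribution. A minimal sketch of that reading follows; the function name `result_is_distributed` is hypothetical and only restates the table.

```python
def result_is_distributed(array_distributed, key_distributed, is_setitem=False):
    """Restate the distribution table as a predicate."""
    if is_setitem:
        # array[key] = value modifies the array in place, so the
        # distribution of the array itself is what remains
        return array_distributed
    # array[key]: the result follows the distribution of the key
    return key_distributed


for array_dist in (False, True):
    for key_dist in (False, True):
        print(array_dist, key_dist,
              result_is_distributed(array_dist, key_dist),
              result_is_distributed(array_dist, key_dist, is_setitem=True))
```

Note that in this reading the distribution of the value never influences the result.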

Routing logic

The flowchart (DRAFT) maps out the MPI routing decisions based on the evaluated state of the indexing key.

graph TD
    classDef default fill:#ffffff,stroke:#ced4da,stroke-width:1px,color:#212529,rx:4px,ry:4px;
    classDef terminal fill:#343a40,stroke:#343a40,stroke-width:2px,color:#ffffff,rx:15px,ry:15px;
    classDef decision fill:#e3f2fd,stroke:#4dabf7,stroke-width:2px,color:#000000;
    classDef highlight fill:#e8f5e9,stroke:#69b3a2,stroke-width:2px,color:#212529;

    Start(["Start: arr[key] or arr[key] = value"]):::terminal
    Norm["Normalize Key (e.g., Bool Masks -> Int Indices)"]
    ProcessKey["__process_key() Expands dims, aligns shapes"]
    StateCalc{"Calculate State: split_key_is_ordered"}:::decision
    
    Start --> Norm
    Norm --> ProcessKey
    ProcessKey --> StateCalc
    
    Branch1{"Single item on split axis? (root != None)"}:::decision
    Branch0{"Operation?"}:::decision
    BranchNeg1{"Operation?"}:::decision
    
    StateCalc -- "1: Ordered / Ascending" --> Branch1
    StateCalc -- "0: Unordered / Random" --> Branch0
    StateCalc -- "-1: Descending Slice" --> BranchNeg1
    
    subgraph Ordered ["Fast Path: Ordered Indexing (split_key_is_ordered = 1)"]
        style Ordered fill:#f8f9fa,stroke:#dee2e6,stroke-width:1px,stroke-dasharray: 5 5,color:#495057
        Branch1 -- "Yes: Get" --> RootGet["Root fetches local data"]
        RootGet --> Bcast["MPI.Bcast to all ranks"]
        
        Branch1 -- "Yes: Set" --> RootSet["Root updates local data in-place"]
        
        Branch1 -- "No" --> FastLocal["Pure Basic Slicing: Apply locally, NO MPI needed"]:::highlight
    end
    
    subgraph Descending ["Descending Slices (split_key_is_ordered = -1)"]
        style Descending fill:#f8f9fa,stroke:#dee2e6,stroke-width:1px,stroke-dasharray: 5 5,color:#495057
        BranchNeg1 -- "Set" --> FlipVal["Flip 'value' array"]
        FlipVal --> MatchDist["Align distribution map"]
        MatchDist --> SetLocal["-1 Local Set"]
        
        BranchNeg1 -- "Get" --> UnorderedFallback["Converts to arange -> falls back to unordered"]
    end
    
    subgraph Unordered ["Heavy Path: Unordered Advanced Indexing (split_key_is_ordered = 0)"]
        style Unordered fill:#f8f9fa,stroke:#dee2e6,stroke-width:1px,stroke-dasharray: 5 5,color:#495057
        
        Branch0 -- "Get" --> G_Allgather["MPI.Allgather: Share recv_counts"]
        G_Allgather --> G_SendIdx["MPI.Isend/Recv: Send requested indices to owning ranks"]
        G_SendIdx --> G_Fetch["Owning ranks fetch local data"]
        G_Fetch --> G_SendData["MPI.Isend/Recv: Send requested data back"]
        G_SendData --> G_Reconstruct["Reconstruct recv_buf on original rank"]
        
        Branch0 -- "Set" --> S_CheckVal{"Is 'value' distributed?"}:::decision
        S_CheckVal -- "No / Scalar" --> S_LocalMask["_advanced_setitem_unordered_local (Apply locally)"]
        
        S_CheckVal -- "Yes" --> S_Align["Redistribute 'value' to match 'key' distribution"]
        S_Align --> S_AllToAll["MPI.Alltoallv: Exchange data AND indices"]
        S_AllToAll --> S_ApplyRecv["Apply received data to local elements"]
    end

    Bcast --> End(["Return / Complete"]):::terminal
    RootSet --> End
    FastLocal --> End
    SetLocal --> End
    UnorderedFallback -.-> Branch0
    G_Reconstruct --> End
    S_LocalMask --> End
    S_ApplyRecv --> End
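
The branches of the draft flowchart can be summarized as a small dispatch function. This is only a sketch of the decision structure: the function name `route`, its parameters, and the returned labels are hypothetical, and no MPI calls are made.

```python
def route(op, split_key_is_ordered, single_item=False, value_distributed=False):
    """Return a label for the flowchart path taken by an indexing operation.

    op: "get" or "set"
    split_key_is_ordered: 1 (ascending), 0 (unordered), -1 (descending slice)
    """
    if op not in ("get", "set"):
        raise ValueError(f"unknown operation: {op!r}")
    if split_key_is_ordered == 1:
        if not single_item:
            return "local basic slicing, no MPI"
        return "root fetches, Bcast" if op == "get" else "root sets in place"
    if split_key_is_ordered == -1:
        if op == "set":
            return "flip value, align distribution, local set"
        # a descending getter is converted to an index range and
        # handled like an unordered key
        split_key_is_ordered = 0
    if split_key_is_ordered == 0:
        if op == "get":
            return "Allgather counts, Isend/Recv indices and data"
        if value_distributed:
            return "redistribute value, Alltoallv data and indices"
        return "local unordered set"
    raise ValueError(f"unknown key state: {split_key_is_ordered!r}")


print(route("get", 1))
print(route("get", -1))
print(route("set", 0, value_distributed=True))
```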

Main changes

abstracts key parsing and alignment into a centralized private method that handles dimension expansion, shape broadcasting, and classifies the state of the indexing operation to determine network routing.
enforces standard last-assignment-wins semantics for advanced indexing duplicates on CUDA tensors by generating linear indices and mapping local occurrence priorities (thanks @Hakdag97).
intercepts multidimensional and single-dimensional boolean masks early in the pipeline, converting them to explicit integer configurations locally to prevent unnecessary cross-rank broadcasting.
maps and isolates zero-communication assignments during slice operations, executing completely local PyTorch tensor modifications when the requested indices and data already reside on the active rank.
structures unordered read requests by compiling global communication matrices, enabling the dispatch of non-blocking Isend and Recv calls strictly between nodes that own the requested indices and those requesting them.
forces distribution alignment during set operations if the right-hand side assignment value is also distributed, utilizing an Alltoallv operation to shuffle payload data and target indices concurrently.
introduces a value broadcasting helper function to natively squeeze or expand the dimensions of scalar or tensor payloads to match the specific dimensional footprint of the target slice before assignment occurs.
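
To make the last-assignment-wins item above concrete, here is a minimal sketch of one way to resolve duplicate targets through linear indices before writing. It uses NumPy instead of PyTorch for brevity, the helper name `keep_last_writes` is hypothetical, and it is not the implementation in this PR.

```python
import numpy as np


def keep_last_writes(index_arrays, values, shape):
    """For duplicate target positions, keep only the value written last."""
    # Flatten the N-d index tuple into linear (row-major) indices
    linear = np.ravel_multi_index(index_arrays, shape)
    values = np.asarray(values)
    # np.unique reports the first occurrence of each entry; on the reversed
    # array this is the last occurrence in the original order
    unique_linear, first_in_reversed = np.unique(linear[::-1], return_index=True)
    last_position = linear.size - 1 - first_in_reversed
    return unique_linear, values[last_position]


target = np.zeros((2, 3), dtype=int)
rows, cols = np.array([0, 1, 0]), np.array([1, 2, 1])  # (0, 1) appears twice
lin, vals = keep_last_writes((rows, cols), [5, 6, 7], target.shape)
target.flat[lin] = vals  # each position is now written exactly once
print(target)
```

Because every target position is written once after deduplication, the outcome no longer depends on the order in which a backend happens to process the assignments.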

To Be Continued...

Memory footprint

Scaling behaviour

Issue/s resolved: #914 #918

Changes proposed:

feature extension in __process_key, getitem, and setitem methods
edge case handling
extensive comparison to numpy API in unittests

Type of change

Memory requirements

Performance

Due Diligence

All split configurations tested
Multiple dtypes tested in relevant functions
Documentation updated (if needed)
Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

yes / no

skip ci

ClaudiaComito · 2026-03-18T09:43:32Z

pass key to process_key immediately, separate by view/copy output

JuanPedroGHM · 2026-03-18T10:32:01Z

Create different functions for each key type:

Scalar
Mask
Slice
Advanced indexing
Distributed key

Process key should return the tuple with the key information, and send to the separate functions.
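
One possible shape for this, as a minimal sketch on plain Python lists: the key is classified once, and the getter only dispatches on the result. The names `KeyInfo`, `process_key`, and the `_get_*` helpers are hypothetical, and the distributed-key case is left out because it needs a DNDarray.

```python
from collections import namedtuple

# Hypothetical container for the information extracted from the key
KeyInfo = namedtuple("KeyInfo", ["kind", "key"])


def _is_int(k):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(k, int) and not isinstance(k, bool)


def process_key(key):
    """Classify the key once and return it together with its kind."""
    if isinstance(key, slice):
        return KeyInfo("slice", key)
    if _is_int(key):
        return KeyInfo("scalar", key)
    if isinstance(key, list) and key and all(isinstance(k, bool) for k in key):
        return KeyInfo("mask", key)
    if isinstance(key, list) and all(_is_int(k) for k in key):
        return KeyInfo("advanced", key)
    raise TypeError(f"unsupported key: {key!r}")


def _get_scalar(data, key):
    return data[key]


def _get_slice(data, key):
    return data[key]


def _get_mask(data, key):
    return [x for x, keep in zip(data, key) if keep]


def _get_advanced(data, key):
    return [data[i] for i in key]


_GETTERS = {
    "scalar": _get_scalar,
    "slice": _get_slice,
    "mask": _get_mask,
    "advanced": _get_advanced,
}


def getitem(data, key):
    """Act purely as a dispatcher on the processed key."""
    info = process_key(key)
    return _GETTERS[info.kind](data, info.key)


data = [10, 20, 30, 40]
print(getitem(data, 1))
print(getitem(data, slice(1, 3)))
print(getitem(data, [True, False, True, False]))
print(getitem(data, [3, 0, 3]))
```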

Additionally, remove unnecessary function definitions in __getitem__/__setitem__.

Make table and diagram available in a markdown file in the docs folder for now.

brownbaerchen

I haven't looked at the actual advanced indexing stuff. But I think we would do well to clean up a bit before getting into the details of this. There have been some changes mixed in that, to me, seem unrelated to advanced indexing in heat. They should be moved to separate PRs if we want to keep them. Other changes are cosmetic or temporary and should be removed entirely.

brownbaerchen · 2026-04-21T15:26:04Z

Can we remove this change from this PR?

brownbaerchen · 2026-04-21T15:27:03Z

Let's remove this change from this PR (see #2216 for a similar case)

brownbaerchen · 2026-04-21T15:28:12Z

Not sure about these changes, but some of them can be removed from this PR, right?

brownbaerchen · 2026-04-21T15:28:36Z

Let's remove this change from this PR.

brownbaerchen · 2026-04-21T15:29:12Z

Let's remove these changes from this PR.

brownbaerchen · 2026-04-21T15:30:33Z

These changes seem unrelated to advanced indexing. Maybe put in a separate PR.

brownbaerchen · 2026-04-21T15:31:31Z

Is this needed here?

brownbaerchen · 2026-04-21T15:32:51Z

Let's remove this from the PR

brownbaerchen · 2026-04-21T15:33:35Z

This seems unrelated to advanced indexing. Maybe move to a separate PR?

brownbaerchen · 2026-04-21T15:33:57Z

This seems unrelated to advanced indexing. Maybe move to separate PR?

ClaudiaComito and others added 11 commits February 17, 2022 13:40

Broken. __getitem__ refactoring in prep for distributed/non-ordered i…

445fc94

…ndexing

Preprocess key, workaround torch_proxy for advanced indexing, simplif…

6641d1e

…y slice-indexing. UNTESTED

put advanced index shape in the dimensions name to get the correct po…

cd78ecb

…sition in the index_proxy

first changes to setitem

7d97ea2

Expand __process_key() to address advanced indexing.

0c37abf

Address boolean indexing

b1508b9

separate advanced indexing on dim 0 from adv ind across dimensions

ae5af94

Merge branch 'main' into 914_adv-indexing-outshape-outsplit

ace900a

Replace sanitize_in with try:...except: construct

0a8cb35

nonzero(): do not assume input DNDarray is load-balanced

6c7c10a

Memory management

fb3524b

ClaudiaComito mentioned this pull request Mar 25, 2022

fix #925: ht.nonzero() returns tuple of 1-D arrays instead of n-D arrays #937

Merged

4 tasks

Mystic-Slice and others added 10 commits April 8, 2022 10:56

Merge branch 'main' into 914_adv-indexing-outshape-outsplit

8485a31

calculate output_shape, split axis bookkeeping for advanced indexing

a52e518

__process_key() to return expanded array, expanded key, output gsha…

5995639

…pe and new split axis

in , copy before manipulations

3830e62

nonzero() to return tuple of 1D arrays, stable distributed results

82b2508

update __process_key(), get rid of recursive calls, __getitem__ broken

aafaf99

deal with scalar key, local and distributed cases

b746872

test getitem separately, follow numpy Indexing on ndarray examples

00fe538

test for 0-dim DNDarray key

4360bd1

This was referenced Aug 30, 2022

[Bug]: Indexing with 0-dimensional key #1019

Closed

[Bug]: Slice error when array contains an axis of length 0 #1012

Closed

ClaudiaComito added 6 commits August 31, 2022 09:31

Expand __process_key() to deal with distributed boolean mask

231c1de

Expand test_getitem for distributed single-element indexing, non-dist…

f19f902

…r boolean mask

Add check for matching boolean index / indexed array shapes

7ed435f

Only sort result if input.split != 0

0da7f56

BROKEN: distributed boolean indexing to return stable result for all …

e55c7f9

…splits

Add tests for distributed boolean indexing

75d9314

brownbaerchen mentioned this pull request Feb 19, 2026

[Bug]: Unexpected masking behavior #2135

Open

brownbaerchen mentioned this pull request Mar 3, 2026

Fix bug in Heat.indexing.nonzero #2138

Merged

7 tasks

ClaudiaComito modified the milestones: 1.8.0, 1.9.0 Mar 3, 2026

ClaudiaComito added 3 commits March 18, 2026 06:05

Merge branch 'main' into 914_adv-indexing-outshape-outsplit

1c5090f

Merge branch '914_adv-indexing-outshape-outsplit' of github.com:helmh…

82fd6bf

…oltz-analytics/heat into 914_adv-indexing-outshape-outsplit

fix attribute assignment in DNDarray calls)

f445652

ClaudiaComito added 2 commits March 18, 2026 11:36

fix position of split argument

117bebb

add as_tuple argument for nonzero

a5c8788

brownbaerchen requested changes Apr 21, 2026

github-project-automation Bot moved this from Merge queue to In Progress in Roadmap Apr 21, 2026

brownbaerchen mentioned this pull request Apr 21, 2026

Expand indexing tests #2257

Open

ClaudiaComito added 16 commits May 6, 2026 10:59

Merge branch 'main' into 914_adv-indexing-outshape-outsplit

1afc837

move helper functions out of __getitem__/__setitem__

d680e25

move most key sanitation to __process_key()

1108548

unbind instead of torch.split if as_tuple

66c23ed

extract distr logic unordered key from getitem

0651339

extract distr logic for unordered value from setitem

0a748de

process_key returns namedtuple, getitem acts as dispatch

a945528

remove redundant _is_basic_component check

e525f28

reorganize dispatching order, remove redundant checks

1331187

refactor setitem - dispatch to appropriate helpers

15ee181

fix misidentification of adv ind as mask-like

a6ccbf9

disentangle local from distributed masking

fde5b60

refactor distributed boolean mask fast-path

6470339

fast-track local bool mask in tuple key

29c9885

do not fast-track mask getter with split>0

fe15b3f

fix types call

1564745
