Extend xarray method coverage on Variable / LinearExpression

## Summary

`Variable`, `LinearExpression`/`QuadraticExpression` and `Constraint` each expose a hand-picked subset of `xarray.Dataset` methods (via `varwrap` / `exprwrap` / `conwrap`). The three lists have drifted apart, several common arithmetic methods are missing entirely, and a few existing ones diverge from xarray — especially around datetime indexes.

This issue collects the analysis and a design sketch for closing the gaps. All of the proposed methods preserve linearity (none multiplies two variables), so they are mostly bookkeeping.

---

## A. Cross-class inconsistencies (existing methods)

The same wrapper is exposed on some classes but not others, with no clear reason:

| method | Variable | Expression | Constraint |
|---|---|---|---|
| `compute` | yes | no | no |
| `chunk` (dask) | no | yes | yes |
| `reindex` / `reindex_like` | no | yes | yes |
| `astype` | no | yes | no |
| `drop` / `drop_vars` | no | yes | no |
| `reset_index` | no | yes | no |
| `rename_dims` | no | yes | yes |
| `shift` fill_value default | yes (`variables.py:1268`) | no (`expressions.py:1474`) | no |

The two that actually bite users:

- **`Variable.reindex` / `reindex_like` missing** — you can reindex an expression or constraint onto a master index but not a variable. Natural operation for datetime work (aligning a variable to a full snapshot index).
- **`Expression.shift` / `Constraint.shift` have no `fill_value`** while `Variable.shift` defaults to the linopy fill value. Shifted slots get `coeffs=NaN`, `vars=-1`; it mostly survives downstream `fillna(0)`, but it is the root of the `diff` issue below.

## B. Existing methods lacking features vs. xarray (datetime focus)

1. **`diff` (`expressions.py:1238`)** — implemented as `self - self.shift({dim: n})`. Two divergences from `xarray.diff`:
   - **Length**: xarray trims the dimension to `N-n`; linopy keeps `N`, so the first `n` rows are spurious (`self[0] - empty`, a leading garbage term still carrying the original coordinate label).
   - **No `label` argument** — xarray has `label="upper"|"lower"`; linopy is hard-wired to `"upper"`.
2. **`groupby` (`expressions.py:1258`)** — typed only for `DataArray`/`Series`/`DataFrame`. The string-accessor form (`groupby("time.month")`, `"time.season"`) only reaches the slow fallback; the fast reindex summation (`expressions.py:229`) requires `Series`/`DataFrame`/`DataArray`. `groupby_bins` not exposed. `LinearExpressionGroupby` only implements `.sum()` / `.map()` / `.roll()` — no `.mean()` / `.first()` / `.last()` / `.count()`.
3. **`rolling` (`expressions.py:1289`)** — `LinearExpressionRolling` only implements `.sum()` (`expressions.py:307`). No `.mean()`, no `.construct()`. Integer-window only.
4. **No frequency-aware `shift`** — neither linopy nor xarray supports `shift(time="1D")`; pandas does. See the datetime-shift section below.

`sel`/`isel` delegate cleanly, so datetime partial-string indexing (`sel(time="2030")`, `method="nearest"`) already works.
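The length and `label` divergence in item 1 can be illustrated with a plain pandas analogue. This is a sketch of the two semantics only, not linopy or xarray code; `diff_keep_length` and `diff_trimmed` are hypothetical helper names, and only the first-order case (`n=1`) is covered.

```python
import numpy as np
import pandas as pd

time = pd.date_range("2030-01-01", periods=4, freq="h")
values = pd.Series([10.0, 12.0, 15.0, 19.0], index=time)


def diff_keep_length(s):
    """Mimic `self - self.shift(1)`: length stays N, the first row is filler."""
    return s - s.shift(1)


def diff_trimmed(s, label="upper"):
    """Mimic an xarray-style first difference: length N-1, labelled by
    the upper or the lower end of each pair."""
    if label not in ("upper", "lower"):
        raise ValueError("label must be 'upper' or 'lower'")
    data = s.to_numpy()[1:] - s.to_numpy()[:-1]
    index = s.index[1:] if label == "upper" else s.index[:-1]
    return pd.Series(data, index=index)


kept = diff_keep_length(values)            # 4 rows, first one is NaN
upper = diff_trimmed(values, "upper")      # 3 rows, labelled 01:00 .. 03:00
lower = diff_trimmed(values, "lower")      # 3 rows, labelled 00:00 .. 02:00
```

In the kept-length variant the first row still carries the `00:00` label even though it holds no real difference, which is the spurious leading row described above.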

## C. Missing methods, ranked by usefulness

Only linearity-preserving methods are candidates. Excluded as non-linear/meaningless: `max`, `min`, `median`, `std`, `var`, `prod`, `cumprod`, `quantile`, `rank`, `idxmax/idxmin`, `argmax/argmin`, `clip`.

**Tier 1 — high value**
- `mean` — linear (`sum/n`), ubiquitous, missing on both `Variable` and `LinearExpression`.
- `resample` — datetime aggregation (hourly to daily). Core energy-modeling need. Absent.
- `reindex` / `reindex_like` on `Variable` — fixes the asymmetry in A.
- `weighted` — `expr.weighted(w).sum()`; maps onto PyPSA `snapshot_weightings`.

**Tier 2 — useful**
- `coarsen` — positional block aggregation (`time=24`).
- `rolling().mean()` and `groupby().mean()` — extend the existing reducers.
- `transpose` — missing on both; needs care to keep `_term`/`_factor` last.
- `compute` on Expression, `chunk` on Variable — close the dask asymmetry from A.
- `dropna` — drop missing coordinate slices.

**Tier 3 — nice-to-have / niche**
- `astype` / `reset_index` / `rename_dims` on `Variable` (consistency).
- `sortby`, `squeeze`, `head`/`tail`/`thin`, `pad`.
- `interp` / `interp_like` (genuinely linear but niche).
- `groupby_bins`.

---

## Implementation design notes (Tier 1 + Tier 2)

### Why none of this is hard

A linopy linear expression is just **a list of terms plus a constant** — `3*x[a] + 2*x[b] + 5` — stored as three aligned arrays: `coeffs`, `vars` (integer labels), `const`.

Every method below does one of three harmless things to that list:

1. **Regroup terms** — collect terms from several cells into one. (`sum`, `resample`, `coarsen`, `groupby`)
2. **Copy terms** — the same term appears in several output cells. (`rolling`)
3. **Move or drop whole cells** — pick, fill, or discard cells without touching the terms inside. (`reindex`, `shift`, `dropna`, `transpose`)

and some also **rescale** — multiply every coefficient and the constant by a number/array. (`mean`, `weighted`, the `.mean()` variants)

None of this multiplies two variables together, so the result is always still a valid linear (or quadratic) expression. linopy already owns every building block: regrouping is the term-stacking trick behind `.sum()`; copying is the window trick behind `.rolling().sum()`; moving cells is ordinary `.sel()`/`.reindex()`; rescaling is ordinary `*`/`/`.
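A toy model makes the regroup / copy / rescale vocabulary concrete. The sketch below is pure Python and deliberately ignores linopy's array layout; `Cell`, `regroup`, `rolling_sum`, `rescale` and `evaluate` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """One cell of a toy expression: (coeff, var_label) terms plus a constant."""
    terms: list = field(default_factory=list)
    const: float = 0.0


def regroup(cells):
    """Regroup: collect the terms of several cells into one (the idea behind `sum`)."""
    return Cell([t for c in cells for t in c.terms], sum(c.const for c in cells))


def rolling_sum(cells, window):
    """Copy: each term lands in every trailing window that covers its cell."""
    return [regroup(cells[max(0, i - window + 1): i + 1]) for i in range(len(cells))]


def rescale(cell, factor):
    """Rescale: multiply every coefficient and the constant by a number."""
    return Cell([(coeff * factor, var) for coeff, var in cell.terms], cell.const * factor)


def evaluate(cell, values):
    """Plug numbers in for the variables to check the bookkeeping."""
    return sum(coeff * values[var] for coeff, var in cell.terms) + cell.const


# Three time steps: 3*x0 + 5, 2*x1, 1*x2 + 1
cells = [Cell([(3.0, 0)], 5.0), Cell([(2.0, 1)], 0.0), Cell([(1.0, 2)], 1.0)]
values = {0: 1.0, 1: 2.0, 2: 3.0}

total = regroup(cells)                    # sum over time
mean = rescale(total, 1 / len(cells))     # mean = sum / n
windows = rolling_sum(cells, window=2)    # term 2*x1 appears in two windows
```

No step ever multiplies two variable labels together, so each result is still a list of linear terms plus a constant.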

### Tier 1

- **`mean`** = `sum` divided by how many things were summed: on a 3-step `time` axis, `gen.mean("time") == gen.sum("time") / 3`. Divide by the count of **non-missing** entries (matches xarray `skipna=True` and linopy's own `sum`, which already skips `-1` variables). An all-missing slice gives `0/0 = NaN`, as in xarray.
- **`resample`** aggregates a datetime axis by calendar period. Each period's expression = sum of that period's terms = a `groupby` keyed by "which period". Ask pandas for the period label of each timestamp, reuse the existing `groupby(...).sum()` fast path; `.mean()` divides by the count. Decide: keep empty periods as `0` (xarray parity) or drop them. Forward `closed`/`label`/`origin`.
- **`reindex` / `reindex_like` (Variable)** — new slots get the missing-sentinel (`labels=-1`, bounds `NaN`), same fill Expression/Constraint already use. One line each: `reindex = varwrap(Dataset.reindex, fill_value=FILL_VALUE)`.
- **`weighted`** — `gen.weighted(w).sum("time") == (gen * w).sum("time")`. Pure sugar over `*` and `sum`; `.mean()` also divides by `w.sum()`. Just a small wrapper object.
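The `resample` idea (ask pandas for a period label per timestamp, then group by it) can be sketched as follows. This is a minimal illustration with plain dicts standing in for the existing `groupby(...).sum()` fast path; `resample_terms` and its `keep_empty` flag are hypothetical, and `floor` only covers fixed-width periods such as days, not calendar months.

```python
import numpy as np
import pandas as pd

# Irregular snapshots: two on Jan 1, one on Jan 2, none on Jan 3, two on Jan 4.
time = pd.DatetimeIndex([
    "2030-01-01 00:00", "2030-01-01 12:00",
    "2030-01-02 06:00",
    "2030-01-04 00:00", "2030-01-04 18:00",
])
coeffs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # one term per snapshot
var_labels = np.arange(5)


def resample_terms(time, coeffs, var_labels, freq="D", keep_empty=True):
    """Group (coeff, var) terms by the period each timestamp falls into."""
    labels = time.floor(freq)                   # period label per timestamp
    out = {}
    if keep_empty:
        # Pre-seed every period so empty ones survive as empty term lists.
        for period in pd.date_range(labels.min(), labels.max(), freq=freq):
            out[period] = []
    for period, coeff, var in zip(labels, coeffs, var_labels):
        out.setdefault(period, []).append((float(coeff), int(var)))
    return out


daily = resample_terms(time, coeffs, var_labels)
daily_dropped = resample_terms(time, coeffs, var_labels, keep_empty=False)
```

With `keep_empty=True` the empty Jan 3 period stays in the result as an expression with no terms; with `keep_empty=False` it disappears. A `.mean()` variant would additionally divide each period's terms by the number of entries in that period.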

### Tier 2

- **`coarsen`** — `resample`'s positional cousin (group every N rows). Reshape into blocks via xarray's `coarsen().construct()`, then sum each block; `.mean()` divides by block size. Handle `boundary="trim"|"pad"`.
- **`rolling().mean()` / `groupby().mean()`** — rolling/groupby sum already exist; `.mean()` divides by the window size / group size (or the valid count at window edges).
- **`transpose`** — pure axis reorder, no arithmetic. Must reorder only real dims and keep the internal `_term`/`_factor` last. Plain wrap for `Variable`.
- **`compute` / `chunk`** — no math, dask plumbing; add only to make the three classes consistent.
- **`dropna`** — drops "missing" coordinate slices. "Missing" in linopy is **not** raw `NaN`: for `Variable` it is `labels == -1`; for an expression it is `isnull()` (all terms empty AND `const` NaN). Build the drop-mask from `isnull()`.
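The block-reshape idea behind `coarsen` can be sketched with NumPy alone. The code below uses a plain `reshape` as a stand-in for xarray's `coarsen().construct()`, assumes `coeffs` and `var_labels` are laid out as `(time, term)` arrays, and handles only `boundary="trim"`; `coarsen_sum` is a hypothetical name.

```python
import numpy as np


def coarsen_sum(coeffs, var_labels, const, block):
    """Sum every `block` consecutive rows by stacking their terms side by side."""
    n_time, n_term = coeffs.shape
    n_blocks = n_time // block            # boundary="trim": drop the incomplete tail
    keep = n_blocks * block
    # Terms of the rows in one block become the terms of a single output row.
    new_coeffs = coeffs[:keep].reshape(n_blocks, block * n_term)
    new_vars = var_labels[:keep].reshape(n_blocks, block * n_term)
    # Constants are plain numbers, so they are simply added up per block.
    new_const = const[:keep].reshape(n_blocks, block).sum(axis=1)
    return new_coeffs, new_vars, new_const


# 7 time steps with 2 terms each; blocks of 3 leave one trailing row to trim.
coeffs = np.arange(14, dtype=float).reshape(7, 2)
var_labels = np.arange(14).reshape(7, 2)
const = np.ones(7)

c_coeffs, c_vars, c_const = coarsen_sum(coeffs, var_labels, const, block=3)

# A mean variant rescales coefficients and constants by the block size.
m_coeffs, m_const = c_coeffs / 3, c_const / 3
```

No coefficient is changed by the sum itself; the terms are only moved into a longer term axis, which is why the operation stays linear.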

### Datetime-aware `shift`

Today `shift` is integer/positional only: `shift(time=1)` moves everything one row. That is fine for a regular hourly index, as in the storage balance `soc[t] == soc[t-1] + charge[t]`. But on **irregular** snapshots (variable time resolution, clustered representative periods) "one row" is not "one hour". There are three distinct operators:

1. **Integer shift** — what exists today.
2. **Time shift** — move by an actual duration, `shift(time="1h")`: for each timestamp find the row exactly one hour earlier; no such row -> missing. The correct operator for irregular grids and what storage/ramping constraints want. Basically a `reindex` onto "time minus offset" labels, so nearly free once `Variable` has `reindex`. Collapses to integer shift on a regular index.
3. **Index shift** — keep the data, relabel the time axis by an offset (pandas `shift(freq=...)`).

pandas handles month-end/DST for offsets like `"1ME"`. Adding datetime shift is also a good moment to fix the missing `fill_value` on `Expression.shift` / `Constraint.shift` (A above).
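The time shift (operator 2) can be sketched as a label lookup with pandas. The example below assumes a unique `DatetimeIndex` and a fixed-width offset; `time_shift_positions` and `shift_labels` are hypothetical helpers, and the `-1` fill mirrors the missing-variable sentinel mentioned above.

```python
import numpy as np
import pandas as pd


def time_shift_positions(index, offset):
    """For each timestamp t, return the row position of t - offset, or -1 if absent."""
    return index.get_indexer(index - pd.Timedelta(offset))


def shift_labels(labels, positions, fill=-1):
    """Pick the label at each source position; rows without a source get `fill`."""
    return np.where(positions >= 0, labels[positions], fill)


# Irregular grid: the 03:00 snapshot is missing.
irregular = pd.DatetimeIndex([
    "2030-01-01 00:00", "2030-01-01 01:00", "2030-01-01 02:00",
    "2030-01-01 04:00", "2030-01-01 05:00",
])
labels = np.array([10, 11, 12, 13, 14])

pos = time_shift_positions(irregular, "1h")
shifted = shift_labels(labels, pos)

# On a regular hourly grid the same lookup matches an integer shift by one row.
regular = pd.date_range("2030-01-01", periods=5, freq="h")
pos_regular = time_shift_positions(regular, "1h")
```

On the irregular grid the 04:00 row gets no predecessor because there is no 03:00 snapshot, whereas an integer `shift(time=1)` would silently pair it with 02:00.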

---

## Open questions

- `mean`: divide by the non-missing count, or by the raw length?
- `resample` / `coarsen`: keep empty periods, or drop them?
- `rolling().mean()` at the edges: divide by the window size, or by the valid count?
- `shift(time="1h")`: time shift or index shift (or expose both)?
- Add the new methods to `QuadraticExpression` and `Constraint` too, or just `Variable` / `LinearExpression`?
- Should this ship as one PR per tier, or per method?
