Summary
Variable, LinearExpression/QuadraticExpression and Constraint each expose a hand-picked subset of xarray.Dataset methods (via varwrap / exprwrap / conwrap). The three lists have drifted apart, several common arithmetic methods are missing entirely, and a few existing ones diverge from xarray — especially around datetime indexes.
This issue collects the analysis and a design sketch for closing the gaps. All of the proposed methods preserve linearity (none multiplies two variables), so they are mostly bookkeeping.
A. Cross-class inconsistencies (existing methods)
The same wrapper is exposed on some classes but not others, with no clear reason:
| method | Variable | Expression | Constraint |
|---|---|---|---|
| compute | yes | no | no |
| chunk (dask) | no | yes | yes |
| reindex / reindex_like | no | yes | yes |
| astype | no | yes | no |
| drop / drop_vars | no | yes | no |
| reset_index | no | yes | no |
| rename_dims | no | yes | yes |
| shift fill_value default | yes (variables.py:1268) | no (expressions.py:1474) | no |
The two that actually bite users:
Variable.reindex / reindex_like missing — you can reindex an expression or constraint onto a master index but not a variable. Natural operation for datetime work (aligning a variable to a full snapshot index).
Expression.shift / Constraint.shift have no fill_value while Variable.shift defaults to the linopy fill value. Shifted slots get coeffs=NaN, vars=-1; it mostly survives downstream fillna(0), but it is the root of the diff issue below.
B. Existing methods lacking features vs. xarray (datetime focus)
diff (expressions.py:1238) — implemented as self - self.shift({dim: n}). Two divergences from xarray.diff:
- Length: xarray trims the dimension to N-n; linopy keeps N, so the first n rows are spurious (self[0] - empty, a leading garbage term still carrying the original coordinate label).
- No label argument: xarray has label="upper"|"lower"; linopy is hard-wired to "upper".
groupby (expressions.py:1258) — typed only for DataArray/Series/DataFrame. The string-accessor form (groupby("time.month"), "time.season") only reaches the slow fallback; the fast reindex summation (expressions.py:229) requires Series/DataFrame/DataArray. groupby_bins not exposed. LinearExpressionGroupby only implements .sum() / .map() / .roll() — no .mean() / .first() / .last() / .count().
rolling (expressions.py:1289) — LinearExpressionRolling only implements .sum() (expressions.py:307). No .mean(), no .construct(). Integer-window only.
- No frequency-aware shift: neither linopy nor xarray supports shift(time="1D"); pandas does, via shift(freq=...). See the datetime-shift section below.
sel/isel delegate cleanly, so datetime partial-string indexing (sel(time="2030"), method="nearest") already works.
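The length divergence described for diff above can be reproduced without linopy. The sketch below uses a plain numeric xarray.DataArray as a stand-in for an expression (an assumption made for illustration only) and compares xarray's own diff with the "subtract a shifted copy" pattern, for n=1.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2030-01-01", periods=4, freq="h")
da = xr.DataArray(np.array([1.0, 3.0, 6.0, 10.0]), coords={"time": time}, dims="time")

# xarray.diff drops one row and, by default, keeps the upper (later) labels.
trimmed = da.diff("time")

# Subtracting a shifted copy keeps all N rows; the first row has no predecessor.
kept = da - da.shift(time=1)

print(trimmed.sizes["time"], kept.sizes["time"])
print(kept.values)  # first entry is NaN, the rest are the real differences
```

With numbers the leading row is simply NaN; in an expression it would be the spurious self[0] - empty term described above.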
C. Missing methods, ranked by usefulness
Only linearity-preserving methods are candidates. Excluded as non-linear/meaningless: max, min, median, std, var, prod, cumprod, quantile, rank, idxmax/idxmin, argmax/argmin, clip.
Tier 1 — high value
mean — linear (sum/n), ubiquitous, missing on both Variable and LinearExpression.
resample — datetime aggregation (hourly to daily). Core energy-modeling need. Absent.
reindex / reindex_like on Variable — fixes the asymmetry in A.
weighted — expr.weighted(w).sum(); maps onto PyPSA snapshot_weightings.
Tier 2 — useful
coarsen — positional block aggregation (time=24).
rolling().mean() and groupby().mean() — extend the existing reducers.
transpose — missing on both; needs care to keep _term/_factor last.
compute on Expression, chunk on Variable — close the dask asymmetry from A.
dropna — drop missing coordinate slices.
Tier 3 — nice-to-have / niche
astype / reset_index / rename_dims on Variable (consistency).
sortby, squeeze, head/tail/thin, pad.
interp / interp_like (genuinely linear but niche).
groupby_bins.
Implementation design notes (Tier 1 + Tier 2)
Why none of this is hard
A linopy linear expression is just a list of terms plus a constant — 3*x[a] + 2*x[b] + 5 — stored as three aligned arrays: coeffs, vars (integer labels), const.
Every method below does one of three harmless things to that list:
- Regroup terms: collect terms from several cells into one. (sum, resample, coarsen, groupby)
- Copy terms: the same term appears in several output cells. (rolling)
- Move or drop whole cells: pick, fill, or discard cells without touching the terms inside. (reindex, shift, dropna, transpose)

Some methods also rescale: they multiply every coefficient and the constant by a number/array. (mean, weighted, the .mean() variants)
None of this multiplies two variables together, so the result is always still a valid linear (or quadratic) expression. linopy already owns every building block: regrouping is the term-stacking trick behind .sum(); copying is the window trick behind .rolling().sum(); moving cells is ordinary .sel()/.reindex(); rescaling is ordinary *//.
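A toy model of the three operations plus rescaling, using bare NumPy arrays. This is a sketch of the idea only: the array layout (cells along axis 0, term slots along axis 1) and all names are assumptions for illustration, not linopy's internal representation.

```python
import numpy as np

# Three cells (say, three time steps), two term slots per cell.
coeffs = np.array([[3.0, 2.0], [1.0, 4.0], [5.0, 0.5]])
vars_ = np.array([[0, 1], [2, 3], [4, -1]])  # -1 marks an empty term slot
const = np.array([5.0, 0.0, 1.0])

# Regroup: summing over all cells concatenates their term lists.
sum_coeffs = coeffs.reshape(-1)
sum_vars = vars_.reshape(-1)
sum_const = const.sum()

# Move cells: shift by one row; the first cell becomes empty, terms are untouched.
shifted_vars = np.vstack([np.full((1, 2), -1), vars_[:-1]])

# Copy terms: a rolling window of 2 holds the previous cell's terms and its own.
rolled_vars = np.hstack([shifted_vars, vars_])

# Rescale: multiply every coefficient and the constant by the same number.
scaled_coeffs, scaled_const = 0.5 * coeffs, 0.5 * const
```

The coefficient array would be shifted and stacked in exactly the same way as vars_; no step ever multiplies one variable by another.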
Tier 1
mean = sum divided by how many things were summed: for a time axis with three entries, gen.mean("time") == gen.sum("time") / 3. Divide by the count of non-missing entries (matches xarray skipna=True and linopy's own sum, which already skips -1 variables). All-missing slice -> 0/0 = NaN, as in xarray.
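A small numeric check of the "divide by the non-missing count" rule. The label array and the variable values are made up for illustration; the point is only that putting coefficient 1/count on each valid term agrees with a NaN-skipping mean.

```python
import numpy as np

labels = np.array([0, 1, -1, 2])  # variable labels along "time"; -1 = missing
valid = labels != -1
count = int(valid.sum())          # number of non-missing entries

# mean("time") as a term list: coefficient 1/count on each valid variable,
# no term at all for the missing slot.
mean_vars = labels[valid]
mean_coeffs = np.full(count, 1.0 / count)

# Plug in example values for variables 0, 1, 2 and compare with nanmean.
x = np.array([10.0, 20.0, 60.0])
by_terms = float(mean_coeffs @ x[mean_vars])
by_numpy = float(np.nanmean([10.0, 20.0, np.nan, 60.0]))
print(by_terms, by_numpy)
```

Dividing by the raw length (4 here) instead would give a different result, which is the choice listed under the open questions.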
resample aggregates a datetime axis by calendar period. Each period's expression = sum of that period's terms = a groupby keyed by "which period". Ask pandas for the period label of each timestamp, reuse the existing groupby(...).sum() fast path; .mean() divides by the count. Decide: keep empty periods as 0 (xarray parity) or drop them. Forward closed/label/origin.
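The "ask pandas for the period label of each timestamp" step could look like the sketch below for the hourly-to-daily case. It assumes a fixed daily frequency, where flooring is enough; calendar periods such as months, and the closed/label/origin options, would need more care and are not handled here.

```python
import pandas as pd

time = pd.date_range("2030-01-01", periods=48, freq="h")

# Period label per timestamp: the start of the calendar day it falls into.
day = pd.Series(time.floor("D"), index=time, name="day")

# Any groupby keyed on `day` aggregates hourly rows into daily ones;
# the group sizes are the counts a .mean() would divide by.
hours_per_day = day.groupby(day).size()
print(hours_per_day)
```

A Series of labels like day is one of the grouper types the existing groupby is typed for, so it should be usable as the key, though that has not been verified here.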
reindex / reindex_like (Variable) — new slots get the missing-sentinel (labels=-1, bounds NaN), same fill Expression/Constraint already use. One line each: reindex = varwrap(Dataset.reindex, fill_value=FILL_VALUE).
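What the fill looks like on the label array alone, shown with plain xarray. The -1 mirrors the labels=-1 sentinel mentioned above; FILL_VALUE and varwrap themselves are not reproduced, and the dates are made up.

```python
import pandas as pd
import xarray as xr

master = pd.date_range("2030-01-01", periods=4, freq="D")

# A "variable" defined on only two of the four snapshots.
labels = xr.DataArray([0, 1], coords={"time": master[[0, 2]]}, dims="time")

# New slots get the sentinel rather than NaN, so the dtype stays integer.
full = labels.reindex(time=master, fill_value=-1)
print(full.values)  # [ 0 -1  1 -1]
```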
weighted — gen.weighted(w).sum("time") == (gen * w).sum("time"). Pure sugar over * and sum; .mean() also divides by w.sum(). Just a small wrapper object.
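The identity can be checked on numbers with xarray's existing weighted API. The sketch uses a numeric DataArray in place of an expression and made-up weights.

```python
import numpy as np
import xarray as xr

gen = xr.DataArray([10.0, 20.0, 30.0], dims="time")
w = xr.DataArray([1.0, 2.0, 3.0], dims="time")  # e.g. hours per snapshot

# Weighted sum is multiply-then-sum.
weighted_sum = gen.weighted(w).sum("time")
by_hand_sum = (gen * w).sum("time")

# Weighted mean additionally divides by the total weight.
weighted_mean = gen.weighted(w).mean("time")
by_hand_mean = (gen * w).sum("time") / w.sum("time")

print(float(weighted_sum), float(by_hand_sum))
print(float(weighted_mean), float(by_hand_mean))
```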
Tier 2
coarsen — resample's positional cousin (group every N rows). Reshape into blocks via xarray's coarsen().construct(), then sum each block; .mean() divides by block size. Handle boundary="trim"|"pad".
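The reshape-then-sum idea, sketched on a numeric DataArray with xarray's coarsen().construct(). The new dimension names "day" and "hour" are arbitrary choices for this example, and the length is an exact multiple of the block size so no boundary handling is needed.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(48.0), dims="time")

# Split "time" into blocks of 24: one row per block, one column per position.
blocks = da.coarsen(time=24).construct(time=("day", "hour"))

# Summing over the inner dimension aggregates each block.
block_sum = blocks.sum("hour")
block_mean = block_sum / blocks.sizes["hour"]

print(blocks.dims, dict(blocks.sizes))
print(block_sum.values)
```

For an expression, the inner "hour" dimension would be folded into the term axis instead of being summed numerically.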
rolling().mean() / groupby().mean() — rolling/groupby sum already exist; .mean() divides by the window size / group size (or the valid count at window edges).
transpose — pure axis reorder, no arithmetic. Must reorder only real dims and keep the internal _term/_factor last. Plain wrap for Variable.
compute / chunk — no math, dask plumbing; add only to make the three classes consistent.
dropna — drops "missing" coordinate slices. "Missing" in linopy is not raw NaN: for Variable it is labels == -1; for an expression it is isnull() (all terms empty AND const NaN). Build the drop-mask from isnull().
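A sketch of building the drop-mask from the sentinel rather than from NaN, on a bare label array. Dropping a time step only when every entry in it is missing (how="all" semantics) is an assumption made for this example; the issue does not fix that choice.

```python
import numpy as np
import xarray as xr

labels = xr.DataArray(
    np.array([[0, 1], [-1, -1], [2, -1]]),
    dims=("time", "gen"),
    coords={"time": [0, 1, 2]},
)

# "Missing" means the -1 sentinel, not NaN.
missing = labels == -1

# Keep a time step unless all of its entries are missing.
keep = ~missing.all("gen")
dropped = labels.isel(time=np.flatnonzero(keep.values))

print(dropped.time.values)  # [0 2]
```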
Datetime-aware shift
Today shift is integer/positional only — shift(time=1) moves everything one row. Fine for a regular hourly index, used for storage balance soc[t] == soc[t-1] + charge[t]. But on irregular snapshots (variable time resolution, clustered representative periods) "one row" is not "one hour". There are three distinct operators:
- Integer shift: what exists today.
- Time shift: move by an actual duration, shift(time="1h"). For each timestamp find the row exactly one hour earlier; no such row -> missing. The correct operator for irregular grids and what storage/ramping constraints want. Basically a reindex onto "time minus offset" labels, so nearly free once Variable has reindex. Collapses to integer shift on a regular index.
- Index shift: keep the data, relabel the time axis by an offset (pandas shift(freq=...)).
pandas handles month-end/DST for offsets like "1ME". Adding datetime shift is also a good moment to fix the missing fill_value on Expression.shift / Constraint.shift (A above).
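The three operators side by side on an irregular index, using a pandas Series of numbers as a stand-in for a variable. This is an illustration of the semantics only; the "reindex onto time minus offset" line is one possible way to express the time shift, not an existing linopy API.

```python
import pandas as pd

# Irregular snapshots: the 02:00 row is absent.
time = pd.DatetimeIndex(["2030-01-01 00:00", "2030-01-01 01:00", "2030-01-01 03:00"])
soc = pd.Series([10.0, 11.0, 13.0], index=time)
offset = pd.Timedelta("1h")

# Integer shift: the previous row, whatever its timestamp.
int_shift = soc.shift(1)

# Time shift: the value from exactly one hour earlier, else missing.
time_shift = pd.Series(soc.reindex(time - offset).to_numpy(), index=time)

# Index shift: same data, labels moved forward by the offset.
index_shift = soc.shift(freq=offset)

print(int_shift.tolist())   # [nan, 10.0, 11.0]
print(time_shift.tolist())  # [nan, 10.0, nan]
print(index_shift.index[0])
```

At 03:00 the integer shift silently pairs the row with the 01:00 value, two hours back, while the time shift reports it as missing.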
Open questions
mean: divide by the non-missing count, or by the raw length?
resample / coarsen: keep empty periods, or drop them?
rolling().mean() at the edges: divide by the window size, or by the valid count?
shift(time="1h"): time shift or index shift (or expose both)?
- Add the new methods to QuadraticExpression and Constraint too, or just Variable / LinearExpression?
- Should this ship as one PR per tier, or per method?