Constant column check with nunique #119
Description
Try to respond to as many of the following as possible
Generally describe the pandas behavior that the linter should check for and why that is a problem. Links to resources, recommendations, and docs are appreciated.
The linter should check for .nunique() being compared to 1. This pattern is less performant than necessary: to decide whether a column is constant, it is enough to find a second distinct value, but nunique() does not short-circuit once multiple unique values are found and simply keeps counting all of them.
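To try the performance claim locally, here is a rough timing harness using the standard library's timeit. setup_distinct and setup_almost_constant are renamed copies of the two setup functions in this issue; via_nunique, via_first_element and compare are hypothetical names. It only prints timings, which will vary with hardware, data size, and pandas/NumPy versions.

```python
import timeit

import pandas as pd


def setup_distinct(n):
    # Every value is distinct, so nunique() finds n unique values.
    return pd.Series(list(range(n)))


def setup_almost_constant(n):
    # Constant except for the last value.
    return pd.Series([1] * (n - 1) + [2])


def via_nunique(series):
    # The pattern the linter would flag.
    return series.nunique() == 1


def via_first_element(series):
    # Suggested alternative; assumes there are no NaN/NA values.
    return bool((series.values[0] == series.values).all())


def compare(n=100_000, number=10):
    # Print rough timings for each (input shape, check) pair.
    for make in (setup_distinct, setup_almost_constant):
        s = make(n)
        for check in (via_nunique, via_first_element):
            seconds = timeit.timeit(lambda: check(s), number=number)
            print(f"{make.__name__:>22} {check.__name__:>18}: {seconds:.4f}s")


if __name__ == "__main__":
    compare()
```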
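import pandas as pd

# Benchmark case 1: every value is distinct
def setup(n):
    return pd.Series(list(range(n)))

# Benchmark case 2: constant except for the last value
def setup(n):
    return pd.Series([1] * (n - 1) + [2])

Suggest specific syntax or pattern(s) that should trigger the linter (e.g., .iat)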
df.column.nunique() == 1
df.column.nunique() != 1
df.column.nunique(dropna=True) == 1
df.column.nunique(dropna=True) != 1
df.column.nunique(dropna=False) == 1
df.column.nunique(dropna=False) != 1
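A minimal sketch of how these patterns could be detected with Python's standard ast module. The names MESSAGE, NuniqueConstantVisitor and check_source are hypothetical, the message text is based on the one suggested in this issue, and the linter's actual plugin interface is not shown here.

```python
import ast

# Message text based on the suggestion in this issue.
MESSAGE = (
    "Consider checking equality with the first element instead of "
    ".nunique() == 1 when checking for a constant column."
)


def _is_nunique_call(node):
    # Matches any `<expr>.nunique(...)` call, whatever the receiver is.
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "nunique"
    )


def _is_literal_one(node):
    # `type(...) is int` excludes `True`, which also compares equal to 1.
    return isinstance(node, ast.Constant) and type(node.value) is int and node.value == 1


class NuniqueConstantVisitor(ast.NodeVisitor):
    """Collect `x.nunique(...) == 1` / `!= 1` comparisons, in either operand order."""

    def __init__(self):
        self.errors = []  # (line, column, message) tuples

    def visit_Compare(self, node):
        # Only plain two-operand comparisons; chains like `a == b == c` are skipped.
        if len(node.ops) == 1 and isinstance(node.ops[0], (ast.Eq, ast.NotEq)):
            left, right = node.left, node.comparators[0]
            if (_is_nunique_call(left) and _is_literal_one(right)) or (
                _is_literal_one(left) and _is_nunique_call(right)
            ):
                self.errors.append((node.lineno, node.col_offset, MESSAGE))
        self.generic_visit(node)


def check_source(source):
    visitor = NuniqueConstantVisitor()
    visitor.visit(ast.parse(source))
    return visitor.errors
```

Because the check is purely syntactic, it cannot tell whether the receiver is a pandas object, so it would also flag any other object that happens to have a .nunique() method.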
Suggest specific syntax or pattern(s) that the linter should allow (e.g., .iloc)
Note that the solution is simple when there are no NaN values:
(series.values[0] == series.values).all()
Additional logic is needed when NaN/NA values are present.
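To see why the NaN/NA handling matters, this small comparison runs the simple check against nunique() on three series. first_element_check and examples are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd


def first_element_check(series):
    # The simple check: only reliable when there are no NaN/NA values.
    return bool((series.values[0] == series.values).all())


examples = {
    "no NaN": pd.Series([1.0, 1.0, 1.0]),
    "constant plus NaN": pd.Series([1.0, np.nan]),
    "all NaN": pd.Series([np.nan, np.nan]),
}

for name, s in examples.items():
    print(
        f"{name}: simple={first_element_check(s)}, "
        f"nunique()==1 -> {s.nunique() == 1}, "
        f"nunique(dropna=False)==1 -> {s.nunique(dropna=False) == 1}"
    )
```

On [1.0, NaN] the simple check returns False while nunique() == 1 is True (NaN is dropped by default), and on [NaN, NaN] it returns False while nunique(dropna=False) == 1 is True, because NaN never compares equal to itself.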
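For dropna=True:
from pandas.core.dtypes.missing import remove_na_arraylike  # pandas internal, not public API

def is_constant_dropna(series):  # hypothetical helper name
    v = remove_na_arraylike(series.values)  # drop NaN/NA values first
    if v.shape[0] == 0:
        return False  # nothing left: nunique() == 0, not 1
    return bool((v[0] == v).all())

For dropna=False: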
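import pandas as pd
def is_constant_keepna(series):  # hypothetical helper name
    v = series.values
    if v.shape[0] == 0:
        return False
    na = pd.isna(v)  # checked explicitly: in nullable dtypes v[0] == NA gives NA, which .all() skips
    # Constant if every value is NaN/NA, or if none is and all values equal the first.
    return bool(na.all() or (not na.any() and (v[0] == v).all()))

If this were included in pandas, it could be written as:
series.is_constant()

Suggest a specific error message that the linter should display (e.g., "Use '.iloc' instead of '.iat'. If speed is important, use numpy indexing")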
Consider checking equality with the first element instead of .nunique() == 1 when checking for a constant column.
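pandas does not currently provide a Series.is_constant() method; the issue only proposes it as a possible addition. Below is a sketch of such a helper, written as a plain function named is_constant (hypothetical) with a dropna parameter mirroring nunique(). It uses only public pandas APIs (Series.notna, Series.to_numpy) instead of the internal remove_na_arraylike, and, like the dropna=False logic above, it treats all missing values as a single value.

```python
import pandas as pd


def is_constant(series, dropna=True):
    """Hypothetical helper intended to match `series.nunique(dropna=dropna) == 1`."""
    mask = series.notna().to_numpy()  # True where the value is not NaN/NA
    if dropna:
        values = series.to_numpy()[mask]
        if values.shape[0] == 0:
            return False  # empty or all-NA: nunique() == 0
        return bool((values[0] == values).all())
    if series.shape[0] == 0:
        return False  # nunique(dropna=False) == 0
    if not mask.any():
        return True  # all NaN/NA: exactly one unique value
    if not mask.all():
        return False  # NA plus non-NA: at least two unique values
    values = series.to_numpy()  # no NA left, so comparisons are plain booleans
    return bool((values[0] == values).all())
```

Splitting the dropna=False case into "all missing", "some missing" and "none missing" keeps NA values out of the equality comparison entirely, which avoids relying on how .all() treats NA in nullable dtypes.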
Are you willing to try to implement this check?
- Yes
- No
- Maybe, with some guidance

