Constant column check with `nunique`

*Try to respond to as many of the following as possible*

**Generally describe the `pandas` behavior that the linter should check for and why that is a problem.**  *Links to resources, recommendations, docs appreciated*

The linter should check for `nunique()` being compared to `1`. This pattern is less performant because `nunique` does not short-circuit: it keeps counting distinct values even after a second one has been found, although a second distinct value already shows that the column is not constant.

![perf_short_circuit](https://github.com/deppen8/pandas-vet/assets/9756388/4349e85d-0cf2-4ea9-acc0-4e2fb545fdb8)

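```python
import pandas as pd

def setup(n):
    # all values distinct: a short-circuiting check could stop at the second element
    return pd.Series(list(range(n)))
```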

![perf_worst](https://github.com/deppen8/pandas-vet/assets/9756388/6f283921-8bac-4ac2-a0ee-72e3c215b80c)

```python
import pandas as pd

def setup(n):
    # only the last value differs: worst case, every element must be checked
    return pd.Series([1] * (n - 1) + [2])
```
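To reproduce a comparison like the plots above, a minimal timing sketch could look like this. The two setup functions are copies of the ones above under hypothetical names, `via_nunique` and `via_first_element` are illustrative helpers, and the first-element check assumes a non-empty, NaN-free series. Note that `series.values[0] == series.values` still evaluates every element, so its advantage comes from skipping the hashing that `nunique` relies on rather than from stopping early. No timings are claimed here; they depend on hardware and pandas/NumPy versions.

```python
import timeit

import pandas as pd


def setup_all_unique(n):
    # every value differs (same data as the first setup above)
    return pd.Series(list(range(n)))


def setup_last_differs(n):
    # only the final value differs (same data as the second setup above)
    return pd.Series([1] * (n - 1) + [2])


def via_nunique(series):
    # the pattern the linter would flag
    return series.nunique() == 1


def via_first_element(series):
    # assumes a non-empty series without NaN/NA values
    return bool((series.values[0] == series.values).all())


if __name__ == "__main__":
    for name, setup in [("all unique", setup_all_unique), ("last differs", setup_last_differs)]:
        s = setup(100_000)
        for fn in (via_nunique, via_first_element):
            # average over 10 runs, reported in milliseconds
            t = timeit.timeit(lambda: fn(s), number=10)
            print(f"{name:>12} {fn.__name__:>18}: {t / 10 * 1e3:.2f} ms")
```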

- [In the wild](https://github.com/search?q=%22nunique%28%29+%3D%3D+1%22+language%3APython+&type=code)
- [SO Thread](https://stackoverflow.com/questions/54405704/check-if-all-values-in-dataframe-column-are-the-same), [SO Thread 2](https://stackoverflow.com/questions/20209600/pandas-dataframe-remove-constant-column)

**Suggest specific syntax or pattern(s) that should trigger the linter** *(e.g., `.iat`)*

- `df.column.nunique() == 1`
- `df.column.nunique() != 1`
- `df.column.nunique(dropna=True) == 1`
- `df.column.nunique(dropna=True) != 1`
- `df.column.nunique(dropna=False) == 1`
- `df.column.nunique(dropna=False) != 1`
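As a rough illustration of how these patterns could be detected, here is a standalone sketch using only the standard-library `ast` module. It is not pandas-vet's implementation and is not registered as a flake8 plugin; `NuniqueConstantCheck`, `check_source`, and the message text are hypothetical. Since a syntax-level check cannot know types, it flags any `.nunique(...)` call compared with `1` via `==` or `!=`, regardless of the `dropna` argument.

```python
import ast

# Hypothetical message text; the real error code and wording may differ.
MESSAGE = (
    "Consider comparing values to the first element instead of "
    "'.nunique() == 1' to check for a constant column."
)


def _is_nunique_call(node):
    # matches `<anything>.nunique(...)`, with or without arguments
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "nunique"
    )


def _is_one(node):
    # integer literal 1 only (excludes True, which also compares equal to 1)
    return isinstance(node, ast.Constant) and type(node.value) is int and node.value == 1


class NuniqueConstantCheck(ast.NodeVisitor):
    """Hypothetical visitor collecting (line, col, message) tuples."""

    def __init__(self):
        self.errors = []

    def visit_Compare(self, node):
        # only single comparisons such as `a == b`; chained comparisons are skipped
        if len(node.ops) == 1 and isinstance(node.ops[0], (ast.Eq, ast.NotEq)):
            left, right = node.left, node.comparators[0]
            # accept both `x.nunique() == 1` and `1 == x.nunique()`
            if (_is_nunique_call(left) and _is_one(right)) or (
                _is_one(left) and _is_nunique_call(right)
            ):
                self.errors.append((node.lineno, node.col_offset, MESSAGE))
        self.generic_visit(node)


def check_source(source):
    visitor = NuniqueConstantCheck()
    visitor.visit(ast.parse(source))
    return visitor.errors
```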
 
**Suggest specific syntax or pattern(s) that the linter should allow** *(e.g., `.iloc`)*

Note that the solution is simple when there are no NaN values (assuming the series is non-empty, since `values[0]` raises `IndexError` on an empty array):
```python
(series.values[0] == series.values).all()
```

Some additional logic is needed when NaN/NA values are present.

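For `dropna=True`:
```python
from pandas.core.dtypes.missing import remove_na_arraylike  # internal pandas helper

def is_constant_dropna(series):
    v = remove_na_arraylike(series.values)  # drop NaN/NA values
    if v.shape[0] == 0:
        return False  # empty or all-NA: nunique() == 0, not 1
    return bool((v[0] == v).all())
```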

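For `dropna=False`:
```python
import pandas as pd

def is_constant_keepna(series):
    v = series.values
    if v.shape[0] == 0:
        return False  # empty: nunique() == 0, not 1
    # constant if all values equal the first one, or if every value is NaN/NA
    return bool((v[0] == v).all() or not pd.notna(v).any())
```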
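For comparison, both variants can be folded into one helper that uses only public pandas API (`Series.dropna`, `Series.isna`, `Series.to_numpy`) instead of the internal `remove_na_arraylike`. `is_constant_column` and its `dropna` parameter are hypothetical, chosen to mirror `nunique(dropna=...)`. This is a sketch, not a drop-in guarantee of identical results for every dtype.

```python
import pandas as pd


def is_constant_column(series, dropna=True):
    """Hypothetical helper intended to match `series.nunique(dropna=dropna) == 1`."""
    if dropna:
        v = series.dropna().to_numpy()
        if v.shape[0] == 0:
            return False  # empty or all-NA: nunique() == 0
        return bool((v[0] == v).all())
    if len(series) == 0:
        return False  # empty: nunique() == 0
    na = series.isna().to_numpy()
    if na.all():
        return True  # every value is NA: counted as one unique value
    if na.any():
        return False  # NA mixed with non-NA values: at least two unique values
    v = series.to_numpy()
    return bool((v[0] == v).all())
```

Checking the NA mask explicitly, rather than relying on `NaN != NaN`, avoids comparing against `pd.NA`, whose comparisons return `NA` instead of `False`. A quick sanity check is to compare the helper with `series.nunique(dropna=...) == 1` on a few inputs, including all-NaN, empty, and nullable-dtype series.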

[If included](https://github.com/pandas-dev/pandas/issues/54033) in pandas, this could simply be:
```python
series.is_constant()
```
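The first-element comparison for NaN-free data still compares every element before `.all()` is evaluated. If early exit matters for long columns, a chunked variant can stop at the first chunk containing a different value. This is a sketch assuming a NaN-free series; `is_constant_chunked` and `chunk_size` (with an arbitrary default) are hypothetical, and whether it beats the one-shot comparison in practice would need benchmarking.

```python
import pandas as pd


def is_constant_chunked(series, chunk_size=65_536):
    """Hypothetical helper: True if every value equals the first one.

    Assumes no NaN/NA values. Compares one chunk at a time and returns
    as soon as a chunk contains a different value.
    """
    v = series.to_numpy()
    if v.shape[0] == 0:
        return False  # an empty series has nunique() == 0, not 1
    first = v[0]
    for start in range(0, v.shape[0], chunk_size):
        # vectorised comparison within the chunk; exit early on a mismatch
        if not (v[start:start + chunk_size] == first).all():
            return False
    return True
```

The chunk size trades per-chunk Python overhead against how much work is wasted before a mismatch is noticed; small chunks exit sooner, large chunks stay closer to a single vectorised pass.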

**Suggest a specific error message that the linter should display** *(e.g., "Use '.iloc' instead of '.iat'. If speed is important, use numpy indexing")*

Consider comparing values to the first element instead of using `.nunique() == 1` to check for a constant column.

**Are you willing to try to implement this check?**
- [ ] Yes
- [x] No
- [ ] Maybe, with some guidance


Constant column check with `nunique` #119