Constant column check with nunique #119

@sbrugman

Try to respond to as many of the following as possible

Generally describe the pandas behavior that the linter should check for and why that is a problem. Links to resources, recommendations, docs appreciated

The linter should flag comparisons of nunique() to 1. This pattern is slower than necessary for detecting a constant column: nunique() counts every distinct value and cannot stop early once a second unique value has been found.

perf_short_circuit

def setup(n):
    # all values distinct: a second unique value appears immediately
    return pd.Series(list(range(n)))

perf_worst

def setup(n):
    # only the last value differs: the whole series must be scanned
    return pd.Series([1] * (n - 1) + [2])
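
The two setups above can be timed with a small harness. The following is a minimal sketch, not the benchmark used for this issue: the function names (`setup_short_circuit`, `setup_worst`, `via_nunique`, `via_first_element`, `benchmark`) are hypothetical, and the timings it returns depend on the machine and the pandas version.

```python
import timeit

import pandas as pd


def setup_short_circuit(n):
    # Every value differs, so a second unique value shows up immediately.
    return pd.Series(list(range(n)))


def setup_worst(n):
    # Only the last value differs, so every element has to be inspected.
    return pd.Series([1] * (n - 1) + [2])


def via_nunique(series):
    # The pattern the linter would flag.
    return series.nunique() == 1


def via_first_element(series):
    # The suggested alternative; assumes a non-empty series without NaN.
    values = series.values
    return bool((values[0] == values).all())


def benchmark(n=100_000, repeat=20):
    """Return total seconds per approach for each setup (machine-dependent)."""
    results = {}
    setups = [
        ("perf_short_circuit", setup_short_circuit),
        ("perf_worst", setup_worst),
    ]
    for name, setup in setups:
        series = setup(n)
        results[name] = {
            func.__name__: timeit.timeit(lambda: func(series), number=repeat)
            for func in (via_nunique, via_first_element)
        }
    return results
```

Calling `benchmark()` returns a nested dict of timings keyed by setup name and approach; both approaches return the same boolean for these inputs.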

Suggest specific syntax or pattern(s) that should trigger the linter (e.g., .iat)

  • df.column.nunique() == 1
  • df.column.nunique() != 1
  • df.column.nunique(dropna=True) == 1
  • df.column.nunique(dropna=True) != 1
  • df.column.nunique(dropna=False) == 1
  • df.column.nunique(dropna=False) != 1
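
All six patterns share one shape: a call to an attribute named nunique, compared with == or != to the literal 1. A minimal sketch of how that shape could be detected with Python's standard `ast` module follows. It is an illustration only, not the linter's actual implementation, and the function name `find_nunique_constant_checks` is hypothetical.

```python
import ast


def find_nunique_constant_checks(source):
    """Return sorted (line, column) pairs for `<expr>.nunique(...) == 1` or `!= 1`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only simple two-operand comparisons are considered.
        if not isinstance(node, ast.Compare) or len(node.ops) != 1:
            continue
        if not isinstance(node.ops[0], (ast.Eq, ast.NotEq)):
            continue
        left, right = node.left, node.comparators[0]
        is_nunique_call = (
            isinstance(left, ast.Call)
            and isinstance(left.func, ast.Attribute)
            and left.func.attr == "nunique"
        )
        # Require the integer literal 1 (not True or 1.0).
        is_one = isinstance(right, ast.Constant) and type(right.value) is int and right.value == 1
        if is_nunique_call and is_one:
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)
```

Because the check only looks at the attribute name and the compared literal, any keyword arguments such as dropna=True or dropna=False are matched without special handling. The reversed form `1 == df.column.nunique()` is not covered by this sketch.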

Suggest specific syntax or pattern(s) that the linter should allow (e.g., .iloc)

Note that the solution is simple when there are no NaN values:

(series.values[0] == series.values).all()

Additional logic is needed when NaN/NA values are present.
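
The reason is that NaN never compares equal to itself, so the first-element comparison can disagree with nunique() in either dropna mode. A short sketch of two such cases (the variable names and the helper `simple_check` are illustrative only):

```python
import numpy as np
import pandas as pd


def simple_check(series):
    # First-element comparison with no NaN handling.
    values = series.values
    return bool((values[0] == values).all())


all_nan = pd.Series([np.nan, np.nan])
one_value_plus_nan = pd.Series([1.0, np.nan])

# All-NaN column: constant under dropna=False, but NaN == NaN is False.
all_nan_simple = simple_check(all_nan)
all_nan_nunique = all_nan.nunique(dropna=False) == 1

# One real value plus NaN: constant under dropna=True, but the NaN breaks .all().
mixed_simple = simple_check(one_value_plus_nan)
mixed_nunique = one_value_plus_nan.nunique(dropna=True) == 1
```

In both cases the simple check evaluates to False while the corresponding nunique() comparison evaluates to True.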

For dropna=True

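# remove_na_arraylike is a pandas-internal helper (pandas.core.dtypes.missing)
def is_constant_dropna(series):
    v = remove_na_arraylike(series.values)
    if v.shape[0] == 0:
        return False  # empty or all-NA: nunique() would be 0
    return (v[0] == v).all()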

For dropna=False

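def is_constant_keepna(series):
    v = series.values
    if v.shape[0] == 0:
        return False  # empty: nunique(dropna=False) would be 0
    # constant if all values equal the first, or if every value is NA
    return (v[0] == v).all() or not pd.notna(v).any()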
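
The two variants can be folded into one function and compared against nunique(). This is a sketch under stated assumptions rather than a proposed pandas API: the name `is_constant` and its `dropna` parameter are hypothetical, and a boolean mask from `pd.notna` is used in place of the internal `remove_na_arraylike` helper so that only public pandas functions are needed.

```python
import numpy as np
import pandas as pd


def is_constant(series, dropna=True):
    """Intended to mirror `series.nunique(dropna=dropna) == 1` without counting uniques."""
    values = series.to_numpy()
    if dropna:
        # Public-API stand-in for the internal remove_na_arraylike helper.
        values = values[pd.notna(values)]
    if values.shape[0] == 0:
        # Empty (or all-NA with dropna=True): nunique would be 0.
        return False
    if (values[0] == values).all():
        return True
    # With dropna=False, an all-NA column counts as a single unique value.
    return not dropna and not pd.notna(values).any()


def agrees_with_nunique(series):
    """Check both dropna modes against the nunique-based pattern."""
    return all(
        is_constant(series, dropna=flag) == (series.nunique(dropna=flag) == 1)
        for flag in (True, False)
    )
```

For example, `agrees_with_nunique(pd.Series([1.0, np.nan]))` checks that the sketch and nunique() give the same answer in both modes for a column with one real value and one NaN.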

If a dedicated method were included in pandas, the check could become:

series.is_constant()

Suggest a specific error message that the linter should display (e.g., "Use '.iloc' instead of '.iat'. If speed is important, use numpy indexing")

Consider checking equality to first element instead of .nunique() == 1 for checking for a constant column.

Are you willing to try to implement this check?

  • Yes
  • No
  • Maybe, with some guidance

Labels: new check (New check for the linter)