Perf: Replace O(n²) lookups with Dictionary in CategoryAxis.GroupData #371

Open
PaulAndersonS wants to merge 1 commit into main from paulandersons/curly-funicular

@PaulAndersonS
Collaborator

Root Cause of the Issue

The CategoryAxis.GroupData() method had two O(n²) performance bottlenecks:

  1. List.Contains() in a loop (line 45): For each x-value, groupingValues.Contains() performs a linear scan of the list to check for duplicates — O(n) per call × n calls = O(n²).

  2. List.IndexOf() in a LINQ query (lines 73, 77): For each value in a series, distinctXValues.IndexOf(val) performs a linear search — O(n) per call × n calls = O(n²).

For charts with large datasets (thousands of data points), these patterns cause noticeable delays during chart initialization and data grouping.
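
To make the quadratic cost concrete, here is a minimal sketch of the pattern described above. It is not the actual GroupData() body: the class name, method signature, and the use of plain string x-values are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the pre-change pattern; not the real CategoryAxis code.
public static class QuadraticGroupingSketch
{
    public static (List<string> Distinct, List<int> Indexes) Group(IReadOnlyList<string> xValues)
    {
        var distinct = new List<string>();
        foreach (var x in xValues)
        {
            // List.Contains scans the list from the start: O(n) per call.
            if (!distinct.Contains(x))
            {
                distinct.Add(x);
            }
        }

        var indexes = new List<int>();
        foreach (var x in xValues)
        {
            // List.IndexOf is another linear scan per element.
            indexes.Add(distinct.IndexOf(x));
        }

        return (distinct, indexes);
    }
}
```

With n input values that are mostly distinct, both loops perform on the order of n scans over a list of up to n items, which is where the O(n²) behavior comes from.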

Description of Change

  • Replaced List.Contains() with HashSet.Add(), which returns false for values already seen and gives O(1) amortized deduplication; a parallel list preserves insertion order.
  • Built a Dictionary<string, int> index lookup — maps each distinct x-value to its index for O(1) lookups instead of O(n) List.IndexOf() calls.
  • Used explicit pattern matching for ActualXValues type checks (is List<double> instead of as List<double>).
  • Pre-allocated lists with known capacity to reduce reallocations.

Complexity improvement: O(n²) → expected O(n) for the grouping operation, since hash-based lookups are O(1) on average.
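
The combination of a HashSet for deduplication and a Dictionary for index lookup can be sketched as follows. This is an illustrative sketch under assumptions, not the code in this PR: the class name, method signature, and string-only x-values are hypothetical, and it assumes no x-value is null (Dictionary does not accept null keys).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the linear-time approach; not the real CategoryAxis code.
public static class LinearGroupingSketch
{
    public static (List<string> Distinct, List<int> Indexes) Group(IReadOnlyList<string> xValues)
    {
        var seen = new HashSet<string>();
        var distinct = new List<string>();
        foreach (var x in xValues)
        {
            // HashSet.Add returns true only the first time a value is added,
            // so the parallel list keeps first-seen (insertion) order.
            if (seen.Add(x))
            {
                distinct.Add(x);
            }
        }

        // Map each distinct value to its position once, up front.
        var indexByValue = new Dictionary<string, int>(distinct.Count);
        for (int i = 0; i < distinct.Count; i++)
        {
            indexByValue[distinct[i]] = i;
        }

        // Pre-size the result list, since its final length is known.
        var indexes = new List<int>(xValues.Count);
        foreach (var x in xValues)
        {
            indexes.Add(indexByValue[x]); // average O(1) lookup
        }

        return (distinct, indexes);
    }
}
```

For the input "A", "B", "A", "C" this yields the distinct values A, B, C and the indexes 0, 1, 0, 2, the same result the list-scanning version would produce.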

Unit Tests Added

5 new unit tests covering:

  • Duplicate values produce correct indexes
  • Double (numeric) x-values are handled correctly
  • Multiple series with overlapping values deduplicate properly
  • Single series produces sequential indexes
  • Large dataset (1000+ items) completes with correct results

Issues Fixed

N/A — Proactive performance improvement

Screenshots

N/A — No visual changes

The GroupData() method used List.Contains() and List.IndexOf() inside
loops, resulting in O(n²) time complexity for data grouping. This is
especially impactful for charts with large datasets.

Changes:
- Replace List.Contains() with HashSet.Add() for O(1) deduplication
- Build a Dictionary<string, int> for O(1) index lookups instead of
  O(n) List.IndexOf() calls per element
- Use explicit null-safe pattern matching for ActualXValues
- Pre-allocate lists with known capacity

Added 5 unit tests covering:
- Duplicate values
- Double (numeric) X values
- Multiple series with overlapping values
- Single series
- Large dataset (1000+ items)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>