Skip to content

🚀 feat(celery-library): add Redis secondary indexes for O(log N) task queries#9025

Draft
giancarloromeo wants to merge 12 commits intoITISFoundation:masterfrom
giancarloromeo:feat/add-redis-secondary-indexes-for-task-queries-speedup
Draft

🚀 feat(celery-library): add Redis secondary indexes for O(log N) task queries#9025
giancarloromeo wants to merge 12 commits intoITISFoundation:masterfrom
giancarloromeo:feat/add-redis-secondary-indexes-for-task-queries-speedup

Conversation

@giancarloromeo
Copy link
Copy Markdown
Contributor

@giancarloromeo giancarloromeo commented Apr 14, 2026

What do these changes do?

This pull request refactors the way tasks are indexed and listed in the Redis backend for the celery library. The main improvement is the introduction of a ZSET-based indexing system for task ownership, which replaces the previous approach that relied on scanning Redis keys. This change significantly improves the efficiency and scalability of task listing operations.

Related issue/s

How to test

cd package/celery-library
make install-dev
pytest -vv --pdb tests/unit/test_redis_store.py

Dev-ops

No changes.

@giancarloromeo giancarloromeo added this to the Etna milestone Apr 14, 2026
@giancarloromeo giancarloromeo self-assigned this Apr 14, 2026
@giancarloromeo giancarloromeo changed the title 🚀 feat(celery-library): aadd Redis secondary indexes for O(log N) task queries 🚀 feat(celery-library): add Redis secondary indexes for O(log N) task queries Apr 14, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 84.84848% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.48%. Comparing base (16dd1a9) to head (1d2bf43).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9025      +/-   ##
==========================================
+ Coverage   87.35%   87.48%   +0.12%     
==========================================
  Files        2056     2044      -12     
  Lines       80971    80196     -775     
  Branches     1451     1387      -64     
==========================================
- Hits        70731    70156     -575     
+ Misses       9829     9650     -179     
+ Partials      411      390      -21     
Flag Coverage Δ
integrationtests 63.91% <100.00%> (+<0.01%) ⬆️
unittests 86.35% <84.84%> (+0.12%) ⬆️
Components Coverage Δ
pkg_aws_library 95.29% <ø> (ø)
pkg_celery_library ∅ <ø> (∅)
pkg_dask_task_models_library 79.65% <ø> (ø)
pkg_models_library 93.07% <100.00%> (+0.43%) ⬆️
pkg_notifications_library 84.48% <ø> (ø)
pkg_postgres_database 89.29% <ø> (ø)
pkg_service_integration 72.81% <ø> (ø)
pkg_service_library 70.47% <0.00%> (+0.06%) ⬆️
pkg_settings_library 90.48% <ø> (ø)
pkg_simcore_sdk 85.94% <ø> (ø)
agent 93.26% <ø> (ø)
api_server 91.39% <60.00%> (-0.02%) ⬇️
autoscaling 95.48% <ø> (ø)
catalog 92.10% <ø> (ø)
clusters_keeper 98.59% <ø> (ø)
dask_sidecar 91.74% <ø> (+0.21%) ⬆️
datcore_adapter 97.95% <ø> (ø)
director 75.45% <ø> (-0.09%) ⬇️
director_v2 91.65% <ø> (+0.05%) ⬆️
dynamic_scheduler 95.77% <ø> (-0.20%) ⬇️
dynamic_sidecar 87.50% <ø> (ø)
efs_guardian 89.73% <ø> (ø)
invitations 90.93% <ø> (ø)
payments 92.87% <ø> (ø)
resource_usage_tracker 92.87% <ø> (+0.10%) ⬆️
storage 86.72% <100.00%> (-0.15%) ⬇️
webclient ∅ <ø> (∅)
webserver 86.78% <100.00%> (+0.02%) ⬆️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 16dd1a9...1d2bf43. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@giancarloromeo giancarloromeo requested a review from Copilot April 14, 2026 19:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the Celery Redis backend task listing to use secondary indexes (Redis ZSETs) instead of SCAN, aiming to make task queries scale better as the number of tasks grows.

Changes:

  • Added owner-based ZSET index key construction and indexing on task/group creation.
  • Updated list_tasks() to query ZSET members and fetch task metadata via pipelined HGETs (with stale-member pruning).
  • Added unit tests verifying list_tasks() no longer uses scan_iter, supports wildcard filtering, and prunes stale index entries.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
packages/celery-library/src/celery_library/backends/_redis.py Introduces ZSET-based owner indexes and updates task listing/removal to use them instead of key scanning.
packages/celery-library/tests/unit/test_redis_store.py Adds unit tests covering ZSET-backed listing behavior, wildcard filtering, deletion cleanup, and stale index pruning.

Comment thread packages/celery-library/tests/unit/test_redis_store.py Outdated
Comment on lines +131 to +141
index_score = datetime.now(tz=UTC).timestamp()

pipe = self._redis_client_sdk.redis.pipeline()
pipe.hset(
name=redis_key,
key=_CELERY_TASK_EXEC_METADATA_KEY,
value=execution_metadata.model_dump_json(),
)
for index_key in _build_redis_owner_index_keys_for_task(task_key):
pipe.zadd(index_key, {task_key: index_score})
await pipe.execute()
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZSET index keys/members are never given a TTL, so when the task hash expires naturally the index entries can remain indefinitely unless list_tasks() happens to be called (and only for that exact queried index key). This risks unbounded Redis memory growth over time.

Consider encoding an expiry timestamp in the ZSET score (e.g. now + expiry) and pruning via ZREMRANGEBYSCORE during listing, and/or applying/refreshing an EXPIRE on the index keys when adding members so old indexes can age out even if list_tasks() is never called.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My friend, your're absolutely right. Claude is the guilty.

Comment on lines +72 to +85
def _build_redis_owner_index_keys_for_task(task_or_group_key: TaskKey | GroupKey) -> list[str]:
owner_tokens = [
token.split("=", maxsplit=1) for token in _without_uuid_token(task_or_group_key).split(_CELERY_TASK_DELIMTATOR)
]
wildcard_value = json_dumps(WILDCARD)

keys: list[str] = []
for mask in product((False, True), repeat=len(owner_tokens)):
query_owner_key = _CELERY_TASK_DELIMTATOR.join(
f"{key}={wildcard_value if use_wildcard else value}"
for (key, value), use_wildcard in zip(owner_tokens, mask, strict=True)
)
keys.append(_build_redis_owner_index_key(query_owner_key))
return keys
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_build_redis_owner_index_keys_for_task creates 2**N index keys per task where N is the number of owner-metadata fields (excluding uuid). Since OwnerMetadata supports arbitrary extra fields, this can become very expensive in both write amplification and total Redis key count for schemas with more than a handful of fields.

It would help to either (a) bound the number of indexed fields / supported wildcard fields, or (b) switch to an indexing strategy that doesn't require generating all wildcard combinations (e.g. index only specific dimensions that are actually queried).

Copilot uses AI. Check for mistakes.
Comment on lines 108 to 112
for index_key in _build_redis_owner_index_keys_for_task(group_key):
pipe.zadd(index_key, {group_key: index_score})

# group tasks
# group sub-tasks: store hash only, no ZSET index (filtered out in list_tasks)
for task_key, (task_execution_metadata, _) in zip(task_keys, execution_metadata.tasks, strict=True):
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says group sub-tasks have "no ZSET index", but in the current flow TaskManager.submit_group() calls RedisTaskStore.create_task() for each sub-task before create_group(), and create_task() now adds every task key to the ZSET indexes. That means group sub-tasks will still be indexed (and only filtered out later via ExecutorType.GROUP_TASK), which adds a lot of extra ZSET writes for large groups.

If the intent is to avoid indexing group sub-tasks, consider adding a way to skip indexing in create_task() (or a dedicated method for group sub-tasks) and/or removing sub-task entries from the indexes once they’re re-labeled as GROUP_TASK.

Copilot uses AI. Check for mistakes.
@sonarqubecloud
Copy link
Copy Markdown

@giancarloromeo giancarloromeo added the t:maintenance Some planned maintenance work label Apr 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

a:celery-library t:maintenance Some planned maintenance work

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants