[WIP] Optimize task and job viewsets#10446

Open
zhiltsov-max wants to merge 2 commits into develop from zm/optimize-viewsets2

Conversation

@zhiltsov-max
Contributor

@zhiltsov-max zhiltsov-max commented Apr 2, 2026

Motivation and context

Improved DB queries in the Task and Job endpoints. The main focus was removing extra DB queries from the non-CRUD endpoints in these endpoint groups (e.g. /metadata, /annotations). The CRUD endpoints were also slightly optimized.

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

@zhiltsov-max zhiltsov-max requested a review from SpecLad as a code owner April 2, 2026 17:08
@zhiltsov-max
Contributor Author

@archibald1418, you can check perf tests on this branch.

Copilot AI left a comment

Pull request overview

This PR focuses on reducing database query overhead in the Task and Job API endpoints by shifting heavier related-object loading away from base querysets (especially for list/count paths) and into page-scoped prefetching.

Changes:

  • Refactors TaskViewSet and JobViewSet querysets to avoid heavy joins/prefetches for non-CRUD endpoints and list COUNT queries.
  • Updates Task/Job metadata endpoints to reuse get_object() (permission-checked) and then populate relations via prefetch_related_objects, removing the extra .get(pk=...) query.
  • Adds page-scoped prefetching and lightweight bulk-loading in list serializers to prevent N+1 queries while keeping list COUNT queries cheaper.
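
The page-scoped bulk-loading idea in the last bullet can be illustrated without Django. The following is a minimal sketch, not the PR's code: `FakeDB`, `load_naive`, and `load_page_scoped` are hypothetical names, and the in-memory table stands in for a real related-object table. It only shows why fetching related rows once per page replaces one query per row with a single query.

```python
from dataclasses import dataclass


@dataclass
class FakeDB:
    """Hypothetical in-memory stand-in for a related table; counts queries."""
    rows: dict
    queries: int = 0

    def get(self, pk):
        self.queries += 1  # one simulated query per call
        return self.rows[pk]

    def filter_in(self, pks):
        self.queries += 1  # one simulated query for the whole batch
        return {pk: self.rows[pk] for pk in pks if pk in self.rows}


def load_naive(db, page):
    # N+1 pattern: one lookup per item on the page
    return [db.get(related_id) for related_id in page]


def load_page_scoped(db, page):
    # Bulk pattern: collect ids for the current page only, fetch once, then map
    related = db.filter_in(set(page))
    return [related[related_id] for related_id in page]


page = [1, 2, 2, 3]  # related-object ids for the items on one page
db = FakeDB({1: "alice", 2: "bob", 3: "carol"})

naive_result = load_naive(db, page)
naive_queries = db.queries

db.queries = 0
bulk_result = load_page_scoped(db, page)
bulk_queries = db.queries
```

Because the bulk load runs only on the already-paginated items, it adds nothing to the query that counts the full result set, which is the trade-off the bullet describes.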

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| cvat/apps/engine/views.py | Reworks viewset query construction and metadata endpoints to reduce redundant DB hits and limit heavy relation loading to where it’s needed. |
| cvat/apps/engine/serializers.py | Adds page-only prefetching/bulk-loading in list serializers to avoid N+1 queries without inflating list COUNT queries. |

'segment__task__project__annotation_guide',
)
queryset = Job.objects.select_related(
# prefetch data for permission checks

Copilot AI Apr 6, 2026

The inline comment says "prefetch data for permission checks", but this queryset only uses select_related on assignee/segment/task/project and does not fetch task.data. Please update the comment to reflect what is actually loaded (or include task.data if that was the intent).

Suggested change
- # prefetch data for permission checks
+ # load related objects commonly used by permission checks

Comment on lines +2497 to +2500
)
storages = {
    s.id: s for s in models.Storage.objects.filter(id__in=page_storage_ids)
}

Copilot AI Apr 6, 2026

page_storage_ids includes None when a task has no source/target storage, which causes an unnecessary DB query (id__in with NULL). Filter out null IDs and skip the Storage.objects.filter(...) query entirely when the set is empty.

Suggested change
- )
- storages = {
-     s.id: s for s in models.Storage.objects.filter(id__in=page_storage_ids)
- }
+     if v is not None
+ )
+ storages = (
+     {
+         s.id: s for s in models.Storage.objects.filter(id__in=page_storage_ids)
+     }
+     if page_storage_ids else {}
+ )
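
The guard described in this suggestion can be sketched in plain Python. This is an illustration under assumptions, not the PR's code: `fetch_storages` is a hypothetical stand-in for `models.Storage.objects.filter(id__in=...)`, `STORAGE_TABLE` is made-up data, and each task is modeled as a plain `(source_storage_id, target_storage_id)` tuple.

```python
query_log = []  # records the id set of each simulated query

STORAGE_TABLE = {10: "s3-bucket", 11: "local-share"}  # hypothetical rows


def fetch_storages(ids):
    """Hypothetical stand-in for Storage.objects.filter(id__in=ids)."""
    query_log.append(set(ids))
    return {i: STORAGE_TABLE[i] for i in ids if i in STORAGE_TABLE}


def load_page_storages(page):
    # Collect storage ids for the page, dropping None (tasks without a storage)
    page_storage_ids = {
        v for pair in page for v in pair if v is not None
    }
    # Skip the lookup entirely when there is nothing to load
    return fetch_storages(page_storage_ids) if page_storage_ids else {}


with_storages = load_page_storages([(10, None), (None, 11), (None, None)])
queries_after_first = len(query_log)

without_storages = load_page_storages([(None, None)])
queries_after_second = len(query_log)
```

In this sketch a page whose tasks have no storages never calls the lookup, and a mixed page passes only the non-null ids.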

Comment on lines +1466 to 1489
def prefetch():
    data_queryset = (
        models.Data.objects
        .select_related("validation_layout", "video")
        .prefetch_related(
            Prefetch(
                'images',
                queryset=(
                    models.Image.objects
                    .prefetch_related('related_files')
                    .order_by('frame')
                )
            )
        )
    )

    prefetch_related_objects(
        [db_task],
        "segment_set",
        Prefetch("data", queryset=data_queryset)
    )

prefetch()


Copilot AI Apr 6, 2026

The nested prefetch() function is defined and immediately invoked once. Inlining this logic would simplify control flow and make it clearer what queries are being executed for this endpoint.

Suggested change
- def prefetch():
-     data_queryset = (
-         models.Data.objects
-         .select_related("validation_layout", "video")
-         .prefetch_related(
-             Prefetch(
-                 'images',
-                 queryset=(
-                     models.Image.objects
-                     .prefetch_related('related_files')
-                     .order_by('frame')
-                 )
-             )
-         )
-     )
-     prefetch_related_objects(
-         [db_task],
-         "segment_set",
-         Prefetch("data", queryset=data_queryset)
-     )
- prefetch()
+ data_queryset = (
+     models.Data.objects
+     .select_related("validation_layout", "video")
+     .prefetch_related(
+         Prefetch(
+             'images',
+             queryset=(
+                 models.Image.objects
+                 .prefetch_related('related_files')
+                 .order_by('frame')
+             )
+         )
+     )
+ )
+ prefetch_related_objects(
+     [db_task],
+     "segment_set",
+     Prefetch("data", queryset=data_queryset)
+ )

Comment on lines 1969 to +1990
@@ -1967,7 +1981,13 @@ def metadata(self, request: ExtendedRequest, pk: int):
)
)
)
).get(pk=pk)

prefetch_related_objects(
[db_job],
Prefetch("segment__task__data", queryset=data_queryset)
)

prefetch()

Copilot AI Apr 6, 2026

Same as above: the nested prefetch() helper is only called once. Consider inlining to reduce indirection and keep the metadata endpoint logic easier to follow.

@archibald1418
Contributor

perf testing:
image
