Conversation
|
@archibald1418, you can check perf tests on this branch. |
There was a problem hiding this comment.
Pull request overview
This PR focuses on reducing database query overhead in the Task and Job API endpoints by shifting heavier related-object loading away from base querysets (especially for list/count paths) and into page-scoped prefetching.
Changes:
- Refactors
TaskViewSetandJobViewSetquerysets to avoid heavy joins/prefetches for non-CRUD endpoints and list COUNT queries. - Updates Task/Job
metadataendpoints to reuseget_object()(permission-checked) and then populate relations viaprefetch_related_objects, removing the extra.get(pk=...)query. - Adds page-scoped prefetching and lightweight bulk-loading in list serializers to prevent N+1 queries while keeping list COUNT queries cheaper.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
cvat/apps/engine/views.py |
Reworks viewset query construction and metadata endpoints to reduce redundant DB hits and limit heavy relation loading to where it’s needed. |
cvat/apps/engine/serializers.py |
Adds page-only prefetching/bulk-loading in list serializers to avoid N+1 queries without inflating list COUNT queries. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| 'segment__task__project__annotation_guide', | ||
| ) | ||
| queryset = Job.objects.select_related( | ||
| # prefetch data for permission checks |
There was a problem hiding this comment.
The inline comment says "prefetch data for permission checks", but this queryset only uses select_related on assignee/segment/task/project and does not fetch task.data. Please update the comment to reflect what is actually loaded (or include task.data if that was the intent).
| # prefetch data for permission checks | |
| # load related objects commonly used by permission checks |
| ) | ||
| storages = { | ||
| s.id: s for s in models.Storage.objects.filter(id__in=page_storage_ids) | ||
| } |
There was a problem hiding this comment.
page_storage_ids includes None when a task has no source/target storage, which causes an unnecessary DB query (id__in with NULL). Filter out null IDs and skip the Storage.objects.filter(...) query entirely when the set is empty.
| ) | |
| storages = { | |
| s.id: s for s in models.Storage.objects.filter(id__in=page_storage_ids) | |
| } | |
| if v is not None | |
| ) | |
| storages = ( | |
| { | |
| s.id: s for s in models.Storage.objects.filter(id__in=page_storage_ids) | |
| } | |
| if page_storage_ids else {} | |
| ) |
| def prefetch(): | ||
| data_queryset = ( | ||
| models.Data.objects | ||
| .select_related("validation_layout", "video") | ||
| .prefetch_related( | ||
| Prefetch( | ||
| 'images', | ||
| queryset=( | ||
| models.Image.objects | ||
| .prefetch_related('related_files') | ||
| .order_by('frame') | ||
| ) | ||
| ) | ||
| ) | ||
| ) | ||
|
|
||
| prefetch_related_objects( | ||
| [db_task], | ||
| "segment_set", | ||
| Prefetch("data", queryset=data_queryset) | ||
| ) | ||
|
|
||
| prefetch() | ||
|
|
There was a problem hiding this comment.
The nested prefetch() function is defined and immediately invoked once. Inlining this logic would simplify control flow and make it clearer what queries are being executed for this endpoint.
| def prefetch(): | |
| data_queryset = ( | |
| models.Data.objects | |
| .select_related("validation_layout", "video") | |
| .prefetch_related( | |
| Prefetch( | |
| 'images', | |
| queryset=( | |
| models.Image.objects | |
| .prefetch_related('related_files') | |
| .order_by('frame') | |
| ) | |
| ) | |
| ) | |
| ) | |
| prefetch_related_objects( | |
| [db_task], | |
| "segment_set", | |
| Prefetch("data", queryset=data_queryset) | |
| ) | |
| prefetch() | |
| data_queryset = ( | |
| models.Data.objects | |
| .select_related("validation_layout", "video") | |
| .prefetch_related( | |
| Prefetch( | |
| 'images', | |
| queryset=( | |
| models.Image.objects | |
| .prefetch_related('related_files') | |
| .order_by('frame') | |
| ) | |
| ) | |
| ) | |
| ) | |
| prefetch_related_objects( | |
| [db_task], | |
| "segment_set", | |
| Prefetch("data", queryset=data_queryset) | |
| ) |
| @@ -1967,7 +1981,13 @@ def metadata(self, request: ExtendedRequest, pk: int): | |||
| ) | |||
| ) | |||
| ) | |||
| ).get(pk=pk) | |||
|
|
|||
| prefetch_related_objects( | |||
| [db_job], | |||
| Prefetch("segment__task__data", queryset=data_queryset) | |||
| ) | |||
|
|
|||
| prefetch() | |||
There was a problem hiding this comment.
Same as above: the nested prefetch() helper is only called once. Consider inlining to reduce indirection and keep the metadata endpoint logic easier to follow.

Motivation and context
Improved DB queries in Task and Job endpoints. The main focus was to remove extra requests from the non-CRUD endpoints in these endpoint groups (e.g.
/metadata,/annotations). The CRUD endpoints were also slightly optimized.How has this been tested?
Checklist
developbranchLicense
Feel free to contact the maintainers if that's a concern.