I need an explanation

1. The output of a detection model typically includes box coordinates, class labels, and confidence scores. How does the model pass the image content inside the predicted bounding boxes to the encoder through differentiable operations in order to compute the post-regression (post-reg) loss? The current implementation runs Non-Maximum Suppression (NMS) and then crops the image directly at the predicted coordinates. As far as I know, direct cropping is not differentiable with respect to the box coordinates. How is the gradient backpropagated to the detection head in this case?
2. Could you explain why lreg is left out of the final loss in the YOLOv5 loss.py file?

   <img width="1068" height="341" alt="Image" src="https://github.com/user-attachments/assets/7dff3dbf-eaef-4c89-ba2a-e7116e236775" />

   It appears that lreg is not used for optimization at all. Even if it were included in the objective, it seems it could not backpropagate gradients to the detection head, because of the non-differentiable cropping described in point 1. What is the rationale for this?
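To make the concern in point 1 concrete, here is a minimal 1-D toy sketch in plain Python. It is not taken from this repository, and every name in it (`hard_crop_mean`, `soft_crop_mean`, `bilinear_sample`, `finite_diff`) is hypothetical. It compares a crop that rounds the box edges to integer indices with a crop that samples the signal by linear interpolation, which is the general idea behind RoIAlign-style or spatial-transformer-style cropping. Whether the code here does anything like the second variant is exactly what I am unsure about.

```python
import math

def hard_crop_mean(img, x1, x2):
    """Mean of the pixels in [x1, x2) after rounding the box edges to integers."""
    lo, hi = int(round(x1)), int(round(x2))
    patch = img[lo:hi]
    return sum(patch) / len(patch)

def bilinear_sample(img, x):
    """Linearly interpolate a 1-D signal at a real-valued position x."""
    x = min(max(x, 0.0), len(img) - 1.0)
    i0 = min(int(math.floor(x)), len(img) - 2)
    w = x - i0
    return (1.0 - w) * img[i0] + w * img[i0 + 1]

def soft_crop_mean(img, x1, x2, n=8):
    """Mean of n interpolated samples spaced evenly across [x1, x2]."""
    xs = [x1 + (x2 - x1) * (k + 0.5) / n for k in range(n)]
    return sum(bilinear_sample(img, x) for x in xs) / n

def finite_diff(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

img = [float(i * i) for i in range(16)]  # toy 1-D "image"
x1, x2 = 3.2, 9.7                        # toy predicted box edges

# Sensitivity of each crop statistic to the left box edge x1.
g_hard = finite_diff(lambda v: hard_crop_mean(img, v, x2), x1)
g_soft = finite_diff(lambda v: soft_crop_mean(img, v, x2), x1)
print("d(hard crop)/dx1 =", g_hard)
print("d(soft crop)/dx1 =", g_soft)
```

In this sketch the hard crop is piecewise constant in `x1`, so its derivative is zero wherever it is defined and no learning signal reaches the coordinate. The interpolated crop changes smoothly with `x1`, so a loss computed on it can in principle push gradients back to the box regression output.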

I need an explanation #6
