Hi ShangtongZhang,
I'm confused about the adaptive KL divergence you use in your code to update the actor model (in the case of separate actor and critic models). Your code uses both the clipped objective and the adaptive approx-kl, and the actor model is only updated if $\text{approx-kl} \le 1.5 \times \text{target-kl}$. After reading the PPO paper, I saw that an adaptive KL constraint should belong to TRPO instead, because TRPO imposes a KL constraint in Equation 4. With both constraints applied at once, the clip and the adaptive KL, the actor finds it very hard to get updated.
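For reference, the two objects in question, as written in the PPO paper (Schulman et al., 2017), with $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{old}}(a_t \mid s_t)$: the TRPO constraint (Equation 4)

$$
\hat{\mathbb{E}}_t\!\left[\mathrm{KL}\!\left[\pi_{\theta_{old}}(\cdot \mid s_t),\ \pi_\theta(\cdot \mid s_t)\right]\right] \le \delta,
$$

and the clipped surrogate objective (Equation 7)

$$
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right].
$$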
In my view, you are using $L^{CLIP}$ and the TRPO-style $L^{KLPEN}$ at the same time, and I think $L^{KLPEN}$ should be constructed as:
surr = ratio * advantage  # surrogate objective: r_t(theta) * A_t
# adapt the KL penalty coefficient (beta in the paper)
if kl_after < target_kl / 1.5:
    kl_coef /= 2
elif kl_after > target_kl * 1.5:
    kl_coef *= 2
else:
    print("KL is close enough")
# Equation 8 is maximized, so minimize its negation as the loss
actor_loss = -(surr - kl_coef * kl_after).mean()
# Backwarding the actor loss ...

After updating the KL coefficient $\beta$, it is used to compute the loss and the gradient in Equation 8.
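For completeness, Equation 8 and the adaptive $\beta$ rule from the paper are

$$
L^{KLPEN}(\theta) = \hat{\mathbb{E}}_t\!\left[ r_t(\theta)\hat{A}_t - \beta\,\mathrm{KL}\!\left[\pi_{\theta_{old}}(\cdot \mid s_t),\ \pi_\theta(\cdot \mid s_t)\right] \right],
\qquad
\beta \leftarrow
\begin{cases}
\beta / 2 & \text{if } d < d_{\text{targ}} / 1.5, \\
\beta \times 2 & \text{if } d > d_{\text{targ}} \times 1.5,
\end{cases}
$$

where $d = \hat{\mathbb{E}}_t\!\left[\mathrm{KL}\!\left[\pi_{\theta_{old}}(\cdot \mid s_t),\ \pi_\theta(\cdot \mid s_t)\right]\right]$ is computed after each policy update.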

And only one of $L^{KLPEN}$ or $L^{CLIP}$ should be used in training, not both.
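To make the contrast concrete, here is a minimal PyTorch sketch of the two alternatives, assuming `log_probs` and `old_log_probs` are per-sample action log-probabilities and `advantages` are the advantage estimates (these names are my own, not from your repo):

```python
import torch

def ppo_actor_loss(log_probs, old_log_probs, advantages,
                   clip_eps=0.2, kl_coef=None):
    """Return either the CLIP loss or the KLPEN loss, never both."""
    ratio = (log_probs - old_log_probs).exp()  # r_t(theta)
    if kl_coef is None:
        # L^CLIP (Equation 7): clipped surrogate, no KL term
        surr = ratio * advantages
        clipped = ratio.clamp(1 - clip_eps, 1 + clip_eps) * advantages
        return -torch.min(surr, clipped).mean()
    # L^KLPEN (Equation 8): unclipped surrogate plus adaptive KL penalty
    approx_kl = (old_log_probs - log_probs).mean()  # estimate of KL[pi_old, pi]
    return -(ratio * advantages).mean() + kl_coef * approx_kl
```

Here `kl_coef` plays the role of $\beta$ and would be adapted between epochs with the rule above; passing `kl_coef=None` selects the clipped objective instead, so only one of the two losses ever drives the gradient.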
