I chose Ant as the environment. After training finished, I wrote a script to load the latest model and visualize it. I found that the trained agent never touched the ground: the actuator torque was so large that the agent launched itself into the air. After I modified the original XML file, the agent no longer flew when taking random actions and behaved normally. But when I retrained on the modified environment, the algorithm lost its effectiveness and the reward showed no upward trend. How can this problem be solved? Looking forward to your reply, thank you~
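For context, the kind of XML change I made can be sketched roughly like this: scaling down the actuator `gear` values so the motors produce less torque. This is only an illustrative sketch — the helper `scale_actuator_gear` and the tiny XML snippet below are mine, not the real `ant.xml`, which has more actuators and attributes:

```python
import xml.etree.ElementTree as ET

def scale_actuator_gear(xml_text: str, factor: float) -> str:
    """Return a copy of a MuJoCo model XML with every motor's gear scaled by `factor`."""
    root = ET.fromstring(xml_text)
    for motor in root.iter("motor"):
        gear = motor.get("gear")
        if gear is not None:
            # gear may hold several space-separated numbers; scale each one
            scaled = " ".join(str(float(g) * factor) for g in gear.split())
            motor.set("gear", scaled)
    return ET.tostring(root, encoding="unicode")

# Hypothetical snippet standing in for the actuator block of ant.xml
ant_snippet = """
<mujoco>
  <actuator>
    <motor gear="150" joint="hip_1"/>
    <motor gear="150" joint="ankle_1"/>
  </actuator>
</mujoco>
"""

print(scale_actuator_gear(ant_snippet, 0.5))
```

Note that a change like this also rescales the effect of every action the policy outputs, so hyperparameters tuned for the original torque range (e.g. learning rate, reward scaling) may no longer be appropriate — which might be related to the flat reward curve I am seeing.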