A PyTorch implementation of MADDPG (Multi-Agent Deep Deterministic Policy Gradient)
In my understanding, the actor does not have a loss function of its own; the chain rule is needed to update it. So why is there a loss for the actor in your code? Could you give me a hint as to why you update the actor this way?
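For context, here is how DDPG-style code commonly expresses the actor update (a minimal sketch, not this repository's exact code; `actor`, `critic`, `obs`, and the dimensions are hypothetical placeholders). The "actor loss" is just a surrogate objective: maximizing Q(s, mu(s)) is written as minimizing -Q(s, mu(s)), and calling `backward()` makes autograd apply exactly the chain rule dQ/da * da/dtheta from the deterministic policy gradient theorem.

```python
import torch

obs_dim, act_dim = 8, 2
actor = torch.nn.Sequential(torch.nn.Linear(obs_dim, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, act_dim), torch.nn.Tanh())
critic = torch.nn.Sequential(torch.nn.Linear(obs_dim + act_dim, 64), torch.nn.ReLU(),
                             torch.nn.Linear(64, 1))
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)

obs = torch.randn(32, obs_dim)                       # batch of observations
action = actor(obs)                                  # a = mu_theta(s), kept differentiable
q_value = critic(torch.cat([obs, action], dim=-1))   # Q(s, mu_theta(s))

actor_loss = -q_value.mean()   # surrogate "loss": descending on -Q ascends on Q
actor_optim.zero_grad()
actor_loss.backward()          # autograd computes dQ/da * da/dtheta (the chain rule)
actor_optim.step()
```

So the actor does not need an explicit loss in the supervised sense; writing `-Q.mean()` is simply the idiomatic way to hand the chain-rule update to autograd. (Only the actor's optimizer steps here, so the critic is updated separately with its own TD loss.)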