A PyTorch implementation of MADDPG (multi-agent deep deterministic policy gradient)
Hello, I used your program and ran 2000 episodes, but compared with your result, my reward did not show the same obvious improvement: the reward trend did not rise. I did not change the code, other than setting max_steps = 100. I don't know why. Am I missing something? I ran the program on a virtual machine without a GPU, and 2000 episodes took approximately 50 hours, which is too slow. How much time did the training process take you to get your result?
This issue was opened by xuemei-ye and has received 7 comments. Based on the content, it seems to be resolved.