A PyTorch implementation of MADDPG (multi-agent deep deterministic policy gradient)
I saw from your profile that you work at NetEase, so I asked in Chinese directly. Three things are unclear to me; could you explain them when you have time:

1. Why does the critic network first lift the obs (observation) to 1024 dimensions?
2. Why is the raw action variable concatenated directly, rather than first being passed through a fully connected layer?
3. I tested without raising the obs dimension, i.e. ((obs->256)+action)->128->64, and the results were also good. Is there an explanation for this?
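For reference, a minimal PyTorch sketch of the two critic variants the question compares. Only the 1024-wide first layer, the direct action concatenation, and the ((obs->256)+action)->128->64 alternative come from the question; the 512/256 widths of the later layers in the first variant, the class names, and the activation choice are assumptions for illustration.

```python
import torch
import torch.nn as nn


class WideCritic(nn.Module):
    """Critic as described in the question: obs is first lifted to 1024
    dimensions, then the raw action vector is concatenated before the
    remaining layers. The 512/256 widths below are assumed for illustration."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, 1024)        # question 1: lift obs to 1024
        self.fc2 = nn.Linear(1024 + act_dim, 512)  # question 2: concat raw action, no FC on it
        self.fc3 = nn.Linear(512, 256)
        self.q_out = nn.Linear(256, 1)
        self.relu = nn.ReLU()

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.fc1(obs))
        x = torch.cat([x, action], dim=-1)         # action enters unchanged
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        return self.q_out(x)


class SmallCritic(nn.Module):
    """The alternative tested in question 3:
    ((obs -> 256) + action) -> 128 -> 64 -> 1."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, 256)
        self.fc2 = nn.Linear(256 + act_dim, 128)
        self.fc3 = nn.Linear(128, 64)
        self.q_out = nn.Linear(64, 1)
        self.relu = nn.ReLU()

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.fc1(obs))
        x = torch.cat([x, action], dim=-1)
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        return self.q_out(x)
```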
Opened by DKuan · 2 comments.