Hi. Thanks for the great work. Can the authors provide some insight on the inference speed of LLaDA versus an 8B autoregressive LM? LLaDA appears slower in some of my initial tests, which could be because it cannot leverage a KV-cache, since the attention mask is non-causal. However, isn't one of the alleged advantages of diffusion LMs that they are supposed to be faster?
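
For what it's worth, here is a minimal timing sketch of the intuition behind my question (this is not LLaDA's actual code; the hidden size, sequence length, and denoising step count are all assumptions). With a KV-cache, each autoregressive step only projects the newest token and attends against cached keys/values, whereas each diffusion denoising step runs full bidirectional attention over the whole sequence:

```python
# Hypothetical cost comparison: AR decoding with a KV-cache vs.
# diffusion-style denoising with full bidirectional attention.
# All dimensions and step counts below are illustrative assumptions.
import time
import torch

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

d, L = 1024, 512                     # hidden size, sequence length (assumed)
num_denoise_steps = 64               # assumed number of diffusion sampling steps

q_proj = torch.randn(d, d, device=device)
k_proj = torch.randn(d, d, device=device)
v_proj = torch.randn(d, d, device=device)

def attn(q, k, v):
    # Single-head attention for simplicity; real models use multi-head.
    scores = q @ k.transpose(-1, -2) / d ** 0.5
    return torch.softmax(scores, dim=-1) @ v

# --- Autoregressive decoding with a KV-cache ---
x = torch.randn(1, 1, d, device=device)
ks, vs = [], []
t0 = time.perf_counter()
for _ in range(L):
    q = x @ q_proj                   # only the newest token is projected
    ks.append(x @ k_proj)            # cache grows by one entry per step
    vs.append(x @ v_proj)
    out = attn(q, torch.cat(ks, 1), torch.cat(vs, 1))
    x = out[:, -1:, :]               # feed the last output back in (toy loop)
ar_time = time.perf_counter() - t0

# --- Diffusion-style denoising: full bidirectional attention every step ---
seq = torch.randn(1, L, d, device=device)
t0 = time.perf_counter()
for _ in range(num_denoise_steps):
    q, k, v = seq @ q_proj, seq @ k_proj, seq @ v_proj
    seq = attn(q, k, v)              # every step touches all L positions
diff_time = time.perf_counter() - t0

print(f"AR w/ KV-cache : {ar_time:.3f}s for {L} tokens")
print(f"Diffusion      : {diff_time:.3f}s for {num_denoise_steps} full passes")
```

Under these assumptions the diffusion side pays an O(L^2) attention cost at every denoising step, so unless the number of steps is much smaller than the sequence length, it is not obvious where the claimed speedup would come from.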