Great paper guys! I have been brainstorming on diffusion-based LLMs. I believe **dynamic entropy-based token pruning** for LLaDA would accelerate inference by pruning tokens during reverse diffusion based on prediction entropy:

- **Early phase (t ∈ [0.7, 1])**: prune high-entropy tokens (H > 2.0), e.g. ambiguous modifiers.
- **Mid phase (t ∈ [0.3, 0.7))**: target moderate-entropy tokens (H > 1.5), e.g. redundant function words.
- **Late phase (t < 0.3)**: disable pruning to retain syntax-critical tokens.

Pruned gaps are filled post-generation by a lightweight ARM infiller trained to reconstruct masked spans using LLaDA's bidirectional context. I may be wrong but would love to hear from you. A minimal sketch of the phase-dependent pruning rule is below.
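To make the idea concrete, here is a rough sketch of the pruning rule only (not the LLaDA sampling loop or the ARM infiller). The function names (`entropy_threshold`, `prune_mask`) and the threshold values are my own illustrative assumptions, not anything from the paper or repo:

```python
import torch
import torch.nn.functional as F

def entropy_threshold(t: float) -> float | None:
    """Phase-dependent entropy threshold; values are the illustrative ones above."""
    if t >= 0.7:   # early phase: prune aggressively
        return 2.0
    if t >= 0.3:   # mid phase: prune moderately
        return 1.5
    return None    # late phase: pruning disabled

def prune_mask(logits: torch.Tensor, t: float) -> torch.Tensor:
    """Return a boolean mask (True = prune) over tokens at diffusion time t.

    logits: (seq_len, vocab_size) per-token predictions at the current step.
    """
    log_p = F.log_softmax(logits, dim=-1)
    # H = -sum_v p(v) log p(v), computed per token position (in nats)
    entropy = -(log_p.exp() * log_p).sum(dim=-1)
    thresh = entropy_threshold(t)
    if thresh is None:
        return torch.zeros_like(entropy, dtype=torch.bool)
    return entropy > thresh
```

In an actual sampling loop, positions flagged by `prune_mask` would presumably be dropped from subsequent denoising steps and recorded so the ARM infiller can reconstruct them afterwards.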