Hi, thank you for open-sourcing your code. How much GPU memory is required for training? Is it necessary to add DeepSpeed or gradient checkpointing to reduce memory consumption during training?
This issue was opened by zhw-zhang, has received 3 comments, and appears to be resolved.