Hi, thanks for the great work. I'm trying to run the training code, but when I do pre_train with multiple 4090 gpus, it always gets stuck and no report is shown. But when training with multiple 3090 and single 4090 everything is fine. I strongly suspect that a deadlock occurs in 4090. I narrow the problem down to `deepspeed.initialize` in agent.py. But I don't know how to solve this problem. Any response will be greatly appreciated.
This issue appears to be discussing a feature request or bug report related to the repository. Based on the content, it seems to be resolved. The issue was opened by volcverse and has received 2 comments.