Simulate distributed transformer training runtimes