Need to replace the deprecated distributed launch wrapper used at https://github.com/facebookresearch/FAMBench/blob/main/benchmarks/bevt/ootb/run_bevt_train.sh#L52
with torchrun.
See here: https://pytorch.org/docs/stable/distributed.html
The deprecated command causes an argument-parsing error in BEVT: the launcher passes --local-rank, but BEVT expects --local_rank.
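
For illustration, here is a minimal sketch of how a training entry point could tolerate both launchers. It is not BEVT's actual code: `parse_local_rank` is a hypothetical helper. It assumes the script reads the rank with `argparse`, and it relies on torchrun exporting the `LOCAL_RANK` environment variable rather than passing a rank flag.

```python
import argparse
import os


def parse_local_rank(argv=None, env=None):
    """Return the local rank from CLI flags, falling back to LOCAL_RANK.

    Hypothetical helper for illustration; not part of BEVT or PyTorch.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # Accept both spellings: older launchers pass --local_rank,
    # newer ones pass --local-rank.
    parser.add_argument(
        "--local_rank", "--local-rank",
        dest="local_rank", type=int, default=None,
    )
    # Ignore any other training arguments in this sketch.
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    # torchrun sets LOCAL_RANK in the environment instead of passing a flag.
    return int(env.get("LOCAL_RANK", 0))


if __name__ == "__main__":
    print(parse_local_rank())
```

With this kind of fallback, the script would work whether the rank arrives as `--local_rank`, as `--local-rank`, or only through the environment under torchrun.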