A PyTorch wrapper for a parallel exclusive scan implemented in CUDA.
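For readers unfamiliar with the term, an exclusive scan (exclusive prefix sum) writes to each output position the sum of all inputs strictly before it, so the first output is always the identity (0 for addition). The sketch below is a plain-Python illustration only and is not this project's API: the function names `exclusive_scan` and `blelloch_exclusive_scan` are hypothetical, and the work-efficient up-sweep/down-sweep (Blelloch) scheme is assumed here as a common way to parallelize scans, not confirmed as what the wrapped CUDA kernel does.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Sequential reference: out[i] is the sum of values[0:i]."""
    if not values:
        return []
    # Start from the identity 0 and drop the last input, so the
    # grand total never appears in the output.
    return list(accumulate(values[:-1], initial=0))


def blelloch_exclusive_scan(values):
    """Up-sweep/down-sweep scan, simulated sequentially.

    Each inner `for` loop touches disjoint elements, so on a GPU
    its iterations could run as independent threads.
    """
    n = len(values)
    if n == 0:
        return []

    # Pad with zeros up to the next power of two (assumption made
    # to keep the tree indexing simple).
    size = 1
    while size < n:
        size *= 2
    data = list(values) + [0] * (size - n)

    # Up-sweep (reduce): build partial sums in a balanced tree.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down.
    data[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2

    return data[:n]


if __name__ == "__main__":
    sample = [3, 1, 7, 0, 4, 1, 6, 3]
    print(exclusive_scan(sample))
    print(blelloch_exclusive_scan(sample))
```

Both functions should agree on any list of numbers; the sequential version is useful as a correctness reference when checking a GPU result.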