[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer