Hi, thanks for open-sourcing this great project. I am curious how to implement a text2video version of SVD: given an input image and a prompt, how can I generate a video? Can I simply replace `encoder_hidden_states` with the text embedding and finetune SVD? Thanks!
This issue was opened by Len-Li, has received 2 comments, and appears to be resolved.